liu.se: Search for publications in DiVA
Results 51-100 of 362
  • 51.
    Ehliar, Andreas
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Optimizing Xilinx designs through primitive instantiation. 2010. In: FPGAworld '10: Proceedings of the 7th FPGAworld Conference, New York: ACM, 2010, p. 20-27. Conference paper (Refereed)
    Abstract [en]

    This paper is intended as a guideline for people who are interested in manual instantiation of FPGA primitives as a way of improving the performance of an FPGA design. The focus of the paper is on designs where slice primitives such as flip-flops and lookup tables are instantiated. Guidelines on how to develop a design with manual instantiation are presented together with a case study of a high performance bit-serial two's complement divider where a majority of the area is manually instantiated. Thanks to extensive manual optimization, this divider reaches a maximum frequency of 345 MHz in the fastest Virtex-4 while using fewer than 150 LUTs. An open source library containing modules intended to promote the structured development of modules with manually instantiated components is also presented.
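    To make "bit-serial" concrete: a bit-serial divider produces one quotient bit per clock cycle. The Python sketch below models that behaviour with the textbook restoring division algorithm and handles two's complement operands by dividing magnitudes and fixing the signs afterwards. It is an illustrative assumption, not the divider architecture from the paper (which is built from manually instantiated Virtex-4 primitives). The function name `bitserial_divide` and its `width` parameter are hypothetical.

```python
def bitserial_divide(dividend: int, divisor: int, width: int = 16) -> tuple[int, int]:
    """Signed division producing one quotient bit per loop iteration (restoring algorithm).

    Operands are treated as `width`-bit two's complement values. The quotient
    truncates toward zero and the remainder takes the sign of the dividend,
    as in C. (The single overflow case MIN / -1 returns the exact quotient.)
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not (lo <= dividend <= hi and lo <= divisor <= hi):
        raise ValueError("operands do not fit in the given width")

    negative_q = (dividend < 0) != (divisor < 0)
    n, d = abs(dividend), abs(divisor)

    quotient, remainder = 0, 0
    # Walk the dividend from MSB to LSB: one iteration models one clock cycle.
    for i in range(width - 1, -1, -1):
        remainder = (remainder << 1) | ((n >> i) & 1)  # shift in the next dividend bit
        trial = remainder - d                           # trial subtraction
        if trial >= 0:
            remainder = trial                           # keep it, quotient bit = 1
            quotient = (quotient << 1) | 1
        else:
            quotient <<= 1                              # "restore", quotient bit = 0

    if negative_q:
        quotient = -quotient
    if dividend < 0:
        remainder = -remainder
    return quotient, remainder


print(bitserial_divide(-100, 7))  # (-14, -2)
```

    The loop does exactly `width` iterations regardless of the operand values, which is what makes a bit-serial hardware implementation small: one subtractor and a few shift registers, reused every cycle.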

  • 52.
    Ehliar, Andreas
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Performance driven FPGA design with an ASIC perspective. 2009. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    FPGA devices are an important component in many modern devices. This means that it is important that VLSI designers have a thorough knowledge of how to optimize designs for FPGAs. While the design flows for ASICs and FPGAs are similar, there are also many differences due to the limitations inherent in FPGA devices. To use an FPGA efficiently, it is important to be aware of both the strengths and the weaknesses of FPGAs. If an FPGA design is to be ported to an ASIC at a later stage, this should also be taken into account early in the design cycle so that the ASIC port will be efficient.

    This thesis investigates how to optimize a design for an FPGA through a number of case studies of important SoC components. One of these case studies discusses high speed processors and the tradeoffs that are necessary when constructing very high speed processors in FPGAs. The processor has a maximum clock frequency of 357 MHz in a Xilinx Virtex-4 device of the fastest speed grade, which is significantly higher than that of Xilinx's own processor (MicroBlaze) in the same FPGA.

    Another case study investigates floating point datapaths and describes how a floating point adder and multiplier can be efficiently implemented in an FPGA.
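    The basic steps a floating point adder datapath has to perform can be sketched in a few lines of Python: swap the operands so the larger exponent comes first, align the smaller mantissa by shifting it right, add, and renormalize. This is a simplified model, not the adder from the thesis. It handles positive operands only, truncates instead of rounding, and ignores special values. The 24-bit mantissa width `P` and the `(mantissa, exponent)` representation are assumptions chosen to resemble single precision.

```python
import math

P = 24  # mantissa width in bits, hidden bit included (assumed, single-precision-like)


def normalize(m: int, e: int) -> tuple[int, int]:
    """Shift m into [2**(P-1), 2**P) and adjust e; bits shifted out are truncated."""
    if m == 0:
        return 0, 0
    while m >= (1 << P):
        m >>= 1
        e += 1
    while m < (1 << (P - 1)):
        m <<= 1
        e -= 1
    return m, e


def fp_add(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Add two positive numbers stored as (mantissa, exponent), value = m * 2**e."""
    (ma, ea), (mb, eb) = a, b
    if ea < eb:  # 1. swap so that a has the larger exponent
        (ma, ea), (mb, eb) = (mb, eb), (ma, ea)
    mb >>= min(ea - eb, P + 1)  # 2. align the smaller operand (truncating)
    return normalize(ma + mb, ea)  # 3. add the mantissas, 4. renormalize


def from_float(x: float) -> tuple[int, int]:
    """Convert a positive float to (mantissa, exponent), truncating to P bits."""
    m, e = math.frexp(x)  # x = m * 2**e with 0.5 <= m < 1
    return int(m * (1 << P)), e - P


def to_float(v: tuple[int, int]) -> float:
    return math.ldexp(v[0], v[1])


print(to_float(fp_add(from_float(1.5), from_float(2.25))))  # 3.75
```

    In hardware, the alignment shifter and the normalization step (a leading-zero count plus shifter once subtraction is supported) usually dominate the logic, which is why they are natural targets for FPGA-specific optimization.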

    The final case study investigates Network-on-Chip architectures and how these can be optimized for FPGAs. The main focus is on packet switched architectures, but a circuit switched architecture optimized for FPGAs is also investigated.

    All of these case studies also contain information about potential pitfalls when porting designs optimized for an FPGA to an ASIC. The focus in this case is on systems where initial low volume production will be using FPGAs while still keeping the option open to port the design to an ASIC if the demand is high. This information will also be useful for designers who want to create IP cores that can be efficiently mapped to both FPGAs and ASICs.

    Finally, a framework is also presented which allows for the creation of custom backend tools for the Xilinx design flow. The framework is already useful for some tasks, but the main reason for including it is to inspire researchers and developers to use this powerful ability in their own design tools.

    List of papers
    1. Using low precision floating point numbers to reduce memory cost for MP3 decoding
    2004 (English). In: International Workshop on Multimedia Signal Processing, IEEE Xplore, 2004, p. 119-122. Conference paper, Published paper (Refereed)
    Abstract [en]

    The purpose of our work has been to evaluate the practicality of using a 16-bit floating point representation to store the intermediate sample values and other data in memory during the decoding of MP3 bit streams. A floating point number representation offers a better trade-off between dynamic range and precision than a fixed point representation for a given word length. Using a floating point representation means that smaller memories can be used which leads to smaller chip area and lower power consumption without reducing sound quality. We have designed and implemented a DSP processor based on 16-bit floating point intermediate storage. The DSP processor is capable of decoding all MP3 bit streams at 20 MHz and this has been demonstrated on an FPGA prototype.

    Place, publisher, year, edition, pages
    IEEE Xplore, 2004
    National Category
    Engineering and Technology
    Identifiers
    urn:nbn:se:liu:diva-16559 (URN); 10.1109/MMSP.2004.1436435 (DOI); 0-7803-8578-0 (ISBN)
    Available from: 2009-02-02. Created: 2009-02-02. Last updated: 2015-02-18. Bibliographically approved
    2. An FPGA based Open Source Network-on-chip Architecture
    2007 (English). In: 17th International Conference on Field Programmable Logic and Applications, FPL, Amsterdam, Holland, 2007, IEEE, 2007, p. 800-803. Conference paper, Published paper (Refereed)
    Abstract [en]

    Networks on chip (NoC) have long been seen as a potential solution to the problems encountered when implementing large digital hardware designs. In this paper we describe an open source FPGA based NoC architecture with low area overhead, high throughput and low latency compared to other published works. The architecture has been optimized for Xilinx FPGAs and the NoC is capable of operating at a frequency of 260 MHz in a Virtex-4 FPGA. We have also developed a bridge so that generic Wishbone bus compatible IP blocks can be connected to the NoC.

    Place, publisher, year, edition, pages
    IEEE, 2007
    National Category
    Engineering and Technology
    Identifiers
    urn:nbn:se:liu:diva-16560 (URN); 10.1109/FPL.2007.4380772 (DOI); 978-1-4244-1060-6 (ISBN)
    Available from: 2009-02-02. Created: 2009-02-02. Last updated: 2015-02-18. Bibliographically approved
    3. Thinking outside the flow: Creating customized backend tools for Xilinx based designs
    2007 (English). In: 4th annual FPGAworld Conference, Stockholm, 2007, 2007. Conference paper, Published paper (Refereed)
    Abstract [en]

    This paper is intended to serve as an introduction to building a customized backend tool for a Xilinx based design flow. A Python based library called PyXDL is presented which allows a user to manipulate XDL files containing a placed and routed design. Three different tools that use this library are presented, ranging from a simple resource utilization viewer to a tool that inserts a logic analyzer into an already routed design, thus avoiding a costly complete rerun of the place and route tool.

    National Category
    Engineering and Technology
    Identifiers
    urn:nbn:se:liu:diva-16561 (URN)
    Available from: 2009-02-02. Created: 2009-02-02. Last updated: 2015-02-18. Bibliographically approved
    4. A High Performance Microprocessor with DSP Extensions Optimized for the Virtex-4 FPGA
    2008 (English). In: International Conference on Field Programmable Logic and Applications, FPL 2008, Heidelberg, Germany, 2008, 2008, p. 599-602. Conference paper, Published paper (Refereed)
    Abstract [en]

    As the use of FPGAs increases, the importance of highly optimized processors for FPGAs will increase. In this paper we present the microarchitecture of a soft microprocessor core optimized for the Virtex-4 architecture. The core can operate at 357 MHz, which is significantly faster than Xilinx's MicroBlaze architecture on the same FPGA. At this frequency it is necessary to keep the logic complexity down, and this paper shows how this can be done while retaining sufficient functionality for a high performance processor.

    National Category
    Engineering and Technology
    Identifiers
    urn:nbn:se:liu:diva-16562 (URN); 10.1109/FPL.2008.4630018 (DOI); 978-1-4244-1960-9 (ISBN)
    Available from: 2009-02-02. Created: 2009-02-02. Last updated: 2015-02-18. Bibliographically approved
    5. High performance, low-latency field-programmable gate array-based floating-point adder and multiplier units in a Virtex 4
    2008 (English). In: IET Computers & Digital Techniques, ISSN 1751-8601, Vol. 2, p. 305-313. Article in journal (Refereed), Published
    Abstract [en]

    There is increasing interest in floating-point arithmetic in field programmable gate arrays (FPGAs) because of their growing size and performance. FPGAs are generally good at bit manipulation and fixed-point arithmetic, but have a harder time coping with floating-point arithmetic. An architecture used to construct high-performance floating-point components in a Virtex-4 FPGA is described in detail. Floating-point adder/subtracter and multiplier units have been constructed. The adder/subtracter can operate at a frequency of 377 MHz in a Virtex-4SX35 (speed grade -12).

    National Category
    Engineering and Technology
    Identifiers
    urn:nbn:se:liu:diva-16563 (URN); 10.1049/iet-cdt:20070075 (DOI)
    Available from: 2009-02-02. Created: 2009-02-02. Last updated: 2015-02-18. Bibliographically approved
    6. An ASIC Perspective on High Performance FPGA Design
    2009 (English). Conference paper, Published paper (Refereed)
    Abstract [en]

    In this paper we discuss how various design components perform in both FPGAs and standard cell based ASICs. We also investigate how various common FPGA optimizations affect the performance and area of an ASIC port. We find that most techniques used to optimize a design for an FPGA do not have a negative impact on the area in an ASIC. The intended audience for this paper is engineers charged with creating designs or IP cores that are optimized for both FPGAs and ASICs.

    National Category
    Engineering and Technology
    Identifiers
    urn:nbn:se:liu:diva-16564 (URN)
    Available from: 2009-02-02. Created: 2009-02-02. Last updated: 2015-02-18. Bibliographically approved
  • 53.
    Ehliar, Andreas
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Karlström, Per
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    A High Performance Microprocessor with DSP Extensions Optimized for the Virtex-4 FPGA. 2008. In: International Conference on Field Programmable Logic and Applications, FPL 2008, Heidelberg, Germany, 2008, 2008, p. 599-602. Conference paper (Refereed)
    Abstract [en]

    As the use of FPGAs increases, the importance of highly optimized processors for FPGAs will increase. In this paper we present the microarchitecture of a soft microprocessor core optimized for the Virtex-4 architecture. The core can operate at 357 MHz, which is significantly faster than Xilinx's MicroBlaze architecture on the same FPGA. At this frequency it is necessary to keep the logic complexity down, and this paper shows how this can be done while retaining sufficient functionality for a high performance processor.

  • 54.
    Ehliar, Andreas
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Eilert, Johan
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    A Comparison of Three FPGA Optimized NoC Architectures. 2007. In: Swedish System-on-Chip Conference, SSoCC 2007, 2007. Conference paper (Other academic)
  • 55.
    Ehliar, Andreas
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    A Network on Chip based gigabit Ethernet router implemented on an FPGA. 2006. In: SSoCC Swedish System-on-Chip Conference, 2006, 2006. Conference paper (Other academic)
  • 56.
    Ehliar, Andreas
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    An ASIC Perspective on FPGA Optimizations. 2009. In: 19th International Conference on Field Programmable Logic and Applications (FPL), 2009, p. 218-223. Conference paper (Refereed)
    Abstract [en]

    In this paper we discuss how various design components perform in both FPGAs and standard cell based ASICs. We also investigate how various common FPGA optimizations affect the performance and area of an ASIC port. We find that most techniques used to optimize a design for an FPGA do not have a negative impact on the area in an ASIC. The intended audience for this paper is engineers charged with creating designs or IP cores that are optimized for both FPGAs and ASICs.

  • 57.
    Ehliar, Andreas
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    An FPGA based Open Source Network-on-chip Architecture. 2007. In: 17th International Conference on Field Programmable Logic and Applications, FPL, Amsterdam, Holland, 2007, IEEE, 2007, p. 800-803. Conference paper (Refereed)
    Abstract [en]

    Networks on chip (NoC) have long been seen as a potential solution to the problems encountered when implementing large digital hardware designs. In this paper we describe an open source FPGA based NoC architecture with low area overhead, high throughput and low latency compared to other published works. The architecture has been optimized for Xilinx FPGAs and the NoC is capable of operating at a frequency of 260 MHz in a Virtex-4 FPGA. We have also developed a bridge so that generic Wishbone bus compatible IP blocks can be connected to the NoC.

  • 58.
    Ehliar, Andreas
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Benchmarking network processors. 2004. In: Swedish System-on-Chip Conference, 2004, 2004. Conference paper (Other academic)
  • 59.
    Ehliar, Andreas
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Flexible Route Lookup Using Range Search. 2005. In: The Third IASTED International Conference on Communications and Computer Networks, 2005, 2005. Conference paper (Refereed)
  • 60.
    Ehliar, Andreas
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Thinking outside the flow: Creating customized backend tools for Xilinx based designs. 2007. In: 4th annual FPGAworld Conference, Stockholm, 2007, 2007. Conference paper (Refereed)
    Abstract [en]

    This paper is intended to serve as an introduction to building a customized backend tool for a Xilinx based design flow. A Python based library called PyXDL is presented which allows a user to manipulate XDL files containing a placed and routed design. Three different tools that use this library are presented, ranging from a simple resource utilization viewer to a tool that inserts a logic analyzer into an already routed design, thus avoiding a costly complete rerun of the place and route tool.

  • 61.
    Ehliar, Andreas
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Siverskog, Jacob
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Using Partial Reconfigurability to aid Debugging of FPGA Designs. 2011. Conference paper (Refereed)
    Abstract [en]

    This paper discusses the use of partial reconfigurability in Xilinx FPGA designs in order to aid debugging. A debugging framework is proposed in which partial reconfigurability adds flexibility by allowing the person debugging to decide at run time which debugging module to use. This paper also presents an open source debugging tool which allows a user to read out the contents of memory blocks in Xilinx designs without needing a JTAG adapter. This makes it possible to debug an FPGA in situations where this would otherwise be difficult, e.g. in the field.

  • 62.
    Ehliar, Andreas
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Wiklund, Daniel
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Feasibility study of a core router based on a network on chip. 2005. In: Swedish System on Chip Conference SSoCC, 2005, 2005. Conference paper (Other academic)
    Abstract [en]

    In this paper we investigate the feasibility of creating a core router based on a network on chip. The investigated architecture has sixteen 10-Gbit Ethernet ports. The purpose is to show that such a solution is possible with current process technologies. This is done through an analysis of the required chip area, clock frequencies, and pin count. The results show that such a solution is feasible and can be implemented as a single chip.
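    A quick back-of-the-envelope calculation shows the scale of the problem. Sixteen ports at 10 Gbit/s give 160 Gbit/s of aggregate traffic in each direction. If each port's internal datapath were 64 bits wide (a hypothetical width chosen for illustration; the abstract does not state the actual bus width), each port would need to be clocked at:

```latex
\[
B_{\text{total}} = 16 \times 10\ \text{Gbit/s} = 160\ \text{Gbit/s},
\qquad
f_{\text{port}} = \frac{10 \times 10^{9}\ \text{bit/s}}{64\ \text{bit/cycle}} = 156.25\ \text{MHz}.
\]
```

    Doubling the datapath width halves the required clock frequency, which is the basic tradeoff behind the area and clock-frequency analysis mentioned above.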

  • 63.
    Ehrenstråhle, Carl
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    Polynomial Expansion-Based Displacement Calculation on FPGA. 2016. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    This thesis implements a system for calculating the displacement between two consecutive video frames. The displacement is calculated using a polynomial expansion-based algorithm. A unit-tested, bottom-up approach is used successfully to design and implement the system, which is described in detail. The chosen algorithm and its computational details are presented to provide context for the implemented system. Some of the major issues encountered and their impact on the system are discussed.
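    The abstract does not name the algorithm. Assuming it follows the widely used Farnebäck-style polynomial expansion, the core displacement step is short. Each neighbourhood is modelled as a quadratic f1(x) = xᵀAx + b1ᵀx + c1 with symmetric A. If the next frame is a pure translation, f2(x) = f1(x − d), then expanding gives b2 = b1 − 2Ad, so d = −½A⁻¹(b2 − b1). The Python sketch below solves that 2x2 system for one point. The function name `displacement` is hypothetical. A real implementation, on FPGA or otherwise, also averages A over both frames and aggregates the equations over a neighbourhood before solving.

```python
Vec2 = tuple[float, float]
Mat2 = tuple[Vec2, Vec2]


def displacement(A: Mat2, b1: Vec2, b2: Vec2) -> Vec2:
    """Solve A d = -(b2 - b1) / 2 for the displacement d (2x2 case, Cramer's rule)."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("A is (nearly) singular; the displacement is not determined")
    rx = -0.5 * (b2[0] - b1[0])
    ry = -0.5 * (b2[1] - b1[1])
    dx = (rx * a22 - a12 * ry) / det
    dy = (a11 * ry - a21 * rx) / det
    return dx, dy


# Synthesize b2 from a known shift, then recover the shift.
A = ((2.0, 0.5), (0.5, 1.0))
b1 = (0.3, -0.2)
d_true = (1.5, -0.75)
b2 = tuple(b1[i] - 2 * (A[i][0] * d_true[0] + A[i][1] * d_true[1]) for i in range(2))
print(displacement(A, b1, b2))  # approximately (1.5, -0.75)
```

    Solving a fixed-size 2x2 system with Cramer's rule needs only multiplications and one division per pixel, which maps naturally to a pipelined hardware datapath.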

  • 64.
    Eilert, Johan
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    ASIP for Wireless Communication and Media. 2010. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    While general purpose processors reach both high performance and high application flexibility, this comes at a high cost in terms of silicon area and power consumption. In systems where high application flexibility is not required, it is possible to trade off flexibility for lower cost by tailoring the processor to the application to create an Application Specific Instruction set Processor (ASIP) with high performance yet low silicon cost.

    This thesis demonstrates how ASIPs with application specific data types can provide efficient solutions at lower cost. Two examples are presented: an audio decoder ASIP for audio and music processing, and a matrix manipulation ASIP for MIMO radio baseband signal processing.

    The audio decoder ASIP uses a 16-bit floating point data type to reduce the size of the data memory to about 60% of that of other solutions, which use a 32-bit data type. Since the data memory occupies a major part of the silicon area, this has a significant impact on the total silicon area, and thereby also on the static and dynamic power consumption. The data width can be reduced without any noticeable artifacts in the decoded audio thanks to the natural masking effect of the human ear.

    The matrix manipulation SIMD ASIP is designed to perform various matrix operations such as matrix inversion and QR decomposition of small complex-valued matrices. This type of processing is found in MIMO radio baseband signal processing and the matrices are typically not larger than 4x4. There have been solutions published that use arrays of fixed-function processing elements to perform these operations, but the proposed ASIP performs the computations in less time and with lower hardware cost.
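    For matrices this small, closed-form inversion is cheap. The Python sketch below inverts a 2x2 complex matrix with a single reciprocal of the determinant followed by multiplications only. Larger cases such as 4x4 can be built blockwise from 2x2 pieces. This is only an illustration of why small matrices favour direct methods; the abstract does not describe the ASIP's actual algorithm. The names `inv2x2` and `matmul2x2` are hypothetical.

```python
Matrix2 = tuple[tuple[complex, complex], tuple[complex, complex]]


def inv2x2(m: Matrix2) -> Matrix2:
    """Closed-form inverse of a 2x2 complex matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ZeroDivisionError("matrix is singular")
    r = 1 / det  # one reciprocal, then only multiplications
    return ((d * r, -b * r), (-c * r, a * r))


def matmul2x2(x: Matrix2, y: Matrix2) -> Matrix2:
    """Plain 2x2 matrix product, used here to check the inverse."""
    return tuple(
        tuple(sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


m = ((1 + 2j, 0.5 - 1j), (-0.3 + 0.4j, 2 - 1j))
print(matmul2x2(m, inv2x2(m)))  # close to the identity matrix
```

    The single reciprocal is where a floating point datapath helps: in fixed point, the determinant's dynamic range would have to be scaled explicitly before the division.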

    The matrix manipulation ASIP data path uses a floating point data type to avoid the data scaling issues associated with fixed point computations, especially those related to division and reciprocal calculations. It also simplifies the program control flow, since no special cases for certain inputs are needed; this is especially important for SIMD architectures.

    These two applications were chosen to show that ASIPs can be a suitable alternative for different types of applications, providing enough flexibility and performance to support different standards and algorithms at low hardware cost.

    List of papers
    1. Using low precision floating point numbers to reduce memory cost for MP3 decoding
    2004 (English). In: International Workshop on Multimedia Signal Processing, IEEE Xplore, 2004, p. 119-122. Conference paper, Published paper (Refereed)
    Abstract [en]

    The purpose of our work has been to evaluate the practicality of using a 16-bit floating point representation to store the intermediate sample values and other data in memory during the decoding of MP3 bit streams. A floating point number representation offers a better trade-off between dynamic range and precision than a fixed point representation for a given word length. Using a floating point representation means that smaller memories can be used which leads to smaller chip area and lower power consumption without reducing sound quality. We have designed and implemented a DSP processor based on 16-bit floating point intermediate storage. The DSP processor is capable of decoding all MP3 bit streams at 20 MHz and this has been demonstrated on an FPGA prototype.

    Place, publisher, year, edition, pages
    IEEE Xplore, 2004
    National Category
    Engineering and Technology
    Identifiers
    urn:nbn:se:liu:diva-16559 (URN); 10.1109/MMSP.2004.1436435 (DOI); 0-7803-8578-0 (ISBN)
    Available from: 2009-02-02. Created: 2009-02-02. Last updated: 2015-02-18. Bibliographically approved
    2. Efficient Complex Matrix Inversion for MIMO Software Defined Radio
    2007 (English). In: International Symposium on Circuits and Systems, ISCAS 2007, IEEE, 2007, p. 2610-2613. Conference paper, Published paper (Refereed)
    Abstract [en]

    Complex matrix inversion is a very computationally demanding operation in advanced multi-antenna wireless communications. Traditionally, systolic array-based QR decomposition (QRD) is used to invert large matrices. However, the matrices involved in MIMO baseband processing in mobile handsets are generally small which means QRD is not necessarily efficient. In this paper, a new method is proposed using programmable hardware units which not only achieves higher performance but also consumes less silicon area. Furthermore, the hardware can be reused for many other operations such as complex matrix multiplication, filtering, correlation and FFT/IFFT.

    Place, publisher, year, edition, pages
    IEEE, 2007
    National Category
    Engineering and Technology
    Identifiers
    urn:nbn:se:liu:diva-39855 (URN); 10.1109/ISCAS.2007.377850 (DOI); 51537 (Local ID); 1-4244-0920-9 (ISBN); 51537 (Archive number); 51537 (OAI)
    Conference
    International Symposium on Circuits and Systems (ISCAS 2007), 27-30 May, New Orleans, Louisiana, USA
    Available from: 2009-10-10. Created: 2009-10-10. Last updated: 2011-02-04
    3. Complexity Reduction of Matrix Manipulation for Multi-User STBC-MIMO Decoding
    2007 (English). In: IEEE Sarnoff Symposium, 2007, 2007, p. 1-5. Conference paper, Published paper (Refereed)
    Abstract [en]

    This paper studies efficient complex valued matrix manipulations for multi-user STBC-MIMO decoding. A novel method called Alamouti blockwise analytical matrix inversion (ABAMI) is proposed for the inversion of large complex matrices built from Alamouti sub-blocks. Another method, using a variant of Givens rotation, is proposed for fast QR decomposition of this kind of matrix. Our solutions significantly reduce the number of operations, making them more than 4 times faster than several other solutions in the literature. Furthermore, compared to fixed function VLSI implementations, our solution is more flexible and consumes less silicon area because the hardware is programmable and can be reused for many other operations such as filtering, correlation and FFT/IFFT. Besides analyzing the general computational complexity in terms of the number of basic operations, we also measure the computational latency in clock cycles, based on conceptual hardware for real-time matrix manipulations.

    National Category
    Engineering and Technology
    Identifiers
    urn:nbn:se:liu:diva-39861 (URN); 10.1109/SARNOF.2007.4567354 (DOI); 51543 (Local ID); 978-1-4244-2483-2 (ISBN); 51543 (Archive number); 51543 (OAI)
    Conference
    Sarnoff Symposium, April 30-May 2, Nassau Inn, Princeton, NJ, USA
    Available from: 2009-10-10. Created: 2009-10-10. Last updated: 2011-02-04
    4. Implementation of a Programmable Linear MMSE Detector for MIMO-OFDM
    2008 (English). In: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2008, IEEE, 2008, p. 5396-5399. Conference paper, Published paper (Refereed)
    Abstract [en]

    This paper presents a linear minimum mean square error (LMMSE) symbol detector for MIMO-OFDM enabled mobile terminals. The detector is implemented using a programmable baseband processor aimed for software-defined radio (SDR). Owing to the dynamic range supplied by the floating-point SIMD datapath, special algorithms can be adopted to reduce the computational latency of detection. The programmable solution not only supports different transmit/receive antenna configurations, but also allows hardware multiplexing to obtain silicon and power efficiency. Compared to several existing fixed-functional solutions, the one proposed in this paper is smaller, more flexible and faster.

    Place, publisher, year, edition, pages
    IEEE, 2008
    National Category
    Engineering and Technology
    Identifiers
    urn:nbn:se:liu:diva-42734 (URN); 10.1109/ICASSP.2008.4518880 (DOI); 68460 (Local ID); 978-1-4244-1483-3 (ISBN); 68460 (Archive number); 68460 (OAI)
    Conference
    IEEE International Conference on Acoustics, Speech and Signal Processing, March 31-April 4, Las Vegas, NV, USA
    Available from: 2009-10-10. Created: 2009-10-10. Last updated: 2011-02-04. Bibliographically approved
    5. Real-Time Alamouti STBC Decoding on A Programmable Baseband Processor
    2008 (English). Conference paper, Published paper (Refereed)
    Abstract [en]

    This paper presents a space-time block coding decoder for MIMO-OFDM enabled mobile terminals. The decoder is implemented using a programmable baseband processor aimed at software-defined radio (SDR). The dynamic range supplied by the floating-point SIMD datapath allows special algorithms to significantly reduce the computational latency of decoding. The programmable solution not only supports different transmit/receive antenna configurations, but also allows hardware multiplexing to obtain silicon and power efficiency. Compared to several existing fixed-function ASIC solutions in the literature, the one proposed in this paper is by far the smallest and fastest, and is also more flexible.

    National Category
    Engineering and Technology
    Identifiers
    urn:nbn:se:liu:diva-42763 (URN), 10.1109/ICCSC.2008.65 (DOI), 68620 (Local ID), 978-1-4244-1707-0 (ISBN), 68620 (Archive number), 68620 (OAI)
    Conference
    4th IEEE International Conference on Circuits and Systems for Communications, 26-28 May, Shanghai, China
    Available from: 2009-10-10. Created: 2009-10-10. Last updated: 2011-02-04. Bibliographically approved
  • 65.
    Eilert, Johan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Ehliar, Andreas
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Using low precision floating point numbers to reduce memory cost for MP3 decoding (2004). In: International Workshop on Multimedia Signal Processing, IEEE Xplore, 2004, p. 119-122. Conference paper (Refereed)
    Abstract [en]

    The purpose of our work has been to evaluate the practicality of using a 16-bit floating point representation to store the intermediate sample values and other data in memory during the decoding of MP3 bit streams. A floating point number representation offers a better trade-off between dynamic range and precision than a fixed point representation for a given word length. Using a floating point representation means that smaller memories can be used, which leads to smaller chip area and lower power consumption without reducing sound quality. We have designed and implemented a DSP processor based on 16-bit floating point intermediate storage. The DSP processor is capable of decoding all MP3 bit streams at 20 MHz, and this has been demonstrated on an FPGA prototype.
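    The range-versus-precision argument above can be checked numerically. This is a minimal sketch that assumes IEEE 754 binary16 (available through Python's `struct` format character `e`) as a stand-in for the paper's 16-bit floating point format, which may split its bits differently, and compares it against Q15, an assumed 16-bit fixed point baseline. The helper names are hypothetical.

```python
import struct

def to_half(x: float) -> float:
    """Round x to IEEE 754 binary16 (1 sign, 5 exponent, 10 fraction bits) and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_q15(x: float) -> float:
    """Round x to 16-bit signed fixed point with 15 fractional bits (Q15), saturating."""
    q = max(-32768, min(32767, round(x * 32768)))
    return q / 32768

# Relative error of each 16-bit format for progressively smaller sample values.
for x in (0.5, 0.001, 0.00003):
    half_err = abs(to_half(x) - x) / x
    q15_err = abs(to_q15(x) - x) / x
    print(f"x={x:g}: binary16 rel. error {half_err:.1e}, Q15 rel. error {q15_err:.1e}")
```

    Both formats store 0.5 exactly, but as samples get quieter the fixed point error grows relative to the signal while the floating point error stays roughly proportional to it, which is the property that lets a 16-bit float memory replace a wider fixed point one.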

  • 66.
    Eilert, Johan
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Ehliar, Andreas
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Design of a Floating Point DSP for Full Precision MPEG-I Layer II and III Decoding (2005). In: Swedish System on Chip Conference SSoCC, 2005. Conference paper (Other academic)
  • 67.
    Eilert, Johan
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Early Exploration of MIPS Cost and Memory Cost Trade-off for Media DSP Media Processor (2006). In: SSoCC Swedish System-on-Chip Conference, 2006. Conference paper (Other academic)
  • 68.
    Eilert, Johan
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Wu, Di
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Efficient Complex Matrix Inversion for MIMO Software Defined Radio (2007). In: International Symposium on Circuits and Systems, ISCAS 2007, IEEE, 2007, p. 2610-2613. Conference paper (Refereed)
    Abstract [en]

    Complex matrix inversion is a very computationally demanding operation in advanced multi-antenna wireless communications. Traditionally, systolic array-based QR decomposition (QRD) is used to invert large matrices. However, the matrices involved in MIMO baseband processing in mobile handsets are generally small, which means QRD is not necessarily efficient. In this paper, a new method is proposed using programmable hardware units which not only achieves higher performance but also consumes less silicon area. Furthermore, the hardware can be reused for many other operations such as complex matrix multiplication, filtering, correlation and FFT/IFFT.
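    To illustrate why small matrices need not go through QRD: a 2x2 complex matrix can be inverted in closed form with a handful of multiplications, one subtraction and a single complex division. This is the generic textbook formula, not the method proposed in the paper; the function names are hypothetical.

```python
def inv2x2(m):
    """Closed-form inverse of a 2x2 complex matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ZeroDivisionError("singular matrix")
    r = 1 / det  # the only division; everything else is multiply/add
    return [[d * r, -b * r], [-c * r, a * r]]

def matmul(x, y):
    """Product of two matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

H = [[1 + 2j, 0.5 - 1j], [-0.3 + 0.4j, 2 - 1j]]
P = matmul(H, inv2x2(H))  # equals the identity up to floating-point rounding
```

    The operations involved (complex multiply-accumulate and one reciprocal) are the same kind that a programmable datapath also needs for matrix multiplication and filtering, which is the reuse argument the abstract makes.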

  • 69.
    Eilert, Johan
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Wu, Di
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Implementation of a Programmable Linear MMSE Detector for MIMO-OFDM (2008). In: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2008, IEEE, 2008, p. 5396-5399. Conference paper (Refereed)
    Abstract [en]

    This paper presents a linear minimum mean square error (LMMSE) symbol detector for MIMO-OFDM enabled mobile terminals. The detector is implemented using a programmable baseband processor aimed at software-defined radio (SDR). Owing to the dynamic range supplied by the floating-point SIMD datapath, special algorithms can be adopted to reduce the computational latency of detection. The programmable solution not only supports different transmit/receive antenna configurations, but also allows hardware multiplexing to obtain silicon and power efficiency. Compared to several existing fixed-functional solutions, the one proposed in this paper is smaller, more flexible and faster.
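    For reference, the textbook LMMSE estimate for one subcarrier of a 2x2 MIMO link is x_hat = (H^H H + sigma^2 I)^-1 H^H y. The sketch below evaluates that equation in plain Python under the assumptions of unit-energy symbols and white noise of variance sigma^2; it does not reproduce the paper's floating-point SIMD algorithms, and all function names are hypothetical.

```python
def herm(m):
    """Conjugate (Hermitian) transpose of a matrix stored as nested lists."""
    return [[m[j][i].conjugate() for j in range(len(m))] for i in range(len(m[0]))]

def matmul(x, y):
    """Product of two matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def inv2x2(m):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    r = 1 / (a * d - b * c)
    return [[d * r, -b * r], [-c * r, a * r]]

def lmmse_detect(H, y, noise_var):
    """2x2 LMMSE estimate x_hat = (H^H H + noise_var*I)^-1 H^H y,
    assuming unit-energy transmit symbols and white noise of variance noise_var."""
    Hh = herm(H)
    G = matmul(Hh, H)
    for i in range(2):
        G[i][i] += noise_var  # regularization that distinguishes LMMSE from zero-forcing
    W = matmul(inv2x2(G), Hh)
    return [row[0] for row in matmul(W, [[v] for v in y])]

H = [[0.8 + 0.3j, -0.2 + 0.5j], [0.4 - 0.6j, 1.1 + 0.1j]]  # arbitrary example channel
x = [1 + 0j, -1j]                                           # unit-energy symbols
y = [sum(H[i][k] * x[k] for k in range(2)) for i in range(2)]  # noise-free received vector
x_hat = lmmse_detect(H, y, noise_var=1e-9)  # close to x when noise_var is tiny
```

    With a larger `noise_var` the estimate is deliberately biased toward zero, trading a little bias for much less noise amplification than zero-forcing on ill-conditioned channels.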

  • 70.
    Eilert, Johan
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Wu, Di
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Real-Time Alamouti STBC Decoding on A Programmable Baseband Processor (2008). Conference paper (Refereed)
    Abstract [en]

    This paper presents a space-time block coding decoder for MIMO-OFDM enabled mobile terminals. The decoder is implemented using a programmable baseband processor aimed at software-defined radio (SDR). The dynamic range supplied by the floating-point SIMD datapath allows special algorithms to significantly reduce the computational latency of decoding. The programmable solution not only supports different transmit/receive antenna configurations, but also allows hardware multiplexing to obtain silicon and power efficiency. Compared to several existing fixed-functional ASIC solutions in the literature, the one proposed in this paper is by far the smallest and fastest, and it is also more flexible.
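    As background for the decoder above, here is a minimal sketch of textbook Alamouti encoding and linear combining for a 2x1 link, assuming a flat-fading channel that is known at the receiver and constant over both time slots. It is not the paper's processor implementation and does not cover other antenna configurations; the names are hypothetical.

```python
def alamouti_encode(s1, s2):
    """Transmit matrix: row t holds (antenna 1, antenna 2) symbols in time slot t."""
    return [(s1, s2), (-s2.conjugate(), s1.conjugate())]

def alamouti_decode(r1, r2, h1, h2):
    """Combine two received samples of a 2x1 Alamouti link. The flat-fading
    gains h1, h2 are assumed known and constant over both time slots."""
    g = abs(h1) ** 2 + abs(h2) ** 2  # combined channel gain
    s1 = (h1.conjugate() * r1 + h2 * r2.conjugate()) / g
    s2 = (h2.conjugate() * r1 - h1 * r2.conjugate()) / g
    return s1, s2

h1, h2 = 0.7 - 0.2j, -0.4 + 0.9j           # example channel, chosen arbitrarily
tx = alamouti_encode(1 + 1j, 1 - 1j)        # two QPSK-like symbols
r1, r2 = (h1 * a + h2 * b for a, b in tx)  # noise-free received samples
s1_hat, s2_hat = alamouti_decode(r1, r2, h1, h2)
```

    The cross terms cancel exactly, so each symbol is recovered with the full diversity gain |h1|^2 + |h2|^2 using only a few complex multiply-adds, which is why this decoding maps well onto a SIMD datapath.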

  • 71.
    Eilert, Johan
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Wu, Di
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Wang, Dandan
    Al-Dhahir, Naofal
    Minn, Hlaing
    Complexity Reduction of Matrix Manipulation for Multi-User STBC-MIMO Decoding (2007). In: IEEE Sarnoff Symposium 2007, 2007, p. 1-5. Conference paper (Refereed)
    Abstract [en]

    This paper studies efficient complex valued matrix manipulations for multi-user STBC-MIMO decoding. A novel method called Alamouti blockwise analytical matrix inversion (ABAMI) is proposed for the inversion of large complex matrices that are based on Alamouti sub-blocks. Another method using a variant of Givens rotation is proposed for fast QR decomposition of this kind of matrices. Our solutions significantly reduce the number of operations, which makes them more than 4 times faster than several other solutions in the literature. Furthermore, compared to fixed function VLSI implementations, our solution is more flexible and consumes less silicon area because the hardware is programmable and it can be reused for many other operations such as filtering, correlation and FFT/IFFT. Besides the analysis of the general computational complexity based on the number of basic operations, the computational latency is also measured in clock cycles based on the conceptual hardware for real-time matrix manipulations.
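    The ABAMI method itself is not described here in enough detail to reproduce. The sketch below only shows the property of an Alamouti sub-block that a blockwise method can exploit: for A = [[a, b], [-conj(b), conj(a)]], the product A^H A is a scaled identity, so the inverse is the conjugate transpose times one real reciprocal. Function names are hypothetical.

```python
def alamouti_block(a, b):
    """2x2 matrix with Alamouti structure [[a, b], [-conj(b), conj(a)]]."""
    return [[a, b], [-b.conjugate(), a.conjugate()]]

def alamouti_inverse(a, b):
    """Inverse of alamouti_block(a, b). Because A^H A = (|a|^2 + |b|^2) I,
    the inverse is A^H scaled by one real reciprocal."""
    r = 1 / (abs(a) ** 2 + abs(b) ** 2)
    return [[a.conjugate() * r, -b * r], [b.conjugate() * r, a * r]]

def matmul(x, y):
    """Product of two matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

a, b = 0.3 + 1.2j, -0.7 + 0.5j  # arbitrary example entries
A = alamouti_block(a, b)
P = matmul(A, alamouti_inverse(a, b))  # identity up to rounding
```

    Compared with the general 2x2 formula, no complex determinant or complex division is needed, only |a|^2 + |b|^2 and one real reciprocal.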

  • 72.
    Einemo, Jonas
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    Lundqvist, Magnus
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    A Selection of H.264 Encoder Components Implemented and Benchmarked on a Multi-core DSP Processor (2010). Independent thesis Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    H.264 is a video coding standard which offers high data compression rate at the cost of a high computational load. This thesis evaluates how well parts of the H.264 standard can be implemented for a new multi-core digital signal processing processor architecture called ePUMA. The thesis investigates if real-time encoding of high definition video sequences could be performed. The implementation consists of the motion estimation, motion compensation, discrete cosine transform, inverse discrete cosine transform, quantization and rescaling parts of the H.264 standard. Benchmarking is done using the ePUMA system simulator and the results are compared to an implementation of an existing H.264 encoder for another multi-core processor architecture called STI Cell. The results show that the selected parts of the H.264 encoder could be run on 6 calculation cores in 5 million cycles per frame. This setup leaves 2 calculation cores to run the remaining parts of the encoder.
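    One of the components listed above, the transform that the thesis refers to as the discrete cosine transform, is in H.264 a 4x4 integer approximation of the DCT: Y = Cf X Cf^T, with the remaining scaling folded into quantization. The plain Python reference below is a sketch of that core transform only, not the thesis's ePUMA implementation; the function names are hypothetical.

```python
# H.264 forward core transform matrix Cf (integer approximation of a 4x4 DCT).
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]

def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def forward_core_transform(block):
    """Return Cf X Cf^T for a 4x4 residual block X. Only integer adds and
    multiplications by +-1 and +-2 (shifts) occur; the per-coefficient
    scaling is folded into quantization and is not applied here."""
    return matmul(matmul(CF, block), transpose(CF))

flat = [[1] * 4 for _ in range(4)]
coeffs = forward_core_transform(flat)  # all energy lands in the DC term coeffs[0][0]
```

    Because the matrix contains only +-1 and +-2, the transform needs no multipliers, which makes it cheap to vectorize across many blocks on a SIMD core.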

  • 73.
    Englund, Madeleine
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Hybrid Floating-point Units in FPGAs (2012). Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    Floating point numbers are used in many applications that would be well suited to a higher degree of parallelism than that offered by a CPU. In these cases, an FPGA, with its ability to handle multiple calculations simultaneously, could be the solution. Unfortunately, floating point operations implemented in an FPGA are often resource intensive, which means that many developers avoid floating point solutions in FPGAs, or avoid using FPGAs for floating point applications.

    Here, the potential to obtain less expensive floating point operations by using a higher radix for the floating point numbers, and by using and expanding the existing DSP block in the FPGA, is investigated. One of the goals is that the FPGA should be usable both by users who have floating point in their designs and by those who do not. In order to motivate hard floating point blocks in the FPGA, these must not consume too much of the limited resources.

    This work shows that floating point addition becomes smaller with the use of a higher radix, while multiplication becomes smaller by using the hardware of the DSP block. When both operations are examined at the same time, it turns out that it is possible to get a reduced area, compared to separate floating point units, by utilizing both the DSP block and a higher radix for the floating point numbers.

  • 74.
    Eriksson, Henrik
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Electronic Devices.
    Henriksson, Tomas
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Larsson-Edefors, Per
    Chalmers.
    Full-custom vs standard-cell based design - An adder comparison (2002). In: Swedish System-on-Chip conference, 2002. Conference paper (Other academic)
  • 75.
    Eriksson, Henrik
    et al.
    Dept of Computer Engineering Chalmers tekniska högskola.
    Henriksson, Tomas
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Larsson-Edefors, Per
    Dept of Computer Engineering Chalmers tekniska högskola.
    Svensson, Christer
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Electronic Devices.
    Full-Custom vs. Standard-Cell Design Flow - An Adder Case Study (2003). In: Asia South Pacific Design Automation Conference, 2003, p. 507-. Conference paper (Refereed)
  • 76.
    Ferdeen, Mats
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    Reducing Energy Consumption Through Image Compression (2016). Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    The energy consumed by off-chip memory writes and reads is a known problem. In the image processing field of structure from motion, simpler compression techniques could be used to save energy. The balance between the detected features, such as corners and edges, and the degree of compression then becomes a key issue to investigate. In this thesis, a deeper study of this balance is performed. A number of more advanced compression algorithms for still images, such as JPEG, are used for comparison with a selected number of simpler compression algorithms. The simpler algorithms can be divided into two categories: individual block-wise compression of each image, and compression with respect to all pixels in each image. In this study, the image sequences are in grayscale and come from an earlier study about rolling shutters. Synthetic data sets from a further study about optical flow are also included to see how reliable the other data sets are.

  • 77.
    Flordal, Oskar
    et al.
    Axis Communications AB .
    Wu, Di
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Accelerating CABAC Encoding for Multi-standard Media with Configurability (2006). In: IEEE IPDPS 2006, 2006. Conference paper (Refereed)
  • 78.
    Fries, Jakob
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    Johansson, Simon
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    A Modular 3D Graphics Accelerator for FPGA (2011). Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    A modular and area-efficient 3D graphics accelerator for tile based rendering in FPGA systems has been designed and implemented. The accelerator supports a subset of OpenGL, with features such as mipmapping, multitexturing and blending. The accelerator consists of a software component for projection and clipping of triangles, as well as a hardware component for rasterization, coloring and video output. Trade-offs made between area, performance and functionality have been described and justified. In order to evaluate the functionality and performance of the accelerator, it has been tested with two different applications.

  • 79.
    Frisk, Erik
    et al.
    Linköping University, Department of Electrical Engineering, Vehicular Systems. Linköping University, Faculty of Science & Engineering.
    Krysander, Mattias
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Residual Selection for Consistency Based Diagnosis Using Machine Learning Models (2018). In: IFAC PAPERSONLINE, ELSEVIER SCIENCE BV, 2018, Vol. 51, no 24, p. 139-146. Conference paper (Refereed)
    Abstract [en]

    A common architecture of model-based diagnosis systems is to use a set of residuals to detect and isolate faults. The paper argues that in many cases there are more candidate residuals than needed for detection and single fault isolation, and that key sources of varying performance among the candidate residuals are model errors and noise. This paper formulates a systematic method for selecting, from a set of candidate residuals, a subset with good diagnosis performance. A key contribution is the combination of a machine learning model, here a random forest model, with diagnosis specific performance specifications to select a high performing subset of residuals. The approach is applied to an industrial use case, an automotive engine, and it is shown how the trade-off between diagnosis performance and the number of residuals can easily be controlled. The number of residuals used is reduced from the original 42 to only 12 without losing significant diagnosis performance. (C) 2018, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

  • 80.
    Frisk, Erik
    et al.
    Linköping University, Department of Electrical Engineering, Vehicular Systems. Linköping University, Faculty of Science & Engineering.
    Krysander, Mattias
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Jung, Daniel
    Linköping University, Department of Electrical Engineering, Vehicular Systems. Linköping University, Faculty of Science & Engineering.
    A Toolbox for Analysis and Design of Model Based Diagnosis Systems for Large Scale Models (2017). In: IFAC PAPERSONLINE, ELSEVIER SCIENCE BV, 2017, Vol. 50, no 1, p. 3287-3293. Conference paper (Refereed)
    Abstract [en]

    To facilitate the use of advanced fault diagnosis analysis and design techniques for industrial-sized systems, there is a need for computer support. This paper describes a Matlab toolbox and evaluates the software on a challenging industrial problem, air-path diagnosis in an automotive engine. The toolbox includes tools for analysis and design of model based diagnosis systems for large-scale differential algebraic models. The software package supports a complete tool-chain from modeling a system to generating C-code for residual generators. Major design steps supported by the tool are modeling, fault diagnosability analysis, sensor selection, residual generator analysis, test selection, and code generation. Structural methods based on efficient graph theoretical algorithms are used in several steps. In the automotive diagnosis example, a diagnosis system is generated and evaluated using measurement data, both in fault-free operation and with faults injected in the control-loop. The results clearly show the benefit of the toolbox in a model-based design of a diagnosis system. The latest version of the toolbox can be downloaded at faultdiagnosistoolbox.github.io. (C) 2017, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

  • 81.
    Frisk, Erik
    et al.
    Linköping University, Department of Electrical Engineering, Vehicular Systems. Linköping University, Faculty of Science & Engineering.
    Krysander, Mattias
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Åslund, Jan
    Linköping University, Department of Electrical Engineering, Vehicular Systems. Linköping University, Faculty of Science & Engineering.
    Analysis and Design of Diagnosis Systems Based on the Structural Differential Index (2017). In: 20th IFAC World Congress, ELSEVIER SCIENCE BV, 2017, Vol. 50, no 1, p. 12236-12242. Conference paper (Refereed)
    Abstract [en]

    Structural approaches have been shown to be useful for analyzing and designing diagnosis systems for industrial systems. In the simulation and estimation literature, related theories about differential index have been developed, and there too, structural methods have been successfully applied for simulating large-scale differential algebraic models. A main contribution of this paper is to connect those theories, thus making the tools from the simulation and estimation literature available for model based diagnosis design. A key step in the unification is an extension of the notion of differential index from exactly determined systems of equations to overdetermined systems of equations. A second main contribution is showing how the differential index can be used in diagnosability analysis and also in the design stage, where an exponentially sized search space is significantly reduced. This allows focusing on residual generators where basic design techniques, such as standard state-observation techniques and sequential residual generation, are directly applicable. The developed theory has direct industrial relevance, which is illustrated with discussions on an automotive engine example. (C) 2017, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

  • 82.
    Garrido Gálvez, Mario
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    A New Representation of FFT Algorithms Using Triangular Matrices (2016). In: IEEE Transactions on Circuits and Systems Part 1: Regular Papers, ISSN 1549-8328, E-ISSN 1558-0806, Vol. 63, no 10, p. 1737-1745. Article in journal (Refereed)
    Abstract [en]

    In this paper we propose a new representation for FFT algorithms called the triangular matrix representation. This representation is more general than the binary tree representation and, therefore, it introduces new FFT algorithms that had not been discovered before. Furthermore, the new representation has the advantage that it is simple and easy to understand, as each FFT algorithm consists only of a triangular matrix. In addition, the new representation makes it easy to obtain the exact twiddle factor values in the FFT flow graph. This facilitates the design of FFT hardware architectures. As a result, the triangular matrix representation is an excellent alternative to represent FFT algorithms and it opens new possibilities in the exploration and understanding of the FFT.

  • 83.
    Garrido Gálvez, Mario
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    The Feedforward Short-Time Fourier Transform (2016). In: IEEE Transactions on Circuits and Systems - II - Express Briefs, ISSN 1549-7747, E-ISSN 1558-3791, Vol. 63, no 9, p. 868-872. Article in journal (Refereed)
    Abstract [en]

    This brief presents the feedforward short-time Fourier transform (STFT). This new approach is based on reusing the calculations of the STFT at consecutive time instants. This leads to significant savings in hardware components with respect to fast Fourier transform based STFTs. Furthermore, the feedforward STFT does not have the accumulative error of iterative STFT approaches. As a result, the proposed feedforward STFT presents an excellent tradeoff between hardware utilization and performance.

  • 84.
    Garrido Gálvez, Mario
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Andersson, Rikard
    Linköping University, Department of Electrical Engineering, Vehicular Systems. Linköping University, The Institute of Technology.
    Qureshi, Fahad
    Tampere University of Technology, Finland.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Multiplierless Unity-Gain SDF FFTs (2016). In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 24, no 9, p. 3003-3007. Article in journal (Refereed)
    Abstract [en]

    In this brief, we propose a novel approach to implement multiplierless unity-gain single-delay feedback fast Fourier transforms (FFTs). Previous methods achieve unity-gain FFTs by using either complex multipliers or nonunity-gain rotators with additional scaling compensation. Conversely, this brief proposes unity-gain FFTs without compensation circuits, even when using nonunity-gain rotators. This is achieved by a joint design of rotators, so that the entire FFT is scaled by a power of two, which is then shifted to unity. This reduces the amount of hardware resources of the FFT architecture, while having high accuracy in the calculations. The proposed approach can be applied to any FFT size, and various designs for different FFT sizes are presented.

  • 85.
    Garrido Gálvez, Mario
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Angel Sanchez, Miguel
    University of Politecn Madrid, Spain.
    Luisa Lopez-Vallejo, Maria
    University of Politecn Madrid, Spain.
    Grajal, Jesus
    University of Politecn Madrid, Spain.
    A 4096-Point Radix-4 Memory-Based FFT Using DSP Slices (2017). In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 25, no 1, p. 375-379. Article in journal (Refereed)
    Abstract [en]

    This brief presents a novel 4096-point radix-4 memory-based fast Fourier transform (FFT). The proposed architecture follows a conflict-free strategy that only requires a total memory of size N and a few additional multiplexers. The control is also simple, as it is generated directly from the bits of a counter. Apart from the low complexity, the FFT has been implemented on a Virtex-5 field programmable gate array (FPGA) using DSP slices. The goal has been to reduce the use of distributed logic, which is scarce in the target FPGA. With this purpose, most of the hardware has been implemented in DSP48E slices. As a result, the proposed FFT is efficient in terms of hardware resources, as shown by the experimental results.

  • 86.
    Garrido Gálvez, Mario
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Huang, Shen-Jui
    Novatek Corp, Taiwan.
    Chen, Sau-Gee
    Natl Chiao Tung Univ, Taiwan.
    Feedforward FFT Hardware Architectures Based on Rotator Allocation (2018). In: IEEE Transactions on Circuits and Systems Part 1: Regular Papers, ISSN 1549-8328, E-ISSN 1558-0806, Vol. 65, no 2, p. 581-592. Article in journal (Refereed)
    Abstract [en]

    In this paper, we present new feedforward FFT hardware architectures based on rotator allocation. The rotator allocation approach consists in distributing the rotations of the FFT in such a way that the number of edges in the FFT that need rotators and the complexity of the rotators are reduced. Radix-2 and radix-2^k feedforward architectures based on rotator allocation are presented in this paper. Experimental results show that the proposed architectures reduce the hardware cost significantly with respect to previous FFT architectures.

  • 87.
    Garrido Gálvez, Mario
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Huang, Shen-Jui
    Novatek Corp, Taiwan.
    Chen, Sau-Gee
    National Chiao Tung University, Taiwan.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    The Serial Commutator FFT (2016). In: IEEE Transactions on Circuits and Systems - II - Express Briefs, ISSN 1549-7747, E-ISSN 1558-3791, Vol. 63, no 10, p. 974-978. Article in journal (Refereed)
    Abstract [en]

    This brief presents a new type of fast Fourier transform (FFT) hardware architecture called the serial commutator (SC) FFT. The SC FFT is characterized by the use of circuits for bit-dimension permutation of serial data. The proposed architectures are based on the observation that, in the radix-2 FFT algorithm, only half of the samples at each stage must be rotated. This fact, together with proper data management, makes it possible to allocate rotations only every other clock cycle. This allows for simplifying the rotator, halving the complexity with respect to conventional serial FFT architectures. Likewise, the proposed approach halves the number of adders in the butterflies with respect to previous architectures. As a result, the proposed architectures use the minimum number of adders, rotators, and memory that are necessary for a pipelined FFT of serial data, with 100% utilization ratio.
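    The observation that only half of the samples at each radix-2 stage are rotated can be seen in a plain decimation-in-frequency FFT: the sum output of each butterfly passes on unchanged, and only the difference output is multiplied by a twiddle factor. This recursive software sketch (hypothetical name `fft_dif`) counts those rotations; it does not model the serial data flow or the bit-dimension permutation circuits of the SC FFT.

```python
import cmath

def fft_dif(x):
    """Radix-2 decimation-in-frequency FFT for power-of-two lengths.
    Returns (X, rotations), where rotations counts twiddle multiplications,
    including trivial ones such as W^0 = 1."""
    n = len(x)
    if n == 1:
        return list(x), 0
    half = n // 2
    top = [x[k] + x[k + half] for k in range(half)]  # butterfly sum: never rotated
    bottom = [(x[k] - x[k + half]) * cmath.exp(-2j * cmath.pi * k / n)
              for k in range(half)]                  # butterfly difference: rotated
    even, r_even = fft_dif(top)    # X[0], X[2], X[4], ...
    odd, r_odd = fft_dif(bottom)   # X[1], X[3], X[5], ...
    out = [0j] * n
    out[0::2] = even
    out[1::2] = odd
    return out, half + r_even + r_odd

x = [complex(k, (k * k) % 5) for k in range(8)]
X, rotations = fft_dif(x)  # rotations == (8 / 2) * log2(8) == 12
```

    Per stage, N/2 of the N samples are rotated, so a serial architecture that presents one sample per cycle only needs to rotate on every other cycle.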

  • 88.
    Garrido Gálvez, Mario
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Källström, Petter
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Kumm, Martin
    University of Kassel, Germany.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    CORDIC II: A New Improved CORDIC Algorithm (2016). In: IEEE Transactions on Circuits and Systems - II - Express Briefs, ISSN 1549-7747, E-ISSN 1558-3791, Vol. 63, no 2, p. 186-190. Article in journal (Refereed)
    Abstract [en]

    In this brief, we present the CORDIC II algorithm. Like previous CORDIC algorithms, the CORDIC II calculates rotations by breaking down the rotation angle into a series of microrotations. However, the CORDIC II algorithm uses a novel angle set, different from the angles used in previous CORDIC algorithms. The new angle set provides a faster convergence that reduces the number of adders with respect to previous approaches.
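    For context, the sketch below shows how a conventional CORDIC breaks a rotation into microrotations. The CORDIC II angle set is not given in the abstract, so this uses the classic angles atan(2^-i) and must not be read as CORDIC II; `cordic_rotate` is a hypothetical name.

```python
import math

def cordic_rotate(x, y, angle, iterations=24):
    """Rotate (x, y) by `angle` radians with conventional CORDIC (classic
    angle set atan(2^-i), valid for |angle| up to about 1.74 rad).
    Multiplying by 2^-i is a shift in hardware; the accumulated CORDIC
    gain is divided out at the end."""
    z = angle
    gain = 1.0
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # microrotation direction
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return x / gain, y / gain

xr, yr = cordic_rotate(1.0, 0.0, 0.5)  # approximately (cos 0.5, sin 0.5)
```

    Each microrotation costs two adders per stage in a pipelined implementation, so an angle set that converges in fewer microrotations, as the abstract claims for CORDIC II, directly reduces the adder count.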

  • 89.
    Garrido Gálvez, Mario
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Lopez-Vallejo, Maria Luisa
    Tech Univ Madrid, Spain.
    Chen, Sau-Gee
    Natl Chiao Tung Univ, Taiwan.
    Guest Editorial: Special Section on Fast Fourier Transform (FFT) Hardware Implementations (2018). In: Journal of Signal Processing Systems, ISSN 1939-8018, E-ISSN 1939-8115, Vol. 90, no 11, p. 1581-1582. Article in journal (Other academic)
    Abstract [en]

    n/a

  • 90.
    Garrido Gálvez, Mario
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Qureshi, Fahad
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Low-Complexity Multiplierless Constant Rotators Based on Combined Coefficient Selection and Shift-and-Add Implementation (CCSSI) (2014). In: IEEE Transactions on Circuits and Systems Part 1: Regular Papers, ISSN 1549-8328, E-ISSN 1558-0806, Vol. 61, no 7, p. 2002-2012. Article in journal (Refereed)
    Abstract [en]

    This paper presents a new approach to designing multiplierless constant rotators. The approach is based on a combined coefficient selection and shift-and-add implementation (CCSSI) for the design of the rotators. First, complete freedom is given to the selection of the coefficients, i.e., no constraints on the coefficients are set in advance and all the alternatives are taken into account. Second, the shift-and-add implementation uses advanced single constant multiplication (SCM) and multiple constant multiplication (MCM) techniques that lead to low-complexity multiplierless implementations. Third, the design of the rotators is done by a joint optimization of the coefficient selection and the shift-and-add implementation. As a result, the CCSSI provides an extended design space that offers a larger number of alternatives than previous works. Furthermore, the design space is explored in a simple and efficient way. The proposed approach has wide applications in numerous hardware scenarios. These include rotations by single or multiple angles, rotators in single or multiple branches, and different scalings of the outputs. Experimental results for various scenarios are provided. In all of them, the proposed approach achieves significant improvements over the state of the art.
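    Editorial sketch: the basic shift-and-add idea behind a multiplierless constant rotator can be shown with canonical signed-digit (CSD) recoding, a standard single-constant-multiplication technique. This is not the CCSSI method, which jointly optimizes coefficient selection and uses more advanced SCM/MCM algorithms. The function names and the example coefficient 181 (roughly 256·cos 45°) are hypothetical.

```python
def csd_digits(c):
    """Canonical signed-digit form of integer c as (shift, sign) pairs, LSB first.

    No two nonzero digits are adjacent, which minimizes the number of adders.
    """
    digits = []
    shift = 0
    while c != 0:
        if c & 1:
            # c % 4 == 1 -> digit +1, c % 4 == 3 -> digit -1 (leaves c divisible by 4)
            sign = 2 - (c & 3)
            digits.append((shift, sign))
            c -= sign
        c >>= 1
        shift += 1
    return digits


def shift_add_multiply(x, c):
    """Multiply integer x by the constant c using only shifts and additions/subtractions."""
    acc = 0
    for shift, sign in csd_digits(c):
        term = x << shift  # a wired shift in hardware, no logic needed
        acc = acc + term if sign > 0 else acc - term
    return acc


def constant_rotate(xr, xi, c, s):
    """Multiply (xr + j*xi) by the fixed coefficient (c + j*s) without multipliers."""
    real = shift_add_multiply(xr, c) - shift_add_multiply(xi, s)
    imag = shift_add_multiply(xr, s) + shift_add_multiply(xi, c)
    return real, imag
```

    For instance, `csd_digits(7)` gives `[(0, -1), (3, 1)]`, i.e. 7 = 8 - 1, so multiplying by 7 costs a single subtractor. Choosing coefficients whose CSD forms share terms is where approaches such as CCSSI find further savings.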

  • 91.
    Garrido Gálvez, Mario
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Unnikrishnan, Nanda K.
    Univ Minnesota, MN 55455 USA.
    Parhi, Keshab K.
    Univ Minnesota, MN 55455 USA.
    A Serial Commutator Fast Fourier Transform Architecture for Real-Valued Signals2018In: IEEE Transactions on Circuits and Systems - II - Express Briefs, ISSN 1549-7747, E-ISSN 1558-3791, Vol. 65, no 11, p. 1693-1697Article in journal (Refereed)
    Abstract [en]

    This brief presents a novel pipelined architecture to compute the fast Fourier transform of real input signals in a serial manner, i.e., one sample is processed per cycle. The proposed architecture, referred to as the real-valued serial commutator, achieves full hardware utilization by mapping each stage of the fast Fourier transform (FFT) to a half-butterfly operation that operates on real input signals. Prior serial architectures for computing the FFT of real signals achieved only 50% hardware utilization. Novel data-exchange and data-reordering circuits are also presented. The complete serial commutator architecture requires 2 log₂N - 2 real adders, log₂N - 2 real multipliers, and N + 9 log₂N - 19 real delay elements, where N represents the size of the FFT.
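    Editorial sketch: the resource formulas stated in the abstract can be evaluated for a given FFT size. The function name and the restriction to power-of-two N ≥ 4 (which keeps every count non-negative) are assumptions. The formulas themselves are taken verbatim from the abstract.

```python
def rvsc_resources(n):
    """Resource counts from the abstract for an N-point real-valued serial commutator FFT."""
    # Assumption: N is a power of two and at least 4.
    if n < 4 or n & (n - 1):
        raise ValueError("N is assumed to be a power of two, N >= 4")
    log2n = n.bit_length() - 1
    return {
        "real_adders": 2 * log2n - 2,
        "real_multipliers": log2n - 2,
        "real_delays": n + 9 * log2n - 19,
    }
```

    For N = 1024 this gives 18 real adders, 8 real multipliers, and 1095 real delay elements. The delay count grows linearly with N, while the arithmetic grows only logarithmically.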

  • 92.
    Garrido, Mario
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Multiplexer and Memory-Efficient Circuits for Parallel Bit Reversal2019In: IEEE Transactions on Circuits and Systems - II - Express Briefs, ISSN 1549-7747, E-ISSN 1558-3791, Vol. 66, no 4, p. 657-661Article in journal (Refereed)
    Abstract [en]

    This brief presents novel circuits for calculating the bit reversal on parallel data. The circuits consist of delays/memories and multiplexers, and have the advantage that they require the minimum number of multiplexers of any circuit for parallel bit reversal proposed so far, as well as a small total memory.
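    Editorial sketch: the permutation these circuits compute can be stated in software as an index mapping on a whole vector held in memory. The paper's circuits produce the same reordering on streaming parallel data using delays and multiplexers, which this sketch does not model. The function names are hypothetical.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversal_permutation(data):
    """Return `data` reordered so that element i moves to position bit_reverse(i)."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    out = [None] * n
    for i, value in enumerate(data):
        out[bit_reverse(i, bits)] = value
    return out
```

    For eight elements, `bit_reversal_permutation(list(range(8)))` yields `[0, 4, 2, 6, 1, 5, 3, 7]`, which is the output order of a radix-2 FFT. Because bit reversal is its own inverse, applying it twice restores the original order.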

  • 93.
    Garrido, Mario
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Acevedo, Miguel
    Linköping University, Department of Electrical Engineering. Linköping University, Faculty of Science & Engineering.
    Ehliar, Andreas
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Challenging the Limits of FFT Performance on FPGAs2014Conference paper (Refereed)
    Abstract [en]

    This paper analyzes the limits of FFT performance on FPGAs. For this purpose, an FFT generation tool has been developed. This tool is highly parameterizable and can generate FFTs with different FFT sizes and degrees of parallelization. Experimental results have been obtained for FFT sizes from 16 to 65536 and for 4 to 64 parallel samples. They show that even the largest FFT architectures fit well in today's FPGAs, achieving throughput rates from several GSamples/s to tens of GSamples/s.
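    Editorial sketch: for a fully pipelined architecture that accepts P samples every clock cycle, throughput is simply P times the clock frequency. The snippet below enumerates the design space named in the abstract (powers of two assumed) and evaluates throughput at a hypothetical 400 MHz clock. That clock is an illustrative assumption, not a result reported in the paper.

```python
def throughput_gsps(samples_per_cycle, clock_mhz):
    """Throughput in GSamples/s of a pipelined FFT that accepts `samples_per_cycle` inputs per clock."""
    return samples_per_cycle * clock_mhz / 1000.0


# Design space named in the abstract, assuming powers of two:
# FFT sizes 16..65536 and 4..64 parallel samples (parallelism cannot exceed the FFT size).
fft_sizes = [2 ** k for k in range(4, 17)]
parallelism = [2 ** k for k in range(2, 7)]
design_points = [(n, p) for n in fft_sizes for p in parallelism if p <= n]

# Hypothetical clock frequency, chosen only for illustration.
ASSUMED_CLOCK_MHZ = 400.0
for p in parallelism:
    print(f"P = {p:2d}: {throughput_gsps(p, ASSUMED_CLOCK_MHZ):.1f} GS/s at {ASSUMED_CLOCK_MHZ:.0f} MHz")
```

    Under this assumption, 64 parallel samples give 25.6 GS/s. This shows how the parallelization degree, rather than the FFT size, sets the throughput.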

  • 94.
    Garrido, Mario
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Grajal, Jesus
    Univ Politecn Madrid, Spain.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Optimum Circuits for Bit-Dimension Permutations2019In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 27, no 5, p. 1148-1160Article in journal (Refereed)
    Abstract [en]

    In this paper, we present a systematic approach to design hardware circuits for bit-dimension permutations. The proposed approach is based on decomposing any bit-dimension permutation into elementary bit-exchanges. Such decomposition is proven to achieve the theoretical minimum number of delays required for the permutation. This offers optimum solutions for multiple well-known problems in the literature that make use of bit-dimension permutations. This includes the design of permutation circuits for the fast Fourier transform, bit reversal, matrix transposition, stride permutations, and Viterbi decoders.
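    Editorial sketch: a bit-dimension permutation moves each element to the position obtained by permuting the bits of its index. The sketch below applies such a permutation in software and shows that bit reversal and matrix transposition are both special cases. It does not reproduce the paper's decomposition into elementary bit-exchanges or its delay-optimal circuits. All function names are hypothetical.

```python
def permute_index(index, sigma):
    """Move bit k of `index` to bit position sigma[k]."""
    out = 0
    for k, target in enumerate(sigma):
        out |= ((index >> k) & 1) << target
    return out


def bit_dimension_permutation(data, sigma):
    """Reorder `data` (length 2**len(sigma)) by permuting the bits of each index."""
    n_bits = len(sigma)
    if len(data) != 1 << n_bits or sorted(sigma) != list(range(n_bits)):
        raise ValueError("sigma must be a permutation of the index bits")
    out = [None] * len(data)
    for i, value in enumerate(data):
        out[permute_index(i, sigma)] = value
    return out


def transpose_sigma(a, b):
    """Bit permutation that transposes a row-major 2**a x 2**b matrix.

    Column bits (low b bits) move up by a; row bits (high a bits) move down by b.
    """
    return [k + a if k < b else k - b for k in range(a + b)]
```

    With 3 index bits, `sigma = [2, 1, 0]` is bit reversal, and `transpose_sigma(1, 2)` turns the 2 x 4 matrix `0..7` into its 4 x 2 transpose `[0, 4, 1, 5, 2, 6, 3, 7]`.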

  • 95.
    Garrido, Mario
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Möller, K.
    University of Kassel, Kassel, Germany.
    Kumm, M.
    University of Kassel, Kassel, Germany.
    World’s Fastest FFT Architectures: Breaking the Barrier of 100 GS/s2019In: IEEE Transactions on Circuits and Systems Part 1: Regular Papers, ISSN 1549-8328, E-ISSN 1558-0806, Vol. 66, no 4, p. 1507-1516Article in journal (Refereed)
    Abstract [en]

    This paper presents the fastest fast Fourier transform (FFT) hardware architectures so far. The architectures are based on a fully parallel implementation of the FFT algorithm. In order to obtain the highest throughput while keeping the resource utilization low, we base our design on making use of advanced shift-and-add techniques to implement the rotators and on selecting the most suitable FFT algorithms for these architectures. Apart from high throughput and resource efficiency, we also guarantee high accuracy in the proposed architectures. For the implementation, we have developed an automatic tool that generates the architectures as a function of the FFT size, input word length and accuracy of the rotations. We provide experimental results covering various FFT sizes, FFT algorithms, and field-programmable gate array boards. These results show that it is possible to break the barrier of 100 GS/s for FFT calculation.
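    Editorial sketch: a fully parallel FFT architecture lays out every butterfly and twiddle-factor rotator of the FFT flow graph in hardware. The textbook radix-2 decimation-in-time recursion below computes that same flow graph in software. It is a generic algorithm, not the specific FFT algorithms the paper selects, and the function name is hypothetical.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle-factor rotation: the rotator that shift-and-add techniques implement in hardware.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t           # butterfly sum
        out[k + n // 2] = even[k] - t  # butterfly difference
    return out
```

    In a fully parallel design, each of the (N/2)·log₂N butterflies above becomes its own hardware unit, so a whole N-point transform can be accepted every clock cycle.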

  • 96.
    Ge, Hanxiao
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Investigation of LDPC code in DVB-S22012Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
    Abstract [en]

    As one of the most powerful classes of error-correcting codes, low-density parity-check (LDPC) codes are widely used in digital communications. Because LDPC codes can perform extraordinarily close to the Shannon limit, they are used in the second-generation Digital Video Broadcasting - Satellite standard (DVB-S2), which in 2003 became the first broadcast standard to include LDPC codes.

    In this thesis, a restructured parity-check matrix for the LDPC codes in DVB-S2, which can be divided into sub-matrices, is provided. Corresponding to this restructured parity-check matrix, a reconstructed decoding table is devised. The encoding table of the DVB-S2 standard can only be used to obtain the unknown check nodes from known variable nodes, whereas the decoding table provided in this thesis obtains the unknown variable nodes from known check nodes, which is exactly what the layered message-passing algorithm needs. The layered message-passing algorithm, also known as "turbo-decoding message passing", is used to reduce the number of decoding iterations and the memory storage for messages. The thesis also investigates the BP algorithm, the lambda-min algorithm, the min-sum algorithm, and the SISO-s algorithm, and presents simulation results for these algorithms and schedules.
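    Editorial sketch: layered message passing processes the parity checks one row (layer) at a time and updates the posterior LLRs immediately, so later layers already see the improved values. The sketch combines that schedule with the min-sum check-node rule. It does not use the DVB-S2 matrices, the thesis's decoding table, or the lambda-min/SISO-s variants. The function names and the toy (7,4) Hamming parity checks in the test are assumptions.

```python
def min_sum_check_update(incoming):
    """Min-sum check-node update for one check node (needs at least two edges).

    For each edge: product of the signs and minimum magnitude of all OTHER incoming LLRs.
    """
    signs = [-1 if m < 0 else 1 for m in incoming]
    total_sign = 1
    for s in signs:
        total_sign *= s
    mags = [abs(m) for m in incoming]
    # Only the two smallest magnitudes are needed.
    order = sorted(range(len(mags)), key=mags.__getitem__)
    min1, min2 = mags[order[0]], mags[order[1]]
    out = []
    for i in range(len(mags)):
        other_min = min2 if i == order[0] else min1
        out.append(total_sign * signs[i] * other_min)  # signs[i] is +-1, so this excludes edge i
    return out


def layered_min_sum(llr, rows, iterations=10):
    """Layered min-sum decoding; llr > 0 favours bit 0, rows lists variable indices per check."""
    post = list(llr)                          # posterior LLR per variable node
    msgs = [[0.0] * len(r) for r in rows]     # stored check-to-variable messages
    for _ in range(iterations):
        for c, row in enumerate(rows):
            # Remove this layer's old contribution, update the check, add the new one back.
            incoming = [post[v] - msgs[c][j] for j, v in enumerate(row)]
            new = min_sum_check_update(incoming)
            for j, v in enumerate(row):
                post[v] = incoming[j] + new[j]
            msgs[c] = new
        hard = [0 if p >= 0 else 1 for p in post]
        if all(sum(hard[v] for v in row) % 2 == 0 for row in rows):
            break  # all parity checks satisfied
    return [0 if p >= 0 else 1 for p in post]
```

    Because only the two smallest magnitudes and the overall sign need to be stored per check, min-sum also reduces the message memory, which complements the memory savings of the layered schedule.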

  • 97.
    Gebrewahid, Essayas
    et al.
    Högskolan i Halmstad, Akademin för informationsteknologi, Halmstad Embedded and Intelligent Systems Research (EIS), Centrum för forskning om inbyggda system (CERES), Sweden.
    Ali Arslan, Mehmet
    Lund University, Sweden.
    Karlsson, Andréas
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Ul-Abdin, Zain
    Högskolan i Halmstad, Akademin för informationsteknologi, Halmstad Embedded and Intelligent Systems Research (EIS), Centrum för forskning om inbyggda system (CERES), Sweden.
    Support for Data Parallelism in the CAL Actor Language2016In: PROCEEDINGS OF THE 2016 3RD WORKSHOP ON PROGRAMMING MODELS FOR SIMD/VECTOR PROCESSING (WPMVP 2016), New York, NY: Association for Computing Machinery (ACM), 2016, p. 1-8Conference paper (Refereed)
    Abstract [en]

    With the arrival of heterogeneous manycores comprising various features to support task-, data- and instruction-level parallelism, developing applications that take full advantage of the parallel features of the hardware has become a major challenge. In this paper, we present an extension to our CAL compilation framework (CAL2Many) that supports data parallelism in the CAL Actor Language. Our compilation framework makes it possible to program architectures with SIMD support using a high-level language and provides efficient code generation. We support general SIMD instructions, but the code generation backend is currently implemented for two custom architectures, namely ePUMA and EIT. Our experiments were carried out on two custom SIMD processor architectures using two applications.

    The experiments show that it is possible to achieve performance comparable to hand-written machine code with much less programming effort.

  • 98.
    Gonzalez, Maya
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Design and Implementation of a SATA Host Controller on a Spartan-6 FPGA2012Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
    Abstract [en]

    At Saab Dynamics AB there are a number of projects where cameras are an important part of a sensor system. Examples of such projects are monitoring for civil security and 3D mapping, where several cameras are used. The cameras can, for example, be located in airplanes, helicopters or cars, and therefore it is important to have a robust function for recording data. One way to achieve fast recording with sufficient storage capacity is to use SATA flash disks. To reduce the size and power consumption of the recording equipment and to enable project-specific adaptations, it is desirable to use an FPGA as an interface to SATA devices. This thesis concerns the development of such an interface implemented on an FPGA. The theory behind the SATA interconnect standard is described along with the design work and its challenges.

  • 99.
    Gu, Haohao
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    Zhang, He
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    Implementation of CMMB System using Software Defined Radio (SDR) Platform2010Independent thesis Advanced level (degree of Master (Two Years)), 30 credits / 45 HE creditsStudent thesis
    Abstract [en]

    CMMB (China Multimedia Mobile Broadcasting) is a wireless broadcasting channel standard for low-bandwidth, low-cost hand-held digital TV. It has been adopted by all continental Chinese government TV broadcasting companies and some Hong Kong private TV broadcasting companies. The business potential is high, yet the future is hard to predict because it might be replaced by GB200600 or DTMB. The digital modulation is based on OFDM, with pilots supporting channel estimation and equalization, and a cyclic prefix (CP) that counters multipath-induced ISI.

    This thesis investigates the implementation of a CMMB system on an SDR platform. A simulation chain, including the CMMB transmitter and receiver, was implemented in MATLAB with full data precision. The transmitter behavior model includes an RS encoder, an LDPC encoder, OFDM modulation, etc. The receiver behavior model includes OFDM demodulation, channel estimation, channel equalization, an LDPC decoder, an RS decoder, etc. Channel models emulating path loss, white noise, multipath, and glitches were developed. Based on the simulation chain and the channel models, time-domain (TD) and frequency-domain (FD) channel estimators and equalizers were implemented and optimized, and optimized TD-FD models for different mobility scenarios were proposed. The focus of the thesis is on 2D (FD-TD) channel estimation and equalization.
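    Editorial sketch: pilot-aided channel estimation and equalization in OFDM can be illustrated with a generic frequency-domain scheme. It forms least-squares (LS) estimates at the pilot subcarriers, linearly interpolates them across frequency, and applies zero-forcing (ZF) equalization. It assumes CP removal and the FFT have already been done. It does not follow the CMMB pilot pattern or the thesis's 2D TD-FD estimator, and the function name is hypothetical.

```python
import bisect


def ls_estimate_and_equalize(received, pilot_positions, pilot_values):
    """LS channel estimate at pilots, linear interpolation, and ZF equalization of one OFDM symbol.

    received: frequency-domain samples; pilot_positions must be sorted ascending.
    Returns (equalized symbols, channel estimate per subcarrier).
    """
    n = len(received)
    # LS estimate at each pilot subcarrier: H = Y / X
    h_pilot = [received[p] / x for p, x in zip(pilot_positions, pilot_values)]
    h = [0j] * n
    for k in range(n):
        if k <= pilot_positions[0]:
            h[k] = h_pilot[0]          # hold the edge estimate below the first pilot
        elif k >= pilot_positions[-1]:
            h[k] = h_pilot[-1]         # hold the edge estimate above the last pilot
        else:
            i = bisect.bisect_right(pilot_positions, k) - 1
            p0, p1 = pilot_positions[i], pilot_positions[i + 1]
            w = (k - p0) / (p1 - p0)   # linear interpolation weight in frequency
            h[k] = (1 - w) * h_pilot[i] + w * h_pilot[i + 1]
    # Zero-forcing: divide each subcarrier by its channel estimate
    equalized = [y / hk for y, hk in zip(received, h)]
    return equalized, h
```

    Linear interpolation is exact only when the channel varies linearly between pilots. Estimators that also interpolate along time across OFDM symbols (the 2D approach the thesis focuses on) are aimed at channels that change with mobility.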

  • 100.
    Gunnarsson, Svante
    et al.
    Linköping University, Department of Electrical Engineering, Automatic Control. Linköping University, The Institute of Technology.
    Wiklund, Ingela
    Linköping University, The Institute of Technology.
    Svensson, Tomas
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Kindgren, Annalena
    Linköping University, The Institute of Technology.
    Granath, Sten
    Linköping University, The Institute of Technology.
    Large Scale use of the CDIO Syllabus in Formulation of Program and Course Goals2007In: Proceedings of the 3rd International CDIO Conference, 2007Conference paper (Refereed)
    Abstract [en]

    A large-scale application of the CDIO Syllabus in the formulation of course and program goals is presented. The application involves all programs and courses within the engineering education at Linköping University. Key components in the work are course-level ITU matrices for mapping the course contents to the CDIO Syllabus, and a suggested way to organize suitable verbs for formulating learning outcomes according to the sections in the CDIO Syllabus.
