liu.se: Search for publications in DiVA
  • 51.
    Ehliar, Andreas
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Optimizing Xilinx designs through primitive instantiation. 2010. In: FPGAworld '10 Proceedings of the 7th FPGAworld Conference, New York: ACM, 2010, pp. 20-27. Conference paper (Refereed)
    Abstract [en]

    This paper is intended as a guideline for people who are interested in manual instantiation of FPGA primitives as a way of improving the performance of an FPGA design. The focus of the paper is on designs where slice primitives like flip-flops and lookup tables are instantiated. Guidelines on how to develop a design with manual instantiation are presented together with a case study of a high performance bitserial two's complement divider where a majority of the area is manually instantiated. This divider is capable of reaching a maximum frequency of 345 MHz in the fastest Virtex-4 while utilizing less than 150 LUTs thanks to the large number of manual optimizations. An open source library containing modules intended to promote the structured development of modules with manually instantiated components is also presented.

  • 52.
    Ehliar, Andreas
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Performance driven FPGA design with an ASIC perspective. 2009. Doctoral thesis, comprising papers (Other academic)
    Abstract [en]

    FPGA devices are an important component in many modern devices. This means that it is important that VLSI designers have a thorough knowledge of how to optimize designs for FPGAs. While the design flows for ASICs and FPGAs are similar, there are many differences as well due to the limitations inherent in FPGA devices. To be able to use an FPGA efficiently it is important to be aware of both the strengths and weaknesses of FPGAs. If an FPGA design is to be ported to an ASIC at a later stage it is also important to take this into account early in the design cycle so that the ASIC port will be efficient.

    This thesis investigates how to optimize a design for an FPGA through a number of case studies of important SoC components. One of these case studies discusses high speed processors and the tradeoffs that are necessary when constructing very high speed processors in FPGAs. The processor has a maximum clock frequency of 357 MHz in a Xilinx Virtex-4 device of the fastest speed grade, which is significantly higher than Xilinx's own processor in the same FPGA.

    Another case study investigates floating point datapaths and describes how a floating point adder and multiplier can be efficiently implemented in an FPGA.

    The final case study investigates Network-on-Chip architectures and how these can be optimized for FPGAs. The main focus is on packet switched architectures, but a circuit switched architecture optimized for FPGAs is also investigated.

    All of these case studies also contain information about potential pitfalls when porting designs optimized for an FPGA to an ASIC. The focus in this case is on systems where initial low volume production will be using FPGAs while still keeping the option open to port the design to an ASIC if the demand is high. This information will also be useful for designers who want to create IP cores that can be efficiently mapped to both FPGAs and ASICs.

    Finally, a framework is also presented which allows for the creation of custom backend tools for the Xilinx design flow. The framework is already useful for some tasks, but the main reason for including it is to inspire researchers and developers to use this powerful ability in their own design tools.

    List of papers
    1. Using low precision floating point numbers to reduce memory cost for MP3 decoding
    2004 (English). In: International Workshop on Multimedia Signal Processing, IEEE Xplore, 2004, pp. 119-122. Conference paper, Published paper (Refereed)
    Abstract [en]

    The purpose of our work has been to evaluate the practicality of using a 16-bit floating point representation to store the intermediate sample values and other data in memory during the decoding of MP3 bit streams. A floating point number representation offers a better trade-off between dynamic range and precision than a fixed point representation for a given word length. Using a floating point representation means that smaller memories can be used which leads to smaller chip area and lower power consumption without reducing sound quality. We have designed and implemented a DSP processor based on 16-bit floating point intermediate storage. The DSP processor is capable of decoding all MP3 bit streams at 20 MHz and this has been demonstrated on an FPGA prototype.

    Place, publisher, year, edition, pages
    IEEE Xplore, 2004
    Identifiers
    urn:nbn:se:liu:diva-16559 (URN); 10.1109/MMSP.2004.1436435 (DOI); 0-7803-8578-0 (ISBN)
    Available from: 2009-02-02. Created: 2009-02-02. Last updated: 2015-02-18. Bibliographically approved
    2. An FPGA based Open Source Network-on-chip Architecture
    2007 (English). In: 17th International Conference on Field Programmable Logic and Applications, FPL, Amsterdam, Holland, 2007, IEEE, 2007, pp. 800-803. Conference paper, Published paper (Refereed)
    Abstract [en]

    Networks on chip (NoC) have long been seen as a potential solution to the problems encountered when implementing large digital hardware designs. In this paper we describe an open source FPGA based NoC architecture with low area overhead, high throughput and low latency compared to other published works. The architecture has been optimized for Xilinx FPGAs and the NoC is capable of operating at a frequency of 260 MHz in a Virtex-4 FPGA. We have also developed a bridge so that generic Wishbone bus compatible IP blocks can be connected to the NoC.

    Place, publisher, year, edition, pages
    IEEE, 2007
    Identifiers
    urn:nbn:se:liu:diva-16560 (URN); 10.1109/FPL.2007.4380772 (DOI); 978-1-4244-1060-6 (ISBN)
    Available from: 2009-02-02. Created: 2009-02-02. Last updated: 2015-02-18. Bibliographically approved
    3. Thinking outside the flow: Creating customized backend tools for Xilinx based designs
    2007 (English). In: 4th annual FPGAworld Conference, Stockholm, 2007, 2007. Conference paper, Published paper (Refereed)
    Abstract [en]

    This paper is intended to serve as an introduction to how to build a customized backend tool for a Xilinx based design flow. A Python based library called PyXDL is presented which allows a user to manipulate XDL files which contain a placed and routed design. Three different tools are presented which use this library, ranging from a simple resource utilization viewer to a tool which will insert a logic analyzer into an already routed design, thus avoiding a costly complete rerun of the place and route tool.

    Identifiers
    urn:nbn:se:liu:diva-16561 (URN)
    Available from: 2009-02-02. Created: 2009-02-02. Last updated: 2015-02-18. Bibliographically approved
    4. A High Performance Microprocessor with DSP Extensions Optimized for the Virtex-4 FPGA
    2008 (English). In: International Conference on Field Programmable Logic and Applications FPL 2008, Heidelberg, Germany, 2008, 2008, pp. 599-602. Conference paper, Published paper (Refereed)
    Abstract [en]

    As the use of FPGAs increases, the importance of highly optimized processors for FPGAs will increase. In this paper we present the microarchitecture of a soft microprocessor core optimized for the Virtex-4 architecture. The core can operate at 357 MHz, which is significantly faster than Xilinx's MicroBlaze architecture on the same FPGA. At this frequency it is necessary to keep the logic complexity down and this paper shows how this can be done while retaining sufficient functionality for a high performance processor.

    Identifiers
    urn:nbn:se:liu:diva-16562 (URN); 10.1109/FPL.2008.4630018 (DOI); 978-1-4244-1960-9 (ISBN)
    Available from: 2009-02-02. Created: 2009-02-02. Last updated: 2015-02-18. Bibliographically approved
    5. High performance, low-latency field-programmable gate array-based floating-point adder and multiplier units in a Virtex 4
    2008 (English). In: IET Computers and digital techniques, ISSN 1751-8601, Vol. 2, pp. 305-313. Article in journal (Refereed) Published
    Abstract [en]

    There is increasing interest in floating-point arithmetic in field programmable gate arrays (FPGAs) because of the increase in their size and performance. FPGAs are generally good at bit manipulations and fixed-point arithmetic, but they have a harder time coping with floating-point arithmetic. An architecture used to construct high-performance floating-point components in a Virtex-4 FPGA is described in detail. Floating-point adder/subtracter and multiplier units have been constructed. The adder/subtracter can operate at a frequency of 377 MHz in a Virtex-4SX35 (speed grade -12).

    Identifiers
    urn:nbn:se:liu:diva-16563 (URN); 10.1049/iet-cdt:20070075 (DOI)
    Available from: 2009-02-02. Created: 2009-02-02. Last updated: 2015-02-18. Bibliographically approved
    6. An ASIC Perspective on High Performance FPGA Design
    2009 (English). Conference paper, Published paper (Refereed)
    Abstract [en]

    In this paper we discuss how various design components perform in both FPGAs and standard cell based ASICs. We also investigate how various common FPGA optimizations will affect the performance and area of an ASIC port. We find that most techniques that are used to optimize a design for an FPGA will not have a negative impact on the area in an ASIC. The intended audience for this paper is engineers charged with creating designs or IP cores that are optimized for both FPGAs and ASICs.

    Identifiers
    urn:nbn:se:liu:diva-16564 (URN)
    Available from: 2009-02-02. Created: 2009-02-02. Last updated: 2015-02-18. Bibliographically approved
  • 53.
    Ehliar, Andreas
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Karlström, Per
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    A High Performance Microprocessor with DSP Extensions Optimized for the Virtex-4 FPGA. 2008. In: International Conference on Field Programmable Logic and Applications FPL 2008, Heidelberg, Germany, 2008, 2008, pp. 599-602. Conference paper (Refereed)
    Abstract [en]

    As the use of FPGAs increases, the importance of highly optimized processors for FPGAs will increase. In this paper we present the microarchitecture of a soft microprocessor core optimized for the Virtex-4 architecture. The core can operate at 357 MHz, which is significantly faster than Xilinx's MicroBlaze architecture on the same FPGA. At this frequency it is necessary to keep the logic complexity down and this paper shows how this can be done while retaining sufficient functionality for a high performance processor.

  • 54.
    Ehliar, Andreas
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Eilert, Johan
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    A Comparison of Three FPGA Optimized NoC Architectures. 2007. In: Swedish System-on-Chip Conference, SSoCC, 2007, 2007. Conference paper (Other academic)
  • 55.
    Ehliar, Andreas
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    A Network on Chip based gigabit Ethernet router implemented on an FPGA. 2006. In: SSoCC Swedish System-on-Chip Conference, 2006, 2006. Conference paper (Other academic)
  • 56.
    Ehliar, Andreas
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    An ASIC Perspective on FPGA Optimizations. 2009. In: 19th International Conference on Field Programmable Logic and Applications (FPL), 2009, pp. 218-223. Conference paper (Refereed)
    Abstract [en]

    In this paper we discuss how various design components perform in both FPGAs and standard cell based ASICs. We also investigate how various common FPGA optimizations will affect the performance and area of an ASIC port. We find that most techniques that are used to optimize a design for an FPGA will not have a negative impact on the area in an ASIC. The intended audience for this paper is engineers charged with creating designs or IP cores that are optimized for both FPGAs and ASICs.

  • 57.
    Ehliar, Andreas
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    An FPGA based Open Source Network-on-chip Architecture. 2007. In: 17th International Conference on Field Programmable Logic and Applications, FPL, Amsterdam, Holland, 2007, IEEE, 2007, pp. 800-803. Conference paper (Refereed)
    Abstract [en]

    Networks on chip (NoC) have long been seen as a potential solution to the problems encountered when implementing large digital hardware designs. In this paper we describe an open source FPGA based NoC architecture with low area overhead, high throughput and low latency compared to other published works. The architecture has been optimized for Xilinx FPGAs and the NoC is capable of operating at a frequency of 260 MHz in a Virtex-4 FPGA. We have also developed a bridge so that generic Wishbone bus compatible IP blocks can be connected to the NoC.

  • 58.
    Ehliar, Andreas
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Benchmarking network processors. 2004. In: Swedish System-on-Chip Conference, 2004, 2004. Conference paper (Other academic)
  • 59.
    Ehliar, Andreas
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Flexible Route Lookup Using Range Search. 2005. In: The Third IASTED International Conference on Communications and Computer Networks, 2005, 2005. Conference paper (Refereed)
  • 60.
    Ehliar, Andreas
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Thinking outside the flow: Creating customized backend tools for Xilinx based designs. 2007. In: 4th annual FPGAworld Conference, Stockholm, 2007, 2007. Conference paper (Refereed)
    Abstract [en]

    This paper is intended to serve as an introduction to how to build a customized backend tool for a Xilinx based design flow. A Python based library called PyXDL is presented which allows a user to manipulate XDL files which contain a placed and routed design. Three different tools are presented which use this library, ranging from a simple resource utilization viewer to a tool which will insert a logic analyzer into an already routed design, thus avoiding a costly complete rerun of the place and route tool.

  • 61.
    Ehliar, Andreas
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Siverskog, Jacob
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Using Partial Reconfigurability to aid Debugging of FPGA Designs. 2011. Conference paper (Refereed)
    Abstract [en]

    This paper discusses the use of partial reconfigurability in Xilinx FPGA designs in order to aid debugging. A debugging framework is proposed where the use of partial reconfigurability can allow for added flexibility by allowing a debugger to decide at run time what debugging module to use. This paper also presents an open source debugging tool which allows a user to read out the contents of memory blocks in Xilinx designs without needing to use any JTAG adapter. This allows a user to debug an FPGA in situations which would otherwise be difficult, i.e. in the field.

  • 62.
    Ehliar, Andreas
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Wiklund, Daniel
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Feasibility study of a core router based on a network on chip. 2005. In: Swedish System on Chip Conference SSoCC, 2005, 2005. Conference paper (Other academic)
    Abstract [en]

    In this paper we investigate the feasibility of creating a core router based upon a network on chip. The investigated architecture uses 16x10-Gbit Ethernet ports. The purpose of this is to show that it is possible to create such a solution considering current process technologies. This is done through an analysis of the required chip area, clock frequencies, and pin count. The results show that such a solution is feasible and can be implemented as a single chip.

  • 63.
    Ehrenstråhle, Carl
    Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Polynomial Expansion-Based Displacement Calculation on FPGA. 2016. Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    This thesis implements a system for calculating the displacement between two consecutive video frames. The displacement is calculated using a polynomial expansion-based algorithm. A unit-tested bottom-up approach is successfully used to design and implement the system. The designed and implemented system is thoroughly elaborated upon. The chosen algorithm and its computational details are presented to provide context to the implemented system. Some of the major issues and their impact on the system are discussed.

  • 64.
    Eilert, Johan
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    ASIP for Wireless Communication and Media. 2010. Doctoral thesis, comprising papers (Other academic)
    Abstract [en]

    While general purpose processors reach both high performance and high application flexibility, this comes at a high cost in terms of silicon area and power consumption. In systems where high application flexibility is not required, it is possible to trade off flexibility for lower cost by tailoring the processor to the application to create an Application Specific Instruction set Processor (ASIP) with high performance yet low silicon cost.

    This thesis demonstrates how ASIPs with application specific data types can provide efficient solutions with lower cost. Two examples are presented, an audio decoder ASIP for audio and music processing and a matrix manipulation ASIP for MIMO radio baseband signal processing.

    The audio decoder ASIP uses a 16-bit floating point data type to reduce the size of the data memory to about 60% of other solutions that use a 32-bit data type. Since the data memory occupies a major part of the silicon area, this has a significant impact on the total silicon area, and thereby also the static and dynamic power consumption. The data width reduction can be done without any noticeable artifacts in the decoded audio due to the natural masking effect of the human ear.

    The matrix manipulation SIMD ASIP is designed to perform various matrix operations such as matrix inversion and QR decomposition of small complex-valued matrices. This type of processing is found in MIMO radio baseband signal processing and the matrices are typically not larger than 4x4. There have been solutions published that use arrays of fixed-function processing elements to perform these operations, but the proposed ASIP performs the computations in less time and with lower hardware cost.

    The matrix manipulation ASIP data path uses a floating point data type to avoid data scaling issues associated with fixed point computations, especially those related to division and reciprocal calculations, and it also simplifies the program control flow since no special cases for certain inputs are needed which is especially important for SIMD architectures.

    These two applications were chosen to show how ASIPs can be a suitable alternative and match the requirements for different types of applications, to provide enough flexibility and performance to support different standards and algorithms with low hardware cost.

    List of papers
    1. Using low precision floating point numbers to reduce memory cost for MP3 decoding
    2004 (English). In: International Workshop on Multimedia Signal Processing, IEEE Xplore, 2004, pp. 119-122. Conference paper, Published paper (Refereed)
    Abstract [en]

    The purpose of our work has been to evaluate the practicality of using a 16-bit floating point representation to store the intermediate sample values and other data in memory during the decoding of MP3 bit streams. A floating point number representation offers a better trade-off between dynamic range and precision than a fixed point representation for a given word length. Using a floating point representation means that smaller memories can be used which leads to smaller chip area and lower power consumption without reducing sound quality. We have designed and implemented a DSP processor based on 16-bit floating point intermediate storage. The DSP processor is capable of decoding all MP3 bit streams at 20 MHz and this has been demonstrated on an FPGA prototype.

    Place, publisher, year, edition, pages
    IEEE Xplore, 2004
    Identifiers
    urn:nbn:se:liu:diva-16559 (URN); 10.1109/MMSP.2004.1436435 (DOI); 0-7803-8578-0 (ISBN)
    Available from: 2009-02-02. Created: 2009-02-02. Last updated: 2015-02-18. Bibliographically approved
    2. Efficient Complex Matrix Inversion for MIMO Software Defined Radio
    2007 (English). In: International Symposium on Circuits and Systems, ISCAS, 2007, IEEE, 2007, pp. 2610-2613. Conference paper, Published paper (Refereed)
    Abstract [en]

    Complex matrix inversion is a very computationally demanding operation in advanced multi-antenna wireless communications. Traditionally, systolic array-based QR decomposition (QRD) is used to invert large matrices. However, the matrices involved in MIMO baseband processing in mobile handsets are generally small which means QRD is not necessarily efficient. In this paper, a new method is proposed using programmable hardware units which not only achieves higher performance but also consumes less silicon area. Furthermore, the hardware can be reused for many other operations such as complex matrix multiplication, filtering, correlation and FFT/IFFT.

    Place, publisher, year, edition, pages
    IEEE, 2007
    Identifiers
    urn:nbn:se:liu:diva-39855 (URN); 10.1109/ISCAS.2007.377850 (DOI); 51537 (Local ID); 1-4244-0920-9 (ISBN); 51537 (Archive number); 51537 (OAI)
    Conference
    International Symposium on Circuits and Systems (ISCAS 2007), 27-20 May, New Orleans, Louisiana, USA
    Available from: 2009-10-10. Created: 2009-10-10. Last updated: 2011-02-04
    3. Complexity Reduction of Matrix Manipulation for Multi-User STBC-MIMO Decoding
    2007 (English). In: IEEE Sarnoff Symposium, 2007, 2007, pp. 1-5. Conference paper, Published paper (Refereed)
    Abstract [en]

    This paper studies efficient complex valued matrix manipulations for multi-user STBC-MIMO decoding. A novel method called Alamouti blockwise analytical matrix inversion (ABAMI) is proposed for the inversion of large complex matrices that are based on Alamouti sub-blocks. Another method using a variant of Givens rotation is proposed for fast QR decomposition of matrices of this kind. Our solutions significantly reduce the number of operations which makes them more than 4 times faster than several other solutions in the literature. Furthermore, compared to fixed function VLSI implementations, our solution is more flexible and consumes less silicon area because the hardware is programmable and it can be reused for many other operations such as filtering, correlation and FFT/IFFT. Besides the analysis of the general computational complexity based on the number of basic operations, the computational latency is also measured in clock cycles based on the conceptual hardware for real-time matrix manipulations.

    Identifiers
    urn:nbn:se:liu:diva-39861 (URN); 10.1109/SARNOF.2007.4567354 (DOI); 51543 (Local ID); 978-1-4244-2483-2 (ISBN); 51543 (Archive number); 51543 (OAI)
    Conference
    Sarnoff Symposium, April 30-May 2, Nassau Inn, Princeton, NJ, USA
    Available from: 2009-10-10. Created: 2009-10-10. Last updated: 2011-02-04
    4. Implementation of a Programmable Linear MMSE Detector for MIMO-OFDM
    2008 (English). In: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, 2008, IEEE, 2008, pp. 5396-5399. Conference paper, Published paper (Refereed)
    Abstract [en]

    This paper presents a linear minimum mean square error (LMMSE) symbol detector for MIMO-OFDM enabled mobile terminals. The detector is implemented using a programmable baseband processor aimed for software-defined radio (SDR). Owing to the dynamic range supplied by the floating-point SIMD datapath, special algorithms can be adopted to reduce the computational latency of detection. The programmable solution not only supports different transmit/receive antenna configurations, but also allows hardware multiplexing to obtain silicon and power efficiency. Compared to several existing fixed-functional solutions, the one proposed in this paper is smaller, more flexible and faster.

    Place, publisher, year, edition, pages
    IEEE, 2008
    Identifiers
    urn:nbn:se:liu:diva-42734 (URN); 10.1109/ICASSP.2008.4518880 (DOI); 68460 (Local ID); 978-1-4244-1483-3 (ISBN); 68460 (Archive number); 68460 (OAI)
    Conference
    IEEE International Conference on Acoustics, Speech and Signal Processing, March 31-April 4, Las Vegas, NV, USA
    Available from: 2009-10-10. Created: 2009-10-10. Last updated: 2011-02-04. Bibliographically approved
    5. Real-Time Alamouti STBC Decoding on A Programmable Baseband Processor
    2008 (English). Conference paper, Published paper (Refereed)
    Abstract [en]

    This paper presents a space-time block coding decoder for MIMO-OFDM enabled mobile terminals. The decoder is implemented using a programmable baseband processor aimed at software-defined radio (SDR). The dynamic range supplied by the floating-point SIMD datapath allows special algorithms to significantly reduce the computational latency of decoding. The programmable solution not only supports different transmit/receive antenna configurations, but also allows hardware multiplexing to obtain silicon and power efficiency. Compared to several existing fixed-functional ASIC solutions in the literature, the one proposed in this paper is by far the smallest and fastest, and is more flexible.

    Identifiers
    urn:nbn:se:liu:diva-42763 (URN); 10.1109/ICCSC.2008.65 (DOI); 68620 (Local ID); 978-1-4244-1707-0 (ISBN); 68620 (Archive number); 68620 (OAI)
    Conference
    4th IEEE International Conference on Circuits and Systems for Communications, 26-28 May, Shanghai, China
    Available from: 2009-10-10. Created: 2009-10-10. Last updated: 2011-02-04. Bibliographically approved
  • 65.
    Eilert, Johan
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Ehliar, Andreas
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Using low precision floating point numbers to reduce memory cost for MP3 decoding. 2004. In: International Workshop on Multimedia Signal Processing, IEEE Xplore, 2004, pp. 119-122. Conference paper (Refereed)
    Abstract [en]

    The purpose of our work has been to evaluate the practicality of using a 16-bit floating point representation to store the intermediate sample values and other data in memory during the decoding of MP3 bit streams. A floating point number representation offers a better trade-off between dynamic range and precision than a fixed point representation for a given word length. Using a floating point representation means that smaller memories can be used which leads to smaller chip area and lower power consumption without reducing sound quality. We have designed and implemented a DSP processor based on 16-bit floating point intermediate storage. The DSP processor is capable of decoding all MP3 bit streams at 20 MHz and this has been demonstrated on an FPGA prototype.
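
    The trade-off between dynamic range and precision described in this abstract can be illustrated with a small sketch. It assumes the IEEE 754 binary16 format and a Q1.15 fixed-point format as stand-ins; the abstract does not state which 16-bit formats the authors used, and the helper names `to_half`, `to_q15` and `rel_error` are hypothetical.

```python
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value (assumed format)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_q15(x: float) -> float:
    """Round x to 16-bit signed fixed point with 15 fractional bits, saturating."""
    scaled = round(x * 32768)
    scaled = max(-32768, min(32767, scaled))
    return scaled / 32768


def rel_error(exact: float, approx: float) -> float:
    """Relative error of approx with respect to a non-zero exact value."""
    return abs(approx - exact) / abs(exact)


if __name__ == "__main__":
    # Compare both 16-bit formats on a full-scale and a small sample value.
    for value in (0.7, 1e-4):
        print(value, rel_error(value, to_half(value)), rel_error(value, to_q15(value)))
```

    Near full scale the fixed-point value is the more precise of the two, but for small samples its relative error grows quickly, while the floating-point format keeps roughly the same relative precision across its range.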

  • 66.
    Eilert, Johan
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Ehliar, Andreas
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Design of a Floating Point DSP for Full Precision MPEG-I Layer II and III Decoding. 2005. In: Swedish System on Chip Conference SSoCC, 2005, 2005. Conference paper (Other academic)
  • 67.
    Eilert, Johan
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Early Exploration of MIPS Cost and Memory Cost Trade-off for Media DSP Media Processor. 2006. In: SSoCC Swedish System-on-Chip Conference, 2006, 2006. Conference paper (Other academic)
  • 68.
    Eilert, Johan
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Wu, Di
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Efficient Complex Matrix Inversion for MIMO Software Defined Radio. 2007. In: International Symposium on Circuits and Systems, ISCAS, 2007, IEEE, 2007, pp. 2610-2613. Conference paper (Refereed)
    Abstract [en]

    Complex matrix inversion is a very computationally demanding operation in advanced multi-antenna wireless communications. Traditionally, systolic array-based QR decomposition (QRD) is used to invert large matrices. However, the matrices involved in MIMO baseband processing in mobile handsets are generally small which means QRD is not necessarily efficient. In this paper, a new method is proposed using programmable hardware units which not only achieves higher performance but also consumes less silicon area. Furthermore, the hardware can be reused for many other operations such as complex matrix multiplication, filtering, correlation and FFT/IFFT.

  • 69.
    Eilert, Johan
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Wu, Di
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Implementation of a Programmable Linear MMSE Detector for MIMO-OFDM. 2008. In: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, 2008, IEEE, 2008, pp. 5396-5399. Conference paper (Refereed)
    Abstract [en]

    This paper presents a linear minimum mean square error (LMMSE) symbol detector for MIMO-OFDM enabled mobile terminals. The detector is implemented using a programmable baseband processor aimed for software-defined radio (SDR). Owing to the dynamic range supplied by the floating-point SIMD datapath, special algorithms can be adopted to reduce the computational latency of detection. The programmable solution not only supports different transmit/receive antenna configurations, but also allows hardware multiplexing to obtain silicon and power efficiency. Compared to several existing fixed-functional solutions, the one proposed in this paper is smaller, more flexible and faster.

  • 70.
    Eilert, Johan
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Wu, Di
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Real-Time Alamouti STBC Decoding on A Programmable Baseband Processor. 2008. Conference paper (Refereed)
    Abstract [en]

    This paper presents a space-time block coding decoder for MIMO-OFDM enabled mobile terminals. The decoder is implemented using a programmable baseband processor aimed for software-defined radio (SDR). The dynamic range supplied by the floating-point SIMD datapath allows special algorithms to significantly reduce the computational latency of decoding. The programmable solution not only supports different transmit/receive antenna configurations, but also allows hardware multiplexing to obtain silicon and power efficiency. Compared to several existing fixed-function ASIC solutions in the literature, the one proposed in this paper is by far the smallest and fastest, and it offers more flexibility.

  • 71.
    Eilert, Johan
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Wu, Di
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Wang, Dandan
    Al-Dhahir, Naofal
    Minn, Hlaing
    Complexity Reduction of Matrix Manipulation for Multi-User STBC-MIMO Decoding. 2007. In: IEEE Sarnoff Symposium, 2007, 2007, pp. 1-5. Conference paper (Refereed)
    Abstract [en]

    This paper studies efficient complex valued matrix manipulations for multi-user STBC-MIMO decoding. A novel method called Alamouti blockwise analytical matrix inversion (ABAMI) is proposed for the inversion of large complex matrices that are based on Alamouti sub-blocks. Another method using a variant of Givens rotation is proposed for fast QR decomposition of this kind of matrices. Our solutions significantly reduce the number of operations which makes them more than 4 times faster than several other solutions in the literature. Furthermore, compared to fixed function VLSI implementations, our solution is more flexible and consumes less silicon area because the hardware is programmable and it can be reused for many other operations such as filtering, correlation and FFT/IFFT. Besides the analysis of the general computational complexity based on the number of basic operations, the computational latency is also measured in clock cycles based on the conceptual hardware for real-time matrix manipulations.
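
    As background for the Alamouti sub-blocks mentioned in this abstract, the sketch below shows the well-known property that a single 2x2 Alamouti block can be inverted analytically: its inverse is its conjugate transpose divided by |a|^2 + |b|^2. This is not the ABAMI method itself, whose details the abstract does not give; the function names are hypothetical.

```python
def alamouti_block(a: complex, b: complex) -> list:
    """Build the 2x2 Alamouti block [[a, b], [-conj(b), conj(a)]]."""
    return [[a, b], [-b.conjugate(), a.conjugate()]]


def alamouti_inverse(a: complex, b: complex) -> list:
    """Invert the block analytically: conjugate transpose scaled by 1/(|a|^2 + |b|^2)."""
    norm = abs(a) ** 2 + abs(b) ** 2
    if norm == 0:
        raise ZeroDivisionError("the all-zero Alamouti block is singular")
    return [[a.conjugate() / norm, -b / norm],
            [b.conjugate() / norm, a / norm]]


def matmul2(p: list, q: list) -> list:
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


if __name__ == "__main__":
    a, b = 1 + 2j, 3 - 1j
    # The product should be (numerically close to) the identity matrix.
    print(matmul2(alamouti_block(a, b), alamouti_inverse(a, b)))
```

    The inverse costs only one real reciprocal plus a few conjugations and scalings, which hints at why matrices built from such blocks are attractive for low-complexity inversion.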

  • 72.
    Einemo, Jonas
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Lundqvist, Magnus
    Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    A Selection of H.264 Encoder Components Implemented and Benchmarked on a Multi-core DSP Processor. 2010. Independent thesis, Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    H.264 is a video coding standard which offers high data compression rate at the cost of a high computational load. This thesis evaluates how well parts of the H.264 standard can be implemented for a new multi-core digital signal processing processor architecture called ePUMA. The thesis investigates if real-time encoding of high definition video sequences could be performed. The implementation consists of the motion estimation, motion compensation, discrete cosine transform, inverse discrete cosine transform, quantization and rescaling parts of the H.264 standard. Benchmarking is done using the ePUMA system simulator and the results are compared to an implementation of an existing H.264 encoder for another multi-core processor architecture called STI Cell. The results show that the selected parts of the H.264 encoder could be run on 6 calculation cores in 5 million cycles per frame. This setup leaves 2 calculation cores to run the remaining parts of the encoder.

  • 73.
    Englund, Madeleine
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Hybrid Floating-point Units in FPGAs. 2012. Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    Floating point numbers are used in many applications that would be well suited to a higher parallelism than that offered in a CPU. In these cases, an FPGA, with its ability to handle multiple calculations simultaneously, could be the solution. Unfortunately, floating point operations implemented in an FPGA are often resource intensive, which means that many developers avoid floating point solutions in FPGAs or avoid using FPGAs for floating point applications.

    Here the potential to get less expensive floating point operations by using a higher radix for the floating point numbers and by using and expanding the existing DSP block in the FPGA is investigated. One of the goals is that the FPGA should be usable both for users who have floating point in their designs and for those who do not. In order to motivate hard floating point blocks in the FPGA, these must not consume too much of the limited resources.

    This work shows that the floating point addition will become smaller with the use of the higher radix, while the multiplication becomes smaller by using the hardware of the DSP block. When both operations are examined at the same time, it turns out that it is possible to get a reduced area, compared to separate floating point units, by utilizing both the DSP block and higher radix for the floating point numbers.

  • 74.
    Eriksson, Henrik
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Elektroniska komponenter.
    Henriksson, Tomas
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Larsson-Edefors, Per
    Chalmers.
    Full-custom vs standard-cell based design - An adder comparison. 2002. In: Swedish System-on-Chip conference, 2002, 2002. Conference paper (Other academic)
  • 75.
    Eriksson, Henrik
    et al.
    Dept of Computer Engineering Chalmers tekniska högskola.
    Henriksson, Tomas
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Larsson-Edefors, Per
    Dept of Computer Engineering Chalmers tekniska högskola.
    Svensson, Christer
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Elektroniska komponenter.
    Full-Custom vs. Standard-Cell Design Flow - An Adder Case Study. 2003. In: Asia South Pacific Design Automation Conference, 2003, 2003, pp. 507-. Conference paper (Refereed)
  • 76.
    Ferdeen, Mats
    Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Reducing Energy Consumption Through Image Compression. 2016. Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    The energy consumed by off-chip memory writes and reads is a known problem. In the image processing field of structure from motion, simpler compression techniques could be used to save energy. The balance between the detected features, such as corners and edges, and the degree of compression becomes an important issue to investigate. In this thesis a deeper study of this balance is performed. A number of more advanced compression algorithms for still images, such as JPEG, are used for comparison with a selected number of simpler compression algorithms. The simpler algorithms can be divided into two categories: individual block-wise compression of each image and compression with respect to all pixels in each image. In this study the image sequences are in grayscale and are taken from an earlier study about rolling shutters. Synthetic data sets from a further study about optical flow are also included to see how reliable the other data sets are.

  • 77.
    Flordal, Oskar
    et al.
    Axis Communications AB .
    Wu, Di
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Accelerating CABAC Encoding for Multi-standard Media with Configurability. 2006. In: IEEE IPDPS, 2006, 2006. Conference paper (Refereed)
  • 78.
    Fries, Jakob
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Johansson, Simon
    Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    A Modular 3D Graphics Accelerator for FPGA. 2011. Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    A modular and area-efficient 3D graphics accelerator for tile based rendering in FPGA systems has been designed and implemented. The accelerator supports a subset of OpenGL, with features such as mipmapping, multitexturing and blending. The accelerator consists of a software component for projection and clipping of triangles, as well as a hardware component for rasterization, coloring and video output. Trade-offs made between area, performance and functionality have been described and justified. In order to evaluate the functionality and performance of the accelerator, it has been tested with two different applications.

  • 79.
    Frisk, Erik
    et al.
    Linköpings universitet, Institutionen för systemteknik, Fordonssystem. Linköpings universitet, Tekniska fakulteten.
    Krysander, Mattias
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Residual Selection for Consistency Based Diagnosis Using Machine Learning Models. 2018. In: IFAC PapersOnLine, Elsevier Science BV, 2018, Vol. 51, no. 24, pp. 139-146. Conference paper (Refereed)
    Abstract [en]

    A common architecture of model-based diagnosis systems is to use a set of residuals to detect and isolate faults. The paper argues that in many cases there are more candidate residuals than needed for detection and single fault isolation, and that key sources of varying performance in the candidate residuals are model errors and noise. This paper formulates a systematic method for selecting, from a set of candidate residuals, a subset with good diagnosis performance. A key contribution is the combination of a machine learning model, here a random forest model, with diagnosis-specific performance specifications to select a high-performing subset of residuals. The approach is applied to an industrial use case, an automotive engine, and it is shown how the trade-off between diagnosis performance and the number of residuals can easily be controlled. The number of residuals used is reduced from the original 42 to only 12 without losing significant diagnosis performance. (C) 2018, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

  • 80.
    Frisk, Erik
    et al.
    Linköpings universitet, Institutionen för systemteknik, Fordonssystem. Linköpings universitet, Tekniska fakulteten.
    Krysander, Mattias
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Jung, Daniel
    Linköpings universitet, Institutionen för systemteknik, Fordonssystem. Linköpings universitet, Tekniska fakulteten.
    A Toolbox for Analysis and Design of Model Based Diagnosis Systems for Large Scale Models. 2017. In: IFAC PapersOnLine, Elsevier Science BV, 2017, Vol. 50, no. 1, pp. 3287-3293. Conference paper (Refereed)
    Abstract [en]

    To facilitate the application of advanced fault diagnosis analysis and design techniques to industrial-sized systems, there is a need for computer support. This paper describes a Matlab toolbox and evaluates the software on a challenging industrial problem, air-path diagnosis in an automotive engine. The toolbox includes tools for analysis and design of model based diagnosis systems for large-scale differential algebraic models. The software package supports a complete tool-chain from modeling a system to generating C-code for residual generators. Major design steps supported by the tool are modeling, fault diagnosability analysis, sensor selection, residual generator analysis, test selection, and code generation. Structural methods based on efficient graph theoretical algorithms are used in several steps. In the automotive diagnosis example, a diagnosis system is generated and evaluated using measurement data, both in fault-free operation and with faults injected in the control-loop. The results clearly show the benefit of the toolbox in a model-based design of a diagnosis system. The latest version of the toolbox can be downloaded at faultdiagnosistoolbox.github.io. (C) 2017, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

  • 81.
    Frisk, Erik
    et al.
    Linköpings universitet, Institutionen för systemteknik, Fordonssystem. Linköpings universitet, Tekniska fakulteten.
    Krysander, Mattias
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Åslund, Jan
    Linköpings universitet, Institutionen för systemteknik, Fordonssystem. Linköpings universitet, Tekniska fakulteten.
    Analysis and Design of Diagnosis Systems Based on the Structural Differential Index. 2017. In: 20th IFAC World Congress, Elsevier Science BV, 2017, Vol. 50, no. 1, pp. 12236-12242. Conference paper (Refereed)
    Abstract [en]

    Structural approaches have been shown to be useful for analyzing and designing diagnosis systems for industrial systems. In the simulation and estimation literature, related theories about the differential index have been developed and, also there, structural methods have been successfully applied for simulating large-scale differential algebraic models. A main contribution of this paper is to connect those theories and thus make the tools from the simulation and estimation literature available for model based diagnosis design. A key step in the unification is an extension of the notion of differential index from exactly determined systems of equations to overdetermined systems of equations. A second main contribution is to show how the differential index can be used in diagnosability analysis and also in the design stage, where an exponentially sized search space is significantly reduced. This allows focusing on residual generators where basic design techniques, such as standard state-observation techniques and sequential residual generation, are directly applicable. The developed theory has direct industrial relevance, which is illustrated with discussions on an automotive engine example. (C) 2017, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

  • 82.
    Garrido Gálvez, Mario
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    A New Representation of FFT Algorithms Using Triangular Matrices. 2016. In: IEEE Transactions on Circuits and Systems Part 1: Regular Papers, ISSN 1549-8328, E-ISSN 1558-0806, Vol. 63, no. 10, pp. 1737-1745. Article in journal (Refereed)
    Abstract [en]

    In this paper we propose a new representation for FFT algorithms called the triangular matrix representation. This representation is more general than the binary tree representation and, therefore, it introduces new FFT algorithms that were not discovered before. Furthermore, the new representation has the advantage that it is simple and easy to understand, as each FFT algorithm only consists of a triangular matrix. Besides, the new representation allows for obtaining the exact twiddle factor values in the FFT flow graph easily. This facilitates the design of FFT hardware architectures. As a result, the triangular matrix representation is an excellent alternative to represent FFT algorithms and it opens new possibilities in the exploration and understanding of the FFT.

  • 83.
    Garrido Gálvez, Mario
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    The Feedforward Short-Time Fourier Transform. 2016. In: IEEE Transactions on Circuits and Systems - II - Express Briefs, ISSN 1549-7747, E-ISSN 1558-3791, Vol. 63, no. 9, pp. 868-872. Article in journal (Refereed)
    Abstract [en]

    This brief presents the feedforward short-time Fourier transform (STFT). This new approach is based on reusing the calculations of the STFT at consecutive time instants. This leads to significant savings in hardware components with respect to fast Fourier transform based STFTs. Furthermore, the feedforward STFT does not have the accumulative error of iterative STFT approaches. As a result, the proposed feedforward STFT presents an excellent tradeoff between hardware utilization and performance.

  • 84.
    Garrido Gálvez, Mario
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Andersson, Rikard
    Linköpings universitet, Institutionen för systemteknik, Fordonssystem. Linköpings universitet, Tekniska högskolan.
    Qureshi, Fahad
    Tampere University of Technology, Finland.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Multiplierless Unity-Gain SDF FFTs. 2016. In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 24, no. 9, pp. 3003-3007. Article in journal (Refereed)
    Abstract [en]

    In this brief, we propose a novel approach to implement multiplierless unity-gain single-delay feedback fast Fourier transforms (FFTs). Previous methods achieve unity-gain FFTs by using either complex multipliers or nonunity-gain rotators with additional scaling compensation. Conversely, this brief proposes unity-gain FFTs without compensation circuits, even when using nonunity-gain rotators. This is achieved by a joint design of rotators, so that the entire FFT is scaled by a power of two, which is then shifted to unity. This reduces the amount of hardware resources of the FFT architecture, while having high accuracy in the calculations. The proposed approach can be applied to any FFT size, and various designs for different FFT sizes are presented.

  • 85.
    Garrido Gálvez, Mario
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Angel Sanchez, Miguel
    University of Politecn Madrid, Spain.
    Luisa Lopez-Vallejo, Maria
    University of Politecn Madrid, Spain.
    Grajal, Jesus
    University of Politecn Madrid, Spain.
    A 4096-Point Radix-4 Memory-Based FFT Using DSP Slices. 2017. In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 25, no. 1, pp. 375-379. Article in journal (Refereed)
    Abstract [en]

    This brief presents a novel 4096-point radix-4 memory-based fast Fourier transform (FFT). The proposed architecture follows a conflict-free strategy that only requires a total memory of size N and a few additional multiplexers. The control is also simple, as it is generated directly from the bits of a counter. Apart from the low complexity, the FFT has been implemented on a Virtex-5 field programmable gate array (FPGA) using DSP slices. The goal has been to reduce the use of distributed logic, which is scarce in the target FPGA. With this purpose, most of the hardware has been implemented in DSP48E. As a result, the proposed FPGA is efficient in terms of hardware resources, as is shown by the experimental results.

  • 86.
    Garrido Gálvez, Mario
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Huang, Shen-Jui
    Novatek Corp, Taiwan.
    Chen, Sau-Gee
    Natl Chiao Tung Univ, Taiwan.
    Feedforward FFT Hardware Architectures Based on Rotator Allocation. 2018. In: IEEE Transactions on Circuits and Systems Part 1: Regular Papers, ISSN 1549-8328, E-ISSN 1558-0806, Vol. 65, no. 2, pp. 581-592. Article in journal (Refereed)
    Abstract [en]

    In this paper, we present new feedforward FFT hardware architectures based on rotator allocation. The rotator allocation approach consists in distributing the rotations of the FFT in such a way that the number of edges in the FFT that need rotators and the complexity of the rotators are reduced. Radix-2 and radix-$2^k$ feedforward architectures based on rotator allocation are presented in this paper. Experimental results show that the proposed architectures reduce the hardware cost significantly with respect to previous FFT architectures.

  • 87.
    Garrido Gálvez, Mario
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Huang, Shen-Jui
    Novatek Corp, Taiwan.
    Chen, Sau-Gee
    National Chiao Tung University, Taiwan.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    The Serial Commutator FFT. 2016. In: IEEE Transactions on Circuits and Systems - II - Express Briefs, ISSN 1549-7747, E-ISSN 1558-3791, Vol. 63, no. 10, pp. 974-978. Article in journal (Refereed)
    Abstract [en]

    This brief presents a new type of fast Fourier transform (FFT) hardware architectures called serial commutator (SC) FFT. The SC FFT is characterized by the use of circuits for bit-dimension permutation of serial data. The proposed architectures are based on the observation that, in the radix-2 FFT algorithm, only half of the samples at each stage must be rotated. This fact, together with a proper data management, makes it possible to allocate rotations only every other clock cycle. This allows for simplifying the rotator, halving the complexity with respect to conventional serial FFT architectures. Likewise, the proposed approach halves the number of adders in the butterflies with respect to previous architectures. As a result, the proposed architectures use the minimum number of adders, rotators, and memory that are necessary for a pipelined FFT of serial data, with 100% utilization ratio.

  • 88.
    Garrido Gálvez, Mario
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Källström, Petter
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Kumm, Martin
    University of Kassel, Germany.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    CORDIC II: A New Improved CORDIC Algorithm. 2016. In: IEEE Transactions on Circuits and Systems - II - Express Briefs, ISSN 1549-7747, E-ISSN 1558-3791, Vol. 63, no. 2, pp. 186-190. Article in journal (Refereed)
    Abstract [en]

    In this brief, we present the CORDIC II algorithm. Like previous CORDIC algorithms, the CORDIC II calculates rotations by breaking down the rotation angle into a series of microrotations. However, the CORDIC II algorithm uses a novel angle set, different from the angles used in previous CORDIC algorithms. The new angle set provides a faster convergence that reduces the number of adders with respect to previous approaches.
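
    The idea of breaking a rotation angle into a series of microrotations can be sketched with the conventional CORDIC algorithm in rotation mode. This is the classic angle set atan(2^-i), not the new angle set of CORDIC II, which the abstract does not specify; the function name and the floating-point arithmetic are illustrative assumptions.

```python
import math


def cordic_rotate(x: float, y: float, angle: float, iterations: int = 16):
    """Rotate (x, y) by angle (radians) with conventional CORDIC.

    Converges for |angle| up to about 1.74 rad. In hardware, each
    multiplication by 2**-i is a bit shift and the gain is a constant.
    """
    z = angle      # remaining angle still to be rotated
    gain = 1.0     # accumulated scaling introduced by the microrotations
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        # Microrotation by d * atan(2**-i), using only shift-and-add style terms.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
        gain *= math.sqrt(1.0 + 4.0 ** -i)
    # Compensate the CORDIC gain so that the result has unit scaling.
    return x / gain, y / gain


if __name__ == "__main__":
    print(cordic_rotate(1.0, 0.0, math.pi / 6))
    print(math.cos(math.pi / 6), math.sin(math.pi / 6))
```

    Each iteration costs two additions in the datapath, which is why an angle set that converges in fewer steps reduces the number of adders.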

  • 89.
    Garrido Gálvez, Mario
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Lopez-Vallejo, Maria Luisa
    Tech Univ Madrid, Spain.
    Chen, Sau-Gee
    Natl Chiao Tung Univ, Taiwan.
    Guest Editorial: Special Section on Fast Fourier Transform (FFT) Hardware Implementations. 2018. In: Journal of Signal Processing Systems, ISSN 1939-8018, E-ISSN 1939-8115, Vol. 90, no. 11, pp. 1581-1582. Article in journal (Other academic)
    Abstract [en]

    n/a

  • 90.
    Garrido Gálvez, Mario
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Qureshi, Fahad
    Linköpings universitet, Institutionen för systemteknik, Elektroniksystem. Linköpings universitet, Tekniska högskolan.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Low-Complexity Multiplierless Constant Rotators Based on Combined Coefficient Selection and Shift-and-Add Implementation (CCSSI). 2014. In: IEEE Transactions on Circuits and Systems Part 1: Regular Papers, ISSN 1549-8328, E-ISSN 1558-0806, Vol. 61, no. 7, pp. 2002-2012. Article in journal (Refereed)
    Abstract [en]

    This paper presents a new approach to design multiplierless constant rotators. The approach is based on a combined coefficient selection and shift-and-add implementation (CCSSI) for the design of the rotators. First, complete freedom is given to the selection of the coefficients, i.e., no constraints to the coefficients are set in advance and all the alternatives are taken into account. Second, the shift-and-add implementation uses advanced single constant multiplication (SCM) and multiple constant multiplication (MCM) techniques that lead to low-complexity multiplierless implementations. Third, the design of the rotators is done by a joint optimization of the coefficient selection and shift-and-add implementation. As a result, the CCSSI provides an extended design space that offers a larger number of alternatives with respect to previous works. Furthermore, the design space is explored in a simple and efficient way. The proposed approach has wide applications in numerous hardware scenarios. This includes rotations by single or multiple angles, rotators in single or multiple branches, and different scaling of the outputs. Experimental results for various scenarios are provided. In all of them, the proposed approach achieves significant improvements with respect to state of the art.

  • 91.
    Garrido Gálvez, Mario
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Unnikrishnan, Nanda K.
    Univ Minnesota, MN 55455 USA.
    Parhi, Keshab K.
    Univ Minnesota, MN 55455 USA.
    A Serial Commutator Fast Fourier Transform Architecture for Real-Valued Signals. 2018. In: IEEE Transactions on Circuits and Systems - II - Express Briefs, ISSN 1549-7747, E-ISSN 1558-3791, Vol. 65, no. 11, pp. 1693-1697. Article in journal (Refereed)
    Abstract [en]

    This brief presents a novel pipelined architecture to compute the fast Fourier transform of real input signals in a serial manner, i.e., one sample is processed per cycle. The proposed architecture, referred to as real-valued serial commutator, achieves full hardware utilization by mapping each stage of the fast Fourier transform (FFT) to a half-butterfly operation that operates on real input signals. Prior serial architectures to compute the FFT of real signals only achieved 50% hardware utilization. Novel data-exchange and data-reordering circuits are also presented. The complete serial commutator architecture requires 2 log_2 N - 2 real adders, log_2 N - 2 real multipliers, and N + 9 log_2 N - 19 real delay elements, where N represents the size of the FFT.
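
    The resource expressions quoted in the abstract above can be evaluated directly. The following is a minimal sketch, not part of the paper: the function name `rfft_sc_resources` and the dictionary keys are hypothetical, and the restriction to power-of-two sizes with N >= 4 is an assumption made so that every count is a non-negative integer.

```python
def rfft_sc_resources(n: int) -> dict:
    """Evaluate the adder, multiplier and delay counts stated in the abstract.

    Assumption (not from the abstract): n is a power of two and n >= 4.
    """
    if n < 4 or n & (n - 1):
        raise ValueError("n must be a power of two and at least 4")
    log2n = n.bit_length() - 1  # exact log2 for a power of two
    return {
        "real_adders": 2 * log2n - 2,
        "real_multipliers": log2n - 2,
        "real_delays": n + 9 * log2n - 19,
    }


if __name__ == "__main__":
    for size in (16, 1024):
        print(size, rfft_sc_resources(size))
```

    For N = 1024 this gives 18 real adders, 8 real multipliers and 1095 real delay elements.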

  • 92.
    Garrido, Mario
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Multiplexer and Memory-Efficient Circuits for Parallel Bit Reversal. 2019. In: IEEE Transactions on Circuits and Systems - II - Express Briefs, ISSN 1549-7747, E-ISSN 1558-3791, Vol. 66, no. 4, pp. 657-661. Article in journal (Refereed)
    Abstract [en]

    This brief presents novel circuits for calculating the bit reversal on parallel data. The circuits consist of delays/memories and multiplexers, and have the advantage that they require the minimum number of multiplexers among circuits for parallel bit reversal so far, as well as a small total memory.
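
    For readers unfamiliar with the permutation itself, the following is a minimal software sketch of bit reversal. It only illustrates which index each sample moves to; it does not model the delay and multiplexer circuits of the paper. The function names are hypothetical, and a power-of-two length is assumed.

```python
def bit_reverse(index: int, num_bits: int) -> int:
    """Return index with its num_bits least significant bits in reverse order."""
    result = 0
    for _ in range(num_bits):
        result = (result << 1) | (index & 1)  # move the lowest bit of index into result
        index >>= 1
    return result


def bit_reversal_order(n: int) -> list:
    """Return the bit-reversed index for every position 0..n-1 (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    num_bits = n.bit_length() - 1
    return [bit_reverse(i, num_bits) for i in range(n)]


if __name__ == "__main__":
    print(bit_reversal_order(8))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

    Applying the permutation twice returns the original order, which is a quick sanity check for any implementation.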

  • 93.
    Garrido, Mario
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Acevedo, Miguel
    Linköpings universitet, Institutionen för systemteknik. Linköpings universitet, Tekniska fakulteten.
    Ehliar, Andreas
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Challenging the Limits of FFT Performance on FPGAs. 2014. Conference paper (Refereed)
    Abstract [en]

    This paper analyzes the limits of FFT performance on FPGAs. For this purpose, an FFT generation tool has been developed. This tool is highly parameterizable and allows for generating FFTs with different FFT sizes and amounts of parallelization. Experimental results for FFT sizes from 16 to 65536, and 4 to 64 parallel samples have been obtained. They show that even the largest FFT architectures fit well in today's FPGAs, achieving throughput rates from several GSamples/s to tens of GSamples/s.

  • 94.
    Garrido, Mario
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Grajal, Jesus
    Univ Politecn Madrid, Spain.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Optimum Circuits for Bit-Dimension Permutations. 2019. In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 27, no. 5, pp. 1148-1160. Article in journal (Refereed)
    Abstract [en]

    In this paper, we present a systematic approach to design hardware circuits for bit-dimension permutations. The proposed approach is based on decomposing any bit-dimension permutation into elementary bit-exchanges. Such decomposition is proven to achieve the theoretical minimum number of delays required for the permutation. This offers optimum solutions for multiple well-known problems in the literature that make use of bit-dimension permutations. This includes the design of permutation circuits for the fast Fourier transform, bit reversal, matrix transposition, stride permutations, and Viterbi decoders.

  • 95.
    Garrido, Mario
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Möller, K.
    University of Kassel, Kassel, Germany.
    Kumm, M.
    University of Kassel, Kassel, Germany.
    World’s Fastest FFT Architectures: Breaking the Barrier of 100 GS/s. 2019. In: IEEE Transactions on Circuits and Systems Part 1: Regular Papers, ISSN 1549-8328, E-ISSN 1558-0806, Vol. 66, no. 4, pp. 1507-1516. Article in journal (Refereed)
    Abstract [en]

    This paper presents the fastest fast Fourier transform (FFT) hardware architectures so far. The architectures are based on a fully parallel implementation of the FFT algorithm. In order to obtain the highest throughput while keeping the resource utilization low, we base our design on making use of advanced shift-and-add techniques to implement the rotators and on selecting the most suitable FFT algorithms for these architectures. Apart from high throughput and resource efficiency, we also guarantee high accuracy in the proposed architectures. For the implementation, we have developed an automatic tool that generates the architectures as a function of the FFT size, input word length and accuracy of the rotations. We provide experimental results covering various FFT sizes, FFT algorithms, and field-programmable gate array boards. These results show that it is possible to break the barrier of 100 GS/s for FFT calculation.

  • 96.
    Ge, Hanxiao
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Investigation of LDPC code in DVB-S2. 2012. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    As one of the most powerful classes of error-correcting codes, low-density parity-check (LDPC) codes are widely used in digital communications. Because their performance comes extraordinarily close to the Shannon limit, LDPC codes are used in the new Digital Video Broadcasting - Satellite - Second Generation (DVB-S2) standard; this was the first time, in 2003, that LDPC codes were included in a broadcast standard.

    In this thesis, a restructured parity-check matrix, which can be divided into sub-matrices, is provided for the LDPC code in DVB-S2. Corresponding to this restructured parity-check matrix, a reconstructed decoding table is devised. The encoding table of the DVB-S2 standard can only obtain the unknown check nodes from known variable nodes, while the decoding table provided in this thesis can obtain the unknown variable nodes from known check nodes, which is exactly what the layered message-passing algorithm needs. The layered message-passing algorithm, also known as "turbo-decoding message passing", is used to reduce the number of decoding iterations and the memory storage for messages. The thesis also investigates the BP algorithm, the lambda-min algorithm, the min-sum algorithm and the SISO-s algorithm; simulation results for these algorithms and schedules are also presented.

  • 97.
    Gebrewahid, Essayas
    et al.
    Högskolan i Halmstad, Akademin för informationsteknologi, Halmstad Embedded and Intelligent Systems Research (EIS), Centrum för forskning om inbyggda system (CERES), Sweden.
    Ali Arslan, Mehmet
    Lund University, Sweden.
    Karlsson, Andréas
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Ul-Abdin, Zain
    Högskolan i Halmstad, Akademin för informationsteknologi, Halmstad Embedded and Intelligent Systems Research (EIS), Centrum för forskning om inbyggda system (CERES), Sweden.
    Support for Data Parallelism in the CAL Actor Language. 2016. In: Proceedings of the 2016 3rd Workshop on Programming Models for SIMD/Vector Processing (WPMVP 2016), New York, NY: Association for Computing Machinery (ACM), 2016, pp. 1-8. Conference paper (Refereed)
    Abstract [en]

    With the arrival of heterogeneous manycores comprising various features to support task, data and instruction-level parallelism, developing applications that take full advantage of the hardware's parallel features has become a major challenge. In this paper, we present an extension to our CAL compilation framework (CAL2Many) that supports data parallelism in the CAL Actor Language. Our compilation framework makes it possible to program architectures with SIMD support using a high-level language and provides efficient code generation. We support general SIMD instructions, but the code generation backend is currently implemented for two custom architectures, namely ePUMA and EIT. Our experiments were carried out for two custom SIMD processor architectures using two applications.

    The experiments show the possibility of achieving performance comparable to hand-written machine code with much less programming effort.

  • 98.
    Gonzalez, Maya
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Design and Implementation of a SATA Host Controller on a Spartan-6 FPGA. 2012. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    At Saab Dynamics AB there are a number of projects where cameras are an important part of a sensor system. Examples of such projects are monitoring for civil security and 3D mapping, where several cameras are used. The cameras can for example be located in airplanes, helicopters or cars and therefore it is important to have a robust function for recording data. One way to achieve a quick recording with sufficient storage size is to use SATA flash disks. To reduce the size and power consumption of the recording equipment and to enable project-specific adaptations it is desirable to use an FPGA as an interface to SATA devices. This thesis concerns the development of such an interface implemented on an FPGA. The theory behind the SATA interconnect standard is described along with the design work and its challenges.

  • 99.
    Gu, Haohao
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Zhang, He
    Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Implementation of CMMB System using Software Defined Radio (SDR) Platform. 2010. Independent thesis Advanced level (degree of Master (Two Years)), 30 credits / 45 HE credits. Student thesis
    Abstract [en]

    CMMB (China Multimedia Mobile Broadcasting) is a wireless broadcasting channel standard for low-bandwidth, low-cost hand-held digital TV. It is adopted by all continental Chinese government TV broadcasting companies and some Hong Kong private TV broadcasting companies. The business potential is high, yet the future is hard to predict because it might be replaced by GB200600 or DTMB. The digital modulation is based on OFDM, with pilots supporting channel estimation and equalization and a CP addressing multi-path induced ISI problems.

    This thesis investigates the implementation of a CMMB system using an SDR platform. A simulation chain, including the CMMB transmitter and receiver, was implemented in MATLAB with full data precision. The transmitter behavior model includes the RS encoder, LDPC encoder, OFDM modulation, etc. The receiver behavior model includes OFDM demodulation, channel estimation, channel equalization, the LDPC decoder, the RS decoder, etc. Different channel models emulating path loss, white noise, multi-path, and glitches were modeled. Based on the simulation chain and channel models, T-domain and F-domain channel estimators and equalizers were implemented and optimized. Optimized TD-FD models for different mobility scenarios were proposed. The focus of the thesis is on 2D (FD-TD) channel estimation and equalization.

  • 100.
    Gunnarsson, Svante
    et al.
    Linköpings universitet, Institutionen för systemteknik, Reglerteknik. Linköpings universitet, Tekniska högskolan.
    Wiklund, Ingela
    Linköpings universitet, Tekniska högskolan.
    Svensson, Tomas
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Kindgren, Annalena
    Linköpings universitet, Tekniska högskolan.
    Granath, Sten
    Linköpings universitet, Tekniska högskolan.
    Large Scale use of the CDIO Syllabus in Formulation of Program and Course Goals. 2007. In: Proceedings of the 3rd International CDIO Conference, 2007. Conference paper (Refereed)
    Abstract [en]

    A large-scale application of the CDIO Syllabus in the formulation of course and program goals is presented. The application involves all programs and courses within the engineering education at Linköping University. Key components in the work are course-level ITU-matrices for mapping the course contents to the CDIO Syllabus, and a suggested way to organize suitable verbs for the formulation of learning outcomes according to the sections in the CDIO Syllabus.
