liu.seSearch for publications in DiVA
Change search
Refine search result
123 1 - 50 of 136
CiteExportLink to result list
Permanent link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • oxford
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Rows per page
  • 5
  • 10
  • 20
  • 50
  • 100
  • 250
Sort
  • Standard (Relevance)
  • Author A-Ö
  • Author Ö-A
  • Title A-Ö
  • Title Ö-A
  • Publication type A-Ö
  • Publication type Ö-A
  • Issued (Oldest first)
  • Issued (Newest first)
  • Created (Oldest first)
  • Created (Newest first)
  • Last updated (Oldest first)
  • Last updated (Newest first)
  • Disputation date (earliest first)
  • Disputation date (latest first)
  • Standard (Relevance)
  • Author A-Ö
  • Author Ö-A
  • Title A-Ö
  • Title Ö-A
  • Publication type A-Ö
  • Publication type Ö-A
  • Issued (Oldest first)
  • Issued (Newest first)
  • Created (Oldest first)
  • Created (Newest first)
  • Last updated (Oldest first)
  • Last updated (Newest first)
  • Disputation date (earliest first)
  • Disputation date (latest first)
Select
The maximal number of hits you can export is 250. When you want to export more records please use the Create feeds function.
  • 1.
    Asghar, Rizwan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    2-D Realization of WiMAX Channel Interleaver for Efficient Hardware Implementation2009In: Proceedings of World Academy of Science, Engineering and Technology (ISSN: 2070-3740), 2009, p. 25-29Conference paper (Refereed)
    Abstract [en]

    The direct implementation of interleaver functions in WiMAX is not hardware efficient due to presence of complex functions. Also the conventional method i.e. using memories for storing the permutation tables is silicon consuming. This work presents a 2-D transformation for WiMAX channel interleaver functions which reduces the overall hardware complexity to compute the interleaver addresses on the fly.  A fully re-configurable architecture for address generation in WiMAX channel interleaver is presented, which consume 1.1 k-gates in total. It can be configured for any block size and any modulation scheme in WiMAX. The presented architecture can run at a frequency of 200 MHz, thus fully supporting high bandwidth requirements for WiMAX.

  • 2.
    Asghar, Rizwan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Dual standard re-configurable hardware interleaver for turbo decoding2008Conference paper (Refereed)
    Abstract [en]

    A very low cost re-configurable hardwareinterleaver for two standards, 3GPP-WCMDA and 3GPPLong Term Evolution (3GPP-LTE) is presented. Theinterleaver is a key component of radio communicationsystems. Using conventional design methods, it consumes alarge part of silicon area in the design of turbo encoder anddecoder. The presented hardware interleaver addressgeneration architecture, utilizes the algorithmic levelhardware simplifications to achieve very low cost solution.After doing the hardware optimizations the proposedarchitecture consumes only 3.1k gates with a 256x8 bitmemory for the fully re-configurable dual standardinterleaver address generator. The interleaved address iscomputed every clock cycle except the case of pruning (ifblock size is less than the row-column matrix) in 3GPPWCDMA.In this case one additional clock cycle is consumedfor valid address generation.

  • 3.
    Asghar, Rizwan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Low Complexity Hardware Interleaver for MIMO-OFDM based Wireless LAN2009In: Proceedings - IEEE International Symposium on Circuits and Systems, 2009, p. 1747-1750Conference paper (Refereed)
    Abstract [en]

    A low complexity hardware interleaver architecture is presented for MIMO-OFDM based Wireless LAN e.g. 802.11n. Novelty of the presented architecture is twofold; 1) Flexibility to choose interleaver implementation with different modulation scheme and different size for different spatial streams in a multi antenna system, 2) Complexity to compute on the fly interleaver address is reduce by using recursion and is supported by mathematical formulation. The proposed interleaver architecture is implemented on 65nm CMOS process and it consumes 0.035 mm2 area. The proposed architecture supports high speed communication with maximum throughput of 900 Mbps at a clock rate of 225 MHz.

  • 4.
    Asghar, Rizwan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Low Complexity Multi Mode Interleaver Core for WiMax with Support for Convolutional Interleaving2009In: International Journal of Electronics, Communications and Computer Engineering, Vol. 1, no 1, p. 20-29Article in journal (Refereed)
    Abstract [en]

    A hardware efficient, multi mode, re-configurable architecture of interleaver/de-interleaver for multiple standards, like DVB, WiMAX and WLAN is presented. The interleavers consume a large part of silicon area when implemented by using conventional methods as they use memories to store permutation patterns. In addition, different types of interleavers in different standards cannot share the hardware due to different construction methodologies. The novelty of the work presented in this paper is threefold: 1) Mapping of vital types of interleavers including convolutional interleaver onto a single architecture with flexibility to change interleaver size; 2) Hardware complexity for channel interleaving in WiMAX is reduced by using 2-D realization of the interleaver functions; and 3) Silicon cost overheads reduced by avoiding the use of small memories. The proposed architecture consumes 0.18mm2 silicon area for 0.12μm process and can operate at a frequency of 140 MHz. The reduced complexity helps in minimizing the memory utilization, and at the same time provides strong support to on-the-fly computation of permutation patterns.

  • 5.
    Asghar, Rizwan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Multimode flex-interleaver core for baseband processor platform2010In: Journal of Computer Systems, Networks and Communications, ISSN 1687-7381, Vol. 2010, p. 1-16Article in journal (Refereed)
    Abstract [en]

    This paper presents a flexible interleaver architecture supportingmultiple standards likeWLAN,WiMAX, HSPA+, 3GPP-LTE, and DVB. Algorithmic level optimizations like 2D transformation and realization of recursive computation are applied, which appear to be the key to reach to an efficient hardware multiplexing among different interleaver implementations. The presented hardware enables the mapping of vital types of interleavers including multiple block interleavers and convolutional interleaver onto a single architecture. By exploiting the hardware reuse methodology the silicon cost is reduced, and it consumes 0.126mm2 area in total in 65nm CMOS process for a fully reconfigurable architecture. It can operate at a frequency of 166 MHz, providing a maximum throughput up to 664 Mbps for a multistream system and 166 Mbps for single stream communication systems, respectively. One of the vital requirements for multimode operation is the fast switching between different standards, which is supported by this hardware with minimal cycle cost overheads. Maximum flexibility and fast switchability among multiple standards during run time makes the proposed architecture a right choice for the radio baseband processing platform.

  • 6.
    Asghar, Rizwan
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Programmable Parallel Data-path for FEC2007In: Swedish System-on-Chip Conference, SSoCC,2007, 2007Conference paper (Other academic)
  • 7.
    Asghar, Rizwan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Towards Radix-4, Parallel Interleaver Design to Support High-Throughput Turbo Decoding for Re-Configurability2010Conference paper (Refereed)
    Abstract [en]

    Parallel, radix-4 turbo decoding is used to enhance the throughput and at the same time reduce the overall memory cost. The bottleneck is the higher complexity associated with radix-4 parallel interleaver implementation. This paper addresses the implementation issues of radix-4, parallel interleaver and also proposes necessary modifications in the interleaver algorithms for parallel address generation. It presents a re-configurable architecture which enables the use of same turbo decoding core to be used for multiple standards. The proposed interleaver architecture is capable of handling the memory conflicts on-the-fly. It consumes 12.5K gates and can run at a frequency of 285MHz, thus supporting a throughput of 173.3Mpbs, which can cover most of the emerging communication standards.

  • 8.
    Asghar, Rizwan
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Very Low Cost Configurable Hardware Interleaver for 3G Turbo Decoding2008In: IEEE Internation Conference on Information and Communication Tech from Theory to Applications, ICTTA,2008, IEEE , 2008, p. 2314-2318Conference paper (Refereed)
    Abstract [en]

    A very low cost hardware interleaver for 3rd Generation Partnership Project (3GPP) turbo coding algorithm is presented. The interleaver is a key component of turbo codes and it is used to minimize the effect of burst errors in the transmission. Using conventional design methods, it consumes a large part of silicon area in the design of turbo encoder and decoder. The presented hardware interleaver architecture utilizes the algorithmic level hardware simplifications as well as the iterative modulo computation to achieve very low cost solution. After doing the hardware multiplexing and optimization the proposed architecture consumes only 1.5 k gates (without pre-computation) and 2.2 k gates (with pre-computation). In both cases the interleaved address is computed every clock cycle except the case of pruning, in which one additional clock cycle is consumed.

  • 9.
    Asghar, Rizwan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Wu, Di
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Eilert, Johan
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Memory Conflict Analysis and Implementation of a Re-configurable Interleaver Architecture Supporting Unified Parallel Turbo Decoding2010In: Journal of Signal Processing Systems for Signal, Image, and Video Technology, ISSN 1939-8018, Vol. 60, no 1, p. 15-29Article in journal (Refereed)
    Abstract [en]

    This paper presents a novel hardware interleaver architecture for unified parallel turbo decoding. The architecture is fully re-configurable among multiple standards like HSPA Evolution, DVB-SH, 3GPP-LTE and WiMAX. Turbo codes being widely used for error correction in today’s consumer electronics are prone to introduce higher latency due to bigger block sizes and multiple iterations. Many parallel turbo decoding architectures have recently been proposed to enhance the channel throughput but the interleaving algorithms used indifferent standards do not freely allow using them due to higher percentage of memory conflicts. The architecture presented in this paper provides a re-configurable platform for implementing the parallel interleavers for different standards by managing the conflicts involved in each. The memory conflicts are managed by applying different approaches like stream misalignment, memory division and use of small FIFO buffer. The proposed flexible architecture is low cost and consumes 0.085 mm2 area in 65nm CMOS process. It can implement up to 8 parallel interleavers and can operate at a frequency of 200 MHz, thus providing significant support to higher throughput systems based on parallel SISO processors.

  • 10.
    Asghar, Rizwan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Wu, Di
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Eilert, Johan
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Memory Conflict Analysis and Interleaver Design for Parallel Turbo Decoding Supporting HSPA Evolution2009In: 12th EUROMICRO Conference on Digital System Design, 2009, p. 699-706Conference paper (Refereed)
    Abstract [en]

    HSPA evolution has raised the throughput requirements for WCDMA based systems where turbo code has been adapted to perform the error correction. Many parallel turbo decoding architectures have recently been proposed to enhance the channel throughput but the interleaving algorithm used in WCDMA based systems does not freely allows to use them due to high percentage of memory conflicts. This paper provides a comprehensive analysis for reduction of interleaver memory conflicts while generating more than one address in a single clock cycle. It also provides trade-off analysis in terms of area and power efficiency for multiple architectures for different functions involved in the interleaver design. The final architecture supports processing of two parallel SISO blocks and manages the conflicts by applying different approaches like stream misalignment, memory division and small FIFO buffer. The proposed architecture is low cost and consumes 4.3K gates at a frequency of 150MHz. This work also focuses on reduction of pre-processing overheads by introducing the segment based modulo computation, thus providing further relaxation to SISO decoding process.

  • 11.
    Asghar, Rizwan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Wu, Di
    Linköping University, Department of Electrical Engineering. Linköping University, The Institute of Technology.
    Saeed, Ali
    Linköping University, Department of Electrical Engineering. Linköping University, The Institute of Technology.
    Huang, Yulin
    Linköping University, Department of Electrical Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Implementation of a Radix-4, Parallel Turbo Decoder and Enabling the Multi-Standard Support2012In: Journal of Signal Processing Systems, ISSN 1939-8018, E-ISSN 1939-8115, Vol. 66, no 1, p. 25-41Article in journal (Refereed)
    Abstract [en]

    This paper presents a unified, radix-4 implementation of turbo decoder, covering multiple standards such as DVB, WiMAX, 3GPP-LTE and HSPA Evolution. The radix-4, parallel interleaver is the bottleneck while using the same turbo-decoding architecture for multiple standards. This paper covers the issues associated with design of radix-4 parallel interleaver to reach to flexible turbo-decoder architecture. Radix-4, parallel interleaver algorithms and their mapping on to hardware architecture is presented for multi-mode operations. The overheads associated with hardware multiplexing are found to be least significant. Other than flexibility for the turbo decoder implementation, the low silicon cost and low power aspects are also addressed by optimizing the storage scheme for branch metrics and extrinsic information. The proposed unified architecture for radix-4 turbo decoding consumes 0.65 mm(2) area in total in 65 nm CMOS process. With 4 SISO blocks used in parallel and 6 iterations, it can achieve a throughput up to 173.3 Mbps while consuming 570 mW power in total. It provides a good trade-off between silicon cost, power consumption and throughput with silicon efficiency of 0.005 mm(2)/Mbps and energy efficiency of 0.55 nJ/b/iter.

  • 12.
    Di, Wu
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Eilert, Johan
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Nilsson, Anders
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Tell, Eric
    Coresonic AB, Linköping.
    Alfredsson, Erik
    Coresonic AB, Linköping.
    System Architecture for 3GPP LTE Modem Using a Programmable Baseband Processo2009In: International Symposium on System-on-Chip (SoC 2009), 2009Conference paper (Refereed)
    Abstract [en]

    3G evolution towards HSPA and LTE is ongoing which will substantially increase the throughput with higher spectral efficiency. This paper presents the system architecture of an LTE modem based on a programmable baseband processor. The architecture includes a baseband processor that handles processing such as time and frequency synchronization, IFFT/FFT (up to 2048-p), channel estimation and subcarrier demapping. The throughput and latency requirements of a Category 4 User Equipment (CAT4 UE) is met by adding a MIMO symbol detector and a parallel Turbo decoder supporting H-ARQ. This brings both low silicon cost and enough flexibility to support other wireless standards. The complexity demonstrated by the modem shows the practicality and advantage of using programmable baseband processors for a single-chip LTE solution.

  • 13.
    Ehliar, Andreas
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Karlström, Per
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    A High Performance Microprocessor with DSP Extensions Optimized for the Virtex-4 FPGA2008In: International Conference on Field Programmable Logic and Applications FLP 2008, Heidelberg, Germany, 2008, 2008, p. 599-602Conference paper (Refereed)
    Abstract [en]

    As the use of FPGAs increases, the importance of highly optimized processors for FPGAs will increase. In this paper we present the microarchitecture of a soft microprocessor core optimized for the Virtex-4 architecture. The core can operate at 357 MHz, which is significantly faster than Xilinxpsila Microblaze architecture on the same FPGA. At this frequency it is necessary to keep the logic complexity down and this paper shows how this can be done while retaining sufficient functionality for a high performance processor.

  • 14.
    Ehliar, Andreas
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Eilert, Johan
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    A Comparison of Three FPGA Optimized NoC Architectures2007In: Swedish System-on-Chip Conference, SSoCC,2007, 2007Conference paper (Other academic)
  • 15.
    Ehliar, Andreas
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    A Network on Chip based gigabit Ethernet router implemented on an FPGA2006In: SSoCC Swedish System-on-Chip Conference,2006, 2006Conference paper (Other academic)
  • 16.
    Ehliar, Andreas
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    An Asic Perspective on FPGA Optimizations2009In: 19th International Conference on Field Programmable Logic and Applications (FPL), 2009, p. 218-223Conference paper (Refereed)
    Abstract [en]

    In this paper we discuss how various design components perform in both FPGAs and standard cell based ASICs. We also investigate how various common FPGA optimizations will effect the performance and area of an ASIC port. We find that most techniques that are used to optimize a design for an FPGA will not have a negative impact on the area in an ASIC. The intended audience for this paper are engineers charged with creating designs or IP cores that are optimized for both FPGAs and ASICs.

  • 17.
    Ehliar, Andreas
    et al.
    Linköping University, Department of Electrical Engineering.
    Liu, Dake
    Linköping University, Department of Electrical Engineering.
    An ASIC Perspective on High Performance FPGA Design2009Conference paper (Refereed)
    Abstract [en]

    In this paper we discuss how various design components perform in both FPGAs and standard cell based ASICs. We also investigate how various common FPGA optimizations will effect the performance and area of an ASIC port. We find that most techniques that are used to optimize a design for an FPGA will not have a negative impact on the area in an ASIC. The intended audience for this paper are engineers charged with creating designs or IP cores that are optimized for both FPGAs and ASICs.

  • 18.
    Ehliar, Andreas
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    An FPGA based Open Source Network-on-chip Architecture2007In: 17th International Conference on Fileld Programmable Logic and Applications, FPL, Amsterdam, Holland, 2007, IEEE , 2007, p. 800-803Conference paper (Refereed)
    Abstract [en]

    Networks on chip (NoC) has long been seen as a potential solution to the problems encountered when implementing large digital hardware designs. In this paper we describe an open source FPGA based NoC architecture with low area overhead, high throughput and low latency compared to other published works. The architecture has been optimized for Xilinx FPGAs and the NoC is capable of operating at a frequency of 260 MHz in a Virtex-4 FPGA. We have also developed a bridge so that generic Wishbone bus compatible IP blocks can be connected to the NoC.

  • 19.
    Ehliar, Andreas
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Benchmarking network processors2004In: Swedish System-on-Chip Conference,2004, 2004Conference paper (Other academic)
  • 20.
    Ehliar, Andreas
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Flexible Route Lookup Using Range Search2005In: The Third IASTED International Conference on Communications and Computer Networks,2005, 2005Conference paper (Refereed)
  • 21.
    Ehliar, Andreas
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Thinking outside the flow: Creating customized backend tools for Xilinx based designs2007In: 4th annual FPGAworld Conference, Stockholm, 2007, 2007Conference paper (Refereed)
    Abstract [en]

    This paper is intended to serve as an introduction to how to build a customized backend tool for a Xilinx based design flow. A Python based library called PyXDL is presented which allows a user to manipulate XDL files which contain a placed and routed design. Three different tools are presented which uses this library, ranging from a simple resource utilization viewer to a tool which will insert a logic analyzer into an already routed design, thus avoiding a costly complete rerun of the place and route tool.

  • 22.
    Ehliar, Andreas
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Wiklund, Daniel
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Feasibility study of a core router based on a network on chip2005In: Swedish System on Chip Conference SSoCC,2005, 2005Conference paper (Other academic)
    Abstract [en]

    In this paper we investigate the feasibility of creating a core router based upon a network on chip. The investigated architecture uses 16x10-Gbit Ethernet ports. The purpose of this is to show that it is possible to create such a solution considering current process technologies. This is done through an analysis of the required chip area, clock frequencies, and pin count. The results show that such a solution is feasible and can be implemented as a single chip.

  • 23.
    Eilert, Johan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Ehliar, Andreas
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Using low precision floating point numbers to reduce memory cost for MP3 decoding2004In: International Workshop on Multimedia Signal Processing, IEEE Xplore , 2004, p. 119-122Conference paper (Refereed)
    Abstract [en]

    The purpose of our work has been to evaluate the practicality of using a 16-bit floating point representation to store the intermediate sample values and other data in memory during the decoding of MP3 bit streams. A floating point number representation offers a better trade-off between dynamic range and precision than a fixed point representation for a given word length. Using a floating point representation means that smaller memories can be used which leads to smaller chip area and lower power consumption without reducing sound quality. We have designed and implemented a DSP processor based on 16-bit floating point intermediate storage. The DSP processor is capable of decoding all MP3 bit streams at 20 MHz and this has been demonstrated on an FPGA prototype.

  • 24.
    Eilert, Johan
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Early Exploratioin of MIPS Cost and Memory Cost Trade-off for Media DSP Media Processor2006In: SSoCC Swedish System-on-Chip Conference,2006, 2006Conference paper (Other academic)
  • 25.
    Eilert, Johan
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Wu, Di
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Efficient Complex Matrix Inversion for MIMO Software Defined Radio2007In: International Symposium on Circuits and Systems, ISCAS,2007, IEEE , 2007, p. 2610-2613Conference paper (Refereed)
    Abstract [en]

    Complex matrix inversion is a very computationally demanding operation in advanced multi-antenna wireless communications. Traditionally, systolic array-based QR decomposition (QRD) is used to invert large matrices. However, the matrices involved in MIMO baseband processing in mobile handsets are generally small which means QRD is not necessarily efficient. In this paper, a new method is proposed using programmable hardware units which not only achieves higher performance but also consumes less silicon area. Furthermore, the hardware can be reused for many other operations such as complex matrix multiplication, filtering, correlation and FFT/IFFT.

  • 26.
    Eilert, Johan
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Wu, Di
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Implementation of a Programmable Linear MMSE Detector for MIMO-OFDM2008In: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP,2008, IEEE , 2008, p. 5396-5399Conference paper (Refereed)
    Abstract [en]

    This paper presents a linear minimum mean square error (LMMSE) symbol detector for MIMO-OFDM enabled mobile terminals. The detector is implemented using a programmable baseband processor aimed for software-defined radio (SDR). Owing to the dynamic range supplied by the floating-point SIMD datapath, special algorithms can be adopted to reduce the computational latency of detection. The programmable solution not only supports different transmit/receive antenna configurations, but also allows hardware multiplexing to obtain silicon and power efficiency. Compared to several existing fixed-functional solutions, the one proposed in this paper is smaller, more flexible and faster.

  • 27.
    Eilert, Johan
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Wu, Di
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Real-Time Alamouti STBC Decoding on A Programmable Baseband Processor2008Conference paper (Refereed)
    Abstract [en]

    This paper presents a space-time block coding decoder for MIMO-OFDM enabled mobile terminals. The decoder is implemented using a programmable baseband processor aimed for software-defined radio (SDR). The dynamic range supplied by the floating-point SIMD datapath allows special algorithms to significantly reduce the computational latency of decoding. The programmable solution not only supports different transmit/receive antenna configuration, but also allows hardware multiplexing to obtain silicon and power efficiency. Compared to several existing fixed-functional ASIC solutions in literature, the one proposed in this paper is by far the smallest, fastest and with more flexibility.

  • 28.
    Eilert, Johan
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Wu, Di
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Wang, Dandan
    Al-Dhahir, Naofal
    Minn, Hlaing
    Complexity Reduction of Matrix Manipulation for Multi-User STBC-MIMO Decoding2007In: IEEE Sarnoff Symmposium,2007, 2007, p. 1-5Conference paper (Refereed)
    Abstract [en]

    This paper studies efficient complex valued matrix manipulations for multi-user STBC-MIMO decoding. A novel method called Alamouti blockwise analytical matrix inversion (ABAMI) is proposed for the inversion of large complex matrices that are based on Alamouti sub-blocks. Another method using a variant of Givens rotation is proposed for fast QR decomposition of this kind of matrices. Our solutions significantly reduce the number of operations which makes them more than 4 times faster than several other solutions in the literature. Furthermore, compared to fixed function VLSI implementations, our solution is more flexible and consumes less silicon area because the hardware is programmable and it can be reused for many other operations such as filtering, correlation and FFT/IFFT. Besides the analysis of the general computational complexity based on the number of basic operations, the computational latency is also measured in clock cycles based on the conceptual hardware for real-time matrix manipulations.

  • 29.
    Flordal, Oskar
    et al.
    Axis Communications AB .
    Flordal, Oskar
    Axis Communications AB .
    Wu, Di
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Wu, Di
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Accelerating CABAC Encoding for Multi-standard Media with Configurability2006In: IEEE IPDPS,2006, 2006Conference paper (Refereed)
  • 30.
    Haque, Muhammad Fahim Ul
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Johansson, Ted
    Linköping University, Department of Electrical Engineering, Integrated Circuits and Systems. Linköping University, Faculty of Science & Engineering.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Combined RF and Multiphase PWM Transmitter2015In: 2015 European Conference on Circuit Theory and Design (ECCTD), IEEE , 2015, p. 264-267Conference paper (Refereed)
    Abstract [en]

    This paper presents two novel transmitter architectures based on the combination of radio-frequency pulse-width modulation and multiphase pulse-width modulation. The proposed transmitter architectures provide good amplitude resolution and large dynamic range at high carrier frequency, which is problematic with existing radio-frequency pulse-width modulation based transmitters. They also have better power efficiency and smaller chip area compared to multiphase pulse-width modulation based transmitters.

  • 31.
    Haque, Muhammad Fahim Ul
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Johansson, Ted
    Linköping University, Department of Electrical Engineering, Integrated Circuits and Systems. Linköping University, Faculty of Science & Engineering.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Combined RF and Multiphase PWM Transmitter2015Conference paper (Other academic)
  • 32.
    Haque, Muhammad Fahim Ul
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Johansson, Ted
    Linköping University, Department of Electrical Engineering, Integrated Circuits and Systems. Linköping University, Faculty of Science & Engineering.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Modified Band-limited Pulse-Width Modulated Polar Transmitter2015Conference paper (Other academic)
  • 33.
    Haque, Muhammad Fahim Ul
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Johansson, Ted
    Linköping University, Department of Electrical Engineering, Integrated Circuits and Systems. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Modified Multilevel PWM Switch Mode Power Amplifier2014Conference paper (Other academic)
  • 34.
    Henriksson, Tomas
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Eriksson, Henrik
    Linköping University, Department of Electrical Engineering, Electronic Devices. Linköping University, The Institute of Technology.
    Nordqvist, Ulf
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Larsson-Edefors, Per
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    VLSI Implementation of CRC-32 for 10 Gigabit Ethernet2001In: The 8th IEEE International Conference on Electronics, Circuits and Systems, 2001: ICECS 2001, 2001, p. 1215-1218Conference paper (Refereed)
    Abstract [en]

    For 10 Gigabit Ethernet a CRC-32 generation is essential and timing critical. Many efficient software algorithms have been proposed for CRC generation. In this work we use an algorithm based on the properties of Galois fields, which gives very efficient hardware. The CRC generator has been implemented and simulated in both standard cells and a full-custom design technique. In standard cells from the UMC 0.18 micron library a throughput of 8.7 Gb/s has been achieved. In the full-custom design for AMS 0.35 micron process we have achieved a throughput of 5.0 Gb/s. The conclusion, based on extrapolation of device characteristics, is that CRC-32 generation for 10 Gb/s can be designed with standard cells in a 0.15 micron process technology, or using full-custom design techniques in a 0.18 micron process technology

  • 35.
    Henriksson, Tomas
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Implementation of Fast CRC Calculation2003In: Asia South Pacific Design Automation Conference,2003, 2003, p. 563-Conference paper (Refereed)
  • 36.
    Henriksson, Tomas
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Novel ASIP and Processor Architecture for Packet Decoding2002In: Workshop of Application Specific Processors Digest,2002, 2002, p. 25-Conference paper (Refereed)
  • 37.
    Henriksson, Tomas
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Nordqvist, Ulf
    Linköping University, Department of Electrical Engineering, Electronic Devices. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Configurable Port Processor Increases Flexibility in the Protocol Processing Area2000In: Proceedings of COOLChips III An International Symposium on Low-Power and High-Speed Chips, 2000, p. 275-Conference paper (Other academic)
    Abstract [en]

    The limitation in networking is no longer only the physical transmission media but also the end equipment, which has to process the protocol control fields. In most end terminals this processing has been performed by the main processor, but different types of co-processor have lately appeared to relieve it from this task. These co-processors have high power consumption since they are based on a RISC core. Instead ASIC:s can be used, but they lack flexibility and are specific for only one single protocol. It is clear that a new approach is needed.

    (...)

  • 38.
    Henriksson, Tomas
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Nordqvist, Ulf
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Embedded Protocol Processor for Fast and Efficient Packet Reception2002In: International Conference on Computer Design,2002, 2002, p. 414-Conference paper (Refereed)
  • 39.
    Henriksson, Tomas
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Nordqvist, Ulf
    Linköping University, Department of Electrical Engineering, Electronic Devices. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Specification of a configurable general-purpose protocol processor2000In: Proceedings of Second International Symposium on Communication Systems, Networks and Digital Signal Processing, 2000, p. 284-289Conference paper (Other academic)
    Abstract [en]

    A general-purpose protocol processor is specified with a dedicated architecture for protocol processing. This paper defines a functional coverage, analyses the control requirements, specifies functional pages and a controller unit. The general-purpose protocol processor is aimed for network terminals, therefore routing is not completely supported. However it should be possible to use it as part of a router with some minor modifications. The general-purpose protocol processor is partitioned into two parts, a configurable stand alone part and a program based microcontroller. The configurable part performs the protocol processing without any running program. The processor does not execute any cycle based program, instead execution is controlled by configuration vectors and control vectors. The microcontroller assists with the interface to the host processor and handles the configuration. It is concluded that by partitioning the control into three levels, the architecture is flexible and verification is simplified. The proposed architecture also has higher performance and lower power dissipation than other solutions.

  • 40.
    Henriksson, Tomas
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Nordqvist, Ulf
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Specification of a configurable general-purpose protocol processor2002In: IEE Proceedings - Circuits Devices and Systems, ISSN 1350-2409, E-ISSN 1359-7000, Vol. 149, no 3, p. 198-202Article in journal (Refereed)
    Abstract [en]

    A general-purpose protocol processor is specified with a dedicated architecture for protocol processing. The paper defines a functional coverage, analyses the control requirements, and specifies functional pages and a controller unit. The general-purpose protocol processor is for network terminals, and therefore routing is not completely supported. However, it should be possible to use it as part of a router. with some minor modifications. The general-purpose protocol processor is partitioned into two parts: a configurable stand-alone part and a program based microcontroller. The configurable part performs the protocol processing without any running program. The processor does not execute any cycle based program; instead execution is controlled by configuration vectors and control vectors. The microcontroller assists with the interface to the host processor and handles the configuration. It is concluded that by partitioning the control into three levels, the architecture is flexible and verification is simplified. The proposed architecture also has higher performance and lower power dissipation than other solutions

  • 41.
    Henriksson, Tomas
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Persson, Niclas
    Linköping University, Department of Electrical Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    VLSI Implementation of Internet Checksum Calculation for 10 gigabit Ethernet2002In: Design and Diagnostics of Electronics, Circuits and Systems,2002, 2002, p. 114-Conference paper (Refereed)
  • 42.
    Henriksson, Tomas
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Wiklund, Daniel
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    VLSI implementation of a switch for on-chip networks2003In: Int workshop on Design and diagnostics of electronic circuits and systems DDECS,2003, 2003Conference paper (Refereed)
  • 43. Jiao, Haiyan
    et al.
    Nilsson, Anders
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Tell, Eric
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    MIPS Cost Estimation for OFDM-VBLAST systems2006In: WCNC, IEEE Wireless Communications and Networking,2006, 2006Conference paper (Refereed)
  • 44.
    Karlsson, Andreas
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Sohl, Joar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Cost-efficient Mapping of 3- and 5-point DFTs to General Baseband Processors2015In: International Conference on Digital Signal Processing (DSP), Singapore, 21-24 July, 2015, Institute of Electrical and Electronics Engineers (IEEE), 2015, p. 780-784Conference paper (Refereed)
    Abstract [en]

    Discrete Fourier transforms of 3 and 5 points are essential building blocks in FFT implementations for standards such as 3GPP-LTE. In addition to being more complex than 2 and 4 point DFTs, these DFTs also cause problems with data access in SDR-DSPs, since the data access width, in general, is a power of 2. This work derives mappings of these DFTs to a 4-way SIMD datapath that has been designed with 2 and 4-point DFT in mind. Our instruction set proposals, based on modified Winograd DFT, achieves single cycle execution of 3-point DFTs and 2.25 cycle average execution of 5-point DFTs in a cost-effective manner by reutilizing the already available arithmetic units. This represents an approximate speed-up of 3 times compared to an SDR-DSP with only MAC-support. In contrast to our more general design, we also demonstrate that a typical single-purpose FFT-specialized 5-way architecture only delivers 9% to 25% extra performance on average, while requiring 85% more arithmetic units and a more expensive memory subsystem.

  • 45.
    Karlsson, Andreas
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Sohl, Joar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Energy-efficient sorting with the distributed memory architecture ePUMA2015In: IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA), Institute of Electrical and Electronics Engineers (IEEE), 2015, p. 116-123Conference paper (Refereed)
    Abstract [en]

    This paper presents the novel heterogeneous DSP architecture ePUMA and demonstrates its features through an implementation of sorting of larger data sets. We derive a sorting algorithm with fixed-size merging tasks suitable for distributed memory architectures, which allows very simple scheduling and predictable data-independent sorting time.The implementation on ePUMA utilizes the architecture's specialized compute cores and control cores, and local memory parallelism, to separate and overlap sorting with data access and control for close to stall-free sorting.Penalty-free unaligned and out-of-order local memory access is used in combination with proposed application-specific sorting instructions to derive highly efficient local sorting and merging kernels used by the system-level algorithm.Our evaluation shows that the proposed implementation can rival the sorting performance of high-performance commercial CPUs and GPUs, with two orders of magnitude higher energy efficiency, which would allow high-performance sorting on low-power devices.

  • 46.
    Karlsson, Andreas
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Sohl, Joar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    ePUMA: A Processor Architecture for Future DSP2015In: International Conference on Digital Signal Processing (DSP), Singapore, 21-24 July, 2015, 2015, p. 253-257Conference paper (Refereed)
    Abstract [en]

    Since the breakdown of Dennard scaling the primary design goal for processor designs has shifted from increasing performance to increasing performance per Watt. The ePUMA platform is a flexible and configurable DSP platform that tries to address many of the problems with traditional DSP designs, to increase  performance, but use less power. We trade the flexibility of traditional VLIW DSP designs for a simpler single instruction issue scheme and instead make sure that each instruction can perform more work. Multi-cycle instructions can operate directly on vectors and matrices in memory and the datapaths implement common DSP subgraphs directly in hardware, for high compute throughput. Memory bottlenecks, that are common in other architectures, are handled with flexible LUT-based multi-bank memory addressing and memory parallelism. A major contributor to energy consumption, data movement, is reduced by using heterogeneous interconnect and clustering compute resources around local memories for simple data sharing. To evaluate ePUMA we have implemented the majority of the kernel library from a commercial VLIW DSP manufacturer for comparison. Our results not only show good performance, but also an order of magnitude increase in energy- and area efficiency. In addition, the kernel code size is reduced by 91% on average compared to the VLIW DSP. These benefits makes ePUMA an attractive solution for future DSP.

  • 47.
    Karlsson, Andreas
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Sohl, Joar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Software-based QPP Interleaving for Baseband DSPs with LUT-accelerated Addressing2015In: International Conference on Digital Signal Processing (DSP), Singapore, 21-24 July, 2015, Institute of Electrical and Electronics Engineers (IEEE), 2015Conference paper (Refereed)
    Abstract [en]

    This paper demonstrates how QPP interleaving and de-interleaving for Turbo decoding in 3GPP-LTE can be implemented efficiently on baseband processors with lookup-table (LUT) based addressing support of multi-bank memory. We introduce a LUT-compression technique that reduces LUT size to 1% of what would otherwise be needed to store the full data access patterns for all LTE block sizes. By reusing the already existing program memory of a baseband processor to store LUTs and using our proposed general address generator, our 8-way data access path can reach the same throughput as a dedicated 8-way interleaving ASIC implementation. This avoids the addition of a dedicated interleaving address generator to a processor which,  according to ASIC synthesis, would be 75\% larger than our proposed address generator. Since our software implementation only involves the address generator, the processor's datapaths are free to perform the other operations of Turbo decoding in parallel with interleaving. Our software implementation ensure programmability and flexibility and is the fastest software-based implementation of QPP interleaving known to us.

  • 48.
    Karlsson, Andréas
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Sohl, Joar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Wang, Jian
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    ePUMA: A unique memory access based parallel DSP processor for SDR and CR2013In: Global Conference on Signal and Information Processing (GlobalSIP), 2013 IEEE, IEEE , 2013, p. 1234-1237Conference paper (Refereed)
    Abstract [en]

    This paper presents ePUMA, a master-slave heterogeneous DSP processor for communications and multimedia. We introduce the ePUMA VPE, a vector processing slave-core designed for heavy DSP workloads and demonstrate how its features can used to implement DSP kernels that efficiently overlap computing, data access and control to achieve maximum datapath utilization. The efficiency is evaluated by implementing a basic set of kernels commonly used in SDR. The experiments show that all kernels asymptotically reach above 90% effective datapath utilization. while many approach 100%, thus the design effectively overlaps computing, data access and control. Compared to popular VLIW solutions, the need for a large register file with many ports is eliminated, thus saving power and chip area. When compared to a commercial VLIW solution, our solution also achieves code size reductions of up to 30 times and a significantly simplified kernel implementation.

  • 49.
    Karlström, Per
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Ehliar, Andreas
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    High performance, low-latency field-programmable gate array-based floating-point adder and multiplier units in a Virtex 42008In: IET Computers and digital techniques, ISSN 1751-8601, Vol. 2, p. 305-313Article in journal (Refereed)
    Abstract [en]

    There is increasing interest about floating-point arithmetics in field programmable gate arrays (FPGAs) because of the increase in their size and performance. FPGAs are generally good at bit manipulations and fixed-point arithmetics, but they have a harder time coping with floating-point arithmetics. An architecture used to construct high-performance floating-point components in a Virtex-4 FPGA is described in detail. Floating-point adder/subtracter and multiplier units have been constructed. The adder/subtracter can operate at a frequency of 377 MHz in a Virtex-4SX35 (speed grade -12).

  • 50.
    Karlström, Per
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Ehliar, Andreas
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    High Performance, Low Latency FPGA based Floating Point Adder and Multiplier Units in a Virtex 42006In: NORCHIP 2006: The Nordic Microelectronics Event. 2006, 2006, p. 31-34Conference paper (Refereed)
    Abstract [en]

    Since the invention of FPGAs, the increase in their size and performance has allowed designers to use FPGAs for more complex designs. FPGAs are generally good at bit manipulations and fixed point arithmetics but has a harder time coping with floating point arithmetics. In this paper we describe methods used to construct high performance floating point components in a Virtex-4. We have constructed a floating point adder/subtracter and multiplier which we then used to construct a complex radix-2 butterfly. Our adder/subtracter can operate at a frequency of 361 MHz in a Virtex-4SX35 (speed grade -12)

123 1 - 50 of 136
CiteExportLink to result list
Permanent link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • oxford
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf