liu.seSearch for publications in DiVA
Endre søk
Begrens søket
123 1 - 50 of 137
RefereraExporteraLink til resultatlisten
Permanent link
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • oxford
  • Annet format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annet språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
Treff pr side
  • 5
  • 10
  • 20
  • 50
  • 100
  • 250
Sortering
  • Standard (Relevans)
  • Forfatter A-Ø
  • Forfatter Ø-A
  • Tittel A-Ø
  • Tittel Ø-A
  • Type publikasjon A-Ø
  • Type publikasjon Ø-A
  • Eldste først
  • Nyeste først
  • Skapad (Eldste først)
  • Skapad (Nyeste først)
  • Senast uppdaterad (Eldste først)
  • Senast uppdaterad (Nyeste først)
  • Disputationsdatum (tidligste først)
  • Disputationsdatum (siste først)
  • Standard (Relevans)
  • Forfatter A-Ø
  • Forfatter Ø-A
  • Tittel A-Ø
  • Tittel Ø-A
  • Type publikasjon A-Ø
  • Type publikasjon Ø-A
  • Eldste først
  • Nyeste først
  • Skapad (Eldste først)
  • Skapad (Nyeste først)
  • Senast uppdaterad (Eldste først)
  • Senast uppdaterad (Nyeste først)
  • Disputationsdatum (tidligste først)
  • Disputationsdatum (siste først)
Merk
Maxantalet träffar du kan exportera från sökgränssnittet är 250. Vid större uttag använd dig av utsökningar.
  • 1.
    Asghar, Rizwan
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    2-D Realization of WiMAX Channel Interleaver for Efficient Hardware Implementation2009Inngår i: Proceedings of World Academy of Science, Engineering and Technology (ISSN: 2070-3740), 2009, s. 25-29Konferansepaper (Fagfellevurdert)
    Abstract [en]

    The direct implementation of interleaver functions in WiMAX is not hardware efficient due to presence of complex functions. Also the conventional method i.e. using memories for storing the permutation tables is silicon consuming. This work presents a 2-D transformation for WiMAX channel interleaver functions which reduces the overall hardware complexity to compute the interleaver addresses on the fly.  A fully re-configurable architecture for address generation in WiMAX channel interleaver is presented, which consume 1.1 k-gates in total. It can be configured for any block size and any modulation scheme in WiMAX. The presented architecture can run at a frequency of 200 MHz, thus fully supporting high bandwidth requirements for WiMAX.

  • 2.
    Asghar, Rizwan
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Dual standard re-configurable hardware interleaver for turbo decoding2008Konferansepaper (Fagfellevurdert)
    Abstract [en]

    A very low cost re-configurable hardwareinterleaver for two standards, 3GPP-WCMDA and 3GPPLong Term Evolution (3GPP-LTE) is presented. Theinterleaver is a key component of radio communicationsystems. Using conventional design methods, it consumes alarge part of silicon area in the design of turbo encoder anddecoder. The presented hardware interleaver addressgeneration architecture, utilizes the algorithmic levelhardware simplifications to achieve very low cost solution.After doing the hardware optimizations the proposedarchitecture consumes only 3.1k gates with a 256x8 bitmemory for the fully re-configurable dual standardinterleaver address generator. The interleaved address iscomputed every clock cycle except the case of pruning (ifblock size is less than the row-column matrix) in 3GPPWCDMA.In this case one additional clock cycle is consumedfor valid address generation.

  • 3.
    Asghar, Rizwan
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Low Complexity Hardware Interleaver for MIMO-OFDM based Wireless LAN2009Inngår i: Proceedings - IEEE International Symposium on Circuits and Systems, 2009, s. 1747-1750Konferansepaper (Fagfellevurdert)
    Abstract [en]

    A low complexity hardware interleaver architecture is presented for MIMO-OFDM based Wireless LAN e.g. 802.11n. Novelty of the presented architecture is twofold; 1) Flexibility to choose interleaver implementation with different modulation scheme and different size for different spatial streams in a multi antenna system, 2) Complexity to compute on the fly interleaver address is reduce by using recursion and is supported by mathematical formulation. The proposed interleaver architecture is implemented on 65nm CMOS process and it consumes 0.035 mm2 area. The proposed architecture supports high speed communication with maximum throughput of 900 Mbps at a clock rate of 225 MHz.

  • 4.
    Asghar, Rizwan
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Low Complexity Multi Mode Interleaver Core for WiMax with Support for Convolutional Interleaving2009Inngår i: International Journal of Electronics, Communications and Computer Engineering, Vol. 1, nr 1, s. 20-29Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    A hardware efficient, multi mode, re-configurable architecture of interleaver/de-interleaver for multiple standards, like DVB, WiMAX and WLAN is presented. The interleavers consume a large part of silicon area when implemented by using conventional methods as they use memories to store permutation patterns. In addition, different types of interleavers in different standards cannot share the hardware due to different construction methodologies. The novelty of the work presented in this paper is threefold: 1) Mapping of vital types of interleavers including convolutional interleaver onto a single architecture with flexibility to change interleaver size; 2) Hardware complexity for channel interleaving in WiMAX is reduced by using 2-D realization of the interleaver functions; and 3) Silicon cost overheads reduced by avoiding the use of small memories. The proposed architecture consumes 0.18mm2 silicon area for 0.12μm process and can operate at a frequency of 140 MHz. The reduced complexity helps in minimizing the memory utilization, and at the same time provides strong support to on-the-fly computation of permutation patterns.

  • 5.
    Asghar, Rizwan
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Multimode flex-interleaver core for baseband processor platform2010Inngår i: Journal of Computer Systems, Networks and Communications, ISSN 1687-7381, Vol. 2010, s. 1-16Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    This paper presents a flexible interleaver architecture supportingmultiple standards likeWLAN,WiMAX, HSPA+, 3GPP-LTE, and DVB. Algorithmic level optimizations like 2D transformation and realization of recursive computation are applied, which appear to be the key to reach to an efficient hardware multiplexing among different interleaver implementations. The presented hardware enables the mapping of vital types of interleavers including multiple block interleavers and convolutional interleaver onto a single architecture. By exploiting the hardware reuse methodology the silicon cost is reduced, and it consumes 0.126mm2 area in total in 65nm CMOS process for a fully reconfigurable architecture. It can operate at a frequency of 166 MHz, providing a maximum throughput up to 664 Mbps for a multistream system and 166 Mbps for single stream communication systems, respectively. One of the vital requirements for multimode operation is the fast switching between different standards, which is supported by this hardware with minimal cycle cost overheads. Maximum flexibility and fast switchability among multiple standards during run time makes the proposed architecture a right choice for the radio baseband processing platform.

  • 6.
    Asghar, Rizwan
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Programmable Parallel Data-path for FEC2007Inngår i: Swedish System-on-Chip Conference, SSoCC,2007, 2007Konferansepaper (Annet vitenskapelig)
  • 7.
    Asghar, Rizwan
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Towards Radix-4, Parallel Interleaver Design to Support High-Throughput Turbo Decoding for Re-Configurability2010Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Parallel, radix-4 turbo decoding is used to enhance the throughput and at the same time reduce the overall memory cost. The bottleneck is the higher complexity associated with radix-4 parallel interleaver implementation. This paper addresses the implementation issues of radix-4, parallel interleaver and also proposes necessary modifications in the interleaver algorithms for parallel address generation. It presents a re-configurable architecture which enables the use of same turbo decoding core to be used for multiple standards. The proposed interleaver architecture is capable of handling the memory conflicts on-the-fly. It consumes 12.5K gates and can run at a frequency of 285MHz, thus supporting a throughput of 173.3Mpbs, which can cover most of the emerging communication standards.

  • 8.
    Asghar, Rizwan
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Very Low Cost Configurable Hardware Interleaver for 3G Turbo Decoding2008Inngår i: IEEE Internation Conference on Information and Communication Tech from Theory to Applications, ICTTA,2008, IEEE , 2008, s. 2314-2318Konferansepaper (Fagfellevurdert)
    Abstract [en]

    A very low cost hardware interleaver for 3rd Generation Partnership Project (3GPP) turbo coding algorithm is presented. The interleaver is a key component of turbo codes and it is used to minimize the effect of burst errors in the transmission. Using conventional design methods, it consumes a large part of silicon area in the design of turbo encoder and decoder. The presented hardware interleaver architecture utilizes the algorithmic level hardware simplifications as well as the iterative modulo computation to achieve very low cost solution. After doing the hardware multiplexing and optimization the proposed architecture consumes only 1.5 k gates (without pre-computation) and 2.2 k gates (with pre-computation). In both cases the interleaved address is computed every clock cycle except the case of pruning, in which one additional clock cycle is consumed.

  • 9.
    Asghar, Rizwan
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Wu, Di
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Eilert, Johan
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Memory Conflict Analysis and Implementation of a Re-configurable Interleaver Architecture Supporting Unified Parallel Turbo Decoding2010Inngår i: Journal of Signal Processing Systems for Signal, Image, and Video Technology, ISSN 1939-8018, Vol. 60, nr 1, s. 15-29Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    This paper presents a novel hardware interleaver architecture for unified parallel turbo decoding. The architecture is fully re-configurable among multiple standards like HSPA Evolution, DVB-SH, 3GPP-LTE and WiMAX. Turbo codes being widely used for error correction in today’s consumer electronics are prone to introduce higher latency due to bigger block sizes and multiple iterations. Many parallel turbo decoding architectures have recently been proposed to enhance the channel throughput but the interleaving algorithms used indifferent standards do not freely allow using them due to higher percentage of memory conflicts. The architecture presented in this paper provides a re-configurable platform for implementing the parallel interleavers for different standards by managing the conflicts involved in each. The memory conflicts are managed by applying different approaches like stream misalignment, memory division and use of small FIFO buffer. The proposed flexible architecture is low cost and consumes 0.085 mm2 area in 65nm CMOS process. It can implement up to 8 parallel interleavers and can operate at a frequency of 200 MHz, thus providing significant support to higher throughput systems based on parallel SISO processors.

  • 10.
    Asghar, Rizwan
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Wu, Di
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Eilert, Johan
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Memory Conflict Analysis and Interleaver Design for Parallel Turbo Decoding Supporting HSPA Evolution2009Inngår i: 12th EUROMICRO Conference on Digital System Design, 2009, s. 699-706Konferansepaper (Fagfellevurdert)
    Abstract [en]

    HSPA evolution has raised the throughput requirements for WCDMA based systems where turbo code has been adapted to perform the error correction. Many parallel turbo decoding architectures have recently been proposed to enhance the channel throughput but the interleaving algorithm used in WCDMA based systems does not freely allows to use them due to high percentage of memory conflicts. This paper provides a comprehensive analysis for reduction of interleaver memory conflicts while generating more than one address in a single clock cycle. It also provides trade-off analysis in terms of area and power efficiency for multiple architectures for different functions involved in the interleaver design. The final architecture supports processing of two parallel SISO blocks and manages the conflicts by applying different approaches like stream misalignment, memory division and small FIFO buffer. The proposed architecture is low cost and consumes 4.3K gates at a frequency of 150MHz. This work also focuses on reduction of pre-processing overheads by introducing the segment based modulo computation, thus providing further relaxation to SISO decoding process.

  • 11.
    Asghar, Rizwan
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Wu, Di
    Linköpings universitet, Institutionen för systemteknik. Linköpings universitet, Tekniska högskolan.
    Saeed, Ali
    Linköpings universitet, Institutionen för systemteknik. Linköpings universitet, Tekniska högskolan.
    Huang, Yulin
    Linköpings universitet, Institutionen för systemteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Implementation of a Radix-4, Parallel Turbo Decoder and Enabling the Multi-Standard Support2012Inngår i: Journal of Signal Processing Systems, ISSN 1939-8018, E-ISSN 1939-8115, Vol. 66, nr 1, s. 25-41Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    This paper presents a unified, radix-4 implementation of turbo decoder, covering multiple standards such as DVB, WiMAX, 3GPP-LTE and HSPA Evolution. The radix-4, parallel interleaver is the bottleneck while using the same turbo-decoding architecture for multiple standards. This paper covers the issues associated with design of radix-4 parallel interleaver to reach to flexible turbo-decoder architecture. Radix-4, parallel interleaver algorithms and their mapping on to hardware architecture is presented for multi-mode operations. The overheads associated with hardware multiplexing are found to be least significant. Other than flexibility for the turbo decoder implementation, the low silicon cost and low power aspects are also addressed by optimizing the storage scheme for branch metrics and extrinsic information. The proposed unified architecture for radix-4 turbo decoding consumes 0.65 mm(2) area in total in 65 nm CMOS process. With 4 SISO blocks used in parallel and 6 iterations, it can achieve a throughput up to 173.3 Mbps while consuming 570 mW power in total. It provides a good trade-off between silicon cost, power consumption and throughput with silicon efficiency of 0.005 mm(2)/Mbps and energy efficiency of 0.55 nJ/b/iter.

  • 12.
    Di, Wu
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Eilert, Johan
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Nilsson, Anders
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Tell, Eric
    Coresonic AB, Linköping.
    Alfredsson, Erik
    Coresonic AB, Linköping.
    System Architecture for 3GPP LTE Modem Using a Programmable Baseband Processo2009Inngår i: International Symposium on System-on-Chip (SoC 2009), 2009Konferansepaper (Fagfellevurdert)
    Abstract [en]

    3G evolution towards HSPA and LTE is ongoing which will substantially increase the throughput with higher spectral efficiency. This paper presents the system architecture of an LTE modem based on a programmable baseband processor. The architecture includes a baseband processor that handles processing such as time and frequency synchronization, IFFT/FFT (up to 2048-p), channel estimation and subcarrier demapping. The throughput and latency requirements of a Category 4 User Equipment (CAT4 UE) is met by adding a MIMO symbol detector and a parallel Turbo decoder supporting H-ARQ. This brings both low silicon cost and enough flexibility to support other wireless standards. The complexity demonstrated by the modem shows the practicality and advantage of using programmable baseband processors for a single-chip LTE solution.

  • 13.
    Ehliar, Andreas
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Karlström, Per
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    A High Performance Microprocessor with DSP Extensions Optimized for the Virtex-4 FPGA2008Inngår i: International Conference on Field Programmable Logic and Applications FLP 2008, Heidelberg, Germany, 2008, 2008, s. 599-602Konferansepaper (Fagfellevurdert)
    Abstract [en]

    As the use of FPGAs increases, the importance of highly optimized processors for FPGAs will increase. In this paper we present the microarchitecture of a soft microprocessor core optimized for the Virtex-4 architecture. The core can operate at 357 MHz, which is significantly faster than Xilinxpsila Microblaze architecture on the same FPGA. At this frequency it is necessary to keep the logic complexity down and this paper shows how this can be done while retaining sufficient functionality for a high performance processor.

  • 14.
    Ehliar, Andreas
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Eilert, Johan
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    A Comparison of Three FPGA Optimized NoC Architectures2007Inngår i: Swedish System-on-Chip Conference, SSoCC,2007, 2007Konferansepaper (Annet vitenskapelig)
  • 15.
    Ehliar, Andreas
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    A Network on Chip based gigabit Ethernet router implemented on an FPGA2006Inngår i: SSoCC Swedish System-on-Chip Conference,2006, 2006Konferansepaper (Annet vitenskapelig)
  • 16.
    Ehliar, Andreas
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    An Asic Perspective on FPGA Optimizations2009Inngår i: 19th International Conference on Field Programmable Logic and Applications (FPL), 2009, s. 218-223Konferansepaper (Fagfellevurdert)
    Abstract [en]

    In this paper we discuss how various design components perform in both FPGAs and standard cell based ASICs. We also investigate how various common FPGA optimizations will effect the performance and area of an ASIC port. We find that most techniques that are used to optimize a design for an FPGA will not have a negative impact on the area in an ASIC. The intended audience for this paper are engineers charged with creating designs or IP cores that are optimized for both FPGAs and ASICs.

  • 17.
    Ehliar, Andreas
    et al.
    Linköpings universitet, Institutionen för systemteknik.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik.
    An ASIC Perspective on High Performance FPGA Design2009Konferansepaper (Fagfellevurdert)
    Abstract [en]

    In this paper we discuss how various design components perform in both FPGAs and standard cell based ASICs. We also investigate how various common FPGA optimizations will effect the performance and area of an ASIC port. We find that most techniques that are used to optimize a design for an FPGA will not have a negative impact on the area in an ASIC. The intended audience for this paper are engineers charged with creating designs or IP cores that are optimized for both FPGAs and ASICs.

  • 18.
    Ehliar, Andreas
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    An FPGA based Open Source Network-on-chip Architecture2007Inngår i: 17th International Conference on Fileld Programmable Logic and Applications, FPL, Amsterdam, Holland, 2007, IEEE , 2007, s. 800-803Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Networks on chip (NoC) has long been seen as a potential solution to the problems encountered when implementing large digital hardware designs. In this paper we describe an open source FPGA based NoC architecture with low area overhead, high throughput and low latency compared to other published works. The architecture has been optimized for Xilinx FPGAs and the NoC is capable of operating at a frequency of 260 MHz in a Virtex-4 FPGA. We have also developed a bridge so that generic Wishbone bus compatible IP blocks can be connected to the NoC.

  • 19.
    Ehliar, Andreas
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Benchmarking network processors2004Inngår i: Swedish System-on-Chip Conference,2004, 2004Konferansepaper (Annet vitenskapelig)
  • 20.
    Ehliar, Andreas
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Flexible Route Lookup Using Range Search2005Inngår i: The Third IASTED International Conference on Communications and Computer Networks,2005, 2005Konferansepaper (Fagfellevurdert)
  • 21.
    Ehliar, Andreas
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Thinking outside the flow: Creating customized backend tools for Xilinx based designs2007Inngår i: 4th annual FPGAworld Conference, Stockholm, 2007, 2007Konferansepaper (Fagfellevurdert)
    Abstract [en]

    This paper is intended to serve as an introduction to how to build a customized backend tool for a Xilinx based design flow. A Python based library called PyXDL is presented which allows a user to manipulate XDL files which contain a placed and routed design. Three different tools are presented which uses this library, ranging from a simple resource utilization viewer to a tool which will insert a logic analyzer into an already routed design, thus avoiding a costly complete rerun of the place and route tool.

  • 22.
    Ehliar, Andreas
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Wiklund, Daniel
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Feasibility study of a core router based on a network on chip2005Inngår i: Swedish System on Chip Conference SSoCC,2005, 2005Konferansepaper (Annet vitenskapelig)
    Abstract [en]

    In this paper we investigate the feasibility of creating a core router based upon a network on chip. The investigated architecture uses 16x10-Gbit Ethernet ports. The purpose of this is to show that it is possible to create such a solution considering current process technologies. This is done through an analysis of the required chip area, clock frequencies, and pin count. The results show that such a solution is feasible and can be implemented as a single chip.

  • 23.
    Eilert, Johan
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Ehliar, Andreas
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Using low precision floating point numbers to reduce memory cost for MP3 decoding2004Inngår i: International Workshop on Multimedia Signal Processing, IEEE Xplore , 2004, s. 119-122Konferansepaper (Fagfellevurdert)
    Abstract [en]

    The purpose of our work has been to evaluate the practicality of using a 16-bit floating point representation to store the intermediate sample values and other data in memory during the decoding of MP3 bit streams. A floating point number representation offers a better trade-off between dynamic range and precision than a fixed point representation for a given word length. Using a floating point representation means that smaller memories can be used which leads to smaller chip area and lower power consumption without reducing sound quality. We have designed and implemented a DSP processor based on 16-bit floating point intermediate storage. The DSP processor is capable of decoding all MP3 bit streams at 20 MHz and this has been demonstrated on an FPGA prototype.

  • 24.
    Eilert, Johan
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Early Exploratioin of MIPS Cost and Memory Cost Trade-off for Media DSP Media Processor2006Inngår i: SSoCC Swedish System-on-Chip Conference,2006, 2006Konferansepaper (Annet vitenskapelig)
  • 25.
    Eilert, Johan
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Wu, Di
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Efficient Complex Matrix Inversion for MIMO Software Defined Radio2007Inngår i: International Symposium on Circuits and Systems, ISCAS,2007, IEEE , 2007, s. 2610-2613Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Complex matrix inversion is a very computationally demanding operation in advanced multi-antenna wireless communications. Traditionally, systolic array-based QR decomposition (QRD) is used to invert large matrices. However, the matrices involved in MIMO baseband processing in mobile handsets are generally small which means QRD is not necessarily efficient. In this paper, a new method is proposed using programmable hardware units which not only achieves higher performance but also consumes less silicon area. Furthermore, the hardware can be reused for many other operations such as complex matrix multiplication, filtering, correlation and FFT/IFFT.

  • 26.
    Eilert, Johan
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Wu, Di
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Implementation of a Programmable Linear MMSE Detector for MIMO-OFDM2008Inngår i: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP,2008, IEEE , 2008, s. 5396-5399Konferansepaper (Fagfellevurdert)
    Abstract [en]

    This paper presents a linear minimum mean square error (LMMSE) symbol detector for MIMO-OFDM enabled mobile terminals. The detector is implemented using a programmable baseband processor aimed for software-defined radio (SDR). Owing to the dynamic range supplied by the floating-point SIMD datapath, special algorithms can be adopted to reduce the computational latency of detection. The programmable solution not only supports different transmit/receive antenna configurations, but also allows hardware multiplexing to obtain silicon and power efficiency. Compared to several existing fixed-functional solutions, the one proposed in this paper is smaller, more flexible and faster.

  • 27.
    Eilert, Johan
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Wu, Di
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Real-Time Alamouti STBC Decoding on A Programmable Baseband Processor2008Konferansepaper (Fagfellevurdert)
    Abstract [en]

    This paper presents a space-time block coding decoder for MIMO-OFDM enabled mobile terminals. The decoder is implemented using a programmable baseband processor aimed for software-defined radio (SDR). The dynamic range supplied by the floating-point SIMD datapath allows special algorithms to significantly reduce the computational latency of decoding. The programmable solution not only supports different transmit/receive antenna configuration, but also allows hardware multiplexing to obtain silicon and power efficiency. Compared to several existing fixed-functional ASIC solutions in literature, the one proposed in this paper is by far the smallest, fastest and with more flexibility.

  • 28.
    Eilert, Johan
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Wu, Di
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Wang, Dandan
    Al-Dhahir, Naofal
    Minn, Hlaing
    Complexity Reduction of Matrix Manipulation for Multi-User STBC-MIMO Decoding2007Inngår i: IEEE Sarnoff Symmposium,2007, 2007, s. 1-5Konferansepaper (Fagfellevurdert)
    Abstract [en]

    This paper studies efficient complex valued matrix manipulations for multi-user STBC-MIMO decoding. A novel method called Alamouti blockwise analytical matrix inversion (ABAMI) is proposed for the inversion of large complex matrices that are based on Alamouti sub-blocks. Another method using a variant of Givens rotation is proposed for fast QR decomposition of this kind of matrices. Our solutions significantly reduce the number of operations which makes them more than 4 times faster than several other solutions in the literature. Furthermore, compared to fixed function VLSI implementations, our solution is more flexible and consumes less silicon area because the hardware is programmable and it can be reused for many other operations such as filtering, correlation and FFT/IFFT. Besides the analysis of the general computational complexity based on the number of basic operations, the computational latency is also measured in clock cycles based on the conceptual hardware for real-time matrix manipulations.

  • 29.
    Flordal, Oskar
    et al.
    Axis Communications AB .
    Flordal, Oskar
    Axis Communications AB .
    Wu, Di
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Wu, Di
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Accelerating CABAC Encoding for Multi-standard Media with Configurability2006Inngår i: IEEE IPDPS,2006, 2006Konferansepaper (Fagfellevurdert)
  • 30.
    Haque, Muhammad Fahim Ul
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Johansson, Ted
    Linköpings universitet, Institutionen för systemteknik, Elektroniska Kretsar och System. Linköpings universitet, Tekniska fakulteten.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Combined RF and Multilevel PWM Switch Mode Power Amplifier2013Inngår i: Norchip Conference, IEEE , 2013, s. 1-4Konferansepaper (Fagfellevurdert)
    Abstract [en]

    This paper presents a novel power amplifier (PA) architecture based on the combination of radio frequency pulse width modulation (RFPWM) and multilevel PWM. The architecture provides better dynamic range at high carrier frequency compared to RFPWM. The benefits of this architecture over multilevel PWM are that it only requires a single PA and no combiner. The average efficiency for an 802.11g baseband signal is better than multilevel PWM. Our results also shows that the proposed technique exhibit a constant dynamic range at carrier frequency of 3, 4 and 5 GHz, in contrast to RFPWM which shows a decrease in dynamic range for increase in carrier frequency.

  • 31.
    Haque, Muhammad Fahim Ul
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Johansson, Ted
    Linköpings universitet, Institutionen för systemteknik, Elektroniska Kretsar och System. Linköpings universitet, Tekniska fakulteten.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Combined RF and Multiphase PWM Transmitter2015Inngår i: 2015 European Conference on Circuit Theory and Design (ECCTD), IEEE , 2015, s. 264-267Konferansepaper (Fagfellevurdert)
    Abstract [en]

    This paper presents two novel transmitter architectures based on the combination of radio-frequency pulse-width modulation and multiphase pulse-width modulation. The proposed transmitter architectures provide good amplitude resolution and large dynamic range at high carrier frequency, which is problematic with existing radio-frequency pulse-width modulation based transmitters. They also have better power efficiency and smaller chip area compared to multiphase pulse-width modulation based transmitters.

  • 32.
    Haque, Muhammad Fahim Ul
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Johansson, Ted
    Linköpings universitet, Institutionen för systemteknik, Elektroniska Kretsar och System. Linköpings universitet, Tekniska fakulteten.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Combined RF and Multiphase PWM Transmitter2015Konferansepaper (Annet vitenskapelig)
  • 33.
    Haque, Muhammad Fahim Ul
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Johansson, Ted
    Linköpings universitet, Institutionen för systemteknik, Elektroniska Kretsar och System. Linköpings universitet, Tekniska fakulteten.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Modified Band-limited Pulse-Width Modulated Polar Transmitter2015Konferansepaper (Annet vitenskapelig)
  • 34.
    Haque, Muhammad Fahim Ul
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Johansson, Ted
    Linköpings universitet, Institutionen för systemteknik, Elektroniska Kretsar och System. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Modified Multilevel PWM Switch Mode Power Amplifier2014Konferansepaper (Annet vitenskapelig)
  • 35.
    Henriksson, Tomas
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Eriksson, Henrik
    Linköpings universitet, Institutionen för systemteknik, Elektroniska komponenter. Linköpings universitet, Tekniska högskolan.
    Nordqvist, Ulf
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Larsson-Edefors, Per
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    VLSI Implementation of CRC-32 for 10 Gigabit Ethernet2001Inngår i: The 8th IEEE International Conference on Electronics, Circuits and Systems, 2001: ICECS 2001, 2001, s. 1215-1218Konferansepaper (Fagfellevurdert)
    Abstract [en]

    For 10 Gigabit Ethernet a CRC-32 generation is essential and timing critical. Many efficient software algorithms have been proposed for CRC generation. In this work we use an algorithm based on the properties of Galois fields, which gives very efficient hardware. The CRC generator has been implemented and simulated in both standard cells and a full-custom design technique. In standard cells from the UMC 0.18 micron library a throughput of 8.7 Gb/s has been achieved. In the full-custom design for AMS 0.35 micron process we have achieved a throughput of 5.0 Gb/s. The conclusion, based on extrapolation of device characteristics, is that CRC-32 generation for 10 Gb/s can be designed with standard cells in a 0.15 micron process technology, or using full-custom design techniques in a 0.18 micron process technology

  • 36.
    Henriksson, Tomas
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Implementation of Fast CRC Calculation2003Inngår i: Asia South Pacific Design Automation Conference,2003, 2003, s. 563-Konferansepaper (Fagfellevurdert)
  • 37.
    Henriksson, Tomas
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Novel ASIP and Processor Architecture for Packet Decoding2002Inngår i: Workshop of Application Specific Processors Digest,2002, 2002, s. 25-Konferansepaper (Fagfellevurdert)
  • 38.
    Henriksson, Tomas
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Nordqvist, Ulf
    Linköpings universitet, Institutionen för systemteknik, Elektroniska komponenter. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Configurable Port Processor Increases Flexibility in the Protocol Processing Area2000Inngår i: Proceedings of COOLChips III An International Symposium on Low-Power and High-Speed Chips, 2000, s. 275-Konferansepaper (Annet vitenskapelig)
    Abstract [en]

    The limitation in networking is no longer only the physical transmission media but also the end equipment, which has to process the protocol control fields. In most end terminals this processing has been performed by the main processor, but different types of co-processor have lately appeared to relieve it from this task. These co-processors have high power consumption since they are based on a RISC core. Instead ASIC:s can be used, but they lack flexibility and are specific for only one single protocol. It is clear that a new approach is needed.

    (...)

  • 39.
    Henriksson, Tomas
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Nordqvist, Ulf
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Embedded Protocol Processor for Fast and Efficient Packet Reception2002Inngår i: International Conference on Computer Design,2002, 2002, s. 414-Konferansepaper (Fagfellevurdert)
  • 40.
    Henriksson, Tomas
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Nordqvist, Ulf
    Linköpings universitet, Institutionen för systemteknik, Elektroniska komponenter. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Specification of a configurable general-purpose protocol processor2000Inngår i: Proceedings of Second International Symposium on Communication Systems, Networks and Digital Signal Processing, 2000, s. 284-289Konferansepaper (Annet vitenskapelig)
    Abstract [en]

    A general-purpose protocol processor is specified with a dedicated architecture for protocol processing. This paper defines a functional coverage, analyses the control requirements, specifies functional pages and a controller unit. The general-purpose protocol processor is aimed for network terminals, therefore routing is not completely supported. However it should be possible to use it as part of a router with some minor modifications. The general-purpose protocol processor is partitioned into two parts, a configurable stand alone part and a program based microcontroller. The configurable part performs the protocol processing without any running program. The processor does not execute any cycle based program, instead execution is controlled by configuration vectors and control vectors. The microcontroller assists with the interface to the host processor and handles the configuration. It is concluded that by partitioning the control into three levels, the architecture is flexible and verification is simplified. The proposed architecture also has higher performance and lower power dissipation than other solutions.

  • 41.
    Henriksson, Tomas
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Nordqvist, Ulf
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Specification of a configurable general-purpose protocol processor2002Inngår i: IEE Proceedings - Circuits Devices and Systems, ISSN 1350-2409, E-ISSN 1359-7000, Vol. 149, nr 3, s. 198-202Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    A general-purpose protocol processor is specified with a dedicated architecture for protocol processing. The paper defines a functional coverage, analyses the control requirements, and specifies functional pages and a controller unit. The general-purpose protocol processor is for network terminals, and therefore routing is not completely supported. However, it should be possible to use it as part of a router. with some minor modifications. The general-purpose protocol processor is partitioned into two parts: a configurable stand-alone part and a program based microcontroller. The configurable part performs the protocol processing without any running program. The processor does not execute any cycle based program; instead execution is controlled by configuration vectors and control vectors. The microcontroller assists with the interface to the host processor and handles the configuration. It is concluded that by partitioning the control into three levels, the architecture is flexible and verification is simplified. The proposed architecture also has higher performance and lower power dissipation than other solutions

  • 42.
    Henriksson, Tomas
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Persson, Niclas
    Linköpings universitet, Institutionen för systemteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    VLSI Implementation of Internet Checksum Calculation for 10 gigabit Ethernet2002Inngår i: Design and Diagnostics of Electronics, Circuits and Systems,2002, 2002, s. 114-Konferansepaper (Fagfellevurdert)
  • 43.
    Henriksson, Tomas
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Wiklund, Daniel
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    VLSI implementation of a switch for on-chip networks2003Inngår i: Int workshop on Design and diagnostics of electronic circuits and systems DDECS,2003, 2003Konferansepaper (Fagfellevurdert)
  • 44. Jiao, Haiyan
    et al.
    Nilsson, Anders
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Tell, Eric
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    MIPS Cost Estimation for OFDM-VBLAST systems2006Inngår i: WCNC, IEEE Wireless Communications and Networking,2006, 2006Konferansepaper (Fagfellevurdert)
  • 45.
    Karlsson, Andreas
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Sohl, Joar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Cost-efficient Mapping of 3- and 5-point DFTs to General Baseband Processors2015Inngår i: International Conference on Digital Signal Processing (DSP), Singapore, 21-24 July, 2015, Institute of Electrical and Electronics Engineers (IEEE), 2015, s. 780-784Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Discrete Fourier transforms of 3 and 5 points are essential building blocks in FFT implementations for standards such as 3GPP-LTE. In addition to being more complex than 2 and 4 point DFTs, these DFTs also cause problems with data access in SDR-DSPs, since the data access width, in general, is a power of 2. This work derives mappings of these DFTs to a 4-way SIMD datapath that has been designed with 2 and 4-point DFT in mind. Our instruction set proposals, based on modified Winograd DFT, achieves single cycle execution of 3-point DFTs and 2.25 cycle average execution of 5-point DFTs in a cost-effective manner by reutilizing the already available arithmetic units. This represents an approximate speed-up of 3 times compared to an SDR-DSP with only MAC-support. In contrast to our more general design, we also demonstrate that a typical single-purpose FFT-specialized 5-way architecture only delivers 9% to 25% extra performance on average, while requiring 85% more arithmetic units and a more expensive memory subsystem.

  • 46.
    Karlsson, Andreas
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Sohl, Joar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Energy-efficient sorting with the distributed memory architecture ePUMA2015Inngår i: IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA), Institute of Electrical and Electronics Engineers (IEEE), 2015, s. 116-123Konferansepaper (Fagfellevurdert)
    Abstract [en]

    This paper presents the novel heterogeneous DSP architecture ePUMA and demonstrates its features through an implementation of sorting of larger data sets. We derive a sorting algorithm with fixed-size merging tasks suitable for distributed memory architectures, which allows very simple scheduling and predictable data-independent sorting time.The implementation on ePUMA utilizes the architecture's specialized compute cores and control cores, and local memory parallelism, to separate and overlap sorting with data access and control for close to stall-free sorting.Penalty-free unaligned and out-of-order local memory access is used in combination with proposed application-specific sorting instructions to derive highly efficient local sorting and merging kernels used by the system-level algorithm.Our evaluation shows that the proposed implementation can rival the sorting performance of high-performance commercial CPUs and GPUs, with two orders of magnitude higher energy efficiency, which would allow high-performance sorting on low-power devices.

  • 47.
    Karlsson, Andreas
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Sohl, Joar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    ePUMA: A Processor Architecture for Future DSP2015Inngår i: International Conference on Digital Signal Processing (DSP), Singapore, 21-24 July, 2015, 2015, s. 253-257Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Since the breakdown of Dennard scaling the primary design goal for processor designs has shifted from increasing performance to increasing performance per Watt. The ePUMA platform is a flexible and configurable DSP platform that tries to address many of the problems with traditional DSP designs, to increase  performance, but use less power. We trade the flexibility of traditional VLIW DSP designs for a simpler single instruction issue scheme and instead make sure that each instruction can perform more work. Multi-cycle instructions can operate directly on vectors and matrices in memory and the datapaths implement common DSP subgraphs directly in hardware, for high compute throughput. Memory bottlenecks, that are common in other architectures, are handled with flexible LUT-based multi-bank memory addressing and memory parallelism. A major contributor to energy consumption, data movement, is reduced by using heterogeneous interconnect and clustering compute resources around local memories for simple data sharing. To evaluate ePUMA we have implemented the majority of the kernel library from a commercial VLIW DSP manufacturer for comparison. Our results not only show good performance, but also an order of magnitude increase in energy- and area efficiency. In addition, the kernel code size is reduced by 91% on average compared to the VLIW DSP. These benefits makes ePUMA an attractive solution for future DSP.

  • 48.
    Karlsson, Andreas
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Sohl, Joar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Software-based QPP Interleaving for Baseband DSPs with LUT-accelerated Addressing2015Inngår i: International Conference on Digital Signal Processing (DSP), Singapore, 21-24 July, 2015, Institute of Electrical and Electronics Engineers (IEEE), 2015Konferansepaper (Fagfellevurdert)
    Abstract [en]

    This paper demonstrates how QPP interleaving and de-interleaving for Turbo decoding in 3GPP-LTE can be implemented efficiently on baseband processors with lookup-table (LUT) based addressing support of multi-bank memory. We introduce a LUT-compression technique that reduces LUT size to 1% of what would otherwise be needed to store the full data access patterns for all LTE block sizes. By reusing the already existing program memory of a baseband processor to store LUTs and using our proposed general address generator, our 8-way data access path can reach the same throughput as a dedicated 8-way interleaving ASIC implementation. This avoids the addition of a dedicated interleaving address generator to a processor which,  according to ASIC synthesis, would be 75\% larger than our proposed address generator. Since our software implementation only involves the address generator, the processor's datapaths are free to perform the other operations of Turbo decoding in parallel with interleaving. Our software implementation ensure programmability and flexibility and is the fastest software-based implementation of QPP interleaving known to us.

  • 49.
    Karlsson, Andréas
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Sohl, Joar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Wang, Jian
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    ePUMA: A unique memory access based parallel DSP processor for SDR and CR2013Inngår i: Global Conference on Signal and Information Processing (GlobalSIP), 2013 IEEE, IEEE , 2013, s. 1234-1237Konferansepaper (Fagfellevurdert)
    Abstract [en]

    This paper presents ePUMA, a master-slave heterogeneous DSP processor for communications and multimedia. We introduce the ePUMA VPE, a vector processing slave-core designed for heavy DSP workloads and demonstrate how its features can used to implement DSP kernels that efficiently overlap computing, data access and control to achieve maximum datapath utilization. The efficiency is evaluated by implementing a basic set of kernels commonly used in SDR. The experiments show that all kernels asymptotically reach above 90% effective datapath utilization. while many approach 100%, thus the design effectively overlaps computing, data access and control. Compared to popular VLIW solutions, the need for a large register file with many ports is eliminated, thus saving power and chip area. When compared to a commercial VLIW solution, our solution also achieves code size reductions of up to 30 times and a significantly simplified kernel implementation.

  • 50.
    Karlström, Per
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Ehliar, Andreas
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    High performance, low-latency field-programmable gate array-based floating-point adder and multiplier units in a Virtex 42008Inngår i: IET Computers and digital techniques, ISSN 1751-8601, Vol. 2, s. 305-313Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    There is increasing interest about floating-point arithmetics in field programmable gate arrays (FPGAs) because of the increase in their size and performance. FPGAs are generally good at bit manipulations and fixed-point arithmetics, but they have a harder time coping with floating-point arithmetics. An architecture used to construct high-performance floating-point components in a Virtex-4 FPGA is described in detail. Floating-point adder/subtracter and multiplier units have been constructed. The adder/subtracter can operate at a frequency of 377 MHz in a Virtex-4SX35 (speed grade -12).

123 1 - 50 of 137
RefereraExporteraLink til resultatlisten
Permanent link
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • oxford
  • Annet format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annet språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf