Asghar, Rizwan
Publications (10 of 16)
Asghar, R., Wu, D., Saeed, A., Huang, Y. & Liu, D. (2012). Implementation of a Radix-4, Parallel Turbo Decoder and Enabling the Multi-Standard Support. Journal of Signal Processing Systems, 66(1), 25-41
2012 (English). In: Journal of Signal Processing Systems, ISSN 1939-8018, E-ISSN 1939-8115, Vol. 66, no 1, p. 25-41. Article in journal (Refereed), Published
Abstract [en]

This paper presents a unified radix-4 implementation of a turbo decoder covering multiple standards such as DVB, WiMAX, 3GPP-LTE and HSPA Evolution. The radix-4 parallel interleaver is the bottleneck when the same turbo-decoding architecture is used for multiple standards. The paper covers the issues associated with the design of a radix-4 parallel interleaver in order to reach a flexible turbo-decoder architecture. Radix-4 parallel interleaver algorithms and their mapping onto a hardware architecture are presented for multi-mode operation. The overheads associated with hardware multiplexing are found to be minimal. Besides flexibility of the turbo decoder implementation, low silicon cost and low power are also addressed by optimizing the storage scheme for branch metrics and extrinsic information. The proposed unified architecture for radix-4 turbo decoding occupies 0.65 mm² in total in a 65 nm CMOS process. With 4 SISO blocks used in parallel and 6 iterations, it achieves a throughput of up to 173.3 Mbps while consuming 570 mW in total. It provides a good trade-off between silicon cost, power consumption and throughput, with a silicon efficiency of 0.005 mm²/Mbps and an energy efficiency of 0.55 nJ/b/iter.

Place, publisher, year, edition, pages
Springer Verlag (Germany), 2012
Keywords
Turbo codes, VLSI implementation, Radix-4, Parallel interleaver, Multimode
National Category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-75110 (URN); 10.1007/s11265-010-0521-6 (DOI); 000299530700004
Note
Funding agencies: European Commission; Ericson AB; Infineon Austria AG; IMEC; Lund University; KU-Leuven
Available from: 2012-02-21 Created: 2012-02-17 Last updated: 2017-12-07
Asghar, R. (2010). Flexible Interleaving Sub–systems for FEC in Baseband Processors. (Doctoral dissertation). Linköping: Linköping University Electronic Press
2010 (English). Doctoral thesis, monograph (Other academic)
Abstract [en]

Interleaving is always used in combination with error control coding. It spreads burst noise so that it resembles white noise, allowing the noise-induced bit errors to be corrected. With the advancement of communication systems and the substantial increase in bandwidth requirements, coding for forward error correction (FEC) has become an integral part of modern communication systems. If the FEC sub-systems are divided into two categories, i.e. channel coding/decoding and interleaving/de-interleaving, the latter varies more in permutation functions, block sizes and throughput requirements. Interleaving/de-interleaving also consumes more silicon because of the cost of the permutation tables used in conventional LUT-based approaches. For devices with multi-standard support, the silicon cost of the permutation tables can grow much higher, resulting in an inefficient solution. Hardware re-use among different interleaver modules is therefore significant for a multimode processing platform.
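As an illustration of how interleaving spreads a burst, here is a minimal Python sketch of a plain row-column block interleaver. It is not one of the interleavers from the thesis; the function names, the 4 x 6 block size and the burst position are assumptions chosen only for illustration.

```python
def interleave(symbols, rows, cols):
    """Write the block row by row, read it out column by column."""
    assert len(symbols) == rows * cols
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(symbols, rows, cols):
    """Inverse operation: restore the original row-by-row order."""
    assert len(symbols) == rows * cols
    return [symbols[c * rows + r] for r in range(rows) for c in range(cols)]


def longest_run(positions):
    """Length of the longest run of consecutive integers in positions."""
    best = run = 0
    prev = None
    for p in sorted(positions):
        run = run + 1 if prev is not None and p == prev + 1 else 1
        best = max(best, run)
        prev = p
    return best


ROWS, COLS = 4, 6                      # hypothetical block dimensions
data = list(range(ROWS * COLS))        # symbol i carries its own index
sent = interleave(data, ROWS, COLS)

burst = range(8, 12)                   # 4 consecutive channel positions are hit
hit_after_deinterleaving = [sent[i] for i in burst]

print(longest_run(burst))                      # burst length on the channel
print(longest_run(hit_after_deinterleaving))   # longest error run seen by the decoder
```

In this sketch the four corrupted channel positions map back to original positions that are six symbols apart, so the decoder sees isolated errors instead of one burst.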

The breadth of the interleaving algorithms gives rise to many challenges when considering a true multimode interleaver implementation. The main challenges include real-time, low-latency computation of different permutation functions, managing a wide range of interleaving block sizes, higher throughput, low cost, fast and dynamic reconfiguration for different standards, and introducing parallelism wherever necessary.

It is difficult to merge all currently used interleavers into a single architecture because of different algorithms and throughputs; however, the fact that multimode coverage does not require multiple interleavers to work at the same time provides opportunities to use hardware multiplexing. The multimode functionality is then achieved by fast switching between different standards. We used algorithmic-level transformations such as 2-D transformation and realization of recursive computations, which appear to be the key to bringing different interleaving functions to the same level. In general, the work focuses on function-level hardware re-use, but it also utilizes classical data-path-level optimizations for efficient hardware multiplexing among different standards.

The research has resulted in multiple flexible architectures supporting multiple standards. These architectures target both channel interleaving and turbo-code interleaving. The presented architectures can support both types of communication systems, i.e. single-stream and multi-stream systems. Introducing the algorithmic-level transformations and then applying the hardware re-use methodology has resulted in lower silicon cost while supporting sufficient throughput. According to a database search in March 2010, we have the first multimode interleaver core covering WLAN (802.11a/b/g and 802.11n), WiMAX (802.16e), 3GPP-WCDMA, 3GPP-LTE, and DVB-T/H on a single architecture with minimum silicon cost. The research also provides support for parallel interleaver address generation using different architectures. It provides the algorithmic modifications and architectures to generate up to 8 addresses in parallel and handle the memory conflicts on the fly.

One of the vital requirements for multimode operation is fast switching between different standards, which is supported by the presented architectures with minimal cycle cost overheads. Fast switching between different standards allows the baseband processor to re-configure the interleaver architecture on the fly and re-use the same hardware for another standard. Lower silicon cost, maximum flexibility and fast switchability among multiple standards at run time make the proposed research a good choice for radio baseband processing platforms.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2010. p. 189
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 1312
National Category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-54903 (URN); 978-91-7393-397-1 (ISBN)
Public defence
2010-05-20, Visionen, Hus B, Campus Valla, Linköpings universitet, Linköping, 10:15 (English)
Available from: 2010-04-20 Created: 2010-04-20 Last updated: 2010-04-20. Bibliographically approved
Asghar, R., Wu, D., Eilert, J. & Liu, D. (2010). Memory Conflict Analysis and Implementation of a Re-configurable Interleaver Architecture Supporting Unified Parallel Turbo Decoding. Journal of Signal Processing Systems for Signal, Image, and Video Technology, 60(1), 15-29
2010 (English). In: Journal of Signal Processing Systems for Signal, Image, and Video Technology, ISSN 1939-8018, Vol. 60, no 1, p. 15-29. Article in journal (Refereed), Published
Abstract [en]

This paper presents a novel hardware interleaver architecture for unified parallel turbo decoding. The architecture is fully re-configurable among multiple standards such as HSPA Evolution, DVB-SH, 3GPP-LTE and WiMAX. Turbo codes, which are widely used for error correction in today's consumer electronics, tend to introduce higher latency because of larger block sizes and multiple iterations. Many parallel turbo decoding architectures have recently been proposed to enhance the channel throughput, but the interleaving algorithms used in different standards do not readily allow their use because of a high percentage of memory conflicts. The architecture presented in this paper provides a re-configurable platform for implementing the parallel interleavers of different standards by managing the conflicts involved in each. The memory conflicts are managed by applying different approaches such as stream misalignment, memory division and the use of a small FIFO buffer. The proposed flexible architecture is low cost and occupies 0.085 mm² in a 65 nm CMOS process. It can implement up to 8 parallel interleavers and can operate at a frequency of 200 MHz, thus providing significant support to higher-throughput systems based on parallel SISO processors.
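To make the notion of a memory conflict concrete, the following Python sketch counts the cycles in which parallel decoder units would address the same memory bank. It is an illustrative model, not the architecture from the paper. It assumes a quadratic permutation polynomial (QPP) interleaver of the kind used in 3GPP-LTE, with K = 40, f1 = 3, f2 = 10 taken to be the LTE parameters for that block size, four parallel units that each process one contiguous window, and two hypothetical bank mappings.

```python
def qpp(i, k, f1, f2):
    """Quadratic permutation polynomial: pi(i) = (f1*i + f2*i^2) mod k."""
    return (f1 * i + f2 * i * i) % k


def conflict_cycles(k, n_par, perm, bank_of):
    """Count cycles in which two or more of the n_par units hit the same bank."""
    window = k // n_par  # each unit processes one contiguous window of the block
    conflicts = 0
    for t in range(window):
        # Unit p reads the interleaved address of position t + p * window.
        banks = {bank_of(perm(t + p * window)) for p in range(n_par)}
        if len(banks) < n_par:
            conflicts += 1
    return conflicts


K, F1, F2, N_PAR = 40, 3, 10, 4   # assumed example parameters
WINDOW = K // N_PAR


def perm(i):
    return qpp(i, K, F1, F2)


# Mapping 1: memory divided into contiguous sub-blocks, one per bank.
by_division = conflict_cycles(K, N_PAR, perm, lambda addr: addr // WINDOW)
# Mapping 2: bank selected by the low-order part of the address.
by_low_bits = conflict_cycles(K, N_PAR, perm, lambda addr: addr % N_PAR)

print(by_division, by_low_bits)
```

Under these assumptions the script prints `0 10`: with the memory divided into contiguous sub-blocks no cycle has a conflict, while selecting the bank from the low-order address bits produces a conflict in every one of the 10 cycles. Interleavers of other standards do not generally offer a conflict-free mapping this easily, which is where techniques such as stream misalignment or a small FIFO buffer come in.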

National Category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-25599 (URN); 10.1007/s11265-009-0394-8 (DOI); 000276722700002
Note
The original publication is available at www.springerlink.com: Rizwan Asghar, Di Wu, Johan Eilert and Dake Liu, Memory Conflict Analysis and Implementation of a Re-configurable Interleaver Architecture Supporting Unified Parallel Turbo Decoding, 2010, Journal of Signal Processing Systems for Signal, Image, and Video Technology, (60), 1, 15-29. http://dx.doi.org/10.1007/s11265-009-0394-8 Copyright: Springer Science Business Media http://www.springerlink.com/
Available from: 2009-10-13 Created: 2009-10-08 Last updated: 2010-05-10
Asghar, R. & Liu, D. (2010). Multimode flex-interleaver core for baseband processor platform. Journal of Computer Systems, Networks and Communications, 2010, 1-16
2010 (English). In: Journal of Computer Systems, Networks and Communications, ISSN 1687-7381, Vol. 2010, p. 1-16. Article in journal (Refereed), Published
Abstract [en]

This paper presents a flexible interleaver architecture supporting multiple standards such as WLAN, WiMAX, HSPA+, 3GPP-LTE, and DVB. Algorithmic-level optimizations such as 2-D transformation and realization of recursive computation are applied, which appear to be the key to efficient hardware multiplexing among different interleaver implementations. The presented hardware enables the mapping of the vital types of interleavers, including multiple block interleavers and a convolutional interleaver, onto a single architecture. By exploiting the hardware re-use methodology the silicon cost is reduced, and the fully reconfigurable architecture occupies 0.126 mm² in total in a 65 nm CMOS process. It can operate at a frequency of 166 MHz, providing a maximum throughput of up to 664 Mbps for multi-stream systems and 166 Mbps for single-stream communication systems. One of the vital requirements for multimode operation is fast switching between different standards, which is supported by this hardware with minimal cycle cost overheads. Maximum flexibility and fast switchability among multiple standards at run time make the proposed architecture a good choice for radio baseband processing platforms.

Place, publisher, year, edition, pages
Hindawi, 2010
National Category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-55304 (URN); 10.1155/2010/793807 (DOI)
Available from: 2010-04-29 Created: 2010-04-29 Last updated: 2014-12-03
Wu, D., Eilert, J., Asghar, R., Liu, D., Nilsson, A., Tell, E. & Alfredsson, E. (2010). System architecture for 3GPP-LTE modem using a programmable baseband processor. International Journal of Embedded and Real-Time Communication Systems, 1(3), 44-64
2010 (English). In: International Journal of Embedded and Real-Time Communication Systems, ISSN 1947-3176, E-ISSN 1947-3184, Vol. 1, no 3, p. 44-64. Article in journal (Refereed), Published
Abstract [en]

The evolution of third-generation mobile communications toward high-speed packet access and long-term evolution is ongoing and will substantially increase the throughput with higher spectral efficiency. This paper presents the system architecture of an LTE modem based on a programmable baseband processor. The architecture includes a baseband processor that handles time and frequency synchronization, IFFT/FFT (up to 2048-p), channel estimation and subcarrier de-mapping. The throughput and latency requirements of a Category four User Equipment (CAT4 UE) are met by adding a MIMO symbol detector and a parallel Turbo decoder supporting H-ARQ, which brings both low silicon cost and enough flexibility to support other wireless standards. The complexity demonstrated by the modem shows the practicality and advantage of using programmable baseband processors for a single-chip LTE solution. Copyright © 2010, IGI Global.

Place, publisher, year, edition, pages
IGI Global, 2010
Keywords
3GPP; Long-Term Evolution; Programmable; Radio Baseband; Software Defined Radio
National Category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-100694 (URN); 10.4018/jertcs.2010070103 (DOI)
Available from: 2013-11-11 Created: 2013-11-11 Last updated: 2017-12-06
Asghar, R. & Liu, D. (2010). Towards Radix-4, Parallel Interleaver Design to Support High-Throughput Turbo Decoding for Re-Configurability. Paper presented at 33rd IEEE SARNOFF Symposium, Princeton, NJ, USA (pp. 1-5). IEEE
2010 (English). Conference paper, Published paper (Refereed)
Abstract [en]

Parallel radix-4 turbo decoding is used to enhance the throughput and at the same time reduce the overall memory cost. The bottleneck is the higher complexity associated with the radix-4 parallel interleaver implementation. This paper addresses the implementation issues of the radix-4 parallel interleaver and also proposes the necessary modifications to the interleaver algorithms for parallel address generation. It presents a re-configurable architecture which enables the same turbo decoding core to be used for multiple standards. The proposed interleaver architecture is capable of handling the memory conflicts on the fly. It consumes 12.5K gates and can run at a frequency of 285 MHz, thus supporting a throughput of 173.3 Mbps, which can cover most of the emerging communication standards.

Place, publisher, year, edition, pages
IEEE, 2010
Keywords
Radix-4 interleaver, Parallel turbo decoding, HSPA, DVB, WiMAX, LTE
National Category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-55305 (URN); 10.1109/SARNOF.2010.5469723 (DOI); 978-1-4244-5592-8 (ISBN)
Conference
33rd IEEE SARNOFF Symposium, Princeton, NJ, USA
Available from: 2010-04-29 Created: 2010-04-29 Last updated: 2014-10-01
Wu, D., Eilert, J., Asghar, R. & Liu, D. (2010). VLSI Implementation of a Fixed-Complexity Soft-Output MIMO Detector for High-Speed Wireless. EURASIP Journal on Wireless Communications and Networking, 2010(893184)
2010 (English). In: EURASIP Journal on Wireless Communications and Networking, ISSN 1687-1472, E-ISSN 1687-1499, Vol. 2010, no 893184. Article in journal (Refereed), Published
Abstract [en]

This paper presents a low-complexity MIMO symbol detector with close-to-maximum a posteriori performance for the emerging multi-antenna enhanced high-speed wireless communications. The VLSI implementation is based on a novel MIMO detection algorithm called Modified Fixed-Complexity Soft-Output (MFCSO) detection, which achieves a good trade-off between performance and implementation cost compared to the referenced prior art. By including a microcode-controlled channel preprocessing unit and a pipelined detection unit, it is flexible enough to cover several different standards and transmission schemes. The flexibility allows adaptive detection to minimize power consumption without degradation in throughput. The VLSI implementation of the detector is presented to show that real-time MIMO symbol detection of a 20 MHz bandwidth 3GPP LTE and a 10 MHz WiMAX downlink physical channel is achievable at reasonable silicon cost.

Place, publisher, year, edition, pages
Hindawi, 2010
National Category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-67289 (URN); 10.1155/2010/893184 (DOI)
Available from: 2011-04-07 Created: 2011-04-07 Last updated: 2017-12-11
Wu, D., Eilert, J., Asghar, R., Liu, D. & Ge, Q. (2010). VLSI Implementation of A Multi-Standard MIMO Symbol Detector for 3GPP LTE and WiMAX. In: Wireless Telecommunications Symposium (WTS), 2010. Paper presented at 9th IEEE Wireless Telecommunication Symposium, WTS'10 (pp. 1-4). IEEE
2010 (English). In: Wireless Telecommunications Symposium (WTS), 2010, IEEE, 2010, p. 1-4. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, a low-complexity symbol detector is presented targeting the emerging 3GPP LTE and WiMAX standards. The detector is the VLSI implementation of a recently proposed novel MIMO detection algorithm. Compared to the design in the reference, the detector performs better while consuming less silicon area. By including a microcode-controlled channel preprocessing unit and a pipelined detection unit, it is flexible enough to cover different standards and transmission schemes while maintaining power and area efficiency. Implemented in a 65 nm CMOS process, the detector can support real-time detection of a 20 MHz bandwidth 3GPP LTE or a 10 MHz WiMAX downlink physical channel.

Place, publisher, year, edition, pages
IEEE, 2010
National Category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-56218 (URN); 10.1109/WTS.2010.5479665 (DOI); 978-1-4244-6558-3 (ISBN)
Conference
9th IEEE Wireless Telecommunication Symposium, WTS'10
Available from: 2010-05-01 Created: 2010-05-01 Last updated: 2014-09-01
Asghar, R. & Liu, D. (2009). 2-D Realization of WiMAX Channel Interleaver for Efficient Hardware Implementation. In: Proceedings of World Academy of Science, Engineering and Technology (ISSN: 2070-3740). Paper presented at International Conference on Wireless Communication and Sensor Networks (pp. 25-29).
2009 (English). In: Proceedings of World Academy of Science, Engineering and Technology (ISSN: 2070-3740), 2009, p. 25-29. Conference paper, Published paper (Refereed)
Abstract [en]

The direct implementation of the interleaver functions in WiMAX is not hardware efficient because of the presence of complex functions. The conventional method, i.e. using memories to store the permutation tables, is also silicon consuming. This work presents a 2-D transformation of the WiMAX channel interleaver functions which reduces the overall hardware complexity of computing the interleaver addresses on the fly. A fully re-configurable architecture for address generation in the WiMAX channel interleaver is presented, which consumes 1.1 k-gates in total. It can be configured for any block size and any modulation scheme in WiMAX. The presented architecture can run at a frequency of 200 MHz, thus fully supporting the high bandwidth requirements of WiMAX.
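The idea of replacing a closed-form permutation with a 2-D (row and column) view can be sketched in Python as follows. The abstract does not state the exact WiMAX functions or the transformation used in the paper, so this sketch assumes a generic block-interleaver permutation of the form m(k) = (N/d)·(k mod d) + floor(k/d). The block size N = 96, the column count d = 16 and both function names are hypothetical.

```python
def addresses_direct(n, d):
    """Closed form: needs a modulo, a division and a multiplication per address."""
    return [(n // d) * (k % d) + k // d for k in range(n)]


def addresses_2d(n, d):
    """Same sequence from a row counter, a column counter and one adder."""
    assert n % d == 0
    step = n // d          # distance between neighbouring columns
    out = []
    row = col = addr = 0
    for _ in range(n):
        out.append(addr)
        col += 1
        if col == d:       # end of a row: wrap the column, advance the row
            col = 0
            row += 1
            addr = row
        else:              # stay in the row: just add the constant step
            addr += step
    return out


N, D = 96, 16              # assumed example block size and column count
print(addresses_2d(N, D) == addresses_direct(N, D))
```

The counter-based version produces one address per iteration using only increments, a comparison and an addition, which is the kind of structure that maps to a small on-the-fly address generator instead of a stored permutation table.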

Series
World Academy of Science, Engineering and Technology, ISSN 2070-3740
Keywords
Interleaver, deinterleaver, WiMAX, 802.16e
National Category
Engineering and Technology; Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:liu:diva-55303 (URN)
Conference
International Conference on Wireless Communication and Sensor Networks
Available from: 2010-04-29 Created: 2010-04-29 Last updated: 2013-07-19
Wu, D., Asghar, R., Huang, Y. & Liu, D. (2009). Implementation of a high-speed parallel Turbo decoder for 3GPP LTE terminals. Paper presented at IEEE 8th International Conference on ASIC, ASICON'09. IEEE
2009 (English). Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents a parameterized parallel Turbo decoder for 3GPP LTE terminals. To support the high peak data rate defined in the forthcoming 3GPP LTE standard, a turbo decoder with a throughput beyond 150 Mbit/s is needed as a key component of the radio baseband chip. By exploiting the trade-off between precision, speed and area, a turbo decoder with eight parallel SISO units is implemented to meet the throughput requirement. The turbo decoder is synthesized, placed and routed in 65 nm CMOS technology. It achieves a throughput of 152 Mbit/s and occupies an area of 0.7 mm², with an estimated power consumption of 650 mW.

Place, publisher, year, edition, pages
IEEE, 2009
National Category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-56217 (URN); 10.1109/ASICON.2009.5351623 (DOI); 000275924100117; 978-1-4244-3868-6 (ISBN)
Conference
IEEE 8th International Conference on ASIC, ASICON'09.
Available from: 2010-05-01 Created: 2010-05-01 Last updated: 2010-08-12