liu.se: Search for publications in DiVA
  • 101.
    Tell, Eric
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Nilsson, Anders
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Implementation of a Programmable Baseband Processor. 2005. In: Radiovetenskap och Kommunikation RVK, 2005, 2005. Conference paper (Refereed)
  • 102.
    Tell, Eric
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Olausson, Mikael
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    A General DSP processor at the cost of 23k gates and 1/2 a man-year design time. 2003. In: International Conference on Acoustics, Speech and Signal Processing, 2003, 2003, p. 657-. Conference paper (Refereed)
  • 103.
    Tell, Eric
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Seger, Olle
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    A Converged Hardware Solution for FFT, DCT and Walsh Transform. 2003. In: International Symposium on Signal Processing and its Applications, 2003, 2003, p. 609-. Conference paper (Refereed)
  • 104.
    Ul Haque, Muhammad Fahim
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Johansson, Ted
    Linköpings universitet, Institutionen för systemteknik, Elektroniska Kretsar och System. Linköpings universitet, Tekniska fakulteten.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Large dynamic range PWM transmitter. 2016. Conference paper (Other (popular science, discussion, etc.))
  • 105.
    Ul Haque, Muhammad Fahim
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Johansson, Ted
    Linköpings universitet, Institutionen för systemteknik, Elektroniska Kretsar och System. Linköpings universitet, Tekniska fakulteten.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Power Efficient Band-limited Pulse Width Modulated Transmitter. 2015. Conference paper (Other (popular science, discussion, etc.))
  • 106.
    Wang, Jian
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Sohl, Joar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Kraigher, Olof
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    ePUMA: a novel embedded parallel DSP platform for predictable computing. 2010. Conference paper (Refereed)
    Abstract [en]

    In this paper, a novel parallel DSP platform based on a master-multi-SIMD architecture is introduced. The platform is named ePUMA [1]. The essential technique is to use separate data access kernels and algorithm kernels, and to minimize the communication overhead of parallel processing by running the two types of kernels in parallel. The ePUMA platform is optimized for predictable computing. According to benchmarking results, a memory subsystem design that relies on regular and predictable memory accesses can dramatically improve performance. Because the platform is scalable, the chip area is estimated for different numbers of co-processors. The aim of the ePUMA parallel platform is to achieve low-power, high-performance embedded parallel computing with low silicon cost for communications and similar signal processing applications.

  • 107.
    Wang, Jian
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Karlsson, Andréas
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Sohl, Joar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Convolutional Decoding on Deep-pipelined SIMD Processor with Flexible Parallel Memory. 2012. In: Digital System Design (DSD), 2012, IEEE, 2012, p. 529-532. Conference paper (Refereed)
    Abstract [en]

    Single Instruction Multiple Data (SIMD) architecture has proven to be a suitable parallel processor architecture for media and communication signal processing. However, computing overheads such as memory access latency and vector data permutation limit the performance of conventional SIMD processors. Solutions such as combined VLIW and SIMD architectures come at the cost of increased complexity in compiler design and assembly programming. This paper introduces the SIMD processor in the ePUMA platform, which uses a deep execution pipeline and flexible parallel memory to achieve high computing performance. Its deep pipeline can execute combined operations in one cycle, and the parallel memory architecture supports conflict-free parallel data access. It solves the problem of large vector permutation in a short-vector SIMD machine more efficiently than conventional vector permutation instructions. We evaluate the architecture by implementing the soft-decision Viterbi algorithm for convolutional decoding. The result is compared with other architectures, including TI C54x, CEVA TeakLite III, and PowerPC AltiVec, to show ePUMA’s computing efficiency advantage.

  • 108.
    Wang, Jian
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Karlsson, Andréas
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Sohl, Joar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Pettersson, Magnus
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    A multi-level arbitration and topology free streaming network for chip multiprocessor. 2011. In: ASIC (ASICON), 2011, IEEE, 2011, p. 153-158. Conference paper (Refereed)
    Abstract [en]

    Predictable computing is common in embedded signal processing, whose communication is characterized by data-independent memory access and long streaming data transfers. This paper presents StreamNet, a streaming network-on-chip (NoC) for chip multiprocessor (CMP) platforms targeting predictable signal processing. The network is circuit-switched and uses a two-level arbitration scheme. The first level uses fast hardware arbitration, and the second level is programmable software arbitration. Its communication protocol is designed to support free choice of network topology. Together with its scheduling tool, the network can achieve high communication efficiency and improve parallel computing performance. This NoC architecture is used to design the Ring network in the ePUMA multiprocessor DSP. An evaluation using a multi-user signal processing application at an LTE base station shows low parallel computing overhead on the ePUMA multiprocessor platform.

  • 109.
    Wang, Jian
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Sohl, Joar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Architectural Support for Reducing Parallel Processing Overhead in an Embedded Multiprocessor. 2010. In: EUC '10 Proceedings of the 2010 IEEE/IFIP International Conference on Embedded and Ubiquitous Computing, Washington, DC, USA: IEEE Computer Society, 2010, p. 47-52. Conference paper (Refereed)
    Abstract [en]

    The host-multi-SIMD chip multiprocessor (CMP) architecture has proven to be an efficient architecture for high-performance signal processing, exploiting both task-level parallelism through multi-core processing and data-level parallelism through SIMD processors. Unlike the cache-based memory subsystem in most general-purpose processors, this architecture uses on-chip scratchpad memory (SPM) as the processor-local data buffer and allows software to explicitly control data movement in the memory hierarchy. This SPM-based solution is more efficient for predictable signal processing in embedded systems, where data access patterns are known at design time. Predictable performance is especially important for real-time signal processing. According to Amdahl's law, the non-parallelizable part of an algorithm has a critical impact on overall performance. Implementing an algorithm on a parallel platform usually produces control and communication overhead that is not parallelizable. This paper presents the architectural support in an embedded multiprocessor platform for reducing the parallel processing overhead as far as possible. The effectiveness of these architectural designs in boosting parallel performance is evaluated with an implementation example of 64x64 complex matrix multiplication. The result shows that the parallel processing overhead is reduced from 369% to 28%.
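
    The Amdahl's law argument in the abstract above can be made concrete with a small sketch. The function name `amdahl_speedup` and the example numbers (a 90% parallel fraction on 8 processors) are illustrative assumptions, not figures from the paper; the 369% and 28% overhead values reported there are not derived here.

```python
def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Ideal speedup under Amdahl's law.

    parallel_fraction: share of the single-processor run time that can be
                       spread across processors (0.0 to 1.0).
    n_processors:      number of processors working on the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part runs at full cost; the parallel part shrinks by n.
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


if __name__ == "__main__":
    # Hypothetical workload: 90% parallelizable, 8 processors.
    print(f"speedup: {amdahl_speedup(0.9, 8):.2f}")
```

    With these assumed numbers the 10% serial share caps the speedup well below 8, which is why control and communication overhead that cannot be parallelized matters so much.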

  • 110.
    Wang, Jian
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Sohl, Joar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Karlsson, Andréas
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    An Efficient Streaming Star Network for Multi-core Parallel DSP Processor. 2011. In: Networking and Computing (ICNC), 2011, IEEE, 2011, p. 332-336. Conference paper (Refereed)
    Abstract [en]

    As more and more computing components are integrated into one digital signal processing (DSP) system to achieve high computing power by executing tasks in parallel, inter-processor and processor-to-memory communication overheads become the performance bottleneck and limit the scalability of a multi-processor platform. For chip multiprocessor (CMP) DSP systems targeting predictable computing, understanding the communication characteristics is essential to designing an efficient interconnection architecture and improving performance. This paper presents a Star network designed for the ePUMA multi-core DSP processor, based on an analysis of the network communication models. As part of ePUMA’s multi-layer interconnection network, the Star network handles core to off-chip memory communications for kernel computing on slave processors. The network has short setup latency, easy multiprocessor synchronization, rich memory addressing patterns, and power-efficient streaming data transfer. The improved network efficiency is evaluated in comparison with a previous study.

  • 111.
    Wang, Jian
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Sohl, Joar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Kraigher, Olof
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Software programmable data allocation in multi-bank memory of SIMD processors. 2010. In: Proceedings of the 2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools / [ed] Sebastian Lopez, Washington, DC, USA: IEEE Computer Society, 2010, p. 28-33. Conference paper (Refereed)
    Abstract [en]

    The host-SIMD style heterogeneous multi-processor architecture offers high computing performance and user-friendly programmability. It exploits both task-level parallelism and data-level parallelism through multiple on-chip SIMD coprocessors. For embedded DSP applications with predictable computing characteristics, this architecture can be further optimized for performance, implementation cost and power consumption. The optimization can be done by improving the SIMD processing efficiency and reducing redundant memory accesses and data shuffle operations. This paper introduces one effective approach: a software programmable multi-bank memory system for SIMD processors. Both the hardware architecture and the software programming model are described, with an implementation example of the BLAS syrk routine. The proposed memory system offers high SIMD data access flexibility by using lookup-table-based address generators and by applying data permutations on both the DMA controller interface and SIMD data access. The evaluation results show that the SIMD processor with this memory system can achieve high execution efficiency, with only 10% to 30% overhead. The proposed memory system also saves implementation cost on SIMD local registers: in our system, each SIMD core has only eight 128-bit vector registers.
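
    Why data placement in a multi-bank memory matters can be shown with a small sketch. It does not reproduce the paper's lookup-table address generators; it swaps in a classic row-skewing layout as a stand-in, and every name here (`N_BANKS`, `plain_address`, `skewed_address`, `conflicts`) is hypothetical. The assumption is an 8-bank memory where word address `a` lives in bank `a mod 8` and a SIMD access reads 8 words at once.

```python
N_BANKS = 8   # assumed number of memory banks (one word per bank per cycle)
WIDTH = 8     # assumed matrix width, equal to the SIMD vector length


def bank_of(address: int) -> int:
    """Bank that holds a given word address (simple interleaving)."""
    return address % N_BANKS


def plain_address(row: int, col: int) -> int:
    """Ordinary row-major placement."""
    return row * WIDTH + col


def skewed_address(row: int, col: int) -> int:
    """Row-major placement with each row rotated by its row index."""
    return row * WIDTH + (col + row) % WIDTH


def conflicts(addresses) -> int:
    """Number of accesses that collide with another access in the same bank."""
    banks = [bank_of(a) for a in addresses]
    return len(banks) - len(set(banks))


if __name__ == "__main__":
    column_plain = [plain_address(r, 0) for r in range(WIDTH)]
    column_skewed = [skewed_address(r, 0) for r in range(WIDTH)]
    row_skewed = [skewed_address(3, c) for c in range(WIDTH)]
    print("column, plain layout :", conflicts(column_plain), "conflicts")
    print("column, skewed layout:", conflicts(column_skewed), "conflicts")
    print("row, skewed layout   :", conflicts(row_skewed), "conflicts")
```

    In the plain layout every element of a column falls into the same bank, so a column read serializes; with the skewed layout both row and column reads touch each bank once. A programmable address table generalizes this idea to access patterns chosen by software.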

  • 112.
    Wang, Qi
    et al.
    Wu, Di
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Eilert, Johan
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Cost Analysis of Channel Estimation in MIMO-OFDM for Software Defined Radio. 2008. In: WCNC 2008: IEEE WIRELESS COMMUNICATIONS & NETWORKING CONFERENCE, IEEE, 2008, p. 935-939. Conference paper (Refereed)
    Abstract [en]

    Channel State Information (CSI) is critical for the overall performance of wireless systems. Meanwhile, the estimation of CSI forms one of the most intensive tasks in radio baseband signal processing. This paper investigates the real-time implementation of channel estimation for MIMO-OFDM systems using programmable hardware aimed at software defined radio. Based on the programmable hardware architecture proposed by us, several prevalent channel estimation methods such as Least Square (LS), Minimum Mean Square Error (MMSE) and Pilot-Symbol-Aided (PSA) are evaluated from both the performance and computational latency perspectives. By utilizing the symmetric feature of the covariance matrix, a simplified two-sided Jacobi rotation method is adopted to speed up the complex-valued singular value decomposition involved in the MMSE channel estimation.
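
    As a rough illustration of the simplest method named above, Least Square (LS) estimation, the sketch below estimates one channel coefficient per pilot subcarrier by dividing the received symbol by the known pilot. It is a deliberately reduced single-antenna case, not the MIMO-OFDM scheme evaluated in the paper, and the function name and the sample values are assumptions chosen for the example.

```python
from typing import List, Sequence


def ls_channel_estimate(received: Sequence[complex],
                        pilots: Sequence[complex]) -> List[complex]:
    """Per-subcarrier LS estimate: H[k] = Y[k] / X[k] for known pilots X[k]."""
    if len(received) != len(pilots):
        raise ValueError("received and pilots must have the same length")
    if any(x == 0 for x in pilots):
        raise ValueError("pilot symbols must be non-zero")
    return [y / x for y, x in zip(received, pilots)]


if __name__ == "__main__":
    # Hypothetical channel and pilot values, noise-free for clarity.
    true_channel = [0.5 + 0.5j, 2.0 + 0.0j, 0.0 - 1.0j]
    pilots = [1 + 0j, -1 + 0j, 0 + 1j]
    received = [h * x for h, x in zip(true_channel, pilots)]
    print(ls_channel_estimate(received, pilots))
```

    Without noise the estimate matches the channel exactly; with noise, LS passes the noise straight into the estimate, which is the usual motivation for the costlier MMSE method.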

  • 113.
    Wiklund, Daniel
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Ehliar, Andreas
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Design of an internet core router using the SoCBUS network on chip. 2005. In: International Symposium on Signals, Circuits, and Systems ISSCS, 2005, 2005. Conference paper (Refereed)
  • 114.
    Wiklund, Daniel
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Design, mapping, and simulations of a 3G WCDMA/FDD basestation using network on chip. 2005. In: International workshop on SoC for real-time applications, 2005, 2005. Conference paper (Refereed)
  • 115.
    Wiklund, Daniel
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Design of a system-on-chip switched network and its design support. 2002. In: IEEE 2002 International Conference on Communications, Circuits and Systems and West Sino Expositions, 2002, p. 1279-1283. Conference paper (Refereed)
    Abstract [en]

    As the degree of integration increases, the on-chip communication is becoming a bottleneck. A solution to this problem is to use an on-chip switched interconnect network. Such a system-on-chip network was proposed in 2000 by the same authors. In this paper, we present the system-on-chip network in detail together with the design flow support. The choice of topology for the network, as well as some ways to use the network to overcome the future physical implementation issues of wire delay, and to gain performance, is also discussed. To aid the design choices of the network, a behavioral simulator has been created. The importance of the behavioral simulator is clearly shown from the design flow and the design and implementation of this simulator is discussed in detail.

  • 116.
    Wiklund, Daniel
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    SoCBUS: switched network on chip for hard real time embedded systems. 2003. In: Proceedings. International Parallel and Distributed Processing Symposium, 2003, 2003, p. 78-. Conference paper (Refereed)
    Abstract [en]

    With the current trend toward integrating more complex systems on chip, there is a need for better on-chip communication infrastructure that increases the available bandwidth and simplifies interface verification. We have previously proposed a circuit-switched two-dimensional mesh network known as SoCBUS that increases performance and lowers the cost of verification. In this paper, the SoCBUS is explained together with the working principles of the transaction handling. We also introduce the concept of packet connected circuit, PCC, where a packet is switched through the network, locking the circuit as it goes. PCC is deadlock free and does not impose any unnecessary restrictions on the system, while being simple and efficient in implementation. SoCBUS uses this PCC scheme to set up routes through the network. We introduce a possible application, a telephone to voice-over-IP gateway, and use it to show that the SoCBUS has very good properties in bandwidth, latency, and complexity when used in a hard real-time system with scheduling of the traffic. The simulation analysis of the SoCBUS in the application shows that a certain SoCBUS setup can handle 48000 channels of voice data, including buffer swapping, in a single chip. We also show that the SoCBUS is not suitable for general-purpose computing platforms that exhibit random traffic patterns, but that the SoCBUS shows acceptable performance when the traffic is mainly local.
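
    The idea of a request packet that locks links as it travels can be sketched with a toy model. This is not the SoCBUS implementation: the class and method names are hypothetical, the XY (column-first, then row) route is an assumed routing choice, and the rule that a blocked request releases everything it has locked and leaves the retry to the source is an assumption made for this sketch.

```python
from typing import List, Optional, Set, Tuple

Node = Tuple[int, int]
Link = Tuple[Node, Node]


class MeshNetwork:
    """Toy 2D mesh in which a circuit is set up hop by hop."""

    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.height = height
        self.locked: Set[Link] = set()  # directed links held by circuits

    def _check(self, node: Node) -> None:
        x, y = node
        if not (0 <= x < self.width and 0 <= y < self.height):
            raise ValueError(f"node {node} is outside the mesh")

    def xy_path(self, src: Node, dst: Node) -> List[Node]:
        """Nodes visited when moving along x first, then along y."""
        self._check(src)
        self._check(dst)
        x, y = src
        path = [src]
        while x != dst[0]:
            x += 1 if dst[0] > x else -1
            path.append((x, y))
        while y != dst[1]:
            y += 1 if dst[1] > y else -1
            path.append((x, y))
        return path

    def setup(self, src: Node, dst: Node) -> Optional[List[Link]]:
        """Lock each link in turn; on a busy link, undo and report failure."""
        path = self.xy_path(src, dst)
        taken: List[Link] = []
        for link in zip(path, path[1:]):
            if link in self.locked:
                self.release(taken)   # never wait while holding links
                return None
            self.locked.add(link)
            taken.append(link)
        return taken

    def release(self, circuit: List[Link]) -> None:
        """Free all links of a circuit once the transfer is done."""
        for link in circuit:
            self.locked.discard(link)


if __name__ == "__main__":
    net = MeshNetwork(3, 3)
    first = net.setup((0, 0), (2, 0))
    print("first circuit:", first)
    print("overlapping request:", net.setup((1, 0), (2, 1)))
    net.release(first)
    print("after release:", net.setup((1, 0), (2, 1)))
```

    In this model a request never waits while holding links, so two requests cannot block each other indefinitely; the cost is that a blocked source has to try again later.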

  • 117.
    Wiklund, Daniel
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Switched interconnect for system-on-a-chip design. 2000. In: Proceedings of the IP2000 Europe Conference, 2000, p. 185-192. Conference paper (Other academic)
    Abstract [en]

    With the increased use of IP cores in chip designs, an increasing amount of time is spent on design and verification of glue logic. To solve this problem together with the bottleneck problem of arbitration-based buses, a novel approach to system-on-a-chip interconnect has been investigated. The approach is based on a switched interconnect structure, with small crossbar switches connected in a mesh for inter-core communications with low latency in system-on-chip solutions. The interfaces between the interconnect network and the cores are handled by configurable wrappers that adapt the port parameters from core to network format. The core functionality of the interconnect network can be fully verified with a fairly low work effort even when configurable, so the main obstacle to cutting verification time is the quite complex wrappers. The concept is to make the wrappers highly configurable yet needing short verification time in an application, by making a fairly complete verification of the wrappers for all configurations. How this can be achieved is under investigation. The approach described in this paper is mainly intended for use in communication equipment where high bandwidth and low latency are essential.

  • 118.
    Wiklund, Daniel
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Sathe, Sumant
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Benchmarking of On-Chip Interconnection Networks. 2004. In: International Conference on Microelectronics, ICM, 2004, 2004. Conference paper (Refereed)
  • 119.
    Wiklund, Daniel
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Sathe, Sumant
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Network on chip simulations for benchmarking. 2004. In: International Workshop on SoC for real-time applications, 2004, 2004. Conference paper (Refereed)
  • 120.
    Wu, Di
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Asghar, Rizwan
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Huang, Yulin
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Implementation of a high-speed parallel Turbo decoder for 3GPP LTE terminals. 2009. Conference paper (Refereed)
    Abstract [en]

    This paper presents a parameterized parallel Turbo decoder for 3GPP LTE terminals. To support the high peak data rate defined in the forthcoming 3GPP LTE standard, a turbo decoder with a throughput beyond 150 Mbit/s is needed as a key component of the radio baseband chip. By exploiting the tradeoff between precision, speed and area consumption, a turbo decoder with eight parallel SISO units is implemented to meet the throughput requirement. The turbo decoder is synthesized, placed and routed using 65 nm CMOS technology. It achieves a throughput of 152 Mbit/s and occupies an area of 0.7 mm², with an estimated power consumption of 650 mW.

  • 121.
    Wu, Di
    et al.
    Linköpings universitet, Institutionen för systemteknik. Linköpings universitet, Tekniska högskolan.
    Eilert, Johan
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Asghar, Rizwan
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    VLSI Implementation of a Fixed-Complexity Soft-Output MIMO Detector for High-Speed Wireless. 2010. In: EURASIP Journal on Wireless Communications and Networking, ISSN 1687-1472, E-ISSN 1687-1499, Vol. 2010, no. 893184. Article in journal (Refereed)
    Abstract [en]

    This paper presents a low-complexity MIMO symbol detector with close-Maximum a posteriori performance for the emerging multiantenna enhanced high-speed wireless communications. The VLSI implementation is based on a novel MIMO detection algorithm called Modified Fixed-Complexity Soft-Output (MFCSO) detection, which achieves a good trade-off between performance and implementation cost compared to the referenced prior art. By including a microcode-controlled channel preprocessing unit and a pipelined detection unit, it is flexible enough to cover several different standards and transmission schemes. The flexibility allows adaptive detection to minimize power consumption without degradation in throughput. The VLSI implementation of the detector is presented to show that real-time MIMO symbol detection of 20 MHz bandwidth 3GPP LTE and 10 MHz WiMAX downlink physical channel is achievable at reasonable silicon cost.

  • 122.
    Wu, Di
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Eilert, Johan
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Asghar, Rizwan
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Ge, Qun
    Linköpings universitet, Institutionen för systemteknik. Linköpings universitet, Tekniska högskolan.
    VLSI Implementation of A Multi-Standard MIMO Symbol Detector for 3GPP LTE and WiMAX. 2010. In: Wireless Telecommunications Symposium (WTS), 2010, IEEE, 2010, p. 1-4. Conference paper (Refereed)
    Abstract [en]

    In this paper, a low-complexity symbol detector is presented targeting the emerging 3GPP LTE and WiMAX standards. The detector is the VLSI implementation of a recently proposed novel MIMO detection algorithm. Compared to the design in the reference, the detector performs better while consuming less silicon area. By including a microcode-controlled channel preprocessing unit and a pipelined detection unit, it is flexible enough to cover different standards and transmission schemes while maintaining power and area efficiency. Implemented in a 65 nm CMOS process, the detector can support real-time detection of a 20 MHz bandwidth 3GPP LTE or 10 MHz WiMAX downlink physical channel.

  • 123.
    Wu, Di
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Eilert, Johan
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Asghar, Rizwan
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Nilsson, A.
    Coresonic AB, Sweden.
    Tell, E.
    Coresonic AB, Sweden.
    Alfredsson, E.
    Coresonic AB, Sweden.
    System architecture for 3GPP-LTE modem using a programmable baseband processor. 2010. In: International Journal of Embedded and Real-Time Communication Systems, ISSN 1947-3176, E-ISSN 1947-3184, Vol. 1, no. 3, p. 44-64. Article in journal (Refereed)
    Abstract [en]

    The evolution of third generation mobile communications toward high-speed packet access and long-term evolution is ongoing and will substantially increase the throughput with higher spectral efficiency. This paper presents the system architecture of an LTE modem based on a programmable baseband processor. The architecture includes a baseband processor that handles processing time and frequency synchronization, IFFT/FFT (up to 2048-p), channel estimation and subcarrier de-mapping. The throughput and latency requirements of a Category four User Equipment (CAT4 UE) are met by adding a MIMO symbol detector and a parallel Turbo decoder supporting H-ARQ, which brings both low silicon cost and enough flexibility to support other wireless standards. The complexity demonstrated by the modem shows the practicality and advantage of using programmable baseband processors for a single-chip LTE solution. Copyright © 2010, IGI Global.

  • 124.
    Wu, Di
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Eilert, Johan
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    A Programmable Lattice-Reduction Aided Detector for MIMO-OFDMA. 2008. In: 4th IEEE International Conference on Circuits and Systems, ICCSC, 2008, IEEE, 2008, p. 293-297. Conference paper (Refereed)
    Abstract [en]

    This paper presents the first programmable Lattice-Reduction Aided (LRA) symbol detector for Multiple-Input Multiple-Output (MIMO) and Orthogonal Frequency Division Multiple Access (OFDMA). The detector proposed is implemented using 65 nm ASIC technologies. Owing to the programmability, the detector can be dynamically switched between linear (e.g. MMSE) and lattice-reduction aided (e.g. LRA-MMSE) detectors by simply running another software subroutine. Therefore, it allows a good trade-off between performance and computational latency to be achieved under various scenarios. Along with the hardware, two algorithm simplifications (SCNT-LR and SOT-LR) are proposed for finding subcarriers with ill-conditioned channel matrices. And in the end, interpolated LR (I-LR) is proposed to further reduce the computational complexity for real-time implementations.

  • 125.
    Wu, Di
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Eilert, Johan
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Evaluation of MIMO Symbol Detectors for 3GPP LTE Terminals. 2009. In: 17th European Signal Processing Conference (EUSIPCO), 2009. Conference paper (Refereed)
    Abstract [en]

    This paper investigates various MIMO detection methods for 3GPP LTE open-loop downlink multi-antenna transmission. Targeting VLSI implementation, these detection methods are evaluated with respect to complexity and detection performance. A realistic 3GPP LTE simulation chain is developed for the evaluation. The result shows that with the aid of Hybrid Automatic Repeat reQuest (H-ARQ), a recently proposed reduced complexity close-ML detector called MFCSO achieves a good tradeoff between achievable throughput and complexity. An adaptive transmission and detection scheme is also proposed based on user scenarios.

  • 126.
    Wu, Di
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Eilert, Johan
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Implementation of a High-Speed MIMO Soft-Output Symbol Detector for Software Defined Radio. 2011. In: Journal of Signal Processing Systems, ISSN 1939-8115, Vol. 63, no. 1, pp. 27-37. Article in journal (Refereed)
    Abstract [en]

    This paper presents a programmable MMSE soft-output MIMO symbol detector that supports the 600 Mbps data rate defined in 802.11n. The detector is implemented using a multi-core floating-point processor and a configurable soft-bit demapper. Owing to the dynamic range supplied by the floating-point SIMD datapath, special algorithms can be adopted to reduce the computational latency of channel processing with sufficient numerical stability for large channel matrices. Compared to several existing fixed-function solutions, the detector proposed in this paper is smaller and faster. More importantly, it is programmable and configurable, so it can support various MIMO transmission schemes defined by different standards.

  • 127.
    Wu, Di
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Eilert, Johan
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Lattice-Reduction Aided Multi-User STBC Decoding with Resource Constraints. 2007. In: 2007 IEEE 18th International Symposium on Personal, Indoor and Mobile Radio Communications, IEEE, 2007, pp. 192-196. Conference paper (Refereed)
    Abstract [en]

    Recently, lattice-reduction aided decoders have been proposed for MIMO systems to achieve near Maximum Likelihood decoder performance while maintaining reasonable complexity. This paper studies the implementation of lattice-reduction aided linear decoders on a programmable device for multi-user space-time block coding (MU-STBC). By reloading software, the device can be configured to use different decoding schemes according to the amount of resources available, which is an important feature of cognitive radio. Two lattice-reduction aided linear decoding methods for MU-STBC, namely SQRD-LR and AQRD-LR, are evaluated based on their BER performance and computational complexity. Furthermore, the effect of a deadline constraint on LR is evaluated and, based on that evaluation, a new method called adaptive decoding is proposed. It allows mode-switching of the decoder according to the environment parameters, so that the best decoder performance can always be achieved while fulfilling the resource constraints.

  • 128.
    Wu, Di
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Eilert, Johan
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Wang, Dandan
    Al-Dhahir, Naofal
    Minn, Hlaing
    Fast Complex Valued Matrix Inversion for Multi-User STBC-MIMO Decoding. 2007. In: IEEE Computer Society Annual Symposium on VLSI, ISVLSI, 2007, IEEE, 2007, pp. 325-330. Conference paper (Refereed)
    Abstract [en]

    This paper studies efficient complex matrix inversion for multi-user STBC-MIMO decoding. A novel method called Alamouti blockwise analytical matrix inversion (ABAMI) and its programmable VLSI implementation are proposed for the inversion of (in this context) large complex matrices with Alamouti sub-blocks. Our solution significantly reduces the number of operations, which makes it more than 4 times faster than several other solutions in the literature. Furthermore, compared to these fixed-function VLSI implementations, our solution is more flexible and consumes less silicon area because the hardware can be reused for many other operations. In addition to the routine analysis of the general computational complexity based on the number of basic operations, the computational latency is also measured in clock cycles based on the conceptual hardware for real-time matrix inversion.
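
    The abstract above does not spell out the ABAMI algorithm, so the following is only an illustrative sketch of the well-known property that makes Alamouti sub-blocks cheap to invert: for a block A = [[a, b], [-conj(b), conj(a)]], the product A·A^H equals (|a|² + |b|²)·I, so the inverse is the conjugate transpose divided by one real scalar and is again an Alamouti block. The function names are hypothetical and do not come from the paper.

```python
# Illustrative sketch only; this is NOT the ABAMI algorithm from the paper.
# It shows why a single 2x2 Alamouti block can be inverted analytically.

def alamouti(a, b):
    """Build the 2x2 Alamouti block [[a, b], [-conj(b), conj(a)]]."""
    return [[a, b], [-b.conjugate(), a.conjugate()]]


def alamouti_inverse(block):
    """Invert an Alamouti block using A^-1 = A^H / (|a|^2 + |b|^2)."""
    a, b = block[0][0], block[0][1]
    energy = abs(a) ** 2 + abs(b) ** 2  # A @ A^H = energy * I
    if energy == 0:
        raise ZeroDivisionError("singular Alamouti block")
    # Conjugate transpose of the block, scaled by the real scalar 1/energy.
    return [[a.conjugate() / energy, -b / energy],
            [b.conjugate() / energy, a / energy]]


def matmul2(x, y):
    """Plain 2x2 complex matrix product, used only to check the result."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


if __name__ == "__main__":
    A = alamouti(1 + 2j, 3 - 1j)
    print(matmul2(A, alamouti_inverse(A)))  # close to the identity matrix
```

    The inverse needs no general-purpose elimination, only conjugations and one real division, which is the kind of saving a blockwise analytical method can build on.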

  • 129.
    Wu, Di
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Hu, Tiejun
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    A Single Scalar DSP based Programmable H.264 Decoder. 2005. In: Swedish System on Chip Conference SSoCC, 2005, 2005. Conference paper (Other academic)
  • 130.
    Wu, Di
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Hu, Tiejun
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    A Single-Issue DSP based Multi-standard Media Processor for Mobile Platform. 2006. In: ARCS, 2006, 2006. Conference paper (Refereed)
  • 131.
    Wu, Di
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Karlström, Per
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Eilert, Johan
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Ehliar, Andreas
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Media DSP: An Application Specific Heterogeneous Multiprocessor SoC. 2006. In: SSoCC Swedish System-on-Chip Conference, 2006, 2006. Conference paper (Other academic)
  • 132.
    Wu, Di
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Li, Yi-Hsien
    Eilert, Johan
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Real-Time Space-Time Adaptive Processing on the STI CELL Multiprocessor. 2007. In: 4th European Radar Conference, 2007, IEEE, 2007, pp. 71-74. Conference paper (Refereed)
    Abstract [en]

    Space-time adaptive processing (STAP) has been widely used in modern radar systems such as ground moving target indication (GMTI) systems in order to suppress jamming and interference. However, its baseband signal processing part usually requires a huge amount of computing power. This paper presents the real-time implementation of an STAP baseband signal processing flow on the state-of-the-art STI CELL multiprocessor, which enables the concept of software-defined radar (SDR). SIMD vectorization is applied to speed up the kernel subroutines of STAP such as QR decomposition, forward/backward substitution and the fast Fourier transform (FFT). Benchmarking results of both the kernel subroutines and the overall flow are presented. Furthermore, based on the results of the earlier benchmarking, optimized task partitioning and scheduling methods are proposed to improve the overall performance so that the overhead is reduced to a minimum.

  • 133.
    Wu, Di
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Lim, Boonshyang
    Eilert, Johan
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Parallelization of High-Performance Video Encoding on a Single-Chip Multiprocessor. 2007. In: IEEE International Conference on Signal Processing and Communications, 2007, IEEE, 2007. Conference paper (Refereed)
    Abstract [en]

    Although single-chip multiprocessor architectures are available nowadays for embedded computing, programming them efficiently and productively has become a significant challenge. This paper studies the multi-level parallelization of video encoding algorithms on a state-of-the-art on-chip multiprocessor. The encoding of H.264/AVC video is chosen as the case study because it is performance-demanding and branch-rich. The final benchmarking result shows that the optimized processing flow can achieve more than 100 operations per cycle, which allows a single-chip multiprocessor to encode high-resolution video (1920 x 1080) in real time (30 fps).
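
    To give a rough sense of the real-time workload implied by the 1920 x 1080, 30 fps figure above, the sketch below counts pixels and macroblocks per second. The 16 x 16 macroblock size is taken from the H.264/AVC standard and is an assumption here, not something stated in the abstract; the function names are hypothetical.

```python
import math

# H.264/AVC luma macroblock edge in pixels. This comes from the standard,
# not from the abstract, and is therefore an assumption of this sketch.
MACROBLOCK_SIZE = 16


def macroblocks_per_frame(width, height, mb=MACROBLOCK_SIZE):
    """Macroblocks needed to cover one frame, rounding partial blocks up."""
    return math.ceil(width / mb) * math.ceil(height / mb)


def macroblocks_per_second(width, height, fps, mb=MACROBLOCK_SIZE):
    """Macroblocks the encoder must process per second for real-time operation."""
    return macroblocks_per_frame(width, height, mb) * fps


def pixels_per_second(width, height, fps):
    """Raw luma pixel rate of the input video."""
    return width * height * fps


if __name__ == "__main__":
    print(macroblocks_per_frame(1920, 1080))       # 120 columns x 68 rows
    print(macroblocks_per_second(1920, 1080, 30))
    print(pixels_per_second(1920, 1080, 30))
```

    Converting these rates into a cycle budget per macroblock would additionally require the clock frequency of the multiprocessor, which the abstract does not give.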

  • 134.
    Wu, Zhenzhi
    et al.
    Linköpings universitet, Institutionen för systemteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Flexible Multistandard FEC Processor Design With ASIP Methodology. 2014. In: Proceedings of the 2014 IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors (ASAP 2014), IEEE, 2014, pp. 210-218. Conference paper (Refereed)
    Abstract [en]

    Designing decoders for forward error correction (FEC) is increasingly challenging because various wireless standards must be supported simultaneously within one IC module. Flexibility, silicon cost and throughput efficiency must all be traded off. In this paper, software-hardware co-design based on the ASIP methodology is introduced to offer sufficient flexibility in FEC decoding. The decoding procedure is programmable for decoding QC-LDPC, Turbo and Convolutional Codes. First, the common features of all the mentioned algorithms and their corresponding datapaths are analyzed and a unified multi-standard datapath is introduced. Based on it, an application-specific instruction set is proposed and an ASIP (Application Specific Instruction-set Processor) for the FEC algorithms is designed. Firmware FEC codes are developed to adapt to the standards. Synthesis results show that the proposed FEC processor occupies 1.54 mm² in a 65 nm CMOS process. It offers QC-LDPC decoding for WiMAX, Turbo decoding for 3GPP-LTE, and 64-state Convolutional Code (CC) decoding at throughputs of 193 Mbps, 62 Mbps and 60 Mbps respectively at a clock frequency of 200 MHz. The proposed ASIP provides programmable high throughput compared to other tri-mode hardware modules.

  • 135.
    Wu, Zhenzhi
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten. Beijing Institute Technology, Peoples R China.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten. Beijing Institute Technology, Peoples R China.
    High-Throughput Trellis Processor for Multistandard FEC Decoding. 2015. In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 23, no. 12, pp. 2757-2767. Article in journal (Refereed)
    Abstract [en]

    Trellis codes, including Low-Density Parity-Check (LDPC), turbo, and convolutional codes (CC), are widely adopted in advanced wireless standards to offer high-throughput forward error correction (FEC). Designing a multistandard FEC decoder is very challenging. In this paper, a trellis application-specific instruction-set processor (TASIP) is presented for multistandard trellis decoding. A unified forward-backward recursion kernel with an eight-state parallel trellis structure is proposed. Based on the kernel, a multialgorithm datapath and a shared memory subsystem are introduced. Flexibility and compatibility are guaranteed by a programmable decoding flow and the trellis decoding instruction set. Synthesis results show that the area consumption is 2.12 mm² (65 nm). TASIP provides trimode FEC decoding with throughputs of 533, 186, and 225 Mb/s for LDPC, turbo, and 64-state CC at a clock frequency of 200 MHz, which outperforms other trimode proposals in both area efficiency and recursion efficiency. TASIP provides high-throughput decoding for current standards, including 3rd Generation Partnership Project-Long Term Evolution, 802.16e, and 802.11n, with a unified architecture and high compatibility.
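
    The figures quoted above can be normalized to make them easier to compare. The sketch below derives decoded bits per clock cycle and throughput per unit area from the numbers in the abstract. These two ratios are simple normalizations chosen for illustration; the paper's own definitions of "area efficiency" and "recursion efficiency" are not given in the abstract and may differ.

```python
# Values quoted in the abstract.
CLOCK_MHZ = 200.0
AREA_MM2 = 2.12
THROUGHPUT_MBPS = {"LDPC": 533.0, "turbo": 186.0, "CC": 225.0}


def bits_per_cycle(throughput_mbps, clock_mhz):
    """Decoded bits per clock cycle (Mb/s divided by MHz)."""
    return throughput_mbps / clock_mhz


def mbps_per_mm2(throughput_mbps, area_mm2):
    """Throughput per unit silicon area; an illustrative metric only."""
    return throughput_mbps / area_mm2


if __name__ == "__main__":
    for mode, mbps in THROUGHPUT_MBPS.items():
        print(mode,
              round(bits_per_cycle(mbps, CLOCK_MHZ), 3),
              round(mbps_per_mm2(mbps, AREA_MM2), 1))
```

    Because all three modes share the same 2.12 mm² datapath, the per-area figure describes each mode running alone on the whole processor, not a partition of the area between modes.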

  • 136.
    Wu, Zhenzhi
    et al.
    Beijing Institute Technology, Peoples R China.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Memory Sharing Techniques for Multi-standard High-throughput FEC Decoder. 2014. In: 2014 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV), IEEE, 2014, pp. 93-98. Conference paper (Refereed)
    Abstract [en]

    In today's multi-standard wireless baseband systems, Convolutional Codes (CC), Turbo codes and LDPC codes are widely applied and need to be integrated within one FEC module. Since memory occupies half or even more of the decoder area, memory sharing techniques for saving area are worth considering. In this work, several memory merging techniques are proposed, including a non-conflict access technique for the merged path metric buffer. The results show that 41% of the total memory bits are saved when integrating three different decoding schemes: CC (802.11a/g/n), LDPC (802.11n and 802.16e) and Turbo (3GPP-LTE). Synthesis results in a 65 nm process show that the merged memory blocks consume merely 1.06 mm² of chip area.

  • 137.
    Zhou, Wenbiao
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Karlström, Per
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    NoGapCL: A flexible common language for processor hardware description. 2010. Conference paper (Refereed)
    Abstract [en]

    Flexible Application Specific Instruction-set Processors (ASIPs) are starting to replace monolithic ASICs in a wide variety of fields. However, the construction of an ASIP is today associated with a substantial design effort. NoGap (Novel Generator of Micro Architecture and Processor) is a tool for ASIP design that utilizes hardware-multiplexed data paths. One of the main advantages of NoGap compared to other EDA tools for processor design is that NoGap imposes few limits on the architecture and thus preserves design freedom. NoGap does not assume a fixed processor template and is not a data flow synthesizer. To reach this flexibility, NoGap makes heavy use of the compositional design principle. This paper describes NoGapCL, a flexible common language for processor hardware description. A RISC processor using NoGapCL has been constructed with NoGap in less than a working day and synthesized to an FPGA. With no FPGA-specific optimizations, this processor met timing closure at 178 MHz in a Virtex-4 LX80, speed grade 12.
