liu.seSearch for publications in DiVA
Change search
ReferencesLink to record
Permanent link

Direct link
Design of Energy-Efficient High-Performance ASIP-DSP Platforms
Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
2016 (English)Doctoral thesis, monograph (Other academic)
Abstract [en]

In the last ten years, limited clock frequency scaling and increasing power density has shifted IC design focus towards parallelism, heterogeneity and energy efficiency. Improving energy efficiency is by no means simple and it calls for a reevaluation of old design choices in processor architecture, and perhaps more importantly, development of new programming methodologies that exploit the features of modern architectures.

This thesis discusses the design of energy-efficient digital signal processors with application-specific instructions sets, so-called ASIP-DSPs, and their programming tools. Target applications for such processors include, but are not limited to, communications, multimedia, image processing, intelligent vision and radar. These applications are often implemented by a limited set of kernel algorithms, whose performance and efficiency are critical to the application's success. At the same time, the extreme non-recurring engineering cost of system-on-chip designs means that product life-time must be kept as long as possible. Neither general-purpose processors nor non-programmable ASICs can meet both the flexibility and efficiency requirements, and ASIPs may instead be the best trade-off between all the conflicting goals.

Traditional superscalar- and VLIW processor design focus has been to improve the throughput of fine-grained instructions, which results in high flexibility, but also high energy consumption. SIMD architectures, on the other hand, are often restricted by inefficient data access. The result is architectures which spend more energy and/or time on supporting operations rather than actual computing.

This thesis defines the performance limit of an architecture with an N-way parallel datapath as consuming 2N elements of compute data per clock cycle. To approach this performance, this work proposes coarse-grained higher-order functional (HOF) instructions, which encode the most  frequently executed compute-, data access- and control sequences into single many-cycle instructions, to reduce the overheads of instruction delivery, while at the same time maintaining orthogonality. The work further investigates opportunities for operation fusion to improve computing performance, and proposes a flexible memory subsystem for conflict-free parallel memory access with permutation and lookup-table-based addressing, to ensure that high computing throughput can be sustained even in the presence of irregular data access patterns. These concepts are extensively studied by implementing a large kernel algorithm library with typical DSP kernels, to prove their effectiveness and adequacy. Compared to contemporary VLIW DSP solutions, our solution can practically eliminate instruction fetching energy in many scenarios, significantly reduce control path switching, simplify the implementation of kernels and reduce code size, sometimes by as much as 30 times.

The techniques proposed in this thesis have been implemented in the DSP platform ePUMA (embedded Parallel DSP processor with Unique Memory Access), a configurable control-compute heterogeneous platform with distributed memory, optimized for low-power predictable DSP computing. Hardware evaluation has been done with FPGA prototypes. In addition, several VLSI layouts have been created for energy and area estimations. This includes smaller designs, as well as a large design with 73 cores, capable of 1280 integer GOPS or 256 GFLOPS at 500MHz and which measures 45mm2 in 28nm FD-SOI technology.

In addition to the hardware design, this thesis also discusses parallel programming flow for distributed memory architectures and ePUMA application implementation. A DSP kernel programming language and its compiler is presented. This effectively demonstrates how kernels written in a high-level language can be translated into HOF instructions for very high processing efficiency.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2016. , 340 p.
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 1772
National Category
Computer Engineering Computer Systems Computer Science Other Electrical Engineering, Electronic Engineering, Information Engineering
URN: urn:nbn:se:liu:diva-130723DOI: 10.3384/diss.diva-130723ISBN: 9789176857458 (Print)OAI: diva2:954326
Public defence
2016-09-07, Visionen, B-huset, Campus Valla, Linköping, 10:15
Available from: 2016-08-22 Created: 2016-08-22 Last updated: 2016-08-22Bibliographically approved

Open Access in DiVA

Design of Energy-Efficient High-Performance ASIP-DSP Platforms(6647 kB)74 downloads
File information
File name FULLTEXT01.pdfFile size 6647 kBChecksum SHA-512
Type fulltextMimetype application/pdf
omslag(3100 kB)9 downloads
File information
File name COVER01.pdfFile size 3100 kBChecksum SHA-512
Type coverMimetype application/pdf

Other links

Publisher's full text

Search in DiVA

By author/editor
Karlsson, Andréas
By organisation
Computer EngineeringFaculty of Science & Engineering
Computer EngineeringComputer SystemsComputer ScienceOther Electrical Engineering, Electronic Engineering, Information Engineering

Search outside of DiVA

GoogleGoogle Scholar
Total: 74 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

Altmetric score

Total: 295 hits
ReferencesLink to record
Permanent link

Direct link