While the Micro Processors (MPUs) as a general purpose CPU are converging (into Intel Pentium), the DSP processors are diverging. In 1995, approximately 50% of the DSP processors on the market were general purpose processors, but last year only 15% were general purpose DSP processors on the market. The reason general purpose DSP processors fall short to the application specific DSP processors is that most users want to achieve highest performance under minimized power consumption and minimized silicon costs. Therefore, a DSP processor must be an Application Specific Instruction set Processor (ASIP) for a group of domain specific applications.
An essential feature of the ASIP is its functional acceleration on instruction level, which gives the specific instruction set architecture for a group of applications. Hardware acceleration for digital signal processing in DSP processors is essential to enhance the performance while keeping enough flexibility. In the last 20 years, researchers and DSP semiconductor companies have been working on different kinds of accelerations for digital signal processing. The trade-off between the performance and the flexibility is always an interesting question because all DSP algorithms are "application specific"; the acceleration for audio may not be suitable for the acceleration of baseband signal processing. Even within the same domain, for example speech CODEC (COder/DECoder), the acceleration for communication infrastructure is different from the acceleration for terminals.
Benchmarks are good parameters when evaluating a processor or a computing platform, but for domain specific algorithms, such as audio and speech CODEC, they are not enough. The solution here is to profile the algorithm and from the resulting statistics make the decisions. The statistics also suggest where to start optimizing the implementation of the algorithm. The statistics from the profi ling has been used to improve implementations of speech and audio coding algorithms, both in terms of the cycle cost and for memory efficiency, i.e. code and data memory.
In this thesis, we focus on designing memory efficient DSP processors based on instruction level acceleration methods and data type optimization techniques. Four major areas have been attacked in order to speed up execution and reduce memory The first one is instruction level acceleration, where consecutive instructions appear frequently and are merged together. By this merge the code memory size is reduced and execution becomes faster. Secondly, complex addressing schemes are solved by acceleration for address calculations, i.e. dedicated hardware are used for address calculations. The third area, data storage and precision, is speeded up by using a reduced floating point scheme. The number of bits is reduced compared to the normal IEEE 754 floating point standard. The result is a lower data memory requirement, yet enough precision for the application; an mp3 decoder. The fourth contribution is a compact way of storing data in a general CPU. By adding two custom instructions, one load and one store, the data memory efficiency can be improved without making the firmware complex. We have tried to make application specific instruction sets and processors and also tried to improve processors based on an available instruction set.
Experiences from this thesis can be used for DSP design for audio and speech applications. They can additionally be used as a reference to a general DSP processor design methodology.
Linköping: Linköpings universitet , 2004. , 48 p.
2004-06-02, Alan Turing (Estraden), Linköpings universitet, Linköping, 13:15 (Swedish)