In this thesis we discuss the design and implementation of Digital Signal Processing (DSP) applications in a standard digital CMOS technology. The aim is to fulfill a throughput requirement with the lowest possible power consumption. As a case study, a frequency selective filter is implemented using a half-band FIR filter and a bireciprocal Lattice Wave Digital Filter (LWDF) in a 0.35 µm CMOS process.
The thesis is presented in a top-down manner, following the steps in the top-down design methodology. This design methodology, which has been used for bit-serial maximally fast implementations of IIR filters in the past, is here extended and applied to digit-serial implementations of recursive and non-recursive algorithms. Transformations for increasing the throughput, such as pipelining and unfolding, are applied and compared from throughput and power consumption points of view. A measure of the level of logic pipelining, the Latency Model (LM), is developed and used as a tuning variable between throughput and power consumption. The excess speed gained by the transformations can later be traded for low power operation by lowering the supply voltage, i.e., architecture-driven voltage scaling.
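The trade between excess speed and supply voltage can be illustrated with a first-order model. The following is a minimal sketch and not the model used in the thesis: it assumes that gate delay is proportional to V/(V - Vt)^alpha (the alpha-power law) and that dynamic energy per operation is proportional to V^2, it ignores the extra capacitance that unfolding or pipelining adds, and its parameter values (3.3 V nominal supply, 0.6 V threshold voltage, alpha = 2) are illustrative assumptions rather than figures from this work.

```python
def relative_delay(vdd, vt=0.6, alpha=2.0):
    """Gate delay up to a constant factor (assumed alpha-power-law model)."""
    return vdd / (vdd - vt) ** alpha


def scaled_supply(speedup, vdd_nom=3.3, vt=0.6, alpha=2.0, tol=1e-9):
    """Lowest supply voltage at which the delay is at most `speedup` times
    the delay at the nominal supply. All default values are hypothetical."""
    if speedup < 1.0:
        raise ValueError("speedup must be at least 1")
    target = speedup * relative_delay(vdd_nom, vt, alpha)
    # For vdd > vt and alpha >= 1 the modelled delay falls as vdd rises,
    # so a simple bisection finds the crossing point.
    lo, hi = vt + 1e-6, vdd_nom
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if relative_delay(mid, vt, alpha) > target:
            lo = mid  # too slow at this supply, search higher
        else:
            hi = mid  # fast enough, try a lower supply
    return hi


def energy_ratio(vdd, vdd_nom=3.3):
    """Dynamic energy per operation relative to the nominal supply (E ~ V^2)."""
    return (vdd / vdd_nom) ** 2


if __name__ == "__main__":
    for speedup in (1.0, 2.0, 4.0):
        v = scaled_supply(speedup)
        print(f"speedup {speedup}: supply {v:.2f} V, "
              f"relative energy {energy_ratio(v):.2f}")
```

Under these assumptions, a transformation that doubles the throughput lets the supply drop well below its nominal value while the original throughput is still met, and the energy per operation falls with the square of the supply ratio.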
In the FIR filter case, it is shown that algorithm unfolding without pipelining is preferable for low power operation with a given throughput requirement, decreasing the power consumption by 40 and 50 percent compared to pipelining at the logic or algorithm level, respectively. The digit-size should be tuned to the throughput requirement, i.e., a large digit-size should be used for a low throughput requirement and the digit-size decreased with increasing throughput.
In the bireciprocal LWDF case, the LM order can be used as a tuning variable for a trade-off between low energy consumption and high throughput. In this case, using LM 0, i.e., non-pipelined processing elements, yields minimum energy consumption, and LM 1, i.e., pipelined processing elements, yields maximum throughput. By introducing some pipelined processing elements in the non-pipelined filter design, a fractional LM order is obtained. Using three adders between every pipeline register, i.e., LM 1/3, yields near-maximum throughput and near-minimum energy consumption. In all cases, the digit-size should be equal to the number of fractional bits in the coefficient.
At the arithmetic level, digit-serial adders are designed and implemented in a 0.35 µm CMOS process, showing that for the digit-sizes, , the Ripple-Carry Adders (RCA) are preferable over Carry-Look-Ahead adders (CLA) from a throughput point of view. It is also shown that fixed coefficient digit-serial multipliers based on unfolding of serial/parallel multipliers can obtain the same throughput as the corresponding adder in the digit-size range D = 2...4.
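For readers unfamiliar with digit-serial arithmetic, the principle can be sketched as follows: a word is processed D bits (one digit) per clock cycle, least significant digit first, with the carry stored between cycles. The sketch below is a behavioural model only, assuming unsigned operands and a ripple-carry adder inside each digit; the function names are hypothetical and it does not reproduce the circuits designed in the thesis.

```python
def ripple_carry_add(a_bits, b_bits, carry_in):
    """Add two equal-length, LSB-first bit lists with a chain of full adders."""
    sum_bits = []
    carry = carry_in
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sum_bits, carry


def to_bits(value, width):
    """LSB-first bit list of an unsigned integer."""
    return [(value >> i) & 1 for i in range(width)]


def digit_serial_add(x, y, word_length, digit_size):
    """Add two unsigned words, digit_size bits per clock cycle.

    Returns (sum modulo 2**word_length, carry out, number of clock cycles).
    """
    if word_length % digit_size != 0:
        raise ValueError("word_length must be a multiple of digit_size")
    x_bits = to_bits(x, word_length)
    y_bits = to_bits(y, word_length)
    carry = 0  # models the carry register, cleared at the start of a word
    result_bits = []
    for start in range(0, word_length, digit_size):
        stop = start + digit_size
        digit_sum, carry = ripple_carry_add(
            x_bits[start:stop], y_bits[start:stop], carry
        )
        result_bits.extend(digit_sum)
    total = sum(bit << i for i, bit in enumerate(result_bits))
    return total, carry, word_length // digit_size


if __name__ == "__main__":
    # An 8-bit addition with digit-size 4 takes two clock cycles.
    print(digit_serial_add(11, 6, word_length=8, digit_size=4))
```

With digit_size equal to 1 the model reduces to bit-serial addition, and with digit_size equal to word_length it reduces to bit-parallel addition. The carry chain inside one digit grows with the digit-size, which is why the choice of adder structure matters for throughput.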
A complex multiplier based on distributed arithmetic is used as a test case, implemented in a 0.8 µm CMOS process, for evaluation of different logic styles from robustness, area, speed, and power consumption points of view. The evaluated logic styles are: non-overlapping pseudo two-phase clocked C2MOS latches with pass-transistor logic, Precharged True Single Phase Clocked logic (PTSPC), and Differential Cascode Voltage Switch logic (DCVS) with Single Transistor Clocked (STC) latches. In addition, we propose a non-precharged true single phase clocked differential logic style, denoted Differential NMOS logic (DN-logic), which is suitable for implementation of robust, high speed, and low power arithmetic processing elements. The comparison shows that the two-phase clocked logic style is the best choice from a power consumption point of view when voltage scaling cannot be applied and the throughput requirement is low. However, the DN-logic style is the best choice when the throughput requirement is high or when voltage scaling is used.
Institutionen för konstruktions- och produktionsteknik, 2005. 196 p.
Implementation, CMOS, Low Power, High Speed, Wave Digital Filter (WDF), FIR filter