The dynamic element matching (DEM) technique for digital-to-analog converters (DACs) has been suggested as a promising method to improve matching between the DAC's reference levels. However, no work has so far taken into account the dynamic effects that limit the performance at higher frequencies. In this paper we present a model describing the dynamic properties of a DEM DAC and compare the simulated results with measurements of a 14-bit current-steering DEM DAC implemented in a 0.35-μm CMOS process. The measured data agree well with the results predicted by the model. It is also shown that the DEM technique does not necessarily increase the performance of a DAC when dynamic errors dominate the achievable performance.
A new Vernier time-to-digital converter (TDC) architecture using a delay line and a chain of delay latches is proposed. The delay latches replace the functionality of one delay chain and the sample register commonly found in Vernier converters, thereby enabling power and hardware efficiency improvements. The delay latches can be implemented using either standard or full custom cells, allowing the architecture to be implemented in field-programmable gate arrays, digital synthesized application-specific integrated circuits, or in full custom design flows. To demonstrate the proposed concept, a 7-bit Vernier TDC has been implemented in a standard 65-nm CMOS process with an active core size of 33 μm × 120 μm. The time resolution is 5.7 ps with a power consumption of 1.75 mW measured at a conversion rate of 100 MS/s.
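The general Vernier principle behind such converters can be sketched behaviorally. The sketch below is not the delay-latch circuit of the abstract; it only models two delay lines with slightly different element delays, where the output code is the stage at which the stop edge (fast line) catches up with the start edge (slow line). The function name and the element delays of 105.7 ps and 100 ps are hypothetical, chosen only so that their difference matches the reported 5.7 ps resolution. Times are integers in femtoseconds to avoid rounding effects.

```python
def vernier_tdc(interval_fs, t_slow_fs, t_fast_fs, n_stages):
    """Behavioral Vernier TDC model (hypothetical helper, not the paper's circuit).

    The start edge enters the slow line at time 0 and the stop edge enters
    the fast line interval_fs later. The returned code is the first stage k
    at which the stop edge has caught up with the start edge.
    """
    resolution_fs = t_slow_fs - t_fast_fs
    if resolution_fs <= 0:
        raise ValueError("the slow line must be slower than the fast line")
    for k in range(1, n_stages + 1):
        start_arrival = k * t_slow_fs
        stop_arrival = interval_fs + k * t_fast_fs
        if stop_arrival <= start_arrival:
            return k
    # The interval exceeds the measurable range: saturate at full scale.
    return n_stages


# Assumed element delays: 105.7 ps (slow) and 100 ps (fast), 128 stages.
code = vernier_tdc(57_000, 105_700, 100_000, 128)
print(code)
```

In this model the code equals the measured interval divided by the delay difference, rounded up, so the resolution is set by the difference between the two element delays rather than by either delay alone.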
An 8-bit time-to-digital converter (TDC) for all-digital frequency-locked loops is presented. The selected architecture uses a Vernier delay line where the commonly used D flip-flops are replaced with a single enable transistor in the delay elements. This architecture allows for an area efficient and power efficient implementation. The target application for the TDC is an all-digital frequency-locked loop which is also overviewed in the paper. A prototype chip has been implemented in a 65 nm CMOS process with an active core area of 75 μm × 120 μm. The time resolution is 5.7 ps with a power consumption of 1.85 mW measured at 50 MHz sampling frequency.
Digital recursive oscillators locked in steady-state can be used to generate sinusoids with high spectral purity. The locking occurs when the oscillator returns to a previously visited state and repeats its sequence. In this work we propose a new search algorithm and two new search strategies to find all steady-states for a given oscillator configuration. The improvement in spurious-free dynamic range is between 7 and 40 dB compared to previously reported results. The algorithm is also able to find oscillator sequences for more frequencies than previously reported work. A key part of the method is the reduction of the search space made possible by a proposed extension of existing theory on recursive oscillators. Specific properties of digital oscillators in a steady-state are also discussed. It is shown that the initial states can be used to individually control the phase, amplitude, spectral purity, and also cycle length of the oscillator output.
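The locking behavior can be illustrated with a minimal sketch. It assumes a generic second-order recursion y[n] = floor(a*y[n-1]) - y[n-2] on integers, with the coefficient a = k / 2**frac_bits; the oscillator structure, the quantization, and the search algorithm of the paper are not given in the abstract, so the function name and all parameters here are hypothetical. Because this state update is invertible, a bounded sequence must eventually return to its initial state, which is what the loop checks for.

```python
def oscillator_cycle(k, frac_bits, y1, y2, max_steps=100_000):
    """Iterate y[n] = floor(k * y[n-1] / 2**frac_bits) - y[n-2] on integers.

    Returns the output samples of one full cycle, ending when the state
    (y[n-1], y[n-2]) returns to the initial state (y1, y2), or None if the
    state does not recur within max_steps.
    """
    start = (y1, y2)
    state = start
    samples = []
    for _ in range(max_steps):
        prev1, prev2 = state
        # Arithmetic right shift on Python ints implements floor division.
        new = ((k * prev1) >> frac_bits) - prev2
        samples.append(new)
        state = (new, prev1)
        if state == start:
            return samples
    return None


# Coefficient a = 1 (k=1, frac_bits=0) corresponds to 2*cos(60 degrees).
print(oscillator_cycle(1, 0, 1, 0))
```

Different initial states (y1, y2) for the same coefficient generally land on different cycles, which is the degree of freedom the abstract describes for controlling the properties of the output.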
A 14-bit dual current-steering digital-to-analog converter implemented in a 0.25 µm CMOS process is presented in this work. Both implementation issues and measurement results are presented. The measured spurious-free dynamic range is higher than 73 dB for signal frequencies up to 3 MHz, and a measured multi-tone power ratio of approximately 71 dB is reported for an ADSL-like input.
A differential current-steering digital-to-analog converter (DAC) architecture allowing the common-mode level of the input signal to be varied is presented. Simulation results with models of different DAC nonlinearities indicate that the proposed architecture has a potential of improving the linearity of the converters.
Segmented architectures are often used in digital-to-analog converters (DACs). Here we propose a DAC structure based on recursive decomposition of an N-bit binary DAC into two (N-1) bit DACs and one 1 bit DAC. A DAC model that includes matching errors has been simulated. The simulation results indicate that by using four layers of decomposition it is possible to achieve similar performance as when using seven bits of traditional segmentation.
This work is an overview of recently proposed methods for combining DACs in order to improve performance. Some further developments of these techniques are also presented. The techniques aim at reducing glitches and sensitivity to limited output impedance in current sources.
Due to the lack of proper design automation tools, designers are often forced to use full-custom design methodologies when designing analog and mixed-signal circuits. In this work, we discuss a design methodology based on parameterized cells intended for efficient design. The methodology is illustrated with the design of a 12-bit configurable current-steering DAC. Because the cells are parameterized, their layout must be described in a generalized way, resulting in a longer design time compared with a manual layout of a fixed circuit. However, the parameterized approach simplifies iteration of the layout process and block reuse.
One of the major contributors to the static nonlinearity of a current-steering digital-to-analog converter (DAC) is mismatch between current sources. A technique for enhancing the yield of binary-weighted current-steering DACs is proposed. The technique utilizes a special case of a general technique for spectral shaping of DAC nonlinearity errors presented earlier and requires oversampling. The technique relies on two DAC models with low computational complexity that can be integrated with the DAC at a negligible cost in terms of area and power consumption. Behavioral-level simulation results indicate that the proposed method has a good potential of enhancing the yield of binary-weighted DACs for situations where the matching errors constitute the dominating source of nonlinearity.
A dynamic element matching (DEM) technique is proposed that aims at improving the spurious-free dynamic range (SFDR) of current-steering digital-to-analog converters (DACs) implemented with a decomposed architecture. The architecture consists of a number of small binary-weighted DACs that are controlled such that only a minimum number of unit current sources are switching for the most critical code transitions. The DEM is obtained by scrambling bit pairs with equal weight. In contrast to most other DEM techniques, the scrambling is performed conditionally so that the number of switching current sources does not increase compared with the unscrambled case. Hence, the good glitch properties of the decomposed converter architecture are maintained. Behavioral-level simulations of some decomposed DACs have been performed. Assuming random uncorrelated matching errors with Gaussian distribution and a 5% standard deviation, the SFDR value giving 90% yield is increased by 5.6 dB for a 14-bit DAC using scrambling of the two bit pairs with the largest weights. The hardware cost for the required scrambling circuits should be low since only two pairs of bits are scrambled.
The current-steering digital-to-analog converter (DAC) is the most common type of DAC for high-speed applications. Glitches present in the DAC output contribute to nonlinear distortion in the DAC transfer characteristics degrading the circuit performance. One source of glitches is asymmetry in the settling behavior when switching on and off a current source. A behavioral-level model of this nonideal behavior is derived in this work. Further, a method with low computational complexity for estimating the influence of the modeled errors in the frequency domain is developed. This method can be utilized by circuit designers to derive circuit requirements for fulfilling a given frequency-domain specification, potentially relaxing the requirements compared with a worst-case analysis. Examples of model utilization are given in terms of an analytical examination and MATLAB simulations. A good agreement between simulated and analytical results is obtained.
The decomposed DAC architecture was recently proposed as an alternative to the traditional segmented architecture. In this work, we present a modified version of the decomposed architecture with reduced hardware complexity denoted the partially decomposed architecture. Behavioral-level simulations indicate that the partially decomposed architecture is a good alternative for signals with Gaussian distribution, whereas the original decomposed or segmented architectures are preferred for sinusoidal signals.
We present a radix-4 static CMOS full adder circuit that reduces the propagation delay, power-delay product (PDP), and energy-delay product (EDP) in carry-based adders compared with using a standard radix-2 full adder solution. The improvements are obtained by employing a carry look-ahead technique at the transistor level. SPICE simulations using 45 nm CMOS technology parameters with a power supply voltage of 1.1 V indicate that the radix-4 circuit is 24% faster than a 2-bit radix-2 ripple carry adder with slightly larger transistor count, whereas the power consumption is almost the same. A second scheme for radix-2 and radix-4 adders that have a reduced number of transistors in the carry path is also investigated. Simulation results also confirm that the radix-4 adder gives better performance than a standard 2-bit CLA. 32-bit ripple carry, 2-stage carry select, variable size carry select, and carry skip adders are implemented with the different full adders as building blocks. With one exception, using radix-4 instead of radix-2 gives PDP savings in the range 8-18% and EDP savings in the range 21-53% for the 32-bit adders.
In this work, circuits for on-chip measurement and periodic waveform capture are designed. The aim is to analyze disturbances in mixed-signal chips such as simultaneous switching noise and the transfer of substrate noise. A programmable reference generator that replaces the standard digital-to-analog converter is proposed. It is based on a resistor string that is connected in a circular structure. A feature is that the reference outputs to the different comparators in the measurement channels are distributed over the nodes of the resistor string. Compared with using a complete digital-to-analog converter, the use of a buffer is avoided. Hence, there is a potential reduction in the parasitic capacitance and power consumption as well as an increase in speed. We present results from a test chip demonstrating that simultaneous switching noise can be measured with the presented approach.
A clock with adjustable rise and fall times is used in conjunction with a D flip-flop that operates well with this clock. Its intended use is to relax the design of the clock network in digital circuits and to alleviate the problems with simultaneous switching noise in mixed-signal circuits. A test chip has been designed in a 0.35 μm CMOS process. The chip consists of a clock driver with adjustable rise and fall times, and an FIR filter that uses the special D flip-flop in the registers. According to measurements, the digital circuit works well when the rise and fall times of the clock are varied from 0.5 ns to 10 ns. This causes the propagation delay in the critical path to vary between 13.0 ns and 13.7 ns, and the energy dissipation to vary between 1.5 pJ and 1.7 pJ, for an input signal with a transition activity of 0.4.
A D flip-flop circuit that works well with long rise and fall times of the clock is characterized. This property is important when we would like to, e.g., relax the constraints on the clock distribution network or reduce the amount of noise generated in a mixed-signal circuit. Since the use of the D flip-flop allows small clock driver circuits, the amount of simultaneous switching noise can be reduced. There is also a potential for power savings with the use of smaller drivers, assuming that the short-circuit current in the flip-flops can be kept low. Moreover, the high frequency content of the clock is reduced, making the noise that is injected into the substrate easier to suppress. This is important in a mixed-signal circuit where analog circuits are present on the same substrate. The effects of long rise and fall times on the differential D flip-flop used in this work are mainly longer propagation times.
The design of a clock distribution network in a digital integrated circuit is challenging in terms of obtaining low power consumption, low waveform degradation, low clock skew and low simultaneous switching noise. In this work we aim at alleviating these design restrictions by using a clock buffer with reduced size and a D flip-flop circuit with relaxed constraints on the rise and fall times of the clock. According to simulations, the energy dissipation of a D flip-flop implemented in a 0.35 μm process increases by only 21% when the fall time of the clock is increased from 0.05 ns to 7.0 ns. Considering that smaller clock buffers can be used, there is a potential for power savings by using the suggested clocking scheme.
A strategy that aims at relaxing the design of the clock network in digital circuits is evaluated through simulations and measurements on a test circuit. In the strategy a clock with long rise and fall times is used in conjunction with a D flip-flop that operates well with this clock. The test circuit consists of a digital FIR filter and a clock buffer with adjustable driving strength. It was designed and manufactured in a 0.35 μm CMOS process. The energy dissipation of the circuit increased 14% when the rise and fall times of the clock increased from 0.5 ns to 10 ns. The corresponding increase in propagation delay was less than 0.5 ns, i.e. an increase of 50% in propagation delay of the register. The results in this paper show that the clocking strategy can be implemented with low costs of power and speed.
In this paper an introduction to substrate noise in silicon on insulator (SOI) is given. Differences between substrate noise coupling in conventional bulk CMOS and SOI CMOS are discussed and analyzed by simulations. The efficiency of common substrate noise reduction methods are also analyzed. Simulation results show that the advantage of the substrate isolation in SOI is only valid up to a frequency that highly depends on the chip structure. In bulk, guard bands are normally directly connected to the substrate. In SOI, the guard bands are coupled to the substrate via the parasitic capacitance of the silicon oxide. Therefore, the efficiency of a guard may be much larger in a conventional bulk than in SOI. One opportunity in SOI is that a much higher resistivity of the substrate can be used, which results in a significantly higher impedance up to a frequency where the coupling is dominated by the capacitive coupling of the substrate.
Simultaneous switching noise (SSN) can degrade the performance of digital circuits. In mixed-signal circuits, the performance of analog circuits is degraded by the SSN that is spread from digital circuits through the substrate to the analog circuits. The most critical parameter when considering SSN is the parasitic inductance in the power supply path from off-chip to on-chip. In this paper, basic theories of inductance of current paths are given for parallel interconnects through examples. The results from these examples show that the placement of interconnects plays a big role for the effective inductance. Power supply interconnects should be placed with small distances in between, and so that currents in adjacent interconnects are in opposite directions. With this strategy, a low inductance in the power supply current path can be achieved. The importance of choosing a good package for the silicon die is also briefly discussed.
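The placement rule can be illustrated with the standard relation for two coupled conductors carrying equal and opposite currents, L_loop = L1 + L2 - 2M, where M is the mutual inductance. The sketch below is only an illustration of that relation, not the analysis in the paper; the function name and the numerical values (in nH) are hypothetical.

```python
def loop_inductance(l_supply, l_return, mutual):
    """Effective inductance of a supply/return pair with opposite currents.

    All arguments share one unit (e.g. nH). Uses L_loop = L1 + L2 - 2*M,
    where M is bounded by sqrt(L1 * L2).
    """
    if mutual < 0 or mutual * mutual > l_supply * l_return:
        raise ValueError("mutual inductance must satisfy 0 <= M <= sqrt(L1*L2)")
    return l_supply + l_return - 2.0 * mutual


# Assumed values: closer spacing gives a larger mutual inductance.
far_apart = loop_inductance(5.0, 5.0, 1.0)
close_together = loop_inductance(5.0, 5.0, 4.0)
print(far_apart, close_together)
```

A smaller distance between the supply and return conductors increases M and therefore lowers the effective inductance of the current path, which is the strategy the abstract recommends.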
In this paper the authors present results from measurements on a test chip used to evaluate their method for reducing substrate noise that originates from the clock in digital circuits. The authors use long rise and fall times of the clock signal and a D flip-flop that operates well with this clock. With this approach, smaller clock buffers can be used, which results in smaller current peaks on the power supply lines and therefore less switching noise. The measured substrate noise on the test chip was reduced by between 20% and 54%. With optimized clock buffers this method has a potential for an even larger noise reduction.
Digital switching noise is of major concern in mixed-signal circuits due to the coupling of the noise via a shared substrate to the analog circuits. A significant noise source in this context is the digital clock network, which generally has a high switching activity. There is a large capacitive coupling between the clock network and the substrate. Switching of the clock produces current peaks causing simultaneous switching noise (SSN). Sharp clock edges yield a high frequency content of the clock signal and a large SSN. High-frequency noise is less attenuated through the substrate than low-frequency noise due to the parasitic inductance of the interconnect from on-chip to off-chip. In this work, we present a strategy that targets the problems with clock noise. The approach is to generate a clock with smooth edges, i.e., reducing both the high frequency components of the clock signal and the current peaks produced in the power supply. We use a special digital D flip-flop circuit that operates well with the clock. A test chip has been designed where we can control the rise and fall time of the clock edges in a digital FIR filter, and measure the performance of a fifth-order analog active-RC filter.
In this work a digital filter is placed on the same chip as an analog filter. We investigate how the simultaneous switching noise is propagated from the digital filter to different nodes on a manufactured chip. Conventional substrate noise reduction methods are used, e.g., separate power supplies, guard rings, and multiple pins for power supplies. We also investigate if the effect of substrate noise on the analog filter can be reduced by using a noise reduction method that uses long rise and fall times of the digital clock. The measured noise on the output of the analog filter was reduced by between 30% and 50% when the method was used.
A major concern in mixed-signal circuits is the noise injected by the digital circuits into sensitive analog circuits. Of particular interest in this work is the problem with large capacitive coupling between the digital clock network and the substrate shared with the analog circuits. It is in general easier to reduce low frequency noise than high frequency noise. Therefore, we have developed a strategy where we reduce the high frequency content of the clock by using smooth clock edges, and a special digital flip-flop circuit. This strategy will be evaluated in a test chip where we can control the rise and fall time of the clock edges of a high-performance digital FIR filter, and measure the performance of a fifth-order analog active-RC filter.
In this work we focus on reducing the simultaneous switching noise located in the frequency band from DC up to half of the digital clock frequency. This frequency band is assumed to be the signal band of an analog circuit. The idea is to use circuits that have as periodic power supply currents as possible to obtain low simultaneous switching noise below the clock frequency. We use precharged differential cascode switch logic together with a novel D flip-flop. To evaluate the method, two pipelined adders have been implemented on transistor level in a 0.13 μm CMOS technology, where the novel circuit is implemented with our method and the reference circuit with static CMOS logic together with a TSPC D flip-flop. According to simulation results, the frequency components in the analog signal band can be attenuated by between 10 dB and 17 dB when using the proposed method. The cost is an increase in power consumption of almost a factor of three and a higher transistor count.
The rising demand for portable systems is increasing the importance of low power as a design consideration. In this context, leakage power is increasing much faster than dynamic power at smaller dimensions. Peak values of supply current are related to noise injected into the substrate and/or propagated through the supply network, limiting the performance of the sensitive analog and RF portions of mixed-signal circuits. This paper analyzes how these three aspects, dynamic power, leakage power, and peak power, can be considered together when optimizing the sizing and design of basic cells, with reduced performance degradation. The benefits of the proposed sizing technique are validated through simulation results on 130 nm NAND, NOR, and inverter cells.
A polynomial-based division algorithm and a corresponding hardware structure are proposed. The proposed algorithm is shown to be competitive with other conventional algorithms like the Newton-Raphson algorithm for up to about 32 bits accuracy. For example, using Newton-Raphson with less than 12 bits accuracy of the initial approximation requires 33% more general multiplications than the proposed algorithm in order to achieve 24 bits accuracy.
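For reference, the Newton-Raphson baseline mentioned above can be sketched as follows. This shows only the conventional reciprocal iteration x <- x*(2 - d*x), not the proposed polynomial-based algorithm, whose details are not given in the abstract. Each iteration costs two general multiplications and squares the relative error e = 1 - d*x, so the number of accurate bits roughly doubles per iteration. The function name and the starting values are hypothetical; exact rational arithmetic is used so that the error can be checked without rounding effects.

```python
from fractions import Fraction


def newton_reciprocal(d, x0, target_bits):
    """Refine x0 towards 1/d with Newton-Raphson iterations.

    Iterates until the relative error |1 - d*x| is below 2**-target_bits and
    returns (x, number of general multiplications used).
    """
    d = Fraction(d)
    x = Fraction(x0)
    tolerance = Fraction(1, 2 ** target_bits)
    multiplications = 0
    while abs(1 - d * x) >= tolerance:
        x = x * (2 - d * x)  # one multiplication for d*x, one for x*(...)
        multiplications += 2
    return x, multiplications


# Assumed example: d = 3/4 with an initial approximation accurate to 7 bits.
x, mults = newton_reciprocal(Fraction(3, 4), Fraction(43, 32), 24)
print(float(x), mults)
```

In this example the error goes from 2^-7 to 2^-14 to 2^-28, so two iterations (four multiplications) are needed for 24 bits, whereas an initial approximation accurate to 13 bits would need only one iteration.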
In this paper we present a calibration technique for sigma-delta analog-to-digital converters (ΣΔ ADCs) in which high-speed, low-resolution flash subADCs are used. The calibration technique as such is mainly targeting calibration of the flash subADC, but we also study how the correction depends on where in the ΣΔ modulator the calibration signals are applied. It is shown that the calibration technique can cope with errors that occur in the feedback digital-to-analog converter (DAC) and the input accumulator. Behavioral-level simulation results show an improvement in effective number of bits (ENOB) from 6.6 to 11.3. Fairly large offset and gain errors have been introduced, which illustrates the robustness of the calibration technique.
In this paper a calibration technique for high-resolution flash analog-to-digital converters (ADCs) based on histogram test methods is proposed. A probability density function (PDF) generator circuit is utilized to generate a triangular signal with a constant PDF, i.e., uniform distribution, as a test signal. In the proposed technique both offset estimation and trimming are performed without imposing any changes on the comparator structure in the ADC. The proposed algorithm estimates the offset values and stores them in a RAM. The trimming circuit uses the stored values and performs the trimming by adjusting the reference voltages to the comparators. An 8-bit flash ADC with a 1-V reference voltage, a comparator offset distribution with σ_{os} ≈ 30 mV, and a 10-bit test signal with about 3% nonlinearity are used in the simulations. The results show that the calibration improves the DNL and INL from about 3.6/3.9 LSB to about 0.9/0.75 LSB, respectively.
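The histogram principle that this kind of calibration builds on can be sketched as follows. With a uniformly distributed test signal, the number of hits per output code is proportional to the width of that code, so DNL and INL can be estimated from the hit counts. This is the generic code-density estimate, assumed here for illustration; it is not the offset estimation and trimming algorithm of the paper, and the function name and the example counts are hypothetical.

```python
def dnl_inl_from_histogram(counts):
    """Estimate DNL and INL (in LSB) from per-code hit counts.

    Assumes a uniformly distributed input. The first and last codes are
    excluded because their bins are unbounded. INL is the running sum of DNL.
    """
    inner = counts[1:-1]
    ideal = sum(inner) / len(inner)  # hits expected for an ideal 1-LSB code
    dnl = [c / ideal - 1.0 for c in inner]
    inl = []
    running = 0.0
    for d in dnl:
        running += d
        inl.append(running)
    return dnl, inl


# Assumed hit counts for a 6-code example: one wide code and one narrow code.
dnl, inl = dnl_inl_from_histogram([0, 150, 50, 100, 100, 0])
print(dnl, inl)
```

A code that collects more hits than the average is wider than 1 LSB (positive DNL), which in a flash ADC points to an offset in one of the two comparators bounding that code.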
Bit-serial arithmetic is often advantageous both in terms of small chip area and low power consumption. When using bit-serial arithmetic for implementation of recursive digital filters, the maximal sample frequency is inversely proportional to the coefficient word lengths of the filters. For high-speed applications it is therefore essential to find filter structures with short coefficients. One way to do this is to use cascaded low-order filters instead of one high-order filter. Problems arise, though, when the cascaded filters are to be used for interpolation and decimation, since the straightforward realization increases the workload due to the different sample rates involved. However, we have developed a novel realization technique which keeps the workload at a minimum with the additional possibility to use a high sample frequency. A digital filter for both interpolation and decimation, realized using this novel technique applied to two cascaded lattice wave digital filters, has been implemented. The filter can be used for sample rate conversions between 25 and 50 MHz.
Digit-serial/parallel multipliers with improved throughput and latency are presented. The multipliers are based on unfolded bit-serial/parallel multipliers. The unfolding yields long critical paths that are reduced by splitting the multiplication into a sum of partial multiplications. Using a sum of two partial multiplications increases the throughput by between 50 and 120 percent and reduces the latency by up to 50 percent, compared with the basic digit-serial/parallel multiplier based on unfolding.
Fixed coefficient digit-serial/parallel multipliers are presented. The multipliers are based on unfolded bit-serial/parallel multipliers in combination with canonic signed-digit coding of the fixed coefficient. The unfolding yields long critical paths, which cannot be pipelined due to the feedback carry loops, and carry-look-ahead techniques cannot be applied efficiently since the propagating sum path will increase. By using canonic signed-digit code the multiplier gains higher throughput and lower latency since the critical path is reduced without pipelining. Hence, the throughput is increased by between 56 and 150 percent compared with two's complement coded coefficients, and for the digit-sizes {2,3,4} it has the same throughput as the corresponding digit-serial adder.
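The canonic signed-digit (CSD) coding referred to above can be sketched with the standard conversion for integers. CSD uses digits from {-1, 0, 1} with no two adjacent nonzero digits, which reduces the number of nonzero coefficient digits compared with ordinary binary. The function name is hypothetical, and the sketch covers only the coding itself, not the multiplier structure.

```python
def to_csd(n):
    """Return the canonic signed-digit form of integer n.

    Digits are listed least significant first, each in {-1, 0, 1}, and no
    two adjacent digits are both nonzero.
    """
    digits = []
    while n != 0:
        if n & 1:
            # n mod 4 == 1 gives digit +1, n mod 4 == 3 gives digit -1.
            digit = 2 - (n & 3)
            n -= digit
        else:
            digit = 0
        digits.append(digit)
        n >>= 1
    return digits


# 23 = 10111 in binary (four nonzero bits) becomes 32 - 8 - 1 in CSD.
print(to_csd(23))
```

For the coefficient 23 the number of nonzero digits drops from four to three, so one fewer partial product has to be added or subtracted.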
In this paper, a new robust non-overlapping two-phase clock generator with adjustable duty cycle is proposed. The generator is based on a differential negative edge-triggered D flip-flop and has small area and power consumption. The maximal clock rate and delay are also improved, reaching a clock frequency of 1.0 GHz in a standard 0.35 µm CMOS process. The new clock generator is inherently glitch and spike free and robust against slow clock transitions, which reduces the design effort significantly.
Algorithm transformations for increased throughput and decreased power consumption in design of digit-serial FIR filters are discussed in this paper. Pipelining has been used for a long time for increasing the throughput of sequential algorithms. Here we introduce algorithm unfolding, which traditionally has been used in implementation of recursive algorithms, in a sequential FIR algorithm. Pipelining at algorithm and logic level, and algorithm unfolding are compared by HSPICE simulations of netlists extracted from layouts. For a given throughput requirement, the simulations show that algorithm unfolding without any pipelining is preferable for low power operation. Algorithm unfolding yields a decrease of the power consumption by 40 and 50 percent compared to pipelining at the logic or algorithm level, respectively. For minimum power consumption the digit-size should be tuned to the throughput requirement, i.e., using a large digit-size for a low throughput requirement and decreasing the digit-size with increasing throughput.
In this paper, performance trade-offs between throughput and energy consumption in implementation of recursive digital filters are presented. Digit-serial arithmetic with different degrees of pipelining is used in the implementations. As a demonstration object, a bireciprocal third-order lattice wave digital filter is used. Simulations with HSPICE show that a maximum throughput is obtained using pipelined processing elements with a digit-size equal to the number of fractional bits in the filter coefficient. The use of non-pipelined processing elements yields minimum energy consumption. A trade-off between throughput and energy consumption can be made by pipelining only some of the processing elements.
In this paper a new dynamic differential logic style is presented. A non-precharged single phase clocking scheme is used. The logic is suitable for high speed and low power operation in both bit-serial and bit-parallel implementations, since all logic nets are purely in NMOS and merged with the latches. The logic style is also robust for clock slope and yields a data noise margin equal to Vdd/2.
We propose an efficient scheme for implementing a complex multiplier based on distributed arithmetic. A modified bit-serial shift-accumulator for distributed arithmetic is also proposed for computing a*b+c, where a, b and c are complex numbers. The shift-accumulator is highly regular and modular and consists of only three types of bit-slices, each of which consists of only three types of blocks: multiplexers, exclusive OR gates, and latches. The implementation is done using a robust differential single-phase clocked logic style suitable for high-speed and low power operation. The resulting implementation of the complex multiplier has a maximum clock frequency of 250 MHz, consumes 70 mW, and occupies a chip area of 0.5 mm² in a double-metal 0.8 μm process. The coefficient word length and the data word length are 12 bits and 16 bits, respectively.
In this paper two bit-serial carry save adders are implemented using a recently proposed differential logic style. The clocking scheme uses a single clock phase with non-precharged stages of logic that may be merged with the latches or the flip-flops. A novel flip-flop structure is used in one of the adders, which significantly lowers the number of clocked transistors. The logic style used in the adder realizations suits high speed and low power operation in both bit-serial and bit-parallel implementations, since all logic nets are purely in NMOS. The logic style is also robust for clock slope and yields a data noise margin equal to Vdd/2. The adders reached a maximal clock frequency of 300 MHz in a 0.8 μm process with a 3.0 V power supply voltage.
In this paper two robust bus drivers combining low-swing and semi-adiabatic charge recycling techniques are presented. The drivers use a novel concept with Schmitt triggers as voltage sensors. Hence, voltage references are not required. The drivers reduce the power consumption by 55 and 72 percent, respectively.
In this paper novel single-rail low-swing bus drivers based on Schmitt triggers as voltage level sensors are presented. Two novel 4-transistor Schmitt triggers with non-symmetrical trigger-voltage levels are also proposed. The power dissipation for the single-rail low-swing bus driver is reduced by 48 percent compared to a full-swing driver. Finally, two novel semi-adiabatic charge-recycle circuits are proposed. The power savings for these circuits are 45 and 65 percent, respectively. A data rate of 200 Mbit/s per driver has been reached in a double metal 0.8 μm process.
In this paper we study digit-serial implementation of the general-order lossless discrete integrator/differentiator (LDI/LDD) allpass filter structure. In low-power filter implementation, digit-serial computation has been shown to be advantageous compared to bit-serial and parallel arithmetics. The digit-serial processing elements are obtained using unfolding techniques. The implementation is compared to a corresponding wave digital (WD) implementation. It is shown in an example that a WD realization requires about 60% and 30% more D flip-flops for pipelining and shimming delays, respectively, than the corresponding LDI/LDD implementation. We also study the sample period of the second-order LDI/LDD allpass filter using different digit sizes and conclude that when the filter is scheduled over a number of sample periods we achieve the shortest sample period.
In this paper, we present a new digit-serial hybrid adder. The adder can be pipelined to the bit-level and is, therefore, well suited for high-speed applications. The main advantage of the proposed adder is that it can be implemented with few pipelining stages. We compare speed, area, and power consumption for the proposed adder with a digit-serial carry-look-ahead adder and a digit-serial Ladner-Fischer adder. The results show that the delay of the digit-serial hybrid adder is lower than the others studied in this paper for digit-sizes up to d=12. For these digit-sizes the digit-serial hybrid adder has on average a 17% smaller critical path than the digit-serial carry-look-ahead adder and a 21% smaller critical path than the digit-serial Ladner-Fischer adder.
In this work, we study how retiming can be used to reduce glitches in digit-serial recursive filters. It is a well-known fact that glitches can make up a large portion of the dynamic power consumption in digital systems. Digit-serial recursive systems contain registers that can be retimed to reduce the amount of glitches. A second-order digit-serial LDI allpass filter has been implemented to verify this statement. It is shown that retiming can reduce the power consumption by about 20% for small digit-sizes without affecting the throughput of the filter. We also show that introducing a large number of registers in the filter structure will increase the current consumption. This trade-off, between reducing the amount of glitches and the increase in the number of registers, is also considered in this work.