The power modeling of different realizations of cascaded integrator-comb (CIC) decimation filters has been a subject of several recent works. In this work we have extended these with modeling of leakage power, which is an important factor since the input sample rate may differ several orders of magnitude. Furthermore, we have pointed out the importance of the input wordlength on the comparison of recursive and nonrecursive implementations.
Low-density parity-check codes have recently received extensive attention as a forward error correction scheme in a wide area of applications. The decoding algorithm is inherently parallelizable, allowing communication at high speeds. One of the main disadvantages, however, is large memory requirements for interim storing of decoding data. In this paper, we investigate the performance of a hybrid decoding algorithm, using an approximating early decision algorithm and a regular probability propagation algorithm. When the early decision algorithm fails, the block is re-decoded using a probability propagation decoder. As almost all errors are detectable, the error correction performance of the hybrid algorithm is negligibly detoriated. However, simulations still achieve a 32% decrease of memory accesses.
Low-density parity-check codes have recently received extensive attention as a forward error correction scheme in a wide area of applications. The decoding algorithm is inherently parallelizable, allowing communication at high speeds. One of the main disadvantages, however, is large memory requirements for interim storing of decoding data. In this paper, we investigate a modification to the decoding algorithm, using early decisions for bits with high reliabilities. This reduces the amount of messages passed by the algorithm, which can be expected to reduce the switching activity of a hardware implementation. While direct application of the modification results in severe performance penalties, we show how to adapt the algorithm to reduce the impact, resulting in a negligible decrease in error correction performance.
We investigate a modification to the sum-product algorithm used for decoding low-density parity-check (LDPC) codes. The sum-product algorithm is algorithmically simple and highly parallelizable, but suffers from high memory usage, making LDPC codes unsuitable for usage in battery powered devices such as cell phones and PDAs. The proposed modification defines a measure of bit reliabilities during the decoding process. Whenever the reliability of a bit is over a certain threshold, the bit is declared decided, and its messages are no longer calculated. We give experimental results for white Gaussian channels, and show that the amount of memory accesses can be substantially reduced, while performance does not suffer significantly. At a bit error rate of 10^-4, the number of memory accesses is halved, while the required transmitter power increases about 0.3 dB.
Low-density parity-check codes have recently received extensive attention as a forward error correction scheme in a wide area of applications. The decoding algorithm is inherently parallelizable, allowing communication at high speeds. One of the main disadvantages, however, is large memory requirements for interim storing of decoding data. In this paper, we investigate a modification to the decoding algorithm, using early decisions for bits with high reliabilities. Currently, there are two early decision schemes proposed. We compare their theoretical performances and their suitability for hardware implementation. We also propose a new decision method, which we call weak decisions, that offers an increase in performance by a factor of two.
Low-density parity-check codes have recently received extensive attention as a forward error correction scheme in a wide area of applications. The decoding algorithm is inherently parallelizable, allowing communication at high speeds. One of the main disadvantages, however, is large memory requirements for interim storing of decoding data. In this paper, we propose an architecture for an early decision decoding algorithm. The algorithm significantly reduces the number of memory accesses. Simulation results show that the increased energy dissipation of the components is small compared to the reduced dissipation of the memories.
A class of M-channel tree-structured digital filter banks is considered with half-band IIR analysis filters and FIR synthesis filters. The filter banks approximate perfect reconstruction with an exact linear phase response and zero aliasing. This particular configuration has a very low-complexity analysis filter bank. In this paper we address the problem of minimizing the synthesis filter complexity, given the minimum-complexity analysis filters. The filter banks are designed using linear and nonlinear programming
A class of tree-structured octave-band digital filter banks is introduced. The filter banks make use of half-band IIR filters in the analysis bank and FIR filters in the synthesis bank which results in very low-complexity analysis filters. Further, the overall complexity is lower than, or comparable to, that of conventional filter banks. The overall filter banks approximate perfect reconstruction in the following sense. The distortion function is a linear-phase function approximating one in magnitude whereas the aliasing terms approximate zero. The filter banks are designed using linear and nonlinear programming. Design examples are included demonstrating the properties of the new filter banks
This paper introduces new tree-structured uniform-band and octave-band digital filter banks (FBs). These FBs make use of half-band IIR filters in the analysis FBs and FIR filters in the synthesis FBs. The resulting FBs are asymmetric in the sense that the analysis FB has a very low arithmetic complexity whereas that in the synthesis FB is higher. However, compared with other asymmetric FBs, the proposed ones have in many cases a lower overall arithmetic complexity and delay. The proposed FBs have magnitude distortion but no phase distortion, further, the aliasing components are either zero (uniform-band case) or approximately zero (octave-band case). The FBs are designed using linear and nonlinear programming. Design examples are included demonstrating the properties of the proposed filters banks. ⌐ 2003 Published by Elsevier B.V.
A new class of two-channel IIR/FIR filter banks was introduced by the authors in 2000 with half-band IIR analysis filters and FIR synthesis filters. This type of filter bank features very low-complexity analysis filters and simultaneously a low overall complexity. In this paper, we consider finite-wordlength effects of these filter banks
Multiplier blocks have been shown to require a small number of adders for multiplying one data sample with multiple, constant, coefficients. The previously proposed multiplier block algorithms have been using carry-propagation adders. However, for high-speed applications carry-save adders are a better choice. Although it is possible to map carry-save adders to carry-propagation adders, it is shown that this mapping is inconsistent in the number of carry-save adders required for a given number of carry-propagation adders for multiplier blocks. Therefore, a multiplier block algorithm for carry-save adders is proposed. It is shown that the proposed algorithm is producing multiplier blocks with consistently fewer carry-save adders compared with starting with a carry-propagation multiplier block and mapping it to carry-save adders. Further, it is shown that the proposed algorithm produces multiplier blocks with fewer carry-save adders than algorithms based on subexpression sharing
In many digital signal processing algorithms, e.g., linear transforms and digital filters, the multiplier coefficients are constant. Hence, it is possible to implement the multiplier using shifts, adders, and subtracters. In this work two approaches to realize constant coefficient multiplication with few adders and subtracters are presented. The first yields optimal results, i.e., a minimum number of adders and subtracters, but requires an exhaustive search. Compared with previous optimal approaches, redundancies in the exhaustive search cause the search time to be drastically decreased. The second is a heuristic approach based on signed-digit representation and subexpression sharing. The results for the heuristic are worse in only approximately 1% of all coefficients up to 19 bits. However, the optimal approach results in several different optimal realizations, from which it is possible to pick the best one based on other criteria. Relations between the number of adders, possible coefficients, and number of cascaded adders are presented, as well as exact equations for the number of required full and half adder cells. The results show that the number of adders and subtracters decreases on average 25% for 19-bit coefficients compared with the canonic signed-digit representation.
By introducing simplifications to multiplier graphs we extend the previous work on minimum adder multipliers to five adders and show that this is enough to express all coefficients up to 19 bits. The average savings are more than 25% for 19 bits compared with CSD multipliers. The simplifications include addition reordering and vertex reduction to see that different graphs can generate the same coefficient sets. Thus, fewer graphs need to be evaluated. A classification of the graphs reduces the effort to search the coefficient space further.
In this work we formulate a mixed integer linear programming (MILP) problem that minimizes the number of signed-power-of- two (SPT) terms given a filter specification for linear-phase frequency-response masking (FRM) filters. The proposed method designs the filters in two steps. The model filter and the masking filters are designed separately, but subject to each other. Hence, it is not guaranteed that the global minimum is obtained. However, each solution is optimal given the other filter(s), and iteration may improve the overall solution. The filter design problems are formulated using normalized peak ripple magnitude (NPRM), which for FRM filters introduces some implications, which is also discussed in this work.
For recursive filter the maximal sample frequency is bounded by the recursive loops in the filter. [In this paper, it is understood that recursive filters are infinite-length impulse response (IIR) filters.] In this work, a filter structure based on the use of the frequency masking approach is presented that increases the maximal sample frequency for narrowband and wideband filters by introducing more delay elements in the recursive loops. By using identical subfilters (except for the periods), the subfilters can be mapped using folding to a single pipeline/interleaved arithmetic structure yielding an area-efficient implementation. The filters are potentially suitable for low-power implementation by using power supply voltage scaling techniques. In this work, the design of the filters is discussed and estimations of the ripples are derived. Two examples show the viability of the proposed method.
In this work filter structures that decrease the required number of multipliers and adders for implementation of linear-phase FIR filters using frequency-response masking techniques are introduced. The basic idea of the proposed structures is that identical subfilters are used. This leads to the same arithmetic structure can be multiplexed in the implementation, reducing the number of required multipliers and adders. The subfilters are mapped using the folding transformation to obtain an area-efficient time-multiplexed (or pipeline/interleaved) implementation. Both narrow-band and wide-band frequency-response masking as well as arbitrary bandwidth frequency-response masking techniques are considered. The filter design is discussed and for each filter structure the limits on the specifications are derived. Designed examples show the usefulness of the proposed structures.
Multiple constant multiplication (MCM), i.e., realizing a number of constant multiplications using a minimum number of adders and subtracters, has been an active research area for the last decade. Almost all work has been focused on single rate FIR filters. However, for polyphase interpolation and decimation FIR filters there are two different implementation alternatives. For interpolation, direct form subfilters lead to fewer registers as they can be shared among the subfilters. The arithmetic part corresponds to a matrix vector multiplication. Using transposed direct form subfilters, the registers can not be shared, while the arithmetic part has the same input to all coefficients, and, hence, the redundancy between the coefficients is expected to be higher. For decimation filters the opposite holds for direct form and transposed direct form subfilters. In this work we discuss the trade-off between adders/subtracters and registers, and present implementation results for area, speed, and power for different realizations
Recently, a novel technique to compute sine and cosine has been proposed. By rewriting the expressions using trigonometric equations a weighted sum of bit-products is used to compute the values. This can then be mapped onto a bit-product generator followed by an adder tree. This provides an efficient architecture that can be pipelined to an arbitrary degree. It was shown in previous work that it is possible to remove a large portion of the bit-products and still obtain accurate results. The objective of this work is to study the effects of this removal and also the finite wordlength representation of the weights. Furthermore, optimization problems are formulated that can be used to minimize the maximum absolute error, the average absolute error, and the mean square error for the output values, respectively, as well as implementation complexity under error constraints.
Sub-expression sharing is a technique that can be applied to reduce the complexity of linear time-invariant non-recursive computations by identifying common patterns. It has recently been proposed that it is possible to improve the performance of single and multiple constant multiplication by identifying overlapping digit patterns. In this work we extend the concept of overlapping digit patterns to arbitrary shift dimensions, such as shift in time (FIR filters). Â© 2010 IEEE.
Recently, a novel technique for the multiple constant multiplication (MCM) problem using minimum spanning trees (MSTs) has been proposed. The approach works by finding simple differences between the coefficients to realize and then applying the same method to the differences (which is an MCM problem as well). Each iteration is divided into two steps. First, finding a minimum spanning tree in the graph describing the differences between the coefficients. Second, as each edge in the graph may correspond to more than one difference, one difference is selected for each edge in the MST. Generally, both these stages have multiple solutions. The aim of this work is to more closely study how the MST and the differences should be selected to give better total results. It is also discussed how the two stages in each iteration may be joined into one problem.
In this paper a novel approach for realizing constant coefficient matrix multiplication using few additions and subtractions is proposed. This method is applicable in, e.g., FIR filter banks, transforms, and polyphase form FIR filters for sample rate changes. Examples show that the proposed method yields good results compared to realizing the matrix multiplication by utilizing multiple coefficient multiplication techniques for the rows or columns separately.
In this work a novel approach to multiple constant multiplication based on minimum spanning trees is proposed. Each required coefficient is assigned to a vertex in a graph. The vertices are connected with weighted edges, where each edge weight corresponds to the number of adders required to derive one of the coefficient from the previous. The graph can be used to solve for the minimum spanning tree, which leads to a realization with a small number of adders. The optimal minimum spanning tree can be found in polynomial time. It is also possible to add extra constraints to the spanning tree, such as limited out-degree (corresponds to fan-out) and limited tree height (corresponds to delay). These problems are harder to solve, but there are good heuristics available. It is shown by simulation that the performance of the proposed algorithm is comparable with recently published algorithms.
Handbook of Signal Processing Systems is organized in three parts. The first part motivates representative applications that drive and apply state-of-the art methods for design and implementation of signal processing systems; the second part discusses architectures for implementing these applications; the third part focuses on compilers and simulation tools, describes models of computation and their associated design tools and methodologies. This handbook is an essential tool for professionals in many fields and researchers of all levels.
General‐purpose DSP processors, application‐specific processors, and algorithm‐specific processors are used to implement different types of DSP systems or subsystems. They are typically used in applications involving complex and irregular algorithms while application‐specific processors provide lower unit cost and higher performance for a specific application, particularly when the volume of production is high. Most DSP applications use fractional arithmetic instead of integer arithmetic. Multimedia and communication applications involve real‐time audio and video/image processing which very often require sum‐of‐products (SOP) computation. The need of computing non‐linear functions arises in many different applications. The straightforward method of approximating an elementary function is to just store the values in a look‐up table typically leads to large tables, even though the resulting area from standard cell synthesis grows slower than the number of memory bits. It is of interest to find ways to approximate elementary functions using a trade‐off between arithmetic operations and look‐up tables.
In this work, we introduce a novel approach to digit-serial/parallel multiplication. This general class of multipliers is based on shift-accumulation which also makes the approach suitable for implementation of shift-accumulators in distributed arithmetic. As a variable in the design process, the maximal number of cascaded full-adders can be selected. Thus, it is possible to, as a special case, obtain a bit-level pipelined multiplier. Both general and fixed coefficient multiplication is considered. The hardware complexity is low compared with other approaches.
In this work we formulate a mixed integer linear programming (MILP) problem for designing linear-phase FIR filters with low arithmetic complexity. By incorporating subexpression sharing in the problem formulation, the number of adders will be lower compared with previous approaches, where minimizing the number of non-zero bits of the coefficients has been the objective.
In this work a mixed integer linear programming (MILP) formulation for the design of a class of linear-phase FIR filters are presented. The formulation can be solved using general purpose MILP solvers to obtain filter implementationwith a minimum number of signed-power-of-two (SPT) terms given a filter specification. The filter structures considered are based on reduced complexity polyphase decomposition. It is shown that the total number of SPT terms per sample can be reduced using this filter architecture. However, the savings are not as large as other work propose, when optimal design techniques are used.
Subexpression sharing is an important implementation issue when one data is multiplied with many constants or a sum of products is computed. By modelling the subexpression sharing problem using integer linear programming (ILP) an optimal solution can be found. Further, the model can be directly incorporated with the design of algorithms that have linear design constraints, e.g., linear-phase FIR filters. The proposed method is compared with previously reported algorithms. It produces better results than other subexpression sharing methods, even though it is still not comparable with the optimal method based on graph representation. However, the possibility to expand the ILP model beyond subexpression sharing is discussed. This would then produce identical results to the optimal adder graph method.
In this work we discuss the realization of constant multiplication using a minimum number of carry-save adders. We consider both non-redundant and carry-save representation for the input data. For both cases we present all possible interconnection topologies, using up to six and five adders, respectively. These are sufficient to realize constant multiplications for all coefficients with a wordlength up to 19 bits.
In this paper, we discuss an optimization-based approach for design space exploration to find limitations and possible trade-offs between performance metrics in analog circuits. The exploration guides the designer when making design decisions. For the design space exploration, which is expensive in terms of computation time, we use an optimization-based device sizing tool that runs concurrent optimization tasks on a network of workstations. The tool enables efficient and accurate exploration of the available design space. As a design example, we investigate three operational transconductance amplifiers, OTAs, implemented in a standard 0.35-mum CMOS process. This example shows that large savings in terms of chip area and power consumption can be made by selecting the most suitable circuit.