  • 1.
    Abbas, Muhammad
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Computational and Implementation Complexity of Polynomial Evaluation Schemes. 2011. In: Proceedings of NORCHIP, 2011, Date: 14-15 Nov. 2011, IEEE conference proceedings, 2011, p. 1-6. Conference paper (Refereed)
    Abstract [en]

    In this work, we consider the computational complexity of different polynomial evaluation schemes. By considering the number of operations of different types, critical path, pipelining complexity, and latency after pipelining, high-level comparisons are obtained. These can then be used to shortlist suitable candidates for an implementation given the specifications. Multiplications are not treated as a single class; they are divided into data-data multiplications, squarers, and data-coefficient multiplications, as the latter can be optimized depending on implementation architecture and application.
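
As a rough illustration of the operation-type bookkeeping described in this abstract, the sketch below evaluates a polynomial with two classic schemes (Horner and Estrin) and tallies data-data multiplications, squarers, data-coefficient multiplications, and additions. The choice of these two schemes, the `OpCount` class, and the function names are assumptions made for illustration; the abstract does not say which schemes the paper compares or how it counts them.

```python
from dataclasses import dataclass


@dataclass
class OpCount:
    data_data: int = 0   # multiplications of two data-dependent values
    squarers: int = 0    # a data value multiplied by itself
    data_coeff: int = 0  # a data value multiplied by a constant coefficient
    adds: int = 0


def horner(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) with Horner's rule and count operations."""
    ops = OpCount()
    n = len(coeffs) - 1
    acc = coeffs[n]
    for i in range(n - 1, -1, -1):
        if i == n - 1:
            ops.data_coeff += 1  # first product is coefficient * x
        else:
            ops.data_data += 1   # later products are accumulator * x
        acc = acc * x + coeffs[i]
        ops.adds += 1
    return acc, ops


def estrin(coeffs, x):
    """Evaluate the same polynomial with Estrin's scheme and count operations."""
    ops = OpCount()
    # Each term carries a flag telling whether it is still a raw coefficient.
    terms = [(c, True) for c in coeffs]
    power = x
    while len(terms) > 1:
        nxt = []
        for i in range(0, len(terms), 2):
            if i + 1 < len(terms):
                lo, _ = terms[i]
                hi, hi_is_coeff = terms[i + 1]
                if hi_is_coeff:
                    ops.data_coeff += 1
                else:
                    ops.data_data += 1
                nxt.append((lo + hi * power, False))
                ops.adds += 1
            else:
                nxt.append(terms[i])  # odd term passes through unchanged
        terms = nxt
        if len(terms) > 1:
            power = power * power     # x^2, x^4, ... via squarers
            ops.squarers += 1
    return terms[0][0], ops
```

For a cubic, Horner needs one data-coefficient and two data-data multiplications in a strictly sequential chain, while Estrin trades one data-data multiplication for a squarer and an extra data-coefficient multiplication and exposes more parallelism.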

  • 2.
    Abbas, Muhammad
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Integer Linear Programming Modeling of Addition Sequences With Additional Constraints for Evaluation of Power Terms. Manuscript (preprint) (Other academic)
    Abstract [en]

    In this work, an integer linear programming (ILP) based model is proposed for the computation of a minimal cost addition sequence for a given set of integers. Since exponents are additive under multiplication, the minimal length addition sequence will provide an optimal solution for the evaluation of a requested set of power terms. This in turn finds application in, e.g., window-based exponentiation for cryptography and polynomial evaluation. Not only is an optimal model proposed, but the model is also extended to consider different costs for multipliers and squarers as well as to control the depth of the resulting addition sequence.
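
The link between addition sequences and power terms can be sketched as follows: each step of the sequence adds two earlier exponents, which corresponds to multiplying two earlier powers. The function name and the step encoding (pairs of indices into the sequence so far) are hypothetical. The sketch only evaluates a given sequence; it does not reproduce the ILP model that finds a minimal one.

```python
def powers_from_addition_sequence(x, steps):
    """Compute powers of x along an addition sequence.

    steps is a list of (i, j) index pairs; each step appends the exponent
    exps[i] + exps[j], realized as the product vals[i] * vals[j].
    Returns ({exponent: value}, multiplier_count, squarer_count).
    """
    exps = [1]   # the sequence always starts at exponent 1
    vals = [x]
    mults = 0
    squarings = 0
    for i, j in steps:
        exps.append(exps[i] + exps[j])
        vals.append(vals[i] * vals[j])
        if i == j:
            squarings += 1   # same operand twice: a squarer suffices
        else:
            mults += 1       # two different operands: general multiplier
    return dict(zip(exps, vals)), mults, squarings


# Addition sequence 1, 2, 3, 5, 7 covering the requested set {5, 7}.
steps = [(0, 0), (1, 0), (2, 1), (3, 1)]
powers, mults, squarings = powers_from_addition_sequence(2, steps)
```

Counting squarings separately from general multiplications mirrors the cost distinction mentioned in the abstract.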

  • 3.
    Abbas, Muhammad
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Switching Activity Estimation of CIC Filter Integrators. 2010. In: Proceedings of Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics (PrimeAsia), 2010, Date: 22-24 Sept. 2010, IEEE, 2010, p. 21-24. Conference paper (Refereed)
    Abstract [en]

    In this work, a method for estimation of the switching activity in integrators is presented. To achieve low power, it is always necessary to develop accurate and efficient methods to estimate the switching activity. The switching activities are then used to estimate the power consumption. In our work, the switching activity is first estimated for general-purpose integrators and then extended to cascaded integrators in CIC filters.

  • 4.
    Abbas, Muhammad
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Switching Activity Estimation of DDFS Phase Accumulators. Manuscript (preprint) (Other academic)
    Abstract [en]

    In this letter, equations for the one’s probability and switching activities for direct digital frequency synthesis (DDFS) phase accumulators are derived. These results are useful for obtaining accurate estimates of both leakage and dynamic power consumption for the phase accumulator and the phase-to-magnitude converter.
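
The two quantities named in this abstract can be measured empirically on a simulated accumulator, which gives a feel for what the derived equations describe. The sketch below is a plain bit-level simulation, not the closed-form expressions from the letter; the function name and its parameters (`fcw` for the frequency control word, `width` for the accumulator word length) are assumptions.

```python
def accumulator_bit_stats(fcw, width, cycles):
    """Simulate a phase accumulator and return per-bit statistics.

    Returns (switching_activity, ones_probability), each a list indexed
    by bit position (0 = LSB), averaged over the simulated cycles.
    """
    mask = (1 << width) - 1
    acc = 0
    toggles = [0] * width
    ones = [0] * width
    for _ in range(cycles):
        nxt = (acc + fcw) & mask      # wrap-around addition modulo 2**width
        changed = acc ^ nxt           # bits that flipped in this cycle
        for b in range(width):
            toggles[b] += (changed >> b) & 1
            ones[b] += (nxt >> b) & 1
        acc = nxt
    activity = [t / cycles for t in toggles]
    probability = [o / cycles for o in ones]
    return activity, probability


activity, probability = accumulator_bit_stats(fcw=1, width=4, cycles=16)
```

With a control word of 1 the accumulator is a plain counter, so over one full period each higher bit toggles half as often as the one below it, while every bit is one for half of the cycles.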

  • 5.
    Abbas, Muhammad
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Blad, Anton
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Low-Complexity Parallel Evaluation of Powers Exploiting Bit-Level Redundancy. 2010. In: Conference Record of the Forty Fourth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), 2010, 7-10 Nov. 2010 / [ed] Michael B. Matthews, Washington, DC, USA: IEEE Computer Society, 2010, p. 1168-1172. Conference paper (Refereed)
    Abstract [en]

    In this work, we investigate the problem of computing any requested set of power terms in parallel using summation trees. This problem occurs in applications such as polynomial approximation and Farrow filters (polynomial evaluation part). In the proposed technique, the partial products of each power term are initially computed independently. A redundancy check is then made within each and among all partial product matrices at bit level. The redundancy here relates to the fact that the same three partial products may be present in more than one column and, hence, can be mapped to the same full adder. The proposed algorithm is tested for different sets of powers and wordlengths to exploit the sharing potential.

  • 6.
    Abbas, Muhammad
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Johansson, Håkan
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    On the Fixed-Point Implementation of Fractional-Delay Filters Based on the Farrow Structure. 2013. In: IEEE Transactions on Circuits and Systems Part 1: Regular Papers, ISSN 1549-8328, E-ISSN 1558-0806, Vol. 60, no 4, p. 926-937. Article in journal (Refereed)
    Abstract [en]

    In this paper, the fixed-point implementation of adjustable fractional-delay filters using the Farrow structure is considered. Based on the observation that the sub-filters approximate differentiators, closed-form expressions for the L2-norm scaling values at the outputs of each sub-filter as well as at the inputs of each delay multiplier are derived. The scaling values can then be used to derive suitable word lengths by also considering the round-off noise analysis and optimization. Different approaches are proposed to derive suitable word lengths, including one based on integer linear programming, which always gives an optimal allocation. Finally, a new approach for multiplierless implementation of the sub-filters in the Farrow structure is suggested. This is shown to reduce register complexity and, for most word lengths, require fewer adders and subtracters than existing approaches.

  • 7.
    Abbas, Muhammad
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Johansson, Håkan
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Scaling of fractional delay filters based on the Farrow structure. 2009. In: Proceedings of IEEE International Symposium on Circuits and Systems, 2009. ISCAS 2009, Piscataway: IEEE, 2009, p. 489-492. Conference paper (Refereed)
    Abstract [en]

    In this work we consider scaling of fractional delay filters using the Farrow structure. Based on the observation that the subfilters approximate the Taylor expansion of a differentiator, we derive estimates of the L2-norm scaling values at the outputs of each subfilter as well as at the inputs of each delay multiplier. The scaling values can then be used to derive suitable wordlengths in a fixed-point implementation.

  • 8.
    Abbas, Muhammad
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Wanhammar, Lars
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Power Estimation of Recursive and Non-Recursive CIC Filters Implemented in Deep-Submicron Technology. 2010. In: Proceedings of International Conference on Green Circuits and Systems (ICGCS), 2010, Date: 21-23 June, 2010, IEEE, 2010, p. 221-225. Conference paper (Refereed)
    Abstract [en]

    The power modeling of different realizations of cascaded integrator-comb (CIC) decimation filters has been a subject of several recent works. In this work we have extended these with modeling of leakage power, which is an important factor since the input sample rate may differ by several orders of magnitude. Furthermore, we have pointed out the importance of the input wordlength on the comparison of recursive and nonrecursive implementations.

  • 9.
    Abbas, Muhammad
    et al.
    Linköping University, Department of Electrical Engineering. Linköping University, The Institute of Technology.
    Qureshi, Fahad
    Linköping University, Department of Electrical Engineering. Linköping University, The Institute of Technology.
    Ullah Sheikh, Zaka
    Linköping University, Department of Electrical Engineering. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Johansson, Håkan
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Johansson, Kenny
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Comparison of Multiplierless Implementation of Nonlinear-Phase Versus Linear-Phase FIR filters. 2008. Conference paper (Refereed)
    Abstract [en]

    FIR filters are often used because of their linear-phase response. However, there are certain applications where the linear-phase property is not required, such as signal energy estimation, but IIR filters cannot be used due to the limitation of sample rate imposed by the recursive algorithm. In this work, we discuss multiplierless implementation of minimum order, and therefore nonlinear-phase, FIR filters and compare it to the linear-phase counterpart.

  • 10.
    Afzal, Nadeem
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Wikner, J. Jacob
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    On Scaling and Output Cardinality of Multi-Bit Digital Error-Feedback Modulators. 2012. Manuscript (preprint) (Other academic)
    Abstract [en]

    To determine the maximum allowed input scale for stable operation of higher-order delta-sigma modulators, designers largely depend on analytical and numerical analysis. In this brief, the maximum allowed input scale to a multi-bit digital error-feedback delta-sigma modulator of arbitrary order is derived mathematically. The digital modulator with an arbitrary output word length is stable if its output does not overflow. Thus, to avoid overflow of the modulator output, the relations between the peak values of the involved digital signals are devised. A number of example configurations are presented to illustrate the usefulness of the derivations.

  • 11.
    Afzal, Nadeem
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Wikner, J. Jacob
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Digital Multi-bit Cascaded Error-Feedback ΔΣ Modulators With Reduced Hardware and Power Consumption. 2012. Manuscript (preprint) (Other academic)
    Abstract [en]

    The hardware of the multi-bit digital error feedback modulator (EFM) of arbitrary order has recently been reduced by using multiple EFMs in cascade. In this paper, a modified cascading strategy is devised. Parts of the processing of consecutively placed EFM stages are merged such that a significant amount of circuitry is removed in each stage. In the proposed design, the modulated output is represented by a set of encoded signals to be used by the signal processing block placed after the EFM.

    To illustrate the savings, a number of configurations of fourth-order EFM designs, composed of two- and three-cascaded stages, have been synthesized in a 65 nm CMOS process technology using conventional and the proposed implementation techniques. Savings of 52.7% and 47%, in terms of area and power consumption, respectively, could be obtained at an oversampling ratio of 4. The trade-off between sampling frequency and hardware cost is also presented. Due to the reduced hardware, an increase of up to 600 MHz in the sampling frequency is achieved.

  • 12.
    Afzal, Nadeem
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Wikner, Jacob
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Reducing Complexity and Power of Digital Multibit Error-Feedback Delta Sigma Modulators. 2014. In: IEEE Transactions on Circuits and Systems - II - Express Briefs, ISSN 1549-7747, E-ISSN 1558-3791, Vol. 61, no 9, p. 641-645. Article in journal (Refereed)
    Abstract [en]

    In this brief, we propose how the hardware complexity of arbitrary-order digital multibit error-feedback delta-sigma modulators can be reduced. This is achieved by splitting the combinatorial circuitry of the modulators into two parts, i.e., one producing the modulator output and another producing the error signal fed back. The part producing modulator output is removed by utilizing a unit-element-based digital-to-analog converter. To illustrate the reduced complexity and power consumption, we compare the synthesized results with those of conventional structures. Fourth-order modulators implemented with the proposed technique use up to 26% less area compared with conventional implementations. Due to the area reduction, the designs consume up to 33% less dynamic power. Furthermore, they can operate at a frequency 100 MHz higher than that of the conventional implementations.

  • 13.
    Ahmed, Tanvir
    et al.
    Linköping University, Department of Electrical Engineering. Linköping University, Faculty of Science & Engineering.
    Garrido, Mario
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    A 512-point 8-parallel pipelined feedforward FFT for WPAN. 2011. In: 2011 Conference Record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), IEEE, 2011, p. 981-984. Conference paper (Refereed)
    Abstract [en]

    This paper presents a 512-point feedforward FFT architecture for wireless personal area network (WPAN). The architecture processes a continuous flow of 8 samples in parallel, leading to a throughput of 2.64 GSamples/s. The FFT is computed in three stages that use radix-8 butterflies. This radix significantly reduces the number of rotators with respect to previous approaches based on radix-2. In addition, the proposed architecture uses the minimum memory that is required for a 512-point 8-parallel FFT. Experimental results show that besides its high throughput, the design is efficient in area and power consumption, improving the results of previous approaches. Specifically, for a wordlength of 16 bits, the proposed design consumes 61.5 mW and its area is 1.43 mm².

  • 14.
    Alam, Syed Asad
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    A unified approach to the design and implementation of computation sharing multipliers: Computation sharing multipliers. Manuscript (preprint) (Other academic)
    Abstract [en]

    A unified approach to the design and implementation of computation sharing multipliers based on Booth and standard high-radix multiplication schemes is presented here. Both of these multiplication schemes have various building blocks, one of which is the pre-computer, which can be shared across a number of multiplications if the multiplicand to the multipliers is the same, as in a transposed direct form (TDF) finite-length impulse response (FIR) filter. Closed-form expressions to estimate the cost of different building blocks based on different schemes have been developed and analyzed in different dimensions. Multipliers, both standalone and as part of computation sharing in FIR filters and complex multipliers, have been realized in hardware and synthesized using a standard cell library.

    It is shown that, apart from word length and filter length, the ratio between the cost of implementing adders and multiplexers has an effect on the choice of optimal radix. The higher the ratio, the lower the cost of implementing multiplexers, which benefits a high radix. A higher radix will also benefit from computation sharing if the cost of one multiplication for it is less than for the lower radix, and it is shown that the radix-16 Booth multiplier achieves lower area complexity and power consumption by an average of 7% and 17%, respectively.

  • 15.
    Alam, Syed Asad
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Design of Finite Word Length Linear-Phase FIR Filters in the Logarithmic Number System Domain. 2014. In: VLSI design (Print), ISSN 1065-514X, E-ISSN 1563-5171, Vol. 2014, no 217495. Article in journal (Refereed)
    Abstract [en]

    Logarithmic number system (LNS) is an attractive alternative to realize finite-length impulse response filters because of multiplication in the linear domain being only addition in the logarithmic domain. In the literature, linear coefficients are directly replaced by the logarithmic equivalent. In this paper, an approach to directly optimize the finite word length coefficients in the LNS domain is proposed. This branch and bound algorithm is implemented based on LNS integers and several different branching strategies are proposed and evaluated. Optimal coefficients in the minimax sense are obtained and compared with the traditional finite word length representation in the linear domain as well as using rounding. Results show that the proposed method naturally provides smaller approximation error compared to rounding. Furthermore, they provide insights into finite word length properties of FIR filters coefficients in the LNS domain and show that LNS FIR filters typically provide a better approximation error compared to a standard FIR filter.
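
The property this abstract relies on, that multiplication in the linear domain becomes addition in the logarithmic domain, can be sketched with a toy LNS representation. The encoding below (a sign bit plus a base-2 logarithm rounded to a fixed number of fractional bits, stored as an integer) and the value `FRAC_BITS = 8` are assumptions for illustration; they are not taken from the paper, and zero is deliberately left unrepresented.

```python
import math

FRAC_BITS = 8  # hypothetical number of fractional bits in the log word


def to_lns(x, frac_bits=FRAC_BITS):
    """Encode a nonzero real number as (sign, integer-scaled log2 magnitude)."""
    if x == 0:
        raise ValueError("zero has no finite logarithm in this toy encoding")
    sign = 1 if x < 0 else 0
    log_int = round(math.log2(abs(x)) * (1 << frac_bits))
    return sign, log_int


def lns_multiply(a, b):
    """Multiply two LNS numbers: XOR the signs, add the integer logarithms."""
    return a[0] ^ b[0], a[1] + b[1]


def from_lns(value, frac_bits=FRAC_BITS):
    """Decode an LNS pair back to a float."""
    sign, log_int = value
    magnitude = 2.0 ** (log_int / (1 << frac_bits))
    return -magnitude if sign else magnitude


product = from_lns(lns_multiply(to_lns(0.5), to_lns(-4.0)))
approx = from_lns(lns_multiply(to_lns(3.0), to_lns(5.0)))
```

Powers of two are represented exactly, while other values such as 3 and 5 pick up a small rounding error from the finite log word length, which is the kind of coefficient error the optimization in the paper targets.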

  • 16.
    Alam, Syed Asad
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Generalized Division-Free Architecture and Compact Memory Structure for Resampling in Particle Filters. 2015. In: 2015 European Conference on Circuit Theory and Design (ECCTD), IEEE Press, 2015, p. 416-419. Conference paper (Refereed)
    Abstract [en]

    The most challenging step of implementing particle filtering is the resampling step, which replicates particles with large weights and discards those with small weights. In this paper, we propose a generic architecture for resampling which uses double multipliers to avoid normalization divisions and make the architecture equally efficient for non-powers-of-two number of particles. Furthermore, the complexity of resampling is greatly affected by the size of memories used to store weights. We illustrate that storing the original weights instead of their cumulative sum, and calculating the cumulative sum online, reduces the total complexity, in terms of area, by 21% to 45%, while giving up to 50% reduction in memory usage.
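
One way to avoid normalization divisions in resampling is to cross-multiply the comparison: instead of testing `(k + u0) / n < cumsum / total`, test `(k + u0) * total < n * cumsum`. The sketch below applies this to systematic resampling and forms the cumulative sum on the fly from the stored original weights. It is an assumption that this corresponds to the double-multiplier idea in the abstract; the function name and the use of systematic resampling are illustrative choices, not the paper's architecture.

```python
def resample_division_free(weights, u0):
    """Systematic resampling without dividing by the weight sum or by n.

    weights: unnormalized, non-negative particle weights with a positive sum.
    u0: a single uniform random offset in [0, 1).
    Returns the list of selected particle indices (same length as weights).
    """
    n = len(weights)
    total = sum(weights)
    indices = []
    running = 0   # cumulative sum computed online, never stored
    k = 0         # index of the next output particle
    for i, w in enumerate(weights):
        running += w
        # Two multiplications replace the divisions by total and by n.
        while k < n and (k + u0) * total < n * running:
            indices.append(i)
            k += 1
    return indices


selected = resample_division_free([1, 3, 0, 4], u0=0.5)
```

Because only the original weights and one running sum are needed, no separate memory for the cumulative sums is required, and nothing in the comparison depends on `n` being a power of two.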

  • 17.
    Alam, Syed Asad
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Implementation of Narrow-Band Frequency-Response Masking for Efficient Narrow Transition Band FIR Filters on FPGAs. 2011. In: NORCHIP, 2011, IEEE conference proceedings, 2011, p. 1-4. Conference paper (Refereed)
    Abstract [en]

    The complexity of narrow transition band FIR filters is high and can be reduced by using frequency response masking (FRM) techniques. These techniques use a combination of periodic model filters and masking filters. In this paper, we show that time-multiplexed FRM filters achieve lower complexity, not only in terms of multipliers, but also logic elements compared to time-multiplexed single-stage filters. The reduced complexity also leads to a lower power consumption. Furthermore, we show that the optimal period of the model filter is dependent on the time-multiplexing factor.

  • 18.
    Alam, Syed Asad
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Implementation of Time-Multiplexed Sparse Periodic FIR Filters for FRM on FPGAs. 2011. Conference paper (Refereed)
    Abstract [en]

    Frequency-response masking (FRM) is a set of techniques for lowering the computational complexity of narrow transition band FIR filters. These techniques use a combination of sparse periodic filters and non-sparse filters. In this work we consider the implementation of these filters in a time-multiplexed manner on FPGAs. It is shown that the proposed architectures produce lower complexity realizations compared to the vendor-provided IP blocks, which do not take the sparseness into consideration. The designs are implemented on a Virtex-6 device utilizing the built-in DSP blocks.

  • 19.
    Alam, Syed Asad
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Improved particle filter resampling architectures. Manuscript (preprint) (Other academic)
    Abstract [en]

    The most challenging aspect of particle filtering hardware implementation is the resampling step which replicates particles with large weights and discards those with small weights because it has a high latency and can only be partially executed in parallel with the other steps of particle filtering. To reduce the latency, an improved resampling scheme is proposed in this work which involves pre-fetching from the weight memory in parallel to the fetching of a value from a random function generator. Architectures for realizing the pre-fetch technique are also proposed. The trade-off between the latency reduction achieved by increasing the size of the pre-fetch memory and the architectural implementation complexity has been analyzed. Results show that a pre-fetch of five achieves the best area-latency trade-off while on average achieving an 85% reduction in the latency.

    We also propose a generic double multiplier architecture for resampling which avoids normalization divisions and makes the architecture equally efficient for non-powers-of-two number of particles as well as removes the need of explicitly ordering the random values for efficient multinomial resampling implementation. It is further improved by computing the cumulative sum of weights on-the-fly which helps in reducing the size of the weight memories by up to 50%.

  • 20.
    Alam, Syed Asad
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    On the implementation of time-multiplexed frequency-response masking filters. 2016. In: IEEE Transactions on Signal Processing, ISSN 1053-587X, E-ISSN 1941-0476, Vol. 64, no 15, p. 3933-3944. Article in journal (Refereed)
    Abstract [en]

    The complexity of narrow transition band finite-length impulse response (FIR) filters is high and can be reduced by using frequency-response masking (FRM) techniques. These techniques use a combination of periodic model and, possibly periodic, masking filters. Time-multiplexing is in general beneficial since only rarely do the technology-bound maximum obtainable clock frequency and the application-determined required sample rate correspond. Therefore, architectures for time-multiplexed FRM filters that benefit from the inherent sparsity of the periodic filters are introduced in this work.

    We show that FRM filters not only reduce the number of multipliers needed, but also have benefits in terms of memory usage. Although the total number of samples to be stored is larger for FRM, it results in fewer memory resources needed in FPGAs and more energy-efficient memory schemes in ASICs. In total, the power consumption is significantly reduced compared to a single-stage implementation. Furthermore, we show that the choice of the interpolation factor which gives the least complexity for the periodic model filter and subsequent masking filter(s) is a function of the time-multiplexing factor, meaning that the minimum number of multipliers does not always correspond to the minimum number of multiplications. Both single-port and dual-port memories are considered and the involved trade-off in number of multipliers and memory complexity is illustrated. The results show that for FPGA implementation, the power reduction ranges from 23% to 68% for the considered examples.

  • 21.
    Andersson, Niklas
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Vesterbacka, Mark
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oskar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Wikner, Jacob
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Steady-state cycles in digital oscillators. 2014. Manuscript (preprint) (Other academic)
    Abstract [en]

    Digital recursive oscillators locked in steady-state can be used to generate sinusoids with high spectral purity. The locking occurs when the oscillator returns to a previously visited state and repeats its sequence. In this work we propose a new search algorithm and two new search strategies to find all steady-states for a given oscillator configuration. The improvement in spurious-free dynamic range is between 7 and 40 dB compared to previously reported results. The algorithm is also able to find oscillator sequences for more frequencies than previously reported work. A key part of the method is the reduction of the search space made possible by a proposed extension of existing theory on recursive oscillators. Specific properties of digital oscillators in a steady-state are also discussed. It is shown that the initial states can be used to individually control the phase, amplitude, spectral purity, and also cycle length of the oscillator output.

  • 22.
    Ashrafi, Ashkan
    et al.
    San Diego State University.
    Strollo, Antonio G. M.
    University of Napoli Federico II.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Hardware implementation of digital signal processing algorithms. 2013. In: Journal of Electrical and Computer Engineering, ISSN 2090-0147, E-ISSN 2090-0155, Vol. 2013, no 782575, p. 1-2. Article in journal (Other academic)
  • 23.
    Athar, Saima
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Qureshi, Fahad
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Kale, Izzet
    University of Westminster, London, United Kingdom.
    On the efficient computation of single-bit input word length pipelined FFTs2011In: IEICE Electronics Express, ISSN 1349-2543, E-ISSN 1349-2543, Vol. 8, no 17, p. 1437-1443Article in journal (Refereed)
    Abstract [en]

    This letter describes an efficient architecture for the computation of fast Fourier transform (FFT) algorithms with single-bit input. The proposed architecture is aimed for the first stages of pipelined FFT architectures, processing one sample per clock cycle, hence making it suitable for real-time FFT computation. Since natural input order pipeline FFTs use large memories in the early stages, it is important to keep the word length shorter in the beginning of the pipeline. By replacing the initial butterflies and rotators of an architecture with that of the proposed block, the memory requirements can be significantly reduced. Comparisons with the commonly used single delay feedback (SDF) architecture show that more than 50% of the required memory can be saved in some cases.

  • 24.
    Backenius, Erik
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Säll, Erik
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Bidirectional Conversion to Minimum Signed-Digit Representation2006In: Circuits and Systems, 2006. ISCAS 2006., 2006Conference paper (Other academic)
    Abstract [en]

    In this work an approach to converting a number in two's complement representation to a minimum signed-digit representation is proposed. The novelty in this work is that this conversion is done from left-to-right and right-to-left concurrently. Hence, the execution time is significantly decreased, while the area overhead is small.

  • 25.
    Bertilsson, Erik
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Larsson, Erik G
    Linköping University, Department of Electrical Engineering, Communication Systems. Linköping University, Faculty of Science & Engineering.
    A Scalable Architecture for Massive MIMO Base Stations Using Distributed Processing2016In: 2016 50TH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, Washington: IEEE COMPUTER SOC , 2016, p. 864-868Conference paper (Refereed)
    Abstract [en]

    Massive MIMO-systems have received considerable attention in recent years as an enabler in future wireless communication systems. As the idea is based on having a large number of antennas at the base station it is important to have both a scalable and distributed realization of such a system to ease deployment. Most work so far has focused on the theoretical aspects although a few demonstrators have been reported. In this work, we propose a base station architecture based on connecting the processing nodes in a K-ary tree, allowing simple scalability. Furthermore, it is shown that most of the processing can be performed locally in each node. Further analysis of the node processing shows that it should be enough that each node contains one or two complex multipliers and a few complex adders/subtracters operating at some hundred MHz. It is also shown that a communication link of some Gbps is required between the nodes, and, hence, it is fully feasible to have one or a few links between the nodes to cope with the communication requirements.

  • 26.
    Bertilsson, Erik
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Larsson, Erik G.
    Linköping University, Department of Electrical Engineering, Communication Systems. Linköping University, Faculty of Science & Engineering.
    Computation Limited Matrix Inversion Using Neumann Series Expansion for Massive MIMO2017Conference paper (Refereed)
    Abstract [en]

    Neumann series expansion is a method for performing matrix inversion that has received a lot of interest in the context of massive MIMO systems. However, the computational complexity of the Neumann methods is higher than for the lowest complexity exact matrix inversion algorithms, such as LDL, when the number of terms in the series is three or more. In this paper, the Neumann series expansion is analyzed from a computational perspective for cases when the complexity of performing exact matrix inversion is too high. By partially computing the third term of the Neumann series, the computational complexity can be reduced. Three different preconditioning matrices are considered. Simulation results show that when limiting the total number of operations performed, the BER performance of the three different preconditioning matrices is the same.
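
    As a rough illustration of the Neumann series expansion mentioned in this abstract, the sketch below approximates the inverse of A by summing (I - D^-1 A)^k D^-1 over the first few values of k. It assumes the preconditioner D is simply the diagonal of A, which is only one possible choice and is not claimed to be one of the three matrices studied in the paper. The helper names are hypothetical, and the partial computation of the third term is not modelled.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def neumann_inverse(a, terms):
    """Approximate inv(a) with `terms` terms of the Neumann series.

    Assumption: the preconditioner is the diagonal of `a`.
    The series converges only if the spectral radius of
    (I - D^-1 a) is below one, e.g. for diagonally dominant `a`.
    """
    n = len(a)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d_inv = [[1.0 / a[i][i] if i == j else 0.0 for j in range(n)]
             for i in range(n)]
    da = matmul(d_inv, a)
    # E = I - D^-1 A is the matrix whose powers are summed.
    e = [[identity[i][j] - da[i][j] for j in range(n)] for i in range(n)]
    power = identity                      # E^0
    total = [[0.0] * n for _ in range(n)]
    for _ in range(terms):
        total = [[total[i][j] + power[i][j] for j in range(n)]
                 for i in range(n)]
        power = matmul(power, e)          # next power of E
    return matmul(total, d_inv)
```

    With a single term the result is just D^-1; each additional term costs one more matrix product in this naive form, which is the cost that motivates limiting the number of terms.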

  • 27.
    Blad, Anton
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Electronics System.
    Gustafsson, Oscar
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Electronics System.
    Bit-level optimized high-speed architectures for decimation filter applications2008In: IEEE International Symposium on Circuits and Systems,2008, Piscataway, NJ: IEEE , 2008, p. 1914-Conference paper (Refereed)
  • 28.
    Blad, Anton
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Energy-Efficient Data Representation in LDPC Decoders2006In: Electronics Letters, ISSN 0013-5194, E-ISSN 1350-911X, Vol. 42, no 18, p. 1051-1052Article in journal (Refereed)
    Abstract [en]

    Data representations for LDPC decoders using the sum-product algorithm in the log-likelihood domain are considered. It is suggested that the look-up table implementation of the domain transform function is separated into two parts, allowing a compact representation of the internal state data. Memories and bus widths can be reduced by typically 16%, while the imposed hardware overhead is insignificant.

  • 29.
    Blad, Anton
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    FPGA implementation of rate-compatible QC-LDPC code decoder2011Conference paper (Other academic)
    Abstract [en]

    The use of rate-compatible error correcting codes offers several advantages as compared to the use of fixed-rate codes: a smooth adaptation to the channel conditions, the possibility of incremental Hybrid ARQ schemes, as well as simplified code representations in the encoder and decoder. In this paper, the implementation of a decoder for rate-compatible quasi-cyclic LDPC codes is considered. The decoder uses check node merging to increase the convergence speed of the algorithm. Check node merging allows the decoder to achieve the same performance with a significantly lower number of iterations, thereby increasing the throughput.

    The feasibility of a check node merging decoder is investigated for codes from IEEE 802.16e and IEEE 802.11n. The faster convergence rate of the check node merging algorithm allows the decoder to be implemented using lower parallelization factors, thereby reducing the logic complexity. The designs have been synthesized to an Altera Cyclone II FPGA, and results show significant increases in throughput at high SNR.

  • 30.
    Blad, Anton
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Integer Linear Programming-Based Bit-Level Optimization for High-Speed FIR Decimation Filter Architectures2010In: CIRCUITS SYSTEMS AND SIGNAL PROCESSING, ISSN 0278-081X, Vol. 29, no 1, p. 81-101Article in journal (Refereed)
    Abstract [en]

    Analog-to-digital converters based on sigma-delta modulation have shown promising performance, with steadily increasing bandwidth. However, associated with the increasing bandwidth is an increasing modulator sampling rate, which becomes costly to decimate in the digital domain. Several architectures exist for the digital decimation filter, and among the more common and efficient are polyphase decomposed finite-length impulse response (FIR) filter structures. In this paper, we consider such filters implemented with partial product generation for the multiplications, and carry-save adders to merge the partial products. The focus is on the efficient pipelined reduction of the partial products, which is done using a bit-level optimization algorithm for the tree design. However, the method is not limited only to filter design, but may also be used in other applications where high-speed reduction of partial products is required. The presentation of the reduction method is carried out through a comparison between the main architectural choices for FIR filters: the direct-form and transposed direct-form structures. For the direct-form structure, usage of symmetry adders for linear-phase filters is investigated, and a new scheme utilizing partial symmetry adders is introduced. The optimization results are complemented with energy dissipation and cell area estimations for a 90 nm CMOS process.

  • 31.
    Blad, Anton
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Redundancy reduction for high-speed FIR filter architectures based on carry-save adder trees2010In: International Symposium on Circuits and Systems, IEEE , 2010Conference paper (Refereed)
    Abstract [en]

    In this work we consider high-speed FIR filter architectures implemented using, possibly pipelined, carry-save adder trees for accumulating the partial products. In particular we focus on the mapping between partial products and full adders and propose a technique to reduce the number of carry-save adders based on the inherent redundancy of the partial products. The redundancy reduction is performed on the bit-level to also work for short wordlength data such as those obtained from sigma-delta modulators.

  • 32.
    Blad, Anton
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Electronics System.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering.
    Wanhammar, Lars
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Electronics System.
    A Hybrid Early Decision-Probability Propagation Decoding Algorithm for Low-Density Parity-Check Codes2005In: Asilomar Conference on Signals, Systems and Computers,2005, IEEE , 2005, p. 586-Conference paper (Refereed)
    Abstract [en]

    Low-density parity-check codes have recently received extensive attention as a forward error correction scheme in a wide area of applications. The decoding algorithm is inherently parallelizable, allowing communication at high speeds. One of the main disadvantages, however, is large memory requirements for interim storing of decoding data. In this paper, we investigate the performance of a hybrid decoding algorithm, using an approximating early decision algorithm and a regular probability propagation algorithm. When the early decision algorithm fails, the block is re-decoded using a probability propagation decoder. As almost all errors are detectable, the error correction performance of the hybrid algorithm is negligibly deteriorated. However, simulations still achieve a 32% decrease of memory accesses.

  • 33.
    Blad, Anton
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Electronics System.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering.
    Wanhammar, Lars
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Electronics System.
    An early decision decoding algorithm for LDPC codes using dynamic thresholds2005In: European Conference on Circuit Theory and Design,2005, IEEE , 2005, p. III/285-Conference paper (Refereed)
    Abstract [en]

    Low-density parity-check codes have recently received extensive attention as a forward error correction scheme in a wide area of applications. The decoding algorithm is inherently parallelizable, allowing communication at high speeds. One of the main disadvantages, however, is large memory requirements for interim storing of decoding data. In this paper, we investigate a modification to the decoding algorithm, using early decisions for bits with high reliabilities. This reduces the amount of messages passed by the algorithm, which can be expected to reduce the switching activity of a hardware implementation. While direct application of the modification results in severe performance penalties, we show how to adapt the algorithm to reduce the impact, resulting in a negligible decrease in error correction performance.

  • 34.
    Blad, Anton
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Electronics System.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering.
    Wanhammar, Lars
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Electronics System.
    An LDPC decoding algorithm utilizing early decisions2005In: National Conference of Radio Science RVK,2005, 2005Conference paper (Refereed)
    Abstract [en]

    We investigate a modification to the sum-product algorithm used for decoding low-density parity-check (LDPC) codes. The sum-product algorithm is algorithmically simple and highly parallelizable, but suffers from high memory usage, making LDPC codes unsuitable for usage in battery powered devices such as cell phones and PDAs. The proposed modification defines a measure of bit reliabilities during the decoding process. Whenever the reliability of a bit is over a certain threshold, the bit is declared decided, and its messages are no longer calculated. We give experimental results for white Gaussian channels, and show that the amount of memory accesses can be substantially reduced, while performance does not suffer significantly. At a bit error rate of 10^-4, the number of memory accesses is halved, while the required transmitter power increases about 0.3 dB.

  • 35.
    Blad, Anton
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Electronics System.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering.
    Wanhammar, Lars
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Electronics System.
    Early decision decoding methods for low-density parity-check codes2005In: Swedish System-on-Chip Conference,2005, 2005Conference paper (Other academic)
    Abstract [en]

    Low-density parity-check codes have recently received extensive attention as a forward error correction scheme in a wide area of applications. The decoding algorithm is inherently parallelizable, allowing communication at high speeds. One of the main disadvantages, however, is large memory requirements for interim storing of decoding data. In this paper, we investigate a modification to the decoding algorithm, using early decisions for bits with high reliabilities. Currently, there are two early decision schemes proposed. We compare their theoretical performances and their suitability for hardware implementation. We also propose a new decision method, which we call weak decisions, that offers an increase in performance by a factor of two.

  • 36.
    Blad, Anton
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Electronics System.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering.
    Wanhammar, Lars
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Electronics System.
    Implementation aspects of an early decision decoder for LDPC codes2005In: NORCHIP Conference,2005, IEEE , 2005, p. 157-Conference paper (Refereed)
    Abstract [en]

    Low-density parity-check codes have recently received extensive attention as a forward error correction scheme in a wide area of applications. The decoding algorithm is inherently parallelizable, allowing communication at high speeds. One of the main disadvantages, however, is large memory requirements for interim storing of decoding data. In this paper, we propose an architecture for an early decision decoding algorithm. The algorithm significantly reduces the number of memory accesses. Simulation results show that the increased energy dissipation of the components is small compared to the reduced dissipation of the memories.

  • 37.
    Blad, Anton
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Zheng, Meng
    Beijing Institute of Technology, Dept. E. E..
    Fei, Zesong
    Beijing Institute of Technology, Dept. E. E..
    Integer linear programming based optimization of puncturing sequences for quasi-cyclic low-density parity-check codes2010In: Proceedings of International Symposium on Turbo Codes and Iterative Information Processing, IEEE , 2010Conference paper (Refereed)
    Abstract [en]

    An optimization algorithm for the design of puncturing patterns for low-density parity-check codes is proposed. The algorithm is applied to the base matrix of a quasi-cyclic code, and is expanded for each block size used. Thus, storing puncturing patterns specific to each block size is not required. Using the optimization algorithm, the number of 1-step recoverable nodes in the base matrix is maximized. The obtained sequence is then used as a base to obtain longer puncturing sequences by a sequential increase of the allowed recovery delay. The proposed algorithm is compared to one previous greedy algorithm, and shows superior performance for high rates when the heuristics are applied to the base matrix in order to create block size-independent puncturing patterns.

  • 38.
    Blad, Anton
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Zheng, Meng
    Beijing Institute of Technology, Dept. E. E..
    Fei, Zesong
    Beijing Institute of Technology, Dept. E. E..
    Rate-compatible LDPC code decoder using check-node merging2010In: Proceedings of Asilomar Conference on Signals, Systems and Computers, IEEE , 2010, p. 1119-1123Conference paper (Refereed)
    Abstract [en]

    The use of rate-compatible error correcting codes offers several advantages as compared to the use of fixed-rate codes: a smooth adaptation to the channel conditions, the possibility of incremental Hybrid ARQ schemes, as well as sharing of the encoder and decoder implementations between the codes of different rates. In this paper, the implementation of a decoder for rate-compatible quasi-cyclic LDPC codes is considered. Assuming the use of a code ensemble obtained through puncturing of a low-rate mother code, the decoder achieves significantly reduced convergence rates by merging the check node neighbours of the punctured variable nodes. The architecture uses the min-sum algorithm with serial node processing elements to efficiently handle the wide spread of node degrees that results from the merging of the check nodes.

  • 39.
    Boopal, Padma Prasad
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Garrido Gálvez, Mario
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    A Reconfigurable FFT Architecture for Variable-Length and Multi-Streaming OFDM Standards2013In: IEEE International Symposium on Circuits and Systems (ISCAS), 2013, IEEE , 2013, p. 2066-2070Conference paper (Refereed)
    Abstract [en]

    This paper presents a reconfigurable FFT architecture for the variable-length and multi-streaming WiMax wireless standard. The architecture processes 1 stream of 2048-point FFT, up to 2 streams of 1024-point FFT or up to 4 streams of 512-point FFT. The architecture consists of a modified radix-2 single delay feedback (SDF) FFT. The sampling frequency of the system is varied in accordance with the FFT length. The latch-free clock gating technique is used to reduce power consumption. The proposed architecture has been synthesized for the Virtex-6 XCVLX760 FPGA. Experimental results show that the architecture achieves the throughput that is required by the WiMax standard and the design has additional features compared to the previous approaches. The design uses 1% of the total available FPGA resources and a maximum clock frequency of 313.67 MHz is achieved. Furthermore, this architecture can be expanded to suit other wireless standards.

  • 40.
    Dempster, Andrew
    et al.
    Gustafsson, Oscar
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Electronics System.
    Coleman, Jeffrey
    Towards an Algorithm for Matrix Multiplier Blocks2003In: European Conference on Circuit Theory and Design,2003, Kraków: European Circuit Society , 2003, p. 25-Conference paper (Refereed)
    Abstract [en]

    The basic elements of an algorithm for designing multiplier blocks for matrices are presented. The new algorithm often produces results superior to the best of the older algorithms applied only to columns.

  • 41.
    Eghbali, Amir
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Johansson, Håkan
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Savory, Seb J.
    UCL, England .
    Optimal Least-Squares FIR Digital Filters for Compensation of Chromatic Dispersion in Digital Coherent Optical Receivers2014In: Journal of Lightwave Technology, ISSN 0733-8724, E-ISSN 1558-2213, Vol. 32, no 8, p. 1449-1456Article in journal (Refereed)
    Abstract [en]

    This paper proposes optimal finite-length impulse response (FIR) digital filters, in the least-squares (LS) sense, for compensation of chromatic dispersion (CD) in digital coherent optical receivers. The proposed filters are based on the convex minimization of the energy of the complex error between the frequency responses of the actual CD compensation filter and the ideal CD compensation filter. The paper utilizes the fact that pulse shaping filters limit the effective bandwidth of the signal. Then, the filter design for CD compensation needs to be performed over a smaller frequency range, as compared to the whole frequency band in the existing CD compensation methods. By means of design examples, we show that our proposed optimal LS FIR CD compensation filters outperform the existing filters in terms of performance, implementation complexity, and delay.

  • 42.
    Faust, M
    et al.
    Nanyang Technological University.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Chang, C-H
    Nanyang Technological University.
    Fast and VLSI efficient binary-to-CSD encoder using bypass signal2011In: ELECTRONICS LETTERS, ISSN 0013-5194, Vol. 47, no 1, p. 18-19Article in journal (Refereed)
    Abstract [en]

    The generation of a canonical signed digit representation from a binary representation is revisited. Based on the property that each nonzero digit is surrounded by a zero digit, a hardware-efficient conversion method using bypass instead of carry propagation is proposed. The proposed method requires less area per digit and the required bypass signal can be generated or propagated with only a single NOR gate. It is shown that the proposed converter outperforms previous converters and a look-ahead circuitry to speed up the generation of bypass signals is also proposed.
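
    For reference, the canonical signed digit (CSD) representation itself can be computed in software with the textbook digit-by-digit recoding sketched below. This is not the bypass-based hardware converter proposed in the letter; the function name is hypothetical and non-negative inputs are assumed. It only shows the target representation and the property that no two adjacent digits are nonzero.

```python
def to_csd(n):
    """Return the CSD digits of a non-negative integer n.

    Digits are in {-1, 0, 1} and listed least significant first.
    """
    digits = []
    while n != 0:
        if n & 1:
            # Choose +1 when n mod 4 == 1 and -1 when n mod 4 == 3,
            # so that the next digit is guaranteed to be zero.
            d = 2 - (n & 3)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits
```

    For instance, 7 (binary 111) is recoded as 8 - 1, so the sketch returns the digits -1, 0, 0, 1 from least to most significant.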

  • 43.
    Garrido Gálvez, Mario
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Andersson, Rikard
    Linköping University, Department of Electrical Engineering, Vehicular Systems. Linköping University, The Institute of Technology.
    Qureshi, Fahad
    Tampere University of Technology, Finland.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Multiplierless Unity-Gain SDF FFTs2016In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 24, no 9, p. 3003-3007Article in journal (Refereed)
    Abstract [en]

    In this brief, we propose a novel approach to implement multiplierless unity-gain single-delay feedback fast Fourier transforms (FFTs). Previous methods achieve unity-gain FFTs by using either complex multipliers or nonunity-gain rotators with additional scaling compensation. Conversely, this brief proposes unity-gain FFTs without compensation circuits, even when using nonunity-gain rotators. This is achieved by a joint design of rotators, so that the entire FFT is scaled by a power of two, which is then shifted to unity. This reduces the amount of hardware resources of the FFT architecture, while having high accuracy in the calculations. The proposed approach can be applied to any FFT size, and various designs for different FFT sizes are presented.

  • 44.
    Garrido Gálvez, Mario
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Grajal, J
    University of Politecn Madrid, Spain .
    Sanchez, M A.
    University of Politecn Madrid, Spain .
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Pipelined Radix-2^k Feedforward FFT Architectures2013In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 21, no 1, p. 23-32Article in journal (Refereed)
    Abstract [en]

    The appearance of radix-2^2 was a milestone in the design of pipelined FFT hardware architectures. Later, radix-2^2 was extended to radix-2^k. However, radix-2^k was only proposed for single-path delay feedback (SDF) architectures, but not for feedforward ones, also called multi-path delay commutator (MDC). This paper presents the radix-2^k feedforward (MDC) FFT architectures. In feedforward architectures radix-2^k can be used for any number of parallel samples which is a power of two. Furthermore, both decimation in frequency (DIF) and decimation in time (DIT) decompositions can be used. In addition to this, the designs can achieve very high throughputs, which makes them suitable for the most demanding applications. Indeed, the proposed radix-2^k feedforward architectures require fewer hardware resources than parallel feedback ones, also called multi-path delay feedback (MDF), when several samples in parallel must be processed. As a result, the proposed radix-2^k feedforward architectures not only offer an attractive solution for current applications, but also open up a new research line on feedforward structures.

  • 45.
    Garrido Gálvez, Mario
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Grajal, Jesus
    University of Politecn Madrid.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Optimum Circuits for Bit Reversal2011In: IEEE Transactions on Circuits and Systems - II - Express Briefs, ISSN 1549-7747, E-ISSN 1558-3791, Vol. 58, no 10, p. 657-661Article in journal (Refereed)
    Abstract [en]

    This brief presents novel circuits for calculating bit reversal on a series of data. The circuits are simple and consist of buffers and multiplexers connected in series. The circuits are optimum in two senses: they use the minimum number of registers that are necessary for calculating the bit reversal and have minimum latency. This makes them very suitable for calculating the bit reversal of the output frequencies in hardware fast Fourier transform (FFT) architectures. This brief also proposes optimum solutions for reordering the output frequencies of the FFT when different common radices are used, including radix-2, radix-2^k, radix-4, and radix-8.
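
    The permutation that such circuits realize can be stated compactly in software. The sketch below computes the bit-reversed index order for a power-of-two length; it says nothing about the buffer and multiplexer structure proposed in the brief, and the function names are hypothetical.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # shift in the current LSB
        index >>= 1
    return result


def bit_reversed_order(n):
    """Return the bit-reversed index order for a sequence of length n.

    n must be a power of two.
    """
    bits = n.bit_length() - 1
    if n < 1 or (1 << bits) != n:
        raise ValueError("n must be a power of two")
    return [bit_reverse(i, bits) for i in range(n)]
```

    For an 8-point sequence the resulting order is 0, 4, 2, 6, 1, 5, 3, 7. Applying the permutation twice restores the natural order.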

  • 46.
    Garrido Gálvez, Mario
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Grajal, Jesus
    University of Politecn Madrid.
    Accurate Rotations Based on Coefficient Scaling2011In: IEEE Transactions on Circuits and Systems - II - Express Briefs, ISSN 1549-7747, E-ISSN 1558-3791, Vol. 58, no 10, p. 662-666Article in journal (Refereed)
    Abstract [en]

    This brief presents a novel approach for improving the accuracy of rotations implemented by complex multipliers, based on scaling the complex coefficients that define these rotations. A method for obtaining the optimum coefficients that lead to the lowest error is proposed. This approach can be used to get more accurate rotations without increasing the coefficient word length and to reduce the word length without increasing the rotation error. This brief analyzes two different situations where the optimization method can be applied: rotations that can be optimized independently and sets of rotations that require the same scaling. These cases appear in important signal processing algorithms such as the discrete cosine transform and the fast Fourier transform (FFT). Experimental results show that the use of scaling for the coefficients clearly improves the accuracy of the algorithms. For instance, improvements of about 8 dB in the Frobenius norm of the FFT are achieved with respect to using non-scaled coefficients.
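
    The idea of scaling the coefficients of a single rotation can be illustrated with a brute-force search. This is a minimal sketch under stated assumptions, not the optimization method of the brief: the error measure (distance between the normalized quantized coefficient and the ideal rotation), the search range, and the names `quantize`, `rotation_error`, and `best_scaling` are all hypothetical choices made for illustration.

```python
import cmath
import math


def quantize(angle, scale):
    """Round the scaled cosine and sine of `angle` to integer coefficients."""
    return round(scale * math.cos(angle)), round(scale * math.sin(angle))


def rotation_error(angle, c, s, scale):
    """Assumed error measure: distance between (c + js)/scale and e^{j*angle}."""
    return abs(complex(c, s) / scale - cmath.exp(1j * angle))


def best_scaling(angle, word_length, steps=2000):
    """Scan candidate scaling factors and keep the one with the lowest error.

    The scan covers the upper half of the range representable with
    `word_length` signed bits (an assumption made for this sketch).
    """
    max_scale = 2 ** (word_length - 1) - 1
    min_scale = max_scale / 2
    best = None
    for k in range(steps + 1):
        scale = min_scale + (max_scale - min_scale) * k / steps
        c, s = quantize(angle, scale)
        err = rotation_error(angle, c, s, scale)
        if best is None or err < best[0]:
            best = (err, scale, c, s)
    return best  # (error, scale, C, S)


if __name__ == "__main__":
    angle = math.pi / 8
    err, scale, c, s = best_scaling(angle, word_length=8)
    ref = rotation_error(angle, *quantize(angle, 127), 127)
    print(f"scaled:   C={c}, S={s}, scale={scale:.3f}, error={err:.2e}")
    print(f"unscaled: error={ref:.2e}")
```

    Because the unscaled factor is one of the candidates, the search can never do worse than it. The case of several rotations that must share one scaling is not covered here.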

  • 47.
    Garrido Gálvez, Mario
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Huang, Shen-Jui
    Novatek Corp, Taiwan.
    Chen, Sau-Gee
    National Chiao Tung University, Taiwan.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    The Serial Commutator FFT. 2016. In: IEEE Transactions on Circuits and Systems - II - Express Briefs, ISSN 1549-7747, E-ISSN 1558-3791, Vol. 63, no 10, p. 974-978. Article in journal (Refereed)
    Abstract [en]

    This brief presents a new type of fast Fourier transform (FFT) hardware architecture called the serial commutator (SC) FFT. The SC FFT is characterized by the use of circuits for bit-dimension permutation of serial data. The proposed architectures are based on the observation that, in the radix-2 FFT algorithm, only half of the samples at each stage must be rotated. This fact, together with proper data management, makes it possible to allocate rotations only every other clock cycle. This simplifies the rotator, halving its complexity with respect to conventional serial FFT architectures. Likewise, the proposed approach halves the number of adders in the butterflies with respect to previous architectures. As a result, the proposed architectures use the minimum number of adders, rotators, and memory that are necessary for a pipelined FFT of serial data, with 100% utilization ratio.
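
    The observation that only half of the samples at each radix-2 stage are rotated can be checked with a plain software FFT. The sketch below is a textbook radix-2 decimation-in-frequency FFT, not the SC FFT architecture; the function name `fft_dif` and the rotation counter are hypothetical, and trivial rotations by 1 are counted as rotations.

```python
import cmath


def fft_dif(x):
    """Radix-2 DIF FFT.

    Returns the spectrum in natural order and, for each stage, the number
    of samples that pass through the rotated (twiddle) branch.
    """
    n = len(x)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two, at least 2")
    data = [complex(v) for v in x]
    rotations_per_stage = []
    half = n // 2
    while half >= 1:
        rotations = 0
        for start in range(0, n, 2 * half):
            for k in range(half):
                a = data[start + k]
                b = data[start + k + half]
                # Upper butterfly output: a sum, never rotated.
                data[start + k] = a + b
                # Lower butterfly output: a difference, then rotated.
                twiddle = cmath.exp(-2j * cmath.pi * k / (2 * half))
                data[start + k + half] = (a - b) * twiddle
                rotations += 1
        rotations_per_stage.append(rotations)
        half //= 2
    # A DIF FFT leaves its output in bit-reversed order; undo that here.
    bits = n.bit_length() - 1
    out = [0j] * n
    for i in range(n):
        out[int(format(i, f"0{bits}b")[::-1], 2)] = data[i]
    return out, rotations_per_stage


if __name__ == "__main__":
    spectrum, rotations = fft_dif([1, 2, 3, 4, 0, 0, 0, 0])
    print(rotations)  # one entry per stage, each equal to half the FFT size
```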

  • 48.
    Garrido Gálvez, Mario
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Källström, Petter
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Kumm, Martin
    University of Kassel, Germany.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    CORDIC II: A New Improved CORDIC Algorithm. 2016. In: IEEE Transactions on Circuits and Systems - II - Express Briefs, ISSN 1549-7747, E-ISSN 1558-3791, Vol. 63, no 2, p. 186-190. Article in journal (Refereed)
    Abstract [en]

    In this brief, we present the CORDIC II algorithm. Like previous CORDIC algorithms, the CORDIC II calculates rotations by breaking down the rotation angle into a series of microrotations. However, the CORDIC II algorithm uses a novel angle set, different from the angles used in previous CORDIC algorithms. The new angle set provides a faster convergence that reduces the number of adders with respect to previous approaches.
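
    For context, the breakdown of a rotation into microrotations can be sketched with the conventional CORDIC angle set, atan(2^-i). This is not CORDIC II, whose angle set is not given in the abstract; the function name `cordic_rotate` is hypothetical, and floating-point arithmetic stands in for the shifts and adds of a hardware implementation.

```python
import math


def cordic_rotate(x, y, angle, iterations=16):
    """Rotate (x, y) by `angle` radians using conventional CORDIC.

    Converges for |angle| up to roughly 1.74 rad. Each microrotation only
    needs a shift and an add in hardware; the accumulated gain is removed
    at the end.
    """
    z = angle  # remaining angle still to be rotated
    gain = 1.0
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # direction of this microrotation
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return x / gain, y / gain


if __name__ == "__main__":
    print(cordic_rotate(1.0, 0.0, math.pi / 6))
    print((math.cos(math.pi / 6), math.sin(math.pi / 6)))
```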

  • 49.
    Garrido Gálvez, Mario
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Qureshi, Fahad
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Low-Complexity Multiplierless Constant Rotators Based on Combined Coefficient Selection and Shift-and-Add Implementation (CCSSI). 2014. In: IEEE Transactions on Circuits and Systems Part 1: Regular Papers, ISSN 1549-8328, E-ISSN 1558-0806, Vol. 61, no 7, p. 2002-2012. Article in journal (Refereed)
    Abstract [en]

    This paper presents a new approach to designing multiplierless constant rotators. The approach is based on a combined coefficient selection and shift-and-add implementation (CCSSI) for the design of the rotators. First, complete freedom is given to the selection of the coefficients, i.e., no constraints on the coefficients are set in advance and all the alternatives are taken into account. Second, the shift-and-add implementation uses advanced single constant multiplication (SCM) and multiple constant multiplication (MCM) techniques that lead to low-complexity multiplierless implementations. Third, the design of the rotators is done by a joint optimization of the coefficient selection and shift-and-add implementation. As a result, the CCSSI provides an extended design space that offers a larger number of alternatives with respect to previous works. Furthermore, the design space is explored in a simple and efficient way. The proposed approach has wide applications in numerous hardware scenarios. These include rotations by single or multiple angles, rotators in single or multiple branches, and different scaling of the outputs. Experimental results for various scenarios are provided. In all of them, the proposed approach achieves significant improvements with respect to the state of the art.
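
    To illustrate what a shift-and-add (multiplierless) constant multiplication is, the sketch below recodes a non-negative integer constant into canonical signed digits (CSD) and multiplies using only shifts, additions, and subtractions. Plain CSD is a basic technique swapped in for illustration; it is not the advanced SCM/MCM techniques or the joint optimization used in the paper. The names `csd_digits` and `shift_add_multiply` are hypothetical.

```python
def csd_digits(constant):
    """Return the CSD digits of a non-negative integer, least significant first.

    Each digit is -1, 0, or +1, and no two adjacent digits are nonzero.
    """
    if constant < 0:
        raise ValueError("constant must be non-negative")
    digits = []
    while constant != 0:
        if constant & 1:
            # Low bits ...01 give +1; low bits ...11 give -1 (carry upward).
            digit = 2 - (constant & 3)
            constant -= digit
        else:
            digit = 0
        digits.append(digit)
        constant >>= 1
    return digits


def shift_add_multiply(value, constant):
    """Multiply `value` by `constant` using only shifts, adds, and subtracts."""
    result = 0
    for shift, digit in enumerate(csd_digits(constant)):
        if digit == 1:
            result += value << shift
        elif digit == -1:
            result -= value << shift
    return result


if __name__ == "__main__":
    print(csd_digits(7))              # 7 = 8 - 1
    print(shift_add_multiply(5, 23))  # same value as 5 * 23
```

    Each nonzero digit costs one adder or subtractor, so a coefficient with fewer nonzero digits is cheaper to implement; this is why the choice of coefficient and the shift-and-add structure interact.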

  • 50.
    Garrido, Mario
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Acevedo, Miguel
    Linköping University, Department of Electrical Engineering. Linköping University, Faculty of Science & Engineering.
    Ehliar, Andreas
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Challenging the Limits of FFT Performance on FPGAs. 2014. Conference paper (Refereed)
    Abstract [en]

    This paper analyzes the limits of FFT performance on FPGAs. For this purpose, an FFT generation tool has been developed. This tool is highly parameterizable and allows for generating FFTs with different FFT sizes and amounts of parallelization. Experimental results for FFT sizes from 16 to 65536, and 4 to 64 parallel samples have been obtained. They show that even the largest FFT architectures fit well in today's FPGAs, achieving throughput rates from several GSamples/s to tens of GSamples/s.
