liu.se: Search for publications in DiVA
1 - 9 of 9
  • 1.
    Alam, Syed Asad
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Techniques for Efficient Implementation of FIR and Particle Filtering. 2016. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    FIR filters occupy a central place in many signal processing applications that alter the shape, the frequency or the sampling frequency of the signal. FIR filters are used because of their stability and the possibility of linear phase, but they require a higher filter order than IIR filters to meet the same magnitude specification. Depending on the width of the required transition band, the filter order can range from tens to hundreds or even thousands. Since implementing the filters in the digital domain requires multipliers and adders, a high filter order translates into a large number of these arithmetic units. Research on reducing the complexity of FIR filters has been going on for decades, and the techniques used can be roughly divided into two categories: reducing the number of multipliers and simplifying the multiplier implementation.

    One technique to reduce the number of multipliers is to use cascaded sub-filters of lower complexity to achieve the desired specification, known as frequency-response masking (FRM). One of the sub-filters is an upsampled model filter whose band edges are an integer multiple, termed the period L, of the target filter's band edges. Other sub-filters may include complement and masking filters, which filter different parts of the spectrum to achieve the desired response. From an implementation point of view, time-multiplexing is beneficial because the maximum clock frequency supported by current state-of-the-art semiconductor technology generally does not correspond to the sample rate required by the application. A combination of these two techniques plays a significant role in the efficient implementation of FIR filters. Part of the work presented in this dissertation consists of architectures for time-multiplexed FRM filters that benefit from the inherent sparsity of the periodic model filters.

    These time-multiplexed FRM filters not only reduce the number of multipliers but also lower the memory usage. Although the FRM technique requires a higher number of delay elements, it results in fewer memories and more energy-efficient memory schemes when time-multiplexed. Different memory arrangements and memory access schemes are also discussed and compared in terms of their efficiency when using both single-port and dual-port memories. An efficient pipelining scheme is proposed which reduces the number of pipelining registers while achieving similar clock frequencies. The single optimal point where the number of multiplications is minimum for non-time-multiplexed FRM filters is shown to become a function of both the period, L, and the time-multiplexing factor, M. This means that the minimum number of multipliers does not always correspond to the minimum number of multiplications, which also increases the flexibility of implementation. These filters are shown to achieve a power reduction of between 23% and 68% for the considered examples.

    To simplify the multiplier, alternative number systems such as the logarithmic number system (LNS) have been used to implement FIR filters, since they reduce multiplications to additions. FIR filters are realized by designing them directly in the LNS domain, in the minimax sense, using integer linear programming (ILP) under finite word length constraints. The branch and bound algorithm, a typical algorithm for solving ILP problems, is implemented based on LNS integers, and several branching strategies are proposed and evaluated. The filter coefficients thus obtained are compared with the traditional finite word length coefficients obtained in the linear domain. It is shown that LNS FIR filters provide a better approximation error than a standard FIR filter for a given coefficient word length.

    FIR filters also offer an opportunity for complexity reduction by implementing the multipliers using Booth or standard high-radix multiplication. Both of these multiplication schemes generate pre-computed multiples of the multiplicand, which are then selected based on the encoded bits of the multiplier. In transposed direct form (TDF) FIR filters, each input sample is multiplied by a number of coefficients, and complexity can be reduced by sharing the pre-computation of the multiples of the input data across all multiplications. Part of this work is a systematic and unified approach to the design of such computation sharing multipliers and a comparison of the two forms of multiplication. It also gives closed-form expressions for the cost of the different parts of the multiplication and an overview of various ways to implement the select unit with respect to the design of multiplexers.

    Particle filters are used to solve problems that require estimation of a system. An improved resampling scheme for reducing the latency of the resampling stage is proposed; it uses a pre-fetch technique to reduce the latency by between 50% and 95%, depending on the number of pre-fetches. Generalized division-free architectures and compact memory structures are also proposed that map to different resampling algorithms, reduce the complexity of the multinomial resampling algorithm and reduce the number of memories required by up to 50%.

    List of papers
    1. A unified approach to the design and implementation of computation sharing multipliers: Computation sharing multipliers
    (English) Manuscript (preprint) (Other academic)
    Abstract [en]

    A unified approach to the design and implementation of computation sharing multipliers based on Booth and standard high-radix multiplication schemes is presented here. Both of these multiplication schemes have various building blocks, one of which is the pre-computer, which can be shared across a number of multiplications if the multiplicand of the multipliers is the same, as in a transposed direct form (TDF) finite-length impulse response (FIR) filter. Closed-form expressions to estimate the cost of the different building blocks for the different schemes have been developed and analyzed in different dimensions. Multipliers, both standalone and as part of computation sharing in FIR filters and complex multipliers, have been realized in hardware and synthesized using a standard cell library.

    It is shown that, apart from word length and filter length, the ratio between the cost of implementing adders and multiplexers has an effect on the choice of optimal radix. The higher the ratio, the lower the cost of implementing multiplexers, which benefits high radices. A higher radix also benefits from computation sharing if its cost for one multiplication is less than that of the lower radix, and it is shown that a radix-16 Booth multiplier achieves lower area complexity and power consumption by an average of 7% and 17%, respectively.

    Keywords
    Computation sharing multipliers, standard high-radix multiplier, Booth multiplier, FIR filter
    National Category
    Electrical Engineering, Electronic Engineering, Information Engineering
    Identifiers
    urn:nbn:se:liu:diva-124194 (URN)
    Available from: 2016-01-21 Created: 2016-01-21 Last updated: 2016-02-02 Bibliographically approved
    2. On the implementation of time-multiplexed frequency-response masking filters
    2016 (English) In: IEEE Transactions on Signal Processing, ISSN 1053-587X, E-ISSN 1941-0476, Vol. 64, no 15, p. 3933-3944. Article in journal (Refereed) Published
    Abstract [en]

    The complexity of narrow transition band finite-length impulse response (FIR) filters is high and can be reduced by using frequency-response masking (FRM) techniques. These techniques use a combination of periodic model filters and, possibly periodic, masking filters. Time-multiplexing is in general beneficial since the maximum clock frequency obtainable with a given technology only rarely corresponds to the sample rate required by the application. Therefore, architectures for time-multiplexed FRM filters that benefit from the inherent sparsity of the periodic filters are introduced in this work.

    We show that FRM filters not only reduce the number of multipliers needed, but also have benefits in terms of memory usage. Although the total number of samples to be stored is larger for FRM, it results in fewer memory resources needed in FPGAs and more energy-efficient memory schemes in ASICs. In total, the power consumption is significantly reduced compared to a single-stage implementation. Furthermore, we show that the choice of the interpolation factor which gives the least complexity for the periodic model filter and subsequent masking filter(s) is a function of the time-multiplexing factor, meaning that the minimum number of multipliers does not always correspond to the minimum number of multiplications. Both single-port and dual-port memories are considered, and the trade-off involved between the number of multipliers and memory complexity is illustrated. The results show that, for FPGA implementation, the power reduction ranges from 23% to 68% for the considered examples.

    Place, publisher, year, edition, pages
    Institute of Electrical and Electronics Engineers (IEEE), 2016
    Keywords
    Frequency-response masking, FIR filter, FPGA, ASIC, time-multiplexing, memories
    National Category
    Electrical Engineering, Electronic Engineering, Information Engineering
    Identifiers
    urn:nbn:se:liu:diva-124190 (URN) 10.1109/TSP.2016.2557298 (DOI) 000379699800009 ()
    Note

    At the time of the thesis defence, the publication existed as a manuscript.

    Available from: 2016-01-21 Created: 2016-01-21 Last updated: 2017-11-30 Bibliographically approved
    3. Design of Finite Word Length Linear-Phase FIR Filters in the Logarithmic Number System Domain
    2014 (English) In: VLSI design (Print), ISSN 1065-514X, E-ISSN 1563-5171, Vol. 2014, no 217495. Article in journal (Refereed) Published
    Abstract [en]

    The logarithmic number system (LNS) is an attractive alternative for realizing finite-length impulse response (FIR) filters because multiplication in the linear domain becomes addition in the logarithmic domain. In the literature, linear coefficients are directly replaced by their logarithmic equivalents. In this paper, an approach to directly optimize the finite word length coefficients in the LNS domain is proposed. A branch and bound algorithm is implemented based on LNS integers, and several different branching strategies are proposed and evaluated. Optimal coefficients in the minimax sense are obtained and compared with the traditional finite word length representation in the linear domain as well as with rounding. Results show that the proposed method naturally provides a smaller approximation error than rounding. Furthermore, they provide insights into the finite word length properties of FIR filter coefficients in the LNS domain and show that LNS FIR filters typically provide a better approximation error than a standard FIR filter.

    Place, publisher, year, edition, pages
    Egypt: Hindawi Publishing Corporation, 2014
    Keywords
    Logarithmic Number System, FIR Filter, Integer Linear Programming, Branch and Bound
    National Category
    Signal Processing
    Identifiers
    urn:nbn:se:liu:diva-105861 (URN) 10.1155/2014/217495 (DOI)
    Available from: 2014-04-10 Created: 2014-04-10 Last updated: 2017-12-05 Bibliographically approved
    4. Improved particle filter resampling architectures
    (English) Manuscript (preprint) (Other academic)
    Abstract [en]

    The most challenging aspect of implementing particle filtering in hardware is the resampling step, which replicates particles with large weights and discards those with small weights, because it has a high latency and can only be partially executed in parallel with the other steps of particle filtering. To reduce the latency, an improved resampling scheme is proposed in this work which involves pre-fetching from the weight memory in parallel with fetching a value from a random function generator. Architectures for realizing the pre-fetch technique are also proposed. The trade-off between the latency reduction achieved by increasing the size of the pre-fetch memory and the architectural implementation complexity has been analyzed. Results show that a pre-fetch of five achieves the best area-latency trade-off while achieving an 85% reduction in latency on average.

    We also propose a generic double-multiplier architecture for resampling which avoids normalization divisions, makes the architecture equally efficient for numbers of particles that are not powers of two, and removes the need to explicitly order the random values for an efficient multinomial resampling implementation. It is further improved by computing the cumulative sum of weights on the fly, which helps reduce the size of the weight memories by up to 50%.

    Keywords
    Particle filters, resampling algorithm, resampling architecture
    National Category
    Electrical Engineering, Electronic Engineering, Information Engineering
    Identifiers
    urn:nbn:se:liu:diva-124193 (URN)
    Available from: 2016-01-21 Created: 2016-01-21 Last updated: 2016-02-02 Bibliographically approved
  • 2.
    Alam, Syed Asad
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    A unified approach to the design and implementation of computation sharing multipliers: Computation sharing multipliers. Manuscript (preprint) (Other academic)
    Abstract [en]

    A unified approach to the design and implementation of computation sharing multipliers based on Booth and standard high-radix multiplication schemes is presented here. Both of these multiplication schemes have various building blocks, one of which is the pre-computer, which can be shared across a number of multiplications if the multiplicand of the multipliers is the same, as in a transposed direct form (TDF) finite-length impulse response (FIR) filter. Closed-form expressions to estimate the cost of the different building blocks for the different schemes have been developed and analyzed in different dimensions. Multipliers, both standalone and as part of computation sharing in FIR filters and complex multipliers, have been realized in hardware and synthesized using a standard cell library.

    It is shown that, apart from word length and filter length, the ratio between the cost of implementing adders and multiplexers has an effect on the choice of optimal radix. The higher the ratio, the lower the cost of implementing multiplexers, which benefits high radices. A higher radix also benefits from computation sharing if its cost for one multiplication is less than that of the lower radix, and it is shown that a radix-16 Booth multiplier achieves lower area complexity and power consumption by an average of 7% and 17%, respectively.
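
    The idea of sharing the pre-computer can be illustrated with a small software sketch. It assumes standard (non-Booth) high-radix multiplication with non-negative integer coefficients; the function names are hypothetical and the code is not the hardware architecture described in the manuscript.

```python
def precompute_multiples(x, radix_bits=2):
    """Shared pre-computer: the multiples 0*x .. (2**radix_bits - 1)*x.

    Hardware would build these with adders and shifts; the software
    multiplication here only stands in for that step.
    """
    return [d * x for d in range(1 << radix_bits)]


def select_and_add(multiples, coeff, radix_bits=2):
    """Select one pre-computed multiple per coefficient digit and add with shifts."""
    mask = (1 << radix_bits) - 1
    product, shift = 0, 0
    while coeff:
        product += multiples[coeff & mask] << shift  # select unit + adder
        coeff >>= radix_bits
        shift += radix_bits
    return product


def shared_products(x, coeffs, radix_bits=2):
    """Multiply one input sample by every coefficient, as in a TDF FIR filter."""
    multiples = precompute_multiples(x, radix_bits)  # computed once per input sample
    return [select_and_add(multiples, c, radix_bits) for c in coeffs]
```

    With `radix_bits=4` the same functions model a radix-16 scheme: more multiples are pre-computed once, and each coefficient then needs fewer select-and-add steps.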

  • 3.
    Alam, Syed Asad
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Design of Finite Word Length Linear-Phase FIR Filters in the Logarithmic Number System Domain. 2014. In: VLSI design (Print), ISSN 1065-514X, E-ISSN 1563-5171, Vol. 2014, no 217495. Article in journal (Refereed)
    Abstract [en]

    The logarithmic number system (LNS) is an attractive alternative for realizing finite-length impulse response (FIR) filters because multiplication in the linear domain becomes addition in the logarithmic domain. In the literature, linear coefficients are directly replaced by their logarithmic equivalents. In this paper, an approach to directly optimize the finite word length coefficients in the LNS domain is proposed. A branch and bound algorithm is implemented based on LNS integers, and several different branching strategies are proposed and evaluated. Optimal coefficients in the minimax sense are obtained and compared with the traditional finite word length representation in the linear domain as well as with rounding. Results show that the proposed method naturally provides a smaller approximation error than rounding. Furthermore, they provide insights into the finite word length properties of FIR filter coefficients in the LNS domain and show that LNS FIR filters typically provide a better approximation error than a standard FIR filter.
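
    As a minimal sketch of why LNS turns multiplication into addition, the following code stores a value as a sign and a base-2 logarithm rounded to an integer with a fixed number of fractional bits. The representation, the function names and the choice of 8 fractional bits are assumptions for illustration; this shows plain rounding of a value into LNS, not the optimization method proposed in the paper.

```python
import math


def to_lns(x, frac_bits=8):
    """Encode a nonzero x as (sign, integer exponent), rounding log2|x| to frac_bits."""
    if x == 0:
        raise ValueError("zero needs a separate flag in LNS; not handled in this sketch")
    sign = -1 if x < 0 else 1
    exponent = round(math.log2(abs(x)) * (1 << frac_bits))
    return sign, exponent


def from_lns(value, frac_bits=8):
    """Decode (sign, integer exponent) back to a linear-domain float."""
    sign, exponent = value
    return sign * 2.0 ** (exponent / (1 << frac_bits))


def lns_multiply(a, b):
    """Multiply two LNS numbers: the signs multiply and the exponents add."""
    return a[0] * b[0], a[1] + b[1]
```

    For example, `from_lns(lns_multiply(to_lns(0.75), to_lns(-2.0)))` is close to -1.5; the small deviation comes from rounding the logarithm of 0.75 to a finite word length.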

  • 4.
    Alam, Syed Asad
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Generalized Division-Free Architecture and Compact Memory Structure for Resampling in Particle Filters. 2015. In: 2015 European Conference on Circuit Theory and Design (ECCTD), IEEE Press, 2015, p. 416-419. Conference paper (Refereed)
    Abstract [en]

    The most challenging step of implementing particle filtering is the resampling step, which replicates particles with large weights and discards those with small weights. In this paper, we propose a generic architecture for resampling which uses double multipliers to avoid normalization divisions and to make the architecture equally efficient for numbers of particles that are not powers of two. Furthermore, the complexity of resampling is greatly affected by the size of the memories used to store the weights. We illustrate that storing the original weights instead of their cumulative sum, and calculating the cumulative sum online, reduces the total complexity in terms of area by 21% to 45%, while giving up to a 50% reduction in memory usage.
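
    The two ideas in this abstract can be sketched in software as follows: the random number is scaled by the sum of the weights instead of dividing every weight by that sum, and the cumulative sum is accumulated on the fly from the stored original weights. This is a simplified multinomial resampling loop under those assumptions, with a hypothetical function name; it is not the double-multiplier architecture from the paper.

```python
def resample_division_free(weights, uniforms):
    """Return resampled particle indices.

    weights:  unnormalized, non-negative particle weights
    uniforms: random numbers in [0, 1), one per output particle
    """
    total = sum(weights)
    indices = []
    for u in uniforms:
        threshold = u * total  # scale the random number; no weight is divided
        running = 0.0
        for i, w in enumerate(weights):
            running += w  # cumulative sum computed on the fly
            if threshold < running:
                indices.append(i)
                break
        else:
            indices.append(len(weights) - 1)  # guard against round-off
    return indices
```

    Because only the original weights are stored, no separate memory for the cumulative sums is needed in this sketch.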

  • 5.
    Alam, Syed Asad
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Implementation of Narrow-Band Frequency-Response Masking for Efficient Narrow Transition Band FIR Filters on FPGAs. 2011. In: NORCHIP, 2011, IEEE conference proceedings, 2011, p. 1-4. Conference paper (Refereed)
    Abstract [en]

    The complexity of narrow transition band FIR filters is high and can be reduced by using frequency-response masking (FRM) techniques. These techniques use a combination of periodic model filters and masking filters. In this paper, we show that time-multiplexed FRM filters achieve lower complexity, not only in terms of multipliers but also in terms of logic elements, compared to time-multiplexed single-stage filters. The reduced complexity also leads to lower power consumption. Furthermore, we show that the optimal period of the model filter depends on the time-multiplexing factor.

  • 6.
    Alam, Syed Asad
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Implementation of Time-Multiplexed Sparse Periodic FIR Filters for FRM on FPGAs. 2011. Conference paper (Refereed)
    Abstract [en]

    Frequency-response masking (FRM) is a set of techniques for lowering the computational complexity of narrow transition band FIR filters. FRM filters use a combination of sparse periodic filters and non-sparse filters. In this work, we consider the implementation of these filters in a time-multiplexed manner on FPGAs. It is shown that the proposed architectures produce lower-complexity realizations than the vendor-provided IP blocks, which do not take the sparseness into consideration. The designs are implemented on a Virtex-6 device utilizing the built-in DSP blocks.
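
    The sparseness referred to here can be illustrated with a short sketch: upsampling a model filter by a period L inserts L-1 zeros between its taps, so a filter that skips the zero taps needs only as many multiplications per sample as the model filter has coefficients. The function names are hypothetical, and the code models only the arithmetic, not the time-multiplexed FPGA architectures of the paper.

```python
def upsample_coefficients(model, period):
    """Insert period-1 zeros between the model filter taps: G(z) -> G(z**period)."""
    sparse = []
    for i, c in enumerate(model):
        sparse.append(c)
        if i < len(model) - 1:
            sparse.extend([0] * (period - 1))
    return sparse


def fir_dense(h, x):
    """Direct FIR filtering that multiplies by every tap, zeros included."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fir_periodic(model, period, x):
    """Same output as the dense filter, using len(model) multiplications per sample."""
    return [sum(c * x[n - k * period] for k, c in enumerate(model) if n - k * period >= 0)
            for n in range(len(x))]
```

    A generic FIR block that is unaware of the zeros behaves like `fir_dense`, which is the kind of overhead the abstract attributes to IP blocks that ignore the sparseness.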

  • 7.
    Alam, Syed Asad
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Improved particle filter resampling architectures. Manuscript (preprint) (Other academic)
    Abstract [en]

    The most challenging aspect of implementing particle filtering in hardware is the resampling step, which replicates particles with large weights and discards those with small weights, because it has a high latency and can only be partially executed in parallel with the other steps of particle filtering. To reduce the latency, an improved resampling scheme is proposed in this work which involves pre-fetching from the weight memory in parallel with fetching a value from a random function generator. Architectures for realizing the pre-fetch technique are also proposed. The trade-off between the latency reduction achieved by increasing the size of the pre-fetch memory and the architectural implementation complexity has been analyzed. Results show that a pre-fetch of five achieves the best area-latency trade-off while achieving an 85% reduction in latency on average.

    We also propose a generic double-multiplier architecture for resampling which avoids normalization divisions, makes the architecture equally efficient for numbers of particles that are not powers of two, and removes the need to explicitly order the random values for an efficient multinomial resampling implementation. It is further improved by computing the cumulative sum of weights on the fly, which helps reduce the size of the weight memories by up to 50%.

  • 8.
    Alam, Syed Asad
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    On the implementation of time-multiplexed frequency-response masking filters. 2016. In: IEEE Transactions on Signal Processing, ISSN 1053-587X, E-ISSN 1941-0476, Vol. 64, no 15, p. 3933-3944. Article in journal (Refereed)
    Abstract [en]

    The complexity of narrow transition band finite-length impulse response (FIR) filters is high and can be reduced by using frequency-response masking (FRM) techniques. These techniques use a combination of periodic model filters and, possibly periodic, masking filters. Time-multiplexing is in general beneficial since the maximum clock frequency obtainable with a given technology only rarely corresponds to the sample rate required by the application. Therefore, architectures for time-multiplexed FRM filters that benefit from the inherent sparsity of the periodic filters are introduced in this work.

    We show that FRM filters not only reduce the number of multipliers needed, but also have benefits in terms of memory usage. Although the total number of samples to be stored is larger for FRM, it results in fewer memory resources needed in FPGAs and more energy-efficient memory schemes in ASICs. In total, the power consumption is significantly reduced compared to a single-stage implementation. Furthermore, we show that the choice of the interpolation factor which gives the least complexity for the periodic model filter and subsequent masking filter(s) is a function of the time-multiplexing factor, meaning that the minimum number of multipliers does not always correspond to the minimum number of multiplications. Both single-port and dual-port memories are considered, and the trade-off involved between the number of multipliers and memory complexity is illustrated. The results show that, for FPGA implementation, the power reduction ranges from 23% to 68% for the considered examples.

  • 9.
    Qureshi, Fahad
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Alam, Syed Asad
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    4-k point FFT algorithms based on optimized twiddle factor multiplication for FPGAs. 2010. In: The Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics (PrimeAsia), Shanghai, Sept. 22-24, 2010, 2010, p. 225-228. Conference paper (Refereed)
    Abstract [en]

    In this paper, we propose higher-point FFT (fast Fourier transform) algorithms for a single delay feedback pipelined FFT architecture, considering the 4096-point FFT. These algorithms differ from each other in terms of twiddle factor multiplication. A comparison of the twiddle factor multiplication complexity of all proposed algorithms, when implemented on field-programmable gate arrays (FPGAs), is presented. We also discuss the design criteria of the twiddle factor multiplication. Finally, it is shown that there is a trade-off between twiddle factor memory complexity and switching activity in the introduced algorithms.
