liu.seSearch for publications in DiVA
Change search
Link to record
Permanent link

Direct link
BETA
Alternative names
Publications (10 of 169) Show all publications
Garrido, M., Grajal, J. & Gustafsson, O. (2019). Optimum Circuits for Bit-Dimension Permutations. IEEE Transactions on Very Large Scale Integration (vlsi) Systems, 27(5), 1148-1160
Open this publication in new window or tab >>Optimum Circuits for Bit-Dimension Permutations
2019 (English)In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 27, no 5, p. 1148-1160Article in journal (Refereed) Published
Abstract [en]

In this paper, we present a systematic approach to design hardware circuits for bit-dimension permutations. The proposed approach is based on decomposing any bit-dimension permutation into elementary bit-exchanges. Such decomposition is proven to achieve the theoretical minimum number of delays required for the permutation. This offers optimum solutions for multiple well-known problems in the literature that make use of bit-dimension permutations. This includes the design of permutation circuits for the fast Fourier transform, bit reversal, matrix transposition, stride permutations, and Viterbi decoders.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2019
Keywords
Bit-dimension permutation; bit reversal; data management; fast Fourier transform (FFT); matrix transposition; pipelined architecture; streaming data; Viterbi decoder
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:liu:diva-158360 (URN)10.1109/TVLSI.2019.2892322 (DOI)000466226400014 ()2-s2.0-85065104805 (Scopus ID)
Note

Funding Agencies|Swedish ELLIIT Program; FPU Fellowship of the Spanish Ministry of Education [AP2005-0544]; Spanish National Research and Development Program [TEC2014-53815-R]; Madrid Regional Government [S2013/ICE3000]

Available from: 2019-07-02 Created: 2019-07-02 Last updated: 2019-08-09Bibliographically approved
Kumm, M., Gustafsson, O., de Dinechin, F., Kappauf, J. & Zipf, P. (2018). Karatsuba with Rectangular Multipliers for FPGAs. In: 2018 IEEE 25TH SYMPOSIUM ON COMPUTER ARITHMETIC (ARITH): . Paper presented at 25th International Symposium on Computer Arithmetic, Amherst, MA, USA, June 25-27, 2018 (pp. 13-20). IEEE
Open this publication in new window or tab >>Karatsuba with Rectangular Multipliers for FPGAs
Show others...
2018 (English)In: 2018 IEEE 25TH SYMPOSIUM ON COMPUTER ARITHMETIC (ARITH), IEEE, 2018, p. 13-20Conference paper, Published paper (Refereed)
Abstract [en]

This work presents an extension of Karatsuba's method to efficiently use rectangular multipliers as a base for larger multipliers. The rectangular multipliers that motivate this work are the embedded 18x25-bit signed multipliers found in the DSP blocks of recent Xilinx FPGAs: The traditional Karatsuba approach must under-use them as square 18x18 ones. This work shows that rectangular multipliers can be efficiently exploited in a modified Karatsuba method if their input word sizes have a large greatest common divider. In the Xilinx FPGA case, this can be obtained by using the embedded multipliers as 16x24 unsigned and as 17x25 signed ones.The obtained architectures are implemented with due detail to architectural features such as the pre-adders and post-adders available in Xilinx DSP blocks. They are synthesized and compared with traditional Karatsuba, but also with (non-Karatsuba) state-of-the-art tiling techniques that make use of the full rectangular multipliers. The proposed technique improves resource consumption and performance for multipliers of numbers larger than 64 bits.

Place, publisher, year, edition, pages
IEEE, 2018
Series
International Symposium on Computer Arithmetic, ISSN 1063-6889
National Category
Computer Systems Embedded Systems Signal Processing
Identifiers
urn:nbn:se:liu:diva-150920 (URN)10.1109/ARITH.2018.8464809 (DOI)000454795600003 ()978-1-5386-2613-9 (ISBN)
Conference
25th International Symposium on Computer Arithmetic, Amherst, MA, USA, June 25-27, 2018
Available from: 2018-09-05 Created: 2018-09-05 Last updated: 2019-01-21Bibliographically approved
Mohammadi Sarband, N., Gustafsson, O. & Garrido, M. (2018). Obtaining Minimum Depth Sum of Products from Multiple Constant Multiplication. In: PROCEEDINGS OF THE 2018 IEEE INTERNATIONAL WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS), IEEE: . Paper presented at IEEE Workshop on Signal Processing Systems, Cape Town, South Africa, 21-24 October, 2018 (pp. 134-139). Institute of Electrical and Electronics Engineers (IEEE)
Open this publication in new window or tab >>Obtaining Minimum Depth Sum of Products from Multiple Constant Multiplication
2018 (English)In: PROCEEDINGS OF THE 2018 IEEE INTERNATIONAL WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS), IEEE, Institute of Electrical and Electronics Engineers (IEEE), 2018, p. 134-139Conference paper, Published paper (Refereed)
Abstract [sv]

In this work, an approach for transposing solutions to the multiple constant multiplication (MCM) problem to obtain a sum of product (SOP) computation with minimum depth is proposed. The reason for doing this is that solving the SOP problem directly is highly computationally intensive when adder graph algorithms are used. Compared to using subexpression sharing algorithms, which has a lower computational complexity, directly for the SOP problem, it is shown that the proposed approach, as expected, results in lower complexity for the SOP. It is also shown that there is no obvious way to construct the MCM solution in such a way that the SOP solution has the minimum theoretical depth. However, the proposed approach guarantees minimum depth subject to the MCM solution given as input.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2018
Series
IEEE Workshop on Signal Processing Systems, ISSN 1520-6130
Keywords
multiple constant multiplication (MCM), shift-and-add, Sum of Product (SOP), minimum depth expansion algorithm, and Vertex reduced SOP adder Graph (VSG).
National Category
Computational Mathematics
Identifiers
urn:nbn:se:liu:diva-152653 (URN)10.1109/SiPS.2018.8598365 (DOI)000465106800023 ()978-1-5386-6318-9 (ISBN)
Conference
IEEE Workshop on Signal Processing Systems, Cape Town, South Africa, 21-24 October, 2018
Available from: 2018-11-09 Created: 2018-11-09 Last updated: 2019-06-28Bibliographically approved
Ingemarsson, C. & Gustafsson, O. (2018). SFF—The Single-Stream FPGA-Optimized Feedforward FFT Hardware Architecture. Journal of Signal Processing Systems, 90(11), 1583-1592
Open this publication in new window or tab >>SFF—The Single-Stream FPGA-Optimized Feedforward FFT Hardware Architecture
2018 (English)In: Journal of Signal Processing Systems, ISSN 1939-8018, E-ISSN 1939-8115, Vol. 90, no 11, p. 1583-1592Article in journal (Refereed) Published
Abstract [en]

In this paper, a fast Fourier transform (FFT) hardware architecture optimized for field-programmable gate-arrays (FPGAs) is proposed. We refer to this as the single-stream FPGA-optimized feedforward (SFF) architecture. By using a stage that trades adders for shift registers as compared with the single-path delay feedback (SDF) architecture the efficient implementation of short shift registers in Xilinx FPGAs can be exploited. Moreover, this stage can be combined with ordinary or optimized SDF stages such that adders are only traded for shift registers when beneficial. The resulting structures are well-suited for FPGA implementation, especially when efficient implementation of short shift registers is available. This holds for at least contemporary Xilinx FPGAs. The results show that the proposed architectures improve on the current state of the art.

Place, publisher, year, edition, pages
Springer, 2018
Keywords
Fast Fourier transform (FFT), Field-programmable gate arrays (FPGAs), Pipeline FFT, FPGA optimization, Single-stream FFT
National Category
Computer Systems Signal Processing Embedded Systems
Identifiers
urn:nbn:se:liu:diva-150930 (URN)10.1007/s11265-018-1370-y (DOI)000446439900007 ()2-s2.0-85046136448 (Scopus ID)
Note

Special Issue on fast Fourier transform (FFT) hardware implementations

Available from: 2018-09-05 Created: 2018-09-05 Last updated: 2018-10-17Bibliographically approved
Gustafsson, O., Bertilsson, E., Klasson, J. & Ingemarsson, C. (2017). Approximate Neumann Series or Exact Matrix Inversion for Massive MIMO? (Invited Paper). In: Neil Burgess, Javier Bruguera, and Florent de Dinechin (Ed.), Proceedings 2017 IEEE 24th Symposium on Computer Arithmetic (ARITH), London, UK, 24-26 July 2017: . Paper presented at The 24th Symposium on Computer Arithmetic (ARITH), London, UK, 24-26 July 2017 (pp. 62-63). Institute of Electrical and Electronics Engineers (IEEE)
Open this publication in new window or tab >>Approximate Neumann Series or Exact Matrix Inversion for Massive MIMO? (Invited Paper)
2017 (English)In: Proceedings 2017 IEEE 24th Symposium on Computer Arithmetic (ARITH), London, UK, 24-26 July 2017 / [ed] Neil Burgess, Javier Bruguera, and Florent de Dinechin, Institute of Electrical and Electronics Engineers (IEEE), 2017, p. 62-63Conference paper, Published paper (Refereed)
Abstract [en]

Approximate matrix inversion based on Neumann series has seen a recent increased interest motivated by massive MIMO systems. There, the matrices are in many cases diagonally dominant, and, hence, a reasonable approximation can be obtained within a few iterations of a Neumann series. In this work, we clarify that the complexity of exact methods are about the same as when three terms are used for the Neumann series, so in this case, the complexity is not lower as often claimed. The second common argument for Neumann series approximation, higher parallelism, is indeed correct. However, in most current practical use cases, such a high degree of parallelism is not required to obtain a low latency realization. Hence, we conclude that a careful evaluation, based on accuracy and latency requirements must be performed and that exact matrix inversion is in fact viable in many more cases than the current literature claims.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2017
Series
Proceedings Symposium on Computer Arithmetic, ISSN 1063-6889 ; 2017
Keywords
matrix inversion, complexity, parallel processing, massive MIMO
National Category
Computer Systems Signal Processing Communication Systems
Identifiers
urn:nbn:se:liu:diva-139337 (URN)10.1109/ARITH.2017.11 (DOI)000424786700011 ()9781538619650 (ISBN)9781538619643 (ISBN)9781538619667 (ISBN)
Conference
The 24th Symposium on Computer Arithmetic (ARITH), London, UK, 24-26 July 2017
Available from: 2017-07-10 Created: 2017-07-10 Last updated: 2019-05-09Bibliographically approved
Gustafsson, O. & Wanhammar, L. (2017). Basic Arithmetic Circuits. In: Pramod Kumar Meher, Thanos Stouraitis (Ed.), Arithmetic Circuits for DSP Applications: (pp. 1-32). John Wiley & Sons
Open this publication in new window or tab >>Basic Arithmetic Circuits
2017 (English)In: Arithmetic Circuits for DSP Applications / [ed] Pramod Kumar Meher, Thanos Stouraitis, John Wiley & Sons, 2017, p. 1-32Chapter in book (Other academic)
Abstract [en]

General‐purpose DSP processors, application‐specific processors, and algorithm‐specific processors are used to implement different types of DSP systems or subsystems. They are typically used in applications involving complex and irregular algorithms while application‐specific processors provide lower unit cost and higher performance for a specific application, particularly when the volume of production is high. Most DSP applications use fractional arithmetic instead of integer arithmetic. Multimedia and communication applications involve real‐time audio and video/image processing which very often require sum‐of‐products (SOP) computation. The need of computing non‐linear functions arises in many different applications. The straightforward method of approximating an elementary function is to just store the values in a look‐up table typically leads to large tables, even though the resulting area from standard cell synthesis grows slower than the number of memory bits. It is of interest to find ways to approximate elementary functions using a trade‐off between arithmetic operations and look‐up tables.

Place, publisher, year, edition, pages
John Wiley & Sons, 2017
Keywords
arithmetic circuits, arithmetic operations, complex multiplication, DSP applications, look‐up tables, non‐linear functions, root computation, sum‐of‐products circuits, sum‐of‐products computation
National Category
Computer Systems Embedded Systems
Identifiers
urn:nbn:se:liu:diva-150915 (URN)10.1002/9781119206804.ch1 (DOI)9781119206774 (ISBN)
Available from: 2018-09-05 Created: 2018-09-05 Last updated: 2018-09-05Bibliographically approved
Bertilsson, E., Gustafsson, O. & Larsson, E. G. (2017). Computation Limited Matrix Inversion Using Neumann Series Expansion for Massive MIMO. In: 2017 FIFTY-FIRST ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS, AND COMPUTERS: . Paper presented at 51th Asilomar Conference on Signals, Systems, and Computers (ASILOMARSSC) (pp. 466-469).
Open this publication in new window or tab >>Computation Limited Matrix Inversion Using Neumann Series Expansion for Massive MIMO
2017 (English)In: 2017 FIFTY-FIRST ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS, AND COMPUTERS, 2017, p. 466-469Conference paper, Published paper (Refereed)
Abstract [en]

Neumann series expansion is a method for performing matrix inversion that has received a lot of interest in the context of massive MIMO systems. However, the computational complexity of the Neumann methods is higher than for the lowest complexity exact matrix inversion algorithms, such as LDL, when the number of terms in the series is three or more. In this paper, the Neumann series expansion is analyzed from a computational perspective for cases when the complexity of performing exact matrix inversion is too high. By partially computing the third term of the Neumann series, the computational complexity can be reduced. Three different preconditioning matrices are considered. Simulation results show that when limiting the total number of operations performed, the BER performance of the tree different preconditioning matrices is the same.

Keywords
Massive MIMO, Matrix inversion
National Category
Communication Systems
Identifiers
urn:nbn:se:liu:diva-151315 (URN)10.1109/ACSSC.2017.8335382 (DOI)000442659900082 ()978-1-5386-1823-3 (ISBN)
Conference
51th Asilomar Conference on Signals, Systems, and Computers (ASILOMARSSC)
Available from: 2018-09-17 Created: 2018-09-17 Last updated: 2018-09-21
Kovalev, A., Gustafsson, O. & Garrido, M. (2017). Implementation approaches for 512-tap 60 GSa/s chromatic dispersion FIR filters. In: Michael B. Matthews (Ed.), Conference Record of The Fifty-First Asilomar Conference on Signals, Systems & Computers: . Paper presented at The 51st Asilomar Conference on Signals, Systems & Computers, November 6-9, 2016, Pacific Grove, California, USA (pp. 1779-1783). Institute of Electrical and Electronics Engineers (IEEE)
Open this publication in new window or tab >>Implementation approaches for 512-tap 60 GSa/s chromatic dispersion FIR filters
2017 (English)In: Conference Record of The Fifty-First Asilomar Conference on Signals, Systems & Computers / [ed] Michael B. Matthews, Institute of Electrical and Electronics Engineers (IEEE), 2017, p. 1779-1783Conference paper, Published paper (Refereed)
Abstract [en]

In optical communication the non-ideal properties of the fibers lead to pulse widening from chromatic dispersion. One way to compensate for this is through digital signal processing. In this work, two architectures for compensation are compared. Both are designed for 60 GSa/s and 512 filter taps and implemented in the frequency domain using FFTs. It is shown that the high-speed requirements introduce constraints on possible architectural choices. Furthermore, the theoretical multiplication complexity estimates are not good predictors for the energy consumption. The results show that the implementation with 10% more multiplications per sample has half the power consumption and one third of the area consumption. The best architecture for this specification results in a power consumption of 3.12 W in a 65 nm technology, corresponding to an energy per complex filter tap of 0.10 mW/GHz.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2017
Series
Signals, Systems, and Computers, E-ISSN 2576-2303 ; 2017
Keywords
Finite impulse response filters, Computer architecture, Complexity theory, Clocks, Chromatic dispersion, Discrete Fourier transforms, Frequency-domain analysis, adaptive filters, compensation, fast Fourier transforms, FIR filters, optical filters
National Category
Signal Processing Communication Systems Embedded Systems
Identifiers
urn:nbn:se:liu:diva-150912 (URN)10.1109/ACSSC.2017.8335667 (DOI)000442659900316 ()2-s2.0-85050969687 (Scopus ID)978-1-5386-1823-3 (ISBN)978-1-5386-0666-7 (ISBN)978-1-5386-1824-0 (ISBN)
Conference
The 51st Asilomar Conference on Signals, Systems & Computers, November 6-9, 2016, Pacific Grove, California, USA
Funder
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications
Available from: 2018-09-05 Created: 2018-09-05 Last updated: 2019-06-28Bibliographically approved
Gustafsson, O. (2017). On Lifting-Based Fixed-Point Complex Multiplications and Rotations. In: Neil Burgess, Javier Bruguera and Florent de Dinechin (Ed.), Proceedings 24th IEEE Symposium on Computer Arithmetic 24–26 July 2017 London, United Kingdom: . Paper presented at The 24th IEEE Symposium on Computer Arithmetic 24–26 July 2017 London, United Kingdom (pp. 43-49). Institute of Electrical and Electronics Engineers (IEEE)
Open this publication in new window or tab >>On Lifting-Based Fixed-Point Complex Multiplications and Rotations
2017 (English)In: Proceedings 24th IEEE Symposium on Computer Arithmetic 24–26 July 2017 London, United Kingdom / [ed] Neil Burgess, Javier Bruguera and Florent de Dinechin, Institute of Electrical and Electronics Engineers (IEEE), 2017, p. 43-49Conference paper, Published paper (Refereed)
Abstract [en]

Lifting-based complex multiplications and rotations are integer invertible, i.e., an integer input value is mapped to the same integer output value when rotating forward and backward. This is an important aspect for lossless transform-based source coding, but since the structure only require three real-valued multiplications and three real-valued additions it is also a potentially attractive way to perform complex multiplications when the coefficient has unity magnitude. In this work, we consider two aspects of these structures. First, we show that both the magnitude and angular error is dependent on the angle of input value and derive both exact and approximated expressions for these. Second, we discuss how to design such structures without the typical separation into three subsequent matrix multiplications. It is shown that the proposed design method allows many more values which are integer invertible, but can not be separated into three subsequent matrix multiplications with fixed-point values. The results show good correspondence between the error approximations and the actual error as well as a significantly increased design space.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2017
Series
Proceedings Symposium on Computer Arithmetic, ISSN 1063-6889 ; 2017
Keywords
Rotation, complex multiplication, error analysis, fixed-point representation
National Category
Computer Systems Signal Processing
Identifiers
urn:nbn:se:liu:diva-139336 (URN)10.1109/ARITH.2017.10 (DOI)000424786700007 ()9781538619650 (ISBN)9781538619643 (ISBN)9781538619667 (ISBN)
Conference
The 24th IEEE Symposium on Computer Arithmetic 24–26 July 2017 London, United Kingdom
Available from: 2017-07-10 Created: 2017-07-10 Last updated: 2019-05-09Bibliographically approved
Meher, P. K., Chang, C.-H., Gustafsson, O., Vinod, A. & Faust, M. (2017). Shift‐Add Circuits for Constant Multiplications. In: Pramod Kumar Meher, Thanos Stouraitis (Ed.), Arithmetic Circuits for DSP Applications: (pp. 33-76). John Wiley & Sons
Open this publication in new window or tab >>Shift‐Add Circuits for Constant Multiplications
Show others...
2017 (English)In: Arithmetic Circuits for DSP Applications / [ed] Pramod Kumar Meher, Thanos Stouraitis, John Wiley & Sons, 2017, p. 33-76Chapter in book (Other academic)
Abstract [en]

The optimization of shift‐and‐add network for constant multiplications is found to have great potential for reducing the area, delay, and power consumption of implementation of multiplications in several computation‐intensive applications not only in dedicated hardware but also in programmable computing systems. To simplify the shift‐and‐add network in single constant multiplication (SCM) circuits, this chapter discusses three design approaches, including direct simplification from a given number representation, simplification by redundant signed digit (SD) representation, and simplification by adder graph. Examples of the multiple constant multiplication (MCM) methods are constant matrix multiplication, discrete cosine transform (DCT) or fast Fourier transform (FFT), and polyphase finite impulse response (FIR) filters and filter banks. The given constant multiplication methods can be used for matrix multiplications and inner‐product; and can be applied easily to image/video processing and graphics applications. The chapter further discusses some of the shortcomings in the current research on constant multiplications, and possible scopes of improvement.

Place, publisher, year, edition, pages
John Wiley & Sons, 2017
Keywords
adder graph, constant multiplication methods, fast Fourier transform, polyphase finite impulse response filters, programmable computing systems, redundant signed digit representation, shift‐add circuits
National Category
Computer Systems Embedded Systems Signal Processing
Identifiers
urn:nbn:se:liu:diva-150919 (URN)10.1002/9781119206804.ch2 (DOI)9781119206774 (ISBN)9781119206798 (ISBN)9781119206804 (ISBN)
Available from: 2018-09-05 Created: 2018-09-05 Last updated: 2018-09-05Bibliographically approved
Organisations
Identifiers
ORCID iD: ORCID iD iconorcid.org/0000-0003-3470-3911

Search in DiVA

Show all publications