Publications (10 of 165)
Kumm, M., Gustafsson, O., de Dinechin, F., Kappauf, J. & Zipf, P. (2018). Karatsuba with Rectangular Multipliers for FPGAs. Paper presented at IEEE Symposium on Computer Arithmetic, Amherst, MA, USA, June 25-27, 2018. IEEE
2018 (English). Conference paper, Published paper (Refereed)
Abstract [en]

This work presents an extension of Karatsuba's method to efficiently use rectangular multipliers as a base for larger multipliers. The rectangular multipliers that motivate this work are the embedded 18x25-bit signed multipliers found in the DSP blocks of recent Xilinx FPGAs: the traditional Karatsuba approach must under-use them as square 18x18 ones. This work shows that rectangular multipliers can be efficiently exploited in a modified Karatsuba method if their input word sizes have a large greatest common divisor. In the Xilinx FPGA case, this can be obtained by using the embedded multipliers as 16x24 unsigned and as 17x25 signed ones. The obtained architectures are implemented with due attention to architectural features such as the pre-adders and post-adders available in Xilinx DSP blocks. They are synthesized and compared with traditional Karatsuba, but also with (non-Karatsuba) state-of-the-art tiling techniques that make use of the full rectangular multipliers. The proposed technique improves resource consumption and performance for multipliers of numbers larger than 64 bits.
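
For readers unfamiliar with the baseline, the following is a minimal sketch of the textbook (square) Karatsuba recursion that the abstract contrasts against. It is not the rectangular extension proposed in the paper. The function name `karatsuba` and the `base_bits` threshold are illustrative assumptions; the plain `a * b` in the base case merely stands in for one embedded hardware multiplier.

```python
def karatsuba(a: int, b: int, bits: int, base_bits: int = 16) -> int:
    """Multiply two unsigned integers of at most `bits` bits (textbook Karatsuba).

    Assumption: operands of `base_bits` bits or fewer are multiplied directly,
    standing in for a single small hardware multiplier.
    """
    if bits <= base_bits:
        return a * b

    half = (bits + 1) // 2          # split position (low part has `half` bits)
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half

    # Three sub-products instead of the four used by schoolbook multiplication.
    low = karatsuba(a_lo, b_lo, half, base_bits)
    high = karatsuba(a_hi, b_hi, half, base_bits)
    # The sums a_lo + a_hi and b_lo + b_hi can be one bit wider than the halves.
    cross = karatsuba(a_lo + a_hi, b_lo + b_hi, half + 1, base_bits) - low - high

    return (high << (2 * half)) + (cross << half) + low
```

Because both operands are split at the same position, every sub-product here is square, which is why this form cannot use the full width of an 18x25 multiplier.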

Place, publisher, year, edition, pages
IEEE, 2018
National Category
Computer Systems Embedded Systems Signal Processing
Identifiers
urn:nbn:se:liu:diva-150920 (URN)
Conference
IEEE Symposium on Computer Arithmetic, Amherst, MA, USA, June 25-27, 2018
Available from: 2018-09-05 Created: 2018-09-05 Last updated: 2018-09-21 Bibliographically approved
Ingemarsson, C. & Gustafsson, O. (2018). SFF—The Single-Stream FPGA-Optimized Feedforward FFT Hardware Architecture. Journal of Signal Processing Systems
2018 (English). In: Journal of Signal Processing Systems, ISSN 1939-8018, E-ISSN 1939-8115. Article in journal (Refereed). Epub ahead of print
Abstract [en]

In this paper, a fast Fourier transform (FFT) hardware architecture optimized for field-programmable gate arrays (FPGAs) is proposed. We refer to this as the single-stream FPGA-optimized feedforward (SFF) architecture. By using a stage that trades adders for shift registers, as compared with the single-path delay feedback (SDF) architecture, the efficient implementation of short shift registers in Xilinx FPGAs can be exploited. Moreover, this stage can be combined with ordinary or optimized SDF stages such that adders are only traded for shift registers when beneficial. The resulting structures are well suited for FPGA implementation, especially when an efficient implementation of short shift registers is available. This holds at least for contemporary Xilinx FPGAs. The results show that the proposed architectures improve on the current state of the art.

Place, publisher, year, edition, pages
Springer, 2018
Keywords
Fast Fourier transform (FFT), Field-programmable gate arrays (FPGAs), Pipeline FFT, FPGA optimization, Single-stream FFT
National Category
Computer Systems Signal Processing Embedded Systems
Identifiers
urn:nbn:se:liu:diva-150930 (URN)
10.1007/s11265-018-1370-y (DOI)
2-s2.0-85046136448 (Scopus ID)
Note

Special Issue on fast Fourier transform (FFT) hardware implementations

Available from: 2018-09-05 Created: 2018-09-05 Last updated: 2018-09-25 Bibliographically approved
Gustafsson, O. & Wanhammar, L. (2017). Basic Arithmetic Circuits. In: Pramod Kumar Meher, Thanos Stouraitis (Ed.), Arithmetic Circuits for DSP Applications (pp. 1-32). John Wiley & Sons
2017 (English). In: Arithmetic Circuits for DSP Applications / [ed] Pramod Kumar Meher, Thanos Stouraitis, John Wiley & Sons, 2017, p. 1-32. Chapter in book (Other academic)
Abstract [en]

General‐purpose DSP processors, application‐specific processors, and algorithm‐specific processors are used to implement different types of DSP systems or subsystems. General‐purpose DSP processors are typically used in applications involving complex and irregular algorithms, while application‐specific processors provide lower unit cost and higher performance for a specific application, particularly when the volume of production is high. Most DSP applications use fractional arithmetic instead of integer arithmetic. Multimedia and communication applications involve real‐time audio and video/image processing, which very often requires sum‐of‐products (SOP) computation. The need to compute non‐linear functions arises in many different applications. The straightforward method of approximating an elementary function, simply storing the function values in a look‐up table, typically leads to large tables, even though the resulting area from standard cell synthesis grows more slowly than the number of memory bits. It is of interest to find ways to approximate elementary functions using a trade‐off between arithmetic operations and look‐up tables.

Place, publisher, year, edition, pages
John Wiley & Sons, 2017
Keywords
arithmetic circuits, arithmetic operations, complex multiplication, DSP applications, look‐up tables, non‐linear functions, root computation, sum‐of‐products circuits, sum‐of‐products computation
National Category
Computer Systems Embedded Systems
Identifiers
urn:nbn:se:liu:diva-150915 (URN)
10.1002/9781119206804.ch1 (DOI)
9781119206774 (ISBN)
Available from: 2018-09-05 Created: 2018-09-05 Last updated: 2018-09-05 Bibliographically approved
Bertilsson, E., Gustafsson, O. & Larsson, E. G. (2017). Computation Limited Matrix Inversion Using Neumann Series Expansion for Massive MIMO. In: 2017 Fifty-First Asilomar Conference on Signals, Systems, and Computers. Paper presented at 51st Asilomar Conference on Signals, Systems, and Computers (ASILOMARSSC) (pp. 466-469).
2017 (English). In: 2017 Fifty-First Asilomar Conference on Signals, Systems, and Computers, 2017, p. 466-469. Conference paper, Published paper (Refereed)
Abstract [en]

Neumann series expansion is a method for performing matrix inversion that has received a lot of interest in the context of massive MIMO systems. However, the computational complexity of the Neumann methods is higher than that of the lowest-complexity exact matrix inversion algorithms, such as LDL, when the number of terms in the series is three or more. In this paper, the Neumann series expansion is analyzed from a computational perspective for cases when the complexity of performing exact matrix inversion is too high. By partially computing the third term of the Neumann series, the computational complexity can be reduced. Three different preconditioning matrices are considered. Simulation results show that when limiting the total number of operations performed, the BER performance of the three different preconditioning matrices is the same.
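
As an illustration of the general idea only, the sketch below truncates the Neumann series A^-1 ≈ Σ (I - D^-1 A)^n D^-1 after a given number of terms, assuming the diagonal of A as the preconditioning matrix D. This choice of D, the helper names `mat_mul` and `neumann_inverse`, and the use of plain Python lists are assumptions made for the example; the partial computation of the third term described in the abstract is not reproduced here.

```python
def mat_mul(x, y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


def neumann_inverse(a, terms):
    """Approximate the inverse of `a` with `terms` terms of the Neumann series.

    Assumption: D = diag(a) is the preconditioner. The series converges when
    the spectral radius of (I - D^-1 a) is below one, for example when `a`
    is strongly diagonally dominant.
    """
    n = len(a)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d_inv = [[1.0 / a[i][i] if i == j else 0.0 for j in range(n)]
             for i in range(n)]

    da = mat_mul(d_inv, a)
    e = [[identity[i][j] - da[i][j] for j in range(n)] for i in range(n)]

    acc = [[0.0] * n for _ in range(n)]   # running sum of E^0 + E^1 + ...
    power = identity                      # current power E^n, starting at E^0
    for _ in range(terms):
        acc = [[acc[i][j] + power[i][j] for j in range(n)] for i in range(n)]
        power = mat_mul(power, e)

    return mat_mul(acc, d_inv)
```

Each additional term costs one more matrix-matrix product in this naive form, which is the source of the complexity concern raised in the abstract.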

Keywords
Massive MIMO, Matrix inversion
National Category
Communication Systems
Identifiers
urn:nbn:se:liu:diva-151315 (URN)10.1109/ACSSC.2017.8335382 (DOI)000442659900082 ()978-1-5386-1823-3 (ISBN)
Conference
51st Asilomar Conference on Signals, Systems, and Computers (ASILOMARSSC)
Available from: 2018-09-17 Created: 2018-09-17 Last updated: 2018-09-21
Kovalev, A., Gustafsson, O. & Garrido, M. (2017). Implementation approaches for 512-tap 60 GSa/s chromatic dispersion FIR filters. In: 2017 Fifty-First Asilomar Conference on Signals, Systems, and Computers. Paper presented at 51st IEEE Asilomar Conference on Signals Systems and Computers (pp. 1779-1783).
2017 (English). In: 2017 Fifty-First Asilomar Conference on Signals, Systems, and Computers, 2017, p. 1779-1783. Conference paper, Published paper (Refereed)
Abstract [en]

In optical communication the non-ideal properties of the fibers lead to pulse widening from chromatic dispersion. One way to compensate for this is through digital signal processing. In this work, two architectures for compensation are compared. Both are designed for 60 GSa/s and 512 filter taps and implemented in the frequency domain using FFTs. It is shown that the high-speed requirements introduce constraints on possible architectural choices. Furthermore, the theoretical multiplication complexity estimates are not good predictors for the energy consumption. The results show that the implementation with 10% more multiplications per sample has half the power consumption and one third of the area consumption. The best architecture for this specification results in a power consumption of 3.12 W in a 65 nm technology, corresponding to an energy per complex filter tap of 0.10 mW/GHz.
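
The quoted energy figure can be checked with a short calculation. The sketch below assumes that energy per complex filter tap is defined as total power divided by the product of sample rate and number of taps; the abstract does not state this definition explicitly, but it reproduces the reported value. The variable names are illustrative.

```python
# Figures taken from the abstract.
power_w = 3.12            # total power consumption in watts
sample_rate_hz = 60e9     # 60 GSa/s
taps = 512                # number of filter taps

# Assumed definition: energy per tap = power / (sample rate * taps).
energy_per_tap_j = power_w / (sample_rate_hz * taps)

# 1 mW/GHz equals 1e-3 W / 1e9 Hz = 1e-12 J, i.e. one picojoule.
energy_per_tap_mw_per_ghz = energy_per_tap_j / 1e-12
```

Under this assumption the result is roughly 0.10 mW/GHz, or about 0.10 pJ per tap and sample.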

Keywords
Finite impulse response filters, Computer architecture, Complexity theory, Clocks, Chromatic dispersion, Discrete Fourier transforms, Frequency-domain analysis, adaptive filters, compensation, fast Fourier transforms, FIR filters, optical filters
National Category
Signal Processing Communication Systems Embedded Systems
Identifiers
urn:nbn:se:liu:diva-150912 (URN)
10.1109/ACSSC.2017.8335667 (DOI)
000442659900316 (ISI)
2-s2.0-85050969687 (Scopus ID)
978-1-5386-1823-3 (ISBN)
978-1-5386-0666-7 (ISBN)
978-1-5386-1824-0 (ISBN)
Conference
51st IEEE Asilomar Conference on Signals Systems and Computers
Funder
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications
Available from: 2018-09-05 Created: 2018-09-05 Last updated: 2018-09-21 Bibliographically approved
Meher, P. K., Chang, C.-H., Gustafsson, O., Vinod, A. & Faust, M. (2017). Shift‐Add Circuits for Constant Multiplications. In: Pramod Kumar Meher, Thanos Stouraitis (Ed.), Arithmetic Circuits for DSP Applications (pp. 33-76). John Wiley & Sons
2017 (English). In: Arithmetic Circuits for DSP Applications / [ed] Pramod Kumar Meher, Thanos Stouraitis, John Wiley & Sons, 2017, p. 33-76. Chapter in book (Other academic)
Abstract [en]

The optimization of shift‐and‐add networks for constant multiplications is found to have great potential for reducing the area, delay, and power consumption of multiplications in several computation‐intensive applications, not only in dedicated hardware but also in programmable computing systems. To simplify the shift‐and‐add network in single constant multiplication (SCM) circuits, this chapter discusses three design approaches: direct simplification from a given number representation, simplification by redundant signed digit (SD) representation, and simplification by adder graph. Applications of multiple constant multiplication (MCM) methods include constant matrix multiplication, the discrete cosine transform (DCT) or fast Fourier transform (FFT), and polyphase finite impulse response (FIR) filters and filter banks. The given constant multiplication methods can be used for matrix multiplications and inner products, and can be applied easily to image/video processing and graphics applications. The chapter further discusses some of the shortcomings in the current research on constant multiplications and the possible scope for improvement.
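
To make the signed-digit approach concrete, the following minimal sketch recodes a non-negative constant into canonical signed digit (CSD) form, one common redundant signed-digit representation, and then multiplies using only shifts, additions, and subtractions. The function names `to_csd` and `shift_add_multiply` are hypothetical, and the chapter's own algorithms (including the adder-graph methods) are not reproduced here.

```python
def to_csd(c: int) -> list:
    """Return the CSD digits (-1, 0 or 1) of c >= 0, least significant first."""
    digits = []
    while c != 0:
        if c & 1:
            # c mod 4 == 1 gives digit +1, c mod 4 == 3 gives digit -1,
            # which guarantees that no two adjacent digits are non-zero.
            d = 2 - (c & 3)
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits


def shift_add_multiply(x: int, c: int) -> int:
    """Compute x * c using one shifted add or subtract per non-zero CSD digit."""
    acc = 0
    for shift, d in enumerate(to_csd(c)):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc
```

For example, the constant 7 is recoded as 8 - 1, so a multiplication by 7 needs one subtraction instead of the two additions implied by its binary form 4 + 2 + 1.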

Place, publisher, year, edition, pages
John Wiley & Sons, 2017
Keywords
adder graph, constant multiplication methods, fast Fourier transform, polyphase finite impulse response filters, programmable computing systems, redundant signed digit representation, shift‐add circuits
National Category
Computer Systems Embedded Systems Signal Processing
Identifiers
urn:nbn:se:liu:diva-150919 (URN)
10.1002/9781119206804.ch2 (DOI)
9781119206774 (ISBN)
9781119206798 (ISBN)
9781119206804 (ISBN)
Available from: 2018-09-05 Created: 2018-09-05 Last updated: 2018-09-05 Bibliographically approved
Garrido Gálvez, M., Källström, P., Kumm, M. & Gustafsson, O. (2016). CORDIC II: A New Improved CORDIC Algorithm. IEEE Transactions on Circuits and Systems - II - Express Briefs, 63(2), 186-190
2016 (English). In: IEEE Transactions on Circuits and Systems - II - Express Briefs, ISSN 1549-7747, E-ISSN 1558-3791, Vol. 63, no 2, p. 186-190. Article in journal (Refereed). Published
Abstract [en]

In this brief, we present the CORDIC II algorithm. Like previous CORDIC algorithms, the CORDIC II calculates rotations by breaking down the rotation angle into a series of microrotations. However, the CORDIC II algorithm uses a novel angle set, different from the angles used in previous CORDIC algorithms. The new angle set provides a faster convergence that reduces the number of adders with respect to previous approaches.
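
For context, the sketch below shows the conventional rotation-mode CORDIC that the abstract refers to as "previous CORDIC algorithms", using the classic angle set atan(2^-i). It is not the CORDIC II algorithm, whose angle set is not given in the abstract. Floating-point arithmetic is used for readability, and the function name `cordic_rotate` and the default iteration count are assumptions.

```python
import math


def cordic_rotate(x: float, y: float, angle: float, iterations: int = 16):
    """Rotate (x, y) by `angle` radians with conventional CORDIC microrotations.

    Assumption: |angle| is within the convergence range of the classic angle
    set, roughly 1.74 radians.
    """
    gain = 1.0
    for i in range(iterations):
        sigma = 1.0 if angle >= 0 else -1.0   # direction of this microrotation
        step = 2.0 ** -i                      # a right shift by i in hardware
        x, y = x - sigma * y * step, y + sigma * x * step
        angle -= sigma * math.atan(step)      # remaining angle to rotate
        gain *= math.sqrt(1.0 + step * step)  # each microrotation scales the vector
    # Compensate the accumulated scaling so the result has the input magnitude.
    return x / gain, y / gain
```

In hardware, each microrotation costs two shift-and-add operations, so an angle set that converges in fewer steps directly reduces the adder count.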

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2016
Keywords
CORDIC, friend angles, nanorotation, rotation, uniformly scaled redundant (USR) CORDIC
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:liu:diva-126139 (URN)
10.1109/TCSII.2015.2483422 (DOI)
000370533000014 (ISI)
Available from: 2016-03-15 Created: 2016-03-15 Last updated: 2017-11-30
Källström, P. & Gustafsson, O. (2016). Fast and Area Efficient Adder for Wide Data in Recent Xilinx FPGAs. In: 26th International Conference on Field-Programmable Logic and Applications. Paper presented at 26th International Conference on Field-Programmable Logic and Applications, Lausanne, Switzerland, August 29 - September 2, 2016 (pp. 338-341). Lausanne: IEEE
2016 (English). In: 26th International Conference on Field-Programmable Logic and Applications, Lausanne: IEEE, 2016, p. 338-341. Conference paper, Published paper (Refereed)
Abstract [en]

Most modern FPGAs have highly optimised carry logic for efficient implementations of ripple-carry adders (RCA). Some FPGAs also have a six-input look-up table (LUT) per cell, of which two inputs are used during normal addition. In this paper we present an architecture that compresses the carry chain length to N/2 in recent Xilinx FPGAs by utilising the LUTs better. This carry compression was implemented by letting some cells calculate the carry chain in two bits per cell, while others calculate the sum output bits. In total, the proposed design uses no more hardware than the normal adder. The results show that the proposed adder is faster than a normal adder for word lengths larger than 64 bits in Virtex-6 FPGAs.

Place, publisher, year, edition, pages
Lausanne: IEEE, 2016
Series
Field Programmable Logic and Applications, International Conference on, ISSN 1946-1488
National Category
Embedded Systems
Identifiers
urn:nbn:se:liu:diva-131088 (URN)
10.1109/FPL.2016.7577348 (DOI)
000386610400050 (ISI)
9782839918442 (ISBN)
9781509008513 (ISBN)
Conference
26th International Conference on Field-Programmable Logic and Applications, Lausanne, Switzerland August 29 - September 2, 2016
Available from: 2016-09-09 Created: 2016-09-07 Last updated: 2016-12-06 Bibliographically approved
Alam, S. A. & Gustafsson, O. (2016). On the implementation of time-multiplexed frequency-response masking filters. IEEE Transactions on Signal Processing, 64(15), 3933-3944
2016 (English). In: IEEE Transactions on Signal Processing, ISSN 1053-587X, E-ISSN 1941-0476, Vol. 64, no 15, p. 3933-3944. Article in journal (Refereed). Published
Abstract [en]

The complexity of narrow transition band finite-length impulse response (FIR) filters is high and can be reduced by using frequency-response masking (FRM) techniques. These techniques use a combination of a periodic model filter and, possibly periodic, masking filters. Time-multiplexing is in general beneficial, since the maximum clock frequency obtainable in a given technology only rarely matches the sample rate required by the application. Therefore, architectures for time-multiplexed FRM filters that benefit from the inherent sparsity of the periodic filters are introduced in this work.

We show that FRM filters not only reduce the number of multipliers needed, but also have benefits in terms of memory usage. Although the total number of samples to be stored is larger for FRM, it results in fewer memory resources needed in FPGAs and more energy-efficient memory schemes in ASICs. In total, the power consumption is significantly reduced compared to a single-stage implementation. Furthermore, we show that the choice of the interpolation factor which gives the least complexity for the periodic model filter and subsequent masking filter(s) is a function of the time-multiplexing factor, meaning that the minimum number of multipliers does not always correspond to the minimum number of multiplications. Both single-port and dual-port memories are considered and the involved trade-off between the number of multipliers and memory complexity is illustrated. The results show that for FPGA implementation, the power reduction ranges from 23% to 68% for the considered examples.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2016
Keywords
Frequency-response masking, FIR filter, FPGA, ASIC, time-multiplexing, memories
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:liu:diva-124190 (URN)
10.1109/TSP.2016.2557298 (DOI)
000379699800009 (ISI)
Note

At the time of the doctoral defence, this publication existed as a manuscript.

Available from: 2016-01-21 Created: 2016-01-21 Last updated: 2017-11-30 Bibliographically approved
Gustafsson, O. & Johansson, H. (2015). Decimation Filters for High-Speed Delta-Sigma Modulators With Passband Constraints: General Versus CIC-Based FIR Filters. In: 2015 IEEE International Symposium on Circuits and Systems (ISCAS). Paper presented at IEEE International Symposium on Circuits and Systems (ISCAS) (pp. 2205-2208). IEEE conference proceedings
2015 (English). In: 2015 IEEE International Symposium on Circuits and Systems (ISCAS), IEEE conference proceedings, 2015, p. 2205-2208. Conference paper, Published paper (Refereed)
Abstract [en]

For high-speed delta-sigma modulators, the decimation filters are typically polyphase FIR filters, as the recursive CIC filters cannot be implemented because of the iteration period bound. In addition, the high clock frequency and short input word length make multiple constant multiplication techniques less beneficial. Instead, a realistic complexity measure in this setting is the number of non-zero digits of the FIR filter tap coefficients. As there is limited control of the passband approximation error for CIC-based filters, these must in most cases be compensated to meet a passband specification. In this work we investigate the complexity of decimation filters meeting CIC-like stopband behavior, but with a well-defined passband approximation error. It is found that the general approach can in many cases produce filters with much smaller passband approximation error at a similar complexity.

Place, publisher, year, edition, pages
IEEE conference proceedings, 2015
Series
IEEE International Symposium on Circuits and Systems, ISSN 0271-4302
National Category
Signal Processing
Identifiers
urn:nbn:se:liu:diva-114500 (URN)
10.1109/ISCAS.2015.7169119 (DOI)
000371471002135 (ISI)
978-1-4799-8391-9 (ISBN)
Conference
IEEE International Symposium on Circuits and Systems (ISCAS)
Funder
eLLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications
Available from: 2015-02-24 Created: 2015-02-24 Last updated: 2016-04-07
Identifiers
ORCID iD: orcid.org/0000-0003-3470-3911