liu.se: Search for publications in DiVA
  • 1.
    Hansson, Olle
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Grailootanha, Mahdieh
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Nunez-Yanez, Jose Luis
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Deep Quantization of Graph Neural Networks with Run-Time Hardware-Aware Training. 2024. In: APPLIED RECONFIGURABLE COMPUTING. ARCHITECTURES, TOOLS, AND APPLICATIONS, ARC 2024, SPRINGER INTERNATIONAL PUBLISHING AG, 2024, Vol. 14553, p. 33-47. Conference paper (Refereed)
    Abstract [en]

    In this paper, we investigate the benefits of hardware-aware quantization in the gFADES hardware accelerator targeting Graph Convolutional Networks (GCNs). GCNs are a type of Graph Neural Networks (GNNs) that combine sparse and dense data compute requirements that are challenging to meet in resource-constrained embedded hardware. The gFADES architecture is optimized to work with the pruned data representations typically present in graph neural networks for the graph structure and features. It is described in High-Level Synthesis (HLS) which enables efficient design-space exploration of mixed precision hardware configurations. In this work, the mixed-precision design is embedded in the forward pass of the PyTorch back-propagation training loop to enable run-time hardware-aware training. It uses different data types to represent adjacency, feature, weight, internal, and output values which allows for a fine-grained optimization at the tensor level. The resulting hardware configuration after training reduces precision to a 4-bit data type for all inputs. It achieves little to no degradation in the classification accuracy, when training on the Planetoid database dataset, compared to the original 32-bit floating-point. The optimized hardware design running on an AMD/Xilinx Zynq Ultrascale+ FPGA device achieves over 600x speedup compared to the optimized PyTorch software implementation running on the multi-core ARM CPU in the processing system.

  • 2.
    Skarman, Frans
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Abstraction in the Spade Hardware Description Language. 2023. Conference paper (Refereed)
    Abstract [en]

    Spade is an HDL that enhances the productivity of HDL designers by adding useful abstractions for hardware design. These abstractions are zero- or low-cost, meaning that the designer still has full control over what hardware gets generated.

  • 3.
    Khan, Mohd Tasleem
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Analyzing Step-Size Approximation for Fixed-Point Implementation of LMS and BLMS Algorithms. 2023. In: 2023 IEEE Nordic Circuits and Systems Conference (NorCAS), IEEE, 2023. Conference paper (Refereed)
    Abstract [en]

    In this work, we analyze the step-size approximation for fixed-point least-mean-square (LMS) and block LMS (BLMS) algorithms. Our primary focus is on investigating how step-size approximation impacts the convergence rate and steady-state mean square error (MSE) across varying block sizes and filter lengths. We consider three different FP quantized LMS and BLMS algorithms. The results demonstrate that the algorithm with two quantizers in single precision behaves approximately the same as one quantizer under quantized weights, regardless of block size and filter length. Subsequently, we explore the approximation effects of nearest power-of-two and their combinations with different design parameters on the convergence performance. Simulation results within the context of a system identification problem under these approximations reveal intriguing insights. For instance, a single quantizer algorithm without quantized error is more robust than its counterpart under these approximations. Additionally, both single quantizer algorithms with combined power-of-two approximations match the behavior of the actual step-size.

  • 4.
    Bae, Cheolyong
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    FFT-Size Implementation Tradeoffs for Chromatic Dispersion Compensation Filters. 2023. Conference paper (Other academic)
    Abstract [en]

    FIR filtering realized in frequency domain can use different FFT sizes leading to different arithmetic complexities. The implementation results indicate that not only arithmetic complexities must be considered for minimal power consumption.

  • 5.
    Skarman, Frans
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Spade: An Expression-Based HDL With Pipelines. 2023. In: Proceedings of the 3rd Workshop on Open-Source Design Automation (OSDA), 2023, p. 7-12. Conference paper (Refereed)
    Abstract [en]

    Spade is a new open source hardware description language (HDL) designed to increase developer productivity without sacrificing the low-level control offered by HDLs. It is a standalone language which takes inspiration from modern software languages, and adds useful abstractions for common hardware constructs. It also comes with a convenient set of tooling, such as a helpful compiler, a build system with dependency management, tools for debugging, and editor integration.

  • 6.
    Henriksson, Mikael
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Streaming Matrix Transposition on FPGAs Using Distributed Memories. 2023. In: Proceeding of the IEEE Nordic Circuits and Systems Conference (NorCAS), Aalborg, Denmark: Institute of Electrical and Electronics Engineers (IEEE), 2023. Conference paper (Refereed)
    Abstract [en]

    Matrix transposition, the procedure of swapping rows and columns of a matrix, is used in various signal processing applications, such as massive multiple-input multiple-output (MIMO) communication systems, data compression, and multidimensional fast Fourier transforms, which are used in MIMO radar systems. In low-latency high-throughput streaming applications, specialized circuits for matrix transposition are needed in order to perform transposition in real time. This is in contrast to "slower" applications, where transposition can be adequately performed by storing a matrix in a shared memory and afterward reading it back in a transposed order. In this paper, a design procedure for streaming matrix transposition on field-programmable gate arrays (FPGAs) using distributed memories is presented. It is shown that significantly fewer FPGA resources are required for small- to medium-sized streaming matrix transpositions compared to recent related works.

  • 7.
    Khan, Mohd Tasleem
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    ASIC Implementation Trade-Offs for High-Speed LMS and Block LMS Adaptive Filters. 2022. In: 65th International Midwest Symposium on Circuits and Systems (MWSCAS), Fukuoka, Japan: Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 1-4. Conference paper (Other academic)
    Abstract [en]

    In this work, implementation trade-offs for ASIC implementation of least-mean-square (LMS) and block LMS (BLMS) adaptive filters are presented. We explore the design trade-offs by increasing the block size and/or relying on the synthesis tool for increased sample rate. For area, a lower block size is advantageous as long as the synthesis tool can meet timing. The energy optimum is, however, found at a different point in the design space. Simulation confirms that longer block sizes lead to lower MSE for identical step-size. Hence, the design point should be decided based on weighted requirements for area, energy, and MSE.

  • 8.
    Gustafsson, Oscar
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Bae, Cheolyong
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Shift-and-Add Realization Trade-Offs for Chromatic Dispersion Compensation FIR Filters. 2022. In: Optica Advanced Photonics Congress 2022, Optical Society of America, 2022, article id SpTh1I.3. Conference paper (Refereed)
    Abstract [en]

    Approaches to shift-and-add realization of time-domain chromatic dispersion compensation FIR filters are considered. The coefficient word length has larger impact than filter length on both BER penalty and adder complexity.

  • 9.
    Skarman, Frans
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Spade: An HDL Inspired by Modern Software Languages. 2022. In: 2022 32nd International Conference on Field-Programmable Logic and Applications (FPL), Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 454-455. Conference paper (Refereed)
    Abstract [en]

    Spade is a new hardware description language which aims to make hardware description easier and less error prone. It does this by taking lessons from software programming languages, and adding language level support for common hardware constructs, all without compromising the low level control over what hardware gets generated.

  • 10.
    Gustafsson, Oscar
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Hellman, Noah
    Linköping University, Department of Electrical Engineering. Linköping University, Faculty of Science & Engineering.
    Approximate Floating-Point Operations with Integer Units by Processing in the Logarithmic Domain. 2021. In: 2021 IEEE 28th Symposium on Computer Arithmetic (ARITH), Institute of Electrical and Electronics Engineers (IEEE), 2021, p. 45-52. Conference paper (Refereed)
    Abstract [en]

    Floating-point numbers represented using a hidden one can readily be approximately converted to the logarithmic domain using Mitchell's approximation. Once in the logarithmic domain, several arithmetic operations, including multiplication, division, and square root, can be easily computed using the integer arithmetic unit. This has earlier been used in fast reciprocal square-root algorithms, sometimes referred to as magic number algorithms. The proposed approximate operations are realized by performing an integer operation using an integer unit on floating-point data and adding an integer constant to obtain the approximate floating-point result. In this work, we derive easy-to-use equations and constants for multiple floating-point formats and operations.
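
The idea in this abstract can be illustrated with a minimal sketch for IEEE 754 binary32 and positive, normal inputs. It is not taken from the paper: the function names are hypothetical, and the constant used here is simply the bit pattern of 1.0 (the uncorrected choice), whereas the paper derives its own constants. Because the bit pattern of a positive float is roughly a scaled and offset base-2 logarithm of its value, integer addition, subtraction, and right shift of bit patterns act as approximate multiplication, division, and square root.

```python
import struct

# Bit pattern of 1.0 in IEEE 754 binary32. Assumption: this plain,
# uncorrected constant is used for illustration only; it is not one of
# the tuned constants derived in the paper.
ONE_BITS = 0x3F800000


def float_to_bits(x: float) -> int:
    """Reinterpret a binary32 value as an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b: int) -> float:
    """Reinterpret an unsigned 32-bit integer as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]


def approx_mul(a: float, b: float) -> float:
    """Approximate a * b with one integer add and one constant subtract."""
    return bits_to_float(float_to_bits(a) + float_to_bits(b) - ONE_BITS)


def approx_div(a: float, b: float) -> float:
    """Approximate a / b with one integer subtract and one constant add."""
    return bits_to_float(float_to_bits(a) - float_to_bits(b) + ONE_BITS)


def approx_sqrt(a: float) -> float:
    """Approximate sqrt(a) with a right shift and one constant add."""
    return bits_to_float((float_to_bits(a) >> 1) + (ONE_BITS >> 1))


if __name__ == "__main__":
    for x, y in [(2.0, 4.0), (3.0, 5.0), (1.5, 1.5)]:
        print(x, y, approx_mul(x, y), x * y)
```

Results are exact when the operands are powers of two and otherwise carry the relative error of Mitchell's approximation; the sketch does not handle zeros, negative values, subnormals, infinities, or overflow.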

  • 11.
    Bae, Cheolyong
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Finite Word Length Analysis for FFT-Based Chromatic Dispersion Compensation Filters. 2021. In: Signal Processing in Photonic Communications 2021, OPTICA, 2021. Conference paper (Refereed)
    Abstract [en]

    Finite word length effects for frequency-domain implementation of chromatic dispersion compensation are analyzed. The results show a significant difference for the different factors when it comes to power consumption and receiver penalty.

  • 12.
    Bertilsson, Erik
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Ingemarsson, Carl
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Low-Latency Parallel Hermitian Positive-Definite Matrix Inversion for Massive MIMO. 2021. In: 2021 IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS 2021), Institute of Electrical and Electronics Engineers (IEEE), 2021, p. 23-28. Conference paper (Refereed)
    Abstract [en]

    In this work, the effect of latency for three different positive definite matrix inversion algorithms when implemented on parallel and pipelined processing elements is considered. The work is motivated by the fact that in a massive MIMO system, matrix inversion needs to be performed between estimating the channels and producing the transmitted downlink signal, which means that the latency of the matrix inversion has a significant impact on the system performance. It is shown that, despite the algorithms having different complexity, each of the three algorithms can have the lowest latency for different numbers of processing elements and pipeline levels. In particular, in systems with many processing elements, the algorithm with the highest complexity has the lowest latency.

  • 13.
    Mohammadi Sarband, Narges
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Becirovic, Ema
    Linköping University, Department of Electrical Engineering, Communication Systems. Linköping University, Faculty of Science & Engineering.
    Krysander, Mattias
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Larsson, Erik G.
    Linköping University, Department of Electrical Engineering, Communication Systems. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Massive Machine-Type Communication Pilot-Hopping Sequence Detection Architectures Based on Non-Negative Least Squares for Grant-Free Random Access. 2021. In: IEEE Open Journal of Circuits and Systems, ISSN 2644-1225, Vol. 2, p. 253-264. Article in journal (Refereed)
    Abstract [en]

    User activity detection in grant-free random access massive machine type communication (mMTC) using pilot-hopping sequences can be formulated as solving a non-negative least squares (NNLS) problem. In this work, two architectures using different algorithms to solve the NNLS problem are proposed. The algorithms are implemented using a fully parallel approach and fixed-point arithmetic, leading to high detection rates and low power consumption. The first algorithm, fast projected gradients, converges faster to the optimal value. The second algorithm, multiplicative updates, is partially implemented in the logarithmic domain, and provides a smaller chip area and lower power consumption. For a detection rate of about one million detections per second, the chip area for the fast algorithm is about 0.7 mm² compared to about 0.5 mm² for the multiplicative algorithm when implemented in a 28 nm FD-SOI standard cell process at 1 V power supply voltage. The energy consumption is about 300 nJ/detection for the fast projected gradient algorithm using 256 iterations, leading to a convergence close to the theoretical. With 128 iterations, about 250 nJ/detection is required, with a detection performance on par with 192 iterations of the multiplicative algorithm for which about 100 nJ/detection is required.

  • 14.
    Bae, Cheolyong
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Overlap-Save Commutators for High-Speed Streaming Data Filtering. 2021. In: 2021 IEEE International Symposium on Circuits and Systems (ISCAS), IEEE, 2021. Conference paper (Other academic)
    Abstract [en]

    Overlap-save and overlap-add methods enable efficient implementation of FIR filters. In this paper, a compact method for handling the overlap and shuffle of samples for real-time processing using pipelined FFT architectures is presented. It is suitable for cases when the sample rate is equal to or higher than the clock frequency.
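
For readers unfamiliar with the overlap-save method named in this abstract, the following is a minimal software sketch of the block handling, not the commutator hardware the paper proposes. The function names are hypothetical, and a naive circular convolution stands in for the FFT, pointwise multiply, and inverse FFT so that the example needs no external libraries.

```python
def circular_convolve(block, h, n):
    """Naive n-point circular convolution.

    Stand-in for FFT -> pointwise multiply -> inverse FFT (assumption:
    used here only to keep the sketch dependency-free).
    """
    h_padded = list(h) + [0.0] * (n - len(h))
    return [
        sum(block[(i - k) % n] * h_padded[k] for k in range(n))
        for i in range(n)
    ]


def overlap_save(signal, h, n):
    """Filter `signal` with FIR taps `h` using n-point overlap-save blocks."""
    m = len(h)
    step = n - (m - 1)  # new input samples consumed per block
    if step <= 0:
        raise ValueError("block size n must be at least len(h)")
    # Prepend m-1 zeros so the first block has a full overlap region.
    padded = [0.0] * (m - 1) + list(signal)
    out = []
    pos = 0
    while pos < len(signal):
        block = padded[pos:pos + n]
        block += [0.0] * (n - len(block))  # zero-pad the final block
        y = circular_convolve(block, h, n)
        # The first m-1 outputs are corrupted by wrap-around: discard them.
        out.extend(y[m - 1:])
        pos += step
    return out[:len(signal)]


def direct_fir(signal, h):
    """Reference time-domain FIR filter for comparison."""
    return [
        sum(h[k] * signal[i - k] for k in range(len(h)) if i - k >= 0)
        for i in range(len(signal))
    ]


if __name__ == "__main__":
    x = [float(v) for v in range(1, 11)]
    taps = [1.0, -1.0, 0.5]
    print(overlap_save(x, taps, 8))
    print(direct_fir(x, taps))
```

Each block reuses the last `len(h) - 1` input samples of the previous block, which is the overlap that has to be routed alongside the new samples in a streaming implementation.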

  • 15.
    Skarman, Frans
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Jung, Daniel
    Linköping University, Department of Electrical Engineering, Vehicular Systems. Linköping University, Faculty of Science & Engineering.
    Krysander, Mattias
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    A Tool to Enable FPGA-Accelerated Dynamic Programming for Energy Management of Hybrid Electric Vehicles. 2020. In: IFAC PAPERSONLINE, ELSEVIER, 2020, Vol. 53, no 2, p. 15104-15109. Conference paper (Refereed)
    Abstract [en]

    When optimising the vehicle trajectory and powertrain energy management of hybrid electric vehicles, it is important to include look-ahead information such as road conditions and other traffic. One method for doing so is dynamic programming, but the execution time of such an algorithm on a general purpose CPU is too long for it to be usable in real time. Significant improvements in execution time can be achieved by utilising parallel computations, for example, using a Field-Programmable Gate Array (FPGA). A tool for automatically converting a vehicle model written in C++ into code that can be executed on an FPGA and used for dynamic programming-based control is presented in this paper. A vehicle model with a mild-hybrid powertrain is used as a case study to evaluate the developed tool and the output quality and execution time of the resulting hardware. Copyright (C) 2020 The Authors.

  • 16.
    Skarman, Frans
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Jung, Daniel
    Linköping University, Department of Electrical Engineering, Vehicular Systems. Linköping University, Faculty of Science & Engineering.
    Krysander, Mattias
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Acceleration of Simulation Models Through Automatic Conversion to FPGA Hardware. 2020. In: 2020 30th International Conference on Field-Programmable Logic and Applications (FPL), IEEE, 2020, p. 359-360. Conference paper (Refereed)
    Abstract [en]

    By running simulation models on FPGAs, their execution speed can be significantly improved, at the cost of increased development effort. This paper describes a project to develop a tool which converts simulation models written in high level languages into fast FPGA hardware. The tool currently converts code written using custom C++ data types into Verilog. A model of a hybrid electric vehicle is used as a case study, and the resulting hardware runs significantly faster than on a general purpose CPU.

  • 17.
    Henriksson, Mikael
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Kunnath Ganesan, Unnikrishnan
    Linköping University, Department of Electrical Engineering, Communication Systems. Linköping University, Faculty of Science & Engineering.
    Larsson, Erik G.
    Linköping University, Department of Electrical Engineering, Communication Systems. Linköping University, Faculty of Science & Engineering.
    An Architecture for Grant-Free Random Access Massive Machine Type Communication Using Coordinate Descent. 2020. In: Proceedings of Fifty-Fourth Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA: IEEE, 2020, Vol. 54, p. 1112-1116. Conference paper (Refereed)
    Abstract [en]

    An implementation of activity detection for grant-free massive machine type communication is presented. The implemented algorithm is based on coordinate descent, which shows a rapid convergence time. A number of modifications to the original algorithm are proposed to allow efficient implementation in hardware. In addition, the implementation is based on fixed-point representation, and, hence, exhaustive word length simulations have been performed for the different processing steps.

  • 18.
    Fougstedt, Christoffer
    et al.
    Chalmers Univ Technol, Sweden.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Bae, Cheolyong
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Borjeson, Erik
    Chalmers Univ Technol, Sweden.
    Larsson-Edefors, Per
    Chalmers Univ Technol, Sweden.
    ASIC Design Exploration for DSP and FEC of 400-Gbit/s Coherent Data-Center Interconnect Receivers. 2020. In: 2020 OPTICAL FIBER COMMUNICATIONS CONFERENCE AND EXPOSITION (OFC), IEEE, 2020. Conference paper (Refereed)
    Abstract [en]

    We perform exploratory ASIC design of key DSP and FEC units for 400-Gbit/s coherent data-center interconnect receivers. In 22-nm CMOS, the considered units together dissipate 5 W, suggesting implementation feasibility in power-constrained form factors. (C) 2020 The Authors

  • 19.
    Fougstedt, Christoffer
    et al.
    Chalmers University of Technology, Gothenburg, Sweden.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Bae, Cheolyong
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Börjesson, Erik
    Chalmers University of Technology, Gothenburg, Sweden.
    Larsson-Edefors, Per
    Chalmers University of Technology, Gothenburg, Sweden.
    ASIC Design Exploration for DSP and FEC of 400-Gbit/s Coherent Data-Center Interconnect Receivers. 2020. Conference paper (Refereed)
    Abstract [en]

    We perform exploratory ASIC design of key DSP and FEC units for 400-Gbit/s coherent data-center interconnect receivers. In 22-nm CMOS, the considered units together dissipate 5 W, suggesting implementation feasibility in power-constrained form factors.

  • 20.
    Bae, Cheolyong
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Larsson-Edefors, Per
    Chalmers University of Technology, Gothenburg, Sweden.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Benefit of Prime Factor FFTs in Fully Parallel 60 GBaud CDC Filters. 2020. Conference paper (Refereed)
    Abstract [en]

    Prime factor algorithms are beneficial in fully parallel frequency-domain implementation of CDC filters and enable a more continuous scaling of filter lengths. ASIC implementation results in 28-nm CMOS for 60 GBd are provided.

  • 21.
    Bae, Cheolyong
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    High-Speed Chromatic Dispersion Compensation Filtering in FPGAs for Coherent Optical Communication. 2020. In: 2020 30th International Conference on Field-Programmable Logic and Applications (FPL), IEEE, 2020, p. 357-358. Conference paper (Refereed)
    Abstract [en]

    Chromatic dispersion is one of the error sources limiting the transmission capacity in coherent optical communication that can be mitigated with digital signal processing. In this paper, the current status and plans of implementation of chromatic dispersion compensation (CDC) filters on FPGAs are discussed. As these high-speed filters are most efficiently implemented in the frequency-domain, different approaches for high-speed FFT-based architectures are considered and preliminary results of fully parallel FFT implementation by utilizing FPGA hardware features are presented.

  • 22.
    Alam, Syed Asad
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering. Namal Inst, Pakistan.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Improved Particle Filter Resampling Architectures. 2020. In: Journal of Signal Processing Systems, ISSN 1939-8018, E-ISSN 1939-8115, Vol. 92, no 6, p. 555-568. Article in journal (Refereed)
    Abstract [en]

    The most challenging aspect of particle filtering hardware implementation is the resampling step. This is because of high latency as it can be only partially executed in parallel with the other steps of particle filtering and has no inherent parallelism inside it. To reduce the latency, an improved resampling architecture is proposed which involves pre-fetching from the weight memory in parallel to the fetching of a value from a random function generator along with architectures for realizing the pre-fetch technique. This enables a particle filter using M particles with otherwise streaming operation to get new inputs more often than 2M cycles as the previously best approach gives. Results show that a pre-fetch buffer of five values achieves the best area-latency reduction trade-off while on average achieving an 85% reduction in latency for the resampling step leading to a sample time reduction of more than 40%. We also propose a generic division-free architecture for the resampling steps. It also removes the need of explicitly ordering the random values for efficient multinomial resampling implementation. In addition, on-the-fly computation of the cumulative sum of weights is proposed which helps reduce the word length of the particle weight memory. FPGA implementation results show that the memory size is reduced by up to 50%.

  • 23.
    Mohammadi Sarband, Narges
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Becirovic, Ema
    Linköping University, Department of Electrical Engineering, Communication Systems. Linköping University, Faculty of Science & Engineering.
    Krysander, Mattias
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Larsson, Erik G.
    Linköping University, Department of Electrical Engineering, Communication Systems. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Pilot-Hopping Sequence Detection Architecture for Grant-Free Random Access using Massive MIMO. 2020. In: 2020 IEEE International Symposium on Circuits and Systems (ISCAS), IEEE, 2020. Conference paper (Refereed)
    Abstract [en]

    In this work, an implementation of a pilot-hopping sequence detector for massive machine type communication is presented. The architecture is based on solving a non-negative least squares problem. The results show that the architecture supporting 1024 users can perform more than one million detections per second with a power consumption of less than 70 mW when implemented in a 28 nm FD-SOI process.

  • 24.
    Mohammadi Sarband, Narges
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Garrido Gálvez, Mario
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Using Transposition to Efficiently Solve Constant Matrix-Vector Multiplication and Sum of Product Problems. 2020. In: Journal of Signal Processing Systems, ISSN 1939-8018, E-ISSN 1939-8115, Vol. 92, no 10, p. 1075-1089. Article in journal (Refereed)
    Abstract [en]

    In this work, we present an approach to exploit the potential benefit of adder graph algorithms by solving the transposed form of the problem and then transposing the solution. The key contribution is a systematic way to obtain the transposed realization with a minimum number of cascaded adders subject to the input realization. In this way, wide and low constant matrix multiplication problems, with sum of products as a special case, which are normally exceptionally time consuming to solve using adder graph algorithms, can be solved by first transposing the matrix and then transposing the solution. Examples show that while the relation between the adder depth of the solution to the transposed problem and the original problem is not straightforward, there are many cases where the reduction in adder cost will more than compensate for the potential increase in adder depth and result in implementations with reduced power consumption compared to using sub-expression sharing algorithms, which can both solve the original problem directly in reasonable time and guarantee a minimum adder depth.

  • 25.
    Kanders, Hans
    et al.
    Linköping University, Department of Electrical Engineering. Linköping University, Faculty of Science & Engineering.
    Mellqvist, Tobias
    Linköping University, Department of Electrical Engineering. Linköping University, Faculty of Science & Engineering.
    Garrido Gálvez, Mario
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Palmkvist, Kent
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    A 1 Million-Point FFT on a Single FPGA. 2019. In: IEEE Transactions on Circuits and Systems Part 1: Regular Papers, ISSN 1549-8328, E-ISSN 1558-0806, Vol. 66, no 10, p. 3863-3873. Article in journal (Refereed)
    Abstract [en]

    In this paper, we present the first implementation of a 1 million-point fast Fourier transform (FFT) completely integrated on a single field-programmable gate array (FPGA), without the need for external memory or multiple interconnected FPGAs. The proposed architecture is a pipelined single-delay feedback (SDF) FFT. The architecture includes a specifically designed 1 million-point rotator with high accuracy and a thorough study of the word length at the different FFT stages in order to increase the signal-to-quantization-noise ratio (SQNR) and keep the area low. This also results in low power consumption.

  • 26.
    Tran, Markus
    et al.
    Linköping University, Department of Electrical Engineering, Communication Systems. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Källström, Petter
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Senel, Kamil
    Linköping University, Department of Electrical Engineering, Communication Systems. Linköping University, Faculty of Science & Engineering.
    Larsson, Erik G
    Linköping University, Department of Electrical Engineering, Communication Systems. Linköping University, Faculty of Science & Engineering.
    An Architecture for Grant-Free Massive MIMO MTC Based on Compressive Sensing. 2019. In: CONFERENCE RECORD OF THE 2019 FIFTY-THIRD ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, IEEE, 2019, p. 901-905. Conference paper (Refereed)
    Abstract [en]

    In this work, a processing architecture for grant-free machine type communication based on compressive sensing is proposed. The architecture can be adapted for a number of parameters. An instantiation for 128 terminals and 96 antennas is implemented. Without memories it consumes 1.52 W and occupies an area of 5.1 mm² in a 28 nm SOI CMOS process. The implemented instance can process about 10k messages per second, each containing four bits.

  • 27.
    Gustafsson, Oscar
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Wanhammar, Lars
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Arithmetic. 2019. In: Handbook of signal processing systems / [ed] Bhattacharyya, S.S., Deprettere, E.F., Leupers, R., Takala, J., Cham: Springer, 2019, 3, p. 381-426. Chapter in book (Other academic)
    Abstract [en]

    In this chapter fundamentals of arithmetic operations and number representations used in DSP systems are discussed. Different relevant number systems are outlined with a focus on fixed-point representations. Structures for accelerating the carry-propagation of addition are discussed, as well as multi-operand addition. For multiplication, different schemes for generating and accumulating partial products are presented. In addition to that, optimization for constant coefficient multiplication is discussed. Division and square-rooting are also briefly outlined. Furthermore, floating-point arithmetic and the IEEE 754 floating-point arithmetic standard are presented. Finally, some methods for computing elementary functions, e.g., trigonometric functions, are presented.

  • 28.
    Sadeghifar, Mohammad Reza
    et al.
    Linköping University, Department of Electrical Engineering, Integrated Circuits and Systems. Linköping University, Faculty of Science & Engineering. Ericsson AB, Sweden.
    Bengtsson, Hakan
    Ericsson AB, Sweden.
    Wikner, Jacob
    Linköping University, Department of Electrical Engineering, Integrated Circuits and Systems. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Direct digital-to-RF converter employing semi-digital FIR voltage-mode RF DAC. 2019. In: Integration, ISSN 0167-9260, E-ISSN 1872-7522, Vol. 66, p. 128-134. Article in journal (Refereed)
    Abstract [en]

    A direct digital-to-RF converter (DRFC) is presented in this work. Due to its digital-in-nature design, the DRFC benefits from technology scaling and can be monolithically integrated into advanced digital VLSI systems. A fourth-order single-bit quantizer bandpass digital ΣΔ modulator is used preceding the DRFC, resulting in a high in-band signal-to-noise ratio (SNR). The out-of-band spectrally-shaped quantization noise is attenuated by an embedded semi-digital FIR filter (SDFIR). The RF output frequencies are synthesized by a novel configurable voltage-mode RF DAC solution with high linearity. The configurable RF DAC directly synthesizes RF signals up to 10 GHz in the first or second Nyquist zone. The proposed DRFC is designed in a 22 nm FDSOI CMOS process and, with the aid of Monte-Carlo simulation, shows 78.6 dBc and 63.2 dBc worst-case third-order intermodulation distortion (IM3) under process mismatch at 2.5 GHz and 7.5 GHz output frequency, respectively.

  • 29.
    Garrido, Mario
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Qureshi, Fahad
    Tampere University of Technology.
    Takala, Jarmo
    Tampere University of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Hardware architectures for the fast Fourier transform. 2019. In: Handbook of signal processing systems / [ed] Bhattacharyya, S.S., Deprettere, E.F., Leupers, R., Takala, J., Cham: Springer, 2019, 3, p. 613-647. Chapter in book (Other academic)
    Abstract [en]

    The fast Fourier transform (FFT) is a widely used algorithm in signal processing applications. FFT hardware architectures are designed to meet the requirements of the most demanding applications in terms of performance, circuit area, and/or power consumption. This chapter summarizes the research on FFT hardware architectures by presenting the FFT algorithms, the building blocks in FFT hardware architectures, the architectures themselves, and the bit reversal algorithm.

  • 30.
    Sadeghifar, Mohammad Reza
    et al.
    Linköping University, Department of Electrical Engineering, Integrated Circuits and Systems. Linköping University, Faculty of Science & Engineering. Ericsson AB, Sweden.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Wikner, Jacob
    Linköping University, Department of Electrical Engineering, Integrated Circuits and Systems. Linköping University, Faculty of Science & Engineering.
    Optimization problem formulation for semi-digital FIR digital-to-analog converter considering coefficients precision and analog metrics. 2019. In: Analog Integrated Circuits and Signal Processing, ISSN 0925-1030, E-ISSN 1573-1979, Vol. 99, no 2, p. 287-298. Article in journal (Refereed)
    Abstract [en]

    Optimization problem formulation for the semi-digital FIR digital-to-analog converter (SDFIR DAC) is investigated in this work. Magnitude and energy metrics with variable coefficient precision are defined for cascaded digital sigma-delta modulators, the semi-digital FIR filter, and the sinc roll-off frequency response of the DAC. A set of analog metrics representing hardware cost is also defined for inclusion in the SDFIR DAC optimization problem formulation. It is shown that the hardware cost of the SDFIR DAC can be significantly reduced by introducing flexible coefficient precision, without the SDFIR DAC being over-designed. Different use cases are selected to demonstrate the optimization problem formulations. Combinations of the magnitude metric, energy metric, coefficient precision, and analog metrics are used in these use cases and solved to find the optimum set of analog FIR taps. A new method that introduces variable coefficient precision into the optimization procedure is proposed to avoid non-convex optimization problems. It is shown that up to 22% of the total number of unit elements of the SDFIR filter can be saved when targeting the analog metric as the optimization objective subject to magnitude constraints in the pass-band and stop-band.

  • 31.
    Garrido, Mario
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Grajal, Jesus
    Univ Politecn Madrid, Spain.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Optimum Circuits for Bit-Dimension Permutations. 2019. In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 27, no 5, p. 1148-1160. Article in journal (Refereed)
    Abstract [en]

    In this paper, we present a systematic approach to design hardware circuits for bit-dimension permutations. The proposed approach is based on decomposing any bit-dimension permutation into elementary bit-exchanges. Such decomposition is proven to achieve the theoretical minimum number of delays required for the permutation. This offers optimum solutions for multiple well-known problems in the literature that make use of bit-dimension permutations. This includes the design of permutation circuits for the fast Fourier transform, bit reversal, matrix transposition, stride permutations, and Viterbi decoders.
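    To make the notion of decomposing a bit-dimension permutation into elementary bit-exchanges concrete, a small sketch follows. It shows only the index mapping for bit reversal, built from pairwise exchanges of index bits; it says nothing about the circuits or the delay optimality discussed in the abstract. The function names `exchange_bits` and `bit_reversal` are hypothetical.

```python
def exchange_bits(index, i, j):
    """Elementary bit-exchange: swap bit positions i and j of index."""
    bit_i = (index >> i) & 1
    bit_j = (index >> j) & 1
    if bit_i != bit_j:
        # The two bits differ, so flipping both swaps them.
        index ^= (1 << i) | (1 << j)
    return index


def bit_reversal(index, n_bits):
    """Bit reversal expressed as a sequence of elementary bit-exchanges."""
    for i in range(n_bits // 2):
        index = exchange_bits(index, i, n_bits - 1 - i)
    return index


if __name__ == "__main__":
    # Output order of an 8-point bit-reversed sequence.
    print([bit_reversal(k, 3) for k in range(8)])
```

    Other permutations listed in the abstract, such as stride permutations, correspond to different choices of which bit positions are exchanged.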

  • 32.
    Bertilsson, Erik
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Larsson, Erik G
    Linköping University, Department of Electrical Engineering, Communication Systems. Linköping University, Faculty of Science & Engineering.
    A Modular Base Station Architecture for Massive MIMO with Antenna and User Scalability per Processing Node. 2018. In: 2018 CONFERENCE RECORD OF 52ND ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS, AND COMPUTERS, IEEE, 2018, p. 1649-1653. Conference paper (Refereed)
    Abstract [en]

    Massive MIMO is a key technology for the upcoming fifth generation cellular networks (5G), promising high spectral efficiency, low power consumption, and the use of cheap hardware to reduce costs. Previous work has shown how to create a distributed processing architecture, where each node in a network performs the computations related to one or more antennas. The required total number of antennas, M, at the base station depends on the number of simultaneously operating terminals, K. In this work, a flexible node architecture is presented, where the number of terminals can be traded for additional antennas at the same node. This means that the same node can be used with a wide range of system configurations. The computational complexity, along with the order in which to compute incoming and outgoing symbols, is explored.

  • 33.
    Jang, Jeong Keun
    et al.
    Dongbu Hitek, South Korea.
    Kim, Ho Keun
    Ajou Univ, South Korea.
    Sunwoo, Myung Hoon
    Ajou Univ, South Korea.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Area-Efficient Scheduling Scheme Based FFT Processor for Various OFDM Systems. 2018. In: 2018 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS 2018), IEEE, 2018, p. 338-341. Conference paper (Refereed)
    Abstract [en]

    This paper presents an area-efficient fast Fourier transform (FFT) processor for orthogonal frequency-division multiplexing systems based on multi-path delay commutator architecture. This paper proposes a data scheduling scheme to reduce the number of complex constant multipliers. The proposed mixed-radix multi-path delay commutator FFT processor can support 128-, 256-, and 512-point FFT sizes. The proposed processor was synthesized using the Samsung 65-nm CMOS standard cell library. The proposed processor with eight parallel data paths can achieve a high throughput rate of up to 2.64 GSample/s at 330 MHz.

  • 34.
    Jang, Jeong Keun
    et al.
    Dongbu Hitek, South Korea.
    Kim, Ho Keun
    Department of Electrical and Computer Engineering, Ajou University, Suwon, Korea.
    Sunwoo, Myung Hoon
    Department of Electrical and Computer Engineering, Ajou University, Suwon, Korea.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Area-efficient scheduling scheme based FFT processor for various OFDM systems. 2018. Conference paper (Other academic)
    Abstract [en]

    This paper presents an area-efficient fast Fourier transform (FFT) processor for orthogonal frequency-division multiplexing systems based on multi-path delay commutator architecture. This paper proposes a data scheduling scheme to reduce the number of complex constant multipliers. The proposed mixed-radix multi-path delay commutator FFT processor can support 128-, 256-, and 512-point FFT sizes. The proposed processor was synthesized using the Samsung 65-nm CMOS standard cell library. The proposed processor with eight parallel data paths can achieve a high throughput rate of up to 2.64 GSample/s at 330 MHz.

  • 35.
    Bae, Cheolyong
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gokhale, Madhur
    Linköping University, Department of Electrical Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Garrido Gálvez, Mario
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Improved Implementation Approaches for 512-tap 60 GSa/s Chromatic Dispersion FIR Filters. 2018. In: 2018 CONFERENCE RECORD OF 52ND ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS, AND COMPUTERS, IEEE, 2018, p. 213-217. Conference paper (Refereed)
    Abstract [en]

    In optical communication the non-ideal properties of the fibers lead to pulse widening from chromatic dispersion. One way to compensate for this is through digital signal processing. In this work, two architectures for compensation are compared. Both are designed for 60 GSa/s and 512 filter taps and implemented in the frequency domain using FFTs. It is shown that the high-speed requirements introduce constraints on possible architectural choices. In this work, it is shown that it is not required to use two overlapping FFTs to obtain continuous filtering. In addition, efficient highly parallel implementation of FFTs is discussed and an improved FFT compared to our earlier work is proposed. The results are compared to using an approach with a shorter FFT and FIR filters.

  • 36.
    Kumm, Martin
    et al.
    University of Kassel, Digital Technology Group, Germany.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    de Dinechin, Florent
    Univ Lyon, INSA Lyon, Inria, CITI, France.
    Kappauf, Johannes
    University of Kassel, Digital Technology Group, Germany.
    Zipf, Peter
    University of Kassel, Digital Technology Group, Germany.
    Karatsuba with Rectangular Multipliers for FPGAs. 2018. In: 2018 IEEE 25TH SYMPOSIUM ON COMPUTER ARITHMETIC (ARITH), IEEE, 2018, p. 13-20. Conference paper (Refereed)
    Abstract [en]

    This work presents an extension of Karatsuba's method to efficiently use rectangular multipliers as a base for larger multipliers. The rectangular multipliers that motivate this work are the embedded 18x25-bit signed multipliers found in the DSP blocks of recent Xilinx FPGAs: The traditional Karatsuba approach must under-use them as square 18x18 ones. This work shows that rectangular multipliers can be efficiently exploited in a modified Karatsuba method if their input word sizes have a large greatest common divisor. In the Xilinx FPGA case, this can be obtained by using the embedded multipliers as 16x24 unsigned and as 17x25 signed ones. The obtained architectures are implemented with due attention to architectural features such as the pre-adders and post-adders available in Xilinx DSP blocks. They are synthesized and compared with traditional Karatsuba, but also with (non-Karatsuba) state-of-the-art tiling techniques that make use of the full rectangular multipliers. The proposed technique improves resource consumption and performance for multipliers of numbers larger than 64 bits.
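    For readers unfamiliar with the baseline, the classical (square-split) Karatsuba step that this work extends can be sketched as below. This is the textbook method only, not the rectangular variant proposed in the paper; the function name `karatsuba_step` and the split width `k` are illustrative assumptions.

```python
def karatsuba_step(a, b, k):
    """One level of classical Karatsuba on non-negative integers.

    Each operand is split at bit k into a low and a high part. The product is
    then formed from three sub-multiplications instead of four.
    """
    mask = (1 << k) - 1
    a_lo, a_hi = a & mask, a >> k
    b_lo, b_hi = b & mask, b >> k

    low = a_lo * b_lo                                   # multiplication 1
    high = a_hi * b_hi                                  # multiplication 2
    mid = (a_lo + a_hi) * (b_lo + b_hi) - low - high    # multiplication 3

    return (high << (2 * k)) + (mid << k) + low


if __name__ == "__main__":
    x, y = 0x1234ABCD, 0x0FEDCBA9
    print(karatsuba_step(x, y, 16) == x * y)
```

    The saved multiplication is paid for with extra additions and subtractions, which is why the pre-adders and post-adders mentioned in the abstract are relevant.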

  • 37.
    Mohammadi Sarband, Narges
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Garrido, Mario
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Obtaining Minimum Depth Sum of Products from Multiple Constant Multiplication. 2018. In: PROCEEDINGS OF THE 2018 IEEE INTERNATIONAL WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS), IEEE, Institute of Electrical and Electronics Engineers (IEEE), 2018, p. 134-139. Conference paper (Refereed)
    Abstract [en]

    In this work, an approach for transposing solutions to the multiple constant multiplication (MCM) problem to obtain a sum of product (SOP) computation with minimum depth is proposed. The reason for doing this is that solving the SOP problem directly is highly computationally intensive when adder graph algorithms are used. Compared to using subexpression sharing algorithms, which have a lower computational complexity, directly for the SOP problem, it is shown that the proposed approach, as expected, results in lower complexity for the SOP. It is also shown that there is no obvious way to construct the MCM solution in such a way that the SOP solution has the minimum theoretical depth. However, the proposed approach guarantees minimum depth subject to the MCM solution given as input.

  • 38.
    Kumm, Martin
    et al.
    Univ Kassel, Germany.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Garrido Gálvez, Mario
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Zipf, Peter
    Univ Kassel, Germany.
    Optimal Single Constant Multiplication Using Ternary Adders. 2018. In: IEEE Transactions on Circuits and Systems - II - Express Briefs, ISSN 1549-7747, E-ISSN 1558-3791, Vol. 65, no 7, p. 928-932. Article in journal (Refereed)
    Abstract [en]

    The single constant coefficient multiplication is a frequently used operation in many numeric algorithms. Extensive previous work is available on how to reduce constant multiplications to additions, subtractions, and bit shifts. However, in previous work, only common two-input adders were used. As modern field-programmable gate arrays (FPGAs) support efficient ternary adders, i.e., adders with three inputs, this brief investigates constant multiplications that are built from ternary adders in an optimal way. The results show that the multiplication with any constant up to 22 bits can be realized by only three ternary adders. Average adder reductions of more than 33% compared to optimal constant multiplication circuits using two-input adders are achieved for coefficient word sizes of more than five bits. Synthesis experiments show FPGA average slice reductions in the order of 25% and a similar or higher speed than their two-input adder counterparts.

  • 39.
    Ingemarsson, Carl
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    SFF—The Single-Stream FPGA-Optimized Feedforward FFT Hardware Architecture. 2018. In: Journal of Signal Processing Systems, ISSN 1939-8018, E-ISSN 1939-8115, Vol. 90, no 11, p. 1583-1592. Article in journal (Refereed)
    Abstract [en]

    In this paper, a fast Fourier transform (FFT) hardware architecture optimized for field-programmable gate-arrays (FPGAs) is proposed. We refer to this as the single-stream FPGA-optimized feedforward (SFF) architecture. By using a stage that trades adders for shift registers as compared with the single-path delay feedback (SDF) architecture the efficient implementation of short shift registers in Xilinx FPGAs can be exploited. Moreover, this stage can be combined with ordinary or optimized SDF stages such that adders are only traded for shift registers when beneficial. The resulting structures are well-suited for FPGA implementation, especially when efficient implementation of short shift registers is available. This holds for at least contemporary Xilinx FPGAs. The results show that the proposed architectures improve on the current state of the art.

  • 40.
    Gustafsson, Oscar
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Bertilsson, Erik
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Klasson, Johannes
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Ingemarsson, Carl
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Approximate Neumann Series or Exact Matrix Inversion for Massive MIMO? (Invited Paper). 2017. In: Proceedings 2017 IEEE 24th Symposium on Computer Arithmetic (ARITH), London, UK, 24-26 July 2017 / [ed] Neil Burgess, Javier Bruguera, and Florent de Dinechin, Institute of Electrical and Electronics Engineers (IEEE), 2017, p. 62-63. Conference paper (Refereed)
    Abstract [en]

    Approximate matrix inversion based on Neumann series has seen a recent increased interest motivated by massive MIMO systems. There, the matrices are in many cases diagonally dominant, and, hence, a reasonable approximation can be obtained within a few iterations of a Neumann series. In this work, we clarify that the complexity of exact methods is about the same as when three terms are used for the Neumann series, so in this case, the complexity is not lower as often claimed. The second common argument for Neumann series approximation, higher parallelism, is indeed correct. However, in most current practical use cases, such a high degree of parallelism is not required to obtain a low latency realization. Hence, we conclude that a careful evaluation, based on accuracy and latency requirements, must be performed and that exact matrix inversion is in fact viable in many more cases than the current literature claims.
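    As background for the abstract above, the standard Neumann series approximation with the diagonal of the matrix as preconditioner can be sketched as follows: with D = diag(A) and E = I - D⁻¹A, the inverse is approximated by the truncated sum of Eᵏ D⁻¹. This is a generic textbook formulation in plain Python, not the paper's implementation or complexity analysis; the function name `neumann_inverse` is hypothetical.

```python
def neumann_inverse(A, terms):
    """Approximate the inverse of a square matrix A (list of lists) by the
    first `terms` terms of the Neumann series, using D = diag(A)."""
    n = len(A)
    d_inv = [1.0 / A[i][i] for i in range(n)]

    # E = I - D^{-1} A; the series converges when E's spectral radius is < 1,
    # which holds for strictly diagonally dominant A.
    E = [[(1.0 if i == j else 0.0) - d_inv[i] * A[i][j] for j in range(n)]
         for i in range(n)]

    # term holds E^k D^{-1}, starting at k = 0.
    term = [[d_inv[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    approx = [[0.0] * n for _ in range(n)]

    for _ in range(terms):
        for i in range(n):
            for j in range(n):
                approx[i][j] += term[i][j]
        # Next term: E times the current term.
        term = [[sum(E[i][m] * term[m][j] for m in range(n)) for j in range(n)]
                for i in range(n)]
    return approx


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    for row in neumann_inverse(A, 3):
        print(row)
```

    Every additional term costs one more matrix-matrix product in this form, which gives some intuition for why the operation count grows quickly with the number of terms.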

  • 41.
    Gustafsson, Oscar
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Wanhammar, Lars
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Basic Arithmetic Circuits. 2017. In: Arithmetic Circuits for DSP Applications / [ed] Pramod Kumar Meher, Thanos Stouraitis, John Wiley & Sons, 2017, p. 1-32. Chapter in book (Other academic)
    Abstract [en]

    General‐purpose DSP processors, application‐specific processors, and algorithm‐specific processors are used to implement different types of DSP systems or subsystems. They are typically used in applications involving complex and irregular algorithms while application‐specific processors provide lower unit cost and higher performance for a specific application, particularly when the volume of production is high. Most DSP applications use fractional arithmetic instead of integer arithmetic. Multimedia and communication applications involve real‐time audio and video/image processing which very often require sum‐of‐products (SOP) computation. The need to compute non‐linear functions arises in many different applications. The straightforward method of approximating an elementary function, which is to just store the values in a look‐up table, typically leads to large tables, even though the resulting area from standard cell synthesis grows slower than the number of memory bits. It is of interest to find ways to approximate elementary functions using a trade‐off between arithmetic operations and look‐up tables.

  • 42.
    Bertilsson, Erik
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Larsson, Erik G.
    Linköping University, Department of Electrical Engineering, Communication Systems. Linköping University, Faculty of Science & Engineering.
    Computation Limited Matrix Inversion Using Neumann Series Expansion for Massive MIMO. 2017. In: 2017 FIFTY-FIRST ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS, AND COMPUTERS, 2017, p. 466-469. Conference paper (Refereed)
    Abstract [en]

    Neumann series expansion is a method for performing matrix inversion that has received a lot of interest in the context of massive MIMO systems. However, the computational complexity of the Neumann methods is higher than for the lowest complexity exact matrix inversion algorithms, such as LDL, when the number of terms in the series is three or more. In this paper, the Neumann series expansion is analyzed from a computational perspective for cases when the complexity of performing exact matrix inversion is too high. By partially computing the third term of the Neumann series, the computational complexity can be reduced. Three different preconditioning matrices are considered. Simulation results show that when limiting the total number of operations performed, the BER performance of the three different preconditioning matrices is the same.

  • 43.
    Ingemarsson, Carl
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Källström, Petter
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Qureshi, Fahad
    Linköping University, Department of Electrical Engineering. Linköping University, Faculty of Science & Engineering. Tampere University of Technology, Finland.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Efficient FPGA Mapping of Pipeline SDF FFT Cores. 2017. In: IEEE Transactions on Very Large Scale Integration (vlsi) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 25, no 9, p. 2486-2497. Article in journal (Refereed)
    Abstract [en]

    In this paper, an efficient mapping of the pipeline single-path delay feedback (SDF) fast Fourier transform (FFT) architecture to field-programmable gate arrays (FPGAs) is proposed. By considering the architectural features of the target FPGA, significantly better implementation results are obtained. This is illustrated by mapping an R22SDF 1024-point FFT core toward both Xilinx Virtex-4 and Virtex-6 devices. The optimized FPGA mapping is explored in detail. Algorithmic transformations that allow a better mapping are proposed, resulting in implementation results that by far outperform earlier published work. For Virtex-4, the results show a 350% increase in throughput per slice and a 25% reduction in block RAM (BRAM) use, with the same amount of DSP48 resources, compared with the best earlier published result. The resulting Virtex-6 design sees even larger increases in throughput per slice compared with the Xilinx FFT IP core, using half as many DSP48E1 blocks and fewer BRAM resources. The results clearly show that the FPGA mapping is crucial, not only the architecture and algorithm choices.

  • 44.
    Kovalev, Anton
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Garrido, Mario
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Implementation approaches for 512-tap 60 GSa/s chromatic dispersion FIR filters. 2017. In: Conference Record of The Fifty-First Asilomar Conference on Signals, Systems & Computers / [ed] Michael B. Matthews, Institute of Electrical and Electronics Engineers (IEEE), 2017, p. 1779-1783. Conference paper (Refereed)
    Abstract [en]

    In optical communication the non-ideal properties of the fibers lead to pulse widening from chromatic dispersion. One way to compensate for this is through digital signal processing. In this work, two architectures for compensation are compared. Both are designed for 60 GSa/s and 512 filter taps and implemented in the frequency domain using FFTs. It is shown that the high-speed requirements introduce constraints on possible architectural choices. Furthermore, the theoretical multiplication complexity estimates are not good predictors for the energy consumption. The results show that the implementation with 10% more multiplications per sample has half the power consumption and one third of the area consumption. The best architecture for this specification results in a power consumption of 3.12 W in a 65 nm technology, corresponding to an energy per complex filter tap of 0.10 mW/GHz.
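
    As a rough illustration of FIR filtering "in the frequency domain using FFTs", the sketch below uses the textbook overlap-save method. It is an assumption chosen for illustration, not either of the two architectures compared in the paper; the names fft, overlap_save and fft_size are hypothetical, and the FFT size must be a power of two no smaller than the number of taps.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def overlap_save(signal, taps, fft_size):
    """Filter `signal` with FIR `taps` block by block in the frequency domain."""
    m = len(taps)
    if fft_size < m or fft_size & (fft_size - 1):
        raise ValueError("fft_size must be a power of two and >= len(taps)")
    step = fft_size - m + 1                      # new output samples per block
    h = fft(list(taps) + [0] * (fft_size - m))   # filter spectrum, computed once
    padded = [0] * (m - 1) + list(signal)        # history for the first block
    out = []
    for start in range(0, len(signal), step):
        block = padded[start:start + fft_size]
        block = block + [0] * (fft_size - len(block))
        spectrum = fft(block)
        y = fft([a * b for a, b in zip(spectrum, h)], inverse=True)
        # The first m-1 samples are corrupted by circular wrap-around; drop them.
        out.extend(v / fft_size for v in y[m - 1:])
    return out[:len(signal)]
```

    Each block costs two FFTs and fft_size complex multiplications for fft_size - m + 1 output samples, which is the kind of multiplication-count estimate the abstract reports to be a poor predictor of energy consumption.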

  • 45.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    On Lifting-Based Fixed-Point Complex Multiplications and Rotations. 2017. In: Proceedings 24th IEEE Symposium on Computer Arithmetic 24–26 July 2017 London, United Kingdom / [ed] Neil Burgess, Javier Bruguera and Florent de Dinechin, Institute of Electrical and Electronics Engineers (IEEE), 2017, p. 43-49. Conference paper (Refereed)
    Abstract [en]

    Lifting-based complex multiplications and rotations are integer invertible, i.e., an integer input value is mapped to the same integer output value when rotating forward and backward. This is an important aspect for lossless transform-based source coding, but since the structure only requires three real-valued multiplications and three real-valued additions, it is also a potentially attractive way to perform complex multiplications when the coefficient has unity magnitude. In this work, we consider two aspects of these structures. First, we show that both the magnitude and angular errors depend on the angle of the input value and derive both exact and approximated expressions for these. Second, we discuss how to design such structures without the typical separation into three subsequent matrix multiplications. It is shown that the proposed design method allows many more values which are integer invertible, but cannot be separated into three subsequent matrix multiplications with fixed-point values. The results show good correspondence between the error approximations and the actual error as well as a significantly increased design space.
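
    The integer-invertibility property can be illustrated with the classical three-step lifting factorization of a rotation, using coefficients a = (cos θ - 1)/sin θ and b = sin θ. This is a minimal sketch of the conventional separated structure, not the design method proposed in the paper; FRAC_BITS and all function names are assumptions, and sin θ must be nonzero.

```python
import math

FRAC_BITS = 12  # assumed number of fractional bits for the coefficients

def quantize(value, frac_bits=FRAC_BITS):
    """Round a real coefficient to a fixed-point integer."""
    return round(value * (1 << frac_bits))

def lift(coeff_q, v, frac_bits=FRAC_BITS):
    """Rounded product of a fixed-point coefficient and an integer."""
    return (coeff_q * v + (1 << (frac_bits - 1))) >> frac_bits

def lifting_coefficients(theta):
    """Quantized coefficients (a, b) of the three lifting steps."""
    a = quantize((math.cos(theta) - 1.0) / math.sin(theta))
    b = quantize(math.sin(theta))
    return a, b

def rotate_forward(x, y, a, b):
    """Three multiplications and three additions on integers."""
    x += lift(a, y)
    y += lift(b, x)
    x += lift(a, y)
    return x, y

def rotate_backward(x, y, a, b):
    """Undo the lifting steps in reverse order; exact for integers."""
    x -= lift(a, y)
    y -= lift(b, x)
    x -= lift(a, y)
    return x, y
```

    Because each step adds a rounded value that depends only on the other, unchanged component, the backward pass subtracts exactly the same value, so the round trip is lossless whatever rounding is used.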

  • 46.
    Meher, Pramod Kumar
    et al.
    Independent Hardware Consultant.
    Chang, Chip-Hong
    Nanyang Technological University, Singapore, Singapore.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Vinod, A.P.
    Nanyang Technological University, Singapore, Singapore.
    Faust, Mattias
    mfnet gmbh, Switzerland.
    Shift‐Add Circuits for Constant Multiplications. 2017. In: Arithmetic Circuits for DSP Applications / [ed] Pramod Kumar Meher, Thanos Stouraitis, John Wiley & Sons, 2017, p. 33-76. Chapter in book (Other academic)
    Abstract [en]

    The optimization of shift‐and‐add networks for constant multiplications has been found to have great potential for reducing the area, delay, and power consumption of multiplications in several computation‐intensive applications, not only in dedicated hardware but also in programmable computing systems. To simplify the shift‐and‐add network in single constant multiplication (SCM) circuits, this chapter discusses three design approaches: direct simplification from a given number representation, simplification by redundant signed digit (SD) representation, and simplification by adder graph. Examples of the multiple constant multiplication (MCM) methods are constant matrix multiplication, discrete cosine transform (DCT) or fast Fourier transform (FFT), and polyphase finite impulse response (FIR) filters and filter banks. The given constant multiplication methods can be used for matrix multiplications and inner products, and can be applied easily to image/video processing and graphics applications. The chapter further discusses some of the shortcomings in the current research on constant multiplications, and possible scopes of improvement.
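
    To make the signed-digit approach concrete, the sketch below recodes a constant into canonical signed digit (CSD) form and multiplies using only shifts, additions and subtractions. CSD is one common signed-digit representation chosen here as an assumption; the chapter's own algorithms are not reproduced, and the function names are hypothetical.

```python
def csd_digits(c):
    """CSD digits of a non-negative integer, least significant first.

    Each digit is -1, 0 or +1 and no two adjacent digits are nonzero.
    """
    digits = []
    while c != 0:
        if c & 1:
            d = 2 - (c & 3)  # +1 when c mod 4 == 1, -1 when c mod 4 == 3
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits

def shift_add_multiply(x, c):
    """Compute x * c with one shift and one add/subtract per nonzero digit."""
    acc = 0
    for shift, d in enumerate(csd_digits(c)):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc
```

    For example, the constant 7 is recoded as 8 - 1, so two nonzero digits replace the three ones of its binary form.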

  • 47.
    Bertilsson, Erik
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Larsson, Erik G
    Linköping University, Department of Electrical Engineering, Communication Systems. Linköping University, Faculty of Science & Engineering.
    A Scalable Architecture for Massive MIMO Base Stations Using Distributed Processing. 2016. In: 2016 50TH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, Washington: IEEE COMPUTER SOC, 2016, p. 864-868. Conference paper (Refereed)
    Abstract [en]

    Massive MIMO systems have received considerable attention in recent years as an enabler in future wireless communication systems. As the idea is based on having a large number of antennas at the base station, it is important to have both a scalable and distributed realization of such a system to ease deployment. Most work so far has focused on the theoretical aspects, although a few demonstrators have been reported. In this work, we propose a base station architecture based on connecting the processing nodes in a K-ary tree, allowing simple scalability. Furthermore, it is shown that most of the processing can be performed locally in each node. Further analysis of the node processing shows that it should be enough that each node contains one or two complex multipliers and a few complex adders/subtracters operating at some hundred MHz. It is also shown that a communication link of some Gbps is required between the nodes, and, hence, it is fully feasible to have one or a few links between the nodes to cope with the communication requirements.

  • 48.
    Garrido Gálvez, Mario
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Källström, Petter
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Kumm, Martin
    University of Kassel, Germany.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    CORDIC II: A New Improved CORDIC Algorithm. 2016. In: IEEE Transactions on Circuits and Systems - II - Express Briefs, ISSN 1549-7747, E-ISSN 1558-3791, Vol. 63, no 2, p. 186-190. Article in journal (Refereed)
    Abstract [en]

    In this brief, we present the CORDIC II algorithm. Like previous CORDIC algorithms, the CORDIC II calculates rotations by breaking down the rotation angle into a series of microrotations. However, the CORDIC II algorithm uses a novel angle set, different from the angles used in previous CORDIC algorithms. The new angle set provides a faster convergence that reduces the number of adders with respect to previous approaches.
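
    For context, the sketch below shows how a conventional CORDIC breaks a rotation into microrotations using the classical angle set atan(2^-i). It does not implement the new CORDIC II angle set, which the abstract does not specify; the function name and the floating-point arithmetic are assumptions made for readability, whereas hardware would use shifts and adds on fixed-point values.

```python
import math

def cordic_rotate(x, y, angle, iterations=16):
    """Rotate (x, y) by `angle` radians with conventional CORDIC.

    Converges for |angle| up to about 1.74 rad, the sum of all microrotation angles.
    """
    z = angle      # remaining angle still to be rotated
    gain = 1.0     # accumulated scaling of the microrotations
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0      # rotate toward zero residual angle
        shift = 2.0 ** -i                # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)
        gain *= math.sqrt(1.0 + shift * shift)
    return x / gain, y / gain            # compensate the CORDIC gain
```

    Each iteration here costs two additions, which is why an angle set that converges in fewer stages translates into fewer adders.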

  • 49.
    Källström, Petter
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Fast and Area Efficient Adder for Wide Data in Recent Xilinx FPGAs. 2016. In: 26th International Conference on Field-Programmable Logic and Applications, Lausanne: IEEE, 2016, p. 338-341. Conference paper (Refereed)
    Abstract [en]

    Most modern FPGAs have highly optimised carry logic for efficient implementations of ripple carry adders (RCA). Some FPGAs also have a six-input look-up table (LUT) per cell, whereof two inputs are used during normal addition. In this paper we present an architecture that compresses the carry chain length to N/2 in recent Xilinx FPGAs, by utilising the LUTs better. This carry compression was implemented by letting some cells calculate the carry chain in two bits per cell, while some others calculate the summary output bits. In total the proposed design uses no more hardware than the normal adder. The results show that the proposed adder is faster than a normal adder for word lengths larger than 64 bits in Virtex-6 FPGAs.

  • 50.
    Ingemarsson, Carl
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Hardware Architecture for Positive Definite Matrix Inversion Based on LDL Decomposition and Back-Substitution. 2016. In: 2016 50TH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, IEEE COMPUTER SOC, 2016, p. 859-863. Conference paper (Refereed)
    Abstract [en]

    In this paper we propose an efficient hardware architecture for inversion of positive definite matrices. The algorithm chosen is LDL decomposition followed directly by equation system solving using back substitution. The architecture combines a high throughput with an efficient utilization of its hardware units. We also report FPGA implementation results that show that the architecture is well tailored for implementation in real-time applications.
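
    The algorithm named in the abstract can be sketched in software as follows: factor A = L D L^T, then solve for each column of the identity matrix by forward substitution, diagonal scaling and back substitution. This is a plain reference sketch under the assumption of a symmetric positive definite input; it says nothing about how the paper schedules the operations in hardware, and the function names are hypothetical.

```python
def ldl_decompose(a):
    """Return (L, d) with A = L * diag(d) * L^T and L unit lower triangular."""
    n = len(a)
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = a[j][j] - sum(l[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            s = sum(l[i][k] * l[j][k] * d[k] for k in range(j))
            l[i][j] = (a[i][j] - s) / d[j]
    return l, d

def ldl_inverse(a):
    """Invert a symmetric positive definite matrix given as a list of rows."""
    n = len(a)
    l, d = ldl_decompose(a)
    inv = [[0.0] * n for _ in range(n)]
    for col in range(n):
        # Forward substitution: solve L z = e_col.
        z = [0.0] * n
        for i in range(n):
            rhs = 1.0 if i == col else 0.0
            z[i] = rhs - sum(l[i][k] * z[k] for k in range(i))
        # Diagonal scaling: w = D^-1 z.
        w = [z[i] / d[i] for i in range(n)]
        # Back substitution: solve L^T x = w.
        x = [0.0] * n
        for i in reversed(range(n)):
            x[i] = w[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))
        for i in range(n):
            inv[i][col] = x[i]
    return inv
```

    Unlike Cholesky factorization, the LDL form needs no square roots, only one division per diagonal element.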
