liu.se: Search publications in DiVA
  • 1.
    Khan, Mohd Tasleem
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    ASIC Implementation Trade-Offs for High-Speed LMS and Block LMS Adaptive Filters. 2022. In: 65th International Midwest Symposium on Circuits and Systems (MWSCAS), Fukuoka, Japan: Institute of Electrical and Electronics Engineers (IEEE), 2022. Conference paper (Other academic)
    Abstract [en]

    In this work, implementation trade-offs for ASIC implementation of least-mean-square (LMS) and block LMS (BLMS) adaptive filters are presented. We explore the design trade-offs by increasing the block size and/or relying on the synthesis tool for increased sample rate. For area, a lower block size is advantageous as long as the synthesis tool can meet timing. The energy optimum is, however, found at a different point in the design space. Simulation confirms that larger block sizes lead to lower MSE for an identical step size. Hence, the design point should be decided based on weighted requirements for area, energy, and MSE.
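As a rough illustration of the two update rules named in this abstract, the sketch below shows textbook LMS and block LMS in plain Python. It is not the authors' hardware design: the function names, the demo system `h_true`, the step size, and the choice not to normalise the block gradient by the block size are all assumptions made for the example, and it does not attempt to reproduce the MSE comparison reported in the paper.

```python
import random


def regressor(x, n, num_taps):
    """Return the num_taps most recent inputs at time n, newest first."""
    return [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]


def lms(x, d, num_taps, mu):
    """Sample-by-sample LMS: the weights are updated after every sample."""
    w = [0.0] * num_taps
    for n in range(len(x)):
        u = regressor(x, n, num_taps)
        e = d[n] - sum(wk * uk for wk, uk in zip(w, u))
        w = [wk + mu * e * uk for wk, uk in zip(w, u)]
    return w


def block_lms(x, d, num_taps, mu, block_size):
    """Block LMS: all errors in a block use the same weights; one update per block."""
    w = [0.0] * num_taps
    for start in range(0, len(x), block_size):
        grad = [0.0] * num_taps
        for n in range(start, min(start + block_size, len(x))):
            u = regressor(x, n, num_taps)
            e = d[n] - sum(wk * uk for wk, uk in zip(w, u))
            grad = [gk + e * uk for gk, uk in zip(grad, u)]
        # Assumption: the accumulated gradient is not divided by block_size.
        w = [wk + mu * gk for wk, gk in zip(w, grad)]
    return w


# Hypothetical system-identification demo: recover a 3-tap FIR response.
rng = random.Random(0)
h_true = [0.5, -0.3, 0.2]
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
d = [sum(hk * uk for hk, uk in zip(h_true, regressor(x, n, 3)))
     for n in range(len(x))]

w_lms = lms(x, d, 3, mu=0.05)
w_blms = block_lms(x, d, 3, mu=0.05, block_size=4)
```

In the block variant the weights stay fixed within a block, so the error computations of one block do not depend on each other, which is the property a parallel hardware implementation would typically rely on.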

  • 2.
    Gustafsson, Oscar
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Bae, Cheolyong
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Shift-and-Add Realization Trade-Offs for Chromatic Dispersion Compensation FIR Filters. 2022. In: Optica Advanced Photonics Congress 2022, Optical Society of America, 2022, article ID SpTh1I.3. Conference paper (Refereed)
    Abstract [en]

    Approaches to shift-and-add realization of time-domain chromatic dispersion compensation FIR filters are considered. The coefficient word length has a larger impact than the filter length on both BER penalty and adder complexity.
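To make the term "shift-and-add realization" concrete, the following sketch multiplies by a constant using only shifts, additions, and subtractions, based on the canonical signed-digit (CSD) form of the coefficient. This is one simple baseline technique chosen for illustration; the abstract does not say which approaches the paper compares, and the helper names `to_csd`, `shift_add_multiply`, and `adder_count` are hypothetical.

```python
def to_csd(value):
    """Return the canonical signed-digit form of a non-negative integer.

    Digits are listed least significant first, each in {-1, 0, 1}, and no two
    adjacent digits are both nonzero.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    digits = []
    while value != 0:
        if value & 1:
            # value mod 4 == 1 gives +1, value mod 4 == 3 gives -1.
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def shift_add_multiply(x, coeff):
    """Multiply integer x by the constant coeff using shifts, adds, and subtracts."""
    acc = 0
    for shift, digit in enumerate(to_csd(coeff)):
        if digit == 1:
            acc += x << shift
        elif digit == -1:
            acc -= x << shift
    return acc


def adder_count(coeff):
    """Adders/subtractors needed for one CSD multiplier (nonzero digits minus one)."""
    nonzero = sum(1 for d in to_csd(coeff) if d != 0)
    return max(nonzero - 1, 0)
```

For example, 7 becomes 8 - 1, so `shift_add_multiply(x, 7)` uses a single subtraction. A longer coefficient word length generally means more nonzero digits per coefficient, which is one way to see why word length drives adder complexity.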

  • 3.
    Skarman, Frans
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Spade: An HDL Inspired by Modern Software Languages. 2022. In: 2022 32nd International Conference on Field-Programmable Logic and Applications (FPL), Institute of Electrical and Electronics Engineers (IEEE), 2022, pp. 454-455. Conference paper (Refereed)
    Abstract [en]

    Spade is a new hardware description language which aims to make hardware description easier and less error-prone. It does this by taking lessons from software programming languages and adding language-level support for common hardware constructs, all without compromising the low-level control over what hardware gets generated.

  • 4.
    Gustafsson, Oscar
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Hellman, Noah
    Linköpings universitet, Institutionen för systemteknik. Linköpings universitet, Tekniska fakulteten.
    Approximate Floating-Point Operations with Integer Units by Processing in the Logarithmic Domain. 2021. In: 2021 IEEE 28th Symposium on Computer Arithmetic (ARITH), Institute of Electrical and Electronics Engineers (IEEE), 2021, pp. 45-52. Conference paper (Refereed)
    Abstract [en]

    Floating-point numbers represented using a hidden one can readily be approximately converted to the logarithmic domain using Mitchell's approximation. Once in the logarithmic domain, several arithmetic operations, including multiplication, division, and square root, can be easily computed using the integer arithmetic unit. This has earlier been used in fast reciprocal square-root algorithms, sometimes referred to as magic number algorithms. The proposed approximate operations are realized by performing an integer operation using an integer unit on floating-point data and adding an integer constant to obtain the approximate floating-point result. In this work, we derive easy-to-use equations and constants for multiple floating-point formats and operations.
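The idea in this abstract can be sketched for IEEE 754 single precision: treating the bit pattern of a positive normal float as an integer gives roughly a scaled and offset base-2 logarithm, so multiplication becomes an integer addition plus a constant. The constant used below is simply the bit pattern of 1.0 (the exponent bias shifted into place); the paper derives its own constants, which are not given in the abstract, so the values and function names here are assumptions for illustration only. The sketch handles positive normal numbers and ignores overflow, zeros, and special values.

```python
import struct

# Bit pattern of 1.0 in binary32 (0x3F800000): exponent bias 127 in the exponent field.
BIAS_BITS = 127 << 23


def float_to_bits(x):
    """Reinterpret a value as the 32-bit pattern of its binary32 encoding."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b):
    """Reinterpret a 32-bit integer pattern as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]


def approx_mul(a, b):
    """Approximate a * b with one integer addition and one constant."""
    return bits_to_float(float_to_bits(a) + float_to_bits(b) - BIAS_BITS)


def approx_div(a, b):
    """Approximate a / b with one integer subtraction and one constant."""
    return bits_to_float(float_to_bits(a) - float_to_bits(b) + BIAS_BITS)


def approx_sqrt(a):
    """Approximate sqrt(a) with one shift and one constant."""
    return bits_to_float((float_to_bits(a) >> 1) + (BIAS_BITS >> 1))
```

With these unadjusted constants the results are exact for powers of two, for example `approx_mul(2.0, 4.0)` gives 8.0, and otherwise carry a relative error of up to roughly 11% for multiplication. Adjusting the constant can reduce the error, which is the kind of tuning the magic number algorithms mentioned above rely on.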

  • 5.
    Bae, Cheolyong
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Finite Word Length Analysis for FFT-Based Chromatic Dispersion Compensation Filters. 2021. In: Signal Processing in Photonic Communications 2021, OPTICA, 2021. Conference paper (Refereed)
    Abstract [en]

    Finite word length effects for frequency-domain implementation of chromatic dispersion compensation are analyzed. The results show a significant difference between the different factors when it comes to power consumption and receiver penalty.

  • 6.
    Bertilsson, Erik
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Ingemarsson, Carl
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Low-Latency Parallel Hermitian Positive-Definite Matrix Inversion for Massive MIMO. 2021. In: 2021 IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS 2021), Institute of Electrical and Electronics Engineers (IEEE), 2021, pp. 23-28. Conference paper (Refereed)
    Abstract [en]

    In this work, the effect of latency for three different positive definite matrix inversion algorithms when implemented on parallel and pipelined processing elements is considered. The work is motivated by the fact that in a massive MIMO system, matrix inversion needs to be performed between estimating the channels and producing the transmitted downlink signal, which means that the latency of the matrix inversion has a significant impact on the system performance. It is shown that, despite the algorithms having different complexity, each of the three algorithms can have the lowest latency for some number of processing elements and pipeline levels. In particular, in systems with many processing elements, the algorithm with the highest complexity has the lowest latency.

  • 7.
    Mohammadi Sarband, Narges
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Becirovic, Ema
    Linköpings universitet, Institutionen för systemteknik, Kommunikationssystem. Linköpings universitet, Tekniska fakulteten.
    Krysander, Mattias
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Larsson, Erik G.
    Linköpings universitet, Institutionen för systemteknik, Kommunikationssystem. Linköpings universitet, Tekniska fakulteten.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Massive Machine-Type Communication Pilot-Hopping Sequence Detection Architectures Based on Non-Negative Least Squares for Grant-Free Random Access. 2021. In: IEEE Open Journal of Circuits and Systems, ISSN 2644-1225, Vol. 2, pp. 253-264. Journal article (Refereed)
    Abstract [en]

    User activity detection in grant-free random access massive machine type communication (mMTC) using pilot-hopping sequences can be formulated as solving a non-negative least squares (NNLS) problem. In this work, two architectures using different algorithms to solve the NNLS problem are proposed. The algorithms are implemented using a fully parallel approach and fixed-point arithmetic, leading to high detection rates and low power consumption. The first algorithm, fast projected gradients, converges faster to the optimal value. The second algorithm, multiplicative updates, is partially implemented in the logarithmic domain, and provides a smaller chip area and lower power consumption. For a detection rate of about one million detections per second, the chip area for the fast algorithm is about 0.7 mm² compared to about 0.5 mm² for the multiplicative algorithm when implemented in a 28 nm FD-SOI standard cell process at 1 V power supply voltage. The energy consumption is about 300 nJ/detection for the fast projected gradient algorithm using 256 iterations, leading to a convergence close to the theoretical. With 128 iterations, about 250 nJ/detection is required, with a detection performance on par with 192 iterations of the multiplicative algorithm for which about 100 nJ/detection is required.
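For readers unfamiliar with the NNLS problem mentioned here, the sketch below solves min ||Ax - b||² subject to x ≥ 0 with plain projected gradient descent in floating point. It is a simplification for illustration: it is not the accelerated "fast projected gradients" variant, not the multiplicative-update algorithm, and not the fixed-point architecture from the paper. The function name, the step-size rule, and the small test matrix are assumptions.

```python
def nnls_projected_gradient(a, b, iterations=500):
    """Minimise ||a x - b||^2 subject to x >= 0 using projected gradient steps."""
    rows, cols = len(a), len(a[0])
    # Gram matrix G = A^T A and vector c = A^T b.
    g = [[sum(a[r][i] * a[r][j] for r in range(rows)) for j in range(cols)]
         for i in range(cols)]
    c = [sum(a[r][i] * b[r] for r in range(rows)) for i in range(cols)]
    # trace(G) upper-bounds the largest eigenvalue of G, so 1/trace is a safe step.
    step = 1.0 / sum(g[i][i] for i in range(cols))
    x = [0.0] * cols
    for _ in range(iterations):
        grad = [sum(g[i][j] * x[j] for j in range(cols)) - c[i]
                for i in range(cols)]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, xi - step * gi) for xi, gi in zip(x, grad)]
    return x


# Small hypothetical example: the unconstrained least-squares solution is
# [1, -1], so the constraint is active and the NNLS solution is [0.5, 0].
x_hat = nnls_projected_gradient(
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, -1.0, 0.0]
)
```

Each iteration needs only multiply-accumulate operations and a clamp at zero, which suggests why this family of algorithms maps well to a fully parallel fixed-point datapath.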

  • 8.
    Bae, Cheolyong
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Overlap-Save Commutators for High-Speed Streaming Data Filtering. 2021. In: 2021 IEEE International Symposium on Circuits and Systems (ISCAS), IEEE, 2021. Conference paper (Other academic)
    Abstract [en]

    Overlap-save and overlap-add methods enable efficient implementation of FIR filters. In this paper, a compact method for handling the overlap and shuffle of samples for real-time processing using pipelined FFT architectures is presented. It is suitable for cases when the sample rate is equal to or higher than the clock frequency.
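As background for the overlap-save method named in this abstract, the sketch below filters a sequence block by block in the frequency domain and discards the samples corrupted by circular wrap-around. It only illustrates the algorithm, not the commutator hardware proposed in the paper. A naive O(N²) DFT is used to keep the example dependency-free; the function names and the block size are assumptions.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                for t in range(n))
        out.append(s / n if inverse else s)
    return out


def overlap_save(x, h, n_fft):
    """FIR-filter x with h using overlap-save and blocks of n_fft samples."""
    m = len(h)
    step = n_fft - m + 1  # new output samples produced per block
    if step <= 0:
        raise ValueError("n_fft must be at least len(h)")
    h_freq = dft(list(h) + [0.0] * (n_fft - m))
    # Prepend m - 1 zeros so the first block has a valid input history.
    padded = [0.0] * (m - 1) + list(x)
    y = []
    for start in range(0, len(x), step):
        block = padded[start:start + n_fft]
        block += [0.0] * (n_fft - len(block))
        block_freq = dft(block)
        circ = dft([p * q for p, q in zip(block_freq, h_freq)], inverse=True)
        # The first m - 1 outputs are corrupted by circular wrap-around.
        y.extend(v.real for v in circ[m - 1:])
    return y[:len(x)]


def direct_fir(x, h):
    """Reference time-domain FIR filter producing len(x) output samples."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]
```

Consecutive blocks overlap by `len(h) - 1` input samples, which is the overlap and reordering of samples that a streaming implementation has to manage.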

  • 9.
    Skarman, Frans
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Jung, Daniel
    Linköpings universitet, Institutionen för systemteknik, Fordonssystem. Linköpings universitet, Tekniska fakulteten.
    Krysander, Mattias
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    A Tool to Enable FPGA-Accelerated Dynamic Programming for Energy Management of Hybrid Electric Vehicles. 2020. In: IFAC PAPERSONLINE, ELSEVIER, 2020, Vol. 53, no. 2, pp. 15104-15109. Conference paper (Refereed)
    Abstract [en]

    When optimising the vehicle trajectory and powertrain energy management of hybrid electric vehicles, it is important to include look-ahead information such as road conditions and other traffic. One method for doing so is dynamic programming, but the execution time of such an algorithm on a general purpose CPU is too long for it to be usable in real time. Significant improvements in execution time can be achieved by utilising parallel computations, for example, using a Field-Programmable Gate Array (FPGA). A tool for automatically converting a vehicle model written in C++ into code that can be executed on an FPGA, and which can be used for dynamic programming-based control, is presented in this paper. A vehicle model with a mild-hybrid powertrain is used as a case study to evaluate the developed tool and the output quality and execution time of the resulting hardware. Copyright (C) 2020 The Authors.

  • 10.
    Skarman, Frans
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Jung, Daniel
    Linköpings universitet, Institutionen för systemteknik, Fordonssystem. Linköpings universitet, Tekniska fakulteten.
    Krysander, Mattias
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Acceleration of Simulation Models Through Automatic Conversion to FPGA Hardware. 2020. In: 2020 30th International Conference on Field-Programmable Logic and Applications (FPL), IEEE, 2020, pp. 359-360. Conference paper (Refereed)
    Abstract [en]

    By running simulation models on FPGAs, their execution speed can be significantly improved, at the cost of increased development effort. This paper describes a project to develop a tool which converts simulation models written in high level languages into fast FPGA hardware. The tool currently converts code written using custom C++ data types into Verilog. A model of a hybrid electric vehicle is used as a case study, and the resulting hardware runs significantly faster than on a general purpose CPU.

  • 11.
    Henriksson, Mikael
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Kunnath Ganesan, Unnikrishnan
    Linköpings universitet, Institutionen för systemteknik, Kommunikationssystem. Linköpings universitet, Tekniska fakulteten.
    Larsson, Erik G.
    Linköpings universitet, Institutionen för systemteknik, Kommunikationssystem. Linköpings universitet, Tekniska fakulteten.
    An Architecture for Grant-Free Random Access Massive Machine Type Communication Using Coordinate Descent. 2020. In: Proceedings of Fifty-Fourth Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA: IEEE, 2020, Vol. 54, pp. 1112-1116. Conference paper (Refereed)
    Abstract [en]

    An implementation of activity detection for grant-free massive machine type communication is presented. The implemented algorithm is based on coordinate descent, which shows a rapid convergence time. A number of modifications to the original algorithm are proposed to allow efficient implementation in hardware. In addition, the implementation is based on fixed-point representation and, hence, exhaustive word length simulations have been performed for the different processing steps.

  • 12.
    Fougstedt, Christoffer
    et al.
    Chalmers Univ Technol, Sweden.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Bae, Cheolyong
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Borjeson, Erik
    Chalmers Univ Technol, Sweden.
    Larsson-Edefors, Per
    Chalmers Univ Technol, Sweden.
    ASIC Design Exploration for DSP and FEC of 400-Gbit/s Coherent Data-Center Interconnect Receivers. 2020. In: 2020 OPTICAL FIBER COMMUNICATIONS CONFERENCE AND EXPOSITION (OFC), IEEE, 2020. Conference paper (Refereed)
    Abstract [en]

    We perform exploratory ASIC design of key DSP and FEC units for 400-Gbit/s coherent data-center interconnect receivers. In 22-nm CMOS, the considered units together dissipate 5 W, suggesting implementation feasibility in power-constrained form factors. (C) 2020 The Authors

  • 13.
    Fougstedt, Christoffer
    et al.
    Chalmers University of Technology, Gothenburg, Sweden.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Bae, Cheolyong
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Börjesson, Erik
    Chalmers University of Technology, Gothenburg, Sweden.
    Larsson-Edefors, Per
    Chalmers University of Technology, Gothenburg, Sweden.
    ASIC Design Exploration for DSP and FEC of 400-Gbit/s Coherent Data-Center Interconnect Receivers. 2020. Conference paper (Refereed)
    Abstract [en]

    We perform exploratory ASIC design of key DSP and FEC units for 400-Gbit/s coherent data-center interconnect receivers. In 22-nm CMOS, the considered units together dissipate 5 W, suggesting implementation feasibility in power-constrained form factors.

  • 14.
    Bae, Cheolyong
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Larsson-Edefors, Per
    Chalmers University of Technology, Gothenburg, Sweden.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Benefit of Prime Factor FFTs in Fully Parallel 60 GBaud CDC Filters. 2020. Conference paper (Refereed)
    Abstract [en]

    Prime factor algorithms are beneficial in fully parallel frequency-domain implementation of CDC filters and enable a more continuous scaling of filter lengths. ASIC implementation results in 28-nm CMOS for 60 GBd are provided.

  • 15.
    Bae, Cheolyong
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    High-Speed Chromatic Dispersion Compensation Filtering in FPGAs for Coherent Optical Communication. 2020. In: 2020 30th International Conference on Field-Programmable Logic and Applications (FPL), IEEE, 2020, pp. 357-358. Conference paper (Refereed)
    Abstract [en]

    Chromatic dispersion is one of the error sources limiting the transmission capacity in coherent optical communication that can be mitigated with digital signal processing. In this paper, the current status and plans of implementation of chromatic dispersion compensation (CDC) filters on FPGAs are discussed. As these high-speed filters are most efficiently implemented in the frequency-domain, different approaches for high-speed FFT-based architectures are considered and preliminary results of fully parallel FFT implementation by utilizing FPGA hardware features are presented.

  • 16.
    Alam, Syed Asad
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten. Namal Inst, Pakistan.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Improved Particle Filter Resampling Architectures. 2020. In: Journal of Signal Processing Systems, ISSN 1939-8018, E-ISSN 1939-8115, Vol. 92, no. 6, pp. 555-568. Journal article (Refereed)
    Abstract [en]

    The most challenging aspect of particle filtering hardware implementation is the resampling step. This is because of its high latency, as it can be only partially executed in parallel with the other steps of particle filtering and has no inherent parallelism inside it. To reduce the latency, an improved resampling architecture is proposed which involves pre-fetching from the weight memory in parallel with the fetching of a value from a random function generator, along with architectures for realizing the pre-fetch technique. This enables a particle filter using M particles with otherwise streaming operation to accept new inputs more often than every 2M cycles, which is what the previously best approach gives. Results show that a pre-fetch buffer of five values achieves the best area-latency reduction trade-off while on average achieving an 85% reduction in latency for the resampling step, leading to a sample time reduction of more than 40%. We also propose a generic division-free architecture for the resampling steps. It also removes the need for explicitly ordering the random values for efficient multinomial resampling implementation. In addition, on-the-fly computation of the cumulative sum of weights is proposed, which helps reduce the word length of the particle weight memory. FPGA implementation results show that the memory size is reduced by up to 50%.

  • 17.
    Mohammadi Sarband, Narges
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Becirovic, Ema
    Linköpings universitet, Institutionen för systemteknik, Kommunikationssystem. Linköpings universitet, Tekniska fakulteten.
    Krysander, Mattias
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Larsson, Erik G.
    Linköpings universitet, Institutionen för systemteknik, Kommunikationssystem. Linköpings universitet, Tekniska fakulteten.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Pilot-Hopping Sequence Detection Architecture for Grant-Free Random Access using Massive MIMO. 2020. In: 2020 IEEE International Symposium on Circuits and Systems (ISCAS), IEEE, 2020. Conference paper (Refereed)
    Abstract [en]

    In this work, an implementation of a pilot-hopping sequence detector for massive machine type communication is presented. The architecture is based on solving a non-negative least squares problem. The results show that the architecture supporting 1024 users can perform more than one million detections per second with a power consumption of less than 70 mW when implemented in a 28 nm FD-SOI process.

  • 18.
    Mohammadi Sarband, Narges
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Garrido Gálvez, Mario
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Using Transposition to Efficiently Solve Constant Matrix-Vector Multiplication and Sum of Product Problems. 2020. In: Journal of Signal Processing Systems, ISSN 1939-8018, E-ISSN 1939-8115, Vol. 92, no. 10, pp. 1075-1089. Journal article (Refereed)
    Abstract [en]

    In this work, we present an approach to exploit the potential benefit of adder graph algorithms by solving the transposed form of the problem and then transposing the solution. The key contribution is a systematic way to obtain the transposed realization with a minimum number of cascaded adders subject to the input realization. In this way, wide and low constant matrix multiplication problems, with sum of products as a special case, which are normally exceptionally time consuming to solve using adder graph algorithms, can be solved by first transposing the matrix and then transposing the solution. Examples show that while the relation between the adder depth of the solution to the transposed problem and the original problem is not straightforward, there are many cases where the reduction in adder cost will more than compensate for the potential increase in adder depth and result in implementations with reduced power consumption compared to using sub-expression sharing algorithms, which can both solve the original problem directly in reasonable time and guarantee a minimum adder depth.

  • 19.
    Kanders, Hans
    et al.
    Linköpings universitet, Institutionen för systemteknik. Linköpings universitet, Tekniska fakulteten.
    Mellqvist, Tobias
    Linköpings universitet, Institutionen för systemteknik. Linköpings universitet, Tekniska fakulteten.
    Garrido Gálvez, Mario
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Palmkvist, Kent
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    A 1 Million-Point FFT on a Single FPGA. 2019. In: IEEE Transactions on Circuits and Systems Part 1: Regular Papers, ISSN 1549-8328, E-ISSN 1558-0806, Vol. 66, no. 10, pp. 3863-3873. Journal article (Refereed)
    Abstract [en]

    In this paper, we present the first implementation of a 1 million-point fast Fourier transform (FFT) completely integrated on a single field-programmable gate array (FPGA), without the need for external memory or multiple interconnected FPGAs. The proposed architecture is a pipelined single-delay feedback (SDF) FFT. The architecture includes a specifically designed 1 million-point rotator with high accuracy and a thorough study of the word length at the different FFT stages in order to increase the signal-to-quantization-noise ratio (SQNR) and keep the area low. This also results in low power consumption.

  • 20.
    Tran, Markus
    et al.
    Linköpings universitet, Institutionen för systemteknik, Kommunikationssystem. Linköpings universitet, Tekniska fakulteten.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Källström, Petter
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Senel, Kamil
    Linköpings universitet, Institutionen för systemteknik, Kommunikationssystem. Linköpings universitet, Tekniska fakulteten.
    Larsson, Erik G
    Linköpings universitet, Institutionen för systemteknik, Kommunikationssystem. Linköpings universitet, Tekniska fakulteten.
    An Architecture for Grant-Free Massive MIMO MTC Based on Compressive Sensing. 2019. In: CONFERENCE RECORD OF THE 2019 FIFTY-THIRD ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, IEEE, 2019, pp. 901-905. Conference paper (Refereed)
    Abstract [en]

    In this work, a processing architecture for grant-free machine type communication based on compressive sensing is proposed. The architecture can be adapted for a number of parameters. An instantiation for 128 terminals and 96 antennas is implemented. Without memories, it consumes 1.52 W and occupies an area of 5.1 mm² in a 28 nm SOI CMOS process. The implemented instance can process about 10k messages per second, each containing four bits.

  • 21.
    Gustafsson, Oscar
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Wanhammar, Lars
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Arithmetic. 2019. In: Handbook of signal processing systems / [ed] Bhattacharyya, S.S., Deprettere, E.F., Leupers, R., Takala, J., Cham: Springer, 2019, 3, pp. 381-426. Chapter in book, part of anthology (Other academic)
    Abstract [en]

    In this chapter fundamentals of arithmetic operations and number representations used in DSP systems are discussed. Different relevant number systems are outlined with a focus on fixed-point representations. Structures for accelerating the carry-propagation of addition are discussed, as well as multi-operand addition. For multiplication, different schemes for generating and accumulating partial products are presented. In addition to that, optimization for constant coefficient multiplication is discussed. Division and square-rooting are also briefly outlined. Furthermore, floating-point arithmetic and the IEEE 754 floating-point arithmetic standard are presented. Finally, some methods for computing elementary functions, e.g., trigonometric functions, are presented.

  • 22.
    Sadeghifar, Mohammad Reza
    et al.
    Linköpings universitet, Institutionen för systemteknik, Elektroniska Kretsar och System. Linköpings universitet, Tekniska fakulteten. Ericsson AB, Sweden.
    Bengtsson, Hakan
    Ericsson AB, Sweden.
    Wikner, Jacob
    Linköpings universitet, Institutionen för systemteknik, Elektroniska Kretsar och System. Linköpings universitet, Tekniska fakulteten.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Direct digital-to-RF converter employing semi-digital FIR voltage-mode RF DAC. 2019. In: Integration, ISSN 0167-9260, E-ISSN 1872-7522, Vol. 66, pp. 128-134. Journal article (Refereed)
    Abstract [en]

    A direct digital-to-RF converter (DRFC) is presented in this work. Due to its digital-in-nature design, the DRFC benefits from technology scaling and can be monolithically integrated into advanced digital VLSI systems. A fourth-order single-bit quantizer bandpass digital ΣΔ modulator is used preceding the DRFC, resulting in a high in-band signal-to-noise ratio (SNR). The out-of-band spectrally-shaped quantization noise is attenuated by an embedded semi-digital FIR filter (SDFIR). The RF output frequencies are synthesized by a novel configurable voltage-mode RF DAC solution with high linearity. The configurable RF DAC directly synthesizes RF signals up to 10 GHz in the first or second Nyquist zone. The proposed DRFC is designed in a 22 nm FDSOI CMOS process and, with the aid of Monte Carlo simulation, shows 78.6 dBc and 63.2 dBc worst-case third-order intermodulation distortion (IM3) under process mismatch at 2.5 GHz and 7.5 GHz output frequencies, respectively.

  • 23.
    Garrido, Mario
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Qureshi, Fahad
    Tampere University of Technology.
    Takala, Jarmo
    Tampere University of Technology.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Hardware architectures for the fast Fourier transform. 2019. In: Handbook of signal processing systems / [ed] Bhattacharyya, S.S., Deprettere, E.F., Leupers, R., Takala, J., Cham: Springer, 2019, 3, pp. 613-647. Chapter in book, part of anthology (Other academic)
    Abstract [en]

    The fast Fourier transform (FFT) is a widely used algorithm in signal processing applications. FFT hardware architectures are designed to meet the requirements of the most demanding applications in terms of performance, circuit area, and/or power consumption. This chapter summarizes the research on FFT hardware architectures by presenting the FFT algorithms, the building blocks in FFT hardware architectures, the architectures themselves, and the bit reversal algorithm.

  • 24.
    Sadeghifar, Mohammad Reza
    et al.
    Linköpings universitet, Institutionen för systemteknik, Elektroniska Kretsar och System. Linköpings universitet, Tekniska fakulteten. Ericsson AB, Sweden.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Wikner, Jacob
    Linköpings universitet, Institutionen för systemteknik, Elektroniska Kretsar och System. Linköpings universitet, Tekniska fakulteten.
    Optimization problem formulation for semi-digital FIR digital-to-analog converter considering coefficients precision and analog metrics. 2019. In: Analog Integrated Circuits and Signal Processing, ISSN 0925-1030, E-ISSN 1573-1979, Vol. 99, no. 2, pp. 287-298. Article in journal (Refereed)
    Abstract [en]

    Optimization problem formulation for a semi-digital FIR digital-to-analog converter (SDFIR DAC) is investigated in this work. Magnitude and energy metrics with variable coefficient precision are defined for cascaded digital sigma modulators, the semi-digital FIR filter, and the sinc roll-off frequency response of the DAC. A set of analog metrics representing hardware cost is also defined for inclusion in the SDFIR DAC optimization problem formulation. It is shown that the hardware cost of the SDFIR DAC can be significantly reduced by introducing flexible coefficient precision, without over-designing the SDFIR DAC. Different use cases are selected to demonstrate the optimization problem formulations. Combinations of the magnitude metric, energy metric, coefficient precision, and analog metrics are used in the different use cases and solved to find the optimum set of analog FIR taps. A new method that introduces variable coefficient precision in the optimization procedure is proposed to avoid non-convex optimization problems. It is shown that up to 22% of the total number of unit elements of the SDFIR filter can be saved when targeting the analog metric as the optimization objective subject to magnitude constraints in the pass-band and stop-band.

  • 25.
    Garrido, Mario
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Grajal, Jesus
    Univ Politecn Madrid, Spain.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Optimum Circuits for Bit-Dimension Permutations. 2019. In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 27, no. 5, pp. 1148-1160. Article in journal (Refereed)
    Abstract [en]

    In this paper, we present a systematic approach to design hardware circuits for bit-dimension permutations. The proposed approach is based on decomposing any bit-dimension permutation into elementary bit-exchanges. Such decomposition is proven to achieve the theoretical minimum number of delays required for the permutation. This offers optimum solutions for multiple well-known problems in the literature that make use of bit-dimension permutations. This includes the design of permutation circuits for the fast Fourier transform, bit reversal, matrix transposition, stride permutations, and Viterbi decoders.

  • 26.
    Bertilsson, Erik
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Larsson, Erik G
    Linköpings universitet, Institutionen för systemteknik, Kommunikationssystem. Linköpings universitet, Tekniska fakulteten.
    A Modular Base Station Architecture for Massive MIMO with Antenna and User Scalability per Processing Node. 2018. In: 2018 CONFERENCE RECORD OF 52ND ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS, AND COMPUTERS, IEEE, 2018, pp. 1649-1653. Conference paper (Refereed)
    Abstract [en]

    Massive MIMO is a key technology for the upcoming fifth generation cellular networks (5G), promising high spectral efficiency, low power consumption, and the use of cheap hardware to reduce costs. Previous work has shown how to create a distributed processing architecture, where each node in a network performs the computations related to one or more antennas. The required total number of antennas, M, at the base station depends on the number of simultaneously operating terminals, K. In this work, a flexible node architecture is presented, where the number of terminals can be traded for additional antennas at the same node. This means that the same node can be used with a wide range of system configurations. The computational complexity, along with the order in which to compute incoming and outgoing symbols, is explored.

  • 27.
    Jang, Jeong Keun
    et al.
    Dongbu Hitek, South Korea.
    Kim, Ho Keun
    Ajou Univ, South Korea.
    Sunwoo, Myung Hoon
    Ajou Univ, South Korea.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Area-Efficient Scheduling Scheme Based FFT Processor for Various OFDM Systems. 2018. In: 2018 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS 2018), IEEE, 2018, pp. 338-341. Conference paper (Refereed)
    Abstract [en]

    This paper presents an area-efficient fast Fourier transform (FFT) processor for orthogonal frequency-division multiplexing systems based on multi-path delay commutator architecture. This paper proposes a data scheduling scheme to reduce the number of complex constant multipliers. The proposed mixed-radix multi-path delay commutator FFT processor can support 128-, 256-, and 512-point FFT sizes. The proposed processor was synthesized using the Samsung 65-nm CMOS standard cell library. The proposed processor with eight parallel data paths can achieve a high throughput rate of up to 2.64 GSample/s at 330 MHz.

  • 28.
    Jang, Jeong Keun
    et al.
    Dongbu Hitek, South Korea.
    Kim, Ho Keun
    Department of Electrical and Computer Engineering, Ajou University, Suwon, Korea.
    Sunwoo, Myung Hoon
    Department of Electrical and Computer Engineering, Ajou University, Suwon, Korea.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Area-efficient scheduling scheme based FFT processor for various OFDM systems. 2018. Conference paper (Other academic)
    Abstract [en]

    This paper presents an area-efficient fast Fourier transform (FFT) processor for orthogonal frequency-division multiplexing systems based on multi-path delay commutator architecture. This paper proposes a data scheduling scheme to reduce the number of complex constant multipliers. The proposed mixed-radix multi-path delay commutator FFT processor can support 128-, 256-, and 512-point FFT sizes. The proposed processor was synthesized using the Samsung 65-nm CMOS standard cell library. The proposed processor with eight parallel data paths can achieve a high throughput rate of up to 2.64 GSample/s at 330 MHz.

  • 29.
    Bae, Cheolyong
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Gokhale, Madhur
    Linköpings universitet, Institutionen för systemteknik. Linköpings universitet, Tekniska fakulteten.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Garrido Gálvez, Mario
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Improved Implementation Approaches for 512-tap 60 GSa/s Chromatic Dispersion FIR Filters. 2018. In: 2018 CONFERENCE RECORD OF 52ND ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS, AND COMPUTERS, IEEE, 2018, pp. 213-217. Conference paper (Refereed)
    Abstract [en]

    In optical communication, the non-ideal properties of the fibers lead to pulse widening from chromatic dispersion. One way to compensate for this is through digital signal processing. In this work, two architectures for compensation are compared. Both are designed for 60 GSa/s and 512 filter taps and implemented in the frequency domain using FFTs. It is shown that the high-speed requirements introduce constraints on possible architectural choices. In this work, it is shown that it is not required to use two overlapping FFTs to obtain continuous filtering. In addition, efficient highly parallel implementation of FFTs is discussed and an improved FFT compared to our earlier work is proposed. The results are compared to using an approach with a shorter FFT and FIR filters.

  • 30.
    Kumm, Martin
    et al.
    University of Kassel, Digital Technology Group, Germany.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    de Dinechin, Florent
    Univ Lyon, INSA Lyon, Inria, CITI, France.
    Kappauf, Johannes
    University of Kassel, Digital Technology Group, Germany.
    Zipf, Peter
    University of Kassel, Digital Technology Group, Germany.
    Karatsuba with Rectangular Multipliers for FPGAs. 2018. In: 2018 IEEE 25TH SYMPOSIUM ON COMPUTER ARITHMETIC (ARITH), IEEE, 2018, pp. 13-20. Conference paper (Refereed)
    Abstract [en]

    This work presents an extension of Karatsuba's method to efficiently use rectangular multipliers as a base for larger multipliers. The rectangular multipliers that motivate this work are the embedded 18x25-bit signed multipliers found in the DSP blocks of recent Xilinx FPGAs: the traditional Karatsuba approach must under-use them as square 18x18 ones. This work shows that rectangular multipliers can be efficiently exploited in a modified Karatsuba method if their input word sizes have a large greatest common divisor. In the Xilinx FPGA case, this can be obtained by using the embedded multipliers as 16x24 unsigned and as 17x25 signed ones. The obtained architectures are implemented with due detail to architectural features such as the pre-adders and post-adders available in Xilinx DSP blocks. They are synthesized and compared with traditional Karatsuba, but also with (non-Karatsuba) state-of-the-art tiling techniques that make use of the full rectangular multipliers. The proposed technique improves resource consumption and performance for multipliers of numbers larger than 64 bits.
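
    For orientation, the traditional (square) Karatsuba recursion that the abstract takes as its starting point can be sketched as below. This is not the rectangular extension proposed in the paper. The function name and the `base_bits` parameter are illustrative assumptions, with the default of 18 chosen only to echo the 18x18 case mentioned above.

```python
def karatsuba(x, y, base_bits=18):
    """Multiply two non-negative integers with the classic Karatsuba recursion.

    Illustrative sketch only: base_bits is an assumed threshold below which
    the built-in product is used directly.
    """
    if x.bit_length() <= base_bits or y.bit_length() <= base_bits:
        return x * y  # base case: fall back to the built-in product

    # Split both operands at the same bit position.
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask

    # Three recursive products instead of four.
    hi = karatsuba(x_hi, y_hi, base_bits)
    lo = karatsuba(x_lo, y_lo, base_bits)
    mid = karatsuba(x_hi + x_lo, y_hi + y_lo, base_bits) - hi - lo

    return (hi << (2 * half)) + (mid << half) + lo
```

    Each level replaces four sub-products by three at the cost of extra additions and subtractions, which is the trade-off the abstract refers to.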

  • 31.
    Mohammadi Sarband, Narges
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Garrido, Mario
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Obtaining Minimum Depth Sum of Products from Multiple Constant Multiplication. 2018. In: PROCEEDINGS OF THE 2018 IEEE INTERNATIONAL WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS), IEEE, Institute of Electrical and Electronics Engineers (IEEE), 2018, pp. 134-139. Conference paper (Refereed)
    Abstract [en]

    In this work, an approach for transposing solutions to the multiple constant multiplication (MCM) problem to obtain a sum of products (SOP) computation with minimum depth is proposed. The reason for doing this is that solving the SOP problem directly is highly computationally intensive when adder graph algorithms are used. Compared to using subexpression sharing algorithms, which have a lower computational complexity, directly for the SOP problem, it is shown that the proposed approach, as expected, results in lower complexity for the SOP. It is also shown that there is no obvious way to construct the MCM solution in such a way that the SOP solution has the minimum theoretical depth. However, the proposed approach guarantees minimum depth subject to the MCM solution given as input.

  • 32.
    Kumm, Martin
    et al.
    Univ Kassel, Germany.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Garrido Gálvez, Mario
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Zipf, Peter
    Univ Kassel, Germany.
    Optimal Single Constant Multiplication Using Ternary Adders. 2018. In: IEEE Transactions on Circuits and Systems - II - Express Briefs, ISSN 1549-7747, E-ISSN 1558-3791, Vol. 65, no. 7, pp. 928-932. Article in journal (Refereed)
    Abstract [en]

    The single constant coefficient multiplication is a frequently used operation in many numeric algorithms. Extensive previous work is available on how to reduce constant multiplications to additions, subtractions, and bit shifts. However, in previous work, only common two-input adders were used. As modern field-programmable gate arrays (FPGAs) support efficient ternary adders, i.e., adders with three inputs, this brief investigates constant multiplications that are built from ternary adders in an optimal way. The results show that the multiplication with any constant up to 22 bits can be realized by only three ternary adders. Average adder reductions of more than 33% compared to optimal constant multiplication circuits using two-input adders are achieved for coefficient word sizes of more than five bits. Synthesis experiments show FPGA average slice reductions in the order of 25% and a similar or higher speed than their two-input adder counterparts.

  • 33.
    Ingemarsson, Carl
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    SFF—The Single-Stream FPGA-Optimized Feedforward FFT Hardware Architecture. 2018. In: Journal of Signal Processing Systems, ISSN 1939-8018, E-ISSN 1939-8115, Vol. 90, no. 11, pp. 1583-1592. Article in journal (Refereed)
    Abstract [en]

    In this paper, a fast Fourier transform (FFT) hardware architecture optimized for field-programmable gate-arrays (FPGAs) is proposed. We refer to this as the single-stream FPGA-optimized feedforward (SFF) architecture. By using a stage that trades adders for shift registers as compared with the single-path delay feedback (SDF) architecture the efficient implementation of short shift registers in Xilinx FPGAs can be exploited. Moreover, this stage can be combined with ordinary or optimized SDF stages such that adders are only traded for shift registers when beneficial. The resulting structures are well-suited for FPGA implementation, especially when efficient implementation of short shift registers is available. This holds for at least contemporary Xilinx FPGAs. The results show that the proposed architectures improve on the current state of the art.

  • 34.
    Gustafsson, Oscar
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Bertilsson, Erik
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Klasson, Johannes
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Ingemarsson, Carl
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Approximate Neumann Series or Exact Matrix Inversion for Massive MIMO? (Invited Paper). 2017. In: Proceedings 2017 IEEE 24th Symposium on Computer Arithmetic (ARITH), London, UK, 24-26 July 2017 / [ed] Neil Burgess, Javier Bruguera, and Florent de Dinechin, Institute of Electrical and Electronics Engineers (IEEE), 2017, pp. 62-63. Conference paper (Refereed)
    Abstract [en]

    Approximate matrix inversion based on Neumann series has seen a recent increased interest motivated by massive MIMO systems. There, the matrices are in many cases diagonally dominant, and, hence, a reasonable approximation can be obtained within a few iterations of a Neumann series. In this work, we clarify that the complexity of exact methods is about the same as when three terms are used for the Neumann series, so in this case, the complexity is not lower as often claimed. The second common argument for Neumann series approximation, higher parallelism, is indeed correct. However, in most current practical use cases, such a high degree of parallelism is not required to obtain a low latency realization. Hence, we conclude that a careful evaluation, based on accuracy and latency requirements, must be performed and that exact matrix inversion is in fact viable in many more cases than the current literature claims.

  • 35.
    Gustafsson, Oscar
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Wanhammar, Lars
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Basic Arithmetic Circuits. 2017. In: Arithmetic Circuits for DSP Applications / [ed] Pramod Kumar Meher, Thanos Stouraitis, John Wiley & Sons, 2017, pp. 1-32. Chapter in book, part of anthology (Other academic)
    Abstract [en]

    General-purpose DSP processors, application-specific processors, and algorithm-specific processors are used to implement different types of DSP systems or subsystems. General-purpose DSP processors are typically used in applications involving complex and irregular algorithms, while application-specific processors provide lower unit cost and higher performance for a specific application, particularly when the volume of production is high. Most DSP applications use fractional arithmetic instead of integer arithmetic. Multimedia and communication applications involve real-time audio and video/image processing, which very often requires sum-of-products (SOP) computation. The need to compute non-linear functions arises in many different applications. The straightforward method of approximating an elementary function, storing the values in a look-up table, typically leads to large tables, even though the resulting area from standard cell synthesis grows slower than the number of memory bits. It is therefore of interest to find ways to approximate elementary functions using a trade-off between arithmetic operations and look-up tables.

  • 36.
    Bertilsson, Erik
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Larsson, Erik G.
    Linköpings universitet, Institutionen för systemteknik, Kommunikationssystem. Linköpings universitet, Tekniska fakulteten.
    Computation Limited Matrix Inversion Using Neumann Series Expansion for Massive MIMO. 2017. In: 2017 FIFTY-FIRST ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS, AND COMPUTERS, 2017, pp. 466-469. Conference paper (Refereed)
    Abstract [en]

    Neumann series expansion is a method for performing matrix inversion that has received a lot of interest in the context of massive MIMO systems. However, the computational complexity of the Neumann methods is higher than for the lowest complexity exact matrix inversion algorithms, such as LDL, when the number of terms in the series is three or more. In this paper, the Neumann series expansion is analyzed from a computational perspective for cases when the complexity of performing exact matrix inversion is too high. By partially computing the third term of the Neumann series, the computational complexity can be reduced. Three different preconditioning matrices are considered. Simulation results show that when limiting the total number of operations performed, the BER performance of the three different preconditioning matrices is the same.
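
    As a rough illustration of the general idea (not the partial third-term scheme studied in the paper), a truncated Neumann series with a diagonal preconditioner approximates the inverse as the sum over n of (I - D^-1 A)^n, multiplied by D^-1. The sketch below assumes a diagonally dominant matrix and the simplest preconditioner, D = diag(A); all function names are hypothetical.

```python
def mat_mul(a, b):
    """Plain dense matrix product for lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def neumann_inverse(a, terms):
    """Approximate inv(a) with `terms` terms of the Neumann series.

    Assumes D = diag(a) as preconditioner (an illustrative choice) and
    that a is diagonally dominant so the series converges.
    """
    n = len(a)
    d_inv = [[1.0 / a[i][i] if i == j else 0.0 for j in range(n)]
             for i in range(n)]
    da = mat_mul(d_inv, a)
    # E = I - D^-1 A, so that inv(a) = (sum of E^n) D^-1.
    e = [[(1.0 if i == j else 0.0) - da[i][j] for j in range(n)]
         for i in range(n)]
    acc = [[0.0] * n for _ in range(n)]
    power = identity(n)  # E^0
    for _ in range(terms):
        acc = [[acc[i][j] + power[i][j] for j in range(n)] for i in range(n)]
        power = mat_mul(power, e)
    return mat_mul(acc, d_inv)


def max_residual(a, x):
    """Largest absolute entry of I - a x, a simple accuracy measure."""
    n = len(a)
    ax = mat_mul(a, x)
    return max(abs((1.0 if i == j else 0.0) - ax[i][j])
               for i in range(n) for j in range(n))
```

    With a small diagonally dominant example such as `[[4, 1, 0], [1, 5, 1], [0, 1, 6]]`, `max_residual` shrinks as `terms` grows, while each extra term costs one more matrix product, which is the complexity trade-off the abstract discusses.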

  • 37.
    Ingemarsson, Carl
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Källström, Petter
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Qureshi, Fahad
    Linköpings universitet, Institutionen för systemteknik. Linköpings universitet, Tekniska fakulteten. Tampere University of Technology, Finland.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Efficient FPGA Mapping of Pipeline SDF FFT Cores. 2017. In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 25, no. 9, pp. 2486-2497. Article in journal (Refereed)
    Abstract [en]

    In this paper, an efficient mapping of the pipeline single-path delay feedback (SDF) fast Fourier transform (FFT) architecture to field-programmable gate arrays (FPGAs) is proposed. By considering the architectural features of the target FPGA, significantly better implementation results are obtained. This is illustrated by mapping an R22SDF 1024-point FFT core toward both Xilinx Virtex-4 and Virtex-6 devices. The optimized FPGA mapping is explored in detail. Algorithmic transformations that allow a better mapping are proposed, resulting in implementation achievements that by far outperform earlier published work. For Virtex-4, the results show a 350% increase in throughput per slice and 25% reduction in block RAM (BRAM) use, with the same amount of DSP48 resources, compared with the best earlier published result. The resulting Virtex-6 design sees even larger increases in throughput per slice compared with the Xilinx FFT IP core, using half as many DSP48E1 blocks and fewer BRAM resources. The results clearly show that the FPGA mapping is crucial, not only the architecture and algorithm choices.

  • 38.
    Kovalev, Anton
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Garrido, Mario
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Implementation approaches for 512-tap 60 GSa/s chromatic dispersion FIR filters. 2017. In: Conference Record of The Fifty-First Asilomar Conference on Signals, Systems & Computers / [ed] Michael B. Matthews, Institute of Electrical and Electronics Engineers (IEEE), 2017, pp. 1779-1783. Conference paper (Refereed)
    Abstract [en]

    In optical communication the non-ideal properties of the fibers lead to pulse widening from chromatic dispersion. One way to compensate for this is through digital signal processing. In this work, two architectures for compensation are compared. Both are designed for 60 GSa/s and 512 filter taps and implemented in the frequency domain using FFTs. It is shown that the high-speed requirements introduce constraints on possible architectural choices. Furthermore, the theoretical multiplication complexity estimates are not good predictors for the energy consumption. The results show that the implementation with 10% more multiplications per sample has half the power consumption and one third of the area consumption. The best architecture for this specification results in a power consumption of 3.12 W in a 65 nm technology, corresponding to an energy per complex filter tap of 0.10 mW/GHz.

  • 39.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    On Lifting-Based Fixed-Point Complex Multiplications and Rotations. 2017. In: Proceedings 24th IEEE Symposium on Computer Arithmetic 24–26 July 2017 London, United Kingdom / [ed] Neil Burgess, Javier Bruguera and Florent de Dinechin, Institute of Electrical and Electronics Engineers (IEEE), 2017, pp. 43-49. Conference paper (Refereed)
    Abstract [en]

    Lifting-based complex multiplications and rotations are integer invertible, i.e., an integer input value is mapped to the same integer output value when rotating forward and backward. This is an important aspect for lossless transform-based source coding, but since the structure only requires three real-valued multiplications and three real-valued additions, it is also a potentially attractive way to perform complex multiplications when the coefficient has unity magnitude. In this work, we consider two aspects of these structures. First, we show that both the magnitude and angular errors are dependent on the angle of the input value and derive both exact and approximated expressions for these. Second, we discuss how to design such structures without the typical separation into three subsequent matrix multiplications. It is shown that the proposed design method allows many more values which are integer invertible, but cannot be separated into three subsequent matrix multiplications with fixed-point values. The results show good correspondence between the error approximations and the actual error as well as a significantly increased design space.
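
    The integer-invertibility property can be illustrated with the textbook three-step lifting factorization of a rotation, which is the "typical separation" the abstract mentions rather than the new design method of the paper. The sketch assumes floating-point lifting coefficients (the paper considers fixed-point ones) and requires sin(theta) to be nonzero; the function names are hypothetical.

```python
import math


def lifting_rotate(x, y, theta):
    """Rotate the integer point (x, y) by theta using three lifting steps.

    Each step adds a rounded product to one coordinate, so it can be undone
    exactly by subtracting the same rounded product.
    """
    p = (math.cos(theta) - 1.0) / math.sin(theta)  # equals -tan(theta / 2)
    s = math.sin(theta)
    x = x + round(p * y)
    y = y + round(s * x)
    x = x + round(p * y)
    return x, y


def lifting_rotate_inverse(x, y, theta):
    """Undo lifting_rotate exactly by reversing the three steps."""
    p = (math.cos(theta) - 1.0) / math.sin(theta)
    s = math.sin(theta)
    x = x - round(p * y)
    y = y - round(s * x)
    x = x - round(p * y)
    return x, y
```

    The structure uses three multiplications and three additions, matching the count in the abstract. The rotated point is only approximate because of the rounding, yet applying `lifting_rotate_inverse` to the result returns the original integers exactly.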

  • 40.
    Meher, Pramod Kumar
    et al.
    Independent Hardware Consultant.
    Chang, Chip-Hong
    Nanyang Technological University, Singapore, Singapore.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Vinod, A.P.
    Nanyang Technological University, Singapore, Singapore.
    Faust, Mattias
    mfnet gmbh, Switzerland.
    Shift-Add Circuits for Constant Multiplications. 2017. In: Arithmetic Circuits for DSP Applications / [ed] Pramod Kumar Meher, Thanos Stouraitis, John Wiley & Sons, 2017, pp. 33-76. Chapter in book, part of anthology (Other academic)
    Abstract [en]

    The optimization of shift-and-add networks for constant multiplications is found to have great potential for reducing the area, delay, and power consumption of implementation of multiplications in several computation-intensive applications, not only in dedicated hardware but also in programmable computing systems. To simplify the shift-and-add network in single constant multiplication (SCM) circuits, this chapter discusses three design approaches, including direct simplification from a given number representation, simplification by redundant signed digit (SD) representation, and simplification by adder graph. Examples of the multiple constant multiplication (MCM) methods are constant matrix multiplication, discrete cosine transform (DCT) or fast Fourier transform (FFT), and polyphase finite impulse response (FIR) filters and filter banks. The given constant multiplication methods can be used for matrix multiplications and inner products, and can be applied easily to image/video processing and graphics applications. The chapter further discusses some of the shortcomings in the current research on constant multiplications, and possible scopes of improvement.
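
    To make the signed-digit approach concrete, the sketch below recodes a positive constant into canonical signed digit (CSD) form, one common signed-digit representation, and multiplies using only shifts, additions, and subtractions. It is an illustrative example and not taken from the chapter; the function names are hypothetical.

```python
def csd_digits(c):
    """Return the canonical signed digit form of c > 0, least significant first.

    Every digit is -1, 0, or 1, and no two adjacent digits are both nonzero.
    """
    digits = []
    while c != 0:
        if c & 1:
            d = 2 - (c & 3)  # +1 if c mod 4 == 1, -1 if c mod 4 == 3
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits


def shift_add_multiply(x, c):
    """Compute x * c with one shift and one add or subtract per nonzero digit."""
    total = 0
    for i, d in enumerate(csd_digits(c)):
        if d == 1:
            total += x << i
        elif d == -1:
            total -= x << i
    return total
```

    For example, 7 is recoded as 8 - 1, so multiplying by 7 takes one subtraction instead of the two additions implied by the binary form 4 + 2 + 1.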

  • 41.
    Bertilsson, Erik
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Larsson, Erik G
    Linköpings universitet, Institutionen för systemteknik, Kommunikationssystem. Linköpings universitet, Tekniska fakulteten.
    A Scalable Architecture for Massive MIMO Base Stations Using Distributed Processing. 2016. In: 2016 50TH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, Washington: IEEE COMPUTER SOC, 2016, pp. 864-868. Conference paper (Refereed)
    Abstract [en]

    Massive MIMO systems have received considerable attention in recent years as an enabler in future wireless communication systems. As the idea is based on having a large number of antennas at the base station, it is important to have both a scalable and distributed realization of such a system to ease deployment. Most work so far has focused on the theoretical aspects, although a few demonstrators have been reported. In this work, we propose a base station architecture based on connecting the processing nodes in a K-ary tree, allowing simple scalability. Furthermore, it is shown that most of the processing can be performed locally in each node. Further analysis of the node processing shows that it should be enough that each node contains one or two complex multipliers and a few complex adders/subtracters operating at some hundred MHz. It is also shown that a communication link of some Gbps is required between the nodes, and, hence, it is fully feasible to have one or a few links between the nodes to cope with the communication requirements.

  • 42.
    Garrido Gálvez, Mario
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Källström, Petter
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Kumm, Martin
    University of Kassel, Germany.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    CORDIC II: A New Improved CORDIC Algorithm. 2016. In: IEEE Transactions on Circuits and Systems - II - Express Briefs, ISSN 1549-7747, E-ISSN 1558-3791, Vol. 63, no. 2, pp. 186-190. Article in journal (Refereed)
    Abstract [en]

    In this brief, we present the CORDIC II algorithm. Like previous CORDIC algorithms, the CORDIC II calculates rotations by breaking down the rotation angle into a series of microrotations. However, the CORDIC II algorithm uses a novel angle set, different from the angles used in previous CORDIC algorithms. The new angle set provides a faster convergence that reduces the number of adders with respect to previous approaches.

  • 43.
    Källström, Petter
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Fast and Area Efficient Adder for Wide Data in Recent Xilinx FPGAs. 2016. In: 26th International Conference on Field-Programmable Logic and Applications, Lausanne: IEEE, 2016, pp. 338-341. Conference paper (Refereed)
    Abstract [en]

    Most modern FPGAs have very optimised carry logic for efficient implementations of ripple carry adders (RCA). Some FPGAs also have a six-input look-up table (LUT) per cell, of which two inputs are used during normal addition. In this paper we present an architecture that compresses the carry chain length to N/2 in recent Xilinx FPGAs, by utilising the LUTs better. This carry compression was implemented by letting some cells calculate the carry chain in two bits per cell, while some others calculate the sum output bits. In total, the proposed design uses no more hardware than the normal adder. The results show that the proposed adder is faster than a normal adder for word lengths larger than 64 bits in Virtex-6 FPGAs.

  • 44.
    Ingemarsson, Carl
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Hardware Architecture for Positive Definite Matrix Inversion Based on LDL Decomposition and Back-Substitution. 2016. In: 2016 50TH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, IEEE COMPUTER SOC, 2016, pp. 859-863. Conference paper (Refereed)
    Abstract [en]

    In this paper we propose an efficient hardware architecture for computation of matrix inversion of positive definite matrices. The algorithm chosen is LDL decomposition followed directly by equation system solving using back substitution. The architecture combines a high throughput with an efficient utilization of its hardware units. We also report FPGA implementation results that show that the architecture is well tailored for implementation in real-time applications.
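
    As a software illustration of the algorithm named in the abstract, the following is a minimal textbook sketch of inverting a symmetric positive definite matrix via A = L D L^T followed by triangular solves. It says nothing about the paper's hardware architecture or scheduling; the function names `ldl_decompose` and `ldl_inverse` are assumptions made for illustration.

```python
def ldl_decompose(a):
    """Return (L, d) with A = L * diag(d) * L^T, L unit lower triangular."""
    n = len(a)
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = a[j][j] - sum(l[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            s = sum(l[i][k] * l[j][k] * d[k] for k in range(j))
            l[i][j] = (a[i][j] - s) / d[j]
    return l, d

def ldl_inverse(a):
    """Invert a symmetric positive definite matrix by solving A x = e_col per column."""
    n = len(a)
    l, d = ldl_decompose(a)
    inv = [[0.0] * n for _ in range(n)]
    for col in range(n):
        # Forward substitution: L z = e_col
        z = [0.0] * n
        for i in range(n):
            rhs = 1.0 if i == col else 0.0
            z[i] = rhs - sum(l[i][k] * z[k] for k in range(i))
        # Diagonal scaling: D y = z
        y = [z[i] / d[i] for i in range(n)]
        # Back substitution: L^T x = y
        x = [0.0] * n
        for i in reversed(range(n)):
            x[i] = y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))
        for i in range(n):
            inv[i][col] = x[i]
    return inv
```

    Unlike a Cholesky factorisation, the LDL form needs no square roots, which is one common reason to prefer it in fixed-function hardware.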

  • 45.
    Garrido Gálvez, Mario
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Andersson, Rikard
    Linköpings universitet, Institutionen för systemteknik, Fordonssystem. Linköpings universitet, Tekniska högskolan.
    Qureshi, Fahad
    Tampere University of Technology, Finland.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Multiplierless Unity-Gain SDF FFTs. 2016. In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 24, no. 9, pp. 3003-3007. Journal article (Refereed)
    Abstract [en]

    In this brief, we propose a novel approach to implement multiplierless unity-gain single-delay feedback fast Fourier transforms (FFTs). Previous methods achieve unity-gain FFTs by using either complex multipliers or nonunity-gain rotators with additional scaling compensation. Conversely, this brief proposes unity-gain FFTs without compensation circuits, even when using nonunity-gain rotators. This is achieved by a joint design of rotators, so that the entire FFT is scaled by a power of two, which is then shifted to unity. This reduces the amount of hardware resources of the FFT architecture, while having high accuracy in the calculations. The proposed approach can be applied to any FFT size, and various designs for different FFT sizes are presented.

  • 46.
    Alam, Syed Asad
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    On the implementation of time-multiplexed frequency-response masking filters. 2016. In: IEEE Transactions on Signal Processing, ISSN 1053-587X, E-ISSN 1941-0476, Vol. 64, no. 15, pp. 3933-3944. Journal article (Refereed)
    Abstract [en]

    The complexity of narrow transition band finite-length impulse response (FIR) filters is high and can be reduced by using frequency-response masking (FRM) techniques. These techniques use a combination of periodic model and, possibly periodic, masking filters. Time-multiplexing is in general beneficial since the technology-bound maximum obtainable clock frequency and the application-determined required sample rate only rarely correspond. Therefore, architectures for time-multiplexed FRM filters that benefit from the inherent sparsity of the periodic filters are introduced in this work.

    We show that FRM filters not only reduce the number of multipliers needed, but also have benefits in terms of memory usage. Although the total number of samples to be stored is larger for FRM, it results in fewer memory resources needed in FPGAs and more energy-efficient memory schemes in ASICs. In total, the power consumption is significantly reduced compared to a single-stage implementation. Furthermore, we show that the choice of the interpolation factor which gives the least complexity for the periodic model filter and subsequent masking filter(s) is a function of the time-multiplexing factor, meaning that the minimum number of multipliers does not always correspond to the minimum number of multiplications. Both single-port and dual-port memories are considered and the involved trade-off between number of multipliers and memory complexity is illustrated. The results show that for FPGA implementation, the power reduction ranges from 23% to 68% for the considered examples.

  • 47.
    Garrido Gálvez, Mario
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Huang, Shen-Jui
    Novatek Corp, Taiwan.
    Chen, Sau-Gee
    National Chiao Tung University, Taiwan.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    The Serial Commutator FFT. 2016. In: IEEE Transactions on Circuits and Systems - II - Express Briefs, ISSN 1549-7747, E-ISSN 1558-3791, Vol. 63, no. 10, pp. 974-978. Journal article (Refereed)
    Abstract [en]

    This brief presents a new type of fast Fourier transform (FFT) hardware architecture called the serial commutator (SC) FFT. The SC FFT is characterized by the use of circuits for bit-dimension permutation of serial data. The proposed architectures are based on the observation that, in the radix-2 FFT algorithm, only half of the samples at each stage must be rotated. This fact, together with proper data management, makes it possible to allocate rotations only every other clock cycle. This allows for simplifying the rotator, halving the complexity with respect to conventional serial FFT architectures. Likewise, the proposed approach halves the number of adders in the butterflies with respect to previous architectures. As a result, the proposed architectures use the minimum number of adders, rotators, and memory that are necessary for a pipelined FFT of serial data, with 100% utilization ratio.

  • 48.
    Gustafsson, Oscar
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Johansson, Håkan
    Linköpings universitet, Institutionen för systemteknik, Kommunikationssystem. Linköpings universitet, Tekniska högskolan.
    Decimation Filters for High-Speed Delta-Sigma Modulators With Passband Constraints: General Versus CIC-Based FIR Filters. 2015. In: 2015 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), IEEE conference proceedings, 2015, pp. 2205-2208. Conference paper (Refereed)
    Abstract [en]

    For high-speed delta-sigma modulators, the decimation filters are typically polyphase FIR filters, as the recursive CIC filters cannot be implemented because of the iteration period bound. In addition, the high clock frequency and short input word length make multiple constant multiplication techniques less beneficial. Instead, a realistic complexity measure in this setting is the number of non-zero digits of the FIR filter tap coefficients. As there is limited control of the passband approximation error for CIC-based filters, these must in most cases be compensated to meet a passband specification. In this work we investigate the complexity of decimation filters meeting CIC-like stopband behavior, but with a well-defined passband approximation error. It is found that the general approach can in many cases produce filters with much smaller passband approximation error at a similar complexity.

  • 49.
    Johansson, Håkan
    et al.
    Linköpings universitet, Institutionen för systemteknik, Kommunikationssystem. Linköpings universitet, Tekniska fakulteten.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Filter-Bank Based All-Digital Channelizers and Aggregators for Multi-Standard Video Distribution. 2015. In: IEEE International Conference on Digital Signal Processing (DSP), 2015, IEEE, 2015, pp. 1117-1120. Conference paper (Refereed)
    Abstract [en]

    This paper introduces all-digital flexible channelizers and aggregators for multi-standard video distribution. The overall problem is to aggregate a number of narrow-band subsignals with different bandwidths (6, 7, or 8 MHz) into one composite wide-band signal. In the proposed scheme, this is carried out through a set of analysis filter banks (FBs) that channelize the subsignals into 1/2-MHz subbands, which are subsequently aggregated through one synthesis FB. In this way, full flexibility with a low computational complexity and maintained quality is enabled. The proposed solution offers orders-of-magnitude complexity reductions as compared with a straightforward alternative. Design examples are included that demonstrate the functionality, flexibility, and efficiency.

  • 50.
    Alam, Syed Asad
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Gustafsson, Oscar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Generalized Division-Free Architecture and Compact Memory Structure for Resampling in Particle Filters. 2015. In: 2015 European Conference on Circuit Theory and Design (ECCTD), IEEE Press, 2015, pp. 416-419. Conference paper (Refereed)
    Abstract [en]

    The most challenging step of implementing particle filtering is the resampling step, which replicates particles with large weights and discards those with small weights. In this paper, we propose a generic architecture for resampling which uses double multipliers to avoid normalization divisions and makes the architecture equally efficient for non-powers-of-two numbers of particles. Furthermore, the complexity of resampling is greatly affected by the size of the memories used to store weights. We illustrate that storing the original weights instead of their cumulative sum, and calculating the cumulative sum online, reduces the total complexity, in terms of area, by 21% to 45%, while giving up to 50% reduction in memory usage.
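
    The two ideas in the abstract, avoiding the normalization division and accumulating the weights online, can be illustrated with a minimal software sketch of systematic resampling. This is an assumption-laden illustration and not the paper's architecture: the normalized test C_i / W > (k + u) / N is simply rewritten as C_i * N > (k + u) * W, which needs two multiplications and no division. The function name `systematic_resample` and the parameter `u` (a uniform random offset in [0, 1)) are hypothetical.

```python
def systematic_resample(weights, u):
    """Return the indices of the replicated particles.

    `weights` are the original, unnormalized weights; `u` is a uniform
    random offset in [0, 1) supplied by the caller.
    """
    n = len(weights)
    total = sum(weights)  # W: sum of the unnormalized weights
    indices = []
    cumulative = 0.0      # C_i: running cumulative sum, computed online
    k = 0                 # index of the next output particle
    for i, w in enumerate(weights):
        cumulative += w
        # Division-free form of: cumulative / total > (k + u) / n
        while k < n and cumulative * n > (k + u) * total:
            indices.append(i)
            k += 1
    return indices
```

    Only the original weights are read, one per iteration, so no memory for the cumulative sums is needed in this formulation, and `n` does not have to be a power of two.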
