  • 1.
    Acevedo, Miguel
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    FPGA-Based Hardware-In-the-Loop Co-Simulator Platform for SystemModeler
    2016. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    This thesis proposes and implements a flexible platform to perform Hardware-In-the-Loop (HIL) co-simulation using a Field-Programmable-Gate-Array (FPGA). The HIL simulations are performed with SystemModeler working as a software simulator and the FPGA as the co-simulator platform for the digital hardware design. The work presented in this thesis consists of the creation of: A communication library in the host computer, a system in the FPGA that allows implementation of different digital designs with varying architectures, and an interface between the host computer and the FPGA to transmit the data. The efficiency of the proposed system is studied with the implementation of two common digital hardware designs, a PID controller and a filter. The results of the HIL simulations of those two hardware designs are used to verify the platform and measure the timing and area performance of the proposed HIL platform.

  • 2.
    Afzal, Nadeem
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Wikner, J. Jacob
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    On Scaling and Output Cardinality of Multi-Bit Digital Error-Feedback Modulators
    2012. Manuscript (preprint) (Other academic)
    Abstract [en]

    To determine the maximum allowed input scale for stable operation of higher-order delta-sigma modulators, designers largely depend on analytical and numerical analysis. In this brief, the maximum allowed input scale of a multi-bit digital error-feedback delta-sigma modulator of arbitrary order is derived mathematically. A digital modulator with an arbitrary output word length is stable if its output does not overflow. Thus, to avoid overflow of the modulator output, relations between the peak values of the involved digital signals are derived. A number of example configurations are presented to illustrate the usefulness of the derivations.

  • 3.
    Afzal, Nadeem
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Wikner, J. Jacob
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Digital Multi-bit Cascaded Error-Feedback ΔΣ Modulators With Reduced Hardware and Power Consumption
    2012. Manuscript (preprint) (Other academic)
    Abstract [en]

    The hardware of the multi-bit digital error feedback modulator (EFM) of arbitrary order has recently been reduced by using multiple EFMs in cascade. In this paper, a modified cascading strategy is devised. Parts of the processing of consecutively placed EFM stages are merged such that a significant amount of circuitry is removed in each stage. In the proposed design, the modulated output is represented by a set of encoded signals to be used by the signal processing block placed after the EFM.

    To illustrate the savings, a number of configurations of fourth-order EFM designs, composed of two and three cascaded stages, have been synthesized in a 65 nm CMOS process technology using the conventional and the proposed implementation techniques. Savings of 52.7% and 47% in terms of area and power consumption, respectively, could be obtained at an oversampling ratio of 4. The trade-off between sampling frequency and hardware cost is also presented. Due to the reduced hardware, an increase of up to 600 MHz in the sampling frequency is achieved.

  • 4.
    Afzal, Nadeem
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Wikner, Jacob
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Reducing Complexity and Power of Digital Multibit Error-Feedback Delta Sigma Modulators
    2014. In: IEEE Transactions on Circuits and Systems - II - Express Briefs, ISSN 1549-7747, E-ISSN 1558-3791, Vol. 61, no 9, p. 641-645. Article in journal (Refereed)
    Abstract [en]

    In this brief, we propose how the hardware complexity of arbitrary-order digital multibit error-feedback delta-sigma modulators can be reduced. This is achieved by splitting the combinatorial circuitry of the modulators into two parts, i.e., one producing the modulator output and another producing the error signal fed back. The part producing the modulator output is removed by utilizing a unit-element-based digital-to-analog converter. To illustrate the reduced complexity and power consumption, we compare the synthesized results with those of conventional structures. Fourth-order modulators implemented with the proposed technique use up to 26% less area compared with conventional implementations. Due to the area reduction, the designs consume up to 33% less dynamic power. Furthermore, they can operate at a frequency 100 MHz higher than that of the conventional implementations.

  • 5.
    Ahmed, Mohsin Niaz
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    LTE Uplink Modeling and Channel Estimation
    2011. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    This master's thesis investigates the uplink transmission from User Equipment (UE) to the base station in LTE (Long Term Evolution) and channel estimation using pilot symbols with parameters defined in the 3GPP (3rd Generation Partnership Project) specifications. The purpose of the thesis was to implement a simulator that can generate the uplink signal as it is generated by the UE. LTE is the name given to the long-term evolution of the Third Generation (3G) mobile system. This thesis focuses on the uplink of LTE, where single-carrier frequency division multiple access (SC-FDMA) is used as the multiple access technique. Its advantage over orthogonal frequency division multiple access (OFDMA), which is used in the downlink, is better peak power characteristics, which are necessary in the uplink for better power efficiency in mobile terminals. To assess the performance of the uplink transmission, a realistic channel model for the wireless communication system is essential. The channel models used are those proposed by the International Telecommunication Union (ITU), and correct knowledge of these models is important for testing, optimization and performance improvement of signal processing algorithms. The channel estimation techniques used are Least Squares (LS) and Linear Minimum Mean Square Error (LMMSE) for different channel models. The performance of these algorithms has been measured in terms of Bit Error Rate (BER) and Signal to Noise Ratio (SNR).
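
    As a rough illustration of the LS technique named in the abstract, the sketch below divides each received pilot value by the known transmitted pilot to obtain a per-subcarrier channel estimate, and then uses that estimate for one-tap equalization. This is not the thesis's simulator: the function names, the toy channel and the pilot values are assumptions made for the example, and noise, SC-FDMA precoding and the 3GPP reference-signal layout are all left out.

```python
# Least-squares (LS) channel estimation at pilot positions: H_ls[k] = Y[k] / X[k].
# All names and values here are hypothetical; inputs are plain lists of complex numbers.

def ls_channel_estimate(received_pilots, known_pilots):
    """Return the per-subcarrier LS estimate Y[k] / X[k]."""
    if len(received_pilots) != len(known_pilots):
        raise ValueError("pilot vectors must have the same length")
    return [y / x for y, x in zip(received_pilots, known_pilots)]


def equalize(received_data, channel_estimate):
    """One-tap frequency-domain equalization with the estimated channel."""
    if len(received_data) != len(channel_estimate):
        raise ValueError("data and channel estimate must have the same length")
    return [y / h for y, h in zip(received_data, channel_estimate)]


# Noise-free toy example: without noise the LS estimate recovers the channel.
true_channel = [0.8 + 0.2j, 0.5 - 0.4j, 1.1 + 0.0j, 0.3 + 0.9j]
pilots = [1 + 0j, -1 + 0j, 0 + 1j, 0 - 1j]   # unit-magnitude pilot symbols
rx_pilots = [h * x for h, x in zip(true_channel, pilots)]
estimate = ls_channel_estimate(rx_pilots, pilots)

data = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]    # example data symbols
rx_data = [h * d for h, d in zip(true_channel, data)]
recovered = equalize(rx_data, estimate)
```

    With noise added to the received pilots, the LS estimate degrades because it does not use any channel statistics, which is the gap an LMMSE estimator is meant to close.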

  • 6.
    Alam, Syed Asad
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Techniques for Efficient Implementation of FIR and Particle Filtering
    2016. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    FIR filters occupy a central place in many signal processing applications which alter either the shape, the frequency or the sampling frequency of the signal. FIR filters are used because of their stability and the possibility of linear phase, but they require a higher filter order than IIR filters to achieve the same magnitude specifications. Depending on the size of the required transition bandwidth, the filter order can range from tens to hundreds or even thousands. Since the implementation of the filters in the digital domain requires multipliers and adders, high filter orders translate to a large number of these arithmetic units. Research towards reducing the complexity of FIR filters has been going on for decades, and the techniques used can be roughly divided into two categories: reduction in the number of multipliers and simplification of the multiplier implementation.

    One technique to reduce the number of multipliers is to use cascaded sub-filters with lower complexity to achieve the desired specification, known as frequency-response masking (FRM). One of the sub-filters is an upsampled model filter whose band edges are an integer multiple, termed the period L, of the target filter's band edges. Other sub-filters may include complement and masking filters which filter different parts of the spectrum to achieve the desired response. From an implementation point of view, time-multiplexing is beneficial because the maximum clock frequency supported by current state-of-the-art semiconductor technology generally does not correspond to the application-bound sample rate. A combination of these two techniques plays a significant role in the efficient implementation of FIR filters. Part of the work presented in this dissertation is architectures for time-multiplexed FRM filters that benefit from the inherent sparsity of the periodic model filters.
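
    The sparsity mentioned above can be seen in a small sketch: upsampling a model filter by the period L places L-1 zeros between consecutive coefficients, so the filter gets longer while the number of nonzero multiplications stays the same. The function name and the coefficient values are assumptions for illustration only; the masking filters and the architectures from the thesis are not modelled.

```python
# Periodic (upsampled) model filter: G(z) -> G(z^L).
# Hypothetical helper; coefficients are plain Python floats.

def periodic_model_filter(model_coefficients, period):
    """Insert period-1 zeros between consecutive model filter coefficients."""
    if period < 1:
        raise ValueError("period must be a positive integer")
    upsampled = []
    last = len(model_coefficients) - 1
    for k, coefficient in enumerate(model_coefficients):
        upsampled.append(coefficient)
        if k < last:
            upsampled.extend([0.0] * (period - 1))
    return upsampled


model = [0.25, 0.5, 0.25]                    # assumed toy model filter
periodic = periodic_model_filter(model, 3)   # period L = 3
nonzero_taps = sum(1 for c in periodic if c != 0.0)
```

    Here the periodic filter has seven taps but only three of them need a multiplier, which is the property a time-multiplexed architecture can exploit.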

    These time-multiplexed FRM filters not only reduce the number of multipliers but also lower the memory usage. Although the FRM technique requires a higher number of delay elements, it results in fewer memories and more energy-efficient memory schemes when time-multiplexed. Different memory arrangements and memory access schemes have also been discussed and compared in terms of their efficiency when using both single-port and dual-port memories. An efficient pipelining scheme has been proposed which reduces the number of pipelining registers while achieving similar clock frequencies. The single optimal point where the number of multiplications is minimum for non-time-multiplexed FRM filters is shown to become a function of both the period, L, and the time-multiplexing factor, M. This means that the minimum number of multipliers does not always correspond to the minimum number of multiplications, which also increases the flexibility of implementation. These filters are shown to achieve a power reduction between 23% and 68% for the considered examples.

    To simplify the multiplier, alternative number systems such as the logarithmic number system (LNS) have been used to implement FIR filters, which reduces multiplications to additions. FIR filters are realized by directly designing them using integer linear programming (ILP) in the LNS domain in the minimax sense under finite word length constraints. The branch and bound algorithm, a typical algorithm for solving ILP problems, is implemented based on LNS integers, and several branching strategies are proposed and evaluated. The filter coefficients thus obtained are compared with the traditional finite word length coefficients obtained in the linear domain. It is shown that LNS FIR filters provide a better approximation error compared to a standard FIR filter for a given coefficient word length.
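
    The reduction of multiplication to addition can be sketched as follows, assuming a simple LNS format of a sign bit plus a base-2 logarithm stored as an integer with a fixed number of fractional bits. The format, the constant `FRAC_BITS` and the function names are assumptions for the example and are not taken from the thesis; zero and LNS addition are not handled.

```python
import math

FRAC_BITS = 8  # assumed number of fractional bits of the stored logarithm


def to_lns(x):
    """Encode a nonzero real number as (sign, integer-coded log2 magnitude)."""
    if x == 0:
        raise ValueError("zero needs a special code in LNS; not handled in this sketch")
    sign = 1 if x < 0 else 0
    return sign, round(math.log2(abs(x)) * (1 << FRAC_BITS))


def from_lns(value):
    """Decode (sign, integer-coded log2 magnitude) back to a float."""
    sign, log_code = value
    magnitude = 2.0 ** (log_code / (1 << FRAC_BITS))
    return -magnitude if sign else magnitude


def lns_multiply(a, b):
    """Multiply two LNS numbers: XOR the signs and add the integer log codes."""
    return a[0] ^ b[0], a[1] + b[1]


product = from_lns(lns_multiply(to_lns(4.0), to_lns(-0.5)))
approximate = from_lns(lns_multiply(to_lns(3.0), to_lns(5.0)))
```

    Powers of two are represented exactly, while other values such as 3 and 5 are rounded in the logarithmic domain, so `approximate` is close to, but not exactly, 15. This rounding is the finite word length effect that the coefficient optimization in the abstract addresses.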

    FIR filters also offer an opportunity for complexity reduction by implementing the multipliers using Booth or standard high-radix multiplication. Both of these multiplication schemes generate pre-computed multiples of the multiplicand, which are then selected based on the encoded bits of the multiplier. In transposed direct form (TDF) FIR filters, one input sample is multiplied by a number of coefficients, and complexity can be reduced by sharing the pre-computation of the multiples of the input data across all multiplications. Part of this work includes a systematic and unified approach to the design of such computation sharing multipliers and a comparison of the two forms of multiplication. It also gives closed-form expressions for the cost of different parts of the multiplication and gives an overview of various ways to implement the select unit with respect to the design of multiplexers.

    Particle filters are used to solve problems that require estimation of the state of a system. An improved resampling scheme for reducing the latency of the resampling stage is proposed, which uses a pre-fetch technique to reduce the latency by between 50% and 95%, depending on the number of pre-fetches. Generalized division-free architectures and compact memory structures are also proposed that map to different resampling algorithms, help in reducing the complexity of the multinomial resampling algorithm, and reduce the number of memories required by up to 50%.

    List of papers
    1. A unified approach to the design and implementation of computation sharing multipliers: Computation sharing multipliers
    (English) Manuscript (preprint) (Other academic)
    Abstract [en]

    A unified approach to the design and implementation of computation sharing multipliers based on Booth and standard high-radix multiplication schemes is presented here. Both of these multiplication schemes have various building blocks, one of which is the pre-computer, which can be shared across a number of multiplications if the multiplicand of the multipliers is the same, as in a transposed direct form (TDF) finite-length impulse response (FIR) filter. Closed-form expressions to estimate the cost of different building blocks based on different schemes have been developed and analyzed in different dimensions. Multipliers, both standalone and as part of computation sharing in FIR filters and complex multipliers, have been realized in hardware and synthesized using a standard cell library.

    It is shown that, apart from word length and filter length, the ratio between the cost of implementing adders and multiplexers has an effect on the choice of optimal radix. The higher the ratio, the lower the cost of implementing multiplexers, which benefits high radix. A higher radix will also benefit from computation sharing if its cost of one multiplication is less than that of the lower radix, and it is shown that the radix-16 Booth multiplier achieves lower area complexity and power consumption by an average of 7% and 17%, respectively.

    Keywords
    Computation sharing multipliers, standard high-radix multiplier, Booth multiplier, FIR filter
    National Category
    Electrical Engineering, Electronic Engineering, Information Engineering
    Identifiers
    urn:nbn:se:liu:diva-124194 (URN)
    Available from: 2016-01-21 Created: 2016-01-21 Last updated: 2016-02-02. Bibliographically approved
    2. On the implementation of time-multiplexed frequency-response masking filters
    2016 (English). In: IEEE Transactions on Signal Processing, ISSN 1053-587X, E-ISSN 1941-0476, Vol. 64, no 15, p. 3933-3944. Article in journal (Refereed). Published
    Abstract [en]

    The complexity of narrow transition band finite-length impulse response (FIR) filters is high and can be reduced by using frequency-response masking (FRM) techniques. These techniques use a combination of periodic model filters and, possibly periodic, masking filters. Time-multiplexing is in general beneficial since only rarely do the technology-bound maximum obtainable clock frequency and the application-determined required sample rate correspond. Therefore, architectures for time-multiplexed FRM filters that benefit from the inherent sparsity of the periodic filters are introduced in this work.

    We show that FRM filters not only reduce the number of multipliers needed, but also have benefits in terms of memory usage. Although the total number of samples to be stored is larger for FRM, it results in fewer memory resources needed in FPGAs and more energy-efficient memory schemes in ASICs. In total, the power consumption is significantly reduced compared to a single-stage implementation. Furthermore, we show that the choice of the interpolation factor which gives the least complexity for the periodic model filter and subsequent masking filter(s) is a function of the time-multiplexing factor, meaning that the minimum number of multipliers does not always correspond to the minimum number of multiplications. Both single-port and dual-port memories are considered, and the involved trade-off between the number of multipliers and memory complexity is illustrated. The results show that for FPGA implementation, the power reduction ranges from 23% to 68% for the considered examples.

    Place, publisher, year, edition, pages
    Institute of Electrical and Electronics Engineers (IEEE), 2016
    Keywords
    Frequency-response masking, FIR filter, FPGA, ASIC, time-multiplexing, memories
    National Category
    Electrical Engineering, Electronic Engineering, Information Engineering
    Identifiers
    urn:nbn:se:liu:diva-124190 (URN), 10.1109/TSP.2016.2557298 (DOI), 000379699800009 ()
    Note

    At the time of the thesis defence, this publication existed as a manuscript.

    Available from: 2016-01-21 Created: 2016-01-21 Last updated: 2017-11-30. Bibliographically approved
    3. Design of Finite Word Length Linear-Phase FIR Filters in the Logarithmic Number System Domain
    2014 (English). In: VLSI design (Print), ISSN 1065-514X, E-ISSN 1563-5171, Vol. 2014, no 217495. Article in journal (Refereed). Published
    Abstract [en]

    Logarithmic number system (LNS) is an attractive alternative to realize finite-length impulse response filters because of multiplication in the linear domain being only addition in the logarithmic domain. In the literature, linear coefficients are directly replaced by the logarithmic equivalent. In this paper, an approach to directly optimize the finite word length coefficients in the LNS domain is proposed. This branch and bound algorithm is implemented based on LNS integers and several different branching strategies are proposed and evaluated. Optimal coefficients in the minimax sense are obtained and compared with the traditional finite word length representation in the linear domain as well as using rounding. Results show that the proposed method naturally provides smaller approximation error compared to rounding. Furthermore, they provide insights into finite word length properties of FIR filter coefficients in the LNS domain and show that LNS FIR filters typically provide a better approximation error compared to a standard FIR filter.

    Place, publisher, year, edition, pages
    Egypt: Hindawi Publishing Corporation, 2014
    Keywords
    Logarithmic Number System, FIR Filter, Integer Linear Programming, Branch and Bound
    National Category
    Signal Processing
    Identifiers
    urn:nbn:se:liu:diva-105861 (URN), 10.1155/2014/217495 (DOI)
    Available from: 2014-04-10 Created: 2014-04-10 Last updated: 2017-12-05. Bibliographically approved
    4. Improved particle filter resampling architectures
    (English) Manuscript (preprint) (Other academic)
    Abstract [en]

    The most challenging aspect of particle filtering hardware implementation is the resampling step which replicates particles with large weights and discards those with small weights because it has a high latency and can only be partially executed in parallel with the other steps of particle filtering. To reduce the latency, an improved resampling scheme is proposed in this work which involves pre-fetching from the weight memory in parallel to the fetching of a value from a random function generator. Architectures for realizing the pre-fetch technique are also proposed. The trade-off between the latency reduction achieved by increasing the size of the pre-fetch memory and the architectural implementation complexity has been analyzed. Results show that a pre-fetch of five achieves the best area-latency trade-off while on average achieving an 85% reduction in the latency.

    We also propose a generic double multiplier architecture for resampling which avoids normalization divisions and makes the architecture equally efficient for non-powers-of-two number of particles as well as removes the need of explicitly ordering the random values for efficient multinomial resampling implementation. It is further improved by computing the cumulative sum of weights on-the-fly which helps in reducing the size of the weight memories by up to 50%.

    Keywords
    Particle filters, resampling algorithm, resampling architecture
    National Category
    Electrical Engineering, Electronic Engineering, Information Engineering
    Identifiers
    urn:nbn:se:liu:diva-124193 (URN)
    Available from: 2016-01-21 Created: 2016-01-21 Last updated: 2016-02-02. Bibliographically approved
  • 7.
    Alam, Syed Asad
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    A unified approach to the design and implementation of computation sharing multipliers: Computation sharing multipliers
    Manuscript (preprint) (Other academic)
    Abstract [en]

    A unified approach to the design and implementation of computation sharing multipliers based on Booth and standard high-radix multiplication schemes is presented here. Both of these multiplication schemes have various building blocks, one of which is the pre-computer, which can be shared across a number of multiplications if the multiplicand of the multipliers is the same, as in a transposed direct form (TDF) finite-length impulse response (FIR) filter. Closed-form expressions to estimate the cost of different building blocks based on different schemes have been developed and analyzed in different dimensions. Multipliers, both standalone and as part of computation sharing in FIR filters and complex multipliers, have been realized in hardware and synthesized using a standard cell library.

    It is shown that, apart from word length and filter length, the ratio between the cost of implementing adders and multiplexers has an effect on the choice of optimal radix. The higher the ratio, the lower the cost of implementing multiplexers, which benefits high radix. A higher radix will also benefit from computation sharing if its cost of one multiplication is less than that of the lower radix, and it is shown that the radix-16 Booth multiplier achieves lower area complexity and power consumption by an average of 7% and 17%, respectively.

  • 8.
    Alam, Syed Asad
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Generalized Division-Free Architecture and Compact Memory Structure for Resampling in Particle Filters
    2015. In: 2015 European Conference on Circuit Theory and Design (ECCTD), IEEE Press, 2015, p. 416-419. Conference paper (Refereed)
    Abstract [en]

    The most challenging step of implementing particle filtering is the resampling step, which replicates particles with large weights and discards those with small weights. In this paper, we propose a generic architecture for resampling which uses double multipliers to avoid normalization divisions and make the architecture equally efficient for non-powers-of-two numbers of particles. Furthermore, the complexity of resampling is greatly affected by the size of the memories used to store weights. We illustrate that storing the original weights instead of their cumulative sum, and calculating the cumulative sum online, reduces the total complexity in terms of area by between 21% and 45%, while giving up to 50% reduction in memory usage.
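
    The two ideas in this abstract, avoiding the normalization division and accumulating the cumulative sum on the fly, can be sketched in software as below. This is only an illustration under assumptions: the function name is hypothetical, the uniform numbers are passed in already sorted so that the example is deterministic, and the hardware architecture of the paper (including its double-multiplier structure) is not reproduced.

```python
# Resampling on unnormalized weights: instead of dividing every weight by the
# weight sum, each uniform number is multiplied by the sum before comparison.

def resample_division_free(weights, sorted_uniforms):
    """Return the selected particle index for each uniform number in [0, 1).

    `sorted_uniforms` must be in ascending order. The cumulative sum is built
    from the original weights while scanning, so no cumulative array is stored.
    """
    if not weights:
        raise ValueError("at least one weight is required")
    total = sum(weights)
    indices = []
    i = 0
    cumulative = weights[0]
    for u in sorted_uniforms:
        threshold = u * total              # multiplication replaces normalization
        while cumulative <= threshold and i < len(weights) - 1:
            i += 1
            cumulative += weights[i]       # running sum of the original weights
        indices.append(i)
    return indices


selected = resample_division_free([1, 3, 4], [0.0, 0.2, 0.6, 0.9])
```

    In this toy case the particle with the largest weight is selected twice, and the weight count of three is not a power of two, which does not matter because no division by the number of particles or by the weight sum is performed.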

  • 9.
    Alam, Syed Asad
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Improved particle filter resampling architectures
    Manuscript (preprint) (Other academic)
    Abstract [en]

    The most challenging aspect of particle filtering hardware implementation is the resampling step which replicates particles with large weights and discards those with small weights because it has a high latency and can only be partially executed in parallel with the other steps of particle filtering. To reduce the latency, an improved resampling scheme is proposed in this work which involves pre-fetching from the weight memory in parallel to the fetching of a value from a random function generator. Architectures for realizing the pre-fetch technique are also proposed. The trade-off between the latency reduction achieved by increasing the size of the pre-fetch memory and the architectural implementation complexity has been analyzed. Results show that a pre-fetch of five achieves the best area-latency trade-off while on average achieving an 85% reduction in the latency.

    We also propose a generic double multiplier architecture for resampling which avoids normalization divisions and makes the architecture equally efficient for non-powers-of-two number of particles as well as removes the need of explicitly ordering the random values for efficient multinomial resampling implementation. It is further improved by computing the cumulative sum of weights on-the-fly which helps in reducing the size of the weight memories by up to 50%.

  • 10.
    Alam, Syed Asad
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    On the implementation of time-multiplexed frequency-response masking filters
    2016. In: IEEE Transactions on Signal Processing, ISSN 1053-587X, E-ISSN 1941-0476, Vol. 64, no 15, p. 3933-3944. Article in journal (Refereed)
    Abstract [en]

    The complexity of narrow transition band finite-length impulse response (FIR) filters is high and can be reduced by using frequency-response masking (FRM) techniques. These techniques use a combination of periodic model filters and, possibly periodic, masking filters. Time-multiplexing is in general beneficial since only rarely do the technology-bound maximum obtainable clock frequency and the application-determined required sample rate correspond. Therefore, architectures for time-multiplexed FRM filters that benefit from the inherent sparsity of the periodic filters are introduced in this work.

    We show that FRM filters not only reduce the number of multipliers needed, but also have benefits in terms of memory usage. Although the total number of samples to be stored is larger for FRM, it results in fewer memory resources needed in FPGAs and more energy-efficient memory schemes in ASICs. In total, the power consumption is significantly reduced compared to a single-stage implementation. Furthermore, we show that the choice of the interpolation factor which gives the least complexity for the periodic model filter and subsequent masking filter(s) is a function of the time-multiplexing factor, meaning that the minimum number of multipliers does not always correspond to the minimum number of multiplications. Both single-port and dual-port memories are considered, and the involved trade-off between the number of multipliers and memory complexity is illustrated. The results show that for FPGA implementation, the power reduction ranges from 23% to 68% for the considered examples.

  • 11.
    Alexandersson, Johan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Nordin, Olle
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Implementation of CAN Communication Stack in AUTOSAR
    2015. Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
    Abstract [en]

    In the automotive industry today, embedded systems have reached a level of complexity which is not maintainable with the traditional approach of designing automotive embedded systems. For this reason, many of the world's leading automotive manufacturers have formed an alliance to address this problem. This has resulted in AUTOSAR, an open standardized architecture for automotive embedded systems, which strives for increased flexibility and safety regulations. This thesis explores the possibilities of implementing a CAN communication stack using the AUTOSAR architecture and its corresponding methodology. As a result of this thesis, a complete AUTOSAR CAN communication stack has been implemented, as well as a simulator application with the purpose of testing its functionality.

  • 12.
    Alexandersson, Johan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    Nordin, Olle
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    Implementation of SLAM Algorithms in a Small-Scale Vehicle Using Model-Based Development
    2017. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    As autonomous driving is rapidly becoming the next major challenge in the automotive industry, the problem of Simultaneous Localization And Mapping (SLAM) has never been more relevant than it is today. This thesis presents the idea of examining SLAM algorithms by implementing such an algorithm on a radio-controlled car which has been fitted with sensors and microcontrollers. The software architecture of this small-scale vehicle is based on the Robot Operating System (ROS), an open-source framework designed to be used in robotic applications.

    This thesis covers Extended Kalman Filter (EKF)-based SLAM, FastSLAM, and GraphSLAM, examining these algorithms in theoretical investigations, simulations, and real-world experiments. The method used in this thesis is model-based development, meaning that a model of the vehicle is first implemented in order to be able to perform simulations using each algorithm. The decision on which algorithm to implement on the physical vehicle is then made based on these simulation results, as well as a theoretical investigation of each algorithm.

    This thesis has resulted in a dynamic model of a small-scale vehicle which can be used for simulation of any ROS-compliant SLAM algorithm, and this model has been simulated extensively in order to provide empirical evidence to define which SLAM algorithm is most suitable for this application. Out of the algorithms examined, FastSLAM was proven to be the best candidate, and was in the final stage, through usage of the ROS package gMapping, successfully implemented on the small-scale vehicle.

  • 13.
    Andersson Holmström, Simon
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    Adaptive TDC: Implementation and Evaluation of an FPGA2015Independent thesis Basic level (degree of Bachelor), 10,5 credits / 16 HE creditsStudent thesis
    Abstract [en]

    A time-to-digital converter (TDC) is a digital unit that measures the time interval between two events. This is useful to determine the characteristics and patterns of a signal or an event. In this thesis a hybrid TDC is presented, consisting of a tapped delay line and a clock counter principle.

    The TDC is used to measure the time between received data in a QKD application. If the measured time does not exceed a certain value, then the data has been sent without any interception. It is also possible to use TDCs in other fields such as laser-ranging and time-of-flight applications.

    The TDC consists of two carry chains, an encoder, a FIFO and a counter for each channel, an AXI module and a control unit to generate command signals to all channels that are implemented. The time is measured by sampling the signal that has propagated through the carry chain and encoding the propagation length from this sample.

    In this thesis a TDC is implemented that has a 10 ns dead time and a resolution below 28 ps in a four-channel mode. The propagation variation is approximately two percent of the total value during testing. For the implementation, an FPGA board with a Zynq XC7Z020 SoC is used with SystemVerilog, which is a hardware description language (HDL).
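
    The hybrid principle described in this abstract (a coarse clock counter combined with a fine carry-chain sample) can be illustrated with a minimal Python sketch. The clock period, the per-tap delay, and all function names are hypothetical values chosen for illustration; the abstract does not state them, and the thesis encoder may work differently.

```python
# Hypothetical parameters for illustration only (not taken from the thesis).
CLOCK_PERIOD_PS = 5000  # assumed 200 MHz counter clock
TAP_DELAY_PS = 25       # assumed uniform delay per carry-chain tap


def encode_propagation_length(sample):
    """Return how many taps the hit had passed when the chain was sampled.

    Counting the ones, instead of locating the first 1-to-0 edge,
    tolerates isolated bubbles in the thermometer code.
    """
    return sum(1 for bit in sample if bit)


def timestamp_ps(coarse_count, sample):
    """Event time: sampling clock edge minus the time the hit spent in the chain."""
    taps = encode_propagation_length(sample)
    return coarse_count * CLOCK_PERIOD_PS - taps * TAP_DELAY_PS


def interval_ps(start, stop):
    """Interval between two events, each given as (coarse_count, sample)."""
    return timestamp_ps(*stop) - timestamp_ps(*start)
```

    The coarse counter sets the measurable range, while the carry chain refines the result to a fraction of a clock period.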

  • 14.
    Andersson, Niklas
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Vesterbacka, Mark
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oskar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Wikner, Jacob
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Steady-state cycles in digital oscillators2014Manuscript (preprint) (Other academic)
    Abstract [en]

    Digital recursive oscillators locked in steady-state can be used to generate sinusoids with high spectral purity. The locking occurs when the oscillator returns to a previously visited state and repeats its sequence. In this work we propose a new search algorithm and two new search strategies to find all steady-states for a given oscillator configuration. The improvement in spurious-free dynamic range is between 7 and 40 dB compared to previously reported results. The algorithm is also able to find oscillator sequences for more frequencies than previously reported work. A key part of the method is the reduction of the search space made possible by a proposed extension of existing theory on recursive oscillators. Specific properties of digital oscillators in a steady-state are also discussed. It is shown that the initial states can be used to individually control the phase, amplitude, spectral purity, and also cycle length of the oscillator output.
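
    The notion of locking, where the oscillator returns to a previously visited state and then repeats its sequence, can be sketched in Python as follows. The recursion form y[n] = trunc(a*y[n-1]) - y[n-2], the truncating fixed-point multiply, and the function name are assumptions made for illustration; this brute-force walk is not the search algorithm proposed in the manuscript.

```python
def find_steady_state(coeff, frac_bits, y1, y2, max_steps=1_000_000):
    """Iterate a quantized second-order recursive oscillator until a state repeats.

    coeff is the coefficient a scaled by 2**frac_bits (assumed fixed-point format).
    Returns (transient_length, cycle_length), or None if no repeat is seen.
    """
    state = (y1, y2)
    seen = {state: 0}  # state -> step index at which it was first visited
    for step in range(1, max_steps + 1):
        prev1, prev2 = state
        # Truncating fixed-point multiply followed by the recursion.
        state = (((coeff * prev1) >> frac_bits) - prev2, prev1)
        if state in seen:
            first = seen[state]
            return first, step - first
        seen[state] = step
    return None
```

    With a = 1 and initial state (1, 0), the sequence 1, 1, 0, -1, -1, 0 repeats every six samples, so the call `find_steady_state(1 << 8, 8, 1, 0)` reports no transient and a cycle length of 6.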

  • 15.
    Andersson, Olof
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    Bengtsson, Karl
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    Adapting an FPGA-optimized microprocessor to the MIPS32 instruction set2010Independent thesis Advanced level (professional degree), 20 credits / 30 HE creditsStudent thesis
    Abstract [en]

    Nowadays, FPGAs are large enough to host entire system-on-chip designs, wherein a soft core processor is often an integral part. High performance of the processor is always desirable, so there is an interest in finding faster solutions. This report aims to describe the work performed and the results obtained by Karl Bengtsson and Olof Andersson at ISY. The task was to continue the development of a soft core microprocessor, originally created by Andreas Ehliar. The first step was to decide on a more widely adopted instruction set for the processor. The choice fell upon the MIPS32 instruction set. The main work of the project has been focused on implementing support for MIPS32, allowing the processor to execute MIPS assembly language programs. The development has been done with speed optimization in mind. For every new function, the effects on the maximum frequency have been considered, and solutions not satisfying the speed requirements have been abandoned or revised. The performance has been measured by running a benchmark program, Coremark. Comparison has also been made to the main competitors among soft core processors. The results were positive, and reported a higher Coremark score than the other processors in the study. The processor described herein still lacks many essential features. Nevertheless, the conclusion is that it may be possible to create a competitive alternative to established soft processors.

  • 16.
    Andreasson, Robert
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    Design of an FPGA Based JTAG Recorder for use in Production of IPTV Set-Top Boxes2009Independent thesis Advanced level (professional degree), 20 credits / 30 HE creditsStudent thesis
    Abstract [en]

    This thesis evaluates the possibility of replacing the manufacturer-dependent JTAG device used in the production tests of IPTV set-top boxes for storing the boot loader in the main memory in order to start the box for the first time. An FPGA-based prototype was built in order to see if it is possible to record the JTAG signals to an external DDR SDRAM without understanding them, and to perform a delayed playback resulting in the same behavior as with the original JTAG device. Overall the thesis was successful, and it shows that it is in fact feasible to create a JTAG recorder based on an FPGA. A lot of data is used for storing the sequence, though, so the use of a fast memory is crucial. However, in this thesis the speed of both the recording and the delayed playback was reduced in order to work properly.

  • 17.
    Asghar, Rizwan
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Flexible Interleaving Sub–systems for FEC in Baseband Processors2010Doctoral thesis, monograph (Other academic)
    Abstract [en]

    Interleaving is always used in combination with error control coding. It spreads the burst noise, and changes the burst noise to white noise so that the noise-induced bit errors can be corrected. With the advancement of communication systems and the substantial increase in bandwidth requirements, the use of coding for forward error correction (FEC) has become an integral part of modern communication systems. Dividing the FEC sub-systems into two categories, i.e. channel coding/de-coding and interleaving/de-interleaving, the latter appears to vary more in permutation functions, block sizes and throughput requirements. The interleaving/de-interleaving consumes more silicon due to the silicon cost of the permutation tables used in conventional LUT-based approaches. For multi-standard support devices the silicon cost of the permutation tables can grow much higher, resulting in an inefficient solution. Therefore, hardware re-use among different interleaver modules to support a multimode processing platform is of significance.

    The broadness of the interleaving algorithms gives rise to many challenges when considering a true multimode interleaver implementation. The main challenges include real-time low-latency computation for different permutation functions, managing a wide range of interleaving block sizes, higher throughput, low cost, fast and dynamic reconfiguration for different standards, and introducing parallelism wherever necessary.

    It is difficult to merge all currently used interleavers into a single architecture because of different algorithms and throughputs; however, the fact that multimode coverage does not require multiple interleavers to work at the same time provides opportunities to use hardware multiplexing. The multimode functionality is then achieved by fast switching between different standards. We used algorithmic level transformations such as 2-D transformation and realization of recursive computations, which appear to be the key to bringing different interleaving functions to the same level. In general, the work focuses on function level hardware re-use, but it also utilizes classical data-path level optimizations for efficient hardware multiplexing among different standards.

    The research has resulted in multiple flexible architectures supporting multiple standards. These architectures target both channel interleaving and turbo-code interleaving. The presented architectures can support both types of communication systems, i.e. single-stream and multi-stream systems. Introducing the algorithmic level transformations and then applying hardware re-use methodology has resulted in lower silicon cost while supporting sufficient throughput. According to a database search in March 2010, we have the first multimode interleaver core covering WLAN (802.11a/b/g and 802.11n), WiMAX (802.16e), 3GPP-WCDMA, 3GPP-LTE, and DVB-T/H on a single architecture with minimum silicon cost. The research also provides support for parallel interleaver address generation using different architectures. It provides the algorithmic modifications and architectures to generate up to 8 addresses in parallel and handle the memory conflicts on-the-fly.

    One of the vital requirements for multimode operation is the fast switching between different standards, which is supported by the presented architectures with minimal cycle cost overheads. Fast switching between different standards allows the baseband processor to re-configure the interleaver architecture on-the-fly and re-use the same hardware for another standard. Lower silicon cost, maximum flexibility and fast switchability among multiple standards during run time make the proposed research a good choice for radio baseband processing platforms.

  • 18.
    Asghar, Rizwan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    2-D Realization of WiMAX Channel Interleaver for Efficient Hardware Implementation2009In: Proceedings of World Academy of Science, Engineering and Technology (ISSN: 2070-3740), 2009, p. 25-29Conference paper (Refereed)
    Abstract [en]

    The direct implementation of interleaver functions in WiMAX is not hardware efficient due to the presence of complex functions. Also, the conventional method, i.e. using memories for storing the permutation tables, is silicon consuming. This work presents a 2-D transformation for WiMAX channel interleaver functions which reduces the overall hardware complexity to compute the interleaver addresses on the fly. A fully re-configurable architecture for address generation in the WiMAX channel interleaver is presented, which consumes 1.1 k-gates in total. It can be configured for any block size and any modulation scheme in WiMAX. The presented architecture can run at a frequency of 200 MHz, thus fully supporting the high bandwidth requirements of WiMAX.
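
    To illustrate why a two-dimensional (row, column) view helps with on-the-fly address generation, the Python sketch below uses a generic row-column block permutation, not the actual WiMAX interleaver functions. It compares a direct formula, which needs a multiplication, a division and a modulo per address, with a counter-based recursion that needs only additions and comparisons. The function names and the permutation itself are assumptions chosen for illustration.

```python
def direct_addresses(n, d):
    """Row-column permutation computed per index with multiply, divide and modulo."""
    step = n // d
    return [step * (k % d) + k // d for k in range(n)]


def recursive_addresses(n, d):
    """Same permutation generated with two counters and one adder.

    col counts positions within a group of d; row counts completed groups.
    """
    if n % d != 0:
        raise ValueError("block size n must be a multiple of d")
    step = n // d
    addr, col, row = 0, 0, 0
    out = []
    for _ in range(n):
        out.append(addr)
        col += 1
        if col == d:
            # Group finished: restart from the next base address.
            col = 0
            row += 1
            addr = row
        else:
            addr += step
    return out
```

    Because each new address depends only on the previous one and two small counters, no permutation table has to be stored.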

  • 19.
    Asghar, Rizwan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Dual standard re-configurable hardware interleaver for turbo decoding2008Conference paper (Refereed)
    Abstract [en]

    A very low cost re-configurable hardware interleaver for two standards, 3GPP-WCDMA and 3GPP Long Term Evolution (3GPP-LTE), is presented. The interleaver is a key component of radio communication systems. Using conventional design methods, it consumes a large part of the silicon area in the design of the turbo encoder and decoder. The presented hardware interleaver address generation architecture utilizes algorithmic level hardware simplifications to achieve a very low cost solution. After the hardware optimizations, the proposed architecture consumes only 3.1k gates with a 256x8 bit memory for the fully re-configurable dual standard interleaver address generator. The interleaved address is computed every clock cycle except in the case of pruning (if the block size is less than the row-column matrix) in 3GPP-WCDMA. In this case one additional clock cycle is consumed for valid address generation.
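
    For the 3GPP-LTE case, a commonly used way to produce one interleaved address per clock cycle without a multiplier is to evaluate the quadratic permutation polynomial (QPP) Π(i) = (f1·i + f2·i²) mod K recursively. The Python sketch below shows that generic technique; it is not claimed to be the architecture of this paper. The parameters K = 40, f1 = 3, f2 = 10 are the smallest block size in the LTE parameter table, and the function names are illustrative.

```python
def qpp_direct(k_size, f1, f2):
    """QPP addresses computed directly from the polynomial."""
    return [(f1 * i + f2 * i * i) % k_size for i in range(k_size)]


def qpp_recursive(k_size, f1, f2):
    """QPP addresses from two running sums, using only add and conditional subtract.

    pi(i+1) = pi(i) + g(i) mod K, with g(i) = f1 + f2*(2i + 1),
    and g(i+1) = g(i) + 2*f2 mod K.
    """
    addr = 0
    g = (f1 + f2) % k_size
    step = (2 * f2) % k_size
    out = []
    for _ in range(k_size):
        out.append(addr)
        addr += g
        if addr >= k_size:  # both operands are below K, so one subtraction suffices
            addr -= k_size
        g += step
        if g >= k_size:
            g -= k_size
    return out
```

    Both running sums stay below K, so each modulo reduces to a single compare-and-subtract per cycle.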

  • 20.
    Asghar, Rizwan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Low Complexity Hardware Interleaver for MIMO-OFDM based Wireless LAN2009In: Proceedings - IEEE International Symposium on Circuits and Systems, 2009, p. 1747-1750Conference paper (Refereed)
    Abstract [en]

    A low complexity hardware interleaver architecture is presented for MIMO-OFDM based Wireless LAN, e.g. 802.11n. The novelty of the presented architecture is twofold: 1) flexibility to choose the interleaver implementation with different modulation schemes and different sizes for different spatial streams in a multi-antenna system; 2) the complexity of computing the interleaver address on the fly is reduced by using recursion and is supported by mathematical formulation. The proposed interleaver architecture is implemented in a 65 nm CMOS process and it consumes 0.035 mm² of area. The proposed architecture supports high speed communication with a maximum throughput of 900 Mbps at a clock rate of 225 MHz.

  • 21.
    Asghar, Rizwan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Low Complexity Multi Mode Interleaver Core for WiMax with Support for Convolutional Interleaving2009In: International Journal of Electronics, Communications and Computer Engineering, Vol. 1, no 1, p. 20-29Article in journal (Refereed)
    Abstract [en]

    A hardware efficient, multi mode, re-configurable architecture of interleaver/de-interleaver for multiple standards, like DVB, WiMAX and WLAN, is presented. The interleavers consume a large part of the silicon area when implemented using conventional methods, as they use memories to store permutation patterns. In addition, different types of interleavers in different standards cannot share the hardware due to different construction methodologies. The novelty of the work presented in this paper is threefold: 1) mapping of vital types of interleavers, including the convolutional interleaver, onto a single architecture with flexibility to change the interleaver size; 2) hardware complexity for channel interleaving in WiMAX is reduced by using a 2-D realization of the interleaver functions; and 3) silicon cost overheads are reduced by avoiding the use of small memories. The proposed architecture consumes 0.18 mm² of silicon area in a 0.12 μm process and can operate at a frequency of 140 MHz. The reduced complexity helps in minimizing the memory utilization, and at the same time provides strong support for on-the-fly computation of permutation patterns.

  • 22.
    Asghar, Rizwan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Multimode flex-interleaver core for baseband processor platform2010In: Journal of Computer Systems, Networks and Communications, ISSN 1687-7381, Vol. 2010, p. 1-16Article in journal (Refereed)
    Abstract [en]

    This paper presents a flexible interleaver architecture supporting multiple standards like WLAN, WiMAX, HSPA+, 3GPP-LTE, and DVB. Algorithmic level optimizations like 2D transformation and realization of recursive computation are applied, which appear to be the key to reaching an efficient hardware multiplexing among different interleaver implementations. The presented hardware enables the mapping of vital types of interleavers, including multiple block interleavers and the convolutional interleaver, onto a single architecture. By exploiting the hardware reuse methodology the silicon cost is reduced, and it consumes 0.126 mm² of area in total in a 65 nm CMOS process for a fully reconfigurable architecture. It can operate at a frequency of 166 MHz, providing a maximum throughput of up to 664 Mbps for a multistream system and 166 Mbps for single stream communication systems, respectively. One of the vital requirements for multimode operation is the fast switching between different standards, which is supported by this hardware with minimal cycle cost overheads. Maximum flexibility and fast switchability among multiple standards during run time make the proposed architecture a good choice for the radio baseband processing platform.

  • 23.
    Asghar, Rizwan
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Programmable Parallel Data-path for FEC2007In: Swedish System-on-Chip Conference, SSoCC,2007, 2007Conference paper (Other academic)
  • 24.
    Asghar, Rizwan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Towards Radix-4, Parallel Interleaver Design to Support High-Throughput Turbo Decoding for Re-Configurability2010Conference paper (Refereed)
    Abstract [en]

    Parallel, radix-4 turbo decoding is used to enhance the throughput and at the same time reduce the overall memory cost. The bottleneck is the higher complexity associated with radix-4 parallel interleaver implementation. This paper addresses the implementation issues of the radix-4, parallel interleaver and also proposes necessary modifications in the interleaver algorithms for parallel address generation. It presents a re-configurable architecture which enables the same turbo decoding core to be used for multiple standards. The proposed interleaver architecture is capable of handling the memory conflicts on-the-fly. It consumes 12.5K gates and can run at a frequency of 285 MHz, thus supporting a throughput of 173.3 Mbps, which can cover most of the emerging communication standards.

  • 25.
    Asghar, Rizwan
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Very Low Cost Configurable Hardware Interleaver for 3G Turbo Decoding2008In: IEEE Internation Conference on Information and Communication Tech from Theory to Applications, ICTTA,2008, IEEE , 2008, p. 2314-2318Conference paper (Refereed)
    Abstract [en]

    A very low cost hardware interleaver for 3rd Generation Partnership Project (3GPP) turbo coding algorithm is presented. The interleaver is a key component of turbo codes and it is used to minimize the effect of burst errors in the transmission. Using conventional design methods, it consumes a large part of silicon area in the design of turbo encoder and decoder. The presented hardware interleaver architecture utilizes the algorithmic level hardware simplifications as well as the iterative modulo computation to achieve very low cost solution. After doing the hardware multiplexing and optimization the proposed architecture consumes only 1.5 k gates (without pre-computation) and 2.2 k gates (with pre-computation). In both cases the interleaved address is computed every clock cycle except the case of pruning, in which one additional clock cycle is consumed.

  • 26.
    Asghar, Rizwan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Wu, Di
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Eilert, Johan
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Memory Conflict Analysis and Implementation of a Re-configurable Interleaver Architecture Supporting Unified Parallel Turbo Decoding2010In: Journal of Signal Processing Systems for Signal, Image, and Video Technology, ISSN 1939-8018, Vol. 60, no 1, p. 15-29Article in journal (Refereed)
    Abstract [en]

    This paper presents a novel hardware interleaver architecture for unified parallel turbo decoding. The architecture is fully re-configurable among multiple standards like HSPA Evolution, DVB-SH, 3GPP-LTE and WiMAX. Turbo codes, widely used for error correction in today’s consumer electronics, are prone to introducing higher latency due to bigger block sizes and multiple iterations. Many parallel turbo decoding architectures have recently been proposed to enhance the channel throughput, but the interleaving algorithms used in different standards do not freely allow their use due to a higher percentage of memory conflicts. The architecture presented in this paper provides a re-configurable platform for implementing the parallel interleavers for different standards by managing the conflicts involved in each. The memory conflicts are managed by applying different approaches like stream misalignment, memory division and use of a small FIFO buffer. The proposed flexible architecture is low cost and consumes 0.085 mm² of area in a 65 nm CMOS process. It can implement up to 8 parallel interleavers and can operate at a frequency of 200 MHz, thus providing significant support to higher throughput systems based on parallel SISO processors.

  • 27.
    Asghar, Rizwan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Wu, Di
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Eilert, Johan
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Memory Conflict Analysis and Interleaver Design for Parallel Turbo Decoding Supporting HSPA Evolution2009In: 12th EUROMICRO Conference on Digital System Design, 2009, p. 699-706Conference paper (Refereed)
    Abstract [en]

    HSPA evolution has raised the throughput requirements for WCDMA based systems where turbo code has been adapted to perform the error correction. Many parallel turbo decoding architectures have recently been proposed to enhance the channel throughput, but the interleaving algorithm used in WCDMA based systems does not freely allow their use due to a high percentage of memory conflicts. This paper provides a comprehensive analysis for reduction of interleaver memory conflicts while generating more than one address in a single clock cycle. It also provides a trade-off analysis in terms of area and power efficiency for multiple architectures for different functions involved in the interleaver design. The final architecture supports processing of two parallel SISO blocks and manages the conflicts by applying different approaches like stream misalignment, memory division and a small FIFO buffer. The proposed architecture is low cost and consumes 4.3K gates at a frequency of 150 MHz. This work also focuses on reduction of pre-processing overheads by introducing segment-based modulo computation, thus providing further relaxation to the SISO decoding process.
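
    What a memory conflict means when several interleaved addresses are generated in one clock cycle can be made concrete with a small Python sketch. It assumes a simple memory division in which the block is split into equally sized windows, one per parallel SISO unit, and each window is stored in its own memory bank; the bank mapping and the function name are assumptions for illustration, not the analysis method of the paper.

```python
def count_conflict_cycles(perm, parallel):
    """Count clock cycles in which two parallel accesses hit the same bank.

    perm     -- interleaver permutation as a list of addresses
    parallel -- number of addresses generated per clock cycle
    Unit p reads perm[t + p * window] in cycle t, and address a lives
    in bank a // window (assumed mapping).
    """
    size = len(perm)
    if size % parallel != 0:
        raise ValueError("block size must be a multiple of the parallelism")
    window = size // parallel
    conflicts = 0
    for t in range(window):
        banks = {perm[t + p * window] // window for p in range(parallel)}
        if len(banks) < parallel:
            conflicts += 1
    return conflicts
```

    A cycle counted here would need an extra access or buffering in hardware, which is the kind of overhead that techniques such as stream misalignment or a small FIFO buffer aim to absorb.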

  • 28.
    Asghar, Rizwan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Wu, Di
    Linköping University, Department of Electrical Engineering. Linköping University, The Institute of Technology.
    Saeed, Ali
    Linköping University, Department of Electrical Engineering. Linköping University, The Institute of Technology.
    Huang, Yulin
    Linköping University, Department of Electrical Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Implementation of a Radix-4, Parallel Turbo Decoder and Enabling the Multi-Standard Support2012In: Journal of Signal Processing Systems, ISSN 1939-8018, E-ISSN 1939-8115, Vol. 66, no 1, p. 25-41Article in journal (Refereed)
    Abstract [en]

    This paper presents a unified, radix-4 implementation of a turbo decoder, covering multiple standards such as DVB, WiMAX, 3GPP-LTE and HSPA Evolution. The radix-4, parallel interleaver is the bottleneck when using the same turbo-decoding architecture for multiple standards. This paper covers the issues associated with the design of a radix-4 parallel interleaver to reach a flexible turbo-decoder architecture. Radix-4, parallel interleaver algorithms and their mapping onto a hardware architecture are presented for multi-mode operations. The overheads associated with hardware multiplexing are found to be least significant. Other than flexibility for the turbo decoder implementation, the low silicon cost and low power aspects are also addressed by optimizing the storage scheme for branch metrics and extrinsic information. The proposed unified architecture for radix-4 turbo decoding consumes 0.65 mm² of area in total in a 65 nm CMOS process. With 4 SISO blocks used in parallel and 6 iterations, it can achieve a throughput of up to 173.3 Mbps while consuming 570 mW of power in total. It provides a good trade-off between silicon cost, power consumption and throughput, with a silicon efficiency of 0.005 mm²/Mbps and an energy efficiency of 0.55 nJ/b/iter.

  • 29.
    Ashrafi, Ashkan
    et al.
    San Diego State University.
    Strollo, Antonio G. M.
    University of Napoli Federico II.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Hardware implementation of digital signal processing algorithms2013In: Journal of Electrical and Computer Engineering, ISSN 2090-0147, E-ISSN 2090-0155, Vol. 2013, no 782575, p. 1-2Article in journal (Other academic)
  • 30.
    Berggren, Erik
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    Testverktyg för JTAG Boundary Scan2017Independent thesis Basic level (degree of Bachelor), 10,5 credits / 16 HE creditsStudent thesis
    Abstract [sv]

    A project has been carried out in Python to read and analyze netlists from the eCAD program Altium. The project is a prototype of a piece of software that, once fully developed, is intended to automate connectivity testing of printed circuit boards using JTAG Boundary Scan. The project investigates what proportion of the nets on a few arbitrarily chosen printed circuit boards is accessible for Boundary Scan testing and finds that on average 39% of the nets are observable.

  • 31.
    Bertilsson, Erik
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    A Scalable Architecture for Massive MIMO Base Stations Using Distributed Processing2017Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
    Abstract [en]

    Massive MIMO is an emerging technology for future wireless systems that has received much attention from both academia and industry recently. The most prominent feature of Massive MIMO is that the base station is equipped with a large number of antennas. It is therefore important to create scalable architectures to enable simple deployment in different configurations.

    In this thesis, a distributed architecture for performing the baseband processing in a massive OFDM MU-MIMO system is proposed and analyzed. The proposed architecture is based on connecting several identical nodes in a K-ary tree. It is shown that, depending on the chosen algorithms, all or most computations can be performed in a distributed manner. Also, the computational load of each node does not depend on the number of nodes in the tree (except for some timing issues), which implies simple scalability of the system.
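
    The scalability argument, that the load of each node is independent of the size of the tree, can be illustrated with a Python sketch of a tree reduction. It assumes that every node holds one local partial result (for example, a per-antenna contribution to a sum) and adds the partial sums of its at most K children; the heap-style node numbering and the function name are illustrative choices, and the abstract does not specify which quantities are combined.

```python
def tree_sum(local, k, node=0):
    """Combine local results over a K-ary tree stored in heap order.

    The children of node i are nodes k*i + 1 ... k*i + k (where they exist).
    Each node performs at most k additions, regardless of the tree size.
    """
    total = local[node]
    first_child = k * node + 1
    for child in range(first_child, min(first_child + k, len(local))):
        total += tree_sum(local, k, child)
    return total
```

    For instance, `tree_sum([h.conjugate() * y for h, y in zip(channel, received)], 2)` would combine per-antenna matched-filter terms, where `channel` and `received` are hypothetical lists of complex samples.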

    It is shown that it should be enough that each node contains one or two complex multipliers and a few complex adders running at a couple of hundred MHz to support specifications similar to LTE. Additionally, the nodes must communicate with each other over links with data rates in the order of some Gbps.

    Finally, a VHDL implementation of the system is proposed. The implementation is parameterized such that a system can be generated from a given specification.

  • 32.
    Bertilsson, Erik
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Larsson, Erik G
    Linköping University, Department of Electrical Engineering, Communication Systems. Linköping University, Faculty of Science & Engineering.
    A Scalable Architecture for Massive MIMO Base Stations Using Distributed Processing2016In: 2016 50TH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, Washington: IEEE COMPUTER SOC , 2016, p. 864-868Conference paper (Refereed)
    Abstract [en]

    Massive MIMO systems have received considerable attention in recent years as an enabler in future wireless communication systems. As the idea is based on having a large number of antennas at the base station, it is important to have both a scalable and distributed realization of such a system to ease deployment. Most work so far has focused on the theoretical aspects, although a few demonstrators have been reported. In this work, we propose a base station architecture based on connecting the processing nodes in a K-ary tree, allowing simple scalability. Furthermore, it is shown that most of the processing can be performed locally in each node. Further analysis of the node processing shows that it should be enough that each node contains one or two complex multipliers and a few complex adders/subtracters operating at some hundred MHz. It is also shown that a communication link of some Gbps is required between the nodes, and, hence, it is fully feasible to have one or a few links between the nodes to cope with the communication requirements.
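
    The scalability argument in this abstract can be illustrated with a small sketch. It is not the authors' architecture: the heap-style node numbering, the function name `tree_reduce`, and the use of a plain sum as the aggregated quantity are all assumptions made for illustration. The sketch only shows that when per-node contributions are summed up a K-ary tree, each node performs at most K additions regardless of how many nodes the tree contains.

```python
def tree_reduce(local_values, k):
    """Sum per-node contributions over a K-ary tree stored in heap order.

    Hypothetical layout: node i has children k*i + 1 ... k*i + k.
    Each node adds what its children forward to its own value, so it
    performs at most k additions no matter how large the tree is.
    """
    n = len(local_values)
    forwarded = list(local_values)
    max_adds = 0
    # Walk from the last node to the root so children finish before parents.
    for i in range(n - 1, -1, -1):
        children = [c for c in range(k * i + 1, k * i + k + 1) if c < n]
        for c in children:
            forwarded[i] += forwarded[c]
        max_adds = max(max_adds, len(children))
    return forwarded[0], max_adds


# 13 nodes in a ternary tree, each holding one complex-valued contribution.
total, adds_per_node = tree_reduce([complex(i, -i) for i in range(13)], k=3)
```

    Growing the list of nodes changes the depth of the tree, and therefore the latency, but not the per-node addition count returned as `adds_per_node`.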

  • 33.
    Bertilsson, Erik
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Larsson, Erik G.
    Linköping University, Department of Electrical Engineering, Communication Systems. Linköping University, Faculty of Science & Engineering.
    Computation Limited Matrix Inversion Using Neumann Series Expansion for Massive MIMO2017In: 2017 FIFTY-FIRST ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS, AND COMPUTERS, 2017, p. 466-469Conference paper (Refereed)
    Abstract [en]

    Neumann series expansion is a method for performing matrix inversion that has received a lot of interest in the context of massive MIMO systems. However, the computational complexity of the Neumann methods is higher than for the lowest complexity exact matrix inversion algorithms, such as LDL, when the number of terms in the series is three or more. In this paper, the Neumann series expansion is analyzed from a computational perspective for cases when the complexity of performing exact matrix inversion is too high. By partially computing the third term of the Neumann series, the computational complexity can be reduced. Three different preconditioning matrices are considered. Simulation results show that when limiting the total number of operations performed, the BER performance of the three different preconditioning matrices is the same.
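
    As a rough illustration of the method named in this abstract, the truncated Neumann series approximates the inverse of A as the sum over n of E^n D^-1, where E = I - D^-1 A and D is a preconditioning matrix. The sketch below assumes D is the diagonal of A, which is only one possible choice; the abstract does not say which three preconditioners the paper uses, and the partial third-term computation is not reproduced here. All function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def neumann_inverse(a, terms):
    """Approximate inv(a) with the first `terms` (>= 1) Neumann series terms.

    Assumption: the preconditioner D is the diagonal of `a`.
    """
    n = len(a)
    d_inv = [[1.0 / a[i][i] if i == j else 0.0 for j in range(n)]
             for i in range(n)]
    d_inv_a = matmul(d_inv, a)
    # E = I - D^-1 A; the series converges when its spectral radius is < 1.
    e = [[(1.0 if i == j else 0.0) - d_inv_a[i][j] for j in range(n)]
         for i in range(n)]
    term = [row[:] for row in d_inv]    # n = 0 term: D^-1
    total = [row[:] for row in d_inv]
    for _ in range(terms - 1):
        term = matmul(e, term)          # next term: E^n D^-1
        total = [[total[i][j] + term[i][j] for j in range(n)]
                 for i in range(n)]
    return total


def residual(a, x):
    """Largest absolute entry of a @ x - I."""
    n = len(a)
    ax = matmul(a, x)
    return max(abs(ax[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))
```

    For a diagonally dominant matrix, `residual(a, neumann_inverse(a, terms))` shrinks as `terms` grows, which is the accuracy versus operation count trade-off the abstract refers to.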

  • 34.
    Bhide, Priyanka
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    Design and Evaluation of Aceelerometer Based Mobile Authentication Techniques2017Independent thesis Advanced level (degree of Master (One Year)), 20 credits / 30 HE creditsStudent thesis
    Abstract [en]

    Smartphone usage is growing rapidly and is no longer limited to calls and SMS. People use smartphones for online shopping, searching the web, bank transactions, games, and a wide range of other applications. Almost anything is possible with just a smartphone and an internet connection. Increased usage also means that more secret information about the user is kept on the phone. As the popularity of smartphones increases, so do the ways to steal or hack them. Many areas in the field of smartphone security and authentication require further investigation.

    This thesis work evaluates the scope of different inbuilt sensors in smartphones for mobile authentication techniques. The Android operating system was used in the implementation phase. Android has many open source libraries and services, which were used for sensor identification on the Java Android platform.

    Two applications using the accelerometer sensor and one using the magnetometer sensor were developed. The two foremost objectives of this thesis work were: 1) to figure out the possibilities of sensor-based authentication techniques, and 2) to check the end users' perception of and opinion about the applications.

    Usability testing was conducted to gather the users' assessments of the applications. The two methods used for usability testing are named Magical move and Tapping. Most users showed interest in and inclination towards the tapping application, although some users also expressed inhibitions about using both sensor-based methods.

  • 35.
    Carlsson, Erik
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    Synchronization of Distributed Units without Access to GPS2018Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
    Abstract [en]

    Time synchronization between systems that have no external reference can be an issue in small wireless node-based systems. In this thesis, a transceiver is designed and implemented in two separate systems. The "TwoWay Time Transfer" timing algorithm is then chosen to correct any timing error between the two free-running clocks of the systems. Finally, the results are compared with having both systems derive their timing from GPS.
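
    The core arithmetic of a two-way time transfer exchange can be sketched as follows. This is the textbook form of the calculation, not the thesis implementation: the function name and the timestamp labels t1 to t4 are hypothetical, and the result is only exact if the propagation delay is the same in both directions.

```python
def two_way_offset(t1, t2, t3, t4):
    """Estimate clock offset and one-way delay from a two-way exchange.

    t1: A sends a message        (read on A's clock)
    t2: B receives that message  (read on B's clock)
    t3: B sends its reply        (read on B's clock)
    t4: A receives the reply     (read on A's clock)
    Assumes the propagation delay is identical in both directions.
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2.0  # B's clock minus A's clock
    delay = ((t4 - t1) - (t3 - t2)) / 2.0   # one-way propagation delay
    return offset, delay


# B runs 5 time units ahead of A and the link delay is 2 time units.
offset, delay = two_way_offset(t1=100.0, t2=107.0, t3=110.0, t4=107.0)
```

    Node A can then subtract `offset` from B's timestamps, or B can adjust its clock, to align the two free-running clocks.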

  • 36.
    Chen, Sau-Gee
    et al.
    National Chiao Tung University, Taiwan.
    Huang, Shen-Jui
    Novatek Corp, Taiwan.
    Garrido Gálvez, Mario
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Jou, Shyh-Jye
    National Chiao Tung University, Taiwan.
    Continuous-flow Parallel Bit-Reversal Circuit for MDF and MDC FFT Architectures2014In: IEEE Transactions on Circuits and Systems Part 1: Regular Papers, ISSN 1549-8328, E-ISSN 1558-0806, Vol. 61, no 10, p. 2869-2877Article in journal (Refereed)
    Abstract [en]

    This paper presents a bit-reversal circuit for continuous-flow parallel pipelined FFT processors. In addition to two flexible commutators, the circuit consists of two memory groups, where each group has P memory banks. To achieve both low delay and low area complexity, a novel write/read scheduling mechanism is devised so that FFT outputs can be stored in those memory banks in an optimized way. The proposed scheduling mechanism can write the successively generated FFT output samples of the current symbol to memory locations, without any delay, right after those locations are released by the previous symbol. Therefore, a total memory space of only N data samples is enough for continuous-flow FFT operations. Since read operations do not overlap with write operations during the entire period, only single-port memory is required, which leads to a great area reduction. The proposed bit-reversal circuit architecture can generate natural-order FFT output and supports variable power-of-2 FFT lengths.
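
    For readers unfamiliar with the permutation this circuit implements, a software sketch of bit reversal is given below. It shows only the index mapping from bit-reversed to natural order; the memory banks, commutators, and write/read scheduling described in the abstract are not modeled, and the function names are hypothetical.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def to_natural_order(fft_output):
    """Reorder a bit-reversed, power-of-2 length sequence into natural order."""
    n = len(fft_output)
    bits = n.bit_length() - 1
    if n == 0 or (1 << bits) != n:
        raise ValueError("length must be a power of two")
    return [fft_output[bit_reverse(i, bits)] for i in range(n)]


# An 8-point FFT that emits its outputs in bit-reversed index order.
natural = to_natural_order([0, 4, 2, 6, 1, 5, 3, 7])
```

    A naive buffer needs all N samples before reading can start; the scheduling scheme in the paper is what allows the same N-sample memory to be reused across consecutive symbols.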

  • 37.
    Davari, Mahdad
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    Improving an FPGA Optimized Processor2011Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
    Abstract [en]

    This work aims at improving an existing soft microprocessor core optimized for Xilinx Virtex®-4 FPGA. Instruction and data caches will be designed and implemented. Interrupt support will be added as well, preparing the microprocessor core to host operating systems. Thorough verification of the added modules is also emphasized in this work. Maintaining core clock frequency at its maximum has been the main concern through all the design and implementation steps.

  • 38.
    de Maris, Jay
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    Multi-Function Automatic Wireless Irrigation System (MAWIS)2016Independent thesis Basic level (university diploma), 10,5 credits / 16 HE creditsStudent thesis
    Abstract [en]

    This project is designed to create a system that is simple and highly functional for the purpose of maintaining the well-being of plant life through the use of the Internet of Things (IoT). The project centers on the idea of a self-sustaining system using a microcontroller board with access via Wi-Fi communications and the ability to use the photolytic sensors to recharge the system's power supply. The project is focused on small-scale home gardening.

  • 39.
    Di, Wu
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Eilert, Johan
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Nilsson, Anders
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Tell, Eric
    Coresonic AB, Linköping.
    Alfredsson, Erik
    Coresonic AB, Linköping.
    System Architecture for 3GPP LTE Modem Using a Programmable Baseband Processor2009In: International Symposium on System-on-Chip (SoC 2009), 2009Conference paper (Refereed)
    Abstract [en]

    3G evolution towards HSPA and LTE is ongoing, which will substantially increase the throughput with higher spectral efficiency. This paper presents the system architecture of an LTE modem based on a programmable baseband processor. The architecture includes a baseband processor that handles processing such as time and frequency synchronization, IFFT/FFT (up to 2048-p), channel estimation and subcarrier demapping. The throughput and latency requirements of a Category 4 User Equipment (CAT4 UE) are met by adding a MIMO symbol detector and a parallel Turbo decoder supporting H-ARQ. This brings both low silicon cost and enough flexibility to support other wireless standards. The complexity demonstrated by the modem shows the practicality and advantage of using programmable baseband processors for a single-chip LTE solution.

  • 40.
    Edman, Anders
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Christensen, J
    Emrich, A.
    Svensson, Christer
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Electronic Devices.
    A low-power 416-lag 1.5-b 0.5-TMAC correlator in 0.6um CMOS.2001In: IEEE Journal of Solid-State Circuits, ISSN 0018-9200, E-ISSN 1558-173X, Vol. 36, p. 258-265Article in journal (Refereed)
    Abstract [en]

    The autocorrelation spectrometer is an important instrument for radio astronomy. In satellite-based spectrometers, low power consumption is essential. The correlator chip presented in this paper reduces the power consumption more than five times compared to other full-custom designs. This has been achieved by reducing the number of clocked transistors, using a compact layout of cells, which reduces wire lengths, and using parallel processing of data. Also, the low power performance is combined with a large number of lags and a high data throughput. The correlator performs 0.5-TMAC operations in 416 lags at a sample rate of 1.28-GSample/s with an input data precision of 1.5-b and a correlation period of one second. The chip is also designed to reduce noise generation by using multiple internal clock phases.
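
    The figures quoted in this abstract are mutually consistent: 416 lags updated at 1.28 GSample/s gives roughly 0.53 × 10^12 multiply-accumulate operations per second. The sketch below checks that arithmetic and shows a lag-domain autocorrelation on coarsely quantized samples. It assumes that "1.5-b" means three-level quantization (-1, 0, +1), which the abstract does not state explicitly; the threshold parameter and all function names are hypothetical, and nothing here reflects the chip's parallel architecture.

```python
def mac_rate(lags, sample_rate_hz):
    """Multiply-accumulate operations per second for a lag correlator."""
    return lags * sample_rate_hz


def quantize_three_level(x, threshold):
    """Map a sample to -1, 0 or +1 (assumed meaning of 1.5-bit precision)."""
    if x > threshold:
        return 1
    if x < -threshold:
        return -1
    return 0


def autocorrelate(samples, lags, threshold):
    """Accumulate q[n] * q[n - lag] for lag = 0 .. lags - 1."""
    q = [quantize_three_level(s, threshold) for s in samples]
    acc = [0] * lags
    for n in range(lags - 1, len(q)):
        for lag in range(lags):
            acc[lag] += q[n] * q[n - lag]  # one MAC per lag per sample
    return acc


rate = mac_rate(416, 1_280_000_000)  # about 0.53e12 MAC/s
acc = autocorrelate([1.0, -1.0, 1.0, -1.0, 1.0, -1.0], lags=2, threshold=0.5)
```

    With three-level samples, every product is itself -1, 0 or +1, so each multiply-accumulate reduces to a conditional increment or decrement of a counter.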

  • 41.
    Edman, Anders
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Svensson, Christer
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Electronic Devices.
    Timing closure through a globally synchronous, timing partitioned design methodology.2004In: DAC,2004, New York: ACM, Inc. , 2004, p. 71-Conference paper (Refereed)
  • 42.
    Edman, Anders
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Svensson, Christer
    Linköping University, Department of Electrical Engineering, Electronic Devices. Linköping University, The Institute of Technology.
    Mesgarzadeh, Behzad
    Linköping University, Department of Electrical Engineering, Electronic Devices. Linköping University, The Institute of Technology.
    Synchronous Latency-Insensitive Design for Multiple Clock Domain2005In: Proceedings of the IEEE International System-on-Chip Conference (SoCC), IEEE Explore , 2005, p. 83-86Conference paper (Refereed)
    Abstract [en]

    Modern system-on-chip designs often require multiple clock frequencies. On the other hand, global interconnects suffer large delays. This paper proposes a method that manages these two problems within the framework of conventional synchronous design flow. The design is partitioned into isochronous blocks already at behavioral level, where each block is synchronous using a local clock. The local clock frequencies are assumed related by rational numbers. Communication between blocks is managed with FIFOs at each receiver, which manage different clock frequencies and hide unknown delays or clock skews. This method guarantees clock true implementation of a clock true behavioral description utilizing a predefined block-to-block latency.

  • 43.
    Ehliar, Andreas
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Area Efficient Floating-Point Adder and Multiplier with IEEE-754 Compatible Semantics2014Conference paper (Refereed)
    Abstract [en]

    In this paper we describe an open source floating-point adder and multiplier implemented using a 36-bit custom number format based on radix-16 and optimized for the 7-series FPGAs from Xilinx. Although this number format is not identical to the single-precision IEEE-754 format, the floating-point operators are designed in such a way that the numerical results for a given operation will be identical to the result from an IEEE-754 compliant operator with support for round-to-nearest even, NaNs and Infs, and subnormal numbers. The drawback of this number format is that the rounding step is more involved than in a regular, radix-2 based operator. On the other hand, the use of a high radix means that the area cost associated with normalization and denormalization can be reduced, leading to a net area advantage for the custom number format, under the assumption that support for subnormal numbers is required.

    The area of the floating-point adder in a Kintex-7 FPGA is 261 slice LUTs and the area of the floating-point multiplier is 235 slice LUTs and 2 DSP48E blocks. The adder can operate at 319 MHz and the multiplier can operate at a frequency of 305 MHz.

  • 44.
    Ehliar, Andreas
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Aspects of system-on-chip design for FPGAs2008Licentiate thesis, comprehensive summary (Other academic)
    Abstract [en]

    Due to the increasing NRE costs of recent ASICs, the use of FPGAs is expected to continue to increase. While the first FPGAs were limited devices useful mainly for glue logic, today's FPGAs are highly capable devices used in many different application areas including telecommunication, multimedia, medical, and automotive. This means that many VLSI designers need to deal with FPGAs, either as the primary target, or as a prototype platform. The design methodologies for ASICs and FPGAs are similar, but if high performance is expected from the FPGA, it is necessary to take FPGA limitations related to memories, data path components, I/O, and routing delays into account early in the design cycle for both FPGA prototyping and FPGA products.

    This thesis investigates these limitations through three case studies of important VLSI building blocks. The thesis also discusses how a designer can gain additional information from the FPGA backend flow through custom tools and presents a framework for designing such tools.

    The first case study discusses the opportunities and problems when designing both the data path and control path components of a high speed processor in an FPGA. The resulting processor core is a RISC processor with some DSP extensions, and its clock frequency is significantly higher than that of the MicroBlaze processor, which has been specifically developed for Xilinx FPGAs. This case study focuses on the tradeoffs which are necessary to reach this performance in an FPGA.

    The second case study describes how a floating point adder and multiplier can be optimized for FPGAs. This is a very important area as the use of floating point arithmetic can significantly reduce the design time of some applications. The solution presented in the thesis outperforms previous academic publications and has a performance similar to commercial offerings.

    The third case study presents a packet switched Network-on-Chip (NoC) architecture. While NoCs are not commonly used in FPGA designs today it is expected that they will become an important component in future FPGA designs, especially when prototyping large NoC based ASICs.

    Finally, a framework is presented which allows a designer to write custom backend tools by modifying Xilinx XDL files. While the framework is already useful for some tasks, the main reason for including it is to inspire both researchers and developers to look into this area by showing that it is actually quite easy to write such tools.

    List of papers
    1. High Performance, Low Latency FPGA based Floating Point Adder and Multiplier Units in a Virtex 4
    2006 (English)In: NORCHIP 2006: The Nordic Microelectronics Event. 2006, 2006, p. 31-34Conference paper, Published paper (Refereed)
    Abstract [en]

    Since the invention of FPGAs, the increase in their size and performance has allowed designers to use FPGAs for more complex designs. FPGAs are generally good at bit manipulations and fixed point arithmetics but have a harder time coping with floating point arithmetics. In this paper we describe methods used to construct high performance floating point components in a Virtex-4. We have constructed a floating point adder/subtracter and multiplier which we then used to construct a complex radix-2 butterfly. Our adder/subtracter can operate at a frequency of 361 MHz in a Virtex-4SX35 (speed grade -12).

    National Category
    Engineering and Technology
    Identifiers
    urn:nbn:se:liu:diva-100922 (URN)10.1109/NORCHP.2006.329238 (DOI)9781424407729 (ISBN)
    Conference
    24th Norchip Conference, 20-21 November 2006, Linkoping, Sweden.
    Available from: 2013-11-14 Created: 2013-11-14 Last updated: 2015-02-18
    2. An FPGA based Open Source Network-on-chip Architecture
    2007 (English)In: 17th International Conference on Field Programmable Logic and Applications, FPL, Amsterdam, Holland, 2007, IEEE , 2007, p. 800-803Conference paper, Published paper (Refereed)
    Abstract [en]

    Networks on chip (NoCs) have long been seen as a potential solution to the problems encountered when implementing large digital hardware designs. In this paper we describe an open source FPGA based NoC architecture with low area overhead, high throughput and low latency compared to other published works. The architecture has been optimized for Xilinx FPGAs and the NoC is capable of operating at a frequency of 260 MHz in a Virtex-4 FPGA. We have also developed a bridge so that generic Wishbone bus compatible IP blocks can be connected to the NoC.

    Place, publisher, year, edition, pages
    IEEE, 2007
    National Category
    Engineering and Technology
    Identifiers
    urn:nbn:se:liu:diva-16560 (URN)10.1109/FPL.2007.4380772 (DOI)978-1-4244-1060-6 (ISBN)
    Available from: 2009-02-02 Created: 2009-02-02 Last updated: 2015-02-18Bibliographically approved
    3. Thinking outside the flow: Creating customized backend tools for Xilinx based designs
    2007 (English)In: 4th annual FPGAworld Conference, Stockholm, 2007, 2007Conference paper, Published paper (Refereed)
    Abstract [en]

    This paper is intended to serve as an introduction to how to build a customized backend tool for a Xilinx based design flow. A Python based library called PyXDL is presented which allows a user to manipulate XDL files which contain a placed and routed design. Three different tools that use this library are presented, ranging from a simple resource utilization viewer to a tool which will insert a logic analyzer into an already routed design, thus avoiding a costly complete rerun of the place and route tool.

    National Category
    Engineering and Technology
    Identifiers
    urn:nbn:se:liu:diva-16561 (URN)
    Available from: 2009-02-02 Created: 2009-02-02 Last updated: 2015-02-18Bibliographically approved
    4. A High Performance Microprocessor with DSP Extensions Optimized for the Virtex-4 FPGA
    2008 (English)In: International Conference on Field Programmable Logic and Applications FPL 2008, Heidelberg, Germany, 2008, 2008, p. 599-602Conference paper, Published paper (Refereed)
    Abstract [en]

    As the use of FPGAs increases, the importance of highly optimized processors for FPGAs will increase. In this paper we present the microarchitecture of a soft microprocessor core optimized for the Virtex-4 architecture. The core can operate at 357 MHz, which is significantly faster than Xilinx's MicroBlaze architecture on the same FPGA. At this frequency it is necessary to keep the logic complexity down and this paper shows how this can be done while retaining sufficient functionality for a high performance processor.

    National Category
    Engineering and Technology
    Identifiers
    urn:nbn:se:liu:diva-16562 (URN)10.1109/FPL.2008.4630018 (DOI)978-1-4244-1960-9 (ISBN)
    Available from: 2009-02-02 Created: 2009-02-02 Last updated: 2015-02-18Bibliographically approved
  • 45.
    Ehliar, Andreas
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    EBRAM - Extending the BlockRAMs in FPGAs to support caches and hash tables in an efficient manner2012Conference paper (Refereed)
    Abstract [en]

    In this paper we discuss how a typical Block RAM in an FPGA can be extended to enable the implementation of more efficient caches in FPGAs with very minor modifications to the existing Block RAM architectures. In addition, the modifications also allow other components, such as hash tables, to be implemented more efficiently.

  • 46.
    Ehliar, Andreas
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Optimizing Xilinx designs through primitive instantiation2010In: FPGAworld '10 Proceedings of the 7th FPGAworld Conference, New York: ACM , 2010, p. 20-27Conference paper (Refereed)
    Abstract [en]

    This paper is intended as a guideline for people who are interested in manual instantiation of FPGA primitives as a way of improving the performance of an FPGA design. The focus of the paper is on designs where slice primitives like flip-flops and lookup tables are instantiated. Guidelines on how to develop a design with manual instantiation are presented together with a case study of a high performance bitserial two's complement divider where a majority of the area is manually instantiated. This divider is capable of reaching a maximum frequency of 345 MHz in the fastest Virtex-4 while utilizing less than 150 LUTs thanks to the large number of manual optimizations. An open source library containing modules intended to promote the structured development of modules with manually instantiated components is also presented.

  • 47.
    Ehliar, Andreas
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Performance driven FPGA design with an ASIC perspective2009Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    FPGA devices are an important component in many modern devices. This means that it is important that VLSI designers have a thorough knowledge of how to optimize designs for FPGAs. While the design flows for ASICs and FPGAs are similar, there are many differences as well due to the limitations inherent in FPGA devices. To be able to use an FPGA efficiently it is important to be aware of both the strengths and weaknesses of FPGAs. If an FPGA design should be ported to an ASIC at a later stage it is also important to take this into account early in the design cycle so that the ASIC port will be efficient.

    This thesis investigates how to optimize a design for an FPGA through a number of case studies of important SoC components. One of these case studies discusses high speed processors and the tradeoffs that are necessary when constructing very high speed processors in FPGAs. The processor has a maximum clock frequency of 357 MHz in a Xilinx Virtex-4 device of the fastest speed grade, which is significantly higher than Xilinx's own processor in the same FPGA.

    Another case study investigates floating point datapaths and describes how a floating point adder and multiplier can be efficiently implemented in an FPGA.

    The final case study investigates Network-on-Chip architectures and how these can be optimized for FPGAs. The main focus is on packet switched architectures, but a circuit switched architecture optimized for FPGAs is also investigated.

    All of these case studies also contain information about potential pitfalls when porting designs optimized for an FPGA to an ASIC. The focus in this case is on systems where initial low volume production will be using FPGAs while still keeping the option open to port the design to an ASIC if the demand is high. This information will also be useful for designers who want to create IP cores that can be efficiently mapped to both FPGAs and ASICs.

    Finally, a framework is also presented which allows for the creation of custom backend tools for the Xilinx design flow. The framework is already useful for some tasks, but the main reason for including it is to inspire researchers and developers to use this powerful ability in their own design tools.

    List of papers
    1. Using low precision floating point numbers to reduce memory cost for MP3 decoding
    2004 (English)In: International Workshop on Multimedia Signal Processing, IEEE Xplore , 2004, p. 119-122Conference paper, Published paper (Refereed)
    Abstract [en]

    The purpose of our work has been to evaluate the practicality of using a 16-bit floating point representation to store the intermediate sample values and other data in memory during the decoding of MP3 bit streams. A floating point number representation offers a better trade-off between dynamic range and precision than a fixed point representation for a given word length. Using a floating point representation means that smaller memories can be used which leads to smaller chip area and lower power consumption without reducing sound quality. We have designed and implemented a DSP processor based on 16-bit floating point intermediate storage. The DSP processor is capable of decoding all MP3 bit streams at 20 MHz and this has been demonstrated on an FPGA prototype.
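
    The trade-off between dynamic range and precision mentioned in this abstract can be demonstrated with a short sketch. It uses IEEE 754 binary16 (the `'e'` format of Python's `struct` module) as a stand-in for a 16-bit floating point format and Q15 as a stand-in for a 16-bit fixed point format. Both are assumptions: the abstract does not specify the exponent and mantissa widths of the format used in the DSP processor, and the helper names are hypothetical.

```python
import struct


def to_float16_and_back(x):
    """Round x to IEEE 754 binary16 and convert it back to a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fixed16_and_back(x, frac_bits=15):
    """Round x to a signed 16-bit fixed point value (Q15 by default)."""
    scale = 1 << frac_bits
    q = max(-scale, min(scale - 1, round(x * scale)))
    return q / scale


def relative_error(exact, approx):
    """Relative error of approx with respect to a nonzero exact value."""
    return abs(approx - exact) / abs(exact)


small = 1e-4  # a quiet intermediate sample value
err_float = relative_error(small, to_float16_and_back(small))
err_fixed = relative_error(small, to_fixed16_and_back(small))
```

    For small magnitudes such as this one, the floating point stand-in keeps its relative error below 0.1 %, while the fixed point stand-in loses several percent because only a few of its 16 bits remain significant.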

    Place, publisher, year, edition, pages
    IEEE Xplore, 2004
    National Category
    Engineering and Technology
    Identifiers
    urn:nbn:se:liu:diva-16559 (URN)10.1109/MMSP.2004.1436435 (DOI)0-7803-8578-0 (ISBN)
    Available from: 2009-02-02 Created: 2009-02-02 Last updated: 2015-02-18Bibliographically approved
    2. An FPGA based Open Source Network-on-chip Architecture
    2007 (English)In: 17th International Conference on Field Programmable Logic and Applications, FPL, Amsterdam, Holland, 2007, IEEE , 2007, p. 800-803Conference paper, Published paper (Refereed)
    Abstract [en]

    Networks on chip (NoCs) have long been seen as a potential solution to the problems encountered when implementing large digital hardware designs. In this paper we describe an open source FPGA based NoC architecture with low area overhead, high throughput and low latency compared to other published works. The architecture has been optimized for Xilinx FPGAs and the NoC is capable of operating at a frequency of 260 MHz in a Virtex-4 FPGA. We have also developed a bridge so that generic Wishbone bus compatible IP blocks can be connected to the NoC.

    Place, publisher, year, edition, pages
    IEEE, 2007
    National Category
    Engineering and Technology
    Identifiers
    urn:nbn:se:liu:diva-16560 (URN)10.1109/FPL.2007.4380772 (DOI)978-1-4244-1060-6 (ISBN)
    Available from: 2009-02-02 Created: 2009-02-02 Last updated: 2015-02-18Bibliographically approved
    3. Thinking outside the flow: Creating customized backend tools for Xilinx based designs
    2007 (English)In: 4th annual FPGAworld Conference, Stockholm, 2007, 2007Conference paper, Published paper (Refereed)
    Abstract [en]

    This paper is intended to serve as an introduction to how to build a customized backend tool for a Xilinx based design flow. A Python based library called PyXDL is presented which allows a user to manipulate XDL files which contain a placed and routed design. Three different tools that use this library are presented, ranging from a simple resource utilization viewer to a tool which will insert a logic analyzer into an already routed design, thus avoiding a costly complete rerun of the place and route tool.

    National Category
    Engineering and Technology
    Identifiers
    urn:nbn:se:liu:diva-16561 (URN)
    Available from: 2009-02-02 Created: 2009-02-02 Last updated: 2015-02-18Bibliographically approved
    4. A High Performance Microprocessor with DSP Extensions Optimized for the Virtex-4 FPGA
    2008 (English)In: International Conference on Field Programmable Logic and Applications FPL 2008, Heidelberg, Germany, 2008, 2008, p. 599-602Conference paper, Published paper (Refereed)
    Abstract [en]

    As the use of FPGAs increases, the importance of highly optimized processors for FPGAs will increase. In this paper we present the microarchitecture of a soft microprocessor core optimized for the Virtex-4 architecture. The core can operate at 357 MHz, which is significantly faster than Xilinx's MicroBlaze architecture on the same FPGA. At this frequency it is necessary to keep the logic complexity down and this paper shows how this can be done while retaining sufficient functionality for a high performance processor.

    National Category
    Engineering and Technology
    Identifiers
    urn:nbn:se:liu:diva-16562 (URN) 10.1109/FPL.2008.4630018 (DOI) 978-1-4244-1960-9 (ISBN)
    Available from: 2009-02-02 Created: 2009-02-02 Last updated: 2015-02-18 Bibliographically approved
    5. High performance, low-latency field-programmable gate array-based floating-point adder and multiplier units in a Virtex 4
    2008 (English) In: IET Computers and digital techniques, ISSN 1751-8601, Vol. 2, p. 305-313 Article in journal (Refereed) Published
    Abstract [en]

    There is increasing interest in floating-point arithmetic in field-programmable gate arrays (FPGAs) because of the increase in their size and performance. FPGAs are generally good at bit manipulation and fixed-point arithmetic, but they have a harder time coping with floating-point arithmetic. An architecture used to construct high-performance floating-point components in a Virtex-4 FPGA is described in detail. Floating-point adder/subtracter and multiplier units have been constructed. The adder/subtracter can operate at a frequency of 377 MHz in a Virtex-4SX35 (speed grade -12).

    National Category
    Engineering and Technology
    Identifiers
    urn:nbn:se:liu:diva-16563 (URN) 10.1049/iet-cdt:20070075 (DOI)
    Available from: 2009-02-02 Created: 2009-02-02 Last updated: 2015-02-18 Bibliographically approved
    6. An ASIC Perspective on High Performance FPGA Design
    2009 (English) Conference paper, Published paper (Refereed)
    Abstract [en]

    In this paper we discuss how various design components perform in both FPGAs and standard-cell-based ASICs. We also investigate how various common FPGA optimizations affect the performance and area of an ASIC port. We find that most techniques that are used to optimize a design for an FPGA will not have a negative impact on the area in an ASIC. The intended audience for this paper is engineers charged with creating designs or IP cores that are optimized for both FPGAs and ASICs.

    National Category
    Engineering and Technology
    Identifiers
    urn:nbn:se:liu:diva-16564 (URN)
    Available from: 2009-02-02 Created: 2009-02-02 Last updated: 2015-02-18 Bibliographically approved
  • 48.
    Ehliar, Andreas
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Karlström, Per
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    A High Performance Microprocessor with DSP Extensions Optimized for the Virtex-4 FPGA
    2008 In: International Conference on Field Programmable Logic and Applications FPL 2008, Heidelberg, Germany, 2008, 2008, p. 599-602 Conference paper (Refereed)
    Abstract [en]

    As the use of FPGAs increases, the importance of highly optimized processors for FPGAs will increase. In this paper we present the microarchitecture of a soft microprocessor core optimized for the Virtex-4 architecture. The core can operate at 357 MHz, which is significantly faster than Xilinx's MicroBlaze architecture on the same FPGA. At this frequency it is necessary to keep the logic complexity down and this paper shows how this can be done while retaining sufficient functionality for a high performance processor.

  • 49.
    Ehliar, Andreas
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Eilert, Johan
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    A Comparison of Three FPGA Optimized NoC Architectures
    2007 In: Swedish System-on-Chip Conference, SSoCC, 2007, 2007 Conference paper (Other academic)
  • 50.
    Ehliar, Andreas
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    A Network on Chip based gigabit Ethernet router implemented on an FPGA
    2006 In: SSoCC Swedish System-on-Chip Conference, 2006, 2006 Conference paper (Other academic)