liu.se: Search for publications in DiVA
  • 1.
    Acevedo, Miguel
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    FPGA-Based Hardware-In-the-Loop Co-Simulator Platform for SystemModeler. 2016. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    This thesis proposes and implements a flexible platform to perform Hardware-In-the-Loop (HIL) co-simulation using a Field-Programmable Gate Array (FPGA). The HIL simulations are performed with SystemModeler working as a software simulator and the FPGA as the co-simulator platform for the digital hardware design. The work presented in this thesis consists of the creation of a communication library on the host computer, a system in the FPGA that allows implementation of different digital designs with varying architectures, and an interface between the host computer and the FPGA to transmit the data. The efficiency of the proposed system is studied with the implementation of two common digital hardware designs, a PID controller and a filter. The results of the HIL simulations of those two hardware designs are used to verify the platform and measure the timing and area performance of the proposed HIL platform.

  • 2.
    Afzal, Nadeem
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Wikner, J. Jacob
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    On Scaling and Output Cardinality of Multi-Bit Digital Error-Feedback Modulators. 2012. Manuscript (preprint) (Other academic)
    Abstract [en]

    To determine the maximum allowed input scale for stable operation of higher-order delta-sigma modulators, designers largely depend on analytical and numerical analysis. In this brief, the maximum allowed input scale to a multi-bit digital error-feedback delta-sigma modulator of arbitrary order is derived mathematically. The digital modulator with an arbitrary output word length is stable if its output does not overflow. Thus, to avoid overflow of the modulator output, the relations between the peak values of the involved digital signals are devised. A number of example configurations are presented to illustrate the usefulness of the derivations.
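The link between input scale and output overflow can be illustrated with a first-order error-feedback modulator. The sketch below is only an illustration under stated assumptions (unsigned integers, first order, hypothetical function and parameter names); it is not the arbitrary-order derivation of the manuscript.

```python
def efm_first_order(samples, in_bits, out_bits):
    """First-order digital error-feedback modulator on unsigned integers.

    Hypothetical illustration: the output keeps the most significant bits
    and the discarded least significant bits are fed back as the error.
    """
    drop = in_bits - out_bits          # number of LSBs discarded per sample
    mask = (1 << drop) - 1
    err = 0
    out = []
    for x in samples:
        v = x + err                    # add the fed-back truncation error
        out.append(v >> drop)          # modulator output: MSBs of the sum
        err = v & mask                 # discarded LSBs become the next error
    return out


# A constant input of 5 with 4-bit input and 2-bit output averages to 5/4.
pattern = efm_first_order([5] * 8, in_bits=4, out_bits=2)

# A full-scale input of 15 drives the output to 4, which no longer fits in
# 2 bits: the input must be scaled down to avoid this overflow.
overflowing = efm_first_order([15] * 4, in_bits=4, out_bits=2)
```

In this toy setting the fed-back error is at most 3, so the sum stays below 16 only while the input is at most 12, which is the kind of peak-value relation the abstract refers to.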

  • 3.
    Afzal, Nadeem
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Wikner, J. Jacob
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Digital Multi-bit Cascaded Error-Feedback ΔΣ Modulators With Reduced Hardware and Power Consumption. 2012. Manuscript (preprint) (Other academic)
    Abstract [en]

    The hardware of the multi-bit digital error feedback modulator (EFM) of arbitrary order has recently been reduced by using multiple EFMs in cascade. In this paper, a modified cascading strategy is devised. Parts of the processing of consecutively placed EFM stages are merged such that a significant amount of circuitry is removed in each stage. In the proposed design, the modulated output is represented by a set of encoded signals to be used by the signal processing block placed after the EFM.

    To illustrate the savings, a number of configurations of fourth-order EFM designs, composed of two and three cascaded stages, have been synthesized in a 65 nm CMOS process technology using the conventional and the proposed implementation techniques. Savings of 52.7% and 47%, in terms of area and power consumption, respectively, could be obtained at an oversampling ratio of 4. The trade-off between sampling frequency and hardware cost is also presented. Due to the reduced hardware, an increase of up to 600 MHz in the sampling frequency is achieved.

  • 4.
    Afzal, Nadeem
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Wikner, Jacob
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Reducing Complexity and Power of Digital Multibit Error-Feedback Delta Sigma Modulators. 2014. In: IEEE Transactions on Circuits and Systems - II - Express Briefs, ISSN 1549-7747, E-ISSN 1558-3791, Vol. 61, no 9, p. 641-645. Article in journal (Refereed)
    Abstract [en]

    In this brief, we propose how the hardware complexity of arbitrary-order digital multibit error-feedback delta-sigma modulators can be reduced. This is achieved by splitting the combinatorial circuitry of the modulators into two parts, i.e., one producing the modulator output and another producing the error signal fed back. The part producing the modulator output is removed by utilizing a unit-element-based digital-to-analog converter. To illustrate the reduced complexity and power consumption, we compare the synthesized results with those of conventional structures. Fourth-order modulators implemented with the proposed technique use up to 26% less area compared with conventional implementations. Due to the area reduction, the designs consume up to 33% less dynamic power. Furthermore, they can operate at a frequency 100 MHz higher than that of the conventional implementations.

  • 5.
    Ahmed, Mohsin Niaz
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    LTE Uplink Modeling and Channel Estimation. 2011. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    This master's thesis investigates the uplink transmission from User Equipment (UE) to the base station in LTE (Long Term Evolution) and channel estimation using pilot symbols with parameters defined in the 3GPP (3rd Generation Partnership Project) specifications. The purpose of the thesis was to implement a simulator that can generate the uplink signal as it is generated by a UE. The Third Generation (3G) mobile system was given the name LTE. This thesis focuses on the uplink of LTE, where single carrier frequency division multiple access (SC-FDMA) is used as the multiple access technique. Its advantage over orthogonal frequency division multiple access (OFDMA), which is used in the downlink, is better peak power characteristics, which are necessary in uplink communication for better power efficiency in mobile terminals. To assess the performance of uplink transmission, a realistic channel model for wireless communication systems is essential. The channel models used are those proposed by the International Telecommunication Union (ITU), and correct knowledge of these models is important for testing, optimization and performance improvement of signal processing algorithms. The channel estimation techniques used are Least Square (LS) and Least Minimum Mean Square Error (LMMSE) for different channel models. The performance of these algorithms has been measured in terms of Bit Error Rate (BER) and Signal to Noise Ratio (SNR).
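For orientation, the LS estimator named in the abstract is commonly formed by dividing each received pilot symbol by the known transmitted pilot. The sketch below assumes one complex channel gain per pilot subcarrier and a noise-free channel; the function name and all values are hypothetical and are not taken from the thesis.

```python
def ls_channel_estimate(received, pilots):
    """Least-squares estimate per pilot subcarrier: H_hat[k] = Y[k] / X[k]."""
    return [y / x for y, x in zip(received, pilots)]


# Hypothetical pilot symbols and channel gains, used only for illustration.
pilots = [1 + 1j, 1 - 1j, -1 + 1j]
channel = [0.5 + 0.2j, 0.4 - 0.1j, 0.3 + 0.3j]

# Noise-free received pilots: Y[k] = H[k] * X[k].
received = [h * x for h, x in zip(channel, pilots)]

estimate = ls_channel_estimate(received, pilots)
```

With additive noise the same division also scales the noise, which is one reason an estimator such as LMMSE, which uses channel statistics, is usually compared against LS.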

  • 6.
    Akif, Ahmed
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    FIR Filter Features on FPGA. 2018. Independent thesis Basic level (university diploma), 10 credits / 15 HE credits. Student thesis
    Abstract [en]

    Finite-length impulse response (FIR) filters are among the most commonly used digital signal processing algorithms today, and an FPGA is often the device used to implement them. The continued development of FPGA devices through the insertion of dedicated blocks has raised the need to study the advantages offered by different FPGA families. The work presented in this thesis studies the special features offered by FPGAs for FIR filters and introduces a cost model of resource utilization. The method used consists of several stages, including reading, classification of features and generation of coefficients. The results show that FPGAs have common features but also specific differences in features as well as in resource utilization. It has been shown that there is a misconception when dealing with FIR filters on FPGAs as compared to ASICs.

  • 7.
    Alam, Syed Asad
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Techniques for Efficient Implementation of FIR and Particle Filtering. 2016. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    FIR filters occupy a central place in many signal processing applications which alter either the shape, the frequency or the sampling frequency of the signal. FIR filters are used because of their stability and the possibility of linear phase, but they require a higher filter order than IIR filters to achieve the same magnitude specifications. Depending on the size of the required transition bandwidth, the filter order can range from tens to hundreds or even thousands. Since the implementation of the filters in the digital domain requires multipliers and adders, high filter orders translate to a large number of these arithmetic units. Research towards reducing the complexity of FIR filters has been going on for decades and the techniques used can be roughly divided into two categories: reduction in the number of multipliers and simplification of the multiplier implementation.

    One technique to reduce the number of multipliers is to use cascaded sub-filters with lower complexity to achieve the desired specification, known as frequency-response masking (FRM). One of the sub-filters is an upsampled model filter whose band edges are an integer multiple, termed the period L, of the target filter's band edges. Other sub-filters may include complement and masking filters which filter different parts of the spectrum to achieve the desired response. From an implementation point of view, time-multiplexing is beneficial because the maximum clock frequency supported by current state-of-the-art semiconductor technology generally does not correspond to the sample rate required by the application. A combination of these two techniques plays a significant role in the efficient implementation of FIR filters. Part of the work presented in this dissertation is architectures for time-multiplexed FRM filters that benefit from the inherent sparsity of the periodic model filters.

    These time-multiplexed FRM filters not only reduce the number of multipliers but also lower the memory usage. Although the FRM technique requires a higher number of delay elements, it results in fewer memories and more energy-efficient memory schemes when time-multiplexed. Different memory arrangements and memory access schemes have also been discussed and compared in terms of their efficiency when using both single-port and dual-port memories. An efficient pipelining scheme has been proposed which reduces the number of pipelining registers while achieving similar clock frequencies. The single optimal point where the number of multiplications is minimum for non-time-multiplexed FRM filters is shown to become a function of both the period, L, and the time-multiplexing factor, M. This means that the minimum number of multipliers does not always correspond to the minimum number of multiplications, which also increases the flexibility of implementation. These filters are shown to achieve a power reduction between 23% and 68% for the considered examples.

    To simplify the multiplier, alternative number systems like the logarithmic number system (LNS) have been used to implement FIR filters, which reduces the multiplications to additions. FIR filters are realized by directly designing them using integer linear programming (ILP) in the LNS domain in the minimax sense using finite word length constraints. The branch and bound algorithm, a typical algorithm for solving ILP problems, is implemented based on LNS integers and several branching strategies are proposed and evaluated. The filter coefficients thus obtained are compared with the traditional finite word length coefficients obtained in the linear domain. It is shown that LNS FIR filters provide a better approximation error compared to a standard FIR filter for a given coefficient word length.

    FIR filters also offer an opportunity for complexity reduction by implementing the multipliers using Booth or standard high-radix multiplication. Both of these multiplication schemes generate pre-computed multiples of the multiplicand which are then selected based on the encoded bits of the multiplier. In transposed direct form (TDF) FIR filters, one input sample is multiplied with a number of coefficients, and complexity can be reduced by sharing the pre-computation of the multiples of the input data among all multiplications. Part of this work includes a systematic and unified approach to the design of such computation sharing multipliers and a comparison of the two forms of multiplication. It also gives closed-form expressions for the cost of different parts of the multiplication and gives an overview of various ways to implement the select unit with respect to the design of multiplexers.

    Particle filters are used to solve problems that require estimation of the state of a system. An improved resampling scheme for reducing the latency of the resampling stage is proposed, which uses a pre-fetch technique to reduce the latency by between 50% and 95%, depending on the number of pre-fetches. Generalized division-free architectures and compact memory structures are also proposed that map to different resampling algorithms, help reduce the complexity of the multinomial resampling algorithm, and reduce the number of memories required by up to 50%.

    List of papers
    1. A unified approach to the design and implementation of computation sharing multipliers: Computation sharing multipliers
    (English). Manuscript (preprint) (Other academic)
    Abstract [en]

    A unified approach to the design and implementation of computation sharing multipliers based on Booth and standard high-radix multiplication schemes is presented here. Both of these multiplication schemes have various building blocks, one of which is the pre-computer, which can be shared across a number of multiplications if the multiplicand to the multipliers is the same, as in a transposed direct form (TDF) finite-length impulse response (FIR) filter. Closed-form expressions to estimate the cost of different building blocks based on different schemes have been developed and analyzed in different dimensions. Multipliers, both standalone and as part of computation sharing in FIR filters and complex multipliers, have been realized in hardware and synthesized using a standard cell library.

    It is shown that, apart from word length and filter length, the ratio between the cost of implementing adders and multiplexers has an effect on the choice of optimal radix. The higher the ratio, the lower the cost of implementing multiplexers, which benefits a high radix. A higher radix will also benefit from computation sharing if its cost for one multiplication is less than that of the lower radix, and it is shown that the radix-16 Booth multiplier achieves lower area complexity and power consumption by an average of 7% and 17%, respectively.

    Keywords
    Computation sharing multipliers, standard high-radix multiplier, Booth multiplier, FIR filter
    National Category
    Electrical Engineering, Electronic Engineering, Information Engineering
    Identifiers
    urn:nbn:se:liu:diva-124194 (URN)
    Available from: 2016-01-21 Created: 2016-01-21 Last updated: 2016-02-02. Bibliographically approved
    2. On the implementation of time-multiplexed frequency-response masking filters
    2016 (English). In: IEEE Transactions on Signal Processing, ISSN 1053-587X, E-ISSN 1941-0476, Vol. 64, no 15, p. 3933-3944. Article in journal (Refereed) Published
    Abstract [en]

    The complexity of narrow transition band finite-length impulse response (FIR) filters is high and can be reduced by using frequency-response masking (FRM) techniques. These techniques use a combination of periodic model and, possibly periodic, masking filters. Time-multiplexing is in general beneficial since only rarely does the technology bound maximum obtainable clock frequency and the application determined required sample rate correspond. Therefore, architectures for time-multiplexed FRM filters that benefit from the inherent sparsity of the periodic filters are introduced in this work.

    We show that FRM filters not only reduce the number of multipliers needed, but also have benefits in terms of memory usage. Although the total number of samples to be stored is larger for FRM, it results in fewer memory resources needed in FPGAs and more energy-efficient memory schemes in ASICs. In total, the power consumption is significantly reduced compared to a single-stage implementation. Furthermore, we show that the choice of the interpolation factor which gives the least complexity for the periodic model filter and subsequent masking filter(s) is a function of the time-multiplexing factor, meaning that the minimum number of multipliers does not always correspond to the minimum number of multiplications. Both single-port and dual-port memories are considered and the involved trade-off between the number of multipliers and memory complexity is illustrated. The results show that for FPGA implementation, the power reduction ranges from 23% to 68% for the considered examples.

    Place, publisher, year, edition, pages
    Institute of Electrical and Electronics Engineers (IEEE), 2016
    Keywords
    Frequency-response masking, FIR filter, FPGA, ASIC, time-multiplexing, memories
    National Category
    Electrical Engineering, Electronic Engineering, Information Engineering
    Identifiers
    urn:nbn:se:liu:diva-124190 (URN), 10.1109/TSP.2016.2557298 (DOI), 000379699800009 ()
    Note

    At the time of the thesis defence, the publication existed as a manuscript

    Available from: 2016-01-21 Created: 2016-01-21 Last updated: 2017-11-30. Bibliographically approved
    3. Design of Finite Word Length Linear-Phase FIR Filters in the Logarithmic Number System Domain
    2014 (English). In: VLSI design (Print), ISSN 1065-514X, E-ISSN 1563-5171, Vol. 2014, no 217495. Article in journal (Refereed) Published
    Abstract [en]

    Logarithmic number system (LNS) is an attractive alternative to realize finite-length impulse response filters because of multiplication in the linear domain being only addition in the logarithmic domain. In the literature, linear coefficients are directly replaced by the logarithmic equivalent. In this paper, an approach to directly optimize the finite word length coefficients in the LNS domain is proposed. This branch and bound algorithm is implemented based on LNS integers and several different branching strategies are proposed and evaluated. Optimal coefficients in the minimax sense are obtained and compared with the traditional finite word length representation in the linear domain as well as using rounding. Results show that the proposed method naturally provides smaller approximation error compared to rounding. Furthermore, they provide insights into finite word length properties of FIR filters coefficients in the LNS domain and show that LNS FIR filters typically provide a better approximation error compared to a standard FIR filter.

    Place, publisher, year, edition, pages
    Egypt: Hindawi Publishing Corporation, 2014
    Keywords
    Logarithmic Number System, FIR Filter, Integer Linear Programming, Branch and Bound
    National Category
    Signal Processing
    Identifiers
    urn:nbn:se:liu:diva-105861 (URN), 10.1155/2014/217495 (DOI)
    Available from: 2014-04-10 Created: 2014-04-10 Last updated: 2017-12-05. Bibliographically approved
    4. Improved particle filter resampling architectures
    (English). Manuscript (preprint) (Other academic)
    Abstract [en]

    The most challenging aspect of particle filtering hardware implementation is the resampling step which replicates particles with large weights and discards those with small weights because it has a high latency and can only be partially executed in parallel with the other steps of particle filtering. To reduce the latency, an improved resampling scheme is proposed in this work which involves pre-fetching from the weight memory in parallel to the fetching of a value from a random function generator. Architectures for realizing the pre-fetch technique are also proposed. The trade-off between the latency reduction achieved by increasing the size of the pre-fetch memory and the architectural implementation complexity has been analyzed. Results show that a pre-fetch of five achieves the best area-latency trade-off while on average achieving an 85% reduction in the latency.

    We also propose a generic double multiplier architecture for resampling which avoids normalization divisions and makes the architecture equally efficient for non-powers-of-two number of particles as well as removes the need of explicitly ordering the random values for efficient multinomial resampling implementation. It is further improved by computing the cumulative sum of weights on-the-fly which helps in reducing the size of the weight memories by up to 50%.

    Keywords
    Particle filters, resampling algorithm, resampling architecture
    National Category
    Electrical Engineering, Electronic Engineering, Information Engineering
    Identifiers
    urn:nbn:se:liu:diva-124193 (URN)
    Available from: 2016-01-21 Created: 2016-01-21 Last updated: 2016-02-02. Bibliographically approved
  • 8.
    Alam, Syed Asad
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    A unified approach to the design and implementation of computation sharing multipliers: Computation sharing multipliers. Manuscript (preprint) (Other academic)
    Abstract [en]

    A unified approach to the design and implementation of computation sharing multipliers based on Booth and standard high-radix multiplication schemes is presented here. Both of these multiplication schemes have various building blocks, one of which is the pre-computer, which can be shared across a number of multiplications if the multiplicand to the multipliers is the same, as in a transposed direct form (TDF) finite-length impulse response (FIR) filter. Closed-form expressions to estimate the cost of different building blocks based on different schemes have been developed and analyzed in different dimensions. Multipliers, both standalone and as part of computation sharing in FIR filters and complex multipliers, have been realized in hardware and synthesized using a standard cell library.

    It is shown that, apart from word length and filter length, the ratio between the cost of implementing adders and multiplexers has an effect on the choice of optimal radix. The higher the ratio, the lower the cost of implementing multiplexers, which benefits a high radix. A higher radix will also benefit from computation sharing if its cost for one multiplication is less than that of the lower radix, and it is shown that the radix-16 Booth multiplier achieves lower area complexity and power consumption by an average of 7% and 17%, respectively.

  • 9.
    Alam, Syed Asad
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Generalized Division-Free Architecture and Compact Memory Structure for Resampling in Particle Filters. 2015. In: 2015 European Conference on Circuit Theory and Design (ECCTD), IEEE Press, 2015, p. 416-419. Conference paper (Refereed)
    Abstract [en]

    The most challenging step of implementing particle filtering is the resampling step, which replicates particles with large weights and discards those with small weights. In this paper, we propose a generic architecture for resampling which uses double multipliers to avoid normalization divisions and make the architecture equally efficient for non-powers-of-two numbers of particles. Furthermore, the complexity of resampling is greatly affected by the size of the memories used to store the weights. We illustrate that storing the original weights instead of their cumulative sum, and calculating the cumulative sum online, reduces the total complexity, in terms of area, by 21% to 45%, while giving up to a 50% reduction in memory usage.
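The two ideas in this abstract, avoiding the normalization division and computing the cumulative sum online, can be sketched in software as follows. This is a simplified illustration and not the hardware architecture of the paper: the function name is hypothetical, the uniform random values are assumed to be supplied already sorted in [0, 1), and the total weight is assumed to be known before resampling starts.

```python
def resample_division_free(weights, sorted_uniforms):
    """Select particle indices from unnormalized weights without dividing.

    Instead of normalizing every weight by the total, each uniform value
    is multiplied by the total. The cumulative sum of the original
    weights is accumulated on the fly rather than read from memory.
    """
    total = sum(weights)               # unnormalized weight sum
    indices = []
    i = 0
    running = weights[0]               # cumulative sum, built online
    for u in sorted_uniforms:
        threshold = u * total          # scale the random value, no division
        while running <= threshold and i < len(weights) - 1:
            i += 1
            running += weights[i]
        indices.append(i)
    return indices


# Four particles with unnormalized weights; the zero-weight particle is
# discarded and the heaviest particle is replicated.
selected = resample_division_free([1, 3, 0, 4], [0.0625, 0.3125, 0.5625, 0.8125])
```

Because only the original weights are stored, a separate memory for the cumulative sums is not needed, which is the software analogue of the memory saving described above.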

  • 10.
    Alam, Syed Asad
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Improved particle filter resampling architectures. Manuscript (preprint) (Other academic)
    Abstract [en]

    The most challenging aspect of particle filtering hardware implementation is the resampling step which replicates particles with large weights and discards those with small weights because it has a high latency and can only be partially executed in parallel with the other steps of particle filtering. To reduce the latency, an improved resampling scheme is proposed in this work which involves pre-fetching from the weight memory in parallel to the fetching of a value from a random function generator. Architectures for realizing the pre-fetch technique are also proposed. The trade-off between the latency reduction achieved by increasing the size of the pre-fetch memory and the architectural implementation complexity has been analyzed. Results show that a pre-fetch of five achieves the best area-latency trade-off while on average achieving an 85% reduction in the latency.

    We also propose a generic double multiplier architecture for resampling which avoids normalization divisions and makes the architecture equally efficient for non-powers-of-two number of particles as well as removes the need of explicitly ordering the random values for efficient multinomial resampling implementation. It is further improved by computing the cumulative sum of weights on-the-fly which helps in reducing the size of the weight memories by up to 50%.

  • 11.
    Alam, Syed Asad
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering. Namal Inst, Pakistan.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Improved Particle Filter Resampling Architectures. 2020. In: Journal of Signal Processing Systems, ISSN 1939-8018, E-ISSN 1939-8115, Vol. 92, no 6, p. 555-568. Article in journal (Refereed)
    Abstract [en]

    The most challenging aspect of particle filtering hardware implementation is the resampling step. This is because of its high latency, as it can only be partially executed in parallel with the other steps of particle filtering and has no inherent parallelism inside it. To reduce the latency, an improved resampling architecture is proposed which involves pre-fetching from the weight memory in parallel to the fetching of a value from a random function generator, along with architectures for realizing the pre-fetch technique. This enables a particle filter using M particles with otherwise streaming operation to accept new inputs more often than every 2M cycles, which is what the previously best approach achieves. Results show that a pre-fetch buffer of five values achieves the best area-latency reduction trade-off while on average achieving an 85% reduction in latency for the resampling step, leading to a sample time reduction of more than 40%. We also propose a generic division-free architecture for the resampling steps. It also removes the need to explicitly order the random values for efficient multinomial resampling implementation. In addition, on-the-fly computation of the cumulative sum of weights is proposed, which helps reduce the word length of the particle weight memory. FPGA implementation results show that the memory size is reduced by up to 50%.

  • 12.
    Alam, Syed Asad
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    On the implementation of time-multiplexed frequency-response masking filters. 2016. In: IEEE Transactions on Signal Processing, ISSN 1053-587X, E-ISSN 1941-0476, Vol. 64, no 15, p. 3933-3944. Article in journal (Refereed)
    Abstract [en]

    The complexity of narrow transition band finite-length impulse response (FIR) filters is high and can be reduced by using frequency-response masking (FRM) techniques. These techniques use a combination of periodic model and, possibly periodic, masking filters. Time-multiplexing is in general beneficial since only rarely does the technology bound maximum obtainable clock frequency and the application determined required sample rate correspond. Therefore, architectures for time-multiplexed FRM filters that benefit from the inherent sparsity of the periodic filters are introduced in this work.

    We show that FRM filters not only reduce the number of multipliers needed, but also have benefits in terms of memory usage. Although the total number of samples to be stored is larger for FRM, it results in fewer memory resources needed in FPGAs and more energy-efficient memory schemes in ASICs. In total, the power consumption is significantly reduced compared to a single-stage implementation. Furthermore, we show that the choice of the interpolation factor which gives the least complexity for the periodic model filter and subsequent masking filter(s) is a function of the time-multiplexing factor, meaning that the minimum number of multipliers does not always correspond to the minimum number of multiplications. Both single-port and dual-port memories are considered, and the involved trade-off between number of multipliers and memory complexity is illustrated. The results show that for FPGA implementation, the power reduction ranges from 23% to 68% for the considered examples.

  • 13.
    Alexandersson, Johan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Nordin, Olle
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Implementation of CAN Communication Stack in AUTOSAR. 2015. Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
    Abstract [en]

    In the automotive industry today, embedded systems have reached a level of complexity which is not maintainable with the traditional approach of designing automotive embedded systems. For this reason, many of the world's leading automotive manufacturers have formed an alliance to address this problem. This has resulted in AUTOSAR, an open standardized architecture for automotive embedded systems, which strives for increased flexibility and safety regulations. This thesis will explore the possibilities of implementing a CAN communication stack using the AUTOSAR architecture and its corresponding methodology. As a result of this thesis, a complete AUTOSAR CAN communication stack has been implemented, as well as a simulator application with the purpose of testing its functionality.

  • 14.
    Alexandersson, Johan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    Nordin, Olle
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    Implementation of SLAM Algorithms in a Small-Scale Vehicle Using Model-Based Development. 2017. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    As autonomous driving is rapidly becoming the next major challenge in the automotive industry, the problem of Simultaneous Localization And Mapping (SLAM) has never been more relevant than it is today. This thesis presents the idea of examining SLAM algorithms by implementing such an algorithm on a radio-controlled car which has been fitted with sensors and microcontrollers. The software architecture of this small-scale vehicle is based on the Robot Operating System (ROS), an open-source framework designed to be used in robotic applications.

    This thesis covers Extended Kalman Filter (EKF)-based SLAM, FastSLAM, and GraphSLAM, examining these algorithms in theoretical investigations, simulations, and real-world experiments. The method used in this thesis is model-based development, meaning that a model of the vehicle is first implemented in order to be able to perform simulations using each algorithm. The decision of which algorithm to implement on the physical vehicle is then made, backed up by these simulation results as well as a theoretical investigation of each algorithm.

    This thesis has resulted in a dynamic model of a small-scale vehicle which can be used for simulation of any ROS-compliant SLAM algorithm, and this model has been simulated extensively in order to provide empirical evidence to determine which SLAM algorithm is most suitable for this application. Out of the algorithms examined, FastSLAM proved to be the best candidate, and was in the final stage, through usage of the ROS package gMapping, successfully implemented on the small-scale vehicle.

  • 15.
    Andersson Holmström, Simon
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    Adaptive TDC: Implementation and Evaluation of an FPGA. 2015. Independent thesis Basic level (degree of Bachelor), 10,5 credits / 16 HE credits. Student thesis
    Abstract [en]

    A time-to-digital converter (TDC) is a digital unit that measures the time interval between two events. This is useful to determine the characteristics and patterns of a signal or an event. In this thesis a hybrid TDC is presented, consisting of a tapped delay line and a clock counter principle.

    The TDC is used to measure the time between received data in a QKD application. If the measured time does not exceed a certain value, then the data has been sent without any interception. It is also possible to use TDCs in other fields such as laser-ranging and time-of-flight applications.

    The TDC consists of two carry chains, an encoder, a FIFO and a counter for each channel, an AXI module and a control unit to generate command signals to all channels that are implemented. The time is measured by sampling the signal that has propagated through the carry chain and encoding the propagation length from this sample.

    In this thesis a TDC is implemented that has a 10 ns dead time and a resolution below 28 ps in a four-channel mode. The propagation variation is approximately two percent of the total value during testing. For the implementation, an FPGA board with a Zynq XC7Z020 SoC is used with SystemVerilog, a hardware description language (HDL).

  • 16.
    Andersson, Niklas
    et al.
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Vesterbacka, Mark
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Gustafsson, Oskar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Wikner, Jacob
    Linköping University, Department of Electrical Engineering, Electronics System. Linköping University, The Institute of Technology.
    Steady-state cycles in digital oscillators. 2014. Manuscript (preprint) (Other academic)
    Abstract [en]

    Digital recursive oscillators locked in steady-state can be used to generate sinusoids with high spectral purity. The locking occurs when the oscillator returns to a previously visited state and repeats its sequence. In this work we propose a new search algorithm and two new search strategies to find all steady-states for a given oscillator configuration. The improvement in spurious-free dynamic range is between 7 and 40 dB compared to previously reported results. The algorithm is also able to find oscillator sequences for more frequencies than previously reported work. A key part of the method is the reduction of the search space made possible by a proposed extension of existing theory on recursive oscillators. Specific properties of digital oscillators in a steady-state are also discussed. It is shown that the initial states can be used to individually control the phase, amplitude, spectral purity, and also cycle length of the oscillator output.

  • 17.
    Andersson, Olof
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    Bengtsson, Karl
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    Adapting an FPGA-optimized microprocessor to the MIPS32 instruction set. 2010. Independent thesis Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    Nowadays, FPGAs are large enough to host entire system-on-chip designs, wherein a soft core processor is often an integral part. High performance of the processor is always desirable, so there is an interest in finding faster solutions. This report aims to describe the work performed and the results obtained by Karl Bengtsson and Olof Andersson at ISY. The task was to continue the development of a soft core microprocessor, originally created by Andreas Ehliar. The first step was to decide on a more widely adopted instruction set for the processor. The choice fell upon the MIPS32 instruction set. The main work of the project has been focused on implementing support for MIPS32, allowing the processor to execute MIPS assembly language programs. The development has been done with speed optimization in mind. For every new function, the effects on the maximum frequency have been considered, and solutions not satisfying the speed requirements have been abandoned or revised. The performance has been measured by running a benchmark program, Coremark. Comparison has also been made to the main competitors among soft core processors. The results were positive, showing a higher Coremark score than the other processors in the study. The processor described herein still lacks many essential features. Nevertheless, the conclusion is that it may be possible to create a competitive alternative to established soft processors.

  • 18.
    Andreasson, Robert
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    Design of an FPGA Based JTAG Recorder for use in Production of IPTV Set-Top Boxes. 2009. Independent thesis Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    This thesis evaluates the possibility of replacing the manufacturer-dependent JTAG device used in the production tests of IPTV set-top boxes for storing the boot loader in the main memory in order to start the box for the first time. An FPGA-based prototype was built in order to see if it is possible to record the JTAG signals to an external DDR SDRAM, without understanding them, and be able to perform a delayed playback resulting in the same behavior as with the original JTAG device. Overall, the thesis was successful, and it shows that it is in fact feasible to create a JTAG recorder based on an FPGA. A lot of data is needed to store the sequence, though, so the use of a fast memory is crucial. However, in this thesis the speed of both the recording and the delayed playback was reduced in order for the system to work properly.

  • 19.
    Asghar, Rizwan
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Flexible Interleaving Sub–systems for FEC in Baseband Processors. 2010. Doctoral thesis, monograph (Other academic)
    Abstract [en]

    Interleaving is always used in combination with error control coding. It spreads the burst noise, and changes the burst noise to white noise so that the noise-induced bit errors can be corrected. With the advancement of communication systems and a substantial increase in bandwidth requirements, the use of coding for forward error correction (FEC) has become an integral part of modern communication systems. Dividing the FEC sub-systems into two categories, i.e. channel coding/de-coding and interleaving/de-interleaving, the latter appears to vary more in permutation functions, block sizes and throughput requirements. The interleaving/de-interleaving consumes more silicon due to the silicon cost of the permutation tables used in conventional LUT-based approaches. For multi-standard support devices, the silicon cost of the permutation tables can grow much higher, resulting in an inefficient solution. Therefore, hardware re-use among different interleaver modules to support a multimode processing platform is of significance.

    The broadness of the interleaving algorithms gives rise to many challenges when considering a true multimode interleaver implementation. The main challenges include real-time low latency computation for different permutation functions, managing a wide range of interleaving block sizes, higher throughput, low cost, fast and dynamic reconfiguration for different standards, and introducing parallelism wherever necessary.

    It is difficult to merge all currently used interleavers to a single architecture because of different algorithms and throughputs; however, the fact that multimode coverage does not require multiple interleavers to work at the same time provides opportunities to use hardware multiplexing. The multimode functionality is then achieved by fast switching between different standards. We used algorithmic level transformations such as 2-D transformation and realization of recursive computations, which appear to be the key to bringing different interleaving functions to the same level. In general, the work focuses on function level hardware re-use, but it also utilizes classical data-path level optimizations for efficient hardware multiplexing among different standards.

    The research has resulted in multiple flexible architectures supporting multiple standards. These architectures target both channel interleaving and turbo-code interleaving. The presented architectures can support both types of communication systems, i.e. single-stream and multi-stream systems. Introducing the algorithmic level transformations and then applying the hardware re-use methodology has resulted in lower silicon cost while supporting sufficient throughput. According to a database search in March 2010, we have the first multimode interleaver core covering WLAN (802.11a/b/g and 802.11n), WiMAX (802.16e), 3GPP-WCDMA, 3GPP-LTE, and DVB-T/H on a single architecture with minimum silicon cost. The research also provides support for parallel interleaver address generation using different architectures. It provides the algorithmic modifications and architectures to generate up to 8 addresses in parallel and handle the memory conflicts on-the-fly.

    One of the vital requirements for multimode operation is the fast switching between different standards, which is supported by the presented architectures with minimal cycle cost overheads. Fast switching between different standards gives the baseband processor the luxury of re-configuring the interleaver architecture on-the-fly and re-using the same hardware for another standard. Lower silicon cost, maximum flexibility and fast switchability among multiple standards during run time make the proposed research a good choice for the radio baseband processing platforms.

  • 20.
    Asghar, Rizwan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    2-D Realization of WiMAX Channel Interleaver for Efficient Hardware Implementation. 2009. In: Proceedings of World Academy of Science, Engineering and Technology (ISSN: 2070-3740), 2009, p. 25-29. Conference paper (Refereed)
    Abstract [en]

    The direct implementation of interleaver functions in WiMAX is not hardware efficient due to the presence of complex functions. Also, the conventional method, i.e. using memories for storing the permutation tables, is silicon consuming. This work presents a 2-D transformation for WiMAX channel interleaver functions which reduces the overall hardware complexity to compute the interleaver addresses on the fly. A fully re-configurable architecture for address generation in the WiMAX channel interleaver is presented, which consumes 1.1 k-gates in total. It can be configured for any block size and any modulation scheme in WiMAX. The presented architecture can run at a frequency of 200 MHz, thus fully supporting high bandwidth requirements for WiMAX.

  • 21.
    Asghar, Rizwan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Dual standard re-configurable hardware interleaver for turbo decoding. 2008. Conference paper (Refereed)
    Abstract [en]

    A very low cost re-configurable hardware interleaver for two standards, 3GPP-WCDMA and 3GPP Long Term Evolution (3GPP-LTE), is presented. The interleaver is a key component of radio communication systems. Using conventional design methods, it consumes a large part of silicon area in the design of turbo encoder and decoder. The presented hardware interleaver address generation architecture utilizes algorithmic level hardware simplifications to achieve a very low cost solution. After the hardware optimizations, the proposed architecture consumes only 3.1k gates with a 256x8 bit memory for the fully re-configurable dual standard interleaver address generator. The interleaved address is computed every clock cycle except in the case of pruning (if the block size is less than the row-column matrix) in 3GPP-WCDMA. In this case, one additional clock cycle is consumed for valid address generation.

  • 22.
    Asghar, Rizwan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Low Complexity Hardware Interleaver for MIMO-OFDM based Wireless LAN. 2009. In: Proceedings - IEEE International Symposium on Circuits and Systems, 2009, p. 1747-1750. Conference paper (Refereed)
    Abstract [en]

    A low complexity hardware interleaver architecture is presented for MIMO-OFDM based Wireless LAN, e.g. 802.11n. The novelty of the presented architecture is twofold: 1) flexibility to choose the interleaver implementation with different modulation schemes and different sizes for different spatial streams in a multi-antenna system; 2) the complexity of computing the interleaver address on the fly is reduced by using recursion and is supported by mathematical formulation. The proposed interleaver architecture is implemented in a 65 nm CMOS process and consumes 0.035 mm² area. The proposed architecture supports high speed communication with a maximum throughput of 900 Mbps at a clock rate of 225 MHz.

  • 23.
    Asghar, Rizwan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Low Complexity Multi Mode Interleaver Core for WiMax with Support for Convolutional Interleaving. 2009. In: International Journal of Electronics, Communications and Computer Engineering, Vol. 1, no 1, p. 20-29. Article in journal (Refereed)
    Abstract [en]

    A hardware efficient, multi mode, re-configurable architecture of interleaver/de-interleaver for multiple standards, like DVB, WiMAX and WLAN, is presented. The interleavers consume a large part of silicon area when implemented by using conventional methods, as they use memories to store permutation patterns. In addition, different types of interleavers in different standards cannot share the hardware due to different construction methodologies. The novelty of the work presented in this paper is threefold: 1) mapping of vital types of interleavers, including the convolutional interleaver, onto a single architecture with flexibility to change interleaver size; 2) hardware complexity for channel interleaving in WiMAX is reduced by using 2-D realization of the interleaver functions; and 3) silicon cost overheads are reduced by avoiding the use of small memories. The proposed architecture consumes 0.18 mm² silicon area in a 0.12 μm process and can operate at a frequency of 140 MHz. The reduced complexity helps in minimizing the memory utilization, and at the same time provides strong support to on-the-fly computation of permutation patterns.

  • 24.
    Asghar, Rizwan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Multimode flex-interleaver core for baseband processor platform. 2010. In: Journal of Computer Systems, Networks and Communications, ISSN 1687-7381, Vol. 2010, p. 1-16. Article in journal (Refereed)
    Abstract [en]

    This paper presents a flexible interleaver architecture supporting multiple standards like WLAN, WiMAX, HSPA+, 3GPP-LTE, and DVB. Algorithmic level optimizations like 2D transformation and realization of recursive computation are applied, which appear to be the key to reaching an efficient hardware multiplexing among different interleaver implementations. The presented hardware enables the mapping of vital types of interleavers, including multiple block interleavers and the convolutional interleaver, onto a single architecture. By exploiting the hardware reuse methodology the silicon cost is reduced, and it consumes 0.126 mm² area in total in a 65 nm CMOS process for a fully reconfigurable architecture. It can operate at a frequency of 166 MHz, providing a maximum throughput of up to 664 Mbps for a multistream system and 166 Mbps for single stream communication systems, respectively. One of the vital requirements for multimode operation is the fast switching between different standards, which is supported by this hardware with minimal cycle cost overheads. Maximum flexibility and fast switchability among multiple standards during run time make the proposed architecture a good choice for the radio baseband processing platform.

  • 25.
    Asghar, Rizwan
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Programmable Parallel Data-path for FEC. 2007. In: Swedish System-on-Chip Conference, SSoCC, 2007, 2007. Conference paper (Other academic)
  • 26.
    Asghar, Rizwan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Towards Radix-4, Parallel Interleaver Design to Support High-Throughput Turbo Decoding for Re-Configurability. 2010. Conference paper (Refereed)
    Abstract [en]

    Parallel, radix-4 turbo decoding is used to enhance the throughput and at the same time reduce the overall memory cost. The bottleneck is the higher complexity associated with radix-4 parallel interleaver implementation. This paper addresses the implementation issues of the radix-4, parallel interleaver and also proposes necessary modifications in the interleaver algorithms for parallel address generation. It presents a re-configurable architecture which enables the same turbo decoding core to be used for multiple standards. The proposed interleaver architecture is capable of handling the memory conflicts on-the-fly. It consumes 12.5K gates and can run at a frequency of 285 MHz, thus supporting a throughput of 173.3 Mbps, which can cover most of the emerging communication standards.

  • 27.
    Asghar, Rizwan
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Liu, Dake
    Linköping University, The Institute of Technology. Linköping University, Department of Electrical Engineering, Computer Engineering.
    Very Low Cost Configurable Hardware Interleaver for 3G Turbo Decoding. 2008. In: IEEE International Conference on Information and Communication Tech from Theory to Applications, ICTTA, 2008, IEEE, 2008, p. 2314-2318. Conference paper (Refereed)
    Abstract [en]

    A very low cost hardware interleaver for the 3rd Generation Partnership Project (3GPP) turbo coding algorithm is presented. The interleaver is a key component of turbo codes and it is used to minimize the effect of burst errors in the transmission. Using conventional design methods, it consumes a large part of silicon area in the design of turbo encoder and decoder. The presented hardware interleaver architecture utilizes algorithmic level hardware simplifications as well as iterative modulo computation to achieve a very low cost solution. After the hardware multiplexing and optimization, the proposed architecture consumes only 1.5 k gates (without pre-computation) and 2.2 k gates (with pre-computation). In both cases the interleaved address is computed every clock cycle except in the case of pruning, in which one additional clock cycle is consumed.

  • 28.
    Asghar, Rizwan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Wu, Di
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Eilert, Johan
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Memory Conflict Analysis and Implementation of a Re-configurable Interleaver Architecture Supporting Unified Parallel Turbo Decoding. 2010. In: Journal of Signal Processing Systems for Signal, Image, and Video Technology, ISSN 1939-8018, Vol. 60, no 1, p. 15-29. Article in journal (Refereed)
    Abstract [en]

    This paper presents a novel hardware interleaver architecture for unified parallel turbo decoding. The architecture is fully re-configurable among multiple standards like HSPA Evolution, DVB-SH, 3GPP-LTE and WiMAX. Turbo codes, being widely used for error correction in today’s consumer electronics, are prone to introduce higher latency due to bigger block sizes and multiple iterations. Many parallel turbo decoding architectures have recently been proposed to enhance the channel throughput, but the interleaving algorithms used in different standards do not freely allow using them due to a higher percentage of memory conflicts. The architecture presented in this paper provides a re-configurable platform for implementing the parallel interleavers for different standards by managing the conflicts involved in each. The memory conflicts are managed by applying different approaches like stream misalignment, memory division and use of a small FIFO buffer. The proposed flexible architecture is low cost and consumes 0.085 mm² area in a 65 nm CMOS process. It can implement up to 8 parallel interleavers and can operate at a frequency of 200 MHz, thus providing significant support to higher throughput systems based on parallel SISO processors.

  • 29.
    Asghar, Rizwan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Wu, Di
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Eilert, Johan
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Memory Conflict Analysis and Interleaver Design for Parallel Turbo Decoding Supporting HSPA Evolution. 2009. In: 12th EUROMICRO Conference on Digital System Design, 2009, p. 699-706. Conference paper (Refereed)
    Abstract [en]

    HSPA evolution has raised the throughput requirements for WCDMA based systems, where turbo code has been adopted to perform the error correction. Many parallel turbo decoding architectures have recently been proposed to enhance the channel throughput, but the interleaving algorithm used in WCDMA based systems does not freely allow their use due to a high percentage of memory conflicts. This paper provides a comprehensive analysis for reduction of interleaver memory conflicts while generating more than one address in a single clock cycle. It also provides trade-off analysis in terms of area and power efficiency for multiple architectures for different functions involved in the interleaver design. The final architecture supports processing of two parallel SISO blocks and manages the conflicts by applying different approaches like stream misalignment, memory division and a small FIFO buffer. The proposed architecture is low cost and consumes 4.3K gates at a frequency of 150 MHz. This work also focuses on reduction of pre-processing overheads by introducing segment based modulo computation, thus providing further relaxation to the SISO decoding process.

  • 30.
    Asghar, Rizwan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Wu, Di
    Linköping University, Department of Electrical Engineering. Linköping University, The Institute of Technology.
    Saeed, Ali
    Linköping University, Department of Electrical Engineering. Linköping University, The Institute of Technology.
    Huang, Yulin
    Linköping University, Department of Electrical Engineering. Linköping University, The Institute of Technology.
    Liu, Dake
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Implementation of a Radix-4, Parallel Turbo Decoder and Enabling the Multi-Standard Support. 2012. In: Journal of Signal Processing Systems, ISSN 1939-8018, E-ISSN 1939-8115, Vol. 66, no 1, p. 25-41. Article in journal (Refereed)
    Abstract [en]

    This paper presents a unified, radix-4 implementation of a turbo decoder, covering multiple standards such as DVB, WiMAX, 3GPP-LTE and HSPA Evolution. The radix-4, parallel interleaver is the bottleneck when using the same turbo-decoding architecture for multiple standards. This paper covers the issues associated with the design of a radix-4 parallel interleaver to reach a flexible turbo-decoder architecture. Radix-4, parallel interleaver algorithms and their mapping onto a hardware architecture are presented for multi-mode operations. The overheads associated with hardware multiplexing are found to be minimal. Other than flexibility for the turbo decoder implementation, the low silicon cost and low power aspects are also addressed by optimizing the storage scheme for branch metrics and extrinsic information. The proposed unified architecture for radix-4 turbo decoding consumes 0.65 mm² area in total in a 65 nm CMOS process. With 4 SISO blocks used in parallel and 6 iterations, it can achieve a throughput of up to 173.3 Mbps while consuming 570 mW power in total. It provides a good trade-off between silicon cost, power consumption and throughput, with a silicon efficiency of 0.005 mm²/Mbps and an energy efficiency of 0.55 nJ/b/iter.

  • 31.
    Ashrafi, Ashkan
    et al.
    San Diego State University.
    Strollo, Antonio G. M.
    University of Napoli Federico II.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Hardware implementation of digital signal processing algorithms
    2013. In: Journal of Electrical and Computer Engineering, ISSN 2090-0147, E-ISSN 2090-0155, Vol. 2013, no 782575, p. 1-2. Article in journal (Other academic)
  • 32.
    Bae, Cheolyong
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    Gokhale, Madhur
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    Implementation of High-Speed 512-Tap FIR Filters for Chromatic Dispersion Compensation
    2018. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    A digital filter is a system or a device that modifies a signal. This is an essential feature in digital communication. Optical fibers offer various advantages over copper wires for communication, such as higher bandwidth and longer reach. However, at high transmission rates, chromatic dispersion arises as a problem that must be mitigated in an optical communication system. Therefore, it is necessary to have a filter that compensates for chromatic dispersion. In this thesis, we introduce the implementation of a new architecture of the filter and compare it with a previously proposed architecture.

    Download full text (pdf)
    fulltext
  • 33.
    Bae, Cheolyong
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gokhale, Madhur
    Linköping University, Department of Electrical Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Garrido Gálvez, Mario
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Improved Implementation Approaches for 512-tap 60 GSa/s Chromatic Dispersion FIR Filters
    2018. In: 2018 CONFERENCE RECORD OF 52ND ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS, AND COMPUTERS, IEEE, 2018, p. 213-217. Conference paper (Refereed)
    Abstract [en]

    In optical communication, the non-ideal properties of the fibers lead to pulse widening from chromatic dispersion. One way to compensate for this is through digital signal processing. In this work, two architectures for compensation are compared. Both are designed for 60 GSa/s and 512 filter taps and implemented in the frequency domain using FFTs. It is shown that the high-speed requirements introduce constraints on possible architectural choices. In this work, it is shown that it is not required to use two overlapping FFTs to obtain continuous filtering. In addition, efficient highly parallel implementation of FFTs is discussed and an improved FFT compared to our earlier work is proposed. The results are compared to using an approach with a shorter FFT and FIR filters.

  • 34.
    Bae, Cheolyong
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    FFT-Size Implementation Tradeoffs for Chromatic Dispersion Compensation Filters
    2023. Conference paper (Other academic)
    Abstract [en]

    FIR filtering realized in frequency domain can use different FFT sizes leading to different arithmetic complexities. The implementation results indicate that not only arithmetic complexities must be considered for minimal power consumption.

  • 35.
    Bae, Cheolyong
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Finite Word Length Analysis for FFT-Based Chromatic Dispersion Compensation Filters
    2021. In: Signal Processing in Photonic Communications 2021, OPTICA, 2021. Conference paper (Refereed)
    Abstract [en]

    Finite word length effects for frequency-domain implementation of chromatic dispersion compensation are analyzed. The results show a significant difference for the different factors when it comes to power consumption and receiver penalty.

    Download full text (pdf)
    fulltext
  • 36.
    Bae, Cheolyong
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    High-Speed Chromatic Dispersion Compensation Filtering in FPGAs for Coherent Optical Communication
    2020. In: 2020 30th International Conference on Field-Programmable Logic and Applications (FPL), IEEE, 2020, p. 357-358. Conference paper (Refereed)
    Abstract [en]

    Chromatic dispersion is one of the error sources limiting the transmission capacity in coherent optical communication that can be mitigated with digital signal processing. In this paper, the current status and plans of implementation of chromatic dispersion compensation (CDC) filters on FPGAs are discussed. As these high-speed filters are most efficiently implemented in the frequency-domain, different approaches for high-speed FFT-based architectures are considered and preliminary results of fully parallel FFT implementation by utilizing FPGA hardware features are presented.

  • 37.
    Bae, Cheolyong
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Overlap-Save Commutators for High-Speed Streaming Data Filtering
    2021. In: 2021 IEEE International Symposium on Circuits and Systems (ISCAS), IEEE, 2021. Conference paper (Other academic)
    Abstract [en]

    Overlap-save and overlap-add methods enable efficient implementation of FIR filters. In this paper, a compact method for handling the overlap and shuffle of samples for real-time processing using pipelined FFT architectures is presented. It is suitable for cases when the sample rate is equal to or higher than the clock frequency.
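
    As background for the overlap-save method named in this abstract, the following is a minimal software sketch of textbook overlap-save FIR filtering. It is not the commutator architecture of the paper. The function names (`dft`, `overlap_save`, `fir_direct`) and the naive O(N²) DFT are assumptions made for illustration only.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) DFT, or inverse DFT when inverse=True (illustration only)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
        for m in range(n)
    ]
    return [v / n for v in out] if inverse else out


def overlap_save(x, h, n_fft):
    """Filter x with FIR taps h using blocks of n_fft samples (overlap-save)."""
    m = len(h)
    if n_fft < m:
        raise ValueError("n_fft must be at least the filter length")
    step = n_fft - (m - 1)                   # new input samples per block
    H = dft(list(h) + [0.0] * (n_fft - m))   # zero-padded filter spectrum
    padded = [0.0] * (m - 1) + list(x)       # history for the first block
    y = []
    pos = 0
    while pos < len(x):
        block = padded[pos:pos + n_fft]
        block += [0.0] * (n_fft - len(block))  # zero-pad the last block
        Y = [a * b for a, b in zip(dft(block), H)]
        yb = dft(Y, inverse=True)
        # The first m-1 outputs of each block are corrupted by circular
        # wrap-around and are discarded; the rest equal linear convolution.
        y.extend(v.real for v in yb[m - 1:])
        pos += step
    return y[:len(x)]


def fir_direct(x, h):
    """Reference time-domain FIR filter with zero initial state."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]
```

    Each block reuses the last `len(h) - 1` input samples of the previous block, which is the overlap that a hardware implementation has to buffer and reorder.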

  • 38.
    Bae, Cheolyong
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Larsson-Edefors, Per
    Chalmers University of Technology, Gothenburg, Sweden.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Benefit of Prime Factor FFTs in Fully Parallel 60 GBaud CDC Filters
    2020. Conference paper (Refereed)
    Abstract [en]

    Prime factor algorithms are beneficial in fully parallel frequency-domain implementation of CDC filters and enable a more continuous scaling of filter lengths. ASIC implementation results in 28-nm CMOS for 60 GBd are provided.

  • 39.
    Baghyari, Roza
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    Nykvist, Carolina
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    Händelsekonstruktion genom säkrande och analys av data från ett hemautomationssystem
    2019. Independent thesis Basic level (degree of Bachelor), 10,5 credits / 16 HE credits. Student thesis
    Abstract [en]

    The purpose of this bachelor thesis was to extract timestamps from a home automation system with a control unit named Homey, from a forensic perspective. The first step was to create a course of events in which a burglar breaks into an apartment with home automation. The home automation system consisted of some peripheral units using different types of wireless network protocols. All these units were triggered during the break-in. Thereafter, different methods were tested in an attempt to extract the timestamps for each unit. These methods included the REST API, UART and chip-off on a flash memory. The method using JTAG was not tested due to lack of time. The REST API was the method that provided the most information about the units and timestamps. The flash memory also contained every timestamp; however, it did not provide any information about which timestamp belonged to which unit. Even though the REST API was the best method to extract data, it was also the method with the most requirements, such as credentials or a rooted smartphone. With the extracted timestamps it was possible to reconstruct the course of events of the break-in.

    Download full text (pdf)
    Examensrapport
  • 40.
    Bangalore Kumara Swamy, Vishal
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    FPGA-Implementation of NNLS-Based mMTC User Detector for Pilot-Hopping Sequences
    2021. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Download full text (pdf)
    fulltext
  • 41.
    Berggren, Erik
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    Testverktyg för JTAG Boundary Scan
    2017. Independent thesis Basic level (degree of Bachelor), 10,5 credits / 16 HE credits. Student thesis
    Abstract [en, translated from sv]

    A project has been carried out in Python to read and analyze netlists from the eCAD program Altium. The project is a prototype of software that, once fully developed, is intended to automate connectivity testing of printed circuit boards using JTAG Boundary Scan. The project investigates what proportion of the nets on some arbitrarily chosen printed circuit boards are accessible to Boundary Scan testing, and finds that on average 39% of the nets are observable.

    Download full text (pdf)
    fulltext
  • 42.
    Bertilsson, Erik
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    A Scalable Architecture for Massive MIMO Base Stations Using Distributed Processing
    2017. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    Massive MIMO is an emerging technology for future wireless systems that has received much attention from both academia and industry recently. The most prominent feature of Massive MIMO is that the base station is equipped with a large number of antennas. It is therefore important to create scalable architectures to enable simple deployment in different configurations.

    In this thesis, a distributed architecture for performing the baseband processing in a massive OFDM MU-MIMO system is proposed and analyzed. The proposed architecture is based on connecting several identical nodes in a K-ary tree. It is shown that, depending on the chosen algorithms, all or most computations can be performed in a distributed manner. Also, the computational load of each node does not depend on the number of nodes in the tree (except for some timing issues), which implies simple scalability of the system.

    It is shown that it should be enough that each node contains one or two complex multipliers and a few complex adders running at a couple of hundred MHz to support specifications similar to LTE. Additionally, the nodes must communicate with each other over links with data rates on the order of some Gbps.

    Finally, a VHDL implementation of the system is proposed. The implementation is parameterized such that a system can be generated from a given specification.

    Download full text (pdf)
    fulltext
  • 43.
    Bertilsson, Erik
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Larsson, Erik G
    Linköping University, Department of Electrical Engineering, Communication Systems. Linköping University, Faculty of Science & Engineering.
    A Modular Base Station Architecture for Massive MIMO with Antenna and User Scalability per Processing Node
    2018. In: 2018 CONFERENCE RECORD OF 52ND ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS, AND COMPUTERS, IEEE, 2018, p. 1649-1653. Conference paper (Refereed)
    Abstract [en]

    Massive MIMO is a key technology for the upcoming fifth generation cellular networks (5G), promising high spectral efficiency, low power consumption, and the use of cheap hardware to reduce costs. Previous work has shown how to create a distributed processing architecture, where each node in a network performs the computations related to one or more antennas. The required total number of antennas, M, at the base station depends on the number of simultaneously operating terminals, K. In this work, a flexible node architecture is presented, where the number of terminals can be traded for additional antennas at the same node. This means that the same node can be used with a wide range of system configurations. The computational complexity, along with the order in which to compute incoming and outgoing symbols, is explored.

  • 44.
    Bertilsson, Erik
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Larsson, Erik G
    Linköping University, Department of Electrical Engineering, Communication Systems. Linköping University, Faculty of Science & Engineering.
    A Scalable Architecture for Massive MIMO Base Stations Using Distributed Processing
    2016. In: 2016 50TH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, Washington: IEEE COMPUTER SOC, 2016, p. 864-868. Conference paper (Refereed)
    Abstract [en]

    Massive MIMO systems have received considerable attention in recent years as an enabler in future wireless communication systems. As the idea is based on having a large number of antennas at the base station, it is important to have both a scalable and distributed realization of such a system to ease deployment. Most work so far has focused on the theoretical aspects, although a few demonstrators have been reported. In this work, we propose a base station architecture based on connecting the processing nodes in a K-ary tree, allowing simple scalability. Furthermore, it is shown that most of the processing can be performed locally in each node. Further analysis of the node processing shows that it should be enough that each node contains one or two complex multipliers and a few complex adders/subtracters operating at some hundred MHz. It is also shown that a communication link of some Gbps is required between the nodes, and, hence, it is fully feasible to have one or a few links between the nodes to cope with the communication requirements.

  • 45.
    Bertilsson, Erik
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Larsson, Erik G.
    Linköping University, Department of Electrical Engineering, Communication Systems. Linköping University, Faculty of Science & Engineering.
    Computation Limited Matrix Inversion Using Neumann Series Expansion for Massive MIMO
    2017. In: 2017 FIFTY-FIRST ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS, AND COMPUTERS, 2017, p. 466-469. Conference paper (Refereed)
    Abstract [en]

    Neumann series expansion is a method for performing matrix inversion that has received a lot of interest in the context of massive MIMO systems. However, the computational complexity of the Neumann methods is higher than for the lowest complexity exact matrix inversion algorithms, such as LDL, when the number of terms in the series is three or more. In this paper, the Neumann series expansion is analyzed from a computational perspective for cases when the complexity of performing exact matrix inversion is too high. By partially computing the third term of the Neumann series, the computational complexity can be reduced. Three different preconditioning matrices are considered. Simulation results show that when limiting the total number of operations performed, the BER performance of the three different preconditioning matrices is the same.
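
    To illustrate the general idea of a truncated Neumann series inverse, a minimal sketch follows. It assumes the diagonal of the matrix as the preconditioner, which is one common choice and not necessarily one of the three considered in the paper; the partial computation of the third term is not shown. The names `matmul` and `neumann_inverse` are hypothetical.

```python
def matmul(X, Y):
    """Plain dense matrix product of two lists of rows."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


def neumann_inverse(A, terms):
    """Approximate inv(A) by sum_{n=0}^{terms-1} (I - D^-1 A)^n D^-1.

    D is taken as the diagonal of A (an assumption for this sketch). The
    series converges when the spectral radius of I - D^-1 A is below one,
    for example when A is strongly diagonally dominant.
    """
    n = len(A)
    D_inv = [[1.0 / A[i][i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    DA = matmul(D_inv, A)
    E = [[(1.0 if i == j else 0.0) - DA[i][j] for j in range(n)] for i in range(n)]
    approx = [row[:] for row in D_inv]   # n = 0 term
    term = [row[:] for row in D_inv]
    for _ in range(terms - 1):
        term = matmul(E, term)           # next term: E^n D^-1
        approx = [
            [approx[i][j] + term[i][j] for j in range(n)] for i in range(n)
        ]
    return approx
```

    Each additional term costs one more matrix product, which is why the number of terms determines whether the approach is cheaper than an exact inversion.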

  • 46.
    Bertilsson, Erik
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Ingemarsson, Carl
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Gustafsson, Oscar
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Low-Latency Parallel Hermitian Positive-Definite Matrix Inversion for Massive MIMO
    2021. In: 2021 IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS 2021), Institute of Electrical and Electronics Engineers (IEEE), 2021, p. 23-28. Conference paper (Refereed)
    Abstract [en]

    In this work, the effect of latency for three different positive definite matrix inversion algorithms when implemented on parallel and pipelined processing elements is considered. The work is motivated by the fact that in a massive MIMO system, matrix inversion needs to be performed between estimating the channels and producing the transmitted downlink signal, which means that the latency of the matrix inversion has a significant impact on the system performance. It is shown that, despite the algorithms having different complexity, each of the three algorithms can have the lowest latency for different numbers of processing elements and pipeline levels. In particular, in systems with many processing elements, the algorithm with the highest complexity has the lowest latency.

    Download full text (pdf)
    fulltext
  • 47.
    Bhide, Priyanka
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    Design and Evaluation of Aceelerometer Based Mobile Authentication Techniques
    2017. Independent thesis Advanced level (degree of Master (One Year)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    Smartphone usage is growing rapidly and is no longer limited to calls and SMS. People use smartphones for online shopping, web searches, bank transactions, games, and a wide range of other applications; almost anything is possible with a smartphone and an internet connection. As usage grows, more confidential information about the user is stored on the phone. As the popularity of smartphones increases, so do the ways to steal or hack them. Many areas in the field of smartphone security and authentication require further investigation.

    This thesis evaluates the potential of the built-in sensors in smartphones for mobile authentication techniques. The Android operating system was used in the implementation phase. Android has many open-source libraries and services, which were used for sensor identification on the Java Android platform.

    Two applications using the accelerometer and one using the magnetometer were developed. The two main objectives of this thesis were: 1) to determine the feasibility of sensor-based authentication techniques, and 2) to assess end users' perception of the applications.

    Usability testing was conducted to gather the users' assessments of the applications. The two methods used in the usability testing are named Magical move and Tapping. Most users showed interest in, and a preference for, the Tapping application, although some users expressed reservations about using both sensor-based methods.

    Download full text (pdf)
    ThesisPriyankaBhide
  • 48.
    Carlsson, Erik
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    Synchronization of Distributed Units without Access to GPS
    2018. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    Time synchronization between systems that have no external reference can be an issue in small wireless node-based systems. In this thesis, a transceiver is designed and implemented in two separate systems. The "Two-Way Time Transfer" timing algorithm is then chosen to correct any timing error between the two free-running clocks of the systems. Finally, the results are compared against having both systems derive their timing from GPS.
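
    The arithmetic behind a two-way exchange of timestamps can be sketched as follows. This is a generic formulation that assumes a symmetric propagation delay in both directions; it is not taken from the thesis, and the function name and timestamp labels (`t1` to `t4`) are hypothetical.

```python
def two_way_time_transfer(t1, t2, t3, t4):
    """Estimate clock offset and one-way delay from a two-way exchange.

    t1: node A sends a message        (read on A's clock)
    t2: node B receives it            (read on B's clock)
    t3: node B sends a reply          (read on B's clock)
    t4: node A receives the reply     (read on A's clock)

    Assumes the propagation delay is the same in both directions.
    Returns (offset of B's clock relative to A's clock, one-way delay).
    """
    forward = t2 - t1     # delay + offset
    backward = t4 - t3    # delay - offset
    offset = (forward - backward) / 2
    delay = (forward + backward) / 2
    return offset, delay
```

    Any asymmetry between the two path delays appears directly as an error in the estimated offset, which is the main limitation of this kind of scheme.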

    Download full text (pdf)
    Erica476_exjobb
  • 49.
    Chen, Sau-Gee
    et al.
    National Chiao Tung University, Taiwan.
    Huang, Shen-Jui
    Novatek Corp, Taiwan.
    Garrido Gálvez, Mario
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
    Jou, Shyh-Jye
    National Chiao Tung University, Taiwan.
    Continuous-flow Parallel Bit-Reversal Circuit for MDF and MDC FFT Architectures
    2014. In: IEEE Transactions on Circuits and Systems Part 1: Regular Papers, ISSN 1549-8328, E-ISSN 1558-0806, Vol. 61, no 10, p. 2869-2877. Article in journal (Refereed)
    Abstract [en]

    This paper presents a bit-reversal circuit for continuous-flow parallel pipelined FFT processors. In addition to two flexible commutators, the circuit consists of two memory groups, where each group has P memory banks. To achieve both low delay and low area complexity, a novel write/read scheduling mechanism is devised, so that FFT outputs can be stored in those memory banks in an optimized way. The proposed scheduling mechanism writes the successively generated FFT output samples of the current symbol to memory locations, without any delay, right after those locations are released by the previous symbol. Therefore, a total memory space of only N data samples is enough for continuous-flow FFT operations. Since the read operation does not overlap with the write operation during the entire period, only single-port memory is required, which leads to a great area reduction. The proposed bit-reversal circuit architecture can generate natural-order FFT output and support variable power-of-2 FFT lengths.

    Download full text (pdf)
    fulltext
  • 50.
    Davari, Mahdad
    Linköping University, Department of Electrical Engineering, Computer Engineering.
    Improving an FPGA Optimized Processor
    2011. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    This work aims at improving an existing soft microprocessor core optimized for Xilinx Virtex®-4 FPGA. Instruction and data caches will be designed and implemented. Interrupt support will be added as well, preparing the microprocessor core to host operating systems. Thorough verification of the added modules is also emphasized in this work. Maintaining core clock frequency at its maximum has been the main concern through all the design and implementation steps.

    Download full text (pdf)
    Improving an FPGA Optimized Processor