liu.se: Search for publications in DiVA
  • 301.
    Ul Haque, Muhammad Fahim
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Johansson, Ted
    Linköpings universitet, Institutionen för systemteknik, Elektroniska Kretsar och System. Linköpings universitet, Tekniska fakulteten.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Power Efficienct Band-limited Pulse Width Modulated Transmitter. 2015. Conference paper (Other (popular science, discussion, etc.)).
  • 302.
    Ulmstedt, Mattias
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Stålberg, Joacim
    Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    GPU Accelerated Ray-tracing for Simulating Sound Propagation in Water. 2019. Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
    Abstract [en]

    The propagation paths of sound in water can be somewhat complicated because the sound speed in water varies with properties such as water temperature and pressure, which has the effect of curving the propagation paths. This thesis shows how sound propagation in water can be simulated using a ray-tracing based approach on a GPU using Nvidia’s OptiX ray-tracing engine. In particular, it investigates how much speed-up can be achieved compared to CPU based implementations and whether the RT cores introduced in Nvidia’s Turing architecture, which provide hardware accelerated ray-tracing, can be used to speed up the computations. The presented GPU implementation is shown to be up to 310 times faster than the CPU based Fortran implementation Bellhop. Although the speed-up is significant, it is hard to say how much of it is gained by utilizing the RT cores, because there is nothing equivalent to compare the performance to.
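
The curving of propagation paths described in this abstract can be illustrated with a minimal CPU sketch. It is not the thesis's OptiX implementation and not Bellhop: it assumes a hypothetical linear sound-speed profile (the constants 1500 m/s and 0.017 1/s are illustrative), ignores surface and bottom reflections, and integrates the standard 2D ray equations for a depth-dependent medium with a plain Euler step. The function names are invented for this example.

```python
import math

def sound_speed(depth_m):
    """Hypothetical linear profile: 1500 m/s at the surface, growing with depth."""
    return 1500.0 + 0.017 * depth_m

def sound_speed_gradient(depth_m):
    """Derivative of the hypothetical profile with respect to depth (1/s)."""
    return 0.017

def trace_ray(depth0_m, angle0_rad, step_m=1.0, n_steps=5000):
    """Euler integration of the 2D ray equations.

    The angle is measured from the horizontal, positive downward.
    Returns a list of (range, depth, angle) samples along the ray.
    """
    x, z, theta = 0.0, depth0_m, angle0_rad
    path = [(x, z, theta)]
    for _ in range(n_steps):
        c = sound_speed(z)
        # The ray turns toward the region of lower sound speed.
        dtheta = -(math.cos(theta) / c) * sound_speed_gradient(z) * step_m
        x += math.cos(theta) * step_m
        z += math.sin(theta) * step_m
        theta += dtheta
        path.append((x, z, theta))
    return path

if __name__ == "__main__":
    ray = trace_ray(depth0_m=1000.0, angle0_rad=0.05)
    print("final range, depth, angle:", ray[-1])
```

Because the sound speed in this toy profile increases with depth, a ray launched slightly downward is gradually bent back toward the surface. A useful sanity check is that cos(angle) / sound_speed stays nearly constant along the ray (Snell's law for a stratified medium).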

  • 303.
    Unnikrishnan, Nanda K.
    et al.
    Univ Minnesota, MN 55455 USA.
    Garrido Gálvez, Mario
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Parhi, Keshab K.
    Univ Minnesota, MN 55455 USA.
    Effect of Finite Word-Length on SQNR, Area and Power for Real-Valued Serial FFT. 2019. In: 2019 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), IEEE, 2019. Conference paper (Refereed).
    Abstract [en]

    Modern applications for DSP systems are increasingly constrained by tight area and power requirements. Therefore, it is imperative to analyze effective strategies that work within these requirements. This paper studies the impact of finite word-length arithmetic on the signal to quantization noise ratio (SQNR), power and area for a real-valued serial FFT implementation. An experiment is set up using a hardware description language (HDL) to empirically determine the tradeoffs associated with the following parameters: (i) the input word-length, (ii) the word-length of the rotation coefficients, and (iii) length of the FFT on performance (SQNR), power and area. The results of this paper can be used to make design decisions by careful selection of word-length to achieve a reduction in area and power for an acceptable loss in SQNR.
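
The relationship between input word-length and SQNR studied here can be explored with a small software experiment. The sketch below is an assumption-laden illustration, not the paper's HDL setup: it quantizes only the input (parameter (i) in the abstract), keeps the rotation coefficients in floating point, and uses a textbook recursive radix-2 FFT rather than a serial architecture. All function names are hypothetical.

```python
import cmath
import math
import random

def quantize(value, word_length):
    """Round to a signed fixed-point grid with `word_length` bits covering [-1, 1)."""
    scale = 2 ** (word_length - 1)
    q = round(value * scale)
    q = max(-scale, min(scale - 1, q))  # saturate instead of wrapping
    return q / scale

def fft(x):
    """Recursive radix-2 decimation-in-time FFT (length must be a power of two)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def sqnr_db(word_length, n=64, seed=0):
    """SQNR of the FFT output when only the input samples are quantized."""
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    reference = fft(x)
    quantized = fft([quantize(v, word_length) for v in x])
    signal = sum(abs(r) ** 2 for r in reference)
    noise = sum(abs(r - q) ** 2 for r, q in zip(reference, quantized))
    return 10.0 * math.log10(signal / noise)

if __name__ == "__main__":
    for bits in (8, 10, 12, 14, 16):
        print(bits, "bits:", round(sqnr_db(bits), 1), "dB")
```

For uniform quantization one would expect roughly 6 dB of SQNR per added input bit; how area and power scale with word-length cannot be estimated from a software model like this.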

  • 304.
    Vasilica, Vlad Valentin
    Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    FFT Implemention on FPGA for 5G Networks. 2019. Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
    Abstract [en]

    The main goal of this thesis is the design and implementation of a 2048-point FFT on an FPGA using VHDL code. The FFT uses a radix-2 butterfly architecture, with a focus on comparing parameters between systems with different wordlengths, coefficient wordlengths and symbol error rates, as well as different modulation types, comparing 64QAM and 256QAM for the 5G system. This implementation replaces an FFT function block in a Matlab based open source 5G NR simulator based on the 3GPP 15 standard and simulates spectrum, MSE payload, and SER performance.

  • 305.
    Vidlid, Marija
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Konstruktion av strömförsörjningsmodul till testsystem [Design of a power supply module for a test system]. 2015. Independent thesis, Basic level (university diploma), 20 credits / 30 HE credits. Student thesis.
    Abstract [translated from Swedish]

    This thesis project was carried out at the Institute of Technology at Linköping University, within the bachelor of engineering programme in electronics. The client, Flextronics, is a company that develops general test equipment for electronics production. The existing test equipment needs to be updated, and the project consists of building a new power supply module for it. The biggest difference from the previous system is that the new power supply module must handle a higher output power. Because work on the new test equipment has already begun, there are some requirements to take into account, one of which is that the power supply module must contain a microcontroller. The microcontroller can offer useful features such as built-in DACs and ADCs, and the design has been reworked so that these can be used and can even handle the regulation. After a good deal of datasheet reading and simulation, a solution was developed with two regulators controlled by the microcontroller. This solution has also been built and evaluated.

  • 306.
    von Hacht, Karl-Johan
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Garden Monitoring with Embedded Systems. 2015. Independent thesis, Basic level (university diploma), 10 credits / 15 HE credits. Student thesis.
    Abstract [en]

    In today’s modern society, the process of handling crops in an accountable way without loss has become more and more important. By letting a gardener evaluate the progress of his plants from relevant data, one can reduce these losses and increase the effectiveness of the whole plantation. This work is about the construction of such a system, composed, from a developer's perspective, of three different platforms, from the start of data sampling within the context of gardening to an end user easily able to understand the data then translated. The first platform will be created from scratch with both hardware and software, the next assembled from already finished hardware components and built with simpler software. The last will essentially only be a software solution in an already finished hardware environment.

  • 307.
    Voronov, Sergii
    et al.
    Linköpings universitet, Institutionen för systemteknik, Fordonssystem. Linköpings universitet, Tekniska fakulteten.
    Frisk, Erik
    Linköpings universitet, Institutionen för systemteknik, Fordonssystem. Linköpings universitet, Tekniska fakulteten.
    Krysander, Mattias
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Data-Driven Battery Lifetime Prediction and Confidence Estimation for Heavy-Duty Trucks. 2018. In: IEEE Transactions on Reliability, ISSN 0018-9529, E-ISSN 1558-1721, Vol. 67, no. 2, p. 623-639. Article in journal (Refereed).
    Abstract [en]

    Maintenance planning is important in the automotive industry as it allows fleet owners or regular customers to avoid unexpected failures of the components. One cause of unplanned stops of heavy-duty trucks is failure in the lead-acid starter battery. High availability of the vehicles can be achieved by changing the battery frequently, but such an approach is expensive both due to the frequent visits to a workshop and also due to the component cost. Here, a data-driven method based on random survival forest (RSF) is proposed for predicting the reliability of the batteries. The dataset available for the study, covering more than 50 000 trucks, has two important properties. First, it does not contain measurements related directly to the battery health; second, there are no time series of measurements for every vehicle. In this paper, the RSF method is used to predict the reliability function for a particular vehicle using data from the fleet of vehicles given that only one set of measurements per vehicle is available. A theory for confidence bands for the RSF method is developed, which is an extension of an existing technique for variance estimation in the random forest method. Adding confidence bands to the RSF method gives an opportunity for an engineer to evaluate the confidence of the model prediction. Some aspects of the confidence bands are considered: their asymptotic behavior and usefulness in model selection. The problem of including time-related variables is addressed in this paper, with an argument for why it is a good choice not to add them to the model. Metrics for performance evaluation are suggested, which show that the model can be used to schedule and optimize the cost of the battery replacement. The approach is illustrated extensively using the real-life truck data case study.

  • 308.
    Wahlin, Yngve
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Feldt, Hannes
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Implementation & utvärdering av spelmotor i WebGL [Implementation and evaluation of a game engine in WebGL]. 2013. Independent thesis, Basic level (university diploma), 10.5 credits / 16 HE credits. Student thesis.
    Abstract [en]

    This report describes an analysis of WebGL together with JavaScript with the aim to examine its limitations, strengths and weaknesses. This analysis was performed by building a 2D game engine containing some dynamic elements such as water, smoke, fire, light, and more. Different algorithms have been tested and analyzed to provide a clearer picture of how these work together. The report will go through the most basic functions of the game engine and describe briefly how these work.

    The result shows that JavaScript with WebGL can be considered a potent toolset, despite the difficulties caused by JavaScript.

    In summary, similar projects can be recommended, as JavaScript and WebGL proved both fun and incredibly rewarding to work with.

  • 309.
    Wang, Jian
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Low Overhead Memory Subsystem Design for a Multicore Parallel DSP Processor. 2014. Doctoral thesis, monograph (Other academic).
    Abstract [en]

    The physical scaling following Moore’s law is saturated while the requirement on computing keeps growing. The gain from improving silicon technology is only the shrinking of the silicon area, and the speed-power scaling has almost stopped in the last two years. It calls for new parallel computing architectures and new parallel programming methods.

    Traditional ASIC (Application Specific Integrated Circuits) hardware has been used for acceleration of Digital Signal Processing (DSP) subsystems on SoC (System-on-Chip). Embedded systems become more complicated, and more functions, more applications, and more features must be integrated in one ASIC chip to follow up the market requirements. At the same time, the product lifetime of a SoC with ASIC has been much reduced because of the dynamic market. The life time of the design for a typical main chip in a mobile phone based on ASIC acceleration is about half a year and the NRE (Non-Recurring Engineering) cost of it can be much more than 50 million US$.

    The current situation calls for a different solution than ASIC. ASIP (Application Specific Instruction set Processor) offers comparable power consumption and silicon cost to ASICs. Its greatest advantage is the functional flexibility in a predefined application domain. ASIP based SoC enables software upgrading without changing hardware. Thus the product life time can be 5-10 times longer than that of ASIC based SoC.

    This dissertation will present an ASIP based SoC, a new unified parallel DSP subsystem named ePUMA (embedded Parallel DSP Platform with Unique Memory Access), to target embedded signal processing in communication and multimedia applications. The unified DSP subsystem can further reduce the hardware cost, especially the memory cost, of embedded SoC processors, and most importantly, provide full programmability for a wide range of DSP applications. The ePUMA processor is based on a master-slave heterogeneous multi-core architecture. One master core performs the central control, and multiple Single Instruction Multiple Data (SIMD) coprocessors work in parallel to offer a majority of the computing power.

    The focus and the main contribution of this thesis are on the memory subsystem design of ePUMA. The multi-core system uses a distributed memory architecture based on scratchpad memories and software controlled data movement. It is suitable for the data access properties of streaming applications and the kernel based multi-core computing model. The essential techniques include the conflict free access parallel memory architecture, the multi-layer interconnection network, the non-address stream data transfer, the transitioned memory buffers, and the lookup table based parallel memory addressing. The goal of the design is to minimize the hardware cost, simplify the software protocol for inter-processor communication, and increase the arithmetic computing efficiency.

    We have so far shown through applications that most DSP algorithms, such as filters, vector/matrix operations, transforms, and arithmetic functions, can achieve computing efficiency over 70% on the ePUMA platform. In addition, the non-address stream network provides equivalent communication bandwidth at less than 30% of the implementation cost of a crossbar interconnection.

  • 310.
    Wang, Jian
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Sohl, Joar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Kraigher, Olof
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    ePUMA: a novel embedded parallel DSP platform for predictable computing. 2010. Conference paper (Refereed).
    Abstract [en]

    In this paper, a novel parallel DSP platform based on master-multi-SIMD architecture is introduced. The platform is named ePUMA [1]. The essential technology is to use separated data access kernels and algorithm kernels to minimize the communication overhead of parallel processing by running the two types of kernels in parallel. The ePUMA platform is optimized for predictable computing. According to benchmarking results, a memory subsystem design that relies on regular and predictable memory accesses can dramatically improve performance. As a scalable parallel platform, the chip area is estimated for different numbers of co-processors. The aim of the ePUMA parallel platform is to achieve low-power, high-performance embedded parallel computing with low silicon cost for communications and similar signal processing applications.

  • 311.
    Wang, Jian
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Karlsson, Andréas
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Sohl, Joar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Convolutional Decoding on Deep-pipelined SIMD Processor with Flexible Parallel Memory. 2012. In: Digital System Design (DSD), 2012, IEEE, 2012, p. 529-532. Conference paper (Refereed).
    Abstract [en]

    Single Instruction Multiple Data (SIMD) architecture has been proved to be a suitable parallel processor architecture for media and communication signal processing. But computing overhead such as memory access latency and vector data permutation limits the performance of conventional SIMD processors. Solutions such as combined VLIW and SIMD architecture are designed with an increased complexity for compiler design and assembly programming. This paper introduces the SIMD processor in the ePUMA platform, which uses a deep execution pipeline and flexible parallel memory to achieve high computing performance. Its deep pipeline can execute combined operations in one cycle, and the parallel memory architecture supports conflict free parallel data access. It solves the problem of large vector permutation in a short vector SIMD machine in a more efficient way than conventional vector permutation instructions. We evaluate the architecture by implementing the soft decision Viterbi algorithm for convolutional decoding. The result is compared with other architectures, including TI C54x, CEVA TeakLite III, and PowerPC AltiVec, to show ePUMA’s computing efficiency advantage.

  • 312.
    Wang, Jian
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Karlsson, Andréas
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Sohl, Joar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Pettersson, Magnus
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    A multi-level arbitration and topology free streaming network for chip multiprocessor. 2011. In: ASIC (ASICON), 2011, IEEE, 2011, p. 153-158. Conference paper (Refereed).
    Abstract [en]

    Predictable computing is common in embedded signal processing, which has communication characteristics of data independent memory access and long streaming data transfer. This paper presents a streaming network-on-chip (NoC), StreamNet, for chip multiprocessor (CMP) platforms targeting predictable signal processing. The network is circuit-switched and uses a two-level arbitration scheme. The first level uses fast hardware arbitration, and the second level is programmable software arbitration. Its communication protocol is designed to support free choice of network topology. Together with its scheduling tool, the network can achieve high communication efficiency and improve parallel computing performance. This NoC architecture is used to design the Ring network in the ePUMA multiprocessor DSP. An evaluation using the multi-user signal processing application at an LTE base station shows low parallel computing overhead on the ePUMA multiprocessor platform.

  • 313.
    Wang, Jian
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Sohl, Joar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Architectural Support for Reducing Parallel Processing Overhead in an Embedded Multiprocessor. 2010. In: EUC '10 Proceedings of the 2010 IEEE/IFIP International Conference on Embedded and Ubiquitous Computing, Washington, DC, USA: IEEE Computer Society, 2010, p. 47-52. Conference paper (Refereed).
    Abstract [en]

    The host-multi-SIMD chip multiprocessor (CMP) architecture has been proved to be an efficient architecture for high performance signal processing which explores both task level parallelism by multi-core processing and data level parallelism by SIMD processors. Different from the cache-based memory subsystem in most general purpose processors, this architecture uses on-chip scratchpad memory (SPM) as processor local data buffer and allows software to explicitly control the data movements in the memory hierarchy. This SPM-based solution is more efficient for predictable signal processing in embedded systems where data access patterns are known at design time. The predictable performance is especially important for real time signal processing. According to Amdahl’s law, the nonparallelizable part of an algorithm has critical impact on the overall performance. Implementing an algorithm in a parallel platform usually produces control and communication overhead which is not parallelizable. This paper presents the architectural support in an embedded multiprocessor platform to maximally reduce the parallel processing overhead. The effectiveness of these architecture designs in boosting parallel performance is evaluated by an implementation example of 64x64 complex matrix multiplication. The result shows that the parallel processing overhead is reduced from 369% to 28%.
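
The abstract's appeal to Amdahl's law can be made concrete with a short calculation. The sketch below implements only the textbook formula, speedup = 1 / ((1 - p) + p / n); the parallel fractions and processor counts in the usage loop are illustrative values, not figures from the paper, and the sketch does not attempt to reproduce the paper's overhead metric.

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Upper bound on speedup when only `parallel_fraction` of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)

if __name__ == "__main__":
    # Illustrative values only: a small serial part quickly caps the speedup.
    for fraction in (0.5, 0.9, 0.99):
        print(fraction, [round(amdahl_speedup(fraction, n), 2) for n in (2, 4, 8)])
```

Even with eight processors, a workload that is 90% parallelizable cannot exceed a speedup of about 4.7, which is why non-parallelizable control and communication overhead matters so much on a multi-core platform.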

  • 314.
    Wang, Jian
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Sohl, Joar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Karlsson, Andréas
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    An Efficient Streaming Star Network for Multi-core Parallel DSP Processor. 2011. In: Networking and Computing (ICNC), 2011, IEEE, 2011, p. 332-336. Conference paper (Refereed).
    Abstract [en]

    As more and more computing components are integrated into one digital signal processing (DSP) system to achieve high computing power by executing tasks in parallel, it is soon observed that the inter-processor and processor-to-memory communication overheads become the performance bottleneck and limit the scalability of a multi-processor platform. For chip multiprocessor (CMP) DSP systems targeting predictable computing, an appreciation of the communication characteristics is essential to design an efficient interconnection architecture and improve performance. This paper presents a Star network designed for the ePUMA multi-core DSP processor based on analysis of the network communication models. As part of ePUMA’s multi-layer interconnection network, the Star network handles core to off-chip memory communications for kernel computing on slave processors. The network has short setup latency, easy multiprocessor synchronization, rich memory addressing patterns, and power efficient streaming data transfer. The improved network efficiency is evaluated in comparison with a previous study.

  • 315.
    Wang, Jian
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Sohl, Joar
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Kraigher, Olof
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Software programmable data allocation in multi-bank memory of SIMD processors. 2010. In: Proceedings of the 2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools / [ed] Sebastian Lopez, Washington, DC, USA: IEEE Computer Society, 2010, p. 28-33. Conference paper (Refereed).
    Abstract [en]

    The host-SIMD style heterogeneous multi-processor architecture offers high computing performance and user friendly programmability. It explores both task level parallelism and data level parallelism by the on-chip multiple SIMD coprocessors. For embedded DSP applications with predictable computing feature, this architecture can be further optimized for performance, implementation cost and power consumption. The optimization could be done by improving the SIMD processing efficiency and reducing redundant memory accesses and data shuffle operations. This paper introduces one effective approach by designing a software programmable multi-bank memory system for SIMD processors. Both the hardware architecture and software programming model are described in this paper, with an implementation example of the BLAS syrk routine. The proposed memory system offers high SIMD data access flexibility by using lookup table based address generators, and applying data permutations on both DMA controller interface and SIMD data access. The evaluation results show that the SIMD processor with this memory system can achieve high execution efficiency, with only 10% to 30% overhead. The proposed memory system also saves the implementation cost of SIMD local registers: in our system, each SIMD core has only eight 128-bit vector registers.
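
Why data allocation across memory banks matters for SIMD access can be shown with a toy model. The sketch below does not reproduce the paper's lookup table based address generators; it swaps in the textbook skewed-storage scheme, with a hypothetical bank count of 8, to show how the choice of bank mapping decides whether a row or a column of a matrix can be fetched in one conflict-free access.

```python
N_BANKS = 8  # hypothetical number of single-port memory banks

def linear_bank(row, col, width=N_BANKS):
    """Plain row-major allocation: the bank is the linear address modulo N_BANKS."""
    return (row * width + col) % N_BANKS

def skewed_bank(row, col):
    """Skewed allocation: each row is rotated by its row index."""
    return (row + col) % N_BANKS

def is_conflict_free(banks):
    """A parallel access is conflict-free if every lane hits a different bank."""
    return len(set(banks)) == len(banks)

def row_access(bank_fn, row):
    """Banks touched when all lanes read one matrix row."""
    return [bank_fn(row, col) for col in range(N_BANKS)]

def column_access(bank_fn, col):
    """Banks touched when all lanes read one matrix column."""
    return [bank_fn(row, col) for row in range(N_BANKS)]

if __name__ == "__main__":
    for name, fn in (("linear", linear_bank), ("skewed", skewed_bank)):
        print(name,
              "row ok:", is_conflict_free(row_access(fn, 3)),
              "column ok:", is_conflict_free(column_access(fn, 3)))
```

With the linear mapping every element of a column lands in the same bank, so a column read would serialize, while the skewed mapping serves both rows and columns in parallel. A table-driven address generator generalizes this idea by letting software choose the mapping per access pattern.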

  • 316.
    Wang, Qi
    et al.
    Wu, Di
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Eilert, Johan
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Cost Analysis of Channel Estimation in MIMO-OFDM for Software Defined Radio. 2008. In: WCNC 2008: IEEE WIRELESS COMMUNICATIONS & NETWORKING CONFERENCE, IEEE, 2008, p. 935-939. Conference paper (Refereed).
    Abstract [en]

    Channel State Information (CSI) is critical for the overall performance of wireless systems. Meanwhile, the estimation of CSI forms one of the most intensive tasks in radio baseband signal processing. This paper investigates the real-time implementation of channel estimation for MIMO-OFDM systems using programmable hardware aimed at software defined radio. Based on the programmable hardware architecture proposed by us, several prevalent channel estimation methods such as Least Square (LS), Minimum Mean Square Error (MMSE) and Pilot-Symbol-Aided (PSA) are evaluated from both the performance and computational latency perspectives. By utilizing the symmetric feature of the covariance matrix, a simplified two-sided Jacobi rotation method is adopted to speed up the complex-valued singular value decomposition involved in the MMSE channel estimation.
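
Of the estimators named in this abstract, Least Square (LS) is the simplest to sketch. The example below is a simplification under stated assumptions: a single transmit and receive antenna, one flat channel coefficient per pilot subcarrier, and synthetic data; it is not the paper's MIMO formulation or its hardware mapping, and the function names are hypothetical.

```python
import random

def ls_channel_estimate(received_pilots, sent_pilots):
    """Per-subcarrier LS estimate: divide each received pilot by the known pilot."""
    return [y / x for y, x in zip(received_pilots, sent_pilots)]

def simulate_pilots(n_pilots=16, noise_std=0.01, seed=1):
    """Generate synthetic pilots, a random channel, and noisy received pilots."""
    rng = random.Random(seed)
    sent = [complex(rng.choice((-1.0, 1.0)), rng.choice((-1.0, 1.0)))
            for _ in range(n_pilots)]
    channel = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
               for _ in range(n_pilots)]
    received = [h * x + complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
                for h, x in zip(channel, sent)]
    return sent, channel, received

if __name__ == "__main__":
    sent, channel, received = simulate_pilots()
    estimate = ls_channel_estimate(received, sent)
    worst = max(abs(h - e) for h, e in zip(channel, estimate))
    print("largest estimation error:", worst)
```

LS needs only one complex division per pilot, which is part of why it is cheap compared with MMSE, where a matrix decomposition is involved.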

  • 317.
    Wang, Zhenyu
    Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    A Digits-Recognition Convolutional Neural Network on FPGA. 2019. Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
    Abstract [en]

    A convolutional neural network (CNN) is a deep learning framework that is widely used in computer vision. A CNN extracts important features of input images by performing convolution and reduces the parameters in the network by applying pooling operations. CNNs are usually implemented with programming languages and run on central processing units (CPUs) and graphics processing units (GPUs). However, in recent years, research has been conducted to implement CNNs on field-programmable gate arrays (FPGAs).

    The objective of this thesis is to implement a CNN on an FPGA with few hardware resources and low power consumption. The CNN we implement is for digit recognition. The input of this CNN is an image of a single digit, and the CNN infers which digit the image shows. The performance and power consumption of the FPGA are compared with those of a CPU and a GPU.

    The results show that our FPGA implementation has better performance than the CPU and the GPU, with respect to runtime, power consumption, and power efficiency.
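
The two operations this abstract names, convolution and pooling, can be sketched in a few lines. The code below is a plain software illustration with hypothetical function names, not the thesis's FPGA design: it applies one kernel without padding or stride (as a cross-correlation, as is common in CNN implementations) and then a 2x2 max pooling.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding and sum the elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def max_pool2x2(feature_map):
    """Keep the largest value of each non-overlapping 2x2 block (odd edges are dropped)."""
    return [[max(feature_map[r][c], feature_map[r][c + 1],
                 feature_map[r + 1][c], feature_map[r + 1][c + 1])
             for c in range(0, len(feature_map[0]) - 1, 2)]
            for r in range(0, len(feature_map) - 1, 2)]

if __name__ == "__main__":
    image = [[1, 2, 3, 4],
             [5, 6, 7, 8],
             [9, 10, 11, 12],
             [13, 14, 15, 16]]
    kernel = [[1, 0],
              [0, 1]]
    features = conv2d_valid(image, kernel)
    print(features)
    print(max_pool2x2(features))
```

Both loops consist of independent multiply-accumulate and compare operations, which is the kind of regular structure that maps well to parallel hardware.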

  • 318.
    Wei, Zhengzhe
    Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    H.264 Baseline Real-time High Definition Encoder on CELL. 2010. Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
    Abstract [en]

    In this thesis an H.264 baseline high definition encoder is implemented on the CELL processor. The target video sequence in our encoder is YUV420 1080p at 30 frames per second. To meet real-time requirements, a system architecture that reduces DMA requests is designed for large memory accesses. Several key computing kernels (intra frame encoding, motion estimation searching and entropy coding) are designed and ported to the CELL processor units. A main challenge is to find a good tradeoff between DMA latency and processing time. The limited 256K bytes of on-chip memory of each SPE has to be organized efficiently in a SIMD way. CAVLC is performed in non-real-time on the PPE.

    The experimental results show that our encoder is able to encode I frames in high quality and to encode common 1080p video sequences in real time. Using five SPEs and 63KB of executable code, 20.72M cycles are needed for one SPE to encode its partition of one P frame. The average PSNR of P frames increases by at most 1.52%. For fast-motion video sequences, a 64x64 search range gives better frame quality than a 16x16 search range while requiring less than twice the computing cycles. Our results also demonstrate that more of the potential power of the CELL processor can be utilized in multimedia computing.

    The H.264 main profile will be implemented in future phases of this encoder project. Since the platform we use is the IBM Full-System Simulator, DMA performance on a real CELL processor is an interesting issue. Real-time entropy coding is another challenge for CELL.

  • 319.
    Werin, Atle
    Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Use of a Multiplexer to get Multiple Streams Through a Limited Interface: Encapsulation of digital video broadcasting streams. 2016. Independent thesis, Basic level (university diploma), 180 HE credits. Student thesis.
    Abstract [en]

    In digital video broadcasting, many sources are sometimes used. When handling such a broadcast, one problem is a limited interface that has a fixed number of input channels but overcapacity in data transfer rate. To be able to connect more inputs to the interface, a protocol that lets the user send more than one channel on a connection is needed. The important requirement for the protocol is that it keeps the output equal to the input, both in timing and in what data is sent. This is done by encapsulating the data and using a header containing information for recreating the input. To solve the timing constraint, dynamic buffers are used that delay all data evenly. To validate the functionality of the protocol, a test design is implemented in VHDL and simulated.
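
The encapsulation idea in this abstract, several input channels sharing one connection with a header that lets the receiver recreate each input, can be sketched in software. The header layout below (a 1-byte channel id, a 4-byte timestamp and a 2-byte payload length, all big-endian) is purely hypothetical; the abstract does not specify the thesis's actual header fields, and the real design is in VHDL.

```python
import struct

# Hypothetical header layout: channel id, timestamp, payload length (big-endian).
HEADER = struct.Struct(">BIH")

def encapsulate(channel, timestamp, payload):
    """Prefix a payload with a header so it can share a link with other channels."""
    return HEADER.pack(channel, timestamp, len(payload)) + payload

def decapsulate(stream):
    """Split a multiplexed byte stream back into (channel, timestamp, payload) tuples."""
    packets = []
    offset = 0
    while offset < len(stream):
        channel, timestamp, length = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        packets.append((channel, timestamp, stream[offset:offset + length]))
        offset += length
    return packets

if __name__ == "__main__":
    link = encapsulate(0, 100, b"abc") + encapsulate(1, 101, b"defg")
    for packet in decapsulate(link):
        print(packet)
```

In a scheme like this, a timestamp field is what would let the receiver release every packet after the same fixed delay, which is one way to keep the output timing equal to the input timing.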

  • 320.
    Wernhoff, Carl
    Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    An FPGA implementation of neutrino track detection for the IceCube telescope. 2010. Independent thesis Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    The IceCube telescope is built within the ice at the geographical South Pole in the middle of the Antarctic continent. The purpose of the telescope is to detect muon neutrinos, the muon neutrino being an elementary particle with minuscule mass coming from space.

    The detector consists of some 5000 DOMs registering photon hits (light). A muon neutrino traveling through the detector might give rise to a track of photons making up a straight line, and by analyzing the hit output of the DOMs, looking for tracks, neutrinos and their direction can be detected.

    When processing the output, triggers are used. Triggers are calculation-efficient algorithms used to tell if the hits seem to make up a track; if that is the case, all hits are processed more carefully to find the direction and other properties of the track.

    The Track Engine is an additional trigger, specialized to trigger on low-energy events (few track hits), which are particularly difficult to detect. Low-energy events are of special interest in the search for Dark Matter.

    An algorithm for triggering on low-energy events has been suggested. Its main idea is to divide time into overlapping time windows, find all possible pairs of hits in each time window, calculate the spherical coordinates θ and ϕ of the position vectors of the hits of the pairs, histogram the angles, and look for peaks in the resulting 2D histogram. Such peaks would indicate a straight line of hits, and, hence, a track.
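
    The steps listed above can be sketched in software as follows. This is a minimal illustration, not the thesis design: it assumes that the angles are taken from the vector connecting the earlier hit of a pair to the later one, and the function names, the bin count and the peak threshold are all hypothetical parameters.

```python
import math
from itertools import combinations


def pair_angles(h1, h2):
    """Spherical angles (theta, phi) of the vector from hit h1 to hit h2.
    Hits are (t, x, y, z) tuples. Using the connecting vector is an assumption."""
    dx, dy, dz = h2[1] - h1[1], h2[2] - h1[2], h2[3] - h1[3]
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    theta = math.acos(max(-1.0, min(1.0, dz / r)))   # polar angle in [0, pi]
    phi = math.atan2(dy, dx) % (2 * math.pi)         # azimuth in [0, 2*pi)
    return theta, phi


def find_tracks(hits, window, step, bins, threshold):
    """Slide overlapping time windows over the hits, histogram the pair angles,
    and report every (window_start, (theta_bin, phi_bin), count) peak."""
    hits = sorted(hits)
    if not hits:
        return []
    results = []
    t, t_end = hits[0][0], hits[-1][0]
    while t <= t_end:
        in_window = [h for h in hits if t <= h[0] < t + window]
        hist = {}
        for a, b in combinations(in_window, 2):
            if a[1:] == b[1:]:
                continue  # identical positions define no direction
            theta, phi = pair_angles(a, b)
            key = (min(int(theta / math.pi * bins), bins - 1),
                   min(int(phi / (2 * math.pi) * bins), bins - 1))
            hist[key] = hist.get(key, 0) + 1
        results.extend((t, key, n) for key, n in hist.items() if n >= threshold)
        t += step  # step < window makes consecutive windows overlap
    return results
```

    Hits on a straight line all produce pairs with the same direction, so they pile up in one histogram bin, while unrelated hits scatter across bins.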

    It is not believed that a software implementation of the algorithm would be fast enough. The Master's Thesis project has had the aim of developing an FPGA implementation of the algorithm.

    Such an FPGA implementation has been developed. Extensive tests on the design have yielded positive results showing that it is fully functional. The design can be synthesized to about 180 MHz, making it possible to handle an incoming hit rate of about 6 MHz, which gives a margin of more than a factor of two over the expected average hit rate of 2.6 MHz.
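
    The figures quoted above can be checked with a little arithmetic. The variable names are illustrative, and the cycles-per-hit value is simply the ratio of the two quoted rates, not a design figure stated in the abstract.

```python
clock_hz = 180e6              # synthesized clock frequency
max_hit_rate_hz = 6e6         # hit rate the design can handle
expected_hit_rate_hz = 2.6e6  # expected average hit rate

# Ratio of clock rate to sustainable hit rate (derived, not stated in the abstract).
cycles_per_hit = clock_hz / max_hit_rate_hz

# Headroom over the expected load: about 2.31, i.e. more than a factor of two.
margin = max_hit_rate_hz / expected_hit_rate_hz
```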

  • 321.
    Westerholm, Glenn
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Kadenssensor med en accelerometer och ANT+ [Cadence sensor with an accelerometer and ANT+]. 2015. Independent thesis Basic level (degree of Bachelor), 10.5 credits / 16 HE credits. Student thesis
    Abstract [sv] (English translation)

    This report presents a degree project that investigated the feasibility of building a sensor that measures cadence using an accelerometer. The ANT+ cadence profile was implemented to enable synchronization between a sports watch and the sensor. Cadence is how fast the cyclist turns the pedals, measured in revolutions per minute (RPM). How fast a cyclist pedals affects the body in many ways, and cyclists often want to know their current cadence in order to optimize their performance. The investigated principle of measuring cadence with an accelerometer is intended to make a possible prototype suitable for indoor cycling, also known as spinning. A conventional bicycle usually has two hardware parts for measuring cadence, one mounted on the crank arm and the other on the frame. The frame of a spinning bike differs so much from that of an ordinary bicycle that the part intended for the frame cannot be mounted as easily. With an accelerometer, only one hardware part is needed, and it can easily be mounted on the crank arm. Software development was done on an Arduino Uno, which is built around an ATmega328 microcontroller from Atmel. The sensor unit that measures the cadence consists of the Arduino Uno, the LSM303DLHC accelerometer from STMicroelectronics and the nRF24AP2 ANT chip from Nordic Semiconductor. The main unit was a personal computer acting as receiver, running the ANT+ Simulator program. The program developed for the microcontroller detects each pedal revolution and sends the total revolution time together with the total number of pedal revolutions so far to the nRF24AP2, which forwards them to the main unit. It is the cadence profile that calculates the current cadence. Finally, a minimum hardware requirement and a suggestion for a low-power microcontroller for a possible prototype are presented.
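
    How a receiver can turn the two transmitted values (total revolution time and total revolution count) into a cadence can be sketched as below. The function name is hypothetical, and the time unit of 1/1024 s and the 16-bit wrap-around of both counters are assumptions of this sketch rather than details given in the abstract.

```python
def cadence_rpm(prev, curr, ticks_per_second=1024, wrap=1 << 16):
    """Cadence in RPM from two successive messages.

    prev and curr are (event_time_ticks, cumulative_revolutions) pairs.
    Both fields are assumed to be counters that wrap around at `wrap`.
    """
    dt = (curr[0] - prev[0]) % wrap  # elapsed ticks, robust to wrap-around
    dn = (curr[1] - prev[1]) % wrap  # pedal revolutions since previous message
    if dt == 0:
        return None                  # no new pedal event since last message
    return 60.0 * dn * ticks_per_second / dt
```

    For example, three revolutions in 2048 ticks (two seconds under the assumed time unit) correspond to 90 RPM.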

  • 322.
    Wiklund, Daniel
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    An on-chip network architecture for hard real time systems. 2003. Licentiate thesis, comprehensive summary (Other academic)
    Abstract [en]

    With the ever increasing demands on processing power and communication on a single chip, the industry is facing a huge obstacle in closing the gap between possible complexity and achieved complexity, the so-called design gap. A possible path out of this is the increased (re-)use of intellectual property (IP) blocks from within the company or from other suppliers. We have identified the problem area in the on-chip communication between IP blocks, where the time-division multiplex buses are quickly becoming saturated.

    Another problem arising with the increased use of deep submicron manufacturing technologies is the relatively long delay of wires compared to the gates. This problem forces the synchronous part of a chip to either shrink or run at a slower speed. With the goals of keeping the clock rate and increasing the complexity the only feasible solution is to use smaller synchronous subsystems that communicate asynchronously. This approach is known as globally asynchronous but locally synchronous (GALS).

    This thesis presents the work on a bus replacement for on-chip communication. The goal of this bus replacement is to achieve very high performance compared to the old solution while allowing for higher flexibility, GALS style implementation, and simpler verification of the system.

    With this goal in mind we investigated the possible topologies for a switched on-chip network (OCN) and concluded that a 2-d mesh or torus is the most appropriate. To keep the latency low we decided on a pseudo-circuit switched network using the 2-d mesh. We have developed a novel approach for route setup in the circuit switched network called packet connected circuit (PCC) which allows very short latency both for routing and payload transfer while having a very low silicon cost.

    A simulator for this network has been implemented together with behavioral models of the network components. Simulations have shown that the PCC concept is not very suitable for general purpose processing platforms but that it is very suitable for a hard real time system that uses some communication scheduling.
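
    The packet connected circuit idea, in which a request travels through the 2-d mesh and locks the circuit as it goes, can be illustrated with a toy model. This is a sketch under assumptions that the abstract does not state: dimension-ordered (X first, then Y) routing, and releasing all links taken so far when the request meets a busy link so that the source can retry later. The class and method names are hypothetical and unrelated to the simulator described in the thesis.

```python
class Mesh:
    """Toy 2-d mesh in which a route request locks links hop by hop."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        self.locked = {}  # directed link (from_node, to_node) -> circuit id

    def xy_path(self, src, dst):
        """Links of a dimension-ordered route (assumed routing policy)."""
        x, y = src
        links = []
        while x != dst[0]:
            nx = x + (1 if dst[0] > x else -1)
            links.append(((x, y), (nx, y)))
            x = nx
        while y != dst[1]:
            ny = y + (1 if dst[1] > y else -1)
            links.append(((x, y), (x, ny)))
            y = ny
        return links

    def setup(self, circuit_id, src, dst):
        """Lock links one hop at a time; on a busy link, release and fail."""
        taken = []
        for link in self.xy_path(src, dst):
            if link in self.locked:
                for held in taken:
                    del self.locked[held]
                return False  # the source may retry the request later
            self.locked[link] = circuit_id
            taken.append(link)
        return True

    def teardown(self, circuit_id):
        """Release every link held by the given circuit after the transfer."""
        self.locked = {l: c for l, c in self.locked.items() if c != circuit_id}
```

    Since a blocked request in this model gives up everything it holds, no request ever waits while holding links. With scheduled traffic, as in the hard real time case, requests can be arranged so that such blocking does not occur at all.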

    List of papers
    1. Switched interconnect for system-on-a-chip design
    2000 (English). In: Proceedings of the IP2000 Europe Conference, 2000, pp. 185-192. Conference paper, Published paper (Other academic)
    Abstract [en]

    With the increased use of IP cores in chip designs, an increasing amount of time is spent on design and verification of glue logic. To solve this problem together with the bottleneck problem of arbitration based buses, a novel approach in system-on-a-chip interconnect has been investigated. The approach is based on a switched interconnect structure, with small crossbar switches connected in a mesh for intercore communications with low latency in system-on-chip solutions. The interfaces between the interconnect network and the cores are handled by configurable wrappers that adapt the port parameters from core to network format. The core functionality of the interconnect network can be fully verified with a fairly low work effort even when configurable, so the main problem for cutting verification time is the quite complex wrappers. The concept is to make the wrappers highly configurable yet needing short verification time in an application by making a fairly complete verification of the wrappers for all configurations. How this can be achieved is under investigation. The approach described in this paper is mainly intended for use in communication equipment where high bandwidth and low latency are essential.

    Identifiers
    urn:nbn:se:liu:diva-100987 (URN)
    Conference
    IP2000 Europe Conference. Edinburgh, Scotland, 2000
    Available from: 2013-11-15 Created: 2013-11-15 Last updated: 2013-11-15
    2. Design of a system-on-chip switched network and its design support
    2002 (English). In: IEEE 2002 International Conference on Communications, Circuits and Systems and West Sino Expositions, 2002, pp. 1279-1283. Conference paper, Published paper (Refereed)
    Abstract [en]

    As the degree of integration increases, on-chip communication is becoming a bottleneck. A solution to this problem is to use an on-chip switched interconnect network. Such a system-on-chip network was proposed in 2000 by the same authors. In this paper, we present the system-on-chip network in detail together with the design flow support. The choice of topology for the network, as well as some ways to use the network to overcome the future physical implementation issues of wire delay, and to gain performance, is also discussed. To aid the design choices of the network, a behavioral simulator has been created. The importance of the behavioral simulator is clearly shown by the design flow, and the design and implementation of this simulator are discussed in detail.

    Identifiers
    urn:nbn:se:liu:diva-33551 (URN); 10.1109/ICCCAS.2002.1179016 (DOI); 19576 (Local ID); 0-7803-7547-5 (ISBN); 19576 (Archive number); 19576 (OAI)
    Conference
    International Conference on Communications, Circuits and Systems and West Sino Expositions (ICCCAS), Chengdu, China. 29 June - 1 July. 2002.
    Available from: 2009-10-09 Created: 2009-10-09 Last updated: 2013-11-15
    3. SoCBUS: switched network on chip for hard real time embedded systems
    2003 (English). In: Proceedings. International Parallel and Distributed Processing Symposium, 2003, 2003, p. 78-. Conference paper, Published paper (Refereed)
    Abstract [en]

    With the current trend in integration of more complex systems on chip there is a need for better communication infrastructure on chip that will increase the available bandwidth and simplify the interface verification. We have previously proposed a circuit switched two-dimensional mesh network known as SoCBUS that increases performance and lowers the cost of verification. In this paper, the SoCBUS is explained together with the working principles of the transaction handling. We also introduce the concept of packet connected circuit, PCC, where a packet is switched through the network locking the circuit as it goes. PCC is deadlock free and does not impose any unnecessary restrictions on the system while being simple and efficient in implementation. SoCBUS uses this PCC scheme to set up routes through the network. We introduce a possible application, a telephone to voice-over-IP gateway, and use this to show that the SoCBUS has very good properties in bandwidth, latency, and complexity when used in a hard real time system with scheduling of the traffic. The simulation analysis of the SoCBUS in the application shows that a certain SoCBUS setup can handle 48000 channels of voice data including buffer swapping in a single chip. We also show that the SoCBUS is not suitable for general purpose computing platforms that exhibit random traffic patterns but that the SoCBUS shows acceptable performance when the traffic is mainly local.

    Identifiers
    urn:nbn:se:liu:diva-33271 (URN); 10.1109/IPDPS.2003.1213180 (DOI); 19271 (Local ID); 19271 (Archive number); 19271 (OAI)
    Conference
    International Parallel and Distributed Processing Symposium. Nice, France, 22-26 April, 2003.
    Available from: 2009-10-09 Created: 2009-10-09 Last updated: 2013-11-15
  • 323.
    Wiklund, Daniel
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Implementation of a behavioral simulator for on-chip switched networks. 2002. In: Swedish System-on-Chip Conference, 2002, 2002. Conference paper (Other academic)
  • 324.
    Wiklund, Daniel
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Mesochronous clocking and communication in on-chip. 2003. In: Swedish System-on-Chip Conference, 2003, 2003. Conference paper (Other academic)
  • 325.
    Wiklund, Daniel
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Processing and memory requirements for a 3G WCDMA. 2004. In: Swedish System-on-Chip Conference, SSoCC, 2004, 2004. Conference paper (Other academic)
    Abstract [en]

    The WCDMA standard is the main system used for third generation mobile communications in Europe. The basestation is the central node in the radio access network. The cost of a basestation is a very important factor in the deployment of the 3G network. This cost can be lowered by merging functionality of the basestation into fewer components. In order to get an appropriate system level model of the baseband part of the 3G WCDMA basestation, a processing task survey has been done. This survey has been conducted through analysis of the standard documents and published research papers. The results from the survey show that it may be possible to implement the baseband part of a 128 channel basestation on a single chip.

  • 326.
    Wiklund, Daniel
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Ehliar, Andreas
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Design of an internet core router using the SoCBUS network on chip. 2005. In: International Symposium on Signals, Circuits, and Systems ISSCS, 2005, 2005. Conference paper (Refereed)
  • 327.
    Wiklund, Daniel
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Design, mapping, and simulations of a 3G WCDMA/FDD basestation using network on chip. 2005. In: International workshop on SoC for real-time applications, 2005, 2005. Conference paper (Refereed)
  • 328.
    Wiklund, Daniel
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Design of a system-on-chip switched network and its design support. 2002. In: IEEE 2002 International Conference on Communications, Circuits and Systems and West Sino Expositions, 2002, pp. 1279-1283. Conference paper (Refereed)
  • 329.
    Wiklund, Daniel
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    SoCBUS: switched network on chip for hard real time embedded systems. 2003. In: Proceedings. International Parallel and Distributed Processing Symposium, 2003, 2003, p. 78-. Conference paper (Refereed)
  • 330.
    Wiklund, Daniel
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Switched interconnect for system-on-a-chip design. 2000. In: Proceedings of the IP2000 Europe Conference, 2000, pp. 185-192. Conference paper (Other academic)
  • 331.
    Wiklund, Daniel
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Sathe, Sumant
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Benchmarking of On-Chip Interconnection Networks. 2004. In: International Conference on Microelectronics, ICM, 2004, 2004. Conference paper (Refereed)
  • 332.
    Wiklund, Daniel
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Sathe, Sumant
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Network on chip simulations for benchmarking. 2004. In: International Workshop on SoC for real-time applications, 2004, 2004. Conference paper (Refereed)
  • 333.
    Wikström, Rolf
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska fakulteten.
    Mysak Konstruktion av ett mät- och stimulikort för mobiltelefoner [Mysak: design of a measurement and stimulus board for mobile phones]. 2015. Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
    Abstract [sv] (English translation)

    In high-volume production of consumer electronics, test time, measurement accuracy and factory floor space are synonymous with cost. For this reason, the Test Engineering section at the Ericsson Mobile Communications factory in Linköping developed a test concept, called the Pelle concept, in which computing power and measurement equipment are moved into small, space-saving test fixtures suited to both robotized and manual production lines. In the spring of 1998 the new test concept lacked a general measurement and stimulus board for mobile phones; such a board was specified and designed during the summer and autumn of 1998 by the author and Stefan Lantz. The report describes the work of specifying the board's functional blocks and the detailed design of the functional blocks for which the author was responsible. The report also gives an insight into the design work, and its difficulties, for a measurement and stimulus board with many different functional blocks on a very limited area, as well as the problems that arise in many projects as a result of changed requirements, misunderstandings, communication failures and missing documentation.

  • 334.
    Winberg, Ulf
    Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    DRAM Controller Benchmarking. 2009. Independent thesis Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    For some years now, flat-screen TVs, such as LCD and plasma, have come to completely dominate the television market. In a SoC solution for digital TVs, several processors are used to obtain a decent image quality. Some of the processors need temporal information, which means that whole frames need to be stored in memory, which in turn motivates the use of SDRAM memory. As higher demands on resolution and image quality arrive, greater pressure is put on the performance of the SoC memory subsystem so that it does not become a bottleneck of the system.

    In this master thesis project, a model of an existing SoC for digital TVs is used to benchmark and evaluate the performance of an SDRAM memory controller architecture under study. The two major features are the ability to reorder transactions and the compatibility with DDR3. By introducing reordering of transactions, the memory controller is given the choice to service memory requests in an order that decreases bank conflicts and read/write turnarounds. Measurements show that a utilization of 86.5 % of the total available bandwidth can be achieved, which is 18.5 percentage points more than an existing non-reordering memory controller developed by NXP.
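
    Why reordering helps can be shown with a toy model. The sketch below is not the controller studied in the thesis: the greedy policy (prefer a request that hits the open row and keeps the current read/write direction, otherwise take the oldest), the request format `(bank, row, is_write)` and the function names are all assumptions made for illustration, and starvation limits that a real controller needs are left out.

```python
def penalties(order):
    """Count (row conflicts, read/write turnarounds) for a service order."""
    open_row, last_write = {}, None
    conflicts = turnarounds = 0
    for bank, row, is_write in order:
        if open_row.get(bank, row) != row:
            conflicts += 1    # another row is open in this bank
        if last_write is not None and last_write != is_write:
            turnarounds += 1  # data bus changes direction
        open_row[bank], last_write = row, is_write
    return conflicts, turnarounds


def reorder(queue):
    """Greedy reordering: favour open-row hits, then same direction, then age."""
    pending, served = list(queue), []
    open_row, last_write = {}, None
    while pending:
        def score(req):
            bank, row, is_write = req
            row_hit = open_row.get(bank, row) == row
            same_dir = last_write is None or last_write == is_write
            return (row_hit, same_dir)
        best = max(pending, key=score)  # max keeps the oldest request on ties
        pending.remove(best)
        served.append(best)
        open_row[best[0]], last_write = best[1], best[2]
    return served
```

    For a queue that alternates between two rows of one bank and between reads and writes, in-order service pays three conflicts and three turnarounds, while the reordered sequence pays one of each.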

  • 335.
    Winqvist, Arvid
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    DSP implementation of the Cholesky factorisation. 2014. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    The Cholesky factorisation is an efficient tool that, when used correctly, can significantly reduce the computational complexity in many applications. This thesis contains an in-depth study of the factorisation, some of its applications and an implementation on the Coresonic SIMT DSP architecture.
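
    For readers unfamiliar with the factorisation, a plain reference version is sketched below. It computes the lower-triangular matrix L with A = L·Lᵀ for a symmetric positive definite A, column by column. It is a textbook formulation in Python and says nothing about how the thesis maps the computation onto the DSP.

```python
import math


def cholesky(a):
    """Return lower-triangular L (list of lists) such that a = L * L^T.
    `a` must be a symmetric positive definite matrix given as nested lists."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal element: remove the contribution of earlier columns.
        s = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if s <= 0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(s)
        # Elements below the diagonal in column j.
        for i in range(j + 1, n):
            l[i][j] = (a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))) / l[j][j]
    return l
```

    Once L is known, a linear system A·x = b can be solved with one forward and one backward substitution, which is where the savings in many applications come from.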

  • 336.
    Wu, Di
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Scalable Multi-Standard Radio Baseband for Modern Wireless Communications. 2009. Doctoral thesis, monograph (Other academic)
    Abstract [en]

    Today, owing to the rapid advancement of technologies, people can cross the geographic gap and communicate without waiting for a week to receive a mail. Meanwhile, more and more wireless communications standards are emerging, all claiming to make our lives easier. This really brings us into a dilemma: we need new technologies, not because we are fond of technical complication but, on the contrary, because we are constantly pursuing convenience and simplicity. Being tangled by so many standards for connectivity is not fun for anyone (even for people who invented these technologies). The demand is rather simple: why not put everything into one unit which can automatically attach itself to the most suitable radio access available in the circumstances? The whole purpose of this thesis is to find an economical way of meeting such a demand.

    From the semiconductor industry’s point of view, the traditional ASIC design flow is facing the challenges brought by ever more rapidly changing specifications and immense tape-out costs at nanoscale. Moreover, the ever-increasing system complexity requires painstaking and costly integration and verification.

    This thesis investigates multi-tasking radio which is a concept to allow multiple radio access technologies to be supported by the same hardware platform and switched under different scenarios. By simultaneously looking at different layers of abstraction such as system modeling and simulation, architecture design, and silicon implementation, the design tradeoff for multi-tasking radio baseband is discussed.

    In this dissertation, taking the emerging mobile broadband standard 3GPP LTE as the focus and other standards (e.g. IEEE 802.11n and DVB) as complements, the system architecture of a multi-tasking radio platform is studied. A general multi-tasking radio baseband chain is partitioned into several functional blocks according to the processing flow, and the blocks are investigated separately. These blocks include synchronization, channel estimation, demodulation and channel coding. Different algorithms are evaluated for each functional block. A new multiple-input multiple-output symbol detection algorithm, “modified fixed-complexity soft-output” (MFCSO), is proposed and implemented in silicon. A unified synchronization unit is presented to support several standards. The architecture of the channel estimator is also addressed. Finally, a high-speed radix-2 Turbo decoder implementation is presented, leading towards the radix-4 scenario. It is worth mentioning that in this dissertation, the performance evaluation takes the complete system into consideration rather than independently analyzing an individual block. Based on this, algorithm/hardware co-optimization is carried out. Using the “Single Instruction Multiple Tasks” architecture presented earlier, by exploring the commonality of signal processing functions and choosing the proper level of hardware multiplexing, it is concluded in this dissertation that system thinking allows a harmony to be achieved for multi-tasking radio baseband design.

  • 337.
    Wu, Di
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Asghar, Rizwan
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Huang, Yulin
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Implementation of a high-speed parallel Turbo decoder for 3GPP LTE terminals. 2009. Conference paper (Refereed)
    Abstract [en]

    This paper presents a parameterized parallel Turbo decoder for 3GPP LTE terminals. To support the high peak data rate defined in the forthcoming 3GPP LTE standard, a turbo decoder with a throughput beyond 150 Mbit/s is needed as a key component of the radio baseband chip. By exploiting the tradeoff between precision, speed and area consumption, a turbo decoder with eight parallel SISO units is implemented to meet the throughput requirement. The turbo decoder is synthesized, placed and routed using 65 nm CMOS technology. It achieves a throughput of 152 Mbit/s and occupies an area of 0.7 mm² with an estimated power consumption of 650 mW.
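
    The reported figures can be combined into a few derived metrics that make comparisons easier. These values are plain arithmetic on the numbers quoted above and are not results stated by the paper; the variable names are illustrative, and the per-unit figure simply assumes the load is spread evenly over the eight SISO units.

```python
throughput_bps = 152e6  # reported throughput
power_w = 0.650         # estimated power consumption
area_mm2 = 0.7          # reported silicon area
siso_units = 8          # parallel SISO units

# Energy spent per decoded bit, in nanojoules (about 4.28 nJ/bit).
energy_per_bit_nj = power_w / throughput_bps * 1e9

# Throughput per unit area, in Mbit/s per mm^2 (about 217).
throughput_per_mm2 = throughput_bps / area_mm2 / 1e6

# Average throughput per SISO unit, in Mbit/s, assuming an even split.
throughput_per_siso = throughput_bps / siso_units / 1e6
```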

  • 338.
    Wu, Di
    et al.
    Linköpings universitet, Institutionen för systemteknik. Linköpings universitet, Tekniska högskolan.
    Eilert, Johan
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Asghar, Rizwan
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    VLSI Implementation of a Fixed-Complexity Soft-Output MIMO Detector for High-Speed Wireless. 2010. In: EURASIP Journal on Wireless Communications and Networking, ISSN 1687-1472, E-ISSN 1687-1499, Vol. 2010, no. 893184. Article in journal (Refereed)
    Abstract [en]

    This paper presents a low-complexity MIMO symbol detector with close to maximum a posteriori performance for the emerging multiantenna-enhanced high-speed wireless communications. The VLSI implementation is based on a novel MIMO detection algorithm called Modified Fixed-Complexity Soft-Output (MFCSO) detection, which achieves a good trade-off between performance and implementation cost compared to the referenced prior art. By including a microcode-controlled channel preprocessing unit and a pipelined detection unit, it is flexible enough to cover several different standards and transmission schemes. The flexibility allows adaptive detection to minimize power consumption without degradation in throughput. The VLSI implementation of the detector is presented to show that real-time MIMO symbol detection of a 20 MHz bandwidth 3GPP LTE and a 10 MHz WiMAX downlink physical channel is achievable at reasonable silicon cost.

  • 339.
    Wu, Di
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Eilert, Johan
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Asghar, Rizwan
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Ge, Qun
    Linköpings universitet, Institutionen för systemteknik. Linköpings universitet, Tekniska högskolan.
    VLSI Implementation of A Multi-Standard MIMO Symbol Detector for 3GPP LTE and WiMAX. 2010. In: Wireless Telecommunications Symposium (WTS), 2010, IEEE, 2010, pp. 1-4. Conference paper (Refereed)
    Abstract [en]

    In this paper, a low-complexity symbol detector is presented targeting the emerging 3GPP LTE and WiMAX standards. The detector is the VLSI implementation of a recently proposed novel MIMO detection algorithm. Compared to the design in the reference, the detector performs better while consuming less silicon area. By including a microcode-controlled channel preprocessing unit and a pipelined detection unit, it is flexible enough to cover different standards and transmission schemes while maintaining power and area efficiency. Implemented in a 65 nm CMOS process, the detector can support real-time detection of a 20 MHz bandwidth 3GPP LTE or 10 MHz WiMAX downlink physical channel.

  • 340.
    Wu, Di
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Eilert, Johan
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Asghar, Rizwan
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Nilsson, A.
    Coresonic AB, Sweden.
    Tell, E.
    Coresonic AB, Sweden.
    Alfredsson, E.
    Coresonic AB, Sweden.
    System architecture for 3GPP-LTE modem using a programmable baseband processor. 2010. In: International Journal of Embedded and Real-Time Communication Systems, ISSN 1947-3176, E-ISSN 1947-3184, Vol. 1, no. 3, pp. 44-64. Journal article (Refereed)
    Abstract [en]

    The evolution of third generation mobile communications toward high-speed packet access and long-term evolution is ongoing and will substantially increase the throughput with higher spectral efficiency. This paper presents the system architecture of an LTE modem based on a programmable baseband processor. The architecture includes a baseband processor that handles processing time and frequency synchronization, IFFT/FFT (up to 2048-p), channel estimation and subcarrier de-mapping. The throughput and latency requirements of a Category four User Equipment (CAT4 UE) are met by adding a MIMO symbol detector and a parallel Turbo decoder supporting H-ARQ, which brings both low silicon cost and enough flexibility to support other wireless standards. The complexity demonstrated by the modem shows the practicality and advantage of using programmable baseband processors for a single-chip LTE solution. Copyright © 2010, IGI Global.

  • 341.
    Wu, Di
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Eilert, Johan
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    A Programmable Lattice-Reduction Aided Detector for MIMO-OFDMA. 2008. In: 4th IEEE International Conference on Circuits and Systems, ICCSC, 2008, IEEE, 2008, pp. 293-297. Conference paper (Refereed)
    Abstract [en]

    This paper presents the first programmable Lattice-Reduction Aided (LRA) symbol detector for Multiple-Input Multiple-Output (MIMO) and Orthogonal Frequency Division Multiple Access (OFDMA). The proposed detector is implemented using 65 nm ASIC technologies. Owing to the programmability, the detector can be dynamically switched between linear (e.g. MMSE) and lattice-reduction aided (e.g. LRA-MMSE) detectors by simply running another software subroutine. Therefore, it allows a good trade-off between performance and computational latency to be achieved under various scenarios. Along with the hardware, two algorithm simplifications (SCNT-LR and SOT-LR) are proposed for finding subcarriers with ill-conditioned channel matrices. Finally, interpolated LR (I-LR) is proposed to further reduce the computational complexity for real-time implementations.

  • 342.
    Wu, Di
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Eilert, Johan
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Evaluation of MIMO Symbol Detectors for 3GPP LTE Terminals. 2009. In: 17th European Signal Processing Conference (EUSIPCO), 2009. Conference paper (Refereed)
    Abstract [en]

    This paper investigates various MIMO detection methods for 3GPP LTE open-loop downlink multi-antenna transmission. Targeting VLSI implementation, these detection methods are evaluated with respect to complexity and detection performance. A realistic 3GPP LTE simulation chain is developed for the evaluation. The result shows that with the aid of Hybrid Automatic Repeat reQuest (H-ARQ), a recently proposed reduced complexity close-ML detector called MFCSO achieves a good tradeoff between achievable throughput and complexity. An adaptive transmission and detection scheme is also proposed based on user scenarios.

  • 343.
    Wu, Di
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Eilert, Johan
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Implementation of a High-Speed MIMO Soft-Output Symbol Detector for Software Defined Radio. 2011. In: Journal of Signal Processing Systems, ISSN 1939-8115, Vol. 63, no. 1, pp. 27-37. Journal article (Refereed)
    Abstract [en]

    This paper presents a programmable MMSE soft-output MIMO symbol detector that supports the 600 Mbps data rate defined in 802.11n. The detector is implemented using a multi-core floating-point processor and a configurable soft-bit demapper. Owing to the dynamic range supplied by the floating-point SIMD datapath, special algorithms can be adopted to reduce the computational latency of channel processing with sufficient numerical stability for large channel matrices. When compared to several existing fixed-function solutions, the detector proposed in this paper is smaller and faster. More importantly, it is programmable and configurable so that it can support various MIMO transmission schemes defined by different standards.

  • 344.
    Wu, Di
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Eilert, Johan
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Lattice-Reduction Aided Multi-User STBC Decoding with Resource Constraints. 2007. In: 2007 IEEE 18th International Symposium on Personal, Indoor and Mobile Radio Communications, IEEE, 2007, pp. 192-196. Conference paper (Refereed)
    Abstract [en]

    Recently, lattice-reduction aided decoders have been proposed for MIMO systems to achieve near Maximum Likelihood decoder performance while maintaining reasonable complexity. This paper studies the implementation of lattice-reduction aided linear decoders on a programmable device for multi-user space-time block coding (MU-STBC). By reloading software, the device can be configured to use different decoding schemes according to the amount of resources available, which is an important feature of cognitive radio. In this paper, two different lattice-reduction aided linear decoding methods for MU-STBC, namely SQRD-LR and AQRD-LR, are evaluated based on their BER performance and computational complexity. Furthermore, the effect of a deadline constraint on LR is evaluated. Based on this evaluation, we propose a new method, namely adaptive decoding, which allows mode-switching of the decoder according to the environment parameters, so that the best decoder performance can always be achieved while fulfilling the resource constraints.

  • 345.
    Wu, Di
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Eilert, Johan
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Wang, Dandan
    Al-Dhahir, Naofal
    Minn, Hlaing
    Fast Complex Valued Matrix Inversion for Multi-User STBC-MIMO Decoding. 2007. In: IEEE Computer Society Annual Symposium on VLSI, ISVLSI, 2007, IEEE, 2007, pp. 325-330. Conference paper (Refereed)
    Abstract [en]

    This paper studies efficient complex matrix inversion for multi-user STBC-MIMO decoding. A novel method called Alamouti blockwise analytical matrix inversion (ABAMI) and its programmable VLSI implementation are proposed for the inversion of (in this context) large complex matrices with Alamouti sub-blocks. Our solution significantly reduces the number of operations, which makes it more than 4 times faster than several other solutions in the literature. Furthermore, compared to these fixed-function VLSI implementations, our solution is more flexible and consumes less silicon area because the hardware can be reused for many other operations. In addition to the routine analysis of the general computational complexity based on the number of basic operations, the computational latency is also measured in clock cycles based on the conceptual hardware for real-time matrix inversion.

  • 346.
    Wu, Di
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Hu, Tiejun
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    A Single Scalar DSP based Programmable H.264 Decoder. 2005. In: Swedish System on Chip Conference SSoCC, 2005, 2005. Conference paper (Other academic)
  • 347.
    Wu, Di
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Hu, Tiejun
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    A Single-Issue DSP based Multi-standard Media Processor for Mobile Platform. 2006. In: ARCS, 2006, 2006. Conference paper (Refereed)
  • 348.
    Wu, Di
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Karlström, Per
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Eilert, Johan
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Ehliar, Andreas
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Media DSP: An Application Specific Heterogeneous Multiprocessor SoC. 2006. In: SSoCC Swedish System-on-Chip Conference, 2006, 2006. Conference paper (Other academic)
  • 349.
    Wu, Di
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Larsson, Erik G.
    Linköpings universitet, Institutionen för systemteknik, Kommunikationssystem. Linköpings universitet, Tekniska högskolan.
    Liu, Dake
    Linköpings universitet, Institutionen för systemteknik, Datorteknik. Linköpings universitet, Tekniska högskolan.
    Implementation Aspects of Fixed-Complexity Soft-Output MIMO Detection. 2009. In: Proceedings of the 69th IEEE Vehicular Technology Conference (VTC'09), IEEE, 2009, pp. 1-5. Conference paper (Refereed)
    Abstract [en]

    This paper discusses implementation aspects of a recently proposed fixed-complexity soft-output (FCSO) symbol detector for MIMO systems [4]. A further approximation to the FCSO detector is proposed which substantially reduces the complexity at the cost of a minor performance loss. With the resulting method, it is possible to carry out close-to-ML detection for MIMO systems with a large number of antennas (e.g. 4×4) using higher-order modulation schemes (e.g. 64-QAM) at low silicon cost in real time. Furthermore, the parallelism inherent in the FCSO algorithm allows massively parallel processing, which makes the method suitable for implementation in multi-core baseband signal processing hardware architectures.

  • 350.
    Wu, Di
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Li, Yi-Hsien
    Eilert, Johan
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Liu, Dake
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorteknik.
    Real-Time Space-Time Adaptive Processing on the STI CELL Multiprocessor. 2007. In: 4th European Radar Conference, 2007, IEEE, 2007, pp. 71-74. Conference paper (Refereed)
    Abstract [en]

    Space-time adaptive processing (STAP) has been widely used in modern radar systems such as ground moving target indication (GMTI) systems in order to suppress jamming and interference. However, its baseband signal processing part usually requires a huge amount of computing power. This paper presents the real-time implementation of an STAP baseband signal processing flow on the state-of-the-art STI CELL multiprocessor, which enables the concept of software-defined radar (SDR). SIMD vectorization is applied to speed up the kernel subroutines of STAP such as the QR decomposition, forward/backward substitution and fast Fourier transform (FFT). Benchmarking results of both the kernel subroutines and the overall flow are presented. Furthermore, based on the results of earlier benchmarking, we propose optimized task partitioning and scheduling methods to improve the overall performance so that the overhead is reduced to the minimum.
