liu.se: Search for publications in DiVA
1 - 9 of 9
  • 1.
    Nunez-Yanez, Jose Luis
    et al.
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Otero, Andres
    Centro de Electronica Industrial, UPM, Madrid, Spain.
    de la Torre, Eduardo
    Centro de Electronica Industrial, UPM, Madrid, Spain.
    Dynamically reconfigurable variable-precision sparse-dense matrix acceleration in Tensorflow Lite. 2023. In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 98, article id 104801. Article in journal (Refereed)
    Abstract [en]

    In this paper, we present a dynamically reconfigurable hardware accelerator called FADES (Fused Architecture for DEnse and Sparse matrices). The FADES design offers multiple configuration options that trade off parallelism and complexity using a dataflow model to create four stages that read, compute, scale and write results. FADES is mapped to the programmable logic (PL) and integrated with the TensorFlow Lite inference engine running on the processing system (PS) of a heterogeneous SoC device. The accelerator is used to compute the tensor operations, while the dynamically reconfigurable approach can be used to switch precision between int8 and float modes. This dynamic reconfiguration enables better performance by allowing more cores to be mapped to the resource-constrained device and lower power consumption compared with supporting both arithmetic precisions simultaneously. We compare the proposed hardware with a high-performance systolic architecture for dense matrices, obtaining 25% better performance in dense mode with half the DSP blocks in the same technology. In sparse mode, we show that the core can outperform dense mode even at low sparsity levels, and a single core achieves up to 20x acceleration over the software-optimized NEON RUY library.

  • 2.
    Nunez-Yanez, Jose Luis
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    EnergyAnalyzer: Using Static WCET Analysis Techniques to Estimate the Energy Consumption of Embedded Applications. 2023. Conference paper (Refereed)
    Abstract [en]

    This paper presents EnergyAnalyzer, a code-level static analysis tool for estimating the energy consumption of embedded software based on statically predictable hardware events. The tool utilises techniques usually used for worst-case execution time (WCET) analysis together with bespoke energy models developed for two predictable architectures - the ARM Cortex-M0 and the Gaisler LEON3 - to perform energy usage analysis. EnergyAnalyzer has been applied in various use cases, such as selecting candidates for an optimised convolutional neural network, analysing the energy consumption of a camera pill prototype, and analysing the energy consumption of satellite communications software. The tool was developed as part of a larger project called TeamPlay, which aimed to provide a toolchain for developing embedded applications where energy properties are first-class citizens, allowing the developer to reflect directly on these properties at the source code level. The analysis capabilities of EnergyAnalyzer are validated across a large number of benchmarks for the two target architectures and the results show that the statically estimated energy consumption has, with a few exceptions, less than 1% difference compared to the underlying empirical energy models which have been validated on real hardware.

  • 3.
    Nunez-Yanez, Jose Luis
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Z. Shen, J. Nunez-Yanez and N. Dahnoun, "Multiple Human Tracking and Fall Detection Real-Time System Using Millimeter-Wave Radar and Data Fusion," 2023. Conference paper (Refereed)
    Abstract [en]

    This paper investigates an indoor multiple human tracking and fall detection system based on the usage of multiple Millimeter Wave radars from Texas Instruments. We propose a real-time system framework to merge the signals received from radars and track the position and body status of human objects. In order to guarantee the overall accuracy of our system, we develop novel strategies such as dynamic DBSCAN clustering based on signal energy levels and a possibility matrix for multiple object tracking. Our prototype system, which employs three radars placed on x-y-z surfaces, demonstrates higher accuracy than the solution in [1] (90%), with 98.5% and 98.2% accuracy in multiple human tracking and fall detection respectively. The accuracy reaches 99.7% for single human tracking.

  • 4.
    Wang, Zijie
    et al.
    University of Bristol, UK.
    Lu, Jiajun
    University of Bristol, UK.
    Nunez-Yanez, Jose Luis
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    A Low-complexity FPGA TDC based on a DSP Delay Line and a Wave Union Launcher. 2022. In: 2022 25th Euromicro Conference on Digital System Design (DSD), Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 101-108. Conference paper (Refereed)
    Abstract [en]

    High-precision time-to-digital converters (TDCs) are key components for controlling quantum systems, and FPGAs have gained popularity for this task thanks to their low cost and flexibility compared with Application Specific Integrated Circuits (ASICs). This paper investigates a novel FPGA-based TDC architecture that combines a wave union launcher and delay lines constructed with DSP blocks. The configuration achieves an 8.07 ps RMS resolution on a low-cost Zynq FPGA with a power usage of only 0.628 W. The low power consumption is achieved thanks to a combination of operating frequency and logic resource usage that are lower than in other methods, such as multi-chain DSP-based TDCs and multi-chain CARRY4-based TDCs.

  • 5.
    Nikov, Kris
    et al.
    University of Bristol, UK.
    Georgiou, Kyriakos
    University of Bristol, UK.
    Chamski, Zbigniew
    University of Bristol, UK.
    Eder, Kerstin
    University of Bristol, UK.
    Nunez-Yanez, Jose Luis
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Accurate Energy Modelling on the Cortex-M0 Processor for Profiling and Static Analysis. 2022. In: 2022 29th IEEE International Conference on Electronics, Circuits and Systems (ICECS), Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 1-4. Conference paper (Refereed)
    Abstract [en]

    Energy modelling can enable energy-aware software development and assist the developer in meeting an application's energy budget. Although many energy models for embedded processors exist, most do not account for processor-specific configurations, nor are they suitable for static energy consumption estimation. This paper introduces a set of comprehensive energy models for Arm's Cortex-M0 processor, ready to support energy-aware development of edge computing applications using either profiling- or static-analysis-based energy consumption estimation. We use a commercially representative physical platform together with a custom modified Instruction Set Simulator to obtain the physical data and system state markers used to generate the models. The models account for different processor configurations which all have a significant impact on the execution time and energy consumption of edge computing applications. Unlike existing works, which target a very limited set of applications, all developed models are generated and validated using a very wide range of benchmarks from a variety of emerging IoT application areas, including machine learning, and have a prediction error of less than 5%.

  • 6.
    Olgu, Kaan
    et al.
    Dept. Electrical & Electronics Engineering, University of Bristol, Bristol, UK.
    Nikov, Kris
    Dept. Electrical & Electronics Engineering, University of Bristol, Bristol, UK.
    Nunez-Yanez, Jose Luis
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Analysis of Graph Processing in Reconfigurable Devices for Edge Computing Applications. 2022. In: 2022 25th Euromicro Conference on Digital System Design (DSD), Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 16-23. Conference paper (Refereed)
    Abstract [en]

    Graph processing is an area that has received significant attention in recent years due to the substantial expansion in industries relying on data analytics. Alongside the vital role of finding relations in social networks, graph processing is also widely used in transportation to find optimal routes and in biological networks to analyse sequences. The main bottleneck in graph processing is irregular memory accesses rather than computation intensity. Since computational intensity is not a driving factor, we propose a method to perform graph processing at the edge more efficiently. We believe current cloud computing solutions are still very costly and have latency issues. The results demonstrate the benefits of a dedicated sparse graph processing algorithm compared with dense graph processing when analysing data with low density. As graph datasets grow exponentially, traversal algorithms such as breadth-first search (BFS), fundamental to many graph processing applications and metrics, become more costly to compute. Our work reviews other implementations of breadth-first search algorithms designed for low-power systems and proposes a solution that utilises advanced enhancements to achieve up to 9.2x better performance in terms of MTEPS than other state-of-the-art solutions, with a power usage of 2.32 W.
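
    As a rough illustration of the sparse breadth-first search this abstract refers to, the following Python sketch runs BFS over a compressed sparse row (CSR) adjacency structure and counts traversed edges. It is not the authors' FPGA design: the CSR layout, the function names `to_csr` and `bfs_levels`, and the toy graph are all assumptions made for illustration only.

```python
from collections import deque


def to_csr(num_vertices, edges):
    """Build a CSR adjacency structure (row_ptr, col_idx) from directed edges.

    Hypothetical helper: the source does not specify its graph format.
    """
    row_ptr = [0] * (num_vertices + 1)
    for src, _dst in edges:
        row_ptr[src + 1] += 1
    # Prefix sum turns per-vertex edge counts into row offsets.
    for v in range(num_vertices):
        row_ptr[v + 1] += row_ptr[v]
    col_idx = [0] * len(edges)
    next_slot = row_ptr[:-1]  # slicing copies, so row_ptr is not modified
    for src, dst in edges:
        col_idx[next_slot[src]] = dst
        next_slot[src] += 1
    return row_ptr, col_idx


def bfs_levels(row_ptr, col_idx, source):
    """Return (level per vertex, number of edges traversed); -1 = unreachable."""
    level = [-1] * (len(row_ptr) - 1)
    level[source] = 0
    frontier = deque([source])
    edges_traversed = 0
    while frontier:
        u = frontier.popleft()
        # Neighbour lookups jump around col_idx and level: this is the
        # irregular memory access pattern that dominates BFS cost.
        for i in range(row_ptr[u], row_ptr[u + 1]):
            edges_traversed += 1
            v = col_idx[i]
            if level[v] == -1:
                level[v] = level[u] + 1
                frontier.append(v)
    return level, edges_traversed


if __name__ == "__main__":
    row_ptr, col_idx = to_csr(5, [(0, 1), (0, 2), (1, 3), (2, 3)])
    print(bfs_levels(row_ptr, col_idx, 0))
```

    Dividing `edges_traversed` by the elapsed time in seconds and by one million would give a figure in MTEPS (millions of traversed edges per second), the metric quoted in the abstract.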

  • 7.
    Kong, Minxuan
    et al.
    University of Bristol, Bristol, UK.
    Nunez-Yanez, Jose Luis
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Entropy-Based Early-Exit in a FPGA-Based Low-Precision Neural Network. 2022. In: Applied Reconfigurable Computing. Architectures, Tools, and Applications / [ed] Gan, Lin; Wang, Yu; Xue, Wei; Chau, Thomas, Springer Nature, 2022, Vol. 13569, p. 72-86. Conference paper (Refereed)
    Abstract [en]

    In this paper, we investigate the application of early-exit strategies to fully quantized neural networks, mapped to low-complexity FPGA SoC devices. The challenge of accuracy drop with low bitwidth quantized first convolutional layer and fully connected layers has been resolved. We apply an early-exit strategy to a network model that combines weights and activation with extremely low bitwidth and binary arithmetic precision based on the ImageNet dataset. We use entropy calculations to decide which branch of the early-exit network to take. The experiments show an improvement in inference speed of 1.52× using an early-exit system, compared with using a single primary neural network, with a slight accuracy decrease of 1.64%.
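
    The entropy-based branch decision described in this abstract can be sketched in a few lines of Python. This is a minimal illustration of the general technique, not the authors' implementation: the function names, the threshold value of 0.5, and the use of a softmax over the early branch's logits are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; low entropy means a confident prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def classify(early_logits, run_primary, threshold=0.5):
    """Take the early exit when its prediction entropy is below `threshold`.

    `run_primary` is a hypothetical callable that evaluates the rest of the
    network and returns its logits; it is only invoked when the early branch
    is not confident enough. The threshold is an assumed, tunable value.
    """
    if entropy(softmax(early_logits)) < threshold:
        return argmax(early_logits), "early"
    return argmax(run_primary()), "primary"


if __name__ == "__main__":
    # Confident early prediction: the primary network is skipped.
    print(classify([10.0, 0.0, 0.0, 0.0], lambda: [0.0, 1.0, 0.0, 0.0]))
    # Uniform early prediction: fall through to the primary network.
    print(classify([0.0, 0.0, 0.0, 0.0], lambda: [0.0, 1.0, 0.0, 0.0]))
```

    Raising the threshold sends more inputs through the early exit, which trades accuracy for speed; the abstract reports one such operating point.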

  • 8.
    Kong, Minxuan
    et al.
    Dept. Electrical and Electronic Engineering, University of Bristol, Bristol, The United Kingdom.
    Nikov, Kris
    Dept. Electrical and Electronic Engineering, University of Bristol, Bristol, The United Kingdom.
    Nunez-Yanez, Jose Luis
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Evaluation of Early-exit Strategies in Low-cost FPGA-based Binarized Neural Networks. 2022. In: 2022 25th Euromicro Conference on Digital System Design (DSD), Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 01-08. Conference paper (Refereed)
    Abstract [en]

    In this paper, we investigate the application of early-exit strategies to quantized neural networks with binarized weights, mapped to low-cost FPGA SoC devices. The increasing complexity of network models means that hardware reuse and heterogeneous execution are needed, and this opens the opportunity to evaluate the prediction confidence level early on. We apply the early-exit strategy to a network model suitable for ImageNet classification that combines weights with floating-point and binary arithmetic precision. The experiments show an improvement in inference speed of around 20% using an early-exit network, compared with using a single primary neural network, with a negligible accuracy drop of 1.56%.

  • 9.
    Nunez-Yanez, Jose Luis
    Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering.
    Fused Architecture for Dense and Sparse Matrix Processing in TensorFlow Lite. 2022. In: IEEE Micro, ISSN 0272-1732, E-ISSN 1937-4143, Vol. 42, no 6, p. 55-66. Article in journal (Refereed)
    Abstract [en]

    In this paper we present a hardware architecture optimized for sparse and dense matrix processing in TensorFlow Lite and compatible with embedded heterogeneous devices that integrate CPU and FPGA resources. The FADES (Fused Architecture for DEnse and Sparse matrices) design offers multiple configuration options that trade off parallelism and complexity and uses a dataflow model to create four stages that read, compute, scale and write results. All stages are designed to support TensorFlow Lite operations including asymmetric quantized activations, column-major matrix write, per-filter/per-axis bias values and current scaling specifications. The configurable accelerator is integrated with the TensorFlow Lite inference engine running on the ARMv8 processor. We compare performance/power/energy with the state-of-the-art RUY software multiplication library, showing up to 18x and 48x acceleration in dense and sparse modes, respectively. The sparse mode benefits from structural pruning to fully utilize the DSP blocks present in the FPGA device.
