Publications (10 of 12)
Nunez-Yanez, J. (2024). Adaptive Quantization of Graph Convolutional Networks with Hardware-Aware On-device Training. In: 2024 IEEE Nordic Circuits and Systems Conference (NorCAS). Paper presented at 10th Nordic Circuits and Systems Conference, Lund, Sweden, Oct 29-30, 2024. IEEE
Adaptive Quantization of Graph Convolutional Networks with Hardware-Aware On-device Training
2024 (English) In: 2024 IEEE Nordic Circuits and Systems Conference, NorCAS, IEEE, 2024, Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we investigate a hardware approach to on-device training and inference for fully quantized graph convolutional networks (GCNs). The proposed solution uses a specialized hardware accelerator built on a streaming architecture with adaptive fixed-point numeric precision. The accelerator scales its performance through a configurable number of independent hardware threads and compute units per thread. During training, the architecture widens the data path in the backward pass to preserve the gradient accuracy that backpropagation requires, whereas in the forward pass it narrows the data path to emulate the uncertainty introduced by the quantized parameters. We benchmark the accelerator on the popular Planetoid datasets and obtain valid precisions down to 1 bit for weights and features and 2 bits for adjacency. Compared with the optimized PyTorch software implementation, the accelerator delivers speedups of more than two orders of magnitude for inference and one order of magnitude for training. It also shows competitive performance against previous GCN accelerators that support inference only and are based on HPC (High Performance Computing) FPGA platforms.
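The narrow-forward, wide-backward precision split can be illustrated with a minimal Python sketch. It is not the author's accelerator: it assumes a symmetric signed fixed-point format with round-to-nearest and saturation, and the names `quantize`, `gcn_forward` and `train_step`, as well as the bit widths (4-bit forward, 16-bit update), are hypothetical choices for illustration.

```python
Matrix = list[list[float]]


def quantize(x: float, bits: int, frac_bits: int) -> float:
    """Round x onto a signed fixed-point grid (bits total, frac_bits fractional), saturating at the limits."""
    scale = 1 << frac_bits
    q_min, q_max = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(q_min, min(q_max, round(x * scale))) / scale


def quantize_matrix(m: Matrix, bits: int, frac_bits: int) -> Matrix:
    return [[quantize(v, bits, frac_bits) for v in row] for row in m]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_forward(adj: Matrix, feats: Matrix, weights: Matrix, bits: int, frac_bits: int) -> Matrix:
    """One GCN layer A * X * W with all three operands on the narrow forward grid."""
    a = quantize_matrix(adj, bits, frac_bits)
    x = quantize_matrix(feats, bits, frac_bits)
    w = quantize_matrix(weights, bits, frac_bits)
    return matmul(matmul(a, x), w)


def train_step(w_master: float, x: float, y: float, lr: float,
               fwd: tuple[int, int] = (4, 2), bwd: tuple[int, int] = (16, 12)) -> float:
    """Scalar model y ~ w * x: narrow weight in the forward pass, wide weight for the update."""
    w_fwd = quantize(w_master, *fwd)             # narrow data path (forward)
    grad = 2.0 * (w_fwd * x - y) * x             # dL/dw of (w*x - y)^2, straight-through estimate
    return quantize(w_master - lr * grad, *bwd)  # wide data path (update)
```

For example, `quantize(0.3, 4, 2)` returns `0.25`, and repeatedly calling `train_step(w, 1.0, 1.0, 0.1)` from `w = 0.0` drives the quantized forward weight to `1.0`, although each forward pass only ever sees multiples of 0.25. The intermediate product in `gcn_forward` is left unrounded here; a hardware data path would also fix its accumulator width.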

Place, publisher, year, edition, pages
IEEE, 2024
Keywords
FPGA acceleration, graph neural network, on-device training, quantized sparse data, dataflow architecture
National Category
Computer Systems
Identifiers
urn:nbn:se:liu:diva-213075 (URN); 10.1109/NorCAS64408.2024.10752483 (DOI); 001444043400045 (ISI); 2-s2.0-85211959496 (Scopus ID); 9798331517663 (ISBN); 9798331517670 (ISBN)
Conference
10th Nordic Circuits and Systems Conference, Lund, Sweden, Oct 29-30, 2024
Note

Funding agencies: Wallenberg AI, Autonomous Systems and Software Program (WASP), Knut and Alice Wallenberg Foundation

Available from: 2025-04-16 Created: 2025-04-16 Last updated: 2025-11-11
Shen, Z., Nunez-Yanez, J. & Dahnoun, N. (2024). MMIDNet: Secure Human Identification Using Millimeter-wave Radar and Deep Learning. In: 2024 13th Mediterranean Conference on Embedded Computing (MECO 2024). Paper presented at 13th Mediterranean Conference on Embedded Computing (MECO), Budva, Montenegro, Jun 11-14, 2024 (pp. 328-334). IEEE
MMIDNet: Secure Human Identification Using Millimeter-wave Radar and Deep Learning
2024 (English) In: 2024 13th Mediterranean Conference on Embedded Computing, MECO 2024, IEEE, 2024, p. 328-334, Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents a deep-learning approach to human identification using millimeter-wave (mmWave) radar. Unlike conventional vision-based methods, the approach preserves privacy while maintaining accuracy in various indoor settings. Combining parts of PointNet with Convolutional Neural Network (CNN) and Bi-directional Long Short-Term Memory (Bi-LSTM) components, we propose a neural network architecture named MMIDNet that processes point cloud data from mmWave radar directly. The system achieves an identification accuracy of 92.4% across 12 individuals. The research covers data collection, system design, and evaluation, and highlights the potential of combining mmWave radar with deep learning for secure and efficient human identification in Internet of Things (IoT) applications.
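PointNet-style processing, which MMIDNet partly builds on, handles an unordered radar point cloud by applying the same per-point transform to every point and then pooling with a symmetric function. The minimal Python sketch below shows only that idea, not MMIDNet itself: the single shared layer, its weights, and the names `shared_layer` and `global_feature` are hypothetical, and the CNN and Bi-LSTM stages are omitted.

```python
Point = tuple[float, float, float]


def shared_layer(point: Point, weights: list[list[float]], bias: list[float]) -> list[float]:
    """Apply one linear layer followed by ReLU; the same weights are used for every point."""
    return [max(0.0, sum(w * p for w, p in zip(row, point)) + b)
            for row, b in zip(weights, bias)]


def global_feature(points: list[Point], weights: list[list[float]], bias: list[float]) -> list[float]:
    """Max-pool the per-point features channel by channel, so the result ignores point order."""
    per_point = [shared_layer(p, weights, bias) for p in points]
    return [max(channel) for channel in zip(*per_point)]
```

Because `max` is symmetric, reordering the points leaves the global feature unchanged, which is why this kind of pooling suits point clouds whose point order carries no meaning.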

Place, publisher, year, edition, pages
IEEE, 2024
Series
Mediterranean Conference on Embedded Computing, ISSN 2377-5475, E-ISSN 2637-9511
Keywords
Millimeter-wave radar; Point cloud; Human identification; Data processing; Deep learning; IoT application
National Category
Computer Systems
Identifiers
urn:nbn:se:liu:diva-207260 (URN); 10.1109/MECO62516.2024.10577920 (DOI); 001268606200093 (ISI); 9798350387568 (ISBN); 9798350387575 (ISBN)
Conference
13th Mediterranean Conference on Embedded Computing (MECO), Budva, Montenegro, Jun 11-14, 2024
Available from: 2024-09-06 Created: 2024-09-06 Last updated: 2024-11-11
Nunez-Yanez, J. (2023). Accelerating Graph Neural Networks in Pytorch With HLS and Deep Dataflows. In: Francesca Palumbo, Georgios Keramidas, Nikolaos Voros, Pedro C. Diniz (Eds.), Applied Reconfigurable Computing. Architectures, Tools, and Applications. Paper presented at 19th International Symposium on Applied Reconfigurable Computing (ARC) - Architectures, Tools, and Applications, Cottbus, Germany, Sep 27-29, 2023, Proceedings (pp. 131-145). Cham: Springer, Vol. 14251
Accelerating Graph Neural Networks in Pytorch With HLS and Deep Dataflows
2023 (English) In: Applied Reconfigurable Computing. Architectures, Tools, and Applications / [ed] Francesca Palumbo, Georgios Keramidas, Nikolaos Voros, Pedro C. Diniz, Cham: Springer, 2023, Vol. 14251, p. 131-145, Conference paper, Published paper (Refereed)
Abstract [en]

Graph neural networks (GNNs) combine sparse and dense data compute requirements that are challenging to meet in resource-constrained embedded hardware. In this paper, we investigate a dataflow of dataflows architecture that optimizes data access and processing element utilization. The architecture is described with high-level synthesis and offers multiple configuration options, including varying the number of independent hardware threads, the interface data width and the number of compute units per thread. Each hardware thread uses a fine-grained dataflow to stream words with a bit-width that depends on the network precision, while a coarse-grained dataflow links the thread stages by streaming partially computed matrix tiles. The accelerator is mapped to the programmable logic of a Zynq UltraScale device whose processing system runs PyTorch extended with PYNQ overlays. Results based on the citation networks show a performance gain of up to 140x with multi-threaded hardware configurations compared with the optimized software implementation available in PyTorch. The results also show competitive performance of the embedded hardware compared with other high-performance state-of-the-art hardware accelerators.
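The dataflow-of-dataflows idea can be mimicked in software with chained Python generators: a read stage streams sparse rows, a compute stage multiplies each row by a dense matrix, and a write stage stores the results, with rows split among hypothetical "threads". This minimal sketch assumes a CSR (compressed sparse row) adjacency and one row per streamed item. It is not the HLS design; the threads here run one after another rather than concurrently, and the names `read_rows`, `compute_rows`, `write_rows` and `spmm_dataflow` are hypothetical.

```python
from typing import Iterable, Iterator

SparseRow = list[tuple[int, float]]


def read_rows(indptr: list[int], indices: list[int], values: list[float],
              rows: Iterable[int]) -> Iterator[tuple[int, SparseRow]]:
    """Read stage: stream one CSR row at a time as (column, value) pairs."""
    for r in rows:
        yield r, [(indices[k], values[k]) for k in range(indptr[r], indptr[r + 1])]


def compute_rows(stream: Iterator[tuple[int, SparseRow]],
                 dense: list[list[float]]) -> Iterator[tuple[int, list[float]]]:
    """Compute stage: multiply each streamed sparse row by the dense matrix."""
    n_cols = len(dense[0])
    for r, nonzeros in stream:
        acc = [0.0] * n_cols
        for c, v in nonzeros:
            for j in range(n_cols):
                acc[j] += v * dense[c][j]
        yield r, acc


def write_rows(stream: Iterator[tuple[int, list[float]]], out: list) -> None:
    """Write stage: store each finished row at its position in the output."""
    for r, row in stream:
        out[r] = row


def spmm_dataflow(indptr: list[int], indices: list[int], values: list[float],
                  dense: list[list[float]], n_threads: int = 2) -> list[list[float]]:
    n_rows = len(indptr) - 1
    out: list = [None] * n_rows
    for t in range(n_threads):
        rows = range(t, n_rows, n_threads)  # round-robin row split for hypothetical thread t
        write_rows(compute_rows(read_rows(indptr, indices, values, rows), dense), out)
    return out
```

Each row passes through read, compute and write before the next row is read, so no stage materializes the whole sparse matrix; a hardware pipeline would additionally overlap the stages in time.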

Place, publisher, year, edition, pages
Cham: Springer, 2023
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349
Keywords
neural network, FPGA, sparse, HLS, GNN, Pytorch
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-199899 (URN); 10.1007/978-3-031-42921-7_9 (DOI); 001162213700009 (ISI); 9783031429200 (ISBN); 9783031429217 (ISBN)
Conference
19th International Symposium on Applied Reconfigurable Computing (ARC) - Architectures, Tools, and Applications, Cottbus, Germany, Sep 27-29, 2023, Proceedings
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP), 308397
Note

Funding agencies: Wallenberg AI, Autonomous Systems and Software Program (WASP), Knut and Alice Wallenberg Foundation

Available from: 2024-01-03 Created: 2024-01-03 Last updated: 2024-12-09
Nunez-Yanez, J., Otero, A. & de la Torre, E. (2023). Dynamically reconfigurable variable-precision sparse-dense matrix acceleration in Tensorflow Lite. Microprocessors and Microsystems, 98, Article ID 104801.
Dynamically reconfigurable variable-precision sparse-dense matrix acceleration in Tensorflow Lite
2023 (English) In: Microprocessors and Microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 98, article id 104801, Article in journal (Refereed), Published
Abstract [en]

In this paper, we present a dynamically reconfigurable hardware accelerator called FADES (Fused Architecture for DEnse and Sparse matrices). The FADES design offers multiple configuration options that trade off parallelism and complexity, using a dataflow model to create four stages that read, compute, scale and write results. FADES is mapped to the programmable logic (PL) and integrated with the TensorFlow Lite inference engine running on the processing system (PS) of a heterogeneous SoC device. The accelerator computes the tensor operations, while dynamic reconfiguration is used to switch precision between int8 and float modes. Compared with supporting both arithmetic precisions simultaneously, dynamic reconfiguration improves performance, because more cores can be mapped to the resource-constrained device, and lowers power consumption. Compared with a high-performance systolic architecture for dense matrices in the same technology, the proposed hardware achieves 25% better performance in dense mode with half the DSP blocks. In sparse mode, the core can outperform dense mode even at low sparsity levels, and a single core achieves up to 20x acceleration over the NEON-optimized RUY software library.
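The read, compute, scale and write stages described above can be illustrated for the int8 case with a short Python sketch. It assumes TensorFlow Lite-style requantization simplified to a single floating-point scale and zero point (the real integer kernels use fixed-point multipliers), and it skips zero entries of the left operand to mimic a sparse mode. The function name `int8_matmul_requant` and its parameters are hypothetical, not the FADES interface.

```python
def int8_matmul_requant(a: list[list[int]], b: list[list[int]],
                        scale: float, zero_point: int = 0) -> list[list[int]]:
    """Multiply int8 matrices with wide accumulation, then rescale and saturate to int8."""
    for m in (a, b):
        for row in m:
            if any(not -128 <= v <= 127 for v in row):
                raise ValueError("inputs must be int8 values")
    out = []
    for a_row in a:                                                     # read: one row of A
        acc = [sum(x * b[k][j] for k, x in enumerate(a_row) if x != 0)  # compute, skipping zeros
               for j in range(len(b[0]))]
        scaled = [max(-128, min(127, round(v * scale) + zero_point))   # scale and saturate
                  for v in acc]
        out.append(scaled)                                              # write
    return out
```

Keeping the accumulator wider than the operands and saturating only in the scale stage follows the usual int8 inference convention; skipping zeros saves work only when the left operand is actually sparse.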

Place, publisher, year, edition, pages
Elsevier, 2023
Keywords
Neural network, FPGA, Sparse, Pruning, Matrix multiplication acceleration, TensorFlow
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-192064 (URN); 10.1016/j.micpro.2023.104801 (DOI); 000954898200001 (ISI); 2-s2.0-85149058738 (Scopus ID)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP), 308397
Note

Funding: Royal Society Industry Fellowship MINET [INF\2\192044]; EPSRC HOPWARE [EP\040863\1]; Wallenberg AI, Autonomous Systems and Software Program (WASP), Knut and Alice Wallenberg Foundation, Sweden; Leverhulme Trust International Fellowship [IF-2021-003]

Available from: 2023-02-28 Created: 2023-02-28 Last updated: 2025-03-27
Wegener, S., Nikov, K., Nunez-Yanez, J. L. & Eder, K. (2023). EnergyAnalyzer: Using Static WCET Analysis Techniques to Estimate the Energy Consumption of Embedded Applications. Paper presented at 21st International Workshop on Worst-Case Execution Time Analysis (WCET 2023).
EnergyAnalyzer: Using Static WCET Analysis Techniques to Estimate the Energy Consumption of Embedded Applications
2023 (English) Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents EnergyAnalyzer, a code-level static analysis tool for estimating the energy consumption of embedded software based on statically predictable hardware events. The tool utilises techniques usually used for worst-case execution time (WCET) analysis together with bespoke energy models developed for two predictable architectures - the ARM Cortex-M0 and the Gaisler LEON3 - to perform energy usage analysis. EnergyAnalyzer has been applied in various use cases, such as selecting candidates for an optimised convolutional neural network, analysing the energy consumption of a camera pill prototype, and analysing the energy consumption of satellite communications software. The tool was developed as part of a larger project called TeamPlay, which aimed to provide a toolchain for developing embedded applications where energy properties are first-class citizens, allowing the developer to reflect directly on these properties at the source code level. The analysis capabilities of EnergyAnalyzer are validated across a large number of benchmarks for the two target architectures and the results show that the statically estimated energy consumption has, with a few exceptions, less than 1% difference compared to the underlying empirical energy models which have been validated on real hardware.
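Reusing WCET machinery for energy can be illustrated with a minimal Python sketch: each basic block gets an energy cost from a per-instruction-class energy model, loops are folded in by multiplying their body cost by a loop bound, and the worst-case estimate is the most expensive path through an acyclic control-flow graph. This is a simplified stand-in, not EnergyAnalyzer. WCET tools typically use implicit path enumeration (an integer linear program) rather than the longest-path search used here, and the instruction classes, energy values and the names `block_energy` and `worst_case_energy` are hypothetical.

```python
from functools import lru_cache


def block_energy(instr_counts: dict[str, int], energy_per_class: dict[str, float]) -> float:
    """Energy of one basic block: instructions per class times energy per instruction of that class."""
    return sum(n * energy_per_class[cls] for cls, n in instr_counts.items())


def worst_case_energy(cfg: dict[str, list[str]], block_costs: dict[str, float],
                      entry: str, exit_block: str) -> float:
    """Maximum total block energy over all entry-to-exit paths of an acyclic CFG."""
    @lru_cache(maxsize=None)
    def most_expensive_from(node: str) -> float:
        if node == exit_block:
            return block_costs[node]
        return block_costs[node] + max(most_expensive_from(s) for s in cfg[node])
    return most_expensive_from(entry)
```

For a diamond-shaped CFG whose `then` branch costs more than its `else` branch, the result follows the `then` path, which is what makes the estimate an upper bound rather than an average.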

Keywords
Energy Modelling, Static Analysis, Gaisler LEON3, ARM Cortex-M0
National Category
Computer Systems
Identifiers
urn:nbn:se:liu:diva-196385 (URN); 10.4230/OASIcs.WCET.2023.9 (DOI)
Conference
21st International Workshop on Worst-Case Execution Time Analysis (WCET 2023)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2023-07-27 Created: 2023-07-27 Last updated: 2024-11-11
Shen, Z., Nunez-Yanez, J. L. & Dahnoun, N. (2023). Multiple Human Tracking and Fall Detection Real-Time System Using Millimeter-Wave Radar and Data Fusion. In: 2023 12th Mediterranean Conference on Embedded Computing (MECO). Paper presented at 2023 12th Mediterranean Conference on Embedded Computing (MECO), 06-10 June 2023, Budva, Montenegro. IEEE
Multiple Human Tracking and Fall Detection Real-Time System Using Millimeter-Wave Radar and Data Fusion
2023 (English) In: 2023 12th Mediterranean Conference on Embedded Computing (MECO), IEEE, 2023, Conference paper, Published paper (Refereed)
Abstract [en]

This paper investigates an indoor multiple-human tracking and fall detection system based on multiple millimeter-wave radars from Texas Instruments. We propose a real-time system framework that merges the signals received from the radars and tracks the position and body status of human objects. To guarantee the overall accuracy of the system, we develop novel strategies such as dynamic DBSCAN clustering based on signal energy levels and a possibility matrix for multiple-object tracking. Our prototype system, which employs three radars placed on x-y-z surfaces, demonstrates higher accuracy than the solution in [1] (90%), with 98.5% and 98.2% accuracy in multiple-human tracking and fall detection, respectively. The accuracy reaches 99.7% for single-human tracking.
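DBSCAN, the clustering algorithm named above, groups radar points that have enough neighbours within a radius `eps` and marks isolated points as noise. Below is a minimal Python sketch of standard DBSCAN plus a hypothetical rule, `eps_for_energy`, that enlarges the radius when the mean signal energy is low. The abstract does not give the paper's actual energy-dependent rule, so this rule, its constants and all names here are assumptions.

```python
import math

Point = tuple[float, ...]


def dbscan(points: list[Point], eps: float, min_pts: int) -> list[int]:
    """Label each point with a cluster id (0, 1, ...) or -1 for noise; neighbourhoods include the point itself."""
    labels: list = [None] * len(points)  # None = not yet visited
    cluster = 0

    def neighbours(i: int) -> list[int]:
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1               # noise (may later become a border point)
            continue
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:  # only core points expand the cluster
                queue.extend(j_neighbours)
        cluster += 1
    return labels


def eps_for_energy(mean_energy: float, base_eps: float = 0.3, ref_energy: float = 100.0) -> float:
    """Hypothetical rule: weaker returns (lower energy) get a larger radius, clamped to [0.5, 2] x base."""
    return base_eps * max(0.5, min(2.0, ref_energy / mean_energy))
```

A design note on the hypothetical rule: weak reflections tend to yield sparser point clouds, so a fixed `eps` would split one person into several clusters or drop them as noise; scaling `eps` with energy is one simple way to counter that.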

Place, publisher, year, edition, pages
IEEE, 2023
Series
Mediterranean Conference on Embedded Computing (MECO), ISSN 2637-9511, E-ISSN 2377-5475
Keywords
Millimeter-Wave radar; Human fall detection; Human Activity Recognition (HAR); Data processing; Real-time system; Internet of Things (IoT) application
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-196401 (URN); 10.1109/MECO58584.2023.10155097 (DOI); 979-8-3503-2291-0 (ISBN); 979-8-3503-2292-7 (ISBN)
Conference
2023 12th Mediterranean Conference on Embedded Computing (MECO), 06-10 June 2023, Budva, Montenegro
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2023-07-31 Created: 2023-07-31 Last updated: 2025-11-17
Wang, Z., Lu, J. & Nunez-Yanez, J. L. (2022). A Low-complexity FPGA TDC based on a DSP Delay Line and a Wave Union Launcher. In: 2022 25th Euromicro Conference on Digital System Design (DSD). Paper presented at 25th Euromicro Conference on Digital System Design (DSD), Maspalomas, Spain, 31 August 2022 - 02 September 2022 (pp. 101-108). Institute of Electrical and Electronics Engineers (IEEE)
A Low-complexity FPGA TDC based on a DSP Delay Line and a Wave Union Launcher
2022 (English) In: 2022 25th Euromicro Conference on Digital System Design (DSD), Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 101-108, Conference paper, Published paper (Refereed)
Abstract [en]

High-precision time-to-digital converters (TDCs) are key components for controlling quantum systems, and FPGAs have gained popularity for this task thanks to their low cost and flexibility compared with Application Specific Integrated Circuits (ASICs). This paper investigates a novel FPGA-based TDC architecture that combines a wave union launcher with delay lines constructed from DSP blocks. The configuration achieves an 8.07 ps RMS resolution on a low-cost Zynq FPGA with a power usage of only 0.628 W. This low power consumption results from an operating frequency and a logic resource usage that are both lower than those of other methods, such as multi-chain DSP-based TDCs and multi-chain CARRY4-based TDCs.
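A delay-line TDC is commonly read by combining a coarse clock counter with the fine position of the signal edge captured in the delay line; a wave union launcher sends several edges down the line, and averaging their positions yields finer effective bins. The Python sketch below illustrates only that arithmetic, under strong simplifying assumptions: a uniform tap delay (real designs calibrate non-uniform bins, for example with code-density tests), hypothetical per-edge calibration offsets, and the convention that the captured pattern is sampled at the end of clock cycle `coarse_count`. The names `edge_positions`, `fine_time_ps` and `timestamp_ps` are hypothetical.

```python
def edge_positions(snapshot: list[int]) -> list[int]:
    """Tap indices where the captured delay-line bits change value (one per propagating edge)."""
    return [i for i in range(1, len(snapshot)) if snapshot[i] != snapshot[i - 1]]


def fine_time_ps(snapshot: list[int], offsets: list[int], tap_delay_ps: float) -> float:
    """Average the edge positions after removing each edge's calibrated offset, then convert to ps."""
    edges = edge_positions(snapshot)
    if len(edges) != len(offsets):
        raise ValueError("expected one captured edge per launched edge")
    mean_taps = sum(p - o for p, o in zip(edges, offsets)) / len(edges)
    return mean_taps * tap_delay_ps


def timestamp_ps(coarse_count: int, clk_period_ps: float, snapshot: list[int],
                 offsets: list[int], tap_delay_ps: float) -> float:
    """Hit time = end of the coarse cycle minus the distance the edges travelled before sampling."""
    return coarse_count * clk_period_ps - fine_time_ps(snapshot, offsets, tap_delay_ps)
```

For instance, edges captured at taps 3 and 6 with offsets 0 and 2 give a 3.5-tap estimate (35 ps with 10 ps taps), a half-tap value that a single edge could not produce.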

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Series
EUROMICRO Conference Proceedings, ISSN 1089-6503
Keywords
FPGA, quantum, energy efficient
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-190912 (URN); 10.1109/DSD57027.2022.00023 (DOI); 000946536500002 (ISI); 9781665474047 (ISBN); 9781665474054 (ISBN)
Conference
25th Euromicro Conference on Digital System Design (DSD), Maspalomas, Spain, 31 August 2022 - 02 September 2022
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2023-01-08 Created: 2023-01-08 Last updated: 2023-04-13 Bibliographically approved
Nikov, K., Georgiou, K., Chamski, Z., Eder, K. & Nunez-Yanez, J. L. (2022). Accurate Energy Modelling on the Cortex-M0 Processor for Profiling and Static Analysis. In: 2022 29th IEEE International Conference on Electronics, Circuits and Systems (ICECS). Paper presented at 2022 29th IEEE International Conference on Electronics, Circuits and Systems (ICECS), Glasgow, United Kingdom, 24-26 October 2022 (pp. 1-4). Institute of Electrical and Electronics Engineers (IEEE)
Accurate Energy Modelling on the Cortex-M0 Processor for Profiling and Static Analysis
2022 (English) In: 2022 29th IEEE International Conference on Electronics, Circuits and Systems (ICECS), Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 1-4, Conference paper, Published paper (Refereed)
Abstract [en]

Energy modelling can enable energy-aware software development and help developers meet an application's energy budget. Although many energy models for embedded processors exist, most do not account for processor-specific configurations, nor are they suitable for static energy consumption estimation. This paper introduces a set of comprehensive energy models for Arm's Cortex-M0 processor, ready to support energy-aware development of edge computing applications using either profiling-based or static-analysis-based energy consumption estimation. We use a commercially representative physical platform together with a custom modified Instruction Set Simulator to obtain the physical data and system state markers used to generate the models. The models account for different processor configurations, all of which have a significant impact on the execution time and energy consumption of edge computing applications. Unlike existing works, which target a very limited set of applications, all the developed models are generated and validated using a very wide range of benchmarks from a variety of emerging IoT application areas, including machine learning, and have a prediction error of less than 5%.
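Instruction-level energy models of this kind are often built by regressing measured energy against event counts. The Python sketch below fits a hypothetical two-term linear model, energy = c1 * instructions + c2 * memory_accesses, by ordinary least squares and reports the mean absolute percentage error. The features, the absence of an intercept, and the names `fit_two_feature_model` and `mean_abs_pct_error` are assumptions for illustration, not the models of this paper.

```python
def fit_two_feature_model(x1: list[float], x2: list[float], y: list[float]) -> tuple[float, float]:
    """Least-squares fit of y ~ c1*x1 + c2*x2 (no intercept) via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * t for a, t in zip(x1, y))
    t2 = sum(b * t for b, t in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("features are collinear; the model is not identifiable")
    # Cramer's rule on [[s11, s12], [s12, s22]] @ [c1, c2] = [t1, t2]
    return (t1 * s22 - t2 * s12) / det, (s11 * t2 - s12 * t1) / det


def mean_abs_pct_error(predicted: list[float], measured: list[float]) -> float:
    """Mean absolute percentage error of predictions against measurements."""
    return 100.0 * sum(abs(p - m) / m for p, m in zip(predicted, measured)) / len(measured)
```

A validation figure such as the "less than 5%" above would correspond to `mean_abs_pct_error` evaluated on benchmarks held out from the fit, although the abstract does not state which error metric was used.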

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Keywords
IoT, embedded systems, edge computing, energy modelling, Arm Cortex-M0
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-191020 (URN); 10.1109/ICECS202256217.2022.9971086 (DOI); 000913346300202 (ISI); 9781665488235 (ISBN); 9781665488242 (ISBN)
Conference
2022 29th IEEE International Conference on Electronics, Circuits and Systems (ICECS), Glasgow, United Kingdom, 24-26 October 2022
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

Funding: European Union [779882]

Available from: 2023-01-15 Created: 2023-01-15 Last updated: 2023-03-02 Bibliographically approved
Olgu, K., Nikov, K. & Nunez-Yanez, J. L. (2022). Analysis of Graph Processing in Reconfigurable Devices for Edge Computing Applications. In: 2022 25th Euromicro Conference on Digital System Design (DSD). Paper presented at 25th Euromicro Conference on Digital System Design (DSD), Maspalomas, Spain, 31 August - 02 September 2022 (pp. 16-23). Institute of Electrical and Electronics Engineers (IEEE)
Analysis of Graph Processing in Reconfigurable Devices for Edge Computing Applications
2022 (English) In: 2022 25th Euromicro Conference on Digital System Design (DSD), Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 16-23, Conference paper, Published paper (Refereed)
Abstract [en]

Graph processing has received significant attention in recent years owing to the substantial growth of industries that rely on data analytics. Besides its vital role in finding relations in social networks, graph processing is widely used in transportation to find optimal routes and in biological networks to analyse sequences. The main bottleneck in graph processing is irregular memory access rather than computational intensity. Because computational intensity is not the driving factor, and because we believe current cloud computing solutions are still very costly and suffer from latency issues, we propose a method to perform graph processing more efficiently at the edge. The results demonstrate the benefits of a dedicated sparse graph processing algorithm over dense graph processing when analysing low-density data. As graph datasets grow exponentially, traversal algorithms such as breadth-first search (BFS), which are fundamental to many graph processing applications and metrics, become more costly to compute. Our work reviews other BFS implementations designed for low-power systems and proposes a solution whose enhancements achieve up to 9.2x higher performance in MTEPS (millions of traversed edges per second) than other state-of-the-art solutions, with a power usage of 2.32 W.
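A dedicated sparse BFS typically stores the graph in compressed sparse row (CSR) form and visits only the edges that exist, which is where its advantage over dense processing on low-density graphs comes from. The Python sketch below is a plain queue-based BFS over CSR arrays that also counts traversed edges so that MTEPS can be computed. It is not the FPGA design from this paper, the names `bfs_csr` and `mteps` are hypothetical, and benchmark suites differ in exactly which edges they count.

```python
from collections import deque


def bfs_csr(indptr: list[int], indices: list[int], source: int) -> tuple[list[int], int]:
    """Return the BFS level of every vertex (-1 if unreachable) and the number of edges examined."""
    levels = [-1] * (len(indptr) - 1)
    levels[source] = 0
    frontier = deque([source])
    edges_traversed = 0
    while frontier:
        u = frontier.popleft()
        for k in range(indptr[u], indptr[u + 1]):  # only the stored (non-zero) edges of u
            edges_traversed += 1
            v = indices[k]
            if levels[v] == -1:
                levels[v] = levels[u] + 1
                frontier.append(v)
    return levels, edges_traversed


def mteps(edges_traversed: int, seconds: float) -> float:
    """Millions of traversed edges per second."""
    return edges_traversed / seconds / 1e6
```

The work done is proportional to the number of stored edges rather than to the square of the vertex count, which is why a sparse representation pays off on low-density graphs; the irregular `indices[k]` lookups are also where the memory-access bottleneck described above appears.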

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Series
EUROMICRO Conference Proceedings, ISSN 1089-6503
Keywords
graph processing, graph, FPGA, low-cost, breadth-first search, BFS, ZedBoard, edge computing
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-190906 (URN); 10.1109/DSD57027.2022.00012 (DOI); 000946536500010 (ISI); 9781665474047 (ISBN); 9781665474054 (ISBN)
Conference
25th Euromicro Conference on Digital System Design (DSD), Maspalomas, Spain, 31 August - 02 September 2022
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2023-01-07 Created: 2023-01-07 Last updated: 2023-04-13 Bibliographically approved
Kong, M. & Nunez-Yanez, J. L. (2022). Entropy-Based Early-Exit in a FPGA-Based Low-Precision Neural Network. In: Gan, Lin; Wang, Yu; Xue, Wei; Chau, Thomas (Eds.), Applied Reconfigurable Computing. Architectures, Tools, and Applications. Paper presented at 18th International Symposium, ARC 2022, September 19–20, 2022 (pp. 72-86). Springer Nature, Vol. 13569
Entropy-Based Early-Exit in a FPGA-Based Low-Precision Neural Network
2022 (English) In: Applied Reconfigurable Computing. Architectures, Tools, and Applications / [ed] Gan, Lin; Wang, Yu; Xue, Wei; Chau, Thomas, Springer Nature, 2022, Vol. 13569, p. 72-86, Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we investigate the application of early-exit strategies to fully quantized neural networks mapped to low-complexity FPGA SoC devices. We resolve the accuracy drop caused by quantizing the first convolutional layer and the fully connected layers to low bitwidths. We apply an early-exit strategy to a network model, based on the ImageNet dataset, whose weights and activations combine extremely low bitwidths with binary arithmetic precision. Entropy calculations decide which branch of the early-exit network to take. The experiments show a 1.52× improvement in inference speed with the early-exit system compared with a single primary neural network, at the cost of a slight accuracy decrease of 1.64%.
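The entropy test that selects the exit branch can be sketched as follows: compute the softmax of the early branch's outputs and, if its entropy is below a threshold, accept that prediction; otherwise run the full network. This is a minimal Python illustration assuming natural-log entropy and a fixed threshold; `early_branch`, `main_network` and the threshold value are hypothetical stand-ins for the quantized networks in the paper.

```python
import math
from typing import Callable, Sequence

Logits = Sequence[float]


def softmax(logits: Logits) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats: 0 for a one-hot distribution, ln(n) for a uniform one."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def argmax(values: Sequence[float]) -> int:
    return max(range(len(values)), key=values.__getitem__)


def early_exit_predict(x, early_branch: Callable[[object], Logits],
                       main_network: Callable[[object], Logits],
                       threshold: float) -> tuple[int, str]:
    """Return (class index, exit used); the main network runs only when the early exit is uncertain."""
    probs = softmax(early_branch(x))
    if entropy(probs) < threshold:
        return argmax(probs), "early"
    return argmax(softmax(main_network(x))), "main"
```

Raising the threshold lets more inputs leave at the early branch, trading accuracy for speed; that is the kind of trade-off the abstract reports (1.52× faster, 1.64% less accurate).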

Place, publisher, year, edition, pages
Springer Nature, 2022
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 13569
Keywords
FPGA, neural network, energy efficient
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:liu:diva-189827 (URN); 10.1007/978-3-031-19983-7_6 (DOI); 001424388600006 (ISI); 2-s2.0-85142716279 (Scopus ID); 9783031199820 (ISBN); 9783031199837 (ISBN)
Conference
18th International Symposium, ARC 2022, September 19–20, 2022
Available from: 2022-11-07 Created: 2022-11-07 Last updated: 2025-10-10 Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-5153-5481