Fused Architecture for Dense and Sparse Matrix Processing in TensorFlow Lite
Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, Faculty of Science & Engineering. ORCID iD: 0000-0002-5153-5481
2022 (English). In: IEEE Micro, ISSN 0272-1732, E-ISSN 1937-4143, Vol. 42, no. 6, p. 55-66. Article in journal (Refereed). Published
Abstract [en]

In this paper, we present a hardware architecture optimized for sparse and dense matrix processing in TensorFlow Lite, compatible with embedded heterogeneous devices that integrate CPU and FPGA resources. The FADES (Fused Architecture for DEnse and Sparse matrices) design offers multiple configuration options that trade off parallelism and complexity, and uses a dataflow model to create four stages that read, compute, scale, and write results. All stages are designed to support TensorFlow Lite operations, including asymmetric quantized activations, column-major matrix write, per-filter/per-axis bias values, and the current scaling specifications. The configurable accelerator is integrated with the TensorFlow Lite inference engine running on the ARMv8 processor. We compare performance, power, and energy with the state-of-the-art RUY software multiplication library, showing up to 18x and 48x acceleration in dense and sparse modes, respectively. The sparse mode benefits from structural pruning to fully utilize the DSP blocks present in the FPGA device.
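The TensorFlow Lite operations the abstract mentions (asymmetric quantized activations, per-filter/per-axis scales and biases) follow the standard TFLite 8-bit quantization scheme. A minimal NumPy sketch of that scheme, for orientation only (the function name and values are illustrative, not taken from the paper or its hardware design):

```python
import numpy as np

def quantized_matmul(a_q, a_zp, a_scale, w_q, w_scales, bias, out_scale, out_zp):
    """TFLite-style int8 matmul: asymmetric activations (zero point a_zp),
    symmetric per-channel weights, int32 bias, requantized int8 output."""
    # Accumulate in int32 after subtracting the activation zero point.
    acc = (a_q.astype(np.int32) - a_zp) @ w_q.astype(np.int32)
    # Bias is pre-quantized at scale a_scale * w_scales[channel].
    acc += bias
    # Per-channel effective multiplier maps the accumulator to the output scale.
    mult = (a_scale * w_scales) / out_scale
    out = np.round(acc * mult) + out_zp
    return np.clip(out, -128, 127).astype(np.int8)

# Example: a_scale=0.5, a_zp=10 gives real activations [1.0, 2.0];
# per-channel weight scales [0.25, 0.5]; float reference result is [5.0, 7.0].
a_q = np.array([[12, 14]], dtype=np.int8)
w_q = np.array([[4, 2], [8, 6]], dtype=np.int8)
result = quantized_matmul(a_q, 10, 0.5, w_q,
                          np.array([0.25, 0.5]),
                          np.array([0, 0], dtype=np.int32),
                          1.0, 0)
```

The per-channel multiplier is what the abstract's "scale" dataflow stage would apply after the integer compute stage.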

Place, publisher, year, edition, pages
IEEE Computer Society, 2022. Vol. 42, no. 6, p. 55-66
Keywords [en]
Sparse matrices, Computer architecture, Hardware, Field programmable gate arrays, Parallel processing, Table lookup, Training
National Category
Embedded Systems; Computer Systems
Identifiers
URN: urn:nbn:se:liu:diva-188831
DOI: 10.1109/MM.2022.3196705
ISI: 000878171600016
OAI: oai:DiVA.org:liu-188831
DiVA, id: diva2:1699269
Note

Funding: Royal Society Industry Fellowship [INF\192044]; EPSRC HOPWARE [EP040863\1]; Leverhulme Trust International Fellowship "High-performance video analytics with parallel heterogeneous neural networks" [IF-2021-003]

Available from: 2022-09-27 Created: 2022-09-27 Last updated: 2022-11-23

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text

Authority records

Nunez-Yanez, Jose Luis
