VectorPU: A Generic and Efficient Data-container and Component Model for Transparent Data Transfer on GPU-based Heterogeneous Systems
Li, Lu. Linköping University, Department of Computer and Information Science, Software and Systems; Linköping University, Faculty of Science & Engineering (PELAB). ORCID iD: 0000-0001-8976-0484
Kessler, Christoph. Linköping University, Department of Computer and Information Science, Software and Systems; Linköping University, Faculty of Science & Engineering (PELAB). ORCID iD: 0000-0001-5241-0026
2017 (English). In: Proceedings of the 8th Workshop and 6th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM'17), Association for Computing Machinery (ACM), 2017, p. 7-12. Conference paper, Published paper (Refereed)
Abstract [en]

We present VectorPU, a C++-based programming framework providing high-level and efficient unified memory access on heterogeneous systems, in particular GPU-based systems. VectorPU consists of a lightweight runtime library providing a generic, "smart" data-container abstraction for transparent software caching of array operands with programmable memory coherence, and a lightweight component model realized by macro-based data access annotations. VectorPU thereby enables a flexible unified-memory view in which data transfer and device memory management are abstracted away from the programmer, while retaining the efficiency of expert-written code with manual data movement and memory management. We provide a prototype of VectorPU for (CUDA) GPU-based systems and show, in experiments on several machines with Kepler and Maxwell GPUs, ranging from laptops to supercomputer nodes, that it achieves 1.40x to 13.29x speedup over good-quality code using Nvidia's Unified Memory. We also show the expressiveness and wide applicability of VectorPU, and that it has low overhead and matches the efficiency of expert-written code.
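To illustrate the kind of mechanism the abstract describes, the following self-contained C++ sketch shows how macro-based access annotations on a "smart" data container can drive lazy, on-demand transfers. It is only an illustration of the general idea, not VectorPU's actual API: the class name, macro names, and the host-side stand-in for device memory are invented for the example.

    #include <cstdio>
    #include <cstddef>
    #include <vector>

    enum class Valid { Host, Device, Both };

    // A container that keeps a host and a device copy of its data and
    // tracks which of the two is currently valid.
    template <typename T>
    class SmartVector {
    public:
        explicit SmartVector(std::size_t n) : host_(n), device_(n), valid_(Valid::Host) {}

        // Host read: if only the device copy is valid, fetch it back first.
        const T* read_host() {
            if (valid_ == Valid::Device) {
                device_to_host();            // would be a device-to-host copy on a real GPU
                valid_ = Valid::Both;
            }
            return host_.data();
        }

        // Device write: make the device copy current, then invalidate the host copy.
        T* write_device() {
            if (valid_ == Valid::Host) {
                host_to_device();            // would be a host-to-device copy on a real GPU
            }
            valid_ = Valid::Device;
            return device_.data();
        }

    private:
        void host_to_device() { device_ = host_; std::puts("H->D transfer"); }
        void device_to_host() { host_ = device_; std::puts("D->H transfer"); }

        std::vector<T> host_;
        std::vector<T> device_;              // host-side stand-in for GPU memory
        Valid valid_;
    };

    // Macro-style access annotations in the spirit of a "flow signature":
    // the call site states the intended access, and the container decides
    // whether a transfer is needed.
    #define READ_HOST(v)    ((v).read_host())
    #define WRITE_DEVICE(v) ((v).write_device())

    int main() {
        SmartVector<float> x(1024);
        WRITE_DEVICE(x)[0] = 42.0f;             // first device use: one H->D copy
        std::printf("%f\n", READ_HOST(x)[0]);   // host copy is stale: one D->H copy
        std::printf("%f\n", READ_HOST(x)[0]);   // already valid on the host: no transfer
        return 0;
    }

In a real GPU setting the second buffer would be a device allocation and the copy helpers would wrap the platform's memory-copy calls; the point of the sketch is that the access annotation, not the programmer, decides when a transfer is performed.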

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2017. p. 7-12
Keywords [en]
heterogeneous computing, programming model, flow signature, programming framework, run-time system, memory coherence management, software caching, VectorPU, GPGPU, CUDA, unified memory
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:liu:diva-168605
DOI: 10.1145/3029580.3029582
ISBN: 9781450348775 (print)
OAI: oai:DiVA.org:liu-168605
DiVA, id: diva2:1461472
Conference
8th Workshop and 6th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM'17), Stockholm, Sweden, Jan. 2017
Funder
Swedish e-Science Research Center, PSDE
Available from: 2020-08-26 Created: 2020-08-26 Last updated: 2020-08-27

Open Access in DiVA

No full text in DiVA
