Cluster-SkePU: A Multi-Backend Skeleton Programming Library for GPU Clusters
Majeed, Mudassar
Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology. (PELAB)
Dastgeer, Usman
Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology. (PELAB)
Kessler, Christoph
Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology. (PELAB) ORCID iD: 0000-0001-5241-0026
2013 (English). In: Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA-2013), 2013. Conference paper, Published paper (Refereed)
Abstract [en]

SkePU is a C++ template library with a simple and unified interface for expressing data-parallel computations in terms of generic components, called skeletons, on multi-GPU systems using CUDA and OpenCL. The smart containers in SkePU, such as Matrix and Vector, perform data management with a lazy memory copying mechanism that reduces redundant data communication. SkePU provides programmability, portability and even performance portability, but until now applications written using SkePU could only run on a single multi-GPU node. We present an extension of SkePU for GPU clusters that does not require modifying the SkePU application source code. With our prototype implementation, we performed two experiments. The first experiment demonstrates scalability with regular algorithms for N-body simulation and electric field calculation over multiple GPU nodes. The results of the second experiment show the benefit of lazy memory copying in terms of the speedup gained for one level of Strassen’s algorithm and for a synthetic matrix sum application.

Place, publisher, year, edition, pages
2013.
Keyword [en]
structured parallel programming, skeleton programming, GPU cluster, SkePU, scalability, MPI, scientific applications
National Category
Computer Science
Identifiers
URN: urn:nbn:se:liu:diva-102578
OAI: oai:DiVA.org:liu-102578
DiVA: diva2:679355
Conference
International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA-2013), Las Vegas, USA, July 2013
Projects
SeRC - OpCoReS
EU FP7 PEPPHER
Funder
Swedish e‐Science Research Center, OpCoReS
EU, FP7, Seventh Framework Programme, 248481
Available from: 2013-12-15 Created: 2013-12-15 Last updated: 2014-10-08

Authority records

Majeed, Mudassar; Dastgeer, Usman; Kessler, Christoph
