A Skeleton Programming Library for Multicore CPU and Multi-GPU Systems
2010 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis (degree project)
Abstract [en]

This report presents SkePU, a C++ template library which provides a simple and unified interface for specifying data-parallel computations with the help of skeletons on GPUs using CUDA and OpenCL. The interface is also general enough to support other architectures, and SkePU implements both a sequential CPU and a parallel OpenMP back end. It also supports multi-GPU systems.
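
To illustrate the idea of one skeleton interface with several back ends, here is a minimal sketch of a Map skeleton. It is not SkePU's actual interface: `Backend`, `Map` and `make_map` are hypothetical names. The user supplies only the per-element function, and the skeleton chooses how to run the elementwise loop, sequentially or with an OpenMP `parallel for` (the pragma is simply ignored when the compiler is not run with OpenMP enabled).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical back-end selector; these names are illustrative, not SkePU's API.
enum class Backend { Sequential, OpenMP };

// Minimal Map skeleton: the user provides only the per-element function,
// and the skeleton decides how the elementwise loop is executed.
template <typename Func>
class Map {
public:
    explicit Map(Func f, Backend b = Backend::Sequential) : f_(f), backend_(b) {}

    // Applies f_ pairwise: out[i] = f_(a[i], b[i]).
    template <typename T>
    void operator()(const std::vector<T>& a, const std::vector<T>& b,
                    std::vector<T>& out) const {
        assert(a.size() == b.size());
        out.resize(a.size());
        const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(a.size());
        if (backend_ == Backend::OpenMP) {
            // Iterations are independent, so they can be split across threads.
            #pragma omp parallel for
            for (std::ptrdiff_t i = 0; i < n; ++i) out[i] = f_(a[i], b[i]);
        } else {
            for (std::ptrdiff_t i = 0; i < n; ++i) out[i] = f_(a[i], b[i]);
        }
    }

private:
    Func f_;
    Backend backend_;
};

// Helper so the function-object type (e.g. a lambda) is deduced.
template <typename Func>
Map<Func> make_map(Func f, Backend b = Backend::Sequential) {
    return Map<Func>(f, b);
}
```

For example, `make_map([](float x, float y) { return x + y; }, Backend::OpenMP)` yields a vector-addition skeleton; switching back ends changes only the constructor argument, not the user function. A GPU back end would add a CUDA or OpenCL branch behind the same call.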

Benchmarks show that copying data between the host and the GPU is often a bottleneck. Therefore, a container that uses lazy memory copying, transferring data between host and GPU only when it is actually needed, has been implemented to avoid unnecessary memory transfers.
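
A minimal sketch of how such a lazily synchronized container can work, assuming one host copy, one device copy and a validity flag for each. The "device" buffer is simulated with a second `std::vector` so the example runs anywhere; a real back end would allocate and copy with CUDA or OpenCL. `LazyVector` and `gpu_scale` are hypothetical names, not the container from the thesis.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Lazily synchronized container (sketch). Each copy has a validity flag;
// a transfer happens only when stale data is about to be used.
template <typename T>
class LazyVector {
public:
    explicit LazyVector(std::size_t n, T init = T())
        : host_(n, init), device_(n), host_valid_(true), device_valid_(false) {}

    // Host read: copy back from the device only if the host copy is stale.
    const T& read(std::size_t i) {
        sync_to_host();
        assert(i < host_.size());
        return host_[i];
    }

    // Host write: marks the device copy stale but does not re-upload yet.
    void write(std::size_t i, const T& v) {
        sync_to_host();
        assert(i < host_.size());
        host_[i] = v;
        device_valid_ = false;
    }

    // Called by a (simulated) GPU skeleton: upload only if the device copy is
    // stale; afterwards the host copy is stale because the kernel may write.
    std::vector<T>& device_data_for_write() {
        sync_to_device();
        host_valid_ = false;
        return device_;
    }

    std::size_t transfers() const { return transfers_; }

private:
    void sync_to_host() {
        if (!host_valid_) { host_ = device_; host_valid_ = true; ++transfers_; }
    }
    void sync_to_device() {
        if (!device_valid_) { device_ = host_; device_valid_ = true; ++transfers_; }
    }

    std::vector<T> host_, device_;
    bool host_valid_, device_valid_;
    std::size_t transfers_ = 0;  // number of simulated host<->device copies
};

// Simulated kernel: scales every element in "device memory".
template <typename T>
void gpu_scale(LazyVector<T>& v, T factor) {
    for (T& x : v.device_data_for_write()) x *= factor;
}
```

With this scheme, two consecutive kernel calls on the same container cost one upload in total, and the result is copied back only when the host first reads it. An eager container would instead copy in both directions around every call.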

SkePU was evaluated with small benchmarks and a larger application, a Runge-Kutta ODE solver. The results show that skeletal parallel programming is indeed a viable approach for GPU computing and that a generalized interface for multiple back ends is also reasonable. The best performance gains are achieved when the computational load is large compared to memory I/O (lazy memory copying can help to achieve this). SkePU also performs well on a more complex and realistic task such as ODE solving, with run times up to ten times shorter when using SkePU with a GPU back end than with a sequential solver running on a fast CPU.
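
The abstract does not say which Runge-Kutta method the solver uses. The sketch below assumes the classic fourth-order method (RK4), purely to show why an ODE solver fits a skeleton library: every stage combines whole state vectors elementwise, which is exactly what a Map skeleton runs on a GPU. `State`, `Rhs`, `axpy` and `rk4_step` are hypothetical names, and the plain loops stand in for skeleton calls.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using State = std::vector<double>;
// Right-hand side of y' = f(t, y) for a system of ODEs.
using Rhs = std::function<State(double, const State&)>;

// Elementwise y + a * k: the kind of operation a Map skeleton would execute.
State axpy(const State& y, double a, const State& k) {
    assert(y.size() == k.size());
    State out(y.size());
    for (std::size_t i = 0; i < y.size(); ++i) out[i] = y[i] + a * k[i];
    return out;
}

// One classic RK4 step of size h, built from elementwise operations.
State rk4_step(const Rhs& f, double t, const State& y, double h) {
    const State k1 = f(t, y);
    const State k2 = f(t + h / 2, axpy(y, h / 2, k1));
    const State k3 = f(t + h / 2, axpy(y, h / 2, k2));
    const State k4 = f(t + h, axpy(y, h, k3));
    State next(y.size());
    for (std::size_t i = 0; i < y.size(); ++i)  // final combination, also a map
        next[i] = y[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]);
    return next;
}
```

Each step performs four right-hand-side evaluations plus several elementwise passes over the state. When the state stays resident on the GPU between steps (for example through lazy copying), these passes avoid host-device transfers, which matches the abstract's observation that gains are largest when computation dominates memory I/O.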

SkePU does, however, have some disadvantages. Using the library incurs some overhead, as the dot product and LibSolve benchmarks show. Although small, it is present, so if performance is of utmost importance, a hand-coded solution would be best. Moreover, not all calculations can be expressed in terms of skeletons; for such problems, specialized routines must still be written.
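
To make the dot product comparison concrete, this sketch writes the same computation twice: once through a generic MapReduce skeleton (map = multiply, reduce = add) and once as a hand-coded loop. `map_reduce`, `dot_skeleton` and `dot_hand` are hypothetical names, not SkePU's interface. The sketch shows only the extra abstraction layer; it does not reproduce or explain the overhead measured in the thesis, which depends on the real library and back end.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Generic MapReduce skeleton (hypothetical): combine elements pairwise with
// map_f, then fold the partial results with reduce_f, starting from init.
template <typename T, typename MapF, typename ReduceF>
T map_reduce(const std::vector<T>& a, const std::vector<T>& b,
             MapF map_f, ReduceF reduce_f, T init) {
    assert(a.size() == b.size());
    T acc = init;
    for (std::size_t i = 0; i < a.size(); ++i)
        acc = reduce_f(acc, map_f(a[i], b[i]));
    return acc;
}

// Dot product expressed through the skeleton.
double dot_skeleton(const std::vector<double>& a, const std::vector<double>& b) {
    return map_reduce(a, b,
                      [](double x, double y) { return x * y; },
                      [](double s, double p) { return s + p; },
                      0.0);
}

// Hand-coded equivalent with no abstraction layer.
double dot_hand(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}
```

Both functions return the same result. On a CPU an optimizing compiler can often inline the lambdas, so the remaining cost of a real skeleton library tends to come from its generality (back-end dispatch, memory management) rather than from the function-object calls themselves.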

Place, publisher, year, pages
2010. 103 p.
Keyword [en]
CUDA, OpenCL, Skeleton Programming, Parallel Computing, Data Parallelism
National Category
Computer Science
Identifiers
urn:nbn:se:liu:diva-60319 (URN)
LIU-IDA/LITH-EX-A--10/037--SE (ISRN)
oai:DiVA.org:liu-60319 (OAI)
Presentation
2010-09-20, Donald Knuth, Linköpings universitet 581 83, Linköping, 15:00 (English)
Uppsok
Technology
Available from: 2010-10-12
Created: 2010-10-11
Last updated: 2010-10-12
Bibliographically approved

Open Access in DiVA

fulltext (942 kB), 234 downloads
File information
File name FULLTEXT01.pdf
File size 942 kB
Checksum SHA-512
b1bcc468619cacdb232a355948751e8a0faceae449a68b58cb3517f836cda98903f76386335554dce5229253a1ae1bea5c0b0438091016c0ee53c17180f16cfb
Type fulltext
Mimetype application/pdf

Author
Enmyren, Johan
Organisation
Department of Computer and Information Science
Computer Science

Total: 234 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.
Total: 203 hits