A Skeleton Programming Library for Multicore CPU and Multi-GPU Systems
Linköping University, Department of Computer and Information Science.
2010 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

This report presents SkePU, a C++ template library which provides a simple and unified interface for specifying data-parallel computations with the help of skeletons on GPUs using CUDA and OpenCL. The interface is also general enough to support other architectures, and SkePU implements both a sequential CPU and a parallel OpenMP back end. It also supports multi-GPU systems.
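To illustrate the idea of one skeleton interface with several back ends, here is a minimal sketch of an element-wise "map" skeleton. It is not SkePU's actual API: the names `Backend`, `MapSkeleton` and `make_map` are hypothetical, and only a sequential and an OpenMP back end are shown (a GPU back end would need CUDA or OpenCL code that is omitted here).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical back-end selector; an assumption for illustration only.
enum class Backend { Sequential, OpenMP };

// A "map" skeleton: combines two vectors element by element with a
// user-supplied function. The user writes only the function; the
// skeleton decides how the loop is executed.
template <typename Func>
class MapSkeleton {
public:
    explicit MapSkeleton(Func f) : func_(f) {}

    template <typename T>
    std::vector<T> operator()(const std::vector<T>& a,
                              const std::vector<T>& b,
                              Backend backend) const {
        assert(a.size() == b.size());  // element-wise map needs equal lengths
        std::vector<T> out(a.size());
        const long n = static_cast<long>(a.size());
        if (backend == Backend::OpenMP) {
            // Runs in parallel only when compiled with OpenMP enabled;
            // otherwise the pragma is ignored and the loop runs serially.
            #pragma omp parallel for
            for (long i = 0; i < n; ++i) out[i] = func_(a[i], b[i]);
        } else {
            for (long i = 0; i < n; ++i) out[i] = func_(a[i], b[i]);
        }
        return out;
    }

private:
    Func func_;
};

// Helper that deduces the function type (hypothetical name).
template <typename Func>
MapSkeleton<Func> make_map(Func f) { return MapSkeleton<Func>(f); }
```

With this sketch, `make_map([](int x, int y) { return x + y; })` yields an object that adds two vectors, and the same call site can switch back ends by passing `Backend::Sequential` or `Backend::OpenMP`.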

Benchmarks show that copying data between the host and the GPU is often a bottleneck. Therefore a container which uses lazy memory copying has been implemented to avoid unnecessary memory transfers.
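The lazy-copying idea can be sketched as a container that tracks which copy of the data is current and transfers only when the other side actually needs it. This is an illustrative assumption about how such a container might work, not the thesis's implementation: `LazyVector` and its members are hypothetical names, and the "device" buffer is an ordinary host vector standing in for GPU memory so that the example runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container with lazy host/device synchronisation.
class LazyVector {
public:
    explicit LazyVector(std::size_t n) : host_(n, 0.0f), device_(n, 0.0f) {}

    // Host-side write: marks the device copy stale, copies nothing to it yet.
    void set(std::size_t i, float v) {
        assert(i < host_.size());
        sync_to_host();
        host_[i] = v;
        device_valid_ = false;
    }

    // Host-side read: pulls data back only if the device changed it.
    float get(std::size_t i) {
        assert(i < host_.size());
        sync_to_host();
        return host_[i];
    }

    // Stand-in for a GPU kernel: scales every element on the "device".
    void scale_on_device(float factor) {
        sync_to_device();
        for (float& x : device_) x *= factor;
        host_valid_ = false;  // host copy is now out of date
    }

    // Number of simulated host<->device transfers performed so far.
    std::size_t transfers() const { return transfers_; }

private:
    void sync_to_device() {
        if (!device_valid_) { device_ = host_; device_valid_ = true; ++transfers_; }
    }
    void sync_to_host() {
        if (!host_valid_) { host_ = device_; host_valid_ = true; ++transfers_; }
    }

    std::vector<float> host_;
    std::vector<float> device_;
    bool host_valid_ = true;     // host copy starts out current
    bool device_valid_ = false;  // device copy must be uploaded before use
    std::size_t transfers_ = 0;
};
```

In this sketch, two consecutive calls to `scale_on_device` trigger a single upload, and the result is downloaded only when `get` is first called afterwards, so chained device operations do not pay for a round trip each time.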

SkePU was evaluated with small benchmarks and a larger application, a Runge-Kutta ODE solver. The results show that skeletal parallel programming is indeed a viable approach for GPU computing and that a generalized interface for multiple back ends is also reasonable. The best performance gains are obtained when the computational load is large compared to memory I/O (the lazy memory copying can help to achieve this). SkePU also performs well on a more complex and realistic task such as ODE solving, with run times up to ten times faster when using SkePU with a GPU back end than with a sequential solver running on a fast CPU.

SkePU does, however, have some disadvantages. Using the library incurs some overhead, as the dot product and LibSolve benchmarks show. The overhead is not large, but if performance is of the utmost importance, a hand-coded solution would be best. Nor can every calculation be expressed in terms of skeletons; for problems that cannot, specialized routines must still be written.

Place, publisher, year, edition, pages
2010. 103 p.
Keyword [en]
CUDA, OpenCL, Skeleton Programming, Parallel Computing, Data Parallelism
National Category
Computer Science
URN: urn:nbn:se:liu:diva-60319
ISRN: LIU-IDA/LITH-EX-A--10/037--SE
OAI: diva2:356176
2010-09-20, Donald Knuth, Linköpings universitet 581 83, Linköping, 15:00 (English)
Available from: 2010-10-12. Created: 2010-10-11. Last updated: 2010-10-12. Bibliographically approved.

Author: Enmyren, Johan