LiU Electronic Press
File size:
942 kb
Format:
application/pdf
Author:
Enmyren, Johan (Linköping University, Department of Computer and Information Science)
Title:
A Skeleton Programming Library for Multicore CPU and Multi-GPU Systems
Department:
Linköping University, Department of Computer and Information Science
Publication type:
Student thesis
Language:
English
Level:
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Undergraduate subject:
Computer and information science at the Institute of Technology
Uppsok:
Technology
Pages:
103
Year of publ.:
2010
URI:
urn:nbn:se:liu:diva-60319
Permanent link:
http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-60319
ISRN:
LIU-IDA/LITH-EX-A--10/037--SE
Subject category:
Computer Science
SVEP category:
Computer science
Keywords (en):
CUDA, OpenCL, Skeleton Programming, Parallel Computing, Data Parallelism
Abstract (en):

This report presents SkePU, a C++ template library that provides a simple, unified interface for specifying data-parallel computations as skeletons and running them on GPUs using CUDA and OpenCL. The interface is also general enough to support other architectures, and SkePU implements both a sequential CPU back end and a parallel OpenMP back end. It also supports multi-GPU systems.
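
As a rough illustration of the skeleton idea described above, the following C++ sketch shows a map skeleton whose caller picks a back end while the computation itself is written once. It is not SkePU's actual interface: the names `Backend` and `map_skeleton` are hypothetical, and only a sequential and an OpenMP path are shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical back-end selector for this sketch; not part of SkePU.
enum class Backend { Sequential, OpenMP };

// A minimal "map" skeleton: applies f to every element of the input.
// The caller supplies only the element-wise function; the skeleton
// decides how the loop is executed.
template <typename T, typename F>
std::vector<T> map_skeleton(const std::vector<T>& in, F f, Backend be) {
    std::vector<T> out(in.size());
    const long n = static_cast<long>(in.size());
    if (be == Backend::OpenMP) {
        // Runs in parallel only when compiled with OpenMP enabled
        // (for example -fopenmp); otherwise the pragma is ignored.
        #pragma omp parallel for
        for (long i = 0; i < n; ++i) {
            out[i] = f(in[i]);
        }
    } else {
        for (long i = 0; i < n; ++i) {
            out[i] = f(in[i]);
        }
    }
    return out;
}
```

Calling `map_skeleton(v, [](int x) { return x * x; }, Backend::OpenMP)` and the same call with `Backend::Sequential` should give identical results; only the execution strategy differs.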

Benchmarks show that copying data between the host and the GPU is often a bottleneck. Therefore a container which uses lazy memory copying has been implemented to avoid unnecessary memory transfers.
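
One way to picture lazy memory copying is a container that tracks which side holds a valid copy and transfers data only when the other side actually asks for it. The C++ sketch below assumes this design; it is not SkePU's container. The class `LazyVector` and its methods are hypothetical, and the "device" is simulated by a second host buffer so that the example runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical lazy-copy container. The device buffer is simulated
// with an ordinary host vector; a real version would issue GPU copies.
class LazyVector {
public:
    explicit LazyVector(std::size_t n) : host_(n, 0.0f), device_(n, 0.0f) {}

    // Host write access: fetch the latest data if needed, then mark
    // the device copy as stale.
    float* host_write() {
        sync_to_host();
        device_valid_ = false;
        return host_.data();
    }

    // Host read access: fetch the latest data if needed; the device
    // copy stays valid.
    const float* host_read() {
        sync_to_host();
        return host_.data();
    }

    // Device write access: upload only if the device copy is stale,
    // then mark the host copy as stale.
    float* device_write() {
        sync_to_device();
        host_valid_ = false;
        return device_.data();
    }

    int uploads() const { return uploads_; }
    int downloads() const { return downloads_; }

private:
    void sync_to_host() {
        if (!host_valid_) {
            host_ = device_;  // simulated device-to-host transfer
            host_valid_ = true;
            ++downloads_;
        }
    }
    void sync_to_device() {
        if (!device_valid_) {
            device_ = host_;  // simulated host-to-device transfer
            device_valid_ = true;
            ++uploads_;
        }
    }

    std::vector<float> host_;
    std::vector<float> device_;
    bool host_valid_ = true;
    bool device_valid_ = false;
    int uploads_ = 0;
    int downloads_ = 0;
};
```

With this scheme, several consecutive device-side operations on the same container trigger one upload at most, and the result is copied back only when the host first reads it.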

SkePU was evaluated with small benchmarks and a larger application, a Runge-Kutta ODE solver. The results show that skeletal parallel programming is indeed a viable approach for GPU computing and that a generalized interface for multiple back ends is also reasonable. The best performance gains are obtained when the computational load is large compared to memory I/O (lazy memory copying can help to achieve this). SkePU also performs well on a more complex and realistic task such as ODE solving, where the GPU back end runs up to ten times faster than a sequential solver on a fast CPU.

SkePU does, however, have some disadvantages. Using the library incurs some overhead, as the dot product and LibSolve benchmarks show. The overhead is not large, but if performance is of the utmost importance, a hand-coded solution would be best. Nor can all calculations be expressed in terms of skeletons; for such problems, specialized routines must still be written.

Presentation:
2010-09-20, Donald Knuth, Linköpings universitet 581 83, Linköping, 15:00 (English)
Supervisor:
Kessler, Christoph, Professor (Linköping University, Department of Computer and Information Science, PELAB - Programming Environment Laboratory)
Examiner:
Kessler, Christoph (Linköping University, Department of Computer and Information Science, PELAB - Programming Environment Laboratory)
Available from:
2010-10-12
Created:
2010-10-11
Last updated:
2010-10-12