A performance-portable generic component for 2D convolution computations on GPU-based systems
Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, Faculty of Science & Engineering. (PELAB)
Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, Faculty of Science & Engineering. (PELAB) ORCID iD: 0000-0001-5241-0026
2012 (English)
In: Proceedings of the Fifth International Workshop on Programmability Issues for Heterogeneous Multicores (MULTIPROG-2012) at the HiPEAC-2012 conference, Paris, Jan. 2012 / [ed] E. Ayguade, B. Gaster, L. Howes, P. Stenström, O. Unsal, 2012
Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we describe our work on providing a generic yet optimized GPU (CUDA/OpenCL) implementation of the 2D MapOverlap skeleton. We explain our implementation with the help of a 2D convolution application implemented using the newly developed skeleton. Memory optimizations (constant and shared memory) and adaptive tiling are applied, and their performance implications are evaluated on different classes of GPUs. We present two different metrics for calculating the optimal tiling factor dynamically and automatically, which helps retain the best performance without manual tuning when moving to new GPU architectures. With our approach, we achieve average speedups by a factor of 3.6, 2.3, and 2.4 over an otherwise optimized implementation (without tiling) on NVIDIA C2050, GTX280, and 8800 GT GPUs, respectively. Above all, performance portability is achieved without requiring any manual changes to the skeleton program or the skeleton implementation.
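
To make the tiling idea concrete, the following is a minimal CPU-side sketch in C++, not the authors' skeleton implementation and not GPU code. It assumes a row-major image, a square filter of side 2*radius+1, zero-valued reads outside the image, and a hypothetical function name `convolve2d_tiled`. The `tile` parameter stands in for the tiling factor: each iteration of the `c0` loop plays the role of one GPU thread that computes `tile` neighbouring output elements instead of one.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (illustration only): 2D filtering of a row-major image.
// The filter is applied without flipping, as is common in image processing.
// Reads outside the image contribute zero (an assumed border policy).
std::vector<float> convolve2d_tiled(const std::vector<float>& in,
                                    std::size_t rows, std::size_t cols,
                                    const std::vector<float>& filter,
                                    int radius, std::size_t tile) {
    std::vector<float> out(rows * cols, 0.0f);
    if (tile == 0) tile = 1;  // guard against a zero step
    const std::ptrdiff_t width = 2 * radius + 1;
    const std::ptrdiff_t nrows = static_cast<std::ptrdiff_t>(rows);
    const std::ptrdiff_t ncols = static_cast<std::ptrdiff_t>(cols);

    for (std::ptrdiff_t r = 0; r < nrows; ++r) {
        // Each c0 iteration models one work item producing `tile` outputs.
        for (std::size_t c0 = 0; c0 < cols; c0 += tile) {
            const std::size_t c_end = std::min(c0 + tile, cols);
            for (std::size_t cu = c0; cu < c_end; ++cu) {
                const std::ptrdiff_t c = static_cast<std::ptrdiff_t>(cu);
                float sum = 0.0f;
                for (std::ptrdiff_t dy = -radius; dy <= radius; ++dy) {
                    for (std::ptrdiff_t dx = -radius; dx <= radius; ++dx) {
                        const std::ptrdiff_t y = r + dy;
                        const std::ptrdiff_t x = c + dx;
                        if (y < 0 || x < 0 || y >= nrows || x >= ncols) continue;
                        sum += in[static_cast<std::size_t>(y * ncols + x)] *
                               filter[static_cast<std::size_t>(
                                   (dy + radius) * width + (dx + radius))];
                    }
                }
                out[static_cast<std::size_t>(r * ncols + c)] = sum;
            }
        }
    }
    return out;
}
```

In this sketch the tiling factor changes only how the work is grouped, never the result, which is why it can in principle be tuned per architecture without touching the calling program. On a GPU, neighbouring outputs share most of their input neighbourhood, so a larger tile lets one thread reuse loaded values; the sketch does not model that memory behaviour.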

Place, publisher, year, edition, pages
2012.
Keywords [en]
convolution, stencil computation, autotuning, GPU, CUDA, parallel computing, performance optimization, adaptive tiling
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:liu:diva-136906
OAI: oai:DiVA.org:liu-136906
DiVA, id: diva2:1092016
Conference
Fifth International Workshop on Programmability Issues for Heterogeneous Multicores (MULTIPROG-2012) at the HiPEAC-2012 conference, Paris, Jan. 2012
Projects
EU FP7 PEPPHER
SeRC-OpCoReS
Funder
EU, FP7, Seventh Framework Programme, 248481
Swedish e‐Science Research Center, OpCoReS
Note

The proceedings were published on USB and have no ISBN.

Available from: 2017-04-28 Created: 2017-04-28 Last updated: 2019-05-09 Bibliographically approved

Open Access in DiVA

fulltext (270 kB), 3 downloads
File information
File name: FULLTEXT01.pdf
File size: 270 kB
Checksum (SHA-512): a3c92f58149d8637e02e96e90736172b22bd814440fe22aa6434359e6cf3d207d3ce413dd218298865452ec817a5b7368c793ab3d76da7affc98d617a9a46169
Type: fulltext
Mimetype: application/pdf

Authority records BETA

Dastgeer, Usman; Kessler, Christoph
