A performance-portable generic component for 2D convolution computations on GPU-based systems
Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, Faculty of Science & Engineering. (PELAB)
Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, Faculty of Science & Engineering. (PELAB) ORCID iD: 0000-0001-5241-0026
2012 (English)
In: Proceedings of the Fifth International Workshop on Programmability Issues for Heterogeneous Multicores (MULTIPROG-2012) at the HiPEAC-2012 conference, Paris, Jan. 2012 / [ed] E. Ayguade, B. Gaster, L. Howes, P. Stenström, O. Unsal, 2012
Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we describe our work on providing a generic yet optimized GPU (CUDA/OpenCL) implementation of the 2D MapOverlap skeleton. We explain our implementation with the help of a 2D convolution application implemented using the newly developed skeleton. Memory optimizations (constant and shared memory) and adaptive tiling are applied, and their performance implications are evaluated on different classes of GPUs. We present two metrics for calculating the optimal tiling factor dynamically and automatically, which helps retain the best performance without manual tuning when moving to new GPU architectures. With our approach, we achieve average speedups of 3.6, 2.3, and 2.4 over an otherwise optimized implementation without tiling on NVIDIA C2050, GTX280, and 8800 GT GPUs, respectively. Above all, performance portability is achieved without requiring any manual changes to the skeleton program or the skeleton implementation.
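
As a rough illustration of the idea behind tiling a 2D convolution, the following sketch compares a direct version (one output element per simulated thread) with a tiled version in which each simulated thread computes `tile` horizontally adjacent outputs and reuses the overlapping input elements. This is a plain CPU sketch in Python, not the paper's CUDA/OpenCL skeleton implementation; the function names and the row-wise tiling scheme are assumptions made for illustration, and the paper's two metrics for choosing the tiling factor are not reproduced here.

```python
def convolve2d_reference(image, kernel):
    """Direct 2D filter over the valid region: one output per simulated thread.

    Note: the kernel is applied without flipping, as is common in image filtering.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            out[r][c] = acc
    return out


def convolve2d_tiled(image, kernel, tile):
    """Tiled variant (hypothetical scheme): each simulated thread computes
    up to `tile` adjacent outputs in a row and shares the loaded input segment."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c0 in range(0, out_w, tile):
            width = min(tile, out_w - c0)  # last tile in a row may be narrower
            acc = [0.0] * width
            for i in range(kh):
                # One load of (width + kw - 1) elements serves all outputs of the tile.
                segment = image[r + i][c0:c0 + width + kw - 1]
                for j in range(kw):
                    weight = kernel[i][j]
                    for t in range(width):
                        acc[t] += segment[t + j] * weight
            out[r][c0:c0 + width] = acc
    return out


if __name__ == "__main__":
    image = [[float(5 * r + c + 1) for c in range(5)] for r in range(4)]
    kernel = [[1.0] * 3 for _ in range(3)]
    reference = convolve2d_reference(image, kernel)
    for tile in (1, 2, 3):
        assert convolve2d_tiled(image, kernel, tile) == reference
```

Under this scheme, a 3x3 filter needs 9 input loads per output without tiling (36 for four outputs), whereas a tile of four outputs needs 3 × (4 + 2) = 18 loads. Larger tiles increase reuse but also per-thread resource usage, which is one reason a tiling factor might need to be chosen per architecture.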

Place, publisher, year, edition, pages
2012.
Keyword [en]
convolution, stencil computation, autotuning, GPU, CUDA, parallel computing, performance optimization, adaptive tiling
National Category
Computer Science
Identifiers
URN: urn:nbn:se:liu:diva-136906
OAI: oai:DiVA.org:liu-136906
DiVA: diva2:1092016
Conference
Fifth International Workshop on Programmability Issues for Heterogeneous Multicores (MULTIPROG-2012) at the HiPEAC-2012 conference, Paris, Jan. 2012
Projects
EU FP7 PEPPHER; SeRC-OpCoReS
Funder
EU, FP7, Seventh Framework Programme, 248481; Swedish e‐Science Research Center, OpCoReS
Available from: 2017-04-28 Created: 2017-04-28 Last updated: 2017-05-04 Bibliographically approved

Authors
Dastgeer, Usman; Kessler, Christoph