Hybrid CPU-GPU Parallel Simulations of 3D Front Propagation
Linköping University, Department of Mechanical Engineering, Solid Mechanics.
2014 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

This master's thesis studies GPU-enabled parallel implementations of the 3D Parallel Marching Method (PMM). 3D PMM is aimed at solving non-linear static Hamilton-Jacobi equations, which have real-world applications such as the study of geological folding, where each layer of the Earth's crust is treated as a front propagating over time. Using parallel computer architectures, fast simulations can be achieved, reducing time consumption, speeding up our understanding of the inner Earth and enabling earlier exploration of oil and gas reserves. Currently, 3D PMM is implemented on a shared memory architecture using the OpenMP Application Programming Interface (API) and the Mint programming model, which translates C code into Compute Unified Device Architecture (CUDA) code for a single Graphics Processing Unit (GPU).

Parallel architectures, especially GPUs, have seen rapid growth in recent years, allowing faster simulations. In this thesis work, a new parallel implementation of 3D PMM has been developed to exploit multicore CPU architectures as well as single and multiple GPUs. In the multiple GPU implementation, the 3D data is decomposed into 1D slabs, one per GPU. CUDA streams are used to overlap computation and communication within a single GPU. Part of the decomposed 3D volume data is kept on the respective GPU to avoid complete data transfer between the GPUs over a number of iterations. In total, two kinds of data transfer are involved in the multi-GPU computation: boundary value data transfer and decomposed 3D volume data transfer. The decomposed 3D volume data transfer between the GPUs is optimized using peer-to-peer memory transfer in CUDA.

The speedup is reported and compared between shared memory CPUs (E5-2660, 16 cores), single GPUs (GTX 590, C2050 and K20m) and multiple GPUs. Hand-coded CUDA showed slightly better performance than the Mint-translated CUDA, and the multiple GPU implementation showed promising speedup compared to the shared memory multicore CPU and single GPU implementations.
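The two CUDA techniques the abstract names can be illustrated with a minimal sketch. This is not the thesis code: the kernel body, slab sizes and device handling below are illustrative assumptions. It shows (a) CUDA streams pipelining host-device transfers against kernel execution for a 1D-decomposed volume, and (b) a peer-to-peer copy of a slab between two GPUs via cudaMemcpyPeerAsync, the mechanism the abstract credits for optimizing the decomposed-volume transfer.

```c
/* Minimal CUDA sketch (illustrative, not the thesis code) of:
 * 1) stream-based overlap of computation and communication on one GPU,
 * 2) peer-to-peer (P2P) transfer of a slab between two GPUs.
 * Kernel, grid size and slab count are assumptions for the example. */
#include <cuda_runtime.h>
#include <stdio.h>

#define NX     256   /* grid points per dimension (assumed) */
#define NSLABS 4     /* 1D decomposition of the 3D volume (assumed) */

/* Stand-in for one PMM sweep over a slab of the 3D volume. */
__global__ void sweepKernel(float *slab, int nPoints)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nPoints)
        slab[i] = fminf(slab[i], slab[i] + 1.0f);  /* placeholder update */
}

int main(void)
{
    const size_t slabPts  = (size_t)NX * NX * (NX / NSLABS);
    const size_t slabSize = slabPts * sizeof(float);

    float *h_vol, *d_slab[NSLABS];
    cudaStream_t stream[NSLABS];

    cudaMallocHost(&h_vol, NSLABS * slabSize);  /* pinned host memory,
                                                   required for async copies */
    for (int s = 0; s < NSLABS; ++s) {
        cudaMalloc(&d_slab[s], slabSize);
        cudaStreamCreate(&stream[s]);
    }

    /* Overlap: each slab gets its own stream, so the upload of slab s+1
       proceeds while the kernel for slab s is still running. */
    for (int s = 0; s < NSLABS; ++s) {
        cudaMemcpyAsync(d_slab[s], h_vol + s * slabPts, slabSize,
                        cudaMemcpyHostToDevice, stream[s]);
        sweepKernel<<<(unsigned)((slabPts + 255) / 256), 256, 0, stream[s]>>>(
            d_slab[s], (int)slabPts);
        cudaMemcpyAsync(h_vol + s * slabPts, d_slab[s], slabSize,
                        cudaMemcpyDeviceToHost, stream[s]);
    }
    cudaDeviceSynchronize();

    /* P2P: copy one slab from GPU 0 to GPU 1 directly over the bus,
       without staging through host memory, if the hardware allows it. */
    int nDev = 0, canP2P = 0;
    cudaGetDeviceCount(&nDev);
    if (nDev > 1) {
        cudaDeviceCanAccessPeer(&canP2P, 0, 1);
        if (canP2P) {
            float *d_peer;
            cudaSetDevice(1);
            cudaMalloc(&d_peer, slabSize);
            cudaSetDevice(0);
            cudaDeviceEnablePeerAccess(1, 0);
            cudaMemcpyPeerAsync(d_peer, 1, d_slab[0], 0, slabSize, stream[0]);
            cudaStreamSynchronize(stream[0]);
            cudaSetDevice(1);
            cudaFree(d_peer);
            cudaSetDevice(0);
        }
    }

    for (int s = 0; s < NSLABS; ++s) {
        cudaStreamDestroy(stream[s]);
        cudaFree(d_slab[s]);
    }
    cudaFreeHost(h_vol);
    printf("done (P2P %s)\n", canP2P ? "used" : "unavailable");
    return 0;
}
```

As in the thesis's multi-GPU scheme, only the boundary values would normally travel every iteration; the full slab copy shown here corresponds to the less frequent decomposed-volume transfer.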

Place, publisher, year, edition, pages
2014. 58 p.
National Category
Computational Mathematics; Applied Mechanics; Computer and Information Science
URN: urn:nbn:se:liu:diva-114935
ISRN: LIU-IEI-TEK-A--14/02114--SE
OAI: diva2:919913
External cooperation
Simula Research Laboratory
Subject / course
Solid Mechanics
Presentation: 2014-12-12, Linköping University, 17:55 (English)
Available from: 2016-05-10. Created: 2015-03-05. Last updated: 2016-05-10. Bibliographically approved.

Open Access in DiVA

fulltext (4089 kB)
File information
File name: FULLTEXT01.pdf
File size: 4089 kB
Checksum: SHA-512
Type: fulltext
Mimetype: application/pdf
