Hybrid CPU-GPU Parallel Simulations of 3D Front Propagation
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
This master's thesis studies GPU-enabled parallel implementations of the 3D Parallel Marching Method (PMM). 3D PMM solves the non-linear static Hamilton-Jacobi equations, which have real-world applications such as the study of geological folding, where each layer of the Earth’s crust is treated as a front propagating over time. Parallel computer architectures make fast simulations possible, which reduces time consumption, speeds up understanding of the Earth's interior, and enables earlier exploration of oil and gas reserves. Currently, 3D PMM is implemented for shared-memory architectures using the OpenMP Application Programming Interface (API), and with the Mint programming model, which translates C code into Compute Unified Device Architecture (CUDA) code for a single Graphics Processing Unit (GPU). Parallel architectures, especially GPUs, have grown rapidly in recent years, allowing faster simulations. In this thesis, a new parallel implementation of 3D PMM is developed to exploit multicore CPU architectures as well as single and multiple GPUs. In the multiple-GPU implementation, the 3D data is decomposed into 1D data for each GPU. CUDA streams are used to overlap computation and communication within a single GPU. Part of the decomposed 3D volume data is kept on its respective GPU to avoid complete data transfers between the GPUs over a number of iterations. In total, two kinds of data transfer are involved in the multiple-GPU computation: boundary value data transfer and decomposed 3D volume data transfer. The decomposed 3D volume data transfer between the GPUs is optimized using peer-to-peer memory transfer in CUDA. The speedup is reported and compared between shared-memory CPUs (E5-2660, 16 cores), single GPUs (GTX-590, C2050 and K20m) and multiple GPUs.
Hand-coded CUDA showed slightly better performance than the Mint-translated CUDA, and the multiple-GPU implementation showed promising speedup compared to the shared-memory multicore CPU and single-GPU implementations.
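The abstract does not spell out how the volume is split or which data crosses GPU boundaries. As a rough illustration only, the sketch below shows one possible reading: the 3D volume is cut into contiguous slabs along a single axis, and each slab needs the adjacent boundary planes of its neighbours before an iteration. The function names `slab_ranges` and `halo_planes`, the choice of the z axis, and the halo width of one plane are assumptions made for this example, not details taken from the thesis. The thesis itself uses CUDA; Python is used here only to show the index bookkeeping.

```python
def slab_ranges(nz, num_devices):
    """Split nz planes into contiguous slabs, one per device.

    Returns a list of half-open (start, stop) plane ranges. Any remainder
    planes are given to the first devices so slab sizes differ by at most 1.
    """
    base, extra = divmod(nz, num_devices)
    ranges = []
    start = 0
    for device in range(num_devices):
        size = base + (1 if device < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def halo_planes(ranges, halo=1):
    """List, per slab, the neighbour-owned planes it must receive.

    Assumes a halo of `halo` planes on each interior side (hypothetical
    width; the real stencil of the method may need more or fewer).
    """
    nz = ranges[-1][1]
    needed = []
    for lo, hi in ranges:
        below = list(range(max(lo - halo, 0), lo))
        above = list(range(hi, min(hi + halo, nz)))
        needed.append(below + above)
    return needed


if __name__ == "__main__":
    ranges = slab_ranges(10, 3)
    print(ranges)
    print(halo_planes(ranges))
```

In this reading, only the planes listed by `halo_planes` would count as boundary value transfers, while the slabs themselves stay resident on their devices between iterations.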
Place, publisher, year, edition, pages
2014. 58 p.
Computational Mathematics
Applied Mechanics
Computer and Information Science
Identifiers
URN: urn:nbn:se:liu:diva-114935
ISRN: LIU-IEI-TEK-A--14/02114--SE
OAI: oai:DiVA.org:liu-114935
DiVA: diva2:919913
Simula Research Laboratory
Subject / course
2014-12-12, Linköping University, 17:55 (English)
Cai, Xing, Professor
Klarbring, Anders, Professor
Thore, Carl-Johan, Assistant Professor