Optimized Pipelined Parallel Merge Sort on the Cell BE
2008 (English)In: 2nd Int. Workshop on Highly Parallel Processing on a Chip HPPC-2008,2008, Berlin: Springer , 2008Conference paper (Refereed)
Chip multiprocessors designed for streaming applications such as Cell BE offer impressive peak performance but suffer from limited bandwidth to off-chip main memory.
As the number of cores is expected to rise further, this bottleneck will become more critical in the coming years.
Hence, memory-efficient algorithms are required.
As a case study, we investigate parallel sorting on Cell BE as a problem of great importance and as a challenge where the ratio between computation and memory transfer is very low.
Our previous work led to a parallel mergesort that reduces memory bandwidth requirements by pipelining between SPEs, but the allocation of SPEs was rather ad-hoc.
In our present work, we investigate mappings of merger nodes to SPEs.
The mappings are designed to provide optimal trade-offs between load balancing, buffer memory consumption, and communication load on the on-chip bus.
We solve this multi-objective optimization problem by deriving an integer linear programming formulation and
compute Pareto-optimal solutions for the mapping of merge trees with up to 127 merger nodes.
For mapping larger trees, we give a fast divide-and-conquer based approximation algorithm.
We evaluate the sorting algorithm resulting from our
mappings by a discrete event simulation.
Place, publisher, year, edition, pages
Berlin: Springer , 2008.
Parallel computing, multicore processor, computer architecture, algorithm engineering, parallel sorting, integer linear programming
IdentifiersURN: urn:nbn:se:liu:diva-43703Local ID: 74568OAI: oai:DiVA.org:liu-43703DiVA: diva2:264563