Large Matrix Multiplication on a Novel Heterogeneous Parallel DSP Architecture
2009 (English). In: Advanced Parallel Processing Technologies, Proceedings, Springer Berlin/Heidelberg, 2009, 408-419 p. Conference paper (Refereed).
This paper introduces ePUMA, a novel master-multi-SIMD on-chip multi-core architecture for embedded signal processing, and describes the parallel architecture and its memory subsystem. We evaluate the performance of large matrix multiplication on this architecture and compare it with a SIMD-extended data-parallel architecture. We also examine how well the new architecture scales with different numbers of SIMD co-processors. The experimental results show that the ePUMA memory subsystem can effectively hide data access overhead. With its 8-way SIMD data path and multi-SIMD parallel execution, ePUMA achieves a 45x speedup in matrix multiplication over the conventional SIMD extension.
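The abstract does not spell out how matrix multiplication is mapped onto ePUMA. As a rough illustration only, the Python sketch below shows one generic way a master core could partition a large matrix product across several SIMD co-processors, each computing 8 output columns per vector multiply-accumulate. The function names (`simd_mac`, `coprocessor_rows`, `master_matmul`), the contiguous row-block partitioning, and the lane emulation are hypothetical assumptions, not the paper's method; the sketch runs the co-processors serially and ignores the vector memory and permutation hardware the paper describes.

```python
from typing import List, Sequence

Matrix = List[List[float]]

SIMD_WIDTH = 8  # assumed lane count, taken from the 8-way SIMD data path in the abstract


def simd_mac(acc: List[float], a: float, b_vec: Sequence[float]) -> List[float]:
    """Emulate one vector multiply-accumulate: acc[l] += a * b_vec[l] for every lane l."""
    return [x + a * y for x, y in zip(acc, b_vec)]


def coprocessor_rows(A: Matrix, B: Matrix, rows: range) -> Matrix:
    """Compute the rows of C = A x B assigned to one hypothetical SIMD co-processor."""
    inner, n_cols = len(B), len(B[0])
    result = []
    for i in rows:
        c_row = [0.0] * n_cols
        # Walk the output row in SIMD_WIDTH-wide column strips (the last strip may be shorter).
        for j0 in range(0, n_cols, SIMD_WIDTH):
            j1 = min(j0 + SIMD_WIDTH, n_cols)
            acc = [0.0] * (j1 - j0)
            for k in range(inner):
                # Broadcast A[i][k] and multiply-accumulate against a strip of row k of B.
                acc = simd_mac(acc, A[i][k], B[k][j0:j1])
            c_row[j0:j1] = acc
        result.append(c_row)
    return result


def master_matmul(A: Matrix, B: Matrix, num_coprocessors: int) -> Matrix:
    """Hypothetical master: split C into contiguous row blocks, one per co-processor.

    The co-processors are run one after another here; on real hardware they
    would execute in parallel.
    """
    if num_coprocessors < 1:
        raise ValueError("need at least one co-processor")
    if not A or not B or len(A[0]) != len(B):
        raise ValueError("incompatible matrix shapes")
    n_rows = len(A)
    block = -(-n_rows // num_coprocessors)  # ceiling division
    C: Matrix = []
    for p in range(num_coprocessors):
        # Blocks are contiguous and visited in order, so appending keeps C's row order.
        rows = range(p * block, min((p + 1) * block, n_rows))
        C.extend(coprocessor_rows(A, B, rows))
    return C


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    B = [[5.0, 6.0], [7.0, 8.0]]
    print(master_matmul(A, B, num_coprocessors=2))  # [[19.0, 22.0], [43.0, 50.0]]
```

Contiguous row blocks keep each co-processor's rows of A and C private while every co-processor reads all of B; this is only one of several possible partitionings, and surplus co-processors simply receive empty blocks.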
Place, publisher, year, edition, pages: Springer Berlin/Heidelberg, 2009. 408-419 p.
Series: Lecture Notes in Computer Science, ISSN 0302-9743
Keywords: ePUMA, matrix multiplication, parallel DSP, SIMD, vector memory, permutation
National Category: Engineering and Technology
Identifiers
URN: urn:nbn:se:liu:diva-50678
DOI: 10.1007/978-3-642-03644-6_32
ISBN: 978-3-642-03643-9 (print)
ISBN: 978-3-642-03644-6 (online)
OAI: oai:DiVA.org:liu-50678
DiVA: diva2:271901
Conference: 8th International Symposium on Advanced Parallel Processing Technologies