Matrix transposition, the operation of swapping the rows and columns of a matrix, is used in various signal processing applications, such as massive multiple-input multiple-output (MIMO) communication systems, data compression, and multidimensional fast Fourier transforms, the last of which are used in MIMO radar systems. In low-latency, high-throughput streaming applications, specialized circuits are needed to perform matrix transposition in real time. This is in contrast to "slower" applications, where transposition can be adequately performed by storing the matrix in a shared memory and afterward reading it back in transposed order. In this paper, a design procedure for streaming matrix transposition on field-programmable gate arrays (FPGAs) using distributed memories is presented. It is shown that significantly fewer FPGA resources are required for small- to medium-sized streaming matrix transpositions compared to recent related works.
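The memory-based approach mentioned above for "slower" applications can be illustrated with a minimal software sketch. It is not the streaming design proposed in the paper; the function name `transpose_via_memory` and the flat row-major buffer layout are assumptions made purely for illustration.

```python
def transpose_via_memory(stream, rows, cols):
    """Illustrative model (hypothetical helper, not the paper's circuit):
    write a row-major stream into a flat buffer, then read it back
    in transposed (column-by-column) order."""
    # Write phase: element (r, c) of the input lands at address r * cols + c.
    memory = list(stream)
    if len(memory) != rows * cols:
        raise ValueError("stream length must equal rows * cols")
    # Read phase: walk the addresses column by column, which yields the
    # transposed matrix in row-major order.
    return [memory[r * cols + c] for c in range(cols) for r in range(rows)]


if __name__ == "__main__":
    # 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] streamed row by row.
    print(transpose_via_memory([1, 2, 3, 4, 5, 6], rows=2, cols=3))
```

In this sketch the entire matrix is buffered before any output is read, which is the kind of store-then-read behavior the abstract contrasts with dedicated streaming circuits.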
Funding: Swedish Foundation for Strategic Research (SSF) [CHI19-0001]; Excellence Center at Linköping-Lund in Information Technology (ELLIIT)