Image convolution is a common algorithm that can be found in most graphics editors. It is used to filter images by multiplying and adding pixel values with coefficients in a filter kernel. Previous research work have implemented this algorithm on different platforms, such as FPGAs, CUDA, C etc. The performance of these implementations have then been compared against each other. When the algorithm has been implemented on an FPGA it has almost always been with a single convolution. The goal of this thesis was to investigate and in the end present one possible way to implement the algorithm with 16 parallel convolutions on a Xilinx Spartan 6 LX9 FPGA and then compare the performance with results from previous work. The final system performs better than multi-threaded implementations on both a GPU and CPU.