Image registration is frequently used in the medical imaging domain, where methods with high performance are required. The need for high accuracy coupled with high speed is especially important for applications such as adaptive radiation therapy and image-guided surgery. In recent years, a number of significant projects have been introduced to make the computational power of GPUs available to a wider audience. The best known of these is CUDA (Compute Unified Device Architecture). In this paper, we present a CUDA-based GPU implementation of a non-rigid image registration algorithm, known as the Morphon, and compare it with a CPU implementation of the Morphon. The achieved speedup, in the range of 51-54x, is also compared with speedups reported for other non-rigid registration methods implemented on the GPU. These include the Demons algorithm and a mutual-information-based algorithm.