Reinforcement learning is increasingly applied to the optimisation of agent behaviours. The approach has become popular because its learning process is adaptive and unsupervised. One of its key ideas is to estimate the value of agent states. For very large state spaces, however, this is difficult to implement. As a result, various models have been proposed that use function approximators, such as neural networks, to solve this problem. This paper focuses on an implementation of value estimation with a particular class of neural networks known as self-organizing maps. Experiments with an agent moving in a gridworld and with the autonomous robot Khepera were carried out to show the benefit of our approach. The results clearly show that the conventional approach, which represents the value function with a look-up table, can be outperformed in terms of memory usage and convergence speed.