Finding the geometrical state of an object from a single 2D image is of major importance for many future applications in industrial automation, such as bin picking and expert systems for augmented reality, as well as a whole range of consumer products including toys and household appliances. Previous research in this field has shown that a number of processing steps must each fulfil a minimum level of functionality for the whole system, from image to pose estimate, to be operational. Important properties of a real-world pose estimation system are robustness against changes in scale, lighting conditions and occlusion. Robustness to scale is usually achieved by some kind of scale-space approach [9], but there are so far no really good ways to achieve robustness to lighting changes and occlusion. Occlusion is usually handled by using local features, an approach also taken here. The local feature and the pose estimation framework presented here have been tested in a setting constrained to the case of knowing which object to look for, but with no information on the state of the object. The inspiration for the work presented here comes from active vision and the idea of using steerable sensors with a foveal sampling around each point of interest [11]. Each point of interest detected in this work can be seen as a point of fixation for a steerable camera that then uses foveal sampling to concentrate processing in the area close to that point.
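To make the foveal-sampling idea concrete, the sketch below samples an image on a log-polar grid centred on a fixation point, so that sampling density is highest near the point of interest and falls off towards the periphery. This is only an illustrative sketch of the general principle, not the sampling scheme of [11]; the function name, grid parameters, and nearest-neighbour interpolation are assumptions made for brevity.

```python
import numpy as np

def foveal_sample(image, cx, cy, n_rings=8, n_wedges=16, r_min=1.0, r_max=32.0):
    """Sample a grayscale image on a log-polar grid centred on (cx, cy).

    Ring radii grow exponentially from r_min to r_max, so samples are
    densest near the fixation point and sparse in the periphery,
    mimicking foveal sampling. Samples falling outside the image are
    set to 0; nearest-neighbour lookup keeps the sketch short, whereas
    a real system would interpolate.
    """
    h, w = image.shape
    # Exponentially spaced radii: r_min, ..., r_max.
    radii = r_min * (r_max / r_min) ** (np.arange(n_rings) / (n_rings - 1))
    angles = 2.0 * np.pi * np.arange(n_wedges) / n_wedges
    out = np.zeros((n_rings, n_wedges), dtype=image.dtype)
    for i, r in enumerate(radii):
        xs = np.round(cx + r * np.cos(angles)).astype(int)
        ys = np.round(cy + r * np.sin(angles)).astype(int)
        valid = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
        out[i, valid] = image[ys[valid], xs[valid]]
    return out

# Example: a synthetic gradient image, fixated at its centre.
img = np.add.outer(np.arange(64), np.arange(64)).astype(float)
patch = foveal_sample(img, cx=32, cy=32)
```

The resulting `n_rings × n_wedges` patch is a compact, fixation-centred descriptor of the neighbourhood around the point of interest, which is the sense in which processing is concentrated near the fixation point.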