Creating a 3D model from photos requires an estimate of the position and orientation (pose) of the camera for each acquired photo. This paper presents a method to estimate the camera pose using only image data. The images are acquired at a low frequency using a stereo rig consisting of two rigidly attached SLR cameras. For each new stereo image, features are extracted and an optimization problem is solved. The results are used to merge multiple stereo images and build a larger model of the scene. With the present methods, the accumulated error after processing 10 images can be less than 1.2 mm in translation and 0.1 degrees in rotation.
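The abstract does not state the form of the per-image optimization. The sketch below shows one standard way to solve a related pose problem, and it may differ from the method used in this paper. It takes 3D feature points triangulated by the stereo rig in two captures and finds the least-squares rigid transform between them using the closed-form Kabsch/SVD solution. It assumes that correspondences are already matched and contain no outliers. All function names are hypothetical, and the example uses synthetic, noise-free data.

```python
import numpy as np


def estimate_rigid_transform(P, Q):
    """Least-squares rotation R and translation t with Q ~= R @ p + t for each row p of P.

    Kabsch/SVD solution (illustrative; not necessarily the paper's formulation).
    P, Q: (N, 3) arrays of corresponding 3D points, e.g. triangulated stereo features.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    # 3x3 cross-covariance of the centred point sets
    H = (P - p_mean).T @ (Q - q_mean)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t


def compose(R1, t1, R2, t2):
    """Pose that applies (R2, t2) first and then (R1, t1); used to chain relative poses."""
    return R1 @ R2, R1 @ t2 + t1


def rotation_angle_deg(R):
    """Rotation angle of R in degrees, e.g. for measuring accumulated rotation error."""
    c = (np.trace(R) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))


# Synthetic example: a 10-degree rotation about z plus a translation (hypothetical values)
rng = np.random.default_rng(0)
P = rng.uniform(-1.0, 1.0, size=(20, 3))
a = np.radians(10.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a), np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.1, -0.2, 0.3])
Q = P @ R_true.T + t_true

R_est, t_est = estimate_rigid_transform(P, Q)
```

When the relative poses from successive stereo images are chained with `compose`, the error in each relative estimate accumulates. This is the kind of drift that the accumulated-error figures in the abstract quantify.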