The problem of estimating the position and orientation (pose) of a camera is approached by fusing measurements from inertial sensors (accelerometers and rate gyroscopes) with those from a camera. The sensor fusion approach described in this contribution is based on nonlinear filtering of the measurements from these complementary sensors. In this way, accurate and robust pose estimates are available for the primary purpose of augmented reality applications, with the secondary benefit of reduced computation time and improved performance in vision processing. A real-time implementation of a nonlinear filter is described, using a dynamic model with 22 states, in which 100 Hz inertial measurements and 12.5 Hz vision measurements are processed. An example is presented in which an industrial robot, which possesses almost perfect precision and repeatability, is used to move the sensor unit. The results show that the position and orientation accuracy is sufficient for a number of augmented reality applications.
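The multi-rate fusion scheme summarized above (high-rate inertial prediction corrected by low-rate vision updates) can be sketched as follows. This is a minimal illustrative Kalman filter, not the paper's 22-state nonlinear model: a 1-D constant-velocity state [position, velocity] is predicted at 100 Hz from an accelerometer input and corrected at 12.5 Hz (every 8th step) with a vision-based position measurement. All variable names and noise levels here are assumptions made for the sketch.

```python
import numpy as np

dt = 0.01                               # 100 Hz inertial sampling interval
F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity state transition
B = np.array([0.5 * dt**2, dt])         # accelerometer input model
Q = 1e-4 * np.eye(2)                    # process noise covariance (assumed)
H = np.array([[1.0, 0.0]])              # vision measures position only
R = np.array([[1e-2]])                  # vision measurement noise (assumed)

x = np.zeros(2)                         # state estimate [pos, vel]
P = np.eye(2)                           # state covariance

rng = np.random.default_rng(0)

for k in range(100):                    # 1 s of simulated data
    acc = 1.0                           # constant true acceleration (toy input)
    # --- predict at every inertial sample (100 Hz) ---
    x = F @ x + B * acc
    P = F @ P @ F.T + Q
    # --- correct with vision every 8th sample (12.5 Hz) ---
    if k % 8 == 7:
        t = (k + 1) * dt
        z = 0.5 * acc * t**2 + rng.normal(0.0, 0.1)  # noisy position "measurement"
        y = z - H @ x                                # innovation
        S = H @ P @ H.T + R                          # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)               # Kalman gain
        x = x + K @ y
        P = (np.eye(2) - K @ H) @ P

print(x)  # fused [position, velocity] estimate after 1 s
```

Because the inertial predictions run at every sample while the vision corrections arrive only every 8th sample, the filter bridges the gaps between camera frames, which is the essential structure of the inertial/vision fusion described in the abstract.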