Personnel positioning is important for safety in, for example, emergency response operations. In GPS-denied environments, possible positioning solutions include systems based on radio frequency communication, inertial sensors, and cameras. Many camera-based systems create a map and localize themselves relative to it. The computational complexity of most such solutions grows rapidly with the size of the map. One way to reduce the complexity is to divide the visited region into submaps. This paper presents a novel method for merging conditionally independent submaps (generated using, e.g., EKF-SLAM) by means of smoothing. With this approach, large maps can be built in close to linear time. The method is demonstrated in two indoor scenarios, where data were collected with a trolley-mounted stereo vision camera.