Learning to recognize common activities such as traffic activities and robot behavior is an important and challenging problem related both to AI and robotics. We propose an unsupervised approach that takes streams of observations of objects as input and learns a probabilistic representation of the observed spatio-temporal activities and their causal relations. The dynamics of the activities are modeled using sparse Gaussian processes and their causal relations using a probabilistic graph. The learned model supports in limited form both estimating the most likely current activity and predicting the most likely future activities. The framework is evaluated by learning activities in a simulated traffic monitoring application and by learning the flight patterns of an autonomous quadcopter.