Machine learning models are, just like us humans, exposed to the uncertainty of the world. Owing to the complexity of real-world events, these models are often employed for prediction tasks where there is no single, ground-truth answer, meaning that it may be impossible to determine the precise outcome of the predicted event beforehand. This aleatoric uncertainty is potentially, but not necessarily, a result of the event in question being part of a larger system, where some information remains undisclosed.
Moreover, machine learning models are data-driven and typically learn everything they know from data, called training data. The quality of the training data is vital in determining the extent of a machine learning model’s knowledge and, consequently, how well the model performs on a given task. For instance, when training data is limited, this can result in uncertainty originating from a lack of knowledge, often referred to as epistemic uncertainty. Furthermore, since it is collected through observations, or measurements, of real-world events, the training data naturally incorporates the uncertainty inherent to these events. Sometimes, additional uncertainty is introduced by the processes used to acquire the data, stemming, for instance, from measurement error or human error. One such type of uncertainty, termed annotation uncertainty in this thesis, relates to the collection of annotations used for training models through supervised learning.
The focus of this thesis lies on probabilistic predictive machine learning models as an approach to representing different sources of so-called predictive uncertainty, including aleatoric, epistemic and annotation uncertainty. Special attention is given to annotation uncertainty, beginning with an exploration of the possible negative effects of this type of uncertainty on the performance of probabilistic predictive models. We analyse how annotation uncertainty, or noise, affects the properties of asymptotic risk minimisers when training models with two different classes of loss functions: strictly proper loss functions and a group of previously proposed robust loss functions. The analysis emphasises the importance of considering a model’s ability to accurately estimate predictive uncertainty, also referred to as the model’s reliability, when developing training algorithms that are robust to annotation noise.
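To make the role of reliability concrete, the following is a minimal numerical sketch, constructed for this summary rather than taken from the thesis. It assumes binary classification with symmetric annotation noise and compares the population risk minimisers of a strictly proper loss (the log loss) with those of a commonly proposed robust loss (the mean absolute error): the robust loss preserves the predicted class, but pushes the predictive probability to an overconfident extreme, illustrating why reliability deserves separate attention.

```python
# Toy illustration (not from the thesis): population risk minimisers under
# symmetric label noise for a strictly proper loss (log loss) and a robust
# loss (mean absolute error, MAE) in binary classification.
import numpy as np

eta = 0.7   # assumed clean conditional probability P(y = 1 | x)
eps = 0.2   # assumed symmetric annotation-noise (label-flip) rate

# Conditional probability of observing the (possibly flipped) label y = 1.
eta_noisy = (1 - eps) * eta + eps * (1 - eta)

# Candidate predicted probabilities q for P(y = 1 | x).
q = np.linspace(1e-3, 1 - 1e-3, 999)

# Population risks with respect to the noisy label distribution.
log_loss_risk = -(eta_noisy * np.log(q) + (1 - eta_noisy) * np.log(1 - q))
mae_risk = eta_noisy * (1 - q) + (1 - eta_noisy) * q

q_log = q[np.argmin(log_loss_risk)]  # ~eta_noisy: calibrated w.r.t. the noisy labels
q_mae = q[np.argmin(mae_risk)]       # ~1 (or 0): correct class, but overconfident

print(f"clean eta = {eta}, noisy eta = {eta_noisy:.3f}")
print(f"log-loss minimiser q* = {q_log:.3f}")
print(f"MAE minimiser      q* = {q_mae:.3f}")
```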
However, under the umbrella of weak supervision, we also provide two examples of how annotation uncertainty can be allowed and instead used to benefit model performance. In the first example, we use ensemble models to generate annotations for the training data, with the aim of teaching individual probabilistic models to estimate both aleatoric and epistemic uncertainty in their predictions. Having this ability is beneficial in many applications, active learning among them, and it is exploited by the active learning algorithm that constitutes the second example. This active learning algorithm acquires data samples based on high epistemic uncertainty, under the assumption that these are the samples for which the largest gains in model performance can be made. The contribution lies not in the particular approach to acquiring data samples, but in introducing the possibility of trading off annotation cost against annotation quality as part of the active learning algorithm. Such a trade-off has the potential to improve model performance under a fixed annotation budget.
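As a hedged sketch of the kind of acquisition step described above (one standard way of decomposing ensemble uncertainty, not necessarily the exact criterion or cost model used in the thesis), epistemic uncertainty can be estimated from an ensemble as the mutual information between the prediction and the ensemble member, i.e. the total predictive entropy minus the expected per-member entropy; the samples scoring highest are then sent for annotation.

```python
# Hypothetical acquisition step for pool-based active learning: score
# unlabelled samples by an ensemble-based estimate of epistemic uncertainty
# (mutual information) and query the highest-scoring one. Names and shapes
# are illustrative assumptions, not the thesis implementation.
import numpy as np

def entropy(p, axis=-1):
    """Shannon entropy along the class axis, in nats."""
    return -np.sum(p * np.log(np.clip(p, 1e-12, 1.0)), axis=axis)

def epistemic_uncertainty(member_probs):
    """member_probs: array of shape (n_members, n_samples, n_classes)."""
    mean_probs = member_probs.mean(axis=0)          # ensemble predictive distribution
    total = entropy(mean_probs)                     # total predictive uncertainty
    aleatoric = entropy(member_probs).mean(axis=0)  # expected per-member uncertainty
    return total - aleatoric                        # mutual information (epistemic part)

# Hypothetical ensemble output: 5 members, 4 unlabelled samples, 3 classes.
rng = np.random.default_rng(0)
member_probs = rng.dirichlet(alpha=[1.0, 1.0, 1.0], size=(5, 4))

scores = epistemic_uncertainty(member_probs)
query_index = int(np.argmax(scores))  # sample selected for annotation
print(scores, "-> query sample", query_index)
```

The cost/quality trade-off described above would then enter when deciding how the queried sample is annotated, for instance by a cheaper but noisier annotator; that part is not sketched here.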
The thesis also explores topics beyond annotation uncertainty. First, in the context of learning probabilistic machine learning models, we focus on unnormalised probabilistic models, with energy-based models among them. We establish a link between two important groups of methods used for estimating unnormalised models, namely noise-contrastive estimation and approximate maximum likelihood methods. This link provides an improved understanding of noise-contrastive estimation and serves to create a more coherent framework for the estimation of unnormalised models. Second, for deeper insights into the generalisation behaviour of machine learning models trained using gradient-based learning, we study the epoch-wise double descent phenomenon in two-layer linear neural networks. With this, we identify additional factors contributing to epoch-wise double descent that have not been observed for the simpler linear regression model, which is commonly central to theoretical studies. Although not specific to probabilistic models, these insights could potentially be extended to such models in the future and used to further explore the interplay between annotation uncertainty and model performance.
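The link to approximate maximum likelihood is not reproduced here, but the following minimal sketch, written for this summary under simplifying assumptions, shows the basic mechanics of noise-contrastive estimation: an unnormalised one-dimensional model with a learned log-normaliser parameter is fitted by logistic discrimination between observed data and samples from a known noise distribution.

```python
# Minimal noise-contrastive estimation (NCE) sketch for an unnormalised
# 1-D model log phi(x; mu, c) = -0.5 * (x - mu)**2 + c, where c acts as a
# learned negative log-normaliser. Illustrative only; not the estimators
# analysed in the thesis.
import numpy as np

rng = np.random.default_rng(1)
x_data = rng.normal(loc=2.0, scale=1.0, size=2000)    # observed data
x_noise = rng.normal(loc=0.0, scale=3.0, size=2000)   # samples from the noise distribution

def log_noise(x):
    # Log density of the noise distribution N(0, 3^2).
    return -0.5 * (x / 3.0) ** 2 - np.log(3.0 * np.sqrt(2.0 * np.pi))

def nce_loss_and_grad(theta, x_d, x_n):
    mu, c = theta
    # Log ratio G(x) = log phi(x; theta) - log p_noise(x).
    g_d = -0.5 * (x_d - mu) ** 2 + c - log_noise(x_d)
    g_n = -0.5 * (x_n - mu) ** 2 + c - log_noise(x_n)
    # Logistic classification loss: data labelled 1, noise labelled 0.
    loss = np.mean(np.log1p(np.exp(-g_d))) + np.mean(np.log1p(np.exp(g_n)))
    w_d = 1.0 / (1.0 + np.exp(g_d))    # sigma(-G) on data
    w_n = 1.0 / (1.0 + np.exp(-g_n))   # sigma(G) on noise
    d_mu = np.mean(-w_d * (x_d - mu)) + np.mean(w_n * (x_n - mu))
    d_c = np.mean(-w_d) + np.mean(w_n)
    return loss, np.array([d_mu, d_c])

theta = np.array([0.0, 0.0])  # initial [mu, c]
for _ in range(2000):         # plain gradient descent
    _, grad = nce_loss_and_grad(theta, x_data, x_noise)
    theta -= 0.05 * grad

# mu should approach 2.0 and c should approach -0.5 * log(2 * pi) ~ -0.92,
# the negative log of the Gaussian normalising constant.
print("estimated [mu, c]:", theta)
```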