Perspectives on Predictive and Annotation Uncertainty in Probabilistic Machine Learning
Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning. Linköping University, Faculty of Arts and Sciences. ORCID iD: 0000-0003-4209-874X
2024 (English) Doctoral thesis, comprehensive summary (Other academic)
Alternative title
Perspektiv på prediktiv- och etikettosäkerhet i probabilistisk maskininlärning (Swedish)
Abstract [en]

Machine learning models are, just like us humans, exposed to the uncertainty of the world. Owing to the complexity of real-world events, these models are often employed for prediction tasks where there is no single, ground-truth answer, meaning that it may be impossible to determine the precise outcome of the predicted event beforehand. This aleatoric uncertainty is potentially, but not necessarily, a result of the event in question being part of a larger system in which some information remains undisclosed. 

Moreover, machine learning models are data-driven and typically learn everything they know from data, called training data. The quality of the training data is vital in determining the extent of a machine learning model’s knowledge and, consequently, how well the model performs on a given task. For instance, limited training data can result in uncertainty originating from a lack of knowledge, often referred to as epistemic uncertainty. Furthermore, since the training data is collected through observation, or measurement, of real-world events, it naturally incorporates the uncertainty inherent to these events. Sometimes, additional uncertainty is introduced by the processes used to acquire the data, for instance through measurement error or human error. One such type of uncertainty, termed annotation uncertainty in this thesis, relates to the collection of annotations for training models through supervised learning. 

The focus of this thesis is on probabilistic predictive machine learning models as an approach to representing different sources of so-called predictive uncertainty, including aleatoric, epistemic and annotation uncertainty. Special attention is given to annotation uncertainty, beginning with an exploration of its possible negative effects on the performance of probabilistic predictive models. We analyse how annotation uncertainty, or noise, affects the properties of asymptotic risk minimisers when training models with two different classes of loss functions: strictly proper loss functions and a group of previously proposed robust loss functions. The analysis emphasises the importance of considering a model’s ability to accurately estimate predictive uncertainty, also referred to as the model’s reliability, when developing training algorithms that are robust to annotation noise. 

However, under the umbrella of weak supervision, we also provide two examples of when annotation uncertainty can be allowed and can instead benefit model performance. In the first example, we use ensemble models to generate annotations for the training data, with the aim of teaching individual probabilistic models to estimate both aleatoric and epistemic uncertainty in their predictions. This ability is beneficial in many applications, among them active learning and, notably, the active learning algorithm that constitutes the second example. This algorithm acquires data samples with high epistemic uncertainty, as these are believed to be the samples with the most to gain in terms of model performance. The contribution does not lie in the particular approach to acquiring data samples, but in making it possible to trade off annotation cost against annotation quality as part of the active learning algorithm. Such a trade-off has the potential to improve model performance under a fixed annotation budget.

The thesis also explores topics beyond annotation uncertainty. First, in the context of learning probabilistic machine learning models, we focus on unnormalised probabilistic models, including energy-based models. We establish a link between two groups of important methods for estimating unnormalised models, namely noise-contrastive estimation and approximate maximum likelihood methods. This link improves the understanding of noise-contrastive estimation and contributes to a more coherent framework for the estimation of unnormalised models. Second, to gain deeper insight into the generalisation behaviour of machine learning models trained with gradient-based learning, we study the epoch-wise double descent phenomenon in two-layer linear neural networks. In doing so, we identify additional factors contributing to epoch-wise double descent that have not been observed for the simpler linear regression model, which is commonly central to theoretical studies. Although not specific to probabilistic models, these insights could potentially be extended to such models in the future and used to further explore the interplay between annotation uncertainty and model performance.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2024, p. 68
Series
Linköping Studies in Arts and Sciences, ISSN 0282-9800 ; 890
Linköping Studies in Statistics, ISSN 1651-1700 ; 18
Keywords [en]
Probabilistic machine learning, Uncertainty estimation, Weak supervision
Keywords [sv]
Probabilistisk maskininlärning, Osäkerhetsskattning, Dataosäkerhet (Probabilistic machine learning, Uncertainty estimation, Data uncertainty)
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:liu:diva-207737
DOI: 10.3384/9789180757997
ISBN: 9789180757980 (print)
ISBN: 9789180757997 (electronic)
OAI: oai:DiVA.org:liu-207737
DiVA, id: diva2:1899011
Public defence
2024-10-25, Ada Lovelace, B-building, Campus Valla, Linköping, 09:00 (English)
Opponent
Supervisors
Note

Funding: This research was financially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation, the Excellence Center at Linköping-Lund in Information Technology (ELLIIT), and the Swedish Research Council.

2024-09-19: The thesis was first published online. The online published version reflects the printed version.

2024-11-19: The thesis was updated with an errata list which is also downloadable from the DOI landing page. Before this date the PDF has been downloaded 119 times.

Available from: 2024-09-19 Created: 2024-09-19 Last updated: 2024-11-19 Bibliographically approved
List of papers
1. A General Framework for Ensemble Distribution Distillation
2020 (English) In: 2020 IEEE 30th International Workshop on Machine Learning for Signal Processing (MLSP), IEEE, 2020, p. 1-6
Conference paper, Published paper (Refereed)
Abstract [en]

Ensembles of neural networks have been shown to give better predictive performance and more reliable uncertainty estimates than individual networks. Additionally, ensembles allow the uncertainty to be decomposed into aleatoric (data) and epistemic (model) components, giving a more complete picture of the predictive uncertainty. Ensemble distillation is the process of compressing an ensemble into a single model, often resulting in a leaner model that still outperforms the individual ensemble members. Unfortunately, standard distillation erases the natural uncertainty decomposition of the ensemble. We present a general framework for distilling both regression and classification ensembles in a way that preserves the decomposition. We demonstrate the desired behaviour of our framework and show that its predictive performance is on par with standard distillation.
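For classification ensembles, the aleatoric/epistemic decomposition mentioned above is commonly computed with an entropy identity: the entropy of the averaged prediction (total uncertainty) equals the average entropy of the members (aleatoric part) plus the mutual information between prediction and member (epistemic part, i.e. member disagreement). The NumPy sketch below assumes each member outputs a probability vector over K classes; it only illustrates the decomposition that a distilled model is meant to preserve and does not implement the distillation framework of the paper, whose exact uncertainty measures may differ.

```python
import numpy as np


def entropy(p: np.ndarray, axis: int = -1) -> np.ndarray:
    """Shannon entropy in nats along the class axis (clipping avoids log(0))."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=axis)


def decompose_uncertainty(member_probs: np.ndarray):
    """Split ensemble predictive uncertainty for a single input.

    member_probs: shape (M, K), class probabilities from M ensemble members.
    Returns (total, aleatoric, epistemic):
      total     = H[mean_m p_m]   entropy of the ensemble's averaged prediction
      aleatoric = mean_m H[p_m]   expected entropy of the individual members
      epistemic = total - aleatoric (mutual information, i.e. disagreement)
    """
    mean_probs = member_probs.mean(axis=0)
    total = entropy(mean_probs)
    aleatoric = entropy(member_probs, axis=-1).mean()
    epistemic = total - aleatoric
    return float(total), float(aleatoric), float(epistemic)


# Illustrative two-member, two-class ensembles (made-up numbers).
agree = np.array([[0.5, 0.5], [0.5, 0.5]])      # members agree, each is unsure
disagree = np.array([[1.0, 0.0], [0.0, 1.0]])   # members are confident but disagree
print(decompose_uncertainty(agree))     # uncertainty is (almost) all aleatoric
print(decompose_uncertainty(disagree))  # uncertainty is (almost) all epistemic
```

Both toy ensembles have the same total uncertainty (log 2), which is exactly why the split matters: a single distilled model trained only on the averaged prediction cannot tell the two situations apart.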

Place, publisher, year, edition, pages
IEEE, 2020
Series
IEEE International Workshop on Machine Learning for Signal Processing, ISSN 2161-0363
Keywords
Uncertainty, Predictive models, Data models, Computational modeling, Training, Toy manufacturing industry, Neural networks, Ensemble, distillation
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:liu:diva-175056 (URN); 10.1109/MLSP49062.2020.9231703 (DOI); 000630907800032; 9781728166629 (ISBN)
Conference
30th IEEE International Workshop on Machine Learning for Signal Processing (MLSP), Aalto University, virtual event, Sep 21-24, 2020
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

Funding: Wallenberg AI, Autonomous Systems and Software Program (WASP) - Knut and Alice Wallenberg Foundation

Available from: 2021-04-16 Created: 2021-04-16 Last updated: 2024-09-19 Bibliographically approved
2. Robustness and Reliability When Training With Noisy Labels
2022 (English) In: Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS) 2022, JMLR, 2022, Vol. 151, p. 922-942
Conference paper, Published paper (Refereed)
Abstract [en]

Labelling of data for supervised learning can be costly and time-consuming, and the risk of incorporating label noise in large data sets is imminent. When training a flexible discriminative model using a strictly proper loss, such noise will inevitably shift the solution towards the conditional distribution over noisy labels. Nevertheless, while deep neural networks have proven capable of fitting random labels, regularisation and the use of robust loss functions empirically mitigate the effects of label noise. However, such observations concern robustness in accuracy, which is insufficient if reliable uncertainty quantification is critical. We demonstrate this by analysing the properties of the conditional distribution over noisy labels for an input-dependent noise model. In addition, we evaluate the set of robust loss functions characterised by noise-insensitive, asymptotic risk minimisers. We find that strictly proper and robust loss functions both offer asymptotic robustness in accuracy, but neither guarantees that the final model is calibrated. Moreover, even with robust loss functions, overfitting is an issue in practice. With these results, we aim to explain the observed robustness of common training practices, such as early stopping, to label noise. In addition, we aim to encourage the development of new noise-robust algorithms that not only preserve accuracy but also ensure reliability. 
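The gap between robustness in accuracy and reliability can be seen in a small calculation. With a strictly proper loss such as cross-entropy, a sufficiently flexible model asymptotically predicts the distribution of the labels it is trained on, here the noisy ones. The sketch below assumes, for illustration only, a symmetric and input-independent noise model (each label is flipped with probability `flip_rate` to one of the other K - 1 classes uniformly), which is simpler than the input-dependent noise model analysed in the paper; the helper name is hypothetical.

```python
import numpy as np


def noisy_label_distribution(clean_probs: np.ndarray, flip_rate: float) -> np.ndarray:
    """Conditional distribution over noisy labels under symmetric label noise.

    Illustrative assumption: a label is kept with probability 1 - flip_rate and
    otherwise replaced by one of the other K - 1 classes, uniformly at random,
    independently of the input.
    """
    k = clean_probs.shape[-1]
    return (1.0 - flip_rate) * clean_probs + flip_rate / (k - 1) * (1.0 - clean_probs)


clean = np.array([0.7, 0.2, 0.1])  # hypothetical clean p(y | x) for one input
noisy = noisy_label_distribution(clean, flip_rate=0.3)

# The asymptotic minimiser of a strictly proper loss on noisy labels is `noisy`:
# the predicted class is unchanged, but the confidence no longer matches `clean`.
print(noisy)
print(np.argmax(noisy) == np.argmax(clean))
```

With these numbers the noisy distribution is approximately [0.535, 0.26, 0.205]: the most likely class is preserved, so accuracy is unaffected, but the top-class confidence drops from 0.7 to about 0.535, so the model is miscalibrated with respect to the clean labels. Under this particular noise model the map is affine in each probability with slope 1 - flip_rate * K / (K - 1), so the argmax is preserved whenever flip_rate < (K - 1) / K.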

Place, publisher, year, edition, pages
JMLR, 2022
Series
Proceedings of Machine Learning Research, ISSN 2640-3498
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:liu:diva-185232 (URN); 000828072700039
Conference
International Conference on Artificial Intelligence and Statistics, virtual event, Mar 28-30, 2022
Note

Funding: Swedish Research Council via the project Handling Uncertainty in Machine Learning Systems [2020-04122]; Swedish Foundation for Strategic Research via the project Probabilistic Modeling and Inference for Machine Learning [ICA16-0015]; Wallenberg AI, Autonomous Systems and Software Program (WASP) - Knut and Alice Wallenberg Foundation; ELLIIT

Available from: 2022-05-22 Created: 2022-05-22 Last updated: 2024-09-19
3. Active Learning with Weak Supervision for Gaussian Processes
2023 (English) In: Neural Information Processing: 29th International Conference, ICONIP 2022, Virtual Event, November 22–26, 2022, Proceedings, Part V / [ed] M. Tanveer et al., Singapore: Springer Nature, 2023, p. 195-204
Conference paper, Published paper (Refereed)
Abstract [en]

Annotating data for supervised learning can be costly. When the annotation budget is limited, active learning can be used to select and annotate those observations that are likely to give the most gain in model performance. We propose an active learning algorithm that, in addition to selecting which observation to annotate, selects the precision of the annotation that is acquired. Assuming that annotations with low precision are cheaper to obtain, this allows the model to explore a larger part of the input space, with the same annotation budget. We build our acquisition function on the previously proposed BALD objective for Gaussian Processes, and empirically demonstrate the gains of being able to adjust the annotation precision in the active learning loop.
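For GP regression with Gaussian observation noise of variance σ_n², the BALD objective (mutual information between the new observation y and the latent function f at x) has the closed form 0.5 log(1 + σ_f²(x) / σ_n²), where σ_f²(x) is the posterior variance of f. Annotation precision therefore enters directly through σ_n². The NumPy sketch below scores every (input, precision) pair by BALD divided by a cost. The RBF kernel with fixed hyperparameters, the cost table and the gain-per-cost rule are all assumptions made for illustration; the acquisition function in the paper may combine information and cost differently.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a (n,) and b (m,)."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


def posterior_latent_variance(x_train, noise_train, x_cand, **kern):
    """GP posterior variance of the latent function f at each candidate input.

    noise_train holds the observation-noise variance of each existing annotation,
    so annotations of different precision can be mixed in the training set.
    """
    k_tt = rbf_kernel(x_train, x_train, **kern) + np.diag(noise_train)
    k_tc = rbf_kernel(x_train, x_cand, **kern)
    prior_var = np.diag(rbf_kernel(x_cand, x_cand, **kern))
    solved = np.linalg.solve(k_tt, k_tc)                # K_tt^{-1} K_tc
    post_var = prior_var - np.sum(k_tc * solved, axis=0)
    return np.maximum(post_var, 0.0)                    # guard against round-off


def bald_gaussian(latent_var, noise_var):
    """BALD under a Gaussian likelihood: I(y; f) = 0.5 * log(1 + var_f / var_noise)."""
    return 0.5 * np.log1p(latent_var / noise_var)


def select_query(x_train, noise_train, x_cand, precision_options, **kern):
    """Return (score, x, noise_var) maximising expected information per unit cost.

    precision_options is a hypothetical list of (noise_variance, cost) pairs;
    lower-precision (noisier) annotations are assumed to be cheaper.
    """
    latent_var = posterior_latent_variance(x_train, noise_train, x_cand, **kern)
    best = None
    for noise_var, cost in precision_options:
        scores = bald_gaussian(latent_var, noise_var) / cost
        i = int(np.argmax(scores))
        if best is None or scores[i] > best[0]:
            best = (float(scores[i]), float(x_cand[i]), noise_var)
    return best


# Usage with made-up numbers: two precise labels near the origin, plus a choice
# between an expensive precise annotation and a cheap noisy one.
x_train = np.array([0.0, 0.5])
noise_train = np.array([0.01, 0.01])
x_cand = np.linspace(-3.0, 3.0, 61)
options = [(0.01, 1.0), (0.25, 0.2)]
score, x_next, noise_next = select_query(x_train, noise_train, x_cand, options)
```

In this toy setting the cheap, noisy annotation wins for the input farthest from the existing data (x = -3), illustrating how lower precision can let the model explore more of the input space for the same budget. A full active learning loop would add the new annotation with its own noise variance to `noise_train`, refit, and repeat.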

Place, publisher, year, edition, pages
Singapore: Springer Nature, 2023
Series
Communications in Computer and Information Science, ISSN 1865-0929, E-ISSN 1865-0937 ; 1792
Keywords
Machine learning, Active learning, Weak supervision
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-195039 (URN); 10.1007/978-981-99-1642-9_17 (DOI); 001417354200017; 2-s2.0-85161628122 (Scopus ID); 978-981-99-1641-2 (ISBN); 978-981-99-1642-9 (ISBN)
Conference
29th International Conference on Neural Information Processing, ICONIP 2022, Virtual Event, November 22–26, 2022
Available from: 2023-06-14 Created: 2023-06-14 Last updated: 2025-10-10
4. On the connection between Noise-Contrastive Estimation and Contrastive Divergence
2024 (English) In: International Conference on Artificial Intelligence and Statistics, PMLR, 2024, Vol. 238, p. 3016-3024
Conference paper, Published paper (Refereed)
Abstract [en]

Noise-contrastive estimation (NCE) is a popular method for estimating unnormalised probabilistic models, such as energy-based models, which are effective for modelling complex data distributions. Unlike classical maximum likelihood (ML) estimation that relies on importance sampling (resulting in ML-IS) or MCMC (resulting in contrastive divergence, CD), NCE uses a proxy criterion to avoid the need for evaluating an often intractable normalisation constant. Despite apparent conceptual differences, we show that two NCE criteria, ranking NCE (RNCE) and conditional NCE (CNCE), can be viewed as ML estimation methods. Specifically, RNCE is equivalent to ML estimation combined with conditional importance sampling, and both RNCE and CNCE are special cases of CD. These findings bridge the gap between the two method classes and allow us to apply techniques from the ML-IS and CD literature to NCE, offering several advantageous extensions.
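To make the ranking NCE criterion concrete: each observed sample x_0 competes against K samples x_1, ..., x_K drawn from a noise distribution q, in a softmax over s(x) = log p̃_θ(x) - log q(x), and the loss is the negative log-probability of the observed sample. Because p̃_θ appears in both numerator and denominator, its normalising constant cancels. The NumPy sketch below assumes a hypothetical unnormalised 1-D Gaussian model and a Gaussian noise distribution N(0, 3²); it is an illustration of the criterion, not the authors' implementation.

```python
import numpy as np


def log_unnormalised(x, mu, log_sigma):
    """Hypothetical unnormalised model: log p~(x) = -(x - mu)^2 / (2 sigma^2)."""
    return -0.5 * ((x - mu) / np.exp(log_sigma)) ** 2


def log_noise_density(x, scale=3.0):
    """Log density of the noise distribution q = N(0, scale^2)."""
    return -0.5 * (x / scale) ** 2 - np.log(scale * np.sqrt(2.0 * np.pi))


def rnce_loss(x_data, x_noise, mu, log_sigma):
    """Ranking NCE loss, averaged over data points.

    x_data:  (N,) observed samples
    x_noise: (N, K) noise samples from q, K per observed sample
    Each row is a softmax over s(x) = log p~(x) - log q(x) for the observed
    sample (column 0) and its K noise samples; the loss is the negative
    log-probability assigned to the observed sample.
    """
    x_all = np.concatenate([x_data[:, None], x_noise], axis=1)       # (N, K+1)
    s = log_unnormalised(x_all, mu, log_sigma) - log_noise_density(x_all)
    s_max = s.max(axis=1, keepdims=True)
    log_norm = s_max[:, 0] + np.log(np.exp(s - s_max).sum(axis=1))   # stable logsumexp
    return float(np.mean(log_norm - s[:, 0]))


rng = np.random.default_rng(0)
x_data = rng.normal(loc=1.0, scale=1.0, size=2000)          # true mu = 1, sigma = 1
x_noise = rng.normal(loc=0.0, scale=3.0, size=(2000, 10))   # K = 10 noise samples each

loss_true = rnce_loss(x_data, x_noise, mu=1.0, log_sigma=0.0)
loss_wrong = rnce_loss(x_data, x_noise, mu=3.0, log_sigma=0.0)
print(loss_true, loss_wrong)  # the loss is lower at the data-generating parameters
```

Rearranging each term of the loss gives -log p̃_θ(x_0) + log Ẑ plus terms that do not depend on θ, where Ẑ = (1/(K+1)) Σ_{i=0}^{K} p̃_θ(x_i) / q(x_i) is an importance-sampling estimate of the normalising constant that also includes the observed sample. This is one way to see the abstract's statement that RNCE corresponds to maximum likelihood estimation with conditional importance sampling.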

Place, publisher, year, edition, pages
PMLR, 2024
Series
Proceedings of Machine Learning Research, ISSN 2640-3498
Keywords
Unnormalised models, noise-contrastive estimation, contrastive divergence
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:liu:diva-204020 (URN); 001286500301029
Conference
International Conference on Artificial Intelligence and Statistics, 2-4 May 2024, Palau de Congressos, Valencia, Spain
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

Funding agencies: Swedish Research Council via the project Handling Uncertainty in Machine Learning Systems [2020-04122]; Wallenberg AI, Autonomous Systems and Software Program (WASP) - Knut and Alice Wallenberg Foundation; Excellence Center at Linköping-Lund in Information Technology (ELLIIT)

Available from: 2024-05-31 Created: 2024-05-31 Last updated: 2025-08-20 Bibliographically approved

Open Access in DiVA

fulltext (2678 kB), 479 downloads
File information
File name: FULLTEXT03.pdf
File size: 2678 kB
Checksum (SHA-512): 31ccc7ed38eef2742b00dc1e78d28c14e07e6a3094c46314a50e672b3b977bcdddd2141e8229f38aba75257ceb3b29232a7d2887458350d6482f6d59ddc1c6fe
Type: fulltext
Mimetype: application/pdf
errata (221 kB), 54 downloads
File information
File name: ERRATA01.pdf
File size: 221 kB
Checksum (SHA-512): 2a07077308044eb24582c26be2563dc0a003c93e6ade824c3a092d35b337b2cb90def8c7881fa4f6c83f18b86f495bc522f689aa8c9e013cf4637cf65b906d28
Type: errata
Mimetype: application/pdf


Authority records

Olmin, Amanda

Total: 601 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 2015 hits