Abstract
Multiclass classification in machine learning enables the automation of optimal decision making by means of learning algorithms and annotated datasets. However, it is still difficult to precisely quantify the uncertainty of the estimates of the learned models during deployment. Some of the reasons are the built-in assumptions of the models, the hypothesis space considered, the data distribution, and the correctness of the datasets, including their annotations. In this thesis, I question the current interpretation of the probabilistic estimates obtained from some of the most common classification algorithms. I provide an overview of the different types of models for statistical classification, and of the circumstances under which each type can be used for optimal decision making.
First, I demonstrate in a large study that common probabilistic classifiers do not always provide accurate probabilities. I provide a literature overview of classifier calibration, including error measures, visualisations, and calibration methods. I extend the available calibration measures by proposing new ones, such as the classwise expected calibration error, that evaluate the probabilities for every class, which is essential for optimal decision making. Furthermore, given the lack of multiclass calibration methods, I present Dirichlet calibration as a natural extension of binary Beta calibration. A large range of experiments demonstrates state-of-the-art results on 11 classifiers and 14 deep neural networks.
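As an illustration of the kind of measure and calibration map involved (a minimal sketch, not necessarily the exact formulation used in the thesis), the classwise expected calibration error can be computed by binning the predicted probability of each class separately, and a Dirichlet calibration map of the form softmax(W ln q + b) can be fitted as multinomial logistic regression on log-transformed probabilities. The function names, binning scheme, and the absence of regularisation below are assumptions for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def classwise_ece(probs, labels, n_bins=15):
    """Average, over classes, of the binned gap between the mean predicted
    probability of a class and its empirical frequency (illustrative sketch)."""
    n, k = probs.shape
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for j in range(k):
        p_j = probs[:, j]                     # predicted probability of class j
        y_j = (labels == j).astype(float)     # one-vs-rest indicator of class j
        bins = np.clip(np.digitize(p_j, edges) - 1, 0, n_bins - 1)
        for b in range(n_bins):
            mask = bins == b
            if mask.any():
                total += mask.mean() * abs(p_j[mask].mean() - y_j[mask].mean())
    return total / k

def fit_dirichlet_calibrator(val_probs, val_labels, eps=1e-12):
    """Fit a Dirichlet-style calibration map, softmax(W ln q + b), as
    multinomial logistic regression on log probabilities (unregularised
    in this sketch)."""
    return LogisticRegression(max_iter=1000).fit(np.log(val_probs + eps), val_labels)

# Usage with hypothetical arrays: calibrate test probabilities and measure the error.
# calibrator = fit_dirichlet_calibrator(val_probs, val_labels)
# test_cal = calibrator.predict_proba(np.log(test_probs + 1e-12))
# print(classwise_ece(test_cal, test_labels))
```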
Furthermore, most classifiers are not robust against imperfect labels, yet annotation processes are usually expensive, limiting the number of labels that can be collected. One possible solution is to relax the annotation process and accept weak labels instead. Nonetheless, the resulting uncertainty in the annotations may propagate to the probabilities predicted by classifiers. I provide an overview of methods that can be used to train classifiers with weak labels. I perform several experiments comparing different methods for training with weak labels, and propose combining multiple types of weakening processes when a dataset has been annotated with several methods.
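One common strategy from the weak-label literature, given here only as a hedged illustration and not as the thesis's exact formulation, is forward loss correction: the classifier's clean-class probabilities are pushed through a mixing matrix M, with M[w, k] = P(weak label w | true class k), and the negative log-likelihood of the observed weak labels is minimised. The function name and the assumption that M is known are illustrative.

```python
import numpy as np

def forward_corrected_nll(clean_probs, weak_labels, M):
    """Forward loss correction for weak labels (illustrative sketch).

    clean_probs: (N, K) classifier probabilities over the true classes.
    weak_labels: (N,) indices of the observed weak labels in {0, ..., W-1}.
    M:           (W, K) mixing matrix, M[w, k] = P(weak label w | true class k).
    """
    # P(weak label | x) = sum_k P(weak label | true class k) * P(true class k | x)
    weak_probs = clean_probs @ M.T            # shape (N, W)
    picked = weak_probs[np.arange(len(weak_labels)), weak_labels]
    return -np.mean(np.log(picked + 1e-12))
```

A different mixing matrix can be used for each weakening process, which is one way that several annotation procedures could, in principle, be combined in a single training objective.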
However, posterior probabilities alone cannot quantify another type of uncertainty. Epistemic uncertainty arises from a lack of knowledge, and can potentially be reduced and quantified. I provide real-world scenarios in which it is important to quantify this type of uncertainty in order to avoid costly misclassification errors. I discuss these topics and present a general technique called Background Check, which augments an arbitrary probabilistic classifier with an additional probability that can be used to quantify epistemic uncertainty. I show with a set of experiments that the resulting probabilities achieve state-of-the-art results in different settings, on par with other specifically designed methods.
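As a hedged illustration of what augmenting a classifier with such an additional probability could look like (a minimal sketch, not the Background Check construction itself), one can estimate a foreground density from the training inputs, assume a constant background density, and derive a background probability by Bayes' rule. The kernel density estimate, the constant background level bg_density, and the prior prior_bg are assumptions for this example.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def augment_with_background(clf, X_train, X_test, bg_density=1e-3, prior_bg=0.5):
    """Append a background probability to a fitted probabilistic classifier
    (illustrative sketch; the density model and constant background density
    are assumptions made for this example)."""
    # Foreground density p(x | F) estimated from the training inputs.
    kde = KernelDensity(bandwidth=1.0).fit(X_train)
    p_fg = np.exp(kde.score_samples(X_test))

    # Bayes' rule with an assumed constant background density p(x | B) = bg_density.
    p_bg = prior_bg * bg_density / (prior_bg * bg_density + (1 - prior_bg) * p_fg)

    # Rescale the class probabilities so classes plus background sum to one.
    class_probs = clf.predict_proba(X_test) * (1.0 - p_bg)[:, None]
    return np.hstack([class_probs, p_bg[:, None]])
```

In this sketch, inputs far from the training data receive a high background probability, which can be used to flag predictions whose epistemic uncertainty is large.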
| Date of Award | 9 May 2023 |
| --- | --- |
| Original language | English |
| Awarding Institution | |
| Supervisor | Peter A Flach (Supervisor) & Raul Santos-Rodriguez (Supervisor) |
Keywords
- Machine learning
- Probability
- Uncertainty
- Probabilistic forecasts
- Probabilistic modelling
- Classification
- Decision making
- Optimal decision making
- Probabilistic classification
- Weak labels
- Anomaly detection
- Outlier detection
- Proper losses
- Cautious classification
- Classification with confidence