Abstract
In recent years, the use of machine learning (ML) models, particularly black-box neural networks, has grown in popularity, and such models are now deployed in real-life settings for a range of different tasks. However, ML models can make overconfident and incorrect predictions on out-of-distribution (OOD) data, which can lead to severe consequences in safety-critical settings.
Therefore, two fundamental requirements for the deployment of ML models are: 1) being able to identify OOD data; and 2) being able to explain why a data point is classified as OOD by the model. To meet these requirements, this thesis is concerned with understanding why certain approaches are effective at OOD detection, developing new approaches to detect OOD data, and explaining why a model considers a data point to be OOD.
The first part of the thesis presents experiments with models trained using instance discrimination and supervised contrastive learning, in order to understand in which contexts contrastive models are effective at OOD detection. We find that instance discrimination is effective for far-OOD detection, and that supervised contrastive learning is effective at both far- and near-OOD detection as a result of learning eigenvectors with several different directions of significant variance.
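A minimal sketch of how such directions of variance can be inspected, assuming the eigenvectors in question are those of the embedding covariance matrix; the function name and synthetic data below are illustrative only and are not taken from the thesis.

```python
import numpy as np

def variance_spectrum(features: np.ndarray) -> np.ndarray:
    """Eigenvalues of the covariance of a feature matrix, largest first.

    features: array of shape (n_samples, n_dims), e.g. contrastive embeddings.
    A spectrum with many comparably large eigenvalues indicates several
    directions of significant variance.
    """
    centred = features - features.mean(axis=0, keepdims=True)
    cov = centred.T @ centred / (len(features) - 1)
    return np.linalg.eigvalsh(cov)[::-1]  # eigvalsh returns ascending order

# Illustrative comparison: one dominant direction vs. many significant directions.
rng = np.random.default_rng(0)
flat = rng.normal(size=(1000, 8)) * np.array([5.0] + [0.1] * 7)
spread = rng.normal(size=(1000, 8))
print(variance_spectrum(flat)[:4])    # one large eigenvalue, rest near zero
print(variance_spectrum(spread)[:4])  # several comparable eigenvalues
```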
The second part of the thesis develops a novel approach for detecting individual OOD data points, which we refer to as 1D Typicality, using supervised contrastive learning in conjunction with the nearest neighbors of a data point and the concept of typicality. The proposed approach is robust to different values of k (the number of nearest neighbors) and outperforms several baselines, including other approaches that use models trained with contrastive learning.
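The abstract does not give the exact formulation of 1D Typicality; the sketch below only illustrates the general idea of combining nearest neighbors with a typicality-style test on (contrastive) embeddings. The choice of statistic, the z-score normalisation, and the default k are assumptions made for illustration.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def fit_knn_typicality(train_feats, held_out_feats, k=10):
    """Fit a kNN + typicality-style OOD scorer on in-distribution embeddings.

    The statistic is the mean distance to the k nearest training neighbors;
    its mean and standard deviation on held-out in-distribution data define
    what a "typical" value of the statistic looks like.
    """
    nn = NearestNeighbors(n_neighbors=k).fit(train_feats)
    stat = nn.kneighbors(held_out_feats)[0].mean(axis=1)
    return nn, stat.mean(), stat.std() + 1e-12

def ood_score(nn, mu, sigma, test_feats):
    """Higher score = statistic further from typical = more likely OOD."""
    stat = nn.kneighbors(test_feats)[0].mean(axis=1)
    return np.abs(stat - mu) / sigma
```

In this sketch a point would be flagged as OOD when its score exceeds a chosen threshold; the embeddings themselves are assumed to come from a network trained with supervised contrastive learning, as described above.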
In the third part, we use the framework of counterfactual explanations to develop a new approach, which we refer to as OOD CF, for explaining why a model classifies a data point as OOD. This involves generating latent features of the data and separating them into two human-interpretable partitions: a class-discriminative partition and a non-class-discriminative partition. We then generate counterfactuals for the OOD data by perturbing the features in each partition in separate stages (sketched below). We test our approach on a synthetic 2D dataset, as well as on tabular and image datasets, and find that the counterfactuals generated are more realistic than those produced by the baselines.

Finally, we discuss the results presented and give suggestions for future work, both directly related to the work in the thesis and in other emerging research areas that may aid model explanation and OOD detection.
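A minimal sketch of the two-stage, partition-wise perturbation idea described in the third part; the latent dimensions, partition indices, target (a class mean), and the simple update rule are all illustrative assumptions rather than the actual OOD CF objective.

```python
import numpy as np

def two_stage_counterfactual(z, target_mean, disc_idx, non_disc_idx,
                             steps=20, lr=0.1):
    """Perturb a latent vector z towards a target in two separate stages.

    Stage 1 moves only the class-discriminative dimensions towards a target
    in-distribution class mean; stage 2 then adjusts the remaining,
    non-class-discriminative dimensions.
    """
    cf = z.copy()
    for idx in (disc_idx, non_disc_idx):        # stage 1, then stage 2
        for _ in range(steps):
            cf[idx] += lr * (target_mean[idx] - cf[idx])
    return cf

# Hypothetical usage with a 6-dimensional latent space.
z = np.array([3.0, -2.0, 0.5, 1.0, -1.0, 2.0])
target = np.zeros(6)
cf = two_stage_counterfactual(z, target,
                              disc_idx=np.array([0, 1, 2]),
                              non_disc_idx=np.array([3, 4, 5]))
print(cf)
```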
Date of Award | 10 Dec 2024
---|---
Original language | English
Awarding Institution |
Supervisor | Jonathan Lawry (Supervisor) & Raul Santos-Rodriguez (Supervisor)