Abstract
This thesis focuses on utilising and adapting active learning pipelines to address the complexities of real-world scientific datasets and to optimise for the constraints of the human labeller. Current active learning research largely assumes that the priority is reducing model training time, which it achieves by sampling thousands of data points at each iteration. In real-world datasets, however, labels are often scarce, so the bottleneck is the labeller rather than the computation time. Because they assume that experts can provide thousands of labels at each iteration, such methods appear to work well on standardised benchmarks but become almost unusable in practical problems. In practice, the expert-in-the-loop must label complex datasets that require expert evaluation whilst working within strict labelling budgets. This misalignment means that the intended users of these methods cannot fully utilise them, leading to models that significantly underperform.

By analysing the impacts of severe class imbalance, overlapping class boundaries, and incorrect labels, this thesis takes a solutions-driven approach to improving classification performance and reducing the amount of labelling required. The machine learning and active learning adaptations presented lead to models that converge faster and make it possible to extract more information from each sampled instance. To enable experts to integrate the proposed active learning solutions into their pipelines quickly and easily, AstronomicAL, an interactive dashboard for training and labelling, has been developed. Combining domain-specific plots with the ability to integrate external datasets allows experts to provide accurate and reliable labels. Using astronomy datasets as a proxy for other scientific domains, this thesis shows that active learning can produce classifiers that achieve equal or better performance whilst substantially reducing the labelling bottleneck for experts.
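The point that the labeller, not computation, is the bottleneck can be illustrated with a generic pool-based active learning loop that requests a single label per iteration under a fixed labelling budget. This is a minimal toy sketch under stated assumptions and is not the method developed in the thesis: the nearest-centroid model, least-margin uncertainty sampling, the 1-D data pool, and the names `active_learning_loop`, `fit_centroids`, `predict`, `margin` and `oracle` are all hypothetical, and `oracle` merely simulates the human expert.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical toy model: (mean of class 0, mean of class 1) on a 1-D feature.
Centroids = Tuple[float, float]


def fit_centroids(xs: Sequence[float], ys: Sequence[int]) -> Centroids:
    """Fit the toy model: the mean feature value of each class."""
    c0 = [x for x, y in zip(xs, ys) if y == 0]
    c1 = [x for x, y in zip(xs, ys) if y == 1]
    return sum(c0) / len(c0), sum(c1) / len(c1)


def predict(x: float, model: Centroids) -> int:
    """Assign x to the class whose centroid is nearer (ties go to class 0)."""
    m0, m1 = model
    return 0 if abs(x - m0) <= abs(x - m1) else 1


def margin(x: float, model: Centroids) -> float:
    """Difference between the two centroid distances; small means uncertain."""
    m0, m1 = model
    return abs(abs(x - m0) - abs(x - m1))


def active_learning_loop(
    pool: List[float],
    oracle: Callable[[float], int],
    seed_idx: List[int],
    budget: int,
) -> Tuple[Centroids, Dict[int, int]]:
    """Query ONE expert label per iteration until the budget is spent.

    Assumes seed_idx contains at least one instance of each class.
    """
    labelled = {i: oracle(pool[i]) for i in seed_idx}
    for _ in range(budget):
        # Refit on everything labelled so far (dict order keeps xs and ys aligned).
        model = fit_centroids([pool[i] for i in labelled], list(labelled.values()))
        unlabelled = [i for i in range(len(pool)) if i not in labelled]
        if not unlabelled:
            break
        # Least-margin uncertainty sampling: ask about the most ambiguous point.
        query = min(unlabelled, key=lambda i: margin(pool[i], model))
        labelled[query] = oracle(pool[query])  # the costly expert step
    model = fit_centroids([pool[i] for i in labelled], list(labelled.values()))
    return model, labelled


def oracle(x: float) -> int:
    """Stand-in for the human expert: the true class boundary is at x = 6.0."""
    return int(x > 6.0)


pool = [i / 10 for i in range(101)]  # 0.0, 0.1, ..., 10.0
model, labelled = active_learning_loop(pool, oracle, seed_idx=[0, 100], budget=5)
accuracy = sum(predict(x, model) == oracle(x) for x in pool) / len(pool)
print(len(labelled), round(accuracy, 2))  # prints: 7 0.97
```

Requesting one label at a time means the model is refitted after every answer, so each new query reflects everything the expert has already provided; batch strategies that request thousands of labels per iteration give this up in exchange for fewer retraining steps.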
Date of Award | 1 Oct 2024 |
---|---|
Original language | English |
Awarding Institution | |
Supervisor | Sotiria Fotopoulou (Supervisor), Oliver Ray (Supervisor) & Malcolm N Bremer (Supervisor) |