Abstract
The aim of this work is to select galaxy cluster candidates from the XXL X-ray source catalogue using a supervised machine learning selection method. The biggest hurdle in applying supervised machine learning selection methods to astrophysical catalogues is the need for a sufficiently large, perfectly labelled training set that accurately reflects real data. Creating such training sets for astrophysics is a complex and highly involved problem. This work presents an alternative approach. By adapting the machine learning model to account for uncertainty on the training labels, we remove the need for a perfectly labelled training set; instead, the training set can be created by labelling a source catalogue according to the purity of existing source samples. In chapter 3 we describe the adaptation of a Gaussian process binary classifier to account for uncertainties on the training labels.
The adapted classifier was trained separately on the North and South XXL X-ray source catalogues, which were labelled according to the existing XXL cluster selection samples (chapter 4). To prevent the model from simply re-learning the existing selection criteria, the measured source properties used by XXL to select galaxy clusters were not provided to the model. The capability of the model with respect to cluster selection was assessed using three methods. First, we made use of a simulated XXL catalogue with labelled galaxy cluster detections, but it was found to reproduce the real XXL catalogues too poorly to be of use. Second, a set of XXL sources with an increased likelihood of being galaxy cluster detections, based on their association with an optically selected cluster, showed that the model is able to distinguish such sources from the general population. Finally, we visually inspected a subset of sources within the North catalogue to determine a reliable cluster selection criterion based on the output of the ML model. The resulting cluster sample contains 623 sources from the North catalogue. Of the 248 sources previously selected by XXL, 225 were recovered in this sample. The sample was found to have a purity of $0.45^{+0.03}_{-0.03}$ and to contain an expected 280 cluster candidates, 101 of which were not previously selected by XXL. The new candidates were often found to differ in their X-ray morphologies from those previously selected by XXL, tending not to be dominated by a single X-ray component that follows a β-model surface brightness profile.
Interpretation of the model’s selection criteria (chapter 5) showed that it learnt to identify clusters based on a source’s count rates, measured by separately fitting an extended and a point-source emission model. We note that while the output of the binary classifier was robust to being trained on either the North or South XXL source catalogue, our investigation into the selection criteria showed a subtle and unresolved difference in behaviour, possibly due to differences in the properties of the two fields (e.g. differences in Galactic column and foreground, or time-varying instrument calibration or background characteristics). Overall, we find that the classifier is complementary to the standard XXL processing. However, the advantage of the Gaussian process is that it allows additional information (e.g. from other wavebands) to be incorporated into the uncertainties on the labels used for training, or into the classification process (chapter 6).
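As an illustration of what training on uncertain labels can look like, the following minimal sketch evaluates a soft-label Bernoulli log-likelihood, in which each source carries a probability of being a cluster rather than a hard 0/1 label. This is one common way to express label uncertainty and is not necessarily the formulation used in chapter 3; the function name, the logistic link, and the example probabilities are all assumptions made for illustration, and the Gaussian process prior itself is not shown.

```python
import numpy as np


def soft_label_log_likelihood(f, q):
    """Log-likelihood of latent classifier values under uncertain labels.

    f : latent function values, one per source (hypothetical GP outputs).
    q : probability that each source is truly a cluster, e.g. the purity
        of the sample the source was labelled from (illustrative values).
    """
    f = np.asarray(f, dtype=float)
    q = np.asarray(q, dtype=float)
    # Numerically stable log sigmoid(f) and log(1 - sigmoid(f)).
    log_p = -np.logaddexp(0.0, -f)
    log_1mp = -np.logaddexp(0.0, f)
    # Bernoulli log-likelihood averaged over the uncertain label.
    return float(np.sum(q * log_p + (1.0 - q) * log_1mp))


# Hypothetical label probabilities for three sources: one from a
# high-purity sample, one from a mixed sample, one unlikely cluster.
q = np.array([0.95, 0.50, 0.05])
f = np.array([2.0, 0.0, -2.0])
print(soft_label_log_likelihood(f, q))
```

With q fixed at 0 or 1 this reduces to the usual binary classification likelihood, so perfectly labelled data is a special case of the same expression.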
Date of Award | 3 Oct 2023 |
---|---|
Original language | English |
Awarding Institution | |
Supervisor | Ben J Maughan & Malcolm N Bremer |
Keywords
- Astronomy
- galaxy clusters
- X-ray: galaxy clusters
- Gaussian Processes
- Machine learning
- XXL X-ray survey
- XMM-Newton
- Source classification