Abstract
In this thesis we present three new methods for probabilistic machine learning that extend widely used algorithms for approximate Bayesian inference and phylogenetic comparative methods. The first is a modification of Expectation Propagation (EP), called gamma-EP, which incorporates a bias term into the approximate factors. The advantage becomes apparent when the coefficient gamma of the bias term is adjusted to maximise the evidence: gamma-EP can then converge to solutions that make the data more probable. The gamma-EP method also provides an efficient algorithm for training sparse Bayesian linear classifiers. This makes it applicable to classification with repeated data points, which EP cannot handle robustly. It is simple to implement, as it requires only a few modifications to the canonical EP algorithm. We extend the gamma-EP algorithm to use kernel matrices and apply it to oncogenic single nucleotide variant (SNV) classification.
The second method is a new phylogenetic regression model called the Phylogenetic Relevance Vector Machine (PhyRVM). We present the first analytical solution for the phylogenetic signal lambda and show that PhyRVM outperforms the widely used maximum likelihood approach, Phylogenetic Generalised Least Squares (PGLS), on a simulated dataset and on the problem of predicting the optimal growth temperature (OGT) of archaea. We pursue this application further with the RVM, investigating whether scientifically meaningful genomic correlates can be learned from the most relevant features. Our trained RVM model achieves state-of-the-art performance for archaeal OGT prediction and predicts a hyperthermophilic last universal common ancestor.
The final method we present is a new phylogenetic dimensionality reduction technique called Phylogenetic Probabilistic Principal Components Analysis (P3CA). Because P3CA is a probabilistic model, it can optimise the phylogenetic signal lambda by maximising the likelihood.
Date of Award | 2 Dec 2021 |
---|---|
Original language | English |
Awarding Institution | |
Supervisor | I C G Campbell (Supervisor), Tom R Gaunt (Supervisor) & Tom Williams (Supervisor) |