Abstract
Missing data is present in many real-world settings. Whether the missingness is structural, due to unavoidable gaps in the data collection process, or idiosyncratic, improper treatment of it can lead to biased and invalid inference. In this thesis we explore the effect of missing data on several learning regimes and provide estimation methods to overcome these effects. We first explore density ratio estimation, a technique for characterising the difference between two distributions. Here we adapt existing density ratio estimation techniques to account for the presence of missing not at random (MNAR) data. We then apply our estimator to Neyman-Pearson classification, which we also adapt to ensure that the class-specific error guarantees continue to hold when data is MNAR. After this we move on to a higher-dimensional problem in score matching. We extend score matching to handle partially missing multi-dimensional data where, for each sample, each coordinate has a non-zero chance of being missing. Finally, we return to fully missing data, which is now missing at random (MAR), as we turn to heterogeneous treatment effect (HTE) estimation. HTE estimation aims to characterise the effect of a binary treatment on an outcome given covariates, typically via a difference in outcome or outcome distribution. It represents a structurally MAR data problem, as we cannot observe an individual's outcome under the treatment they did not receive, and treatment assignment can depend upon the covariates. We propose a new HTE estimand called the conditional quantile comparator (CQC), which provides a quantile-preserving mapping from the untreated to the treated response conditional on the covariates. We provide multiple estimators for the CQC, which we show account for the MAR data structure while also being robust to estimation error in intermediary estimands.
| Date of Award | 20 Jan 2026 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | Song Liu (Supervisor) & Henry W J Reeve (Supervisor) |