Using linked health and administrative data to reduce bias due to missing data and measurement error in observational research

Student thesis: Doctoral Thesis, Doctor of Philosophy (PhD)


Missing data and measurement error are common problems in epidemiological studies. Missing data will lead to a loss of power and can result in bias. A complete case analysis, which uses only observations with fully observed data, will generally produce a biased estimate of the exposure-outcome association if the missingness mechanism depends on the outcome of interest. Misclassification – measurement error in a categorical variable – will always bias exposure-outcome estimates.
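The claim about complete case analysis can be illustrated with a small simulation. The following is a minimal sketch, not the thesis's own simulation design: the linear outcome model, the true slope of 0.5, the logistic missingness models and all function names are assumptions chosen purely for illustration. It compares the slope estimated from the full data with complete-case slopes when the probability of being observed depends on the outcome and, for contrast, when it depends only on the exposure.

```python
import math
import random


def ols_slope(pairs):
    """Ordinary least squares slope of y on x for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    return sxy / sxx


def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))


def simulate(n=50_000, true_slope=0.5, seed=2019):
    """Hypothetical data-generating model: y = true_slope * x + noise."""
    rng = random.Random(seed)
    full, cc_outcome, cc_exposure = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = true_slope * x + rng.gauss(0, 1)
        full.append((x, y))
        # Assumed mechanism 1: probability of being observed depends on the outcome.
        if rng.random() < expit(-2 * y):
            cc_outcome.append((x, y))
        # Assumed mechanism 2: probability of being observed depends on the exposure only.
        if rng.random() < expit(-2 * x):
            cc_exposure.append((x, y))
    return {
        "full": ols_slope(full),
        "cc_outcome_dependent": ols_slope(cc_outcome),
        "cc_exposure_dependent": ols_slope(cc_exposure),
    }


results = simulate()
for name, slope in results.items():
    print(f"{name}: {slope:.3f}")
```

Under these assumptions the outcome-dependent complete-case slope is noticeably attenuated relative to the true value, whereas the exposure-dependent complete-case slope stays close to it, at the cost of a smaller sample.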
I use data from the Avon Longitudinal Study of Parents and Children to examine the impact of missingness and misclassification on exposure-outcome estimates by studying three epidemiological questions. I use proxies obtained via linkage (i) to examine the missing data mechanism; (ii) as auxiliary variables in inverse probability weighting (IPW), multiple imputation (MI) and full information maximum likelihood (FIML) models; and (iii) to correct for misclassification. I use simulations to evaluate bias and efficiency of these methods under a range of conditions.
I show that linked proxies can be used to establish a set of plausible missingness mechanisms and thus help identify an appropriate analysis strategy. Through simulations I demonstrate that, when the complete case analysis is biased, inclusion of proxies in MI (and FIML for a continuous outcome) will lead to reductions in bias and increases in efficiency provided the proxies are reasonably well correlated with the missing study variable. IPW may not always reduce bias and will lead to reduced precision if the proxies are also incomplete. Further, I find that MI provides a flexible way to simultaneously address missing data and misclassification and show that bias due to misclassification (in a binary exposure) is reduced even when the gold standard is missing not at random.
I provide guidance on how to approach missing data and misclassification problems when proxies are available through linkage to external datasets.
Date of Award: 7 May 2019
Original language: English
Awarding Institution: The University of Bristol
Sponsors: Medical Research Council
Supervisors: Kate M Tilling & John A A Macleod
