Analyses using multiple imputation need to consider missing data in auxiliary variables

Research output: Contribution to journal › Article (Academic Journal) › peer-review

Abstract

Auxiliary variables are used in multiple imputation (MI) to reduce bias and increase efficiency. These variables may often themselves be incomplete. We explored how missing data in auxiliary variables influenced estimates obtained from MI. We implemented a simulation study with three different missing data mechanisms for the outcome. We then examined the impact of increasing proportions of missing data and different missingness mechanisms for the auxiliary variable on bias of an unadjusted linear regression coefficient and the fraction of missing information. We illustrate our findings with an applied example in the Avon Longitudinal Study of Parents and Children. We found that where complete records analyses were biased, increasing proportions of missing data in auxiliary variables, under any missing data mechanism, reduced the ability of MI including the auxiliary variable to mitigate this bias. Where there was no bias in the complete records analysis, inclusion of a missing not at random auxiliary variable in MI introduced bias of potentially important magnitude (up to 17% of the effect size in our simulation). Careful consideration of the quantity and nature of missing data in auxiliary variables needs to be made when selecting them for use in MI models.
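The abstract reports the fraction of missing information (FMI) without stating how it is computed. As a rough illustration only, the sketch below pools a regression coefficient across imputed datasets with the standard Rubin's rules and derives the FMI from the within- and between-imputation variances. The function name `pool_rubin` and the example numbers are hypothetical and are not taken from the article; the authors' own software and settings may differ.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets (Rubin's rules).

    estimates: coefficient estimate from each imputed dataset
    variances: squared standard error of that estimate in each dataset
    Hypothetical helper for illustration; not the article's code.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = mean(estimates)      # pooled point estimate
    u_bar = mean(variances)      # within-imputation variance
    b = variance(estimates)      # between-imputation variance (divisor m - 1)
    total_var = u_bar + (1 + 1 / m) * b

    if b == 0:
        # No between-imputation variability: nothing is attributed to missingness.
        return {"estimate": q_bar, "total_var": total_var, "fmi": 0.0}

    r = (1 + 1 / m) * b / u_bar      # relative increase in variance
    df = (m - 1) * (1 + 1 / r) ** 2  # large-sample degrees of freedom
    fmi = (r + 2 / (df + 3)) / (r + 1)
    return {"estimate": q_bar, "total_var": total_var, "fmi": fmi}


# Illustrative inputs only: five imputations of one regression coefficient.
pooled = pool_rubin(
    estimates=[0.48, 0.52, 0.50, 0.47, 0.53],
    variances=[0.01, 0.01, 0.01, 0.01, 0.01],
)
print(pooled)
```

A larger FMI means that more of the total variance of the pooled estimate is attributable to the missing data. In the setting the abstract describes, an auxiliary variable that is itself heavily incomplete would be expected to reduce the FMI less than a fully observed one.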
Original language: English
Article number: kwae306
Journal: American Journal of Epidemiology
Early online date: 27 Aug 2024
Publication status: E-pub ahead of print - 27 Aug 2024

Research Groups and Themes

  • ALSPAC

Keywords

  • Auxiliary variables
  • Bias
  • Missing data
  • Multiple imputation
  • Simulation
