Abstract
Auxiliary variables are used in multiple imputation (MI) to reduce bias and increase efficiency. These variables are often themselves incomplete. We explored how missing data in auxiliary variables influenced estimates obtained from MI. We implemented a simulation study with three different missing data mechanisms for the outcome. We then examined the impact of increasing proportions of missing data, and of different missingness mechanisms, in the auxiliary variable on the bias of an unadjusted linear regression coefficient and on the fraction of missing information. We illustrated our findings with an applied example in the Avon Longitudinal Study of Parents and Children. We found that where complete records analyses were biased, increasing proportions of missing data in auxiliary variables, under any missing data mechanism, reduced the ability of MI including the auxiliary variable to mitigate this bias. Where there was no bias in the complete records analysis, inclusion of a missing not at random auxiliary variable in MI introduced bias of potentially important magnitude (up to 17% of the effect size in our simulation). The quantity and nature of missing data in auxiliary variables need careful consideration when selecting them for use in MI models.
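The abstract reports the fraction of missing information without saying how it was computed. As a minimal sketch, assuming estimates are pooled with the standard Rubin's rules, the following Python function combines a regression coefficient estimated in each of m imputed datasets and returns the pooled estimate, its total variance, and the fraction of missing information. The function name `pool_rubin` and the input values are hypothetical illustrations, not taken from the study.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates with Rubin's rules (illustrative sketch).

    estimates: coefficient estimated in each imputed dataset
    variances: squared standard error of that coefficient in each dataset
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates, each with a variance")

    q_bar = mean(estimates)    # pooled point estimate
    w = mean(variances)        # within-imputation variance
    b = variance(estimates)    # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b    # total variance of the pooled estimate

    # Relative increase in variance due to missing data.
    r = (1 + 1 / m) * b / w
    # Rubin's degrees of freedom; infinite when the imputations agree exactly.
    df = (m - 1) * (1 + 1 / r) ** 2 if r > 0 else float("inf")
    # Fraction of missing information, with the finite-m adjustment.
    fmi = (r + 2 / (df + 3)) / (1 + r)

    return {"estimate": q_bar, "total_variance": t, "fmi": fmi}


# Hypothetical values for m = 5 imputed datasets (not results from the paper).
hypothetical_estimates = [0.50, 0.54, 0.46, 0.52, 0.48]
hypothetical_variances = [0.01, 0.01, 0.01, 0.01, 0.01]

pooled = pool_rubin(hypothetical_estimates, hypothetical_variances)
print(pooled)
```

A larger between-imputation variance relative to the within-imputation variance gives a larger fraction of missing information, which is why this quantity is a natural way to track how much an incomplete auxiliary variable still contributes to the imputation model.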
Original language | English |
---|---|
Article number | kwae306 |
Journal | American Journal of Epidemiology |
Early online date | 27 Aug 2024 |
Publication status | E-pub ahead of print - 27 Aug 2024 |
Research Groups and Themes
- ALSPAC
Keywords
- Auxiliary variables
- Bias
- Missing data
- Multiple imputation
- Simulation
Fingerprint
Dive into the research topics of 'Analyses using multiple imputation need to consider missing data in auxiliary variables'. Together they form a unique fingerprint.
Equipment
- HPC (High Performance Computing) and HTC (High Throughput Computing) Facilities
Alam, S. R. (Manager), Eccleston, P. E. (Other), Williams, D. A. G. (Manager) & Atack, S. H. (Other)
Facility/equipment: Facility