
Two-sample instrumental variable analyses using heterogeneous samples

Research output: Contribution to journal (Article)

  • Qingyuan Zhao
  • Jingshu Wang
  • Wes Spiller
  • Jack Bowden
  • Dylan S. Small
Original language: English
Pages (from-to): 317-333
Number of pages: 17
Journal: Statistical Science
Issue number: 2
Date accepted/in press: 19 Dec 2018
Date published (current): 19 Jul 2019


Instrumental variable analysis is a widely used method to estimate causal effects in the presence of unmeasured confounding. When the instruments, exposure and outcome are not measured in the same sample, Angrist and Krueger (1992) suggested using two-sample instrumental variable (TSIV) estimators, which combine sample moments from an instrument-exposure sample and an instrument-outcome sample. However, this method is biased if the two samples are from heterogeneous populations, so that the distributions of the instruments differ. In linear structural equation models, we derive a new class of TSIV estimators that are robust to heterogeneous samples under the key assumption that the structural relations in the two samples are the same. The widely used two-sample two-stage least squares estimator belongs to this class. It is generally not asymptotically efficient, although we find that it performs similarly to the optimal TSIV estimator in most practical situations. We then attempt to relax the linearity assumption. We find that, unlike in one-sample analyses, the TSIV estimator is not robust to a misspecified exposure model. Additionally, to nonparametrically identify the magnitude of the causal effect, the noise in the exposure must have the same distribution in the two samples. However, this assumption is in general untestable because the exposure is not observed in one sample. Nonetheless, we may still identify the sign of the causal effect in the absence of homogeneity of the noise.
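
To illustrate the two-sample two-stage least squares estimator mentioned in the abstract, here is a minimal Python sketch. It is not the authors' implementation. The function names `ts2sls` and `simulate`, the data-generating process and every parameter value are assumptions chosen for illustration. The simulation follows the setting described above: the two samples share the same linear structural relations, but their instrument distributions differ.

```python
import numpy as np


def ts2sls(z_a, x_a, z_b, y_b):
    """Point estimate of the causal effect via two-sample 2SLS (illustrative)."""
    # Design matrices with an intercept column.
    za = np.column_stack([np.ones(len(z_a)), z_a])
    zb = np.column_stack([np.ones(len(z_b)), z_b])
    # First stage, sample A: regress the exposure on the instruments.
    gamma, *_ = np.linalg.lstsq(za, x_a, rcond=None)
    # Impute the unobserved exposure in sample B from its instruments.
    x_hat_b = zb @ gamma
    # Second stage, sample B: regress the outcome on the imputed exposure.
    design = np.column_stack([np.ones(len(x_hat_b)), x_hat_b])
    coef, *_ = np.linalg.lstsq(design, y_b, rcond=None)
    return coef[1]


def simulate(n, z_mean, z_sd, beta, rng):
    """Hypothetical linear structural model with an unmeasured confounder u."""
    z = rng.normal(z_mean, z_sd, size=(n, 3))
    u = rng.normal(size=n)  # confounds both exposure and outcome
    x = z @ np.array([0.5, 0.3, 0.2]) + u + rng.normal(size=n)
    y = beta * x + u + rng.normal(size=n)
    return z, x, y


rng = np.random.default_rng(0)
beta_true = 0.5
# Sample A supplies (instruments, exposure); sample B supplies (instruments, outcome).
# The instrument distributions differ between samples; the structural model does not.
z_a, x_a, _ = simulate(20_000, 0.0, 1.0, beta_true, rng)
z_b, _, y_b = simulate(20_000, 1.0, 2.0, beta_true, rng)
estimate = ts2sls(z_a, x_a, z_b, y_b)
print(f"TS2SLS estimate: {estimate:.3f} (true effect {beta_true})")
```

The sketch returns only a point estimate. Valid standard errors for two-sample estimators must account for first-stage sampling variability, which is not attempted here.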




  • Full-text PDF. Final published version.

    Rights statement: This is the final published version of the article (version of record). It first appeared online via Project Euclid at 10.1214/18-STS692. Please refer to any applicable terms of use of the publisher.

    Final published version, 419 KB, PDF document

    Licence: Other


