Abstract
In this thesis, we consider the problem of instrumental variable (IV) selection when a large number of candidate instruments is available. We allow some of these candidates to be invalid, in the sense that they violate the exclusion restriction and enter the model directly as explanatory variables. We propose three methods for selecting the valid IVs from the candidates.
The first is the Confidence Interval (CI) method. It selects as valid the largest group of instruments whose instrument-specific causal estimates have mutually overlapping confidence intervals. It achieves consistent IV selection under the plurality rule, which assumes that the valid instruments form the largest group, where instruments belong to the same group if their instrument-specific estimators converge to the same value. We apply this method to estimate the effect of Body Mass Index (BMI) on diastolic blood pressure, using 96 single-nucleotide polymorphisms (SNPs) as candidate instruments.
The second method is an adaptive Lasso IV selection method, which contributes to the literature by allowing for two endogenous regressors. Under the assumption that the number of invalid instruments is smaller than half of the total number of candidate instruments minus one, we develop a median-of-medians estimator that is $\sqrt{n}$-consistent for the causal effects. With penalty weights based on this median-of-medians estimator, the adaptive Lasso selects the valid instruments consistently. We apply this method to estimate the direct effects of educational attainment and cognitive ability on BMI.
The third method combines agglomerative hierarchical clustering (AHC), a commonly used statistical learning method for cluster analysis, with a downward testing procedure based on the Sargan-Hansen test for overidentifying restrictions. Under the plurality rule, the AHC method selects the valid instruments consistently.
The main advantages of this method are that it performs well in the presence of weak instruments, that it can be extended to allow for multiple endogenous regressors, and that it can be used to detect potentially heterogeneous causal effects. We apply it to estimate the short- and long-term effects of immigration on wages in the US labor market.
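The selection step of the CI method can be illustrated with a minimal sketch. It assumes the instrument-specific estimates and their standard errors have already been computed and that a single fixed critical value `crit` is used; the function name `ci_select` is hypothetical, and the procedure in the thesis may differ in details such as how the critical value is chosen. The sketch relies on the fact that intervals on the real line that pairwise overlap share a common point, so the largest mutually overlapping group is the largest set of intervals containing some interval's lower endpoint.

```python
from typing import List, Sequence


def ci_select(estimates: Sequence[float], std_errors: Sequence[float],
              crit: float = 1.96) -> List[int]:
    """Indices of the largest group of instruments whose CIs mutually overlap.

    Hypothetical sketch: one fixed critical value `crit`; if several groups
    have the same maximal size, the first one found is returned.
    """
    if len(estimates) != len(std_errors):
        raise ValueError("estimates and std_errors must have the same length")
    # Instrument-specific confidence intervals b_j +/- crit * se_j.
    intervals = [(b - crit * s, b + crit * s)
                 for b, s in zip(estimates, std_errors)]
    # If closed intervals pairwise overlap, their intersection is
    # [max lower, min upper], so the largest lower endpoint lies in all of
    # them. Checking every lower endpoint therefore finds the largest group.
    best: List[int] = []
    for point, _ in intervals:
        group = [j for j, (lo, hi) in enumerate(intervals) if lo <= point <= hi]
        if len(group) > len(best):
            best = group
    return best
```

For example, with estimates `[1.0, 1.1, 0.9, 3.0, 3.1]` and standard errors of `0.1`, the first three intervals share a common point and form the largest group, so the sketch returns `[0, 1, 2]`. The double loop is quadratic in the number of candidates, which is cheap for a candidate set of the size used in the BMI application.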
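The two ingredients of the AHC method, clustering followed by downward testing, can be sketched in deliberately simplified form. The sketch assumes the just-identified (instrument-specific) estimates are already available and clusters them with Ward linkage, which is an assumed choice of linkage here. It then walks down the tree from one cluster towards singletons, tests the largest cluster at each level, and stops at the first one that is not rejected. The names `ward_path` and `ahc_select` are hypothetical, and `passes_overid_test` is a placeholder callable standing in for the Sargan-Hansen test, which requires the full data and is not implemented here.

```python
from typing import Callable, List, Sequence


def ward_path(estimates: Sequence[float]) -> List[List[List[int]]]:
    """Agglomerative clustering of 1-D estimates with Ward linkage.

    Returns the partition at every level, from J singletons down to one cluster.
    """
    clusters: List[List[int]] = [[j] for j in range(len(estimates))]
    path = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        best_cost, best_pair = float("inf"), (0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                ma = sum(estimates[j] for j in clusters[a]) / na
                mb = sum(estimates[j] for j in clusters[b]) / nb
                # Ward criterion: increase in the within-cluster sum of squares.
                cost = na * nb / (na + nb) * (ma - mb) ** 2
                if cost < best_cost:
                    best_cost, best_pair = cost, (a, b)
        a, b = best_pair
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
        path.append([list(c) for c in clusters])
    return path


def ahc_select(estimates: Sequence[float],
               passes_overid_test: Callable[[List[int]], bool],
               min_size: int = 2) -> List[int]:
    """Downward testing along the clustering path.

    `passes_overid_test` is a hypothetical stand-in for a Sargan-Hansen test
    that returns True when the test does NOT reject for the given instruments.
    """
    if not estimates:
        return []
    # Start from one cluster (all instruments) and move towards singletons.
    for partition in reversed(ward_path(estimates)):
        largest = sorted(max(partition, key=len))
        if len(largest) < min_size:
            break  # too few instruments left to test overidentifying restrictions
        if passes_overid_test(largest):
            return largest
    return []
```

As a toy usage example, with estimates `[1.0, 1.05, 0.95, 1.02, 3.0, 3.1, 5.0]` and a stand-in test that accepts a group whose estimates span less than `0.5`, the single cluster containing all seven instruments is rejected, and at the two-cluster level the largest cluster `[0, 1, 2, 3]` is accepted and returned. A real application would replace the stand-in with a Sargan-Hansen test computed from the data.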
Date of Award | 12 May 2022 |
---|---|
Original language | English |
Awarding Institution | |
Supervisor | Frank Windmeijer, Senay Sokullu & Sami Stouli |
Keywords
- Instrumental variables;
- Invalid instruments;
- IV Selection;
- Model Selection;
- Causal inference