🎳Intro to Econometrics Unit 9 – Instrumental Variables & Two-Stage LS
Instrumental variables and two-stage least squares are powerful tools for addressing endogeneity in econometric models. These methods help economists estimate causal effects when explanatory variables are correlated with error terms, which can arise from omitted variables, measurement error, or simultaneous causality.
By using valid instruments that are correlated with endogenous variables but uncorrelated with error terms, researchers can isolate exogenous variation and obtain consistent estimates. The two-stage least squares approach implements this strategy, first regressing endogenous variables on instruments, then using predicted values in the main regression.
Instrumental variables (IV) address endogeneity issues in regression models when explanatory variables are correlated with the error term
Endogeneity arises from omitted variables, measurement error, or simultaneous causality leading to biased and inconsistent OLS estimates
Valid instruments are correlated with the endogenous explanatory variable but uncorrelated with the error term
Instruments should be relevant (strongly correlated with the endogenous variable) and exogenous (uncorrelated with the error term, which rules out any direct effect on the dependent variable)
Two-stage least squares (2SLS) is a common method for implementing IV estimation
First stage regresses the endogenous variable on the instrument(s) and other exogenous variables
Second stage uses the predicted values from the first stage in place of the endogenous variable
IV and 2SLS aim to obtain consistent estimates of the causal effect of the explanatory variable on the dependent variable
Weak instruments (low correlation with the endogenous variable) bias IV estimates toward OLS in finite samples and produce large standard errors and unreliable inference
Overidentification occurs when there are more instruments than endogenous variables, which allows the validity of the extra instruments to be tested
Problem of Endogeneity
Endogeneity violates the assumption of zero conditional mean of the error term E[u∣X]=0 required for unbiased and consistent OLS estimates
Omitted variable bias occurs when a variable that affects the dependent variable is excluded from the model and is correlated with an included explanatory variable
Example: estimating the effect of education on earnings without controlling for ability
Classical measurement error in the explanatory variable leads to attenuation bias (bias toward zero) in the OLS estimates
Example: using self-reported income instead of actual income
Simultaneous causality or reverse causality arises when the dependent variable also affects the explanatory variable
Example: estimating the effect of police on crime while crime levels influence police allocation
Endogeneity causes the explanatory variable to be correlated with the error term, leading to biased and inconsistent estimates of the causal effect
IV methods aim to isolate the exogenous variation in the endogenous explanatory variable to obtain consistent estimates
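Two of the sources of endogeneity listed above can be made concrete with a small simulation. This is a sketch under an assumed, hypothetical data-generating process: unobserved ability raises both education and earnings (omitted variable bias), and in a second experiment education is recorded with independent noise (classical measurement error). The names and coefficients are illustrative only.

```python
import random

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(0)
n = 20_000
beta = 0.5  # hypothetical true effect of education on earnings

# Omitted variable: unobserved ability raises both education and earnings.
ability = [random.gauss(0, 1) for _ in range(n)]
educ = [a + random.gauss(0, 1) for a in ability]
earn = [beta * e + a + random.gauss(0, 1) for e, a in zip(educ, ability)]
b_ovb = ols_slope(educ, earn)  # probability limit: 0.5 + Cov(educ, ability) / Var(educ) = 1.0

# Classical measurement error: education is observed with independent noise.
earn_no_ability = [beta * e + random.gauss(0, 1) for e in educ]
educ_noisy = [e + random.gauss(0, 1) for e in educ]
b_attn = ols_slope(educ_noisy, earn_no_ability)  # probability limit: 0.5 * 2 / (2 + 1) = 1/3

print(f"true beta = {beta}, OLS omitting ability = {b_ovb:.3f}, OLS with noisy education = {b_attn:.3f}")
```

In this setup the first OLS slope is pushed upward because education partly proxies for ability, while the second is pulled toward zero because a third of the variance in observed education is noise.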
Instrumental Variables (IV) Explained
Instrumental variables (IV) are used to address endogeneity by finding a source of exogenous variation in the endogenous explanatory variable
An instrument Z is a variable that is correlated with the endogenous explanatory variable X but uncorrelated with the error term u
$\mathrm{Cov}(Z,X) \neq 0$ (relevance condition)
$\mathrm{Cov}(Z,u) = 0$ (exogeneity condition)
The instrument affects the dependent variable Y only through its effect on the endogenous explanatory variable X
Example: using distance to college as an instrument for education when estimating the effect of education on earnings
IV estimation isolates the exogenous variation in X that is uncorrelated with the error term to obtain a consistent estimate of the causal effect
With one instrument and one endogenous regressor, the IV estimator is $\beta_{IV} = \mathrm{Cov}(Z,Y) / \mathrm{Cov}(Z,X)$, whose sample analogue is consistent under the relevance and exogeneity conditions
Multiple instruments can be used for a single endogenous variable to improve efficiency and allow for overidentification tests
The reduced form equation regresses the dependent variable directly on the instrument(s) and other exogenous variables
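The covariance-ratio formula and the reduced form can be checked numerically. The sketch below assumes a hypothetical data-generating process with one instrument z, an endogenous regressor x whose first-stage error v is correlated with the structural error u, and a true effect of 2.0. It computes OLS, the IV ratio, and the ratio of the reduced-form slope to the first-stage slope (indirect least squares), which is algebraically identical to the IV estimate in the just-identified case.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)

random.seed(1)
n = 50_000
beta = 2.0  # hypothetical true causal effect of x on y

z = [random.gauss(0, 1) for _ in range(n)]              # instrument
v = [random.gauss(0, 1) for _ in range(n)]              # first-stage error
u = [0.8 * vi + 0.6 * random.gauss(0, 1) for vi in v]   # structural error, Var(u) = 1, Cov(u, v) = 0.8
x = [0.5 * zi + vi for zi, vi in zip(z, v)]             # endogenous regressor
y = [beta * xi + ui for xi, ui in zip(x, u)]            # outcome

b_ols = cov(x, y) / cov(x, x)   # inconsistent: probability limit 2 + 0.8 / 1.25 = 2.64
b_iv = cov(z, y) / cov(z, x)    # consistent for beta

# Indirect least squares: reduced-form slope divided by first-stage slope.
first_stage = cov(z, x) / cov(z, z)
reduced_form = cov(z, y) / cov(z, z)
b_ils = reduced_form / first_stage

print(f"OLS = {b_ols:.3f}, IV = {b_iv:.3f}, reduced form / first stage = {b_ils:.3f}")
```

The last two numbers coincide because the Cov(z, z) terms cancel, which is why the IV estimate can be read as the reduced-form effect of the instrument scaled by how much the instrument moves x.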
Criteria for Valid Instruments
Relevance: the instrument must be correlated with the endogenous explanatory variable
Weak instruments (low correlation) bias IV estimates toward OLS in finite samples and inflate standard errors
The first-stage F-statistic on the excluded instruments measures their strength, with a common rule of thumb treating F > 10 as indicating a strong instrument
Exogeneity: the instrument must be uncorrelated with the error term in the structural equation
The instrument should not have a direct effect on the dependent variable other than through the endogenous explanatory variable
Overidentifying restrictions tests (e.g., Sargan-Hansen test) can be used to assess the validity of multiple instruments
Exclusion restriction: the instrument affects the dependent variable only through the endogenous explanatory variable and is not correlated with omitted variables that affect the dependent variable
This condition is not directly testable and relies on theoretical justification
Monotonicity: the instrument must shift the endogenous variable in the same direction for every individual (weakly up for all or weakly down for all), which rules out "defiers"
This assumption is required for the interpretation of the local average treatment effect (LATE) in the presence of heterogeneous treatment effects
External validity: the IV estimate may not be generalizable to the entire population if the effect of the endogenous variable varies across individuals
The IV estimate represents the LATE for the subpopulation affected by the instrument (compliers)
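The relevance check can be sketched for the simplest case: one excluded instrument, no other controls, and homoskedastic errors. Under those assumptions the first-stage F-statistic is (R² / 1) / ((1 − R²) / (n − 2)) from the regression of x on a constant and z, which equals the squared t-statistic on z. The function name first_stage_f and the simulated first-stage coefficients (0.5 for a strong instrument, 0.02 for a weak one) are hypothetical.

```python
import random

def first_stage_f(z, x):
    """Homoskedastic first-stage F for one instrument z and no other controls.

    Regress x on a constant and z; F = (R^2 / 1) / ((1 - R^2) / (n - 2)).
    With a single excluded instrument this equals the squared t-statistic on z.
    """
    n = len(z)
    mz, mx = sum(z) / n, sum(x) / n
    szz = sum((a - mz) ** 2 for a in z)
    szx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    sxx = sum((b - mx) ** 2 for b in x)
    r2 = szx ** 2 / (szz * sxx)
    return r2 / ((1 - r2) / (n - 2))

random.seed(2)
n = 1_000
z = [random.gauss(0, 1) for _ in range(n)]
x_strong = [0.5 * zi + random.gauss(0, 1) for zi in z]   # instrument explains about 20% of x
x_weak = [0.02 * zi + random.gauss(0, 1) for zi in z]    # instrument barely moves x

f_strong = first_stage_f(z, x_strong)
f_weak = first_stage_f(z, x_weak)
print(f"first-stage F: strong = {f_strong:.1f}, weak = {f_weak:.2f}")
```

With additional exogenous controls or several instruments, the relevant statistic is the joint F-test on the excluded instruments only, computed from the full first-stage regression.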
Two-Stage Least Squares (2SLS) Method
Two-stage least squares (2SLS) is a common method for implementing IV estimation, whether the endogenous explanatory variable is continuous or binary
The first stage regresses the endogenous explanatory variable X on the instrument(s) Z and other exogenous variables W:
$X = \delta_0 + \delta_1 Z + \delta_2 W + v$
This stage isolates the exogenous variation in X that is uncorrelated with the error term in the structural equation
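The second stage regresses the dependent variable Y on the predicted values of X from the first stage ($\hat{X}$) and other exogenous variables W:
$Y = \beta_0 + \beta_1 \hat{X} + \beta_2 W + e$, where $e = u + \beta_1 (X - \hat{X})$
Because $\hat{X}$ is a linear combination of the exogenous Z and W, it is uncorrelated with u in large samples, and it is orthogonal to the first-stage residual $X - \hat{X}$ by construction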
The 2SLS estimator is consistent and asymptotically normal under the relevance and exogeneity conditions
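Standard errors from running the second stage by hand with OLS are incorrect, because the residuals are computed with $\hat{X}$ instead of the actual X
Dedicated 2SLS routines compute residuals with the actual X; heteroskedasticity-robust (Huber-White sandwich) standard errors or the bootstrap can then be used for inference that is robust to heteroskedasticity
2SLS handles multiple endogenous variables directly, provided there are at least as many excluded instruments as endogenous variables; three-stage least squares (3SLS) extends the approach to estimate a system of simultaneous equations jointly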
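The two stages and the standard-error pitfall can be sketched with plain Python. The example assumes a hypothetical model with one exogenous control w, one excluded instrument z, and true coefficients 0.5, 2.0 and −1.0 in the structural equation; the helpers solve and ols are small illustrative implementations (normal equations with Gauss-Jordan elimination), not a library API. It compares the residual variance computed with the actual x against the one a naive second-stage OLS would use.

```python
import random

def solve(a, b):
    """Solve a @ coef = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [rv - factor * cv for rv, cv in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]

def ols(cols, y):
    """OLS coefficients of y on the regressor columns (include a column of ones for an intercept)."""
    k = len(cols)
    xtx = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)] for i in range(k)]
    xty = [sum(a * b for a, b in zip(cols[i], y)) for i in range(k)]
    return solve(xtx, xty)

random.seed(3)
n = 20_000
ones = [1.0] * n
w = [random.gauss(0, 1) for _ in range(n)]              # exogenous control
z = [random.gauss(0, 1) for _ in range(n)]              # excluded instrument
v = [random.gauss(0, 1) for _ in range(n)]
u = [0.8 * vi + 0.6 * random.gauss(0, 1) for vi in v]   # structural error correlated with v
x = [1.0 + 0.6 * zi + 0.5 * wi + vi for zi, wi, vi in zip(z, w, v)]
y = [0.5 + 2.0 * xi - 1.0 * wi + ui for xi, wi, ui in zip(x, w, u)]

# First stage: x on a constant, z and w; keep the fitted values.
d = ols([ones, z, w], x)
x_hat = [d[0] + d[1] * zi + d[2] * wi for zi, wi in zip(z, w)]

# Second stage: y on a constant, x_hat and w.
b = ols([ones, x_hat, w], y)

# Correct 2SLS residuals plug the coefficients into the structural equation with the actual x.
resid_correct = [yi - b[0] - b[1] * xi - b[2] * wi for yi, xi, wi in zip(y, x, w)]
resid_naive = [yi - b[0] - b[1] * xh - b[2] * wi for yi, xh, wi in zip(y, x_hat, w)]
sigma2_correct = sum(e * e for e in resid_correct) / (n - 3)
sigma2_naive = sum(e * e for e in resid_naive) / (n - 3)

print(f"2SLS coefficients: {[round(c, 3) for c in b]}")
print(f"residual variance: correct = {sigma2_correct:.3f}, naive second-stage = {sigma2_naive:.3f}")
```

The coefficients are the same either way; only the residuals differ. Here the naive residual equals u + β1(X − X̂), so its variance is far larger than Var(u), and any standard error built from it would be wrong.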
Implementing IV and 2SLS
Identify the endogenous explanatory variable(s) and potential instruments based on theoretical considerations and institutional knowledge
Check the relevance condition by regressing the endogenous variable on the instrument(s) and testing for significance (first-stage F-statistic)
If the instruments are weak, consider finding stronger instruments or using alternative methods (e.g., limited information maximum likelihood)
Assess the exogeneity condition using overidentification tests if there are more instruments than endogenous variables
Sargan-Hansen J-test for overidentifying restrictions
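Failure to reject the null hypothesis is consistent with valid instruments but does not prove validity, since the test has little power when all instruments are invalid in a similar way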
Estimate the first-stage regression and obtain the predicted values of the endogenous variable
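Estimate the second-stage regression using the predicted values from the first stage in place of the endogenous variable, or use a dedicated 2SLS routine so that the standard errors are computed correctly
Interpret the IV estimates as the local average treatment effect (LATE) for the subpopulation affected by the instrument (compliers) when treatment effects are heterogeneous and monotonicity holds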
The LATE may differ from the average treatment effect (ATE) if the effect of the endogenous variable varies across individuals
Report the first-stage F-statistic, overidentification test results, and adjusted standard errors in addition to the IV estimates
Conduct robustness checks using alternative instruments, subsamples, or estimation methods to assess the sensitivity of the results
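The overidentification step can be sketched with the homoskedastic Sargan statistic: regress the 2SLS residuals on all instruments and compute n·R², which is approximately chi-square with (number of instruments minus number of endogenous regressors) degrees of freedom under the null; Hansen's J is the heteroskedasticity-robust counterpart. The data are hypothetical: two instruments for one endogenous x, and in the second scenario z2 is given a direct effect on y, violating the exclusion restriction. The helpers solve, ols, fitted, sargan and make_data are illustrative, not a library API.

```python
import math
import random

def solve(a, b):
    """Solve a @ coef = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [rv - factor * cv for rv, cv in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]

def ols(cols, y):
    """OLS coefficients of y on the given regressor columns."""
    k = len(cols)
    xtx = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)] for i in range(k)]
    xty = [sum(a * b for a, b in zip(cols[i], y)) for i in range(k)]
    return solve(xtx, xty)

def fitted(coef, cols):
    """Fitted values sum_j coef[j] * cols[j][i] for each observation i."""
    return [sum(c * col[i] for c, col in zip(coef, cols)) for i in range(len(cols[0]))]

def sargan(y, x, instruments):
    """Homoskedastic Sargan statistic n * R^2 for one endogenous regressor x."""
    n = len(y)
    z_cols = [[1.0] * n] + instruments
    x_hat = fitted(ols(z_cols, x), z_cols)                # first stage
    b0, b1 = ols([[1.0] * n, x_hat], y)                   # second stage
    resid = [yi - b0 - b1 * xi for yi, xi in zip(y, x)]   # 2SLS residuals use the actual x
    r_fit = fitted(ols(z_cols, resid), z_cols)            # auxiliary regression on all instruments
    mean_r = sum(resid) / n
    ssr = sum((e - f) ** 2 for e, f in zip(resid, r_fit))
    sst = sum((e - mean_r) ** 2 for e in resid)
    df = len(instruments) - 1                             # number of overidentifying restrictions
    return n * (1 - ssr / sst), df

def make_data(direct_effect, n=5_000, seed=5):
    """Hypothetical data; z2 affects y directly when direct_effect != 0."""
    rng = random.Random(seed)
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) for _ in range(n)]
    v = [rng.gauss(0, 1) for _ in range(n)]
    u = [0.8 * vi + 0.6 * rng.gauss(0, 1) for vi in v]
    x = [0.5 * a + 0.5 * b + vi for a, b, vi in zip(z1, z2, v)]
    y = [2.0 * xi + direct_effect * b + ui for xi, b, ui in zip(x, z2, u)]
    return y, x, [z1, z2]

stat_valid, df = sargan(*make_data(0.0))
stat_invalid, _ = sargan(*make_data(0.3))
for label, stat in [("valid instruments", stat_valid), ("z2 violates exclusion", stat_invalid)]:
    p = math.erfc(math.sqrt(stat / 2))  # chi-square upper tail; this formula holds only for df == 1
    print(f"{label}: Sargan = {stat:.2f}, df = {df}, p = {p:.4f}")
```

A large statistic (small p-value) signals that the instruments disagree with each other; as noted above, a small one does not by itself establish that they are valid.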
Limitations and Challenges
Finding valid instruments that satisfy the relevance and exogeneity conditions can be difficult in practice
Instruments that are theoretically justified may be only weakly correlated with the endogenous variable, leading to biased estimates
Instruments that are strongly correlated with the endogenous variable may also have direct effects on the dependent variable, violating the exclusion restriction
Weak instruments can lead to biased IV estimates, large standard errors, and incorrect inference
In finite samples, the bias of 2SLS toward OLS is approximately inversely proportional to the first-stage F-statistic, so weaker instruments mean larger bias
Weak-instrument-robust methods (e.g., the Anderson-Rubin test) can be used to construct confidence sets that remain valid even when the instruments are weak
The LATE interpretation of the IV estimate may not be generalizable to the entire population if the effect of the endogenous variable varies across individuals
The IV estimate represents the effect for the subpopulation affected by the instrument (compliers) which may differ from the average treatment effect (ATE)
IV estimation is less efficient than OLS, and the loss of precision is severe when the instruments are weak or the sample size is small
The standard errors of the IV estimator are larger than those of the OLS estimator in the absence of endogeneity
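Classical measurement error in the instrument does not by itself make IV inconsistent, because the mismeasured instrument is still uncorrelated with the error term, but it weakens the first stage and worsens weak-instrument problems
Measurement error that is correlated with the structural error violates the exogeneity condition and biases the IV estimates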
IV estimation relies on strong assumptions (relevance, exogeneity, exclusion restriction, monotonicity); only relevance can be checked directly, and the others may be violated in practice
Sensitivity analysis using alternative instruments and estimation methods can help assess the robustness of the results
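The weak-instrument bias described above can be seen in a small Monte Carlo experiment. The sketch assumes a hypothetical just-identified model with true effect 1.0, corr(u, v) = 0.8, n = 500 and 200 replications, and compares a strong first stage (coefficient 1.0) with a weak one (coefficient 0.02). It reports medians because the just-identified IV estimator has no finite mean.

```python
import random
import statistics

def iv_estimate(z, x, y):
    """Just-identified IV estimate Cov(z, y) / Cov(z, x)."""
    n = len(z)
    mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
    szy = sum((a - mz) * (c - my) for a, c in zip(z, y))
    szx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    return szy / szx

def simulate(pi, n=500, reps=200, beta=1.0, seed=0):
    """IV estimates across Monte Carlo replications for first-stage coefficient pi."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        z = [rng.gauss(0, 1) for _ in range(n)]
        v = [rng.gauss(0, 1) for _ in range(n)]
        u = [0.8 * vi + 0.6 * rng.gauss(0, 1) for vi in v]   # corr(u, v) = 0.8
        x = [pi * zi + vi for zi, vi in zip(z, v)]
        y = [beta * xi + ui for xi, ui in zip(x, u)]
        estimates.append(iv_estimate(z, x, y))
    return estimates

med_strong = statistics.median(simulate(pi=1.0))
med_weak = statistics.median(simulate(pi=0.02))
ols_plim_weak = 1.0 + 0.8 / (1 + 0.02 ** 2)  # probability limit of OLS in the weak design

print(f"median IV estimate: strong = {med_strong:.3f}, weak = {med_weak:.3f} (OLS limit {ols_plim_weak:.3f})")
```

With the strong instrument the median sits essentially at the true value; with the weak one it is dragged most of the way toward the inconsistent OLS limit, which is the finite-sample bias the first-stage F-statistic is meant to flag.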
Real-World Applications
Estimating the returns to education using compulsory schooling laws or distance to college as instruments for educational attainment
Addresses the endogeneity of education due to omitted variables (e.g., ability) or measurement error
Evaluating the effect of military service on earnings using the Vietnam War draft lottery as an instrument for veteran status
Exploits the random assignment of draft eligibility based on birth dates to identify the causal effect of military service
Assessing the impact of air pollution on health outcomes using wind direction or traffic congestion as instruments for pollutant concentrations
Addresses the endogeneity of pollution due to omitted variables (e.g., industrial activity) or measurement error
Estimating the effect of immigration on native wages using historical settlement patterns or policy changes as instruments for immigrant inflows
Deals with the endogeneity of immigration due to self-selection or reverse causality (e.g., immigrants may be attracted to areas with higher wages)
Analyzing the impact of financial development on economic growth using legal origins or geographic characteristics as instruments for financial institutions
Addresses the endogeneity of finance due to omitted variables (e.g., institutional quality) or reverse causality (e.g., growth may lead to financial development)
Evaluating the effect of health insurance on healthcare utilization using Medicaid eligibility rules or employer-provided coverage as instruments for insurance status
Deals with the endogeneity of insurance due to self-selection or omitted variables (e.g., health status)
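Designs like the draft lottery use a binary, randomly assigned instrument. In that case the IV estimator reduces to the Wald estimator: the difference in mean outcomes between Z = 1 and Z = 0 divided by the difference in treatment rates. The simulation below is a hypothetical illustration of the LATE interpretation, assuming 20% always-takers, 30% never-takers, 50% compliers, no defiers, and type-specific average effects of 5.0, 4.0 and 2.0. It compares the Wald estimate with the true complier effect, the population ATE, and a naive treated-versus-untreated comparison.

```python
import random

random.seed(4)
n = 200_000
y_obs, d_obs, z_obs, effects, is_complier = [], [], [], [], []

for _ in range(n):
    draw = random.random()
    kind = "always" if draw < 0.2 else ("never" if draw < 0.5 else "complier")  # no defiers
    z = int(random.random() < 0.5)  # randomly assigned binary instrument
    if kind == "always":
        d, y0, effect = 1, random.gauss(3.0, 1.0), random.gauss(5.0, 1.0)
    elif kind == "never":
        d, y0, effect = 0, random.gauss(0.0, 1.0), random.gauss(4.0, 1.0)
    else:
        d, y0, effect = z, random.gauss(1.0, 1.0), random.gauss(2.0, 1.0)
    y_obs.append(y0 + effect * d)
    d_obs.append(d)
    z_obs.append(z)
    effects.append(effect)
    is_complier.append(kind == "complier")

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Wald estimator: reduced-form difference divided by first-stage difference.
reduced_form = mean(y for y, z in zip(y_obs, z_obs) if z) - mean(y for y, z in zip(y_obs, z_obs) if not z)
first_stage = mean(d for d, z in zip(d_obs, z_obs) if z) - mean(d for d, z in zip(d_obs, z_obs) if not z)
wald = reduced_form / first_stage

late = mean(e for e, c in zip(effects, is_complier) if c)  # average effect among compliers
ate = mean(effects)                                        # average effect in the whole population
naive = mean(y for y, d in zip(y_obs, d_obs) if d) - mean(y for y, d in zip(y_obs, d_obs) if not d)

print(f"Wald = {wald:.3f}, complier effect = {late:.3f}, ATE = {ate:.3f}, naive = {naive:.3f}")
```

Under these assumptions the Wald estimate tracks the complier effect rather than the ATE, and the naive comparison is inflated because always-takers have higher baseline outcomes.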