solutions to endogeneity: try to say something definite about the likely biases of OLS, include proxies for omitted variables, use RCTs/natural experiments, instrumental variables
No omitted variable bias if either the omitted variable is uncorrelated with the included regressors or the omitted variable is irrelevant to the determination of Y
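A minimal simulation sketch of the two no-bias cases, using only the standard library. The data-generating process (Y = 1 + 2X + gamma*W + e, with X = rho*W + v) and all names here are illustrative assumptions, not taken from the notes.

```python
import random

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def short_regression_slope(rho, gamma, n=20000, seed=0):
    """Regress Y on X alone while W is omitted.

    rho   : how strongly X loads on the omitted variable W (assumed DGP)
    gamma : effect of W on Y (gamma = 0 means W is irrelevant)
    """
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(n)]
    x = [rho * wi + rng.gauss(0, 1) for wi in w]
    y = [1 + 2 * xi + gamma * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]
    return ols_slope(x, y)

# W correlated with X and relevant: slope drifts to 2 + 3 * 0.5 / 1.25 = 3.2
slope_biased = short_regression_slope(rho=0.5, gamma=3.0)
# W uncorrelated with X: slope stays near the true value 2
slope_uncorrelated = short_regression_slope(rho=0.0, gamma=3.0)
# W irrelevant to Y: slope stays near the true value 2
slope_irrelevant = short_regression_slope(rho=0.5, gamma=0.0)
```

In large samples the short-regression slope is roughly the true coefficient plus gamma * cov(X, W) / var(X), so the bias vanishes when either factor is zero.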
Proxying for omitted variables typically motivates the inclusion of demographic variables like family characteristics, and race and region dummies etc.
coefficients on proxies cannot be given a causal interpretation, as proxies are not included in the causal model
under classical measurement error, the coefficient on a mismeasured regressor is subject to attenuation bias (biased towards 0); coefficients on other regressors may be biased in either direction
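A small sketch of attenuation bias under classical measurement error (noise independent of the true regressor and of the error term). The true slope of 2 and the equal signal and noise variances are assumptions chosen for illustration.

```python
import random

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(1)
n = 20000
x_true = [rng.gauss(0, 1) for _ in range(n)]           # true regressor, variance 1
x_obs = [xi + rng.gauss(0, 1) for xi in x_true]        # observed with noise, variance 1
y = [2.0 * xi + rng.gauss(0, 1) for xi in x_true]      # assumed true slope of 2

slope_true_x = ols_slope(x_true, y)   # close to 2
slope_noisy_x = ols_slope(x_obs, y)   # shrunk towards 0: about 2 * 1 / (1 + 1) = 1
```

The shrinkage factor is var(true X) / (var(true X) + var(noise)), so noisier measurement pulls the estimate closer to zero.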
RCT characterised by: a sample of individuals who have elected (or been compelled) to participate, with assignment of Xi for the individuals in that sample under the control of the researcher
actual treatment equals assigned treatment if there is perfect compliance
RCTs make X and u independent by construction
Limitation of RCTs: needs to be feasible and ethical, so only possible for some kinds of 'treatment'
if treatment is binary, the population linear regression coefficient on treatment is the difference between the treated and untreated population group means
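The sample analogue of this result can be checked directly: with a binary regressor, the OLS slope equals the difference in sample group means exactly. A short sketch, with a made-up treatment effect of 2:

```python
import random

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(2)
n = 1000
d = [rng.randint(0, 1) for _ in range(n)]             # randomly assigned binary treatment
y = [1.0 + 2.0 * di + rng.gauss(0, 1) for di in d]    # assumed treatment effect of 2

treated = [yi for yi, di in zip(y, d) if di == 1]
control = [yi for yi, di in zip(y, d) if di == 0]
diff_in_means = sum(treated) / len(treated) - sum(control) / len(control)

slope = ols_slope(d, y)   # matches diff_in_means up to floating-point error
```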
control variables used for RCTs must be pre-treatment characteristics
can test for balancedness: the treated and the untreated should have similar pre-treatment characteristics
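One simple way to check balance is a two-sample t statistic on a pre-treatment covariate. The sketch below uses a hypothetical "age" covariate and compares random assignment with a case where older people select into treatment; the names and numbers are illustrative assumptions.

```python
import math
import random

def balance_t_stat(covariate, treated):
    """Unequal-variance t statistic for the treated-minus-untreated mean gap."""
    g1 = [c for c, t in zip(covariate, treated) if t == 1]
    g0 = [c for c, t in zip(covariate, treated) if t == 0]
    m1, m0 = sum(g1) / len(g1), sum(g0) / len(g0)
    v1 = sum((c - m1) ** 2 for c in g1) / (len(g1) - 1)
    v0 = sum((c - m0) ** 2 for c in g0) / (len(g0) - 1)
    return (m1 - m0) / math.sqrt(v1 / len(g1) + v0 / len(g0))

rng = random.Random(3)
n = 2000
age = [rng.gauss(40, 10) for _ in range(n)]   # hypothetical pre-treatment covariate

# Random assignment: treatment is unrelated to age, so the groups should look alike
random_assignment = [rng.randint(0, 1) for _ in range(n)]
# Self-selection: older people are more likely to take the treatment
self_selected = [1 if a + rng.gauss(0, 10) > 40 else 0 for a in age]

t_random = balance_t_stat(age, random_assignment)   # typically small in absolute value
t_selected = balance_t_stat(age, self_selected)     # large: groups differ before treatment
```

A large t statistic on a pre-treatment characteristic suggests that assignment was not effectively random.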
a shortcoming of average treatment effect (ATE) is that it averages effect of treatment over people who may never receive it in reality
internal validity: are inferences on causal effects credible for the population studied?
external validity: can inferences be credibly generalised to other populations?
threats to internal validity: imperfect compliance, small samples, attrition (people may drop out), Hawthorne effects
problems for external validity arise when: populations differ in a way that matters for the determination of Y and which is not accounted for by the model
a quasi-/natural experiment is an observational study where 'nature' partly replicates an RCT - X is 'as if' randomly assigned
instrumental variables must satisfy: 1. Z is correlated with X 2. Z is uncorrelated with u (is exogenous) 3. Z does not enter the structural equation
exogeneity and exclusion of IVs mean changes in Z do not affect Y directly; relevance means a change in Z affects Y only by 'shifting' X
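With one instrument and one endogenous regressor, the IV estimator reduces to cov(Z, Y) / cov(Z, X). The sketch below compares it with OLS in a simulated model where an unobserved confounder c drives both X and the error; the model and its coefficients are assumptions for illustration only.

```python
import random

def cov(a, b):
    """Sample covariance (divides by n; the scaling cancels in the ratios below)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n

rng = random.Random(4)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]   # instrument: independent of c by construction
c = [rng.gauss(0, 1) for _ in range(n)]   # unobserved confounder, part of the error u
x = [zi + ci + rng.gauss(0, 1) for zi, ci in zip(z, c)]             # X shifted by Z and c
y = [1 + 2 * xi + 3 * ci + rng.gauss(0, 1) for xi, ci in zip(x, c)]  # assumed true effect 2

beta_ols = cov(x, y) / cov(x, x)   # inconsistent: tends to 2 + 3 * 1 / 3 = 3
beta_iv = cov(z, y) / cov(z, x)    # consistent: tends to 2
```

Z is relevant (it shifts X), exogenous (independent of c and the noise), and excluded (it does not appear in the equation for Y), so the ratio isolates the variation in X that is unrelated to the error.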
source of endogeneity in X is not important for IVs, as long as we have a Z that fulfils the criteria it can be used as an IV
under homoskedasticity 2SLS is less efficient than OLS
weak instruments arise when the relevance condition technically holds, but the first-stage coefficient is too small relative to the sample size for the normal approximation to be reliable
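Can use the first-stage F test to detect weak instruments, with a rule-of-thumb threshold of c=10 (larger than conventional critical values): a first-stage F below 10 suggests weak instruments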
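A sketch of the first-stage F statistic for a single instrument, where F is the squared t statistic on Z in the regression of X on Z (homoskedastic standard errors). The first-stage coefficients 1.0 and 0.02 and the sample size are assumed values for illustration.

```python
import math
import random

def first_stage_f(z, x):
    """F statistic for H0: pi = 0 in the first stage x = a + pi * z + v."""
    n = len(z)
    mz, mx = sum(z) / n, sum(x) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    pi_hat = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x)) / szz
    a_hat = mx - pi_hat * mz
    ssr = sum((xi - a_hat - pi_hat * zi) ** 2 for zi, xi in zip(z, x))
    se = math.sqrt(ssr / (n - 2) / szz)   # homoskedastic standard error of pi_hat
    return (pi_hat / se) ** 2             # one restriction, so F = t squared

def simulate_first_stage(pi, n=500, seed=5):
    """Draw (z, x) with an assumed first-stage coefficient pi and unit-variance noise."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [pi * zi + rng.gauss(0, 1) for zi in z]
    return z, x

f_strong = first_stage_f(*simulate_first_stage(pi=1.0))    # far above 10
f_weak = first_stage_f(*simulate_first_stage(pi=0.02))     # relevant in theory, weak in practice
```

Comparing each value with the threshold of 10 flags the second instrument as weak even though its first-stage coefficient is not exactly zero.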
If you have weak instruments, standard inferences cannot be drawn for Beta1 and usual confidence intervals may be misleadingly narrow
if random assignment fails, can randomly assign an inducement - if the inducement affects uptake of treatments and is independent of individual characteristics it is an instrumental variable
Local average treatment effect (LATE) = a population average of individual-level treatment effects, weighted by each individual's responsiveness to the instrument
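For a binary inducement Z and binary treatment D, one way to estimate the LATE is the Wald ratio: the effect of Z on Y divided by the effect of Z on D. The sketch below assumes three hypothetical groups (compliers, always-takers, never-takers) with different treatment effects; all shares and effect sizes are made up for illustration.

```python
import random

def wald_estimate(n=40000, seed=6):
    """Wald ratio: (E[Y|Z=1] - E[Y|Z=0]) / (E[D|Z=1] - E[D|Z=0])."""
    rng = random.Random(seed)
    y_by_z = {0: [], 1: []}
    d_by_z = {0: [], 1: []}
    for _ in range(n):
        z = rng.randint(0, 1)        # randomly assigned inducement
        r = rng.random()
        if r < 0.5:                  # compliers: take treatment only if induced
            d, effect = z, 2.0
        elif r < 0.75:               # always-takers: treated regardless of Z
            d, effect = 1, 5.0
        else:                        # never-takers: untreated regardless of Z
            d, effect = 0, 1.0
        y = effect * d + rng.gauss(0, 1)
        y_by_z[z].append(y)
        d_by_z[z].append(d)
    mean = lambda v: sum(v) / len(v)
    reduced_form = mean(y_by_z[1]) - mean(y_by_z[0])   # effect of Z on Y
    first_stage = mean(d_by_z[1]) - mean(d_by_z[0])    # effect of Z on uptake
    return reduced_form / first_stage

late_hat = wald_estimate()   # close to the complier effect of 2
```

In this setup the average treatment effect over everyone is 0.5 * 2 + 0.25 * 5 + 0.25 * 1 = 2.5, but only compliers respond to the inducement, so the ratio recovers their effect of 2 rather than the ATE.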