AQRM part 2: causal analysis when endogeneity likely

  • Created by: charlie
  • Created on: 21-05-17 15:04
exogeneity holds (regressor uncorrelated with ui)
OLS estimates a CAUSAL relationship between outcome and regressors; OLS is consistent + unbiased (and BLUE if the other Gauss-Markov assumptions also hold)
1 of 51
exogeneity doesn't hold/ endogeneity (regressor correlated with ui)
OLS estimates a NON-CAUSAL relationship between outcome and regressors; OLS is inconsistent + biased
2 of 51
(4) reasons for endogeneity
(1) OVB (2) Measurement error (3) Simultaneity bias (4) Selection bias
3 of 51
reasons for endogeneity: (1) OVB
excluding relevant variables that are correlated with X's
4 of 51
reasons for endogeneity: (1) OVB (upwards bias/ overestimated)
SAME SIGNS: some of the effect of the OV is mistakenly attributed to X (occurs when +ve/+ve (or -ve/-ve), e.g. cov(OV,Y)>0 and important (coef not 0), and cov(OV,X)>0)
5 of 51
reasons for endogeneity: (1) OVB (downwards bias/ underestimated)
DIFFERENT SIGNS: some of the effect of the OV is mistakenly attributed to X (occurs when +ve/-ve (or -ve/+ve), e.g. cov(OV,Y)>0 and important (coef not 0), and cov(OV,X)<0)
6 of 51
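The upward-bias case can be illustrated with a small simulation. This is only a sketch under assumed numbers: the coefficients 1.0, 0.8 and 1.5, the sample size and the helper `ols_slope` are all hypothetical choices, not values from the course. With these values the short regression slope should settle near B2 + B3·cov(OV,X)/var(X), which is above the assumed true effect because both signs are positive.

```python
import random

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    return cov_xy / var_x

random.seed(0)
n = 20_000
true_b2 = 1.0                                         # assumed causal effect of X on Y
ov = [random.gauss(0, 1) for _ in range(n)]           # omitted variable
x = [0.8 * o + random.gauss(0, 1) for o in ov]        # cov(OV, X) > 0
y = [true_b2 * a + 1.5 * o + random.gauss(0, 1)       # OV has a positive coefficient
     for a, o in zip(x, ov)]

short_slope = ols_slope(x, y)                         # regression that omits OV
print("true effect:", true_b2, "estimate omitting OV:", round(short_slope, 2))
```

Flipping the sign of either 0.8 or 1.5 (but not both) gives the DIFFERENT SIGNS case, where the estimate falls below the true effect.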
reasons for endogeneity: (2) measurement error
OLS will not estimate B2 consistently if Xi is measured with error (e.g. permanent income as a proxy for current income / asking people the no. of cigarettes smoked per week)
7 of 51
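Why measurement error makes OLS inconsistent can be written out for the simplest case. This is a sketch that assumes classical measurement error (the error e_i is uncorrelated with the true regressor X_i^* and with u_i); the symbols X_i^* and e_i are introduced here for illustration and are not from the cards.

```latex
\begin{align*}
  Y_i &= \beta_1 + \beta_2 X_i^{*} + u_i, \qquad X_i = X_i^{*} + e_i \\
  \operatorname{plim}\,\hat{\beta}_2
      &= \beta_2 \cdot \frac{\operatorname{var}(X^{*})}{\operatorname{var}(X^{*}) + \operatorname{var}(e)}
\end{align*}
```

Under these assumptions the ratio is below 1, so the estimate is pulled towards zero (attenuation bias).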
reasons for endogeneity: (3) simultaneity bias
both variables are determined simultaneously and depend on each other (regress doesn't take this into account), e.g. supply + demand / police spending + crime
8 of 51
reasons for endogeneity: (4) selection bias (lecture example)
sample isn't representative of population (relationship between Y and X of sampled individuals is different to that of unsampled) e.g. people attending lecture after drinking will not be as affected by alcohol (self-select + miss most important data)
9 of 51
(2) solutions to endogeneity
(1) randomisation in OLS (2) use of instrumental variables (IV)
10 of 51
solutions to endogeneity: (1) Randomisation in OLS
randomly assigning X (by some device) makes X uncorrelated with all characteristics (regressors and omitted variables in ui); on avg each group gets the same mix, so in large samples everything else is on avg the same and variation in outcome is due only to X
11 of 51
solutions to endogeneity: (1) Randomisation: RCT healthcare example
treatment effect captured by including randomly assigned dummy variable =1 if treated (=0 if control)
12 of 51
solutions to endogeneity: (1) Randomisation: RCT (5) limitations on human subjects
(1) unethical to force people (2) only on people choosing to participate (3) impossible to strictly assign variable (control group may obtain treatment by other means) (4) effect of having 'option', not actual treatment (5) limited duration of experiments
13 of 51
solutions to endogeneity: (2) IV (Z) reason
helps identify causal relationship as will observe some exogenous variation in X through Z
14 of 51
solutions to endogeneity: (2) IV (2) validity assumptions
(1) RELEVANCE Z must be correlated with X (can directly test) (2) EXOGENEITY Z must not determine Y + not be correlated with omitted variables that determine Y (ui) (can't directly test)
15 of 51
solutions to endogeneity: (2) IV 2SLS (stage 1)
(1) regress X on Z and obtain fitted values, separates 2 sources of variation (exog from Z, endog), creates fitted value that only captures exogenous variation in X (coming from Z)
16 of 51
solutions to endogeneity: (2) IV 2SLS (stage 2)
(2) [predict X(hat) from 1st stage] then regress Y on X(hat); the coef on X(hat) estimates the causal effect of X on Y (only using variation in X that is unrelated to ui)
17 of 51
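The two stages can be sketched by hand on simulated data. Everything here is an assumption made for illustration: one instrument Z, no additional controls, the coefficients 0.7, 0.6 and 0.5, and the helper `ols`. It only reproduces the point estimate; the standard errors from a manual second stage are not valid.

```python
import random

def ols(x, y):
    """Return (intercept, slope) from a simple regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

random.seed(1)
n = 20_000
true_b2 = 0.5                                          # assumed causal effect of X on Y
z = [random.gauss(0, 1) for _ in range(n)]             # instrument (exogenous)
u = [random.gauss(0, 1) for _ in range(n)]             # error term of the Y equation
x = [0.7 * zi + 0.6 * ui + random.gauss(0, 1)          # X depends on Z and on u (endogenous)
     for zi, ui in zip(z, u)]
y = [1.0 + true_b2 * xi + ui for xi, ui in zip(x, u)]

_, ols_b2 = ols(x, y)                                  # plain OLS: biased because cov(X, u) != 0

d1, d2 = ols(z, x)                                     # stage 1: regress X on Z
x_hat = [d1 + d2 * zi for zi in z]                     # fitted values: variation in X coming from Z
_, tsls_b2 = ols(x_hat, y)                             # stage 2: regress Y on X(hat)

print("OLS:", round(ols_b2, 2), "2SLS:", round(tsls_b2, 2), "true:", true_b2)
```

In this setup the OLS slope sits well above the assumed true value while the 2SLS slope is close to it.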
solutions to endogeneity: (2) IV 2SLS (1 command corrects s.e)
manually computing in Stata = incorrect s.e. as it doesn't take into account that X(hat) is itself estimated using OLS; Stata command: ivregress 2sls lnearn (S = sm sf siblings) female wexp...
18 of 51
solutions to endogeneity: (2) IV (4) additional issues
(1) how do we test endogeneity of X? (2) RELEVANCE need to test strength of IV (3) EXOGENEITY need to test exogeneity of IV (4) what if multiple endogenous regressors?
19 of 51
IV: with additional exogenous variables: IV assumptions
(1) RELEVANCE (different): Z must be correlated with X after accounting for W (D2 cannot =0 holding W cst) (2) EXOGENEITY (same)
20 of 51
IV: with additional exogenous variables: 2SLS
(stage 1) include W in regression (stage 2) include W in regression
21 of 51
IV: with multiple instruments + additional exogenous variables: IV assumptions
(1) RELEVANCE (different): At least one IV must be correlated with X after accounting for W (cannot be D2=D3=0) (2) EXOGENEITY (different) both IV must not be correlated with ui
22 of 51
IV: with multiple instruments + additional exogenous variables: 2SLS
(stage 1) include all IV and W in regression (stage 2) include just fitted value and W in regression
23 of 51
IV: RELEVANCE testing for weak or strong instrument in stata
test in (stage 1) of 2SLS... (manual) regress 1st stage of 2SLS + test coef D2=D3=0 + check F-stat (quick) ivregress 2sls + estat firststage + check F-stat
24 of 51
IV: RELEVANCE weak IV
F-stat<10 (rule of thumb), 2SLS estimator unreliable
25 of 51
IV: RELEVANCE strong IV
F-stat>10, 2SLS estimator will be closer to true parameter value (stronger for larger sample sizes)
26 of 51
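The first-stage F-stat can be computed by hand in the simplest case. This sketch assumes a single instrument and no additional exogenous variables W, so the test is H0: D2 = 0; the function `first_stage_f`, the coefficients 0.5 and 0.02 and the sample size are hypothetical.

```python
import random

def first_stage_f(z, x):
    """F-stat for H0: D2 = 0 in the first stage x = D1 + D2*z + v."""
    n = len(x)
    mz, mx = sum(z) / n, sum(x) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    d2 = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x)) / szz
    d1 = mx - d2 * mz
    ssr_unrestricted = sum((xi - d1 - d2 * zi) ** 2 for zi, xi in zip(z, x))
    ssr_restricted = sum((xi - mx) ** 2 for xi in x)    # model with D2 = 0
    # One restriction, n - 2 degrees of freedom in the unrestricted model
    return (ssr_restricted - ssr_unrestricted) / (ssr_unrestricted / (n - 2))

random.seed(4)
n = 1_000
z = [random.gauss(0, 1) for _ in range(n)]
x_strong = [0.5 * zi + random.gauss(0, 1) for zi in z]   # Z clearly moves X
x_weak = [0.02 * zi + random.gauss(0, 1) for zi in z]    # Z barely moves X

strong_f = first_stage_f(z, x_strong)
weak_f = first_stage_f(z, x_weak)
print("strong IV F:", round(strong_f, 1), "weak IV F:", round(weak_f, 1))
```

The strong instrument gives an F-stat far above the threshold of 10, while the weak one gives a much smaller value.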
IV: EXOGENEITY testing validity of IV: multiple IV identification
UNDER ID (no. of IV < no. of endogenous reg., cannot use 2SLS) EXACT ID (no. of IV = no. of endogenous reg., can use 2SLS) OVER ID (no. of IV > no. of endogenous reg., can use 2SLS + indirectly test exogeneity)
27 of 51
IV: EXOGENEITY testing validity of IV: multiple IV Sargan Test
H0: All IV valid (exogenous) H1: At least one IV is invalid, ivregress 2sls + estat overid + check p-value of Sargan score (if p-value < chosen significance level, reject H0)
28 of 51
IV: Randomisation
IV assigned randomly as easier to defend exogeneity assumption
29 of 51
IV: Randomisation ITT effect (causal effect of being ASSIGNED treatment)
difference in E(Y|Z) when randomised dummy Z=1 and Z=0 (e.g. difference in marks when sent email link for video, and not sent link for video)
30 of 51
IV: Randomisation ITT effect problem
UNDERESTIMATES causal effect (ignores non-compliance of individuals (e.g. some who received video won't watch it + some who don't receive video may find it by some other means))
31 of 51
IV: Randomisation causal effect of ACTUAL treatment
ITT effect (effect of being assigned random IV on Y) / effect of being assigned random IV on X (coef of stage 1 of 2SLS)
32 of 51
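The ratio on this card can be checked on simulated data for the video example. All numbers are assumptions for illustration: a true effect of 5 marks from watching, 60% of students who are sent the link watch, and 10% of those not sent it find the video anyway. Compliance is drawn at random here for simplicity, which a real study could not assume.

```python
import random

random.seed(2)
n = 40_000
true_effect = 5.0                                   # assumed effect of watching on marks

z = [random.randint(0, 1) for _ in range(n)]        # randomly sent the link (Z=1) or not (Z=0)
# Non-compliance: not everyone sent the link watches, some not sent it watch anyway
x = [1 if random.random() < (0.6 if zi else 0.1) else 0 for zi in z]
y = [50 + true_effect * xi + random.gauss(0, 10) for xi in x]

def mean_by(values, group):
    """Mean of `values` among observations with Z == group."""
    selected = [v for v, zi in zip(values, z) if zi == group]
    return sum(selected) / len(selected)

itt = mean_by(y, 1) - mean_by(y, 0)                 # effect of being ASSIGNED treatment on Y
first_stage = mean_by(x, 1) - mean_by(x, 0)         # effect of assignment on X (stage 1 coef)
actual_effect = itt / first_stage                   # causal effect of ACTUAL treatment

print("ITT:", round(itt, 2), "first stage:", round(first_stage, 2),
      "ITT / first stage:", round(actual_effect, 2))
```

Because only about half of the students change behaviour when assigned, the ITT effect is roughly half the assumed true effect, and dividing by the first-stage difference scales it back up.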
natural experiment (quasi-experiment)
when can't conduct randomisation (RCT), exploit a 'natural' event as source of randomness that makes endogeneity unlikely (e.g. natural/ unexpected/ reform/ regulation)
33 of 51
Regression Discontinuity (RD): definition
a situation (natural experiment) in which treatment D depends on an observed continuous variable Q (running variable)
34 of 51
Regression Discontinuity (RD): features
cut off point (q0): get treatment (D=1 if Q>q0) and don't get treatment (D=0 if Q<q0)
35 of 51
Sharp RD: (2) features (e.g. effect of alc restrictions on mortality)
treatment status D is a deterministic function of Q (strictly enforced at cut off q0)/ D is a discontinuous function of Q (100% jump at cut off: either able to buy or not)
36 of 51
Sharp RD: OLS model characteristics
.
37 of 51
Sharp RD: problems
only short-term effect (other factors begin to affect Y after cut off)/ need large sample size (a wider range around the cut off introduces other factors contributing to Y)
38 of 51
Sharp RD: OVB taken care of by controlling for trend f(Q)
unobservables will be correlated with D, but D is determined solely by age and can control for age using trend (e.g. can't choose when you turn 21 so can be controlled for)
39 of 51
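A sharp RD regression consistent with the cards above can be written as follows. This is a sketch under assumptions: the trend f(Q) is taken to be linear, and the coefficient names are chosen here for illustration rather than taken from the cards.

```latex
\begin{align*}
  Y_i &= \beta_1 + \beta_2 D_i + \beta_3 Q_i + u_i, \\
  D_i &= \begin{cases} 1 & \text{if } Q_i > q_0 \\ 0 & \text{otherwise} \end{cases}
\end{align*}
```

Here the coefficient on D_i measures the jump in Y at the cut off q_0, and the term in Q_i plays the role of the trend f(Q) that absorbs everything varying smoothly with the running variable.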
Sharp RD/ Fuzzy RD: non-linear relationship
include another term (quadratic f(Q)) if likely that the general trend is not linear, never know exact coef as don't know precise trend relationship
40 of 51
Sharp RD: cumulative effects
include another term (X-X0) that captures how many units past cut off you are (treatment effect will now be combination D at cut off + D past cut off), only credible if no other influential factors past cut off
41 of 51
Sharp RD: (3) important points
(1) study doesn't tell us about any other policy changes (2) validity depends on willingness to extrapolate away from cut-off (larger sample) (3) should be no other discontinuities at cut-off
42 of 51
Fuzzy RD: (2) features (e.g. effect of raising school leaving age on age leave education)
(1) treatment status D isn't a deterministic function of Q (not strictly enforced/ indirect effect) (2) probability/ intensity of treatment jumps at cut-off (not 100% change as still have option)
43 of 51
Fuzzy RD: (stage 1) estimate 2 linear models using OLS
use D (treatment effect) to see how much it affects regressor of interest
44 of 51
Fuzzy RD: (stage 2) use D in 2SLS
D (treatment) now used as IV: RELEVANT (correlated with X)+ EXOGENOUS (only related to Y through X) and regressor of interest X is used as TREATMENT
45 of 51
Fuzzy RD: assumptions
(1) trend captures all relevant differences before + after cut-off (2) IV exogenous (after adjusting for running variable will only affect Y through X) (3) no other factors affecting D (other policies)
46 of 51
Diff-in-diff: features
observe 2 groups over time (treatment/comparison)/ comparison is different from treatment + untreated in both periods/ treatment experiences policy change
47 of 51
Diff-in-diff: assumptions
DON'T ASSUME: groups are identical; DO ASSUME: both would have followed the same trend if left untreated
48 of 51
Diff-in-diff: treatment effect equation (diff-in-diff)
removes confounding factors on treatment/ control group that would be assumed to affect both groups (TREND), leaving just the policy change
49 of 51
Diff-in-diff: treatment effect equation bias
(1) don't compare directly between years as many other factors change over time (2) don't compare directly between groups as they will be systematically different
50 of 51
Diff-in-diff: OLS estimation equation
constant + dummy variable (spatial trend) + dummy variable (time trend) + interaction (treatment effect)
51 of 51
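The diff-in-diff estimate can be sketched with four group means, which in a model with only the two dummies and their interaction equals the interaction coefficient. The data are simulated under assumed values (group gap 3.0, time trend 1.5, treatment effect 2.0), all hypothetical.

```python
import random

random.seed(3)
true_effect = 2.0                                    # assumed effect of the policy change

rows = []                                            # (treated group?, post period?, outcome)
for _ in range(20_000):
    treated = random.randint(0, 1)
    post = random.randint(0, 1)
    y = (10 + 3.0 * treated                          # groups differ systematically
         + 1.5 * post                                # common time trend
         + true_effect * treated * post              # policy only hits treated group after change
         + random.gauss(0, 1))
    rows.append((treated, post, y))

def cell_mean(treated, post):
    """Mean outcome for one group in one period."""
    ys = [y for t, p, y in rows if t == treated and p == post]
    return sum(ys) / len(ys)

over_time = cell_mean(1, 1) - cell_mean(1, 0)        # treated group, after vs before
across_groups = cell_mean(1, 1) - cell_mean(0, 1)    # after period, treated vs comparison
did = over_time - (cell_mean(0, 1) - cell_mean(0, 0))

print("over time:", round(over_time, 2), "across groups:", round(across_groups, 2),
      "diff-in-diff:", round(did, 2))
```

The two direct comparisons are contaminated by the time trend and the group gap respectively, while the diff-in-diff lands near the assumed effect.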
