Chapter 27 — Causal Inference

When you can't run an experiment, how do you still claim X caused Y? Confounding, the methods (diff-in-diff, matching, IV, RDD), and how not to fool yourself.

Correlation is everywhere; causation drives decisions. When an A/B test is impossible (ethics, history, scale), causal inference estimates the effect from observational data — carefully.

27.1 Why correlation isn't causation

the confounder problem

          Confounder (Z)
          /            \
         ▼              ▼
   Treatment (X) ───?──► Outcome (Y)

Ice-cream sales (X) correlate with drownings (Y)
   ... because temperature (Z) drives both.
Unless you control for Z, the apparent effect of X on Y is spurious.
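
A small simulation can make the diagram concrete. The sketch below uses made-up coefficients and only the standard library: temperature drives both series, neither affects the other, and the correlation largely disappears once temperature is regressed out of both. The helper names `pearson` and `residuals` are illustrative, not from any library.

```python
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def residuals(ys, zs):
    """Residuals from a least-squares fit of ys on zs (with intercept)."""
    n = len(ys)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [(y - my) - slope * (z - mz) for y, z in zip(ys, zs)]

rng = random.Random(0)
n = 5000
# Hypothetical data-generating process: Z causes X and Y; X does not cause Y.
temperature = [rng.gauss(25, 5) for _ in range(n)]                   # Z
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]         # X
drownings = [0.3 * t + rng.gauss(0, 1) for t in temperature]         # Y

raw_corr = pearson(ice_cream, drownings)
# "Controlling for Z": correlate what is left of X and Y after removing Z.
partial_corr = pearson(residuals(ice_cream, temperature),
                       residuals(drownings, temperature))

print(f"raw correlation:      {raw_corr:.2f}")
print(f"controlling for temp: {partial_corr:.2f}")
```

In this setup the raw correlation is strong while the partial correlation sits near zero. That only works because Z is measured; an unmeasured confounder cannot be regressed out this way.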

27.2 The gold standard & its substitutes

Method	Use when	Key assumption
Randomized experiment (RCT)	You can randomize	Randomization balances confounders
Difference-in-Differences	A group got treated at a known time, another didn't	Parallel trends: absent treatment, both groups would have changed alike
Matching / propensity scores	Compare similar treated vs untreated units	No unmeasured confounders
Instrumental variables	A variable shifts treatment but not outcome directly	Instrument is relevant (moves treatment) and valid (affects outcome only through treatment)
Regression discontinuity	Treatment assigned by a cutoff	Units near cutoff are comparable
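
The instrumental-variables row is probably the least intuitive in the table, so here is a minimal sketch of the idea using the simplest IV estimator for a single instrument, cov(Z, Y) / cov(Z, X), often called the Wald estimator. The data-generating process, the coefficients, and the true effect of 2.0 are all assumptions made for the simulation. A real analysis would more likely use a 2SLS routine from a library.

```python
import random

def cov(a, b):
    """Population covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n

rng = random.Random(1)
n = 20000
true_effect = 2.0  # assumed for the simulation

z = [rng.choice([0.0, 1.0]) for _ in range(n)]            # instrument
u = [rng.gauss(0, 1) for _ in range(n)]                    # unmeasured confounder
x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]    # treatment: moved by Z and U
y = [true_effect * xi + 3.0 * ui + rng.gauss(0, 1)         # outcome: Z enters only via X
     for xi, ui in zip(x, u)]

naive = cov(x, y) / cov(x, x)   # OLS slope, biased upward by U
iv = cov(z, y) / cov(z, x)      # uses only the variation in X driven by Z

print(f"naive slope: {naive:.2f}")
print(f"IV estimate: {iv:.2f}")
```

The naive slope absorbs the confounder's influence, while the IV ratio lands close to the assumed effect. This works only because the simulated Z satisfies both conditions by construction; with real data, validity has to be argued, since it cannot be tested directly.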

27.3 Difference-in-Differences (the analyst workhorse)

Compare the change in a treated group to the change in a control group. The control group's change stands in for what would have happened to the treated group without treatment.

DiD logic

                 Before    After    Change
Treated group      A         B       B−A
Control group      C         D       D−C
                                     ─────
Causal effect =  (B−A) − (D−C)   ← removes shared trends
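
To make the arithmetic concrete, the sketch below fills the four cells from a handful of made-up observations and applies the formula directly. The rows and the helper `cell_mean` are purely illustrative.

```python
from statistics import mean

# Hypothetical observations: (group, period, outcome)
rows = [
    ("treated", "before", 98.0), ("treated", "before", 102.0),
    ("treated", "after", 128.0), ("treated", "after", 132.0),
    ("control", "before", 88.0), ("control", "before", 92.0),
    ("control", "after", 104.0), ("control", "after", 106.0),
]

def cell_mean(group, period):
    """Average outcome for one cell of the 2x2 table."""
    return mean(y for g, p, y in rows if g == group and p == period)

a = cell_mean("treated", "before")   # A = 100.0
b = cell_mean("treated", "after")    # B = 130.0
c = cell_mean("control", "before")   # C = 90.0
d = cell_mean("control", "after")    # D = 105.0

effect = (b - a) - (d - c)           # (30.0) - (15.0)
print(effect)                        # 15.0
```

The treated group rose by 30, but the control group rose by 15 over the same period, so only the remaining 15 is attributed to treatment.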

python

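# DiD as a regression: the coefficient on the interaction term is the DiD estimate.
# Assumes df has columns y (outcome), treated (0/1 group flag), post (0/1 period flag).
import statsmodels.formula.api as smf

m = smf.ols('y ~ treated * post', data=df).fit()
print(m.params['treated:post'])   # causal only if parallel trends holds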

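DiD is only valid if the two groups would have moved in parallel without treatment. That counterfactual cannot be observed, so always plot pre-period trends as a check: if the groups already diverge before treatment, the estimate is likely biased.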
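
A numeric complement to the plot is to look at the treated-minus-control gap in each pre-treatment period: under parallel pre-trends the gap stays roughly constant, so its slope over time should be near zero. The sketch below uses made-up period means; the function names are illustrative, and what counts as "near zero" is a judgment call.

```python
def slope(ts, ys):
    """Least-squares slope of ys against ts."""
    n = len(ts)
    mt, my = sum(ts) / n, sum(ys) / n
    return (sum((t - mt) * (y - my) for t, y in zip(ts, ys))
            / sum((t - mt) ** 2 for t in ts))

def pre_trend_gap_slope(periods, treated_means, control_means):
    """Slope of the treated-minus-control gap across pre-treatment periods."""
    gaps = [t - c for t, c in zip(treated_means, control_means)]
    return slope(periods, gaps)

# Hypothetical pre-period means (periods are relative to treatment at 0).
periods = [-4, -3, -2, -1]
treated = [50.0, 52.0, 54.0, 56.0]

parallel_control = [40.0, 42.1, 43.9, 46.0]    # tracks the treated group
diverging_control = [40.0, 41.0, 42.0, 43.0]   # grows more slowly

parallel_slope = pre_trend_gap_slope(periods, treated, parallel_control)
diverging_slope = pre_trend_gap_slope(periods, treated, diverging_control)

print(f"gap slope, parallel case:  {parallel_slope:.2f}")
print(f"gap slope, diverging case: {diverging_slope:.2f}")
```

A flat gap is consistent with parallel trends but does not prove the assumption, which concerns what would have happened after treatment.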

27.4 Propensity score matching

Model each unit's probability of being treated from its covariates (e.g. logistic regression)
Match each treated unit to one or more control units with a similar score
Check covariate balance after matching (standardized mean differences)
Compare outcomes within matched pairs and average the differences
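
The matching, balance, and comparison steps can be sketched in plain Python. The example assumes the propensity scores were already estimated in the first step and uses a tiny made-up dataset. Greedy 1:1 nearest-neighbor matching without replacement and the 0.05 caliper are illustrative choices, not the only options.

```python
from statistics import mean, variance

# Hypothetical units: (propensity score, age, outcome)
treated = [(0.80, 50.0, 12.0), (0.60, 40.0, 10.0), (0.40, 30.0, 9.0)]
control = [(0.78, 49.0, 9.0), (0.62, 41.0, 8.0), (0.41, 31.0, 8.0),
           (0.10, 20.0, 3.0), (0.15, 22.0, 4.0)]

def match(treated, control, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on the score, without replacement."""
    available = list(control)
    pairs = []
    for t in treated:
        if not available:
            break
        best = min(available, key=lambda c: abs(c[0] - t[0]))
        if abs(best[0] - t[0]) <= caliper:   # drop treated units with no close match
            pairs.append((t, best))
            available.remove(best)
    return pairs

def smd(a, b):
    """Standardized mean difference using the pooled sample SD."""
    pooled_sd = ((variance(a) + variance(b)) / 2) ** 0.5
    return (mean(a) - mean(b)) / pooled_sd

pairs = match(treated, control)

# Balance check on the covariate (age), before and after matching.
smd_before = smd([t[1] for t in treated], [c[1] for c in control])
smd_after = smd([t[1] for t, _ in pairs], [c[1] for _, c in pairs])

# Effect on the treated: average within-pair outcome difference.
att = mean(t[2] - c[2] for t, c in pairs)

print(f"SMD before: {smd_before:.2f}, after: {smd_after:.2f}")
print(f"estimated effect on the treated: {att:.2f}")
```

An absolute SMD below about 0.1 is a commonly used rule of thumb for acceptable balance. Balance on measured covariates says nothing about unmeasured ones.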

27.5 Picking a method

decision tree

Can you randomize?
│
├── YES ───────────────────────► Run an RCT (Chapter 26)
└── NO
    ├── Clear before/after + control group ──► Difference-in-Differences
    ├── Treatment by a sharp threshold ──────► Regression Discontinuity
    ├── Have a valid instrument ─────────────► Instrumental Variables
    └── Rich covariates, no clean design ────► Matching / propensity scores

Professional recommendation

Prefer a real experiment whenever feasible. When not, draw the causal diagram (DAG) first to decide what to control for — controlling for the wrong variable (a collider or mediator) adds bias. State your identifying assumption explicitly and test it (e.g. parallel-trends plot).
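
The warning about colliders is easy to demonstrate. In the sketch below, x and y are generated independently and both feed into a third variable c (the collider). Restricting the sample to high values of c, which is one way of conditioning on it, creates a correlation that does not exist in the full data. The variable names and the threshold of 1.0 are arbitrary choices for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(2)
n = 20000
x = [rng.gauss(0, 1) for _ in range(n)]    # independent of y by construction
y = [rng.gauss(0, 1) for _ in range(n)]
c = [xi + yi for xi, yi in zip(x, y)]      # collider: caused by both x and y

corr_all = pearson(x, y)

# Condition on the collider by keeping only units with a high c.
kept = [(xi, yi) for xi, yi, ci in zip(x, y, c) if ci > 1.0]
corr_selected = pearson([xi for xi, _ in kept], [yi for _, yi in kept])

print(f"correlation, full sample:       {corr_all:.2f}")
print(f"correlation, selected on c > 1: {corr_selected:.2f}")
```

In the full sample the correlation is near zero; among the selected units it turns clearly negative, because a high c with a low x implies a high y. Adding c as a regression control produces the same kind of distortion.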

Common mistakes to avoid

Claiming causation from a regression coefficient on observational data
Controlling for a mediator or collider, which introduces bias
Using DiD without checking the parallel-trends assumption
Adding every available variable as a control "to be safe"
Ignoring unmeasured confounders that no model can fix

Quick cheatsheet

y ~ treated * post -> diff-in-diff effect

parallel trends plot -> validate DiD

PsmPy / nearest-neighbor -> propensity matching

DAG -> decide what to control for

linearmodels IV2SLS -> instrumental variables