A brief introduction
University of Cambridge
Social scientists are usually interested in causal relationships
Causal inference is about answering these questions
\(X\) causes \(Y\) if changing only \(X\) would change \(Y\)
This is quite broad:
All that matters for the relationship to be causal is that if the distribution of \(X\) were different, the distribution of \(Y\) would also be different
We’ll focus on two-valued \(X\): a binary treatment
We can define a causal effect—an estimand of interest—as the difference (\(\delta\)) between two worlds or potential outcomes:
\[\delta = Y^1 - Y^0\]
This raises a fundamental problem
We want to know the difference between the factual (what happened) and the counterfactual (what did not)
This is a missing data problem
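As a rough illustration (a minimal Python sketch with invented numbers; the variables y0, y1, and t are hypothetical, not real data), every unit has two potential outcomes, but the dataset we actually get records only one of them:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 8

# Hypothetical potential outcomes: what Y would be without treatment (y0)
# and with treatment (y1) for the same units.
y0 = rng.normal(10, 2, n)
y1 = y0 - 3 + rng.normal(0, 1, n)   # in this simulated world, treatment lowers Y by ~3

delta = y1 - y0                     # the unit-level causal effect exists here...

# ...but in any real dataset we only see the outcome under the treatment received.
t = rng.integers(0, 2, n)
y_observed = np.where(t == 1, y1, y0)

df = pd.DataFrame({
    "t": t,
    "y_observed": y_observed,
    "y0": np.where(t == 0, y0, np.nan),   # missing for treated units
    "y1": np.where(t == 1, y1, np.nan),   # missing for untreated units
})
print(df.round(2))
```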
Let us assume:
We can estimate \(\delta\) if we can estimate the value of the unobserved outcome(s)
Causal inference is about estimating these counterfactuals we cannot see
What is the effect of clearing vacant lots on nearby violent crime?
We can look at vacant lots and see what happens
But we only see one outcome per lot:
What could we do?
We could compare different lots:
Use the difference (\(\delta\)) in observed crime as the causal effect
What if the difference is because the lots are different?
We could look at identical looking lots
But if those lots are identical, why did one get cleared and not the other?
We can’t be sure they’re similar without knowing why it got cleared
Could compare the same lots:
Use the before-and-after difference (\(\delta\)) as the causal effect
But what if the change was going to happen anyway?
Maybe the crime change even caused the clearing!
What if the lot was cleared because it would make a difference?
Maybe there would have been no difference in other lots!
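To make the selection problem concrete, here is a hedged simulation, with an entirely made-up selection rule and effect size, in which lots that already have less nearby crime are more likely to be cleared; the naive cleared-versus-uncleared comparison then mixes the causal effect with those pre-existing differences:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical baseline crime around each lot (unobserved by the analyst).
baseline = rng.normal(50, 10, n)

# Potential outcomes: clearing reduces nearby crime by 5 in this simulation.
y0 = baseline + rng.normal(0, 5, n)
y1 = y0 - 5

# Selection: lots that already have less crime are more likely to be cleared.
p_clear = 1 / (1 + np.exp((baseline - 50) / 5))
cleared = rng.binomial(1, p_clear)
y = np.where(cleared == 1, y1, y0)

naive = y[cleared == 1].mean() - y[cleared == 0].mean()
print(f"naive cleared vs. uncleared: {naive:.2f}")   # well below -5: overstates the drop
print(f"true effect of clearing:     {(y1 - y0).mean():.2f}")   # exactly -5
```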
These are really the same problem:
Differences in potential outcomes between cleared and uncleared lots
Put in two equivalent ways:
How do we make sure the cleared and uncleared lots are the same?
We can do this with a randomized controlled trial:
The probability of being treated is now independent of the outcomes:
\[(Y^0,Y^1) \perp T \]
While all the units may have different potential outcomes, the outcomes will not differ systematically between treatment groups
This means, essentially, we can ignore the treatment assignment process
If we randomly assign a treatment, we can estimate a causal effect as the average difference in outcomes between the treated units and the untreated units
This is the average treatment effect:
\[ATE = E[Y^1]-E[Y^0]\]
You can calculate this with a cross-tab
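For example, under random assignment a simple table of mean outcomes by treatment group recovers the ATE; this sketch uses simulated data with an assumed true effect of -5:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
n = 100_000

# Simulated potential outcomes with an assumed true ATE of -5.
y0 = rng.normal(50, 10, n)
y1 = y0 - 5

# Random assignment: treatment is independent of (y0, y1) by construction.
t = rng.binomial(1, 0.5, n)
y = np.where(t == 1, y1, y0)

# A cross-tab of mean outcomes by treatment group.
means = pd.DataFrame({"t": t, "y": y}).groupby("t")["y"].mean()
print(means)
print(f"estimated ATE: {means[1] - means[0]:.2f}")   # close to the true -5
```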
But what if we’re interested in things we can’t randomly assign?
Causal inference hinges on substituting in plausible counterfactuals in place of the potential outcomes we never observe
Substituting in counterfactuals requires knowing why units got treated
This is the same as with randomization:
We can again achieve ignorability conditional on this information about treatment assignment
Ignorability can be achieved if we know \(Z\), which is either (or both):
This requires an additional positivity assumption:
Every level of \(Z\) must have a non-zero probability of receiving every treatment
\[(Y^0,Y^1) \perp T|Z,\;\; 0<Pr(T=t|Z)<1 \]
We can then calculate the conditional average treatment effect (CATE):
\[CATE=E[Y^1-Y^0|Z=z]\]
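A sketch of what conditioning buys us, using simulated data (the confounder Z, the treatment probabilities, and the effect size are all invented): within each level of Z the comparison is unconfounded, every level of Z has both treated and untreated units (positivity), and averaging the within-stratum differences over Z recovers the ATE:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 200_000

# Z confounds treatment and outcome (think: a neighbourhood characteristic).
z = rng.integers(0, 3, n)

# Treatment probability depends on Z but stays strictly between 0 and 1 (positivity).
t = rng.binomial(1, np.array([0.2, 0.5, 0.8])[z])

# Outcome depends on Z and on treatment; the true effect is -5 at every level of Z.
y0 = 40 + 5 * z + rng.normal(0, 5, n)
y1 = y0 - 5
y = np.where(t == 1, y1, y0)

df = pd.DataFrame({"z": z, "t": t, "y": y})

# Naive comparison is confounded: Z drives both treatment and outcome.
naive = df.loc[df.t == 1, "y"].mean() - df.loc[df.t == 0, "y"].mean()

# CATE: within each level of Z, compare treated and untreated means.
group_means = df.groupby(["z", "t"])["y"].mean().unstack()
cate = group_means[1] - group_means[0]

# Averaging the CATEs over the distribution of Z recovers the ATE.
ate = (cate * df["z"].value_counts(normalize=True).sort_index()).sum()

print(f"naive difference: {naive:.2f}")   # biased away from -5
print(cate.round(2))                      # each close to -5
print(f"ATE from CATEs:   {ate:.2f}")     # close to -5
```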
But there are some other concerns

Must have well-defined potential outcomes
Consistency:
No interference:
Commonly violated in experiments and observational research
Nearly all studies assume symmetry—but reality is often asymmetric
Occurs when some units’ treatments affect other units’ outcomes
These spillovers are common in theory and practice:
Can often rule out in experiments, but…
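As a rough illustration (entirely simulated; the pairing and spillover sizes are invented), suppose clearing a lot also reduces crime near its neighbour. The simple treated-versus-untreated difference still looks sensible, but it no longer answers the question of what would happen if every lot (or no lot) were cleared:

```python
import numpy as np

rng = np.random.default_rng(5)
n_pairs = 100_000

# Each lot has one neighbour; treatment is randomized independently within pairs.
t = rng.binomial(1, 0.5, (n_pairs, 2))
partner_t = t[:, ::-1]
base = rng.normal(50, 10, (n_pairs, 2))

# Own clearing lowers crime by 5; a cleared neighbour lowers it by another 2 (spillover).
def outcome(own, partner):
    return base - 5 * own - 2 * partner

y = outcome(t, partner_t)

# Treated-vs-untreated difference under this design: roughly the direct effect...
naive = y[t == 1].mean() - y[t == 0].mean()

# ...but the effect of clearing every lot versus no lot is larger,
# because then every lot also receives the spillover.
all_vs_none = (outcome(1, 1) - outcome(0, 0)).mean()

print(f"difference in means:      {naive:.2f}")        # about -5
print(f"clear all vs. clear none: {all_vs_none:.2f}")  # about -7
```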
These issues are:
But if you’re aware, they can often be addressed with:
Consistency:
Use realistic treatments
Pay attention to implementation
Measurement
Model specification
Interference:
And it is okay to give up sometimes
Causal inference hinges on having the right counterfactual
What if we don’t know what \(Y^{0}\) or \(Y^{1}\) we want?
What if the \(Y_{t=0}^{0}\) we have is not similar to \(Y_{t=1}^{0}\)?
Plausible counterfactuals come from theory and contextual knowledge
The potential outcomes framework is compatible with and complementary to structural causal models, i.e., DAGs
Use both to inform your thinking
Know and state counterfactuals
Ignorability (i.e., non-spuriousness) is important but it isn’t everything
Consistency violations are rarely considered
Interference is common and more than a nuisance
Be critical researchers, consumers, and reviewers
Contact:
Charles C. Lanfear
Institute of Criminology
University of Cambridge
cl948@cam.ac.uk