Reciprocal relationships, reverse causality, and temporal ordering

Testing theories with cross-lagged panel models

Charles C. Lanfear

University of Cambridge

Thiago R. Oliveira

University of Manchester

What is this?

We were invited to write an article on Cross-Lagged Panel Models (CLPMs) for the Journal of Developmental and Life Course Criminology

 

CLPMs are commonly used to examine reciprocality in developmental and life course research in Criminology, Psychology, and Sociology

Reciprocality

 

 

Classic criminological concerns

From Hell’s heart I stab at thee

They are also often used thoughtlessly or unnecessarily

This vexes me
This is an effort to provide accessible guidance

 

But first, what is a cross-lagged panel model?

The classic CLPM

 

 

Given repeated observations, cross-lags “enforce” temporal order

Strength and direction often inferred from cross-lagged path estimates

It is Maslow’s Hammer for panel data
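As a minimal sketch of the classic CLPM, assume two variables, no unobserved heterogeneity, no measurement error, and a measurement interval that matches the true causal lag. Under those assumptions, the Python below simulates panel data and recovers the cross-lagged paths with pooled OLS (one regression per outcome). The function names and parameter values are hypothetical choices for illustration, not part of any package.

```python
import numpy as np


def simulate_clpm(n, waves, a_x, a_y, b_xy, b_yx, seed=0):
    """Simulate a classic CLPM (hypothetical parameters):
    x_t = a_x * x_{t-1} + b_yx * y_{t-1} + e_t
    y_t = a_y * y_{t-1} + b_xy * x_{t-1} + u_t
    """
    rng = np.random.default_rng(seed)
    x = np.zeros((n, waves))
    y = np.zeros((n, waves))
    x[:, 0] = rng.normal(size=n)
    y[:, 0] = rng.normal(size=n)
    for t in range(1, waves):
        x[:, t] = a_x * x[:, t - 1] + b_yx * y[:, t - 1] + rng.normal(size=n)
        y[:, t] = a_y * y[:, t - 1] + b_xy * x[:, t - 1] + rng.normal(size=n)
    return x, y


def estimate_cross_lags(x, y):
    """Pooled OLS of each wave-t variable on an intercept and both wave t-1 variables."""
    x_lag, y_lag = x[:, :-1].ravel(), y[:, :-1].ravel()
    design = np.column_stack([np.ones_like(x_lag), x_lag, y_lag])
    coef_y, *_ = np.linalg.lstsq(design, y[:, 1:].ravel(), rcond=None)
    coef_x, *_ = np.linalg.lstsq(design, x[:, 1:].ravel(), rcond=None)
    # coef_y[1]: x_{t-1} -> y_t cross-lag; coef_x[2]: y_{t-1} -> x_t cross-lag
    return {"x -> y": coef_y[1], "y -> x": coef_x[2]}


x, y = simulate_clpm(n=50_000, waves=4, a_x=0.5, a_y=0.5, b_xy=0.3, b_yx=0.1)
print(estimate_cross_lags(x, y))
```

Every assumption of the classic CLPM holds by construction here, so the estimates land near the simulated 0.3 and 0.1. The problems discussed later (stable heterogeneity, misspecified timing, low variation) are departures from exactly these conditions.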

An example: Disorder and fear

 

 

Does this make sense with yearly panel data?

 

 

When would we expect observing disorder to impact fear and vice versa?

Does this align with the theory we’re testing?

Contemporaneous effects

 

 

Can capture immediate effects… but it assumes the problem away

Strong theory might justify this, however

Three key messages

 

Good empirical research starts with a strong theoretical foundation

 

Panel survey data are not appropriate for answering every question

 

You should default to robust estimators

Our approach

 

The world is recursive

  • Non-recursive theory is a symptom of ignoring time or mechanisms
    • E.g., supply and demand is a micro-macro model
  • Use directed acyclic graphs (DAGs) for theories

Data are imperfect

  • Estimators must often handle ambiguity, and assumptions should be clear
  • Use classical structural equation model (SEM) path diagrams for estimators
    • The structural model should be recursive and derived from the theoretical DAG
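One way to make “the world is recursive” concrete: a theory written as a DAG must admit a topological order. This minimal Python sketch uses the standard library's graphlib. The node names are hypothetical, borrowed from the disorder and fear example. It shows that a timeless bidirectional theory is cyclic while its time-indexed version is not.

```python
from graphlib import CycleError, TopologicalSorter


def is_recursive(parents):
    """Return True if the graph {node: set of parents} is acyclic (recursive)."""
    try:
        # static_order() raises CycleError if the graph contains a cycle
        list(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False


# Non-recursive theory: disorder and fear cause each other with no time index
timeless = {"fear": {"disorder"}, "disorder": {"fear"}}

# Time-indexed theory: each variable depends only on earlier values
time_indexed = {
    "disorder_1": set(),
    "fear_1": set(),
    "disorder_2": {"disorder_1", "fear_1"},
    "fear_2": {"fear_1", "disorder_1"},
}

print(is_recursive(timeless), is_recursive(time_indexed))
```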

DAG for theory

 

 

Units of time should be clearly stated

No bidirectional paths

Include latent or missing variables

SEM for estimation

 

Encode ambiguities and assumptions, e.g., (in)dependence

The process

Clear theory is a prerequisite for specifying an estimator:

  • Derive recursive model from theory
    • Use a DAG or recursive equation
    • Include unobserved mechanisms when appropriate
  • Specify estimand
    • Quantitative definition of the estimate of interest
  • Specify estimator
    • Contemporaneous (fast) vs. lagged (slow) effects in structure
    • Covariances to address ambiguity and relax assumptions

Theory, estimands, and time

When is a CLPM needed?

A cross-lagged panel model is only needed when reciprocal effects over time are presumed in the theoretical model. When the researcher can theoretically assume no effects of the dependent variable on the independent variable over time, other statistical models will be more appropriate than a cross-lagged panel model

We distinguish between two forms of reciprocality:

  • Theoretical reciprocality (substance)
    • We care about both paths; e.g., which is stronger?
  • Reverse causality (nuisance)
    • We care about one path but the other may bias our estimate

Two forms

 

Start with theory

CLPMs are frequently used without careful attention to the theoretical process they are meant to capture… [which] should inform… the definition of the causal quantity of interest—the estimand, which is often left implicit, determined not by theory but by the structure of the available data. When the starting point is not a clearly specified causal process, one risks arriving at an estimator that produces a correct answer to the wrong question.

 

E.g., “the effect of last year’s informal social control on this year’s crime”

The one true causal timing

 

Estimands connect your theoretical question to your data by specifying:

  • Value of exposure
    • E.g., a counterfactual difference
  • Timing of effect

 

what is important is recognizing that there is no “true causal timing” waiting to be discovered because causal timing is a choice a researcher makes when defining their estimand of interest.

Drinking problems

 

what is the true causal timing of drinking an additional pint of beer on one’s perceived wellbeing? What is this effect at 20 seconds, 20 minutes, 20 hours, or 20 years? What about the effect of an additional pint per day after 10 years? In this context, the reader may know intimately which exposure and which lags between measurement of exposure and outcome correspond to detectable and substantively interesting causal effects. Strong theories should be specific about the role of time and thus able to inform what sort of causal timings—whether delay between exposure and outcome or aggregations of exposures—are of substantive interest.

An example

 

Theories posit that employment reduces offending (with obvious reverse causality, as offending threatens job loss), and we have yearly data

  • If we think employment reduces offending by keeping people involved in day-to-day activities… we probably want daily or weekly data instead!
  • But if we think employment reduces offending by gradually committing people to conventional life… our yearly data might be useful!

A feasible estimand: The expected reduction in number of self-reported offenses in the present year per additional month of full-time work in the past year

Good, an answerable question!
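An estimand fixes both the value of exposure and the timing. To illustrate, here is a minimal Python sketch of the feasible estimand above under an entirely hypothetical data-generating process: expected offenses this year equal \(\exp(A - B \cdot m)\), where \(m\) is months of full-time work last year. The constants \(A\) and \(B\), and the cap at 12 months, are assumptions for illustration. The estimand is the average counterfactual difference between each person's observed months and one additional month.

```python
import numpy as np

# Hypothetical data-generating process (assumption for illustration only):
# expected offenses this year = exp(A - B * months of full-time work last year)
A, B = 1.0, 0.05


def expected_offenses(months):
    return np.exp(A - B * np.asarray(months, dtype=float))


def estimand(observed_months):
    """Average reduction in expected offenses this year from one additional
    month of full-time work last year, evaluated at each person's observed
    months (capped at 12, so people already working all year contribute zero)."""
    m = np.asarray(observed_months, dtype=float)
    m_plus_one = np.minimum(m + 1, 12)
    return np.mean(expected_offenses(m) - expected_offenses(m_plus_one))


print(estimand([0, 3, 6, 12]))
```

Because the assumed relationship is nonlinear, the per-month reduction depends on where people start. This is why the estimand must state at which exposure values it is evaluated.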

 

 

Now let’s see how everything can go terribly wrong

Panel data and composite variables

It comes and goes in waves

 

Observations rarely correspond perfectly to a theoretical timing of interest

We discretize for convenience: time is continuous!

In discrete time, missing data are infinite!

Composite period variables

Many common measures in panel studies are composites:

  • Time spent (un)employed
  • Number of crimes committed

When our observation periods do not correspond to the timing of our underlying theoretical causal model, these period measures become what we will call composite period variables. They are a composite because the measured value is a deterministic sum of values that would be obtained by dividing the observation period up into smaller intervals and aggregating across them.
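The following minimal illustration uses hypothetical monthly employment indicators for three people. The yearly panel wave records only the sum, so very different monthly histories collapse into the same composite period value.

```python
import numpy as np

# Hypothetical monthly indicators (1 = employed full-time that month) for
# three people over one observation year
monthly = np.array([
    [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],  # worked January to June
    [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],  # worked July to December
    [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],  # worked every other month
])

# The composite period variable is a deterministic sum over sub-intervals
months_employed = monthly.sum(axis=1)

# Halves of the year as coarser components (X_A, X_B) of the same composite
first_half = monthly[:, :6].sum(axis=1)
second_half = monthly[:, 6:].sum(axis=1)
print(months_employed, first_half, second_half)
```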

Composite variables are tricky

Composite variables

  • Can be usefully represented in DAGs
  • Raise uniquely horrible causal identification issues

Composite \(X \rightarrow Y\)

\(X_A \Rightarrow X\) indicates a deterministic parent-child relationship: \(X\) is fully determined by its components (e.g., \(X = X_A + X_B\))

Violated assumptions

 

Ignorability / Exchangeability: Treatment status is (conditionally) independent of potential outcomes

 

\(X_A\) and \(X_B\) may have different causes!

 

Consistency: No hidden versions of treatment

 

One value of \(X\) can correspond to many permutations of \(X_A\) and \(X_B\)!
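To see the consistency problem numerically, suppose (hypothetically) that employment in the second half of the year, closer to the outcome period, reduces offending more than employment in the first half. The effect sizes below are invented for illustration. Every case has the same composite \(X = X_A + X_B = 6\), yet the implied outcome differs, so “six months employed” is not a single well-defined treatment.

```python
# Hypothetical component effects on expected offenses (assumptions, not estimates)
EFFECT_FIRST_HALF = -0.1   # per month employed in the first half (X_A)
EFFECT_SECOND_HALF = -0.5  # per month employed in the second half (X_B)


def expected_change(x_a, x_b):
    """Change in expected offenses implied by the component exposures."""
    return EFFECT_FIRST_HALF * x_a + EFFECT_SECOND_HALF * x_b


for x_a, x_b in [(6, 0), (3, 3), (0, 6)]:
    composite_x = x_a + x_b  # equals 6 in every row
    print(composite_x, round(expected_change(x_a, x_b), 2))
```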

Oh no

 

Even worse when components aren’t simultaneously determined

Oh noes

I offer no solution

 

 

 

I just want it to haunt your dreams

 

But you should consider how it might impact your research design

Three common applied problems

Unobserved time-stable heterogeneity


While we may be interested in change over time within individuals, theory often leads us to expect relatively stable differences between individuals

Sometimes these are nuisances, other times they are substantively interesting

 

Unfortunately:

  • Autoregressive terms cannot fully capture time-stable confounding
  • Cross-lagged terms conflate within-unit change with between-unit differences
  • Can’t just toss in fixed effects: the lagged outcome becomes endogenous, causing Nickell bias
    • Bias is of order \(1/T\), but most panel surveys have small \(T\)
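Here is a minimal Monte Carlo sketch of Nickell bias. It assumes a stationary AR(1) outcome with normally distributed unit effects and shocks (our simplifying assumptions). The within (fixed-effects) estimator of the autoregressive coefficient is biased downward when \(T\) is small, and the bias fades as \(T\) grows. The function name and parameter values are hypothetical.

```python
import numpy as np


def within_ar1_estimate(n_units, n_periods, rho, seed=0):
    """Within (fixed-effects) OLS estimate of rho in y_it = rho * y_i,t-1 + alpha_i + e_it."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=n_units)
    burn_in = 50  # discard early periods so each unit's series is near stationarity
    total = burn_in + n_periods + 1
    y = np.zeros((n_units, total))
    for t in range(1, total):
        y[:, t] = rho * y[:, t - 1] + alpha + rng.normal(size=n_units)
    y = y[:, burn_in:]  # n_periods + 1 columns: one extra so every period has a lag
    y_now, y_lag = y[:, 1:], y[:, :-1]
    # Within transformation: demean current and lagged y within each unit
    y_now_d = y_now - y_now.mean(axis=1, keepdims=True)
    y_lag_d = y_lag - y_lag.mean(axis=1, keepdims=True)
    return (y_now_d * y_lag_d).sum() / (y_lag_d ** 2).sum()


rho = 0.5
print(within_ar1_estimate(2_000, 4, rho))   # short panel (T = 4)
print(within_ar1_estimate(2_000, 50, rho))  # long panel (T = 50)
```

With these settings, the short-panel estimate should fall well below the true 0.5, while the long-panel estimate should sit much closer to it. Adding more units does not help, because the bias depends on \(T\), not \(N\).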

Work, crime, and self-control

Confounded by an unmeasured time-stable trait of individuals

Solution 1: Allison et al.’s (2017) ML-SEM

One-sided Mundlak estimator for reverse causality and nuisance heterogeneity

Solution 2: Hamaker et al.’s (2015) RI-CLPM

Better for theoretical reciprocality and explicitly separating effects

Temporal misspecification


As illustrated by Vaisey & Miles (2017), if…

  • \(y_{it} = \beta x_{it} + \alpha_i + e_{it}\) is the “true” contemporaneous model
  • \(y_{it} = \beta^* x_{i,t-1} + \alpha_i + e_{it}\), the lagged model, is estimated instead

The resulting “bias” is: \(E(\beta^*) = -0.5\beta\)
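The -0.5 factor can be checked by simulation. This minimal sketch assumes exactly three waves, a unit effect \(\alpha_i\) correlated with \(x\), and \(x\) serially uncorrelated within units with within-unit variance \(\sigma_x^2\) (our assumptions for the illustration). With two usable outcome waves, the fixed-effects lagged model reduces to a first-difference regression of \(y_{i3} - y_{i2}\) on \(x_{i2} - x_{i1}\). Since \(\operatorname{Cov}(x_{i3} - x_{i2},\, x_{i2} - x_{i1}) = -\sigma_x^2\) and \(\operatorname{Var}(x_{i2} - x_{i1}) = 2\sigma_x^2\), the slope converges to \(-0.5\beta\).

```python
import numpy as np


def lagged_fe_estimate(beta=1.0, n=200_000, seed=1):
    """Fit the lagged FE model when the true model is contemporaneous (three waves)."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=n)                    # unit fixed effect
    x = rng.normal(size=(n, 3)) + alpha[:, None]  # x correlated with alpha, iid within unit
    # True model: y_it = beta * x_it + alpha_i + e_it
    y = beta * x + alpha[:, None] + rng.normal(size=(n, 3))
    # With two outcome waves (t = 2, 3), the within estimator equals first differences
    dy = y[:, 2] - y[:, 1]
    dx_lag = x[:, 1] - x[:, 0]
    return (dy * dx_lag).sum() / (dx_lag ** 2).sum()


print(lagged_fe_estimate(beta=1.0))
```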

Incorrect temporal ordering can reverse signs, which is bad:

  • False negatives: confidently rejecting true theories
  • Implies opposite effects
  • Lag-only is most people’s default specification

Illustration

This is just a violated independence assumption

You’ll see it everywhere now

 

 

Be very suspicious of unexpected reversed signs

Robust estimators

The proper solution depends on which assumptions about temporal order our theory allows us to make

Solutions

  • There is no “true” causal timing to discover; there is only a target estimand
    • Motivate estimand with theory
  • Use robust estimators:
    • Contemporaneous covariance when “fast” effects are ambiguous
    • Contemporaneous effects when they’re not
  • If an ambiguous contemporaneous path is of substantive interest:
    • Consider collecting better data, you fool
    • As a last resort, non-recursive models
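Suppose theory does allow us to rule out a fast reverse path (assumed here: \(y\) does not affect \(x\) within a wave). In that case, including the contemporaneous effect alongside the lag repairs the sign reversal. This sketch reuses the same hypothetical three-wave setup, regressing \(y_{i3} - y_{i2}\) on both \(x_{i3} - x_{i2}\) and \(x_{i2} - x_{i1}\). If \(y\) also affected \(x\) within the wave, this estimate would itself be biased. That is the ambiguous case, where a contemporaneous covariance, better data, or (as a last resort) a non-recursive model is needed.

```python
import numpy as np


def fe_with_contemporaneous(beta=1.0, n=200_000, seed=1):
    """First-difference FE model with both contemporaneous and lagged x (three waves)."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=n)                    # unit fixed effect
    x = rng.normal(size=(n, 3)) + alpha[:, None]  # x not caused by y (assumption)
    # True contemporaneous model: y_it = beta * x_it + alpha_i + e_it
    y = beta * x + alpha[:, None] + rng.normal(size=(n, 3))
    dy = y[:, 2] - y[:, 1]
    design = np.column_stack([
        x[:, 2] - x[:, 1],  # change in contemporaneous x
        x[:, 1] - x[:, 0],  # change in lagged x
    ])
    coef, *_ = np.linalg.lstsq(design, dy, rcond=None)
    return coef  # [contemporaneous effect, lagged effect]


print(fe_with_contemporaneous(beta=1.0))
```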

Here be dragons

 

IV assumptions are strong, but a pint may make them believable

Low inter-temporal variation


 

Cross-lagged panel models are models of change over time

 

If things don’t change, you have nothing to explain

 

  • \(Var(Y_2|Y_1) \rightarrow 0\) as \(\rho(Y_1,Y_2) \rightarrow 1\)
  • Unstable and imprecise estimates
  • Measurement error becomes a proportionally larger component
  • Common with short observation periods and stable traits

Example paper

 

Sometimes a near-perfect multicollinearity problem:

 

What remains to explain when prior values account for 90% of the variation in present values?

Solutions

Data collection

  • Collect data over a longer period, you fool
  • Embed an experiment or look for exogenous shocks
    • A more plausible IV estimator
  • Oversample for change
    • Change may be rapid for subgroups

 

Estimation

  • Consider a cross-sectional analysis
  • Use longer lags
  • Different aggregations
    • E.g., smaller spatial units
  • Measurement models for error
    • Do this anyway: lagged outcomes are also regressors, and random measurement error in regressors attenuates estimates
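Here is a minimal sketch of attenuation from random measurement error. It assumes classical error (independent of everything else) in a single regressor. Under that assumption, the OLS slope shrinks toward zero by the reliability ratio \(\operatorname{Var}(x) / (\operatorname{Var}(x) + \sigma_m^2)\). In a CLPM the lagged outcome is such a regressor. Error in it can therefore also bias the cross-lagged estimates, because the lagged outcome is then only partially controlled. Names and values here are hypothetical.

```python
import numpy as np


def attenuated_slope(true_slope=1.0, error_var=0.25, n=200_000, seed=2):
    """OLS slope of y on an error-laden x, and the reliability ratio that predicts it."""
    rng = np.random.default_rng(seed)
    x_true = rng.normal(size=n)  # Var(x) = 1
    y = true_slope * x_true + rng.normal(size=n)
    x_obs = x_true + rng.normal(scale=np.sqrt(error_var), size=n)  # classical error
    slope = np.cov(x_obs, y)[0, 1] / np.var(x_obs, ddof=1)
    reliability = 1.0 / (1.0 + error_var)
    return slope, reliability


slope, reliability = attenuated_slope()
print(slope, reliability)  # slope should be close to true_slope * reliability
```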

Or perhaps give up and go get a pint

Giving up

Consider doing something else

researchers should think carefully about whether the data they have are suitable for answering their questions to begin with… panel survey data are better equipped to test slow and lasting effects proposed by developmental and life course theories… than rapid or transient processes from cognitive and interactional theories… (22)

Consider experiments, momentary assessments, etc.

Panel data with narrow observation intervals are sometimes also poorly suited for testing slow processes; high intertemporal correlations and proportions of variance explained may be signals there is insufficient change over time to produce precise estimates.

Just because you have panel data doesn’t mean you should use it

Again, but louder

  • Separate theory from estimation
    • Theory comes first
      • a strong estimation strategy cannot make up for a theoretical deficit
    • Then a relevant estimand
    • Then an appropriate estimator last
  • Panel survey data are not appropriate for answering every question
  • Default to robust estimators and be clear about assumptions
    • Contemporaneous covariances or effects
    • ML-SEM and RI-CLPM

 

Feedback and Questions

 

I am pleased to share the draft!

 

Contact:

Charles C. Lanfear
Institute of Criminology
University of Cambridge
cl948@cam.ac.uk