Causal Inference for Geographic Questions
Correlation, confounding, and how to ask whether a place-based factor actually changes an outcome
Before You Start
You should know
That two variables can move together without one causing the other.
You will learn
How causal questions differ from predictive ones, why geographic data is especially vulnerable to confounding, and what design habits make causal claims more credible.
Why this matters
Many of the book’s biggest policy questions are causal: does transit improve access, does density raise productivity, does wildfire smoke worsen health outcomes, does green infrastructure reduce flooding?
If this gets hard, focus on…
Keep asking: compared with what credible alternative world?
Suppose dense cities show higher wages than smaller cities. Does density itself raise productivity? Maybe. But maybe higher-skill workers choose dense cities, firms sort into stronger labour markets, or transport hubs historically attracted both people and investment long before modern wages were measured. The observed pattern is real, but the mechanism is not automatically identified. This is the central problem of causal inference in geography: places differ from one another for many reasons at once.
This chapter introduces a practical causal-inference framework for geographic questions. It is not a full econometrics course. The goal is to help readers stop making accidental causal claims from observational maps and regressions. We will use counterfactual thinking, confounding diagrams, and a few practical design strategies that recur across urban geography, environmental health, hazard studies, and policy evaluation.
1. The Question
When can we say a geographic factor actually changes an outcome rather than merely being associated with it?
Examples:
- Does proximity to care improve survival, or do healthier populations sort into well-served areas?
- Does urban tree cover reduce local heat exposure, or is tree cover simply higher in wealthier neighbourhoods that differ in many other ways?
- Does a new transit line increase access and employment, or was it built in areas already trending upward?
The mathematical question: How do we compare what happened with a credible estimate of what would have happened otherwise?
2. Counterfactual Thinking
A causal claim compares:
- observed outcome under exposure
- counterfactual outcome without that exposure
For one place at one time, we never observe both directly.
So causal inference tries to construct a comparison that approximates the missing counterfactual.
The causal effect
For unit i:
\tau_i = Y_i(1) - Y_i(0)
Where:
- Y_i(1) = outcome if exposed or treated
- Y_i(0) = outcome if unexposed or untreated
The problem is that we observe only one of these.
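This is easiest to see in a simulation, where both potential outcomes can be generated and one then hidden. Below is a minimal Python sketch; every name and number is an illustrative assumption, not data from the book:

```python
# Simulated neighbourhoods: in a simulation we can create both potential
# outcomes, but any real dataset reveals only one of them per unit.
import numpy as np

rng = np.random.default_rng(0)
n = 5

y0 = rng.normal(30.0, 2.0, n)         # Y_i(0): outcome if untreated
tau = rng.normal(-1.5, 0.5, n)        # tau_i: unit-level causal effect
y1 = y0 + tau                         # Y_i(1): outcome if treated

treated = rng.integers(0, 2, n).astype(bool)
observed = np.where(treated, y1, y0)  # the only column a real study sees

print("true average effect:", round(tau.mean(), 2))
print("observed outcomes:  ", observed.round(2))
# The hidden column (y0 for treated units, y1 for untreated units) is the
# missing counterfactual that the designs later in this chapter approximate.
```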
3. Confounding
A confounder affects both the exposure and the outcome.
Example:
- neighbourhood income may affect both tree cover and heat vulnerability
If we ignore income, we may incorrectly attribute the whole observed heat difference to trees.
The Main Causal Error Is Mistaking A Shared Driver For A Treatment Effect
The discipline here is visual and conceptual before it is statistical. Draw the possible pathways, look for shared causes, and decide what comparison would block the confounding path.
How to read the diagram
- The green path is the effect we want to learn about.
- The purple paths show a shared cause that can create a misleading association even if the green effect is small.
- Causal design tries to block or neutralize those backdoor paths.
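A small simulation makes the backdoor path concrete. In the sketch below the true effect of trees on heat is set to exactly zero, yet the naive slope is strongly negative, because income drives both variables. All names and coefficients are illustrative assumptions:

```python
# Income (the shared cause) raises tree cover and lowers heat exposure;
# trees themselves do nothing here, by construction.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

income = rng.normal(0.0, 1.0, n)                # confounder
trees = 0.8 * income + rng.normal(0.0, 1.0, n)  # exposure, driven by income
heat = -1.0 * income + rng.normal(0.0, 1.0, n)  # true tree effect is zero

naive_slope = np.cov(trees, heat)[0, 1] / np.var(trees)
print(f"naive trees -> heat slope: {naive_slope:.2f}")  # about -0.49, all confounding
```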
4. Why Geography Makes This Hard
Sorting and selection
People, firms, and infrastructure do not locate randomly, so exposed and unexposed places often differ systematically before any treatment occurs.
Spatial autocorrelation
Nearby places share background conditions, so apparent effects can partly reflect shared context rather than treatment.
History matters
Past policy, land use, and infrastructure shape both present exposures and present outcomes.
Interference
One place’s treatment can affect nearby places. A new transit line or wildfire smoke plume does not respect administrative boundaries.
5. Design Strategies
Adjustment with observed covariates
Regression can help if key confounders are measured well.
Useful when:
- confounders are known
- treatment assignment is not close to deterministic
- overlap exists between treated and untreated units
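Continuing the simulated tree-cover example from Section 3, here is a sketch of adjustment: once the confounder enters the regression, the estimated tree effect returns to its true value of zero. Illustrative only, and it works precisely because the confounder is measured:

```python
# Same data-generating process as before; now regress heat on trees
# AND income, so the backdoor path through income is blocked.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
income = rng.normal(0.0, 1.0, n)
trees = 0.8 * income + rng.normal(0.0, 1.0, n)
heat = -1.0 * income + rng.normal(0.0, 1.0, n)    # true tree effect is zero

X = np.column_stack([np.ones(n), trees, income])  # intercept, exposure, confounder
beta, *_ = np.linalg.lstsq(X, heat, rcond=None)
print(f"adjusted trees coefficient: {beta[1]:.3f}")  # close to 0.0
```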
Matching or weighting
Construct treated and untreated groups that look more comparable on observed covariates.
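One minimal member of this family is stratification: bin units on an observed confounder and compare treated with untreated inside each bin. The sketch below builds in a true cooling effect of -0.5 and makes richer areas more likely to be treated; every detail is an illustrative assumption:

```python
# The naive comparison mixes the treatment effect with income differences;
# comparing within income deciles removes most of that confounding.
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
income = rng.normal(0.0, 1.0, n)
treated = rng.random(n) < 1 / (1 + np.exp(-income))  # richer areas treated more often
heat = -1.0 * income - 0.5 * treated + rng.normal(0.0, 1.0, n)  # true effect: -0.5

deciles = np.digitize(income, np.quantile(income, np.arange(0.1, 1.0, 0.1)))
within = [
    heat[(deciles == d) & treated].mean() - heat[(deciles == d) & ~treated].mean()
    for d in range(10)
]
print(f"naive difference:       {heat[treated].mean() - heat[~treated].mean():.2f}")
print(f"within-decile estimate: {np.mean(within):.2f}")  # much closer to -0.5
```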
Difference-in-differences
Compare before-after change in treated areas against before-after change in control areas.
Best when:
- intervention timing is clear
- untreated comparison areas exist
- parallel trends are plausible
Instrumental variables
Use a variable that shifts exposure but affects the outcome only through that exposure.
Useful but demanding: the key assumption, that the instrument influences the outcome only through the exposure, is strong and cannot be tested directly from the data.
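Here is a sketch of the two-stage logic under those strong assumptions: an instrument z shifts the exposure but reaches the outcome only through it, while an unobserved confounder u biases the ordinary regression. Coefficients and names are illustrative:

```python
# OLS is contaminated by the unobserved confounder u; the instrument-based
# (Wald) ratio recovers the built-in effect of 0.5.
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
u = rng.normal(0.0, 1.0, n)                   # unobserved confounder
z = rng.normal(0.0, 1.0, n)                   # instrument
x = 0.7 * z + u + rng.normal(0.0, 1.0, n)     # exposure, shifted by z
y = 0.5 * x + u + rng.normal(0.0, 1.0, n)     # true effect of x is 0.5

ols = np.cov(x, y)[0, 1] / np.var(x)          # about 0.90, biased upward by u
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]  # about 0.50
print(f"OLS: {ols:.2f}   IV: {iv:.2f}")
```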
Regression discontinuity
Exploit a threshold or border where units just above and below a cutoff are similar except for treatment.
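A minimal sharp-discontinuity sketch, with an assumed vulnerability score, cutoff, and bandwidth: just above and below the cutoff, places are nearly identical except for treatment, so the outcome jump at the threshold estimates the effect:

```python
# Treatment switches on at score >= 0.5; compare mean outcomes in a
# narrow band on either side of the cutoff.
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
score = rng.normal(0.0, 1.0, n)           # running variable (vulnerability score)
treated = score >= 0.5                    # sharp cutoff
temp = 30.0 + 2.0 * score - 1.0 * treated + rng.normal(0.0, 0.5, n)  # true effect: -1.0

band = np.abs(score - 0.5) < 0.02         # narrow bandwidth around the cutoff
jump = temp[band & treated].mean() - temp[band & ~treated].mean()
print(f"jump at the cutoff: {jump:.2f}")  # near -1.0; real RD fits local trends too
```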
6. Worked Example
Question:
Does a new urban greening program reduce land-surface temperature?
Naive comparison
Compare neighbourhoods with high tree cover to those with low tree cover.
Problem:
High-tree areas may also have:
- higher income
- lower building density
- larger lots
- more parkland historically
So the naive comparison mixes treatment effect with pre-existing differences.
Better design
Suppose the greening program was rolled out in 2024 only in neighbourhoods crossing a heat-vulnerability threshold. We now have:
- pre-2024 temperature data
- post-2024 temperature data
- treated and untreated neighbourhoods near the threshold
Now the comparison can be framed as:
\text{Effect} \approx (\text{treated after} - \text{treated before}) - (\text{control after} - \text{control before})
This does not guarantee truth, but it is much closer to a causal design than a simple cross-sectional map comparison.
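To make the arithmetic concrete, here is a numeric sketch with made-up land-surface temperatures in degrees Celsius. The cooling effect is built in as -1.2; the city-wide warming trend and the pre-existing level gap between groups both cancel out:

```python
# Hypothetical mean temperatures for treated (greened) and control
# neighbourhoods, before and after the 2024 rollout.
treated_before, treated_after = 34.0, 33.3
control_before, control_after = 33.0, 33.5   # untreated areas warmed by 0.5

effect = (treated_after - treated_before) - (control_after - control_before)
print(f"difference-in-differences estimate: {effect:.1f} degrees C")  # -1.2
```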
7. Prediction Is Not The Same As Explanation
A highly predictive model can still be poor for causal inference.
Why?
- prediction cares about any stable signal
- causal inference cares about isolating one mechanism from competing explanations
A model might predict hospital visits from smoke concentration well, but causal interpretation still depends on whether confounding, measurement error, or selective exposure has been addressed.
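A short demonstration of this gap between fit and meaning, in the same simulated style as earlier sections: the exposure-only model below predicts the outcome fairly well, yet its coefficient nearly doubles the true effect because a confounder is omitted. Names and numbers are illustrative assumptions:

```python
# Older populations both see more smoke exposure and visit hospitals more;
# omitting age leaves prediction decent but the smoke coefficient wrong.
import numpy as np

rng = np.random.default_rng(5)
n = 20_000
age = rng.normal(0.0, 1.0, n)                    # confounder
smoke = 0.6 * age + rng.normal(0.0, 1.0, n)      # exposure
visits = 1.0 * smoke + 2.0 * age + rng.normal(0.0, 1.0, n)  # true smoke effect: 1.0

for cols, label in [((smoke,), "smoke only "), ((smoke, age), "smoke + age")]:
    X = np.column_stack([np.ones(n), *cols])
    beta, res, *_ = np.linalg.lstsq(X, visits, rcond=None)
    r2 = 1.0 - res[0] / (n * visits.var())
    print(f"{label}: smoke coef = {beta[1]:.2f}, R^2 = {r2:.2f}")
# "smoke only" predicts usefully (R^2 ~ 0.55) but its coefficient (~1.9)
# is not the causal effect; adding age restores a coefficient near 1.0.
```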
8. Good Causal Questions
Better:
- What happened to flood claims after green infrastructure was installed, relative to similar untreated areas?
- What happened to asthma visits during smoke events, relative to comparable low-smoke periods or nearby low-exposure places?
Worse:
- Which map variable has the strongest coefficient?
Coefficients are not automatically causal estimates.
9. Summary
Causal inference in geography asks what would have happened otherwise, not just what variables co-move on a map.
- causal claims require a counterfactual comparison
- confounding is the main obstacle
- spatial sorting, shared context, and history make geography especially challenging
- stronger research design matters as much as stronger regression
The reward is worth it: better decisions, more credible policy claims, and less confusion between prediction and explanation.
10. Try It Yourself
Pick one chapter elsewhere in the book and ask whether its main empirical claim is:
- descriptive
- predictive
- causal
Then ask what design would be needed to support the causal version of that claim. That habit alone will raise the quality of inference across the whole project.