Causal Inference for Geographic Questions
Correlation, confounding, and how to ask whether a place-based factor actually changes an outcome
Before You Start
You should know
That two variables can move together without one causing the other.
You will learn
How causal questions differ from predictive ones, why geographic data is especially vulnerable to confounding, and what design habits make causal claims more credible.
Why this matters
Many of the book’s biggest policy questions are causal: does transit improve access, does density raise productivity, does wildfire smoke worsen health outcomes, does green infrastructure reduce flooding?
If this gets hard, focus on…
Keep asking: compared with what credible alternative world?
Suppose dense cities show higher wages than smaller cities. Does density itself raise productivity? Maybe. But maybe higher-skill workers choose dense cities, firms sort into stronger labour markets, or transport hubs historically attracted both people and investment long before modern wages were measured. The observed pattern is real, but the mechanism is not automatically identified. This is the central problem of causal inference in geography: places differ from one another for many reasons at once.
This chapter introduces a practical causal-inference framework for geographic questions. It is not a full econometrics course. The goal is to help readers stop making accidental causal claims from observational maps and regressions. We will use counterfactual thinking, confounding diagrams, and a few practical design strategies that recur across urban geography, environmental health, hazard studies, and policy evaluation.
1. The Question
When can we say a geographic factor actually changes an outcome rather than merely being associated with it?
Examples:
- Does proximity to care improve survival, or do healthier populations sort into well-served areas?
- Does urban tree cover reduce local heat exposure, or is tree cover simply higher in wealthier neighbourhoods that differ in many other ways?
- Does a new transit line increase access and employment, or was it built in areas already trending upward?
The mathematical question: How do we compare what happened with a credible estimate of what would have happened otherwise?
2. Counterfactual Thinking
A causal claim compares:
- observed outcome under exposure
- counterfactual outcome without that exposure
For one place at one time, we never observe both directly.
So causal inference tries to construct a comparison that approximates the missing counterfactual.
The causal effect
For unit i:
\tau_i = Y_i(1) - Y_i(0)
Where:
- Y_i(1) = outcome if exposed or treated
- Y_i(0) = outcome if unexposed or untreated
The problem is that we observe only one of these.
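This is easiest to see in a simulation, where both potential outcomes can be generated and one then hidden. Below is a minimal Python sketch; every name and number is an illustrative assumption, not data from the book:

```python
# Simulated neighbourhoods: in a simulation we can create both potential
# outcomes, but any real dataset reveals only one of them per unit.
import numpy as np

rng = np.random.default_rng(0)
n = 5

y0 = rng.normal(30.0, 2.0, n)         # Y_i(0): outcome if untreated
tau = rng.normal(-1.5, 0.5, n)        # tau_i: unit-level causal effect
y1 = y0 + tau                         # Y_i(1): outcome if treated

treated = rng.integers(0, 2, n).astype(bool)
observed = np.where(treated, y1, y0)  # the only column a real study sees

print("true average effect:", round(tau.mean(), 2))
print("observed outcomes:  ", observed.round(2))
# The hidden column (y0 for treated units, y1 for untreated units) is the
# missing counterfactual that the designs later in this chapter approximate.
```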
3. Confounding
A confounder affects both the exposure and the outcome.
Example:
- neighbourhood income may affect both tree cover and heat vulnerability
If we ignore income, we may incorrectly attribute the whole observed heat difference to trees.
The Main Causal Error Is Mistaking A Shared Driver For A Treatment Effect
The discipline here is visual and conceptual before it is statistical. Draw the possible pathways, look for shared causes, and decide what comparison would block the confounding path.
How to read the diagram
- The green path is the effect we want to learn about.
- The purple paths show a shared cause that can create a misleading association even if the green effect is small.
- Causal design tries to block or neutralize those backdoor paths.
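A small simulation makes the backdoor path concrete. In the sketch below the true effect of trees on heat is set to exactly zero, yet the naive slope is strongly negative, because income drives both variables. All names and coefficients are illustrative assumptions:

```python
# Income (the shared cause) raises tree cover and lowers heat exposure;
# trees themselves do nothing here, by construction.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

income = rng.normal(0.0, 1.0, n)                # confounder
trees = 0.8 * income + rng.normal(0.0, 1.0, n)  # exposure, driven by income
heat = -1.0 * income + rng.normal(0.0, 1.0, n)  # true tree effect is zero

naive_slope = np.cov(trees, heat)[0, 1] / np.var(trees)
print(f"naive trees -> heat slope: {naive_slope:.2f}")  # about -0.49, all confounding
```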
4. Why Geography Makes This Hard
Sorting and selection
People, firms, and infrastructure do not locate randomly, so exposed and unexposed places often differ systematically before any treatment occurs.
Spatial autocorrelation
Nearby places share background conditions, so apparent effects can partly reflect shared context rather than treatment.
History matters
Past policy, land use, and infrastructure shape both present exposures and present outcomes.
Interference
One place’s treatment can affect nearby places. A new transit line or wildfire smoke plume does not respect administrative boundaries.
5. Design Strategies
Adjustment with observed covariates
Regression can help if key confounders are measured well.
Useful when:
- confounders are known
- treatment assignment is not close to deterministic
- overlap exists between treated and untreated units
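Continuing the simulated tree-cover example from Section 3, here is a sketch of adjustment: once the confounder enters the regression, the estimated tree effect returns to its true value of zero. Illustrative only, and it works precisely because the confounder is measured:

```python
# Same data-generating process as before; now regress heat on trees
# AND income, so the backdoor path through income is blocked.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
income = rng.normal(0.0, 1.0, n)
trees = 0.8 * income + rng.normal(0.0, 1.0, n)
heat = -1.0 * income + rng.normal(0.0, 1.0, n)    # true tree effect is zero

X = np.column_stack([np.ones(n), trees, income])  # intercept, exposure, confounder
beta, *_ = np.linalg.lstsq(X, heat, rcond=None)
print(f"adjusted trees coefficient: {beta[1]:.3f}")  # close to 0.0
```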
Matching or weighting
Construct treated and untreated groups that look more comparable on observed covariates.
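One minimal member of this family is stratification: bin units on an observed confounder and compare treated with untreated inside each bin. The sketch below builds in a true cooling effect of -0.5 and makes richer areas more likely to be treated; every detail is an illustrative assumption:

```python
# The naive comparison mixes the treatment effect with income differences;
# comparing within income deciles removes most of that confounding.
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
income = rng.normal(0.0, 1.0, n)
treated = rng.random(n) < 1 / (1 + np.exp(-income))  # richer areas treated more often
heat = -1.0 * income - 0.5 * treated + rng.normal(0.0, 1.0, n)  # true effect: -0.5

deciles = np.digitize(income, np.quantile(income, np.arange(0.1, 1.0, 0.1)))
within = [
    heat[(deciles == d) & treated].mean() - heat[(deciles == d) & ~treated].mean()
    for d in range(10)
]
print(f"naive difference:       {heat[treated].mean() - heat[~treated].mean():.2f}")
print(f"within-decile estimate: {np.mean(within):.2f}")  # much closer to -0.5
```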
Difference-in-differences
Compare before-after change in treated areas against before-after change in control areas.
Best when:
- intervention timing is clear
- untreated comparison areas exist
- parallel trends are plausible
Instrumental variables
Use a variable that shifts exposure but affects the outcome only through that exposure.
Useful but demanding: the key assumption, that the instrument influences the outcome only through the exposure, is strong and cannot be tested directly from the data.
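Here is a sketch of the two-stage logic under those strong assumptions: an instrument z shifts the exposure but reaches the outcome only through it, while an unobserved confounder u biases the ordinary regression. Coefficients and names are illustrative:

```python
# OLS is contaminated by the unobserved confounder u; the instrument-based
# (Wald) ratio recovers the built-in effect of 0.5.
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
u = rng.normal(0.0, 1.0, n)                   # unobserved confounder
z = rng.normal(0.0, 1.0, n)                   # instrument
x = 0.7 * z + u + rng.normal(0.0, 1.0, n)     # exposure, shifted by z
y = 0.5 * x + u + rng.normal(0.0, 1.0, n)     # true effect of x is 0.5

ols = np.cov(x, y)[0, 1] / np.var(x)          # about 0.90, biased upward by u
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]  # about 0.50
print(f"OLS: {ols:.2f}   IV: {iv:.2f}")
```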
Regression discontinuity
Exploit a threshold or border where units just above and below a cutoff are similar except for treatment.
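A minimal sharp-discontinuity sketch, with an assumed vulnerability score, cutoff, and bandwidth: just above and below the cutoff, places are nearly identical except for treatment, so the outcome jump at the threshold estimates the effect:

```python
# Treatment switches on at score >= 0.5; compare mean outcomes in a
# narrow band on either side of the cutoff.
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
score = rng.normal(0.0, 1.0, n)           # running variable (vulnerability score)
treated = score >= 0.5                    # sharp cutoff
temp = 30.0 + 2.0 * score - 1.0 * treated + rng.normal(0.0, 0.5, n)  # true effect: -1.0

band = np.abs(score - 0.5) < 0.02         # narrow bandwidth around the cutoff
jump = temp[band & treated].mean() - temp[band & ~treated].mean()
print(f"jump at the cutoff: {jump:.2f}")  # near -1.0; real RD fits local trends too
```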
6. Worked Example
Question:
Does a new urban greening program reduce land-surface temperature?
Naive comparison
Compare neighbourhoods with high tree cover to those with low tree cover.
Problem:
High-tree areas may also have:
- higher income
- lower building density
- larger lots
- more parkland historically
So the naive comparison mixes treatment effect with pre-existing differences.
Better design
Suppose the greening program was rolled out in 2024 only in neighbourhoods crossing a heat-vulnerability threshold. We now have:
- pre-2024 temperature data
- post-2024 temperature data
- treated and untreated neighbourhoods near the threshold
Now the comparison can be framed as:
\text{Effect} \approx (\text{treated after} - \text{treated before}) - (\text{control after} - \text{control before})
This does not guarantee truth, but it is much closer to a causal design than a simple cross-sectional map comparison.
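To make the arithmetic concrete, here is a numeric sketch with made-up land-surface temperatures in degrees Celsius. The cooling effect is built in as -1.2; the city-wide warming trend and the pre-existing level gap between groups both cancel out:

```python
# Hypothetical mean temperatures for treated (greened) and control
# neighbourhoods, before and after the 2024 rollout.
treated_before, treated_after = 34.0, 33.3
control_before, control_after = 33.0, 33.5   # untreated areas warmed by 0.5

effect = (treated_after - treated_before) - (control_after - control_before)
print(f"difference-in-differences estimate: {effect:.1f} degrees C")  # -1.2
```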
7. Prediction Is Not The Same As Explanation
A highly predictive model can still be poor for causal inference.
Why?
- prediction cares about any stable signal
- causal inference cares about isolating one mechanism from competing explanations
A model might predict hospital visits from smoke concentration well, but causal interpretation still depends on whether confounding, measurement error, or selective exposure has been addressed.
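A short demonstration of this gap between fit and meaning, in the same simulated style as earlier sections: the exposure-only model below predicts the outcome fairly well, yet its coefficient nearly doubles the true effect because a confounder is omitted. Names and numbers are illustrative assumptions:

```python
# Older populations both see more smoke exposure and visit hospitals more;
# omitting age leaves prediction decent but the smoke coefficient wrong.
import numpy as np

rng = np.random.default_rng(5)
n = 20_000
age = rng.normal(0.0, 1.0, n)                    # confounder
smoke = 0.6 * age + rng.normal(0.0, 1.0, n)      # exposure
visits = 1.0 * smoke + 2.0 * age + rng.normal(0.0, 1.0, n)  # true smoke effect: 1.0

for cols, label in [((smoke,), "smoke only "), ((smoke, age), "smoke + age")]:
    X = np.column_stack([np.ones(n), *cols])
    beta, res, *_ = np.linalg.lstsq(X, visits, rcond=None)
    r2 = 1.0 - res[0] / (n * visits.var())
    print(f"{label}: smoke coef = {beta[1]:.2f}, R^2 = {r2:.2f}")
# "smoke only" predicts usefully (R^2 ~ 0.55) but its coefficient (~1.9)
# is not the causal effect; adding age restores a coefficient near 1.0.
```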
8. Good Causal Questions
Better:
- What happened to flood claims after green infrastructure was installed, relative to similar untreated areas?
- What happened to asthma visits during smoke events, relative to comparable low-smoke periods or nearby low-exposure places?
Worse:
- Which map variable has the strongest coefficient?
Coefficients are not automatically causal estimates.
9. Summary
Causal inference in geography asks what would have happened otherwise, not just what variables co-move on a map.
- causal claims require a counterfactual comparison
- confounding is the main obstacle
- spatial sorting, shared context, and history make geography especially challenging
- stronger research design matters as much as stronger regression
The reward is worth it: better decisions, more credible policy claims, and less confusion between prediction and explanation.
10. Try It Yourself
Pick one chapter elsewhere in the book and ask whether its main empirical claim is:
- descriptive
- predictive
- causal
Then ask what design would be needed to support the causal version of that claim. That habit alone will raise the quality of inference across the whole project.