How to Choose a Model
When to use analytical equations, GIS workflows, statistical learning, or simulation
Before You Start
You should know
That models are simplified representations of geographic systems, and that different chapters in the book use different kinds of simplification.
You will learn
How to choose between analytical equations, GIS/spatial-analysis workflows, statistical or machine-learning models, and simulation approaches.
Why this matters
Many modelling mistakes happen before any code runs: the model family does not match the question.
If this gets hard, focus on…
The central question: are you trying to explain a mechanism, estimate a relationship, transform spatial data, or simulate a process through time?
A forest fire, a trade corridor, a river basin, and a housing market can all be “modelled,” but not always with the same kind of model. Some questions want a short equation that makes the governing relationship visible. Others want a GIS workflow that transforms geometry into a useful answer. Others need a statistical model because the relationship is learned from data rather than specified from physics. Others still need simulation because the outcome depends on many interacting steps unfolding through time. The hard part is not just building a model. It is choosing the right kind of model before you begin.
This chapter gives the book a shared model-choice language. The goal is not to enforce one rigid taxonomy. The goal is to help readers make a better first decision and to understand the tradeoffs that come with each family.
1. The Question
How do we know what kind of model to build?
A useful first pass is to ask four questions:
- Is the main goal explanation, prediction, spatial transformation, or scenario exploration?
- Do we know the mechanism well enough to write it down directly?
- Do we mainly have equations, observations, or geometry?
- Do interactions unfold step by step through time, or can they be summarized more simply?
Those questions usually push the problem toward one of four broad model families:
- analytical models
- GIS / spatial-analysis workflows
- statistical or machine-learning models
- simulation models
2. The Conceptual Model
Choose The Model Family By Matching The Question To The Type Of Structure You Need
The best first decision is rarely about software. It is about structure. Do you need a governing relationship, a spatial operation, a learned pattern, or an evolving process?
Use When Mechanism Is Clear
Analytical models: best for rates, balances, geometry, and compact equations where the goal is to understand how the system works and how parameters control it.
Use When The Main Task Is Spatial Transformation
GIS / spatial-analysis workflows: best for overlays, buffers, joins, resampling, viewsheds, and other operations that convert spatial inputs into derived spatial outputs.
Use When The Relationship Must Be Learned From Data
Statistical or machine-learning models: best for prediction, classification, interpolation, and pattern extraction when the mechanism is partly unknown or too complex to specify directly.
Use When Dynamics And Interaction Matter
Simulation models: best for systems that evolve step by step, include feedbacks, or depend on many local interactions such as spread, routing, or adaptive behavior.
1. Analytical models
Analytical models are the most compact. They write the relationship down directly:
- slope and rate
- exponential growth
- logistic growth and its equilibrium
- gravity-style interaction
- energy balance
Use them when:
- the governing mechanism is already reasonably understood
- interpretability matters more than flexibility
- parameter effects need to stay visible
Their strength is clarity. Their weakness is that they can become unrealistic if the real system is more heterogeneous or interactive than the equation allows.
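As a minimal sketch of what "parameter effects stay visible" can mean, the closed-form logistic curve below exposes the growth rate `r` and the carrying capacity `k` directly. The function name and all numeric values are assumptions chosen only for illustration.

```python
import math

def logistic(t, n0, r, k):
    """Closed-form logistic growth: N(t) = K / (1 + ((K - N0) / N0) * exp(-r * t))."""
    return k / (1.0 + ((k - n0) / n0) * math.exp(-r * t))

# Hypothetical parameters: initial size, growth rate per time unit, carrying capacity
n0, r, k = 10.0, 0.5, 1000.0

for t in (0, 5, 10, 20, 40):
    # The curve starts at n0 and levels off at the equilibrium k
    print(t, round(logistic(t, n0, r, k), 1))
```

Changing `k` moves the equilibrium and changing `r` changes how quickly it is approached, which can be read straight from the equation without running an experiment.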
2. GIS and spatial-analysis workflows
Some problems are not mainly about unknown relationships. They are about spatial operations:
- which parcels lie within a floodplain?
- which cells drain to this outlet?
- which houses are within 500 m of a road?
- what is visible from this ridge?
These are often best answered by GIS-style workflows rather than by fitting a statistical model or simulating agents. The strength of this family is direct spatial logic. The weakness is that it may transform data cleanly without necessarily explaining why the pattern exists.
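The "within 500 m of a road" question can be sketched with nothing more than geometry. The example below assumes projected coordinates in metres and uses hypothetical house and road coordinates; a real workflow would more likely use a GIS package, but the logic is the same distance test.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b (planar coordinates)."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def within_distance(points, line, max_dist):
    """Flag each named point that lies within max_dist of a polyline."""
    segments = list(zip(line[:-1], line[1:]))
    return {
        name: min(point_segment_distance(p, a, b) for a, b in segments) <= max_dist
        for name, p in points.items()
    }

# Hypothetical road polyline and house locations, in metres
road = [(0.0, 0.0), (1000.0, 0.0), (1000.0, 1000.0)]
houses = {
    "A": (500.0, 300.0),
    "B": (400.0, 600.0),
    "C": (1400.0, 1200.0),
    "D": (1600.0, 500.0),
}

near_road = within_distance(houses, road, 500.0)
print(near_road)
```

Nothing is fitted or simulated here: the rule is known in advance and simply applied to geometry, which is the defining feature of this family.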
3. Statistical and machine-learning models
Statistical models are strongest when the relationship must be estimated from data:
- regression for a continuous target
- classification for land cover
- dimensionality reduction for high-dimensional imagery
- probabilistic models for noisy observations
Use them when:
- observations are rich
- the mechanism is partial, uncertain, or too complicated to encode directly
- prediction or inference is the main goal
Their strength is flexibility and empirical performance. Their weakness is that they can look accurate while still being fragile, uninterpretable, or badly validated.
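The sketch below illustrates the "learned from data" idea and the validation concern together. It fits a straight line by ordinary least squares and reports the error on held-out points. The data are synthetic (an assumed relationship y = 1 + 2x with Gaussian noise), and the helper names are hypothetical.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def rmse(xs, ys, intercept, slope):
    """Root-mean-square error of the fitted line on the given points."""
    errors = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(errors) / len(errors))

# Synthetic observations (assumption): y = 1 + 2x plus noise with standard deviation 1
rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(200)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]

# Hold out the last 50 points so the error is measured on data the fit never saw
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

intercept, slope = fit_line(train_x, train_y)
test_rmse = rmse(test_x, test_y, intercept, slope)
print(f"slope={slope:.2f} intercept={intercept:.2f} held-out RMSE={test_rmse:.2f}")
```

Reporting error on held-out data is one simple guard against a model that looks accurate only because it was scored on the points it was fitted to. For spatial data, a random split like this one may still be optimistic because nearby observations tend to be correlated.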
4. Simulation models
Simulation models advance a system through time or interaction steps:
- fire spread
- urban growth
- flood propagation
- agent-based movement
- cellular automata
- system-dynamics feedback models
Use them when:
- timing and path dependence matter
- local interactions create large-scale patterns
- scenario exploration matters more than a single closed-form answer
Their strength is process realism and the ability to explore scenarios. Their weakness is that they can become parameter-heavy and difficult to validate.
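A toy cellular automaton shows the step-by-step character of this family. The rules below are deliberately simplistic assumptions (fire spreads to the four neighbouring tree cells each step, with no wind, moisture, or probability), and the grids are hypothetical.

```python
def step(grid):
    """Advance one time step. States: '.' empty, 'T' tree, 'F' burning, 'x' burnt."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == "F":
                new[r][c] = "x"  # a burning cell burns out after one step
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and grid[rr][cc] == "T":
                        new[rr][cc] = "F"  # ignite neighbouring trees
    return new

def run(grid):
    """Step until nothing is burning; return the final grid and the step count."""
    steps = 0
    while any("F" in row for row in grid):
        grid = step(grid)
        steps += 1
    return grid, steps

def count(grid, state):
    return sum(row.count(state) for row in grid)

# Two scenarios: the same forest with and without a firebreak in column 3
with_break = [list(r) for r in ("FTT.T", "TTT.T", "TTT.T")]
no_break = [list(r) for r in ("FTTTT", "TTTTT", "TTTTT")]

final_a, steps_a = run(with_break)
final_b, steps_b = run(no_break)
print(steps_a, count(final_a, "x"), count(final_a, "T"))  # 5 9 3
print(steps_b, count(final_b, "x"), count(final_b, "T"))  # 7 15 0
```

Comparing the two runs is a small instance of scenario exploration: the large-scale outcome (which trees survive) emerges from a local rule applied repeatedly.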
3. A Decision Workflow
The following workflow is deliberately simple:
- If the core task is a spatial operation, start with GIS.
- If the main mechanism is known and compact, start with an analytical model.
- If the relationship must be learned from observations, start with a statistical model.
- If the system evolves through many interacting steps, start with simulation.
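The workflow above can be written as a small function, shown here only as a sketch. The function name and the precedence of the checks (taken from the order of the list) are assumptions; real questions often satisfy more than one condition and call for judgment rather than a lookup.

```python
def first_model_family(
    spatial_operation=False,
    compact_known_mechanism=False,
    learned_from_data=False,
    many_interacting_steps=False,
):
    """Suggest a starting model family from four yes/no answers.

    The checks run in the same order as the workflow list, so the first
    matching condition wins. That precedence is an assumption of this sketch.
    """
    if spatial_operation:
        return "GIS workflow"
    if compact_known_mechanism:
        return "analytical model"
    if learned_from_data:
        return "statistical model"
    if many_interacting_steps:
        return "simulation"
    return "unclear: refine the question"

print(first_model_family(spatial_operation=True))        # GIS workflow
print(first_model_family(many_interacting_steps=True))   # simulation
```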
A few examples
| Question | Better first model family | Why |
|---|---|---|
| Which land parcels intersect a hazard zone? | GIS workflow | this is a spatial relation problem |
| How does temperature change with elevation? | analytical model | mechanism is compact and interpretable |
| Can we predict soil moisture from imagery? | statistical model | relationship is learned from data |
| How might a wildfire spread under new wind scenarios? | simulation | path dependence and time evolution matter |
Hybrid models are normal
Many strong projects combine families:
- GIS preprocessing + regression
- analytical model + Monte Carlo uncertainty
- simulation + statistical calibration
- remote-sensing features + spatial classification + process interpretation
The choice is therefore not “pick one forever.” It is “pick the right backbone first.”
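One of the combinations listed above, an analytical model with Monte Carlo uncertainty, might look like the following sketch. The backbone is the exponential growth equation; the Monte Carlo layer samples an uncertain growth rate. The distribution of the rate and every numeric value are assumptions made for illustration.

```python
import math
import random

def exponential_growth(n0, r, t):
    """Analytical backbone: N(t) = N0 * exp(r * t)."""
    return n0 * math.exp(r * t)

# Hypothetical inputs: the growth rate is uncertain, so sample it many times
rng = random.Random(42)
n0, t = 100.0, 20.0
r_mean, r_sd = 0.03, 0.005

samples = sorted(
    exponential_growth(n0, rng.gauss(r_mean, r_sd), t) for _ in range(10_000)
)

# Simple empirical percentiles read from the sorted samples
p5 = samples[int(0.05 * len(samples))]
median = samples[len(samples) // 2]
p95 = samples[int(0.95 * len(samples))]
print(f"5%: {p5:.1f}  median: {median:.1f}  95%: {p95:.1f}")
```

The equation stays the backbone, so the result remains interpretable, while the sampling layer turns a single answer into a range.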
4. Worked Example by Hand
Imagine four different questions about a mountain watershed.
Question A: Which slopes are steeper than 30°?
This is a GIS / terrain operation problem. We already know the rule and need to apply it across a raster.
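A rough sketch of Question A on a tiny elevation grid is shown below. It uses plain central differences rather than the 3x3 kernels that GIS packages typically apply, and the grid values, cell size, and function name are hypothetical.

```python
import math

def steep_mask(dem, cell_size, threshold_deg):
    """Return a grid of booleans marking cells steeper than threshold_deg.

    Slope is estimated with central differences, so edge cells, which lack a
    full set of neighbours, are left as False in this sketch.
    """
    rows, cols = len(dem), len(dem[0])
    mask = [[False] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dzdx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dzdy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell_size)
            slope_deg = math.degrees(math.atan(math.hypot(dzdx, dzdy)))
            mask[r][c] = slope_deg > threshold_deg
    return mask

# Hypothetical elevations in metres on a 10 m grid: gentle on the left, steep on the right
profile = [0.0, 2.0, 4.0, 6.0, 16.0, 26.0, 36.0]
dem = [profile[:] for _ in range(4)]

mask = steep_mask(dem, cell_size=10.0, threshold_deg=30.0)
for row in mask:
    print("".join("#" if steep else "." for steep in row))
```

The rule (steeper than 30°) is fixed in advance; the work is applying it consistently to every cell.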
Question B: How much does temperature fall with elevation?
This is an analytical model problem. A lapse-rate equation is compact, interpretable, and easy to check.
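A linear lapse-rate calculation for Question B fits in a few lines. The default of 6.5 °C per km is the standard-atmosphere average; observed lapse rates vary with moisture, season, and terrain, so treat it as an assumption. The station values are hypothetical.

```python
def temperature_at(elevation_m, ref_temp_c, ref_elevation_m, lapse_rate_c_per_km=6.5):
    """Linear lapse rate: T(z) = T_ref - lapse_rate * (z - z_ref)."""
    return ref_temp_c - lapse_rate_c_per_km * (elevation_m - ref_elevation_m) / 1000.0

# Hypothetical valley station: 15 °C at 500 m; estimate the temperature at 2500 m
summit_temp = temperature_at(2500.0, ref_temp_c=15.0, ref_elevation_m=500.0)
print(summit_temp)  # 2.0
```

The result is easy to check by hand: 2 km of climb at 6.5 °C per km is a 13 °C drop.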
Question C: Can we predict snow water equivalent (SWE) from topography and remote sensing?
This is a statistical model problem. The mapping from features to SWE is too variable to specify with one small equation.
Question D: How might avalanche risk evolve during a storm cycle?
This is a simulation problem. The answer depends on time, accumulation, wind loading, and sequence of events.
The key lesson is that “mountain watershed modelling” is not one type of model. Different sub-questions legitimately want different families.
5. What Could Go Wrong?
Forcing a favorite method onto every problem
If every question becomes a machine-learning problem or every question becomes a GIS problem, the method is leading the question instead of serving it.
Mistaking data abundance for model appropriateness
Lots of observations do not automatically mean a statistical model is best. Sometimes the physics is simple enough that an analytical model is better.
Using simulation when a simpler model would answer the question
If the real need is a threshold, ranking, or first-order estimate, a large simulation may add complexity without clarity.
Using a simple model where path dependence is the whole story
Some systems cannot be compressed safely into a static equation because the order of events matters.
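A toy storage model makes the point concrete. Two rainfall sequences with the same total produce different runoff because the store drains between events. The bucket rules, capacity, drainage rate, and rainfall values are all assumptions made for illustration.

```python
def total_runoff(rain_mm, capacity_mm=50.0, drain_mm_per_day=10.0):
    """Daily bucket model: rain fills the store, overflow becomes runoff, then the store drains."""
    storage = 0.0
    runoff = 0.0
    for rain in rain_mm:
        storage += rain
        if storage > capacity_mm:
            runoff += storage - capacity_mm
            storage = capacity_mm
        storage = max(0.0, storage - drain_mm_per_day)
    return runoff

# Same total rainfall (80 mm), different ordering of events
clustered = [40.0, 40.0, 0.0, 0.0]
spread = [40.0, 0.0, 0.0, 40.0]

print(sum(clustered) == sum(spread))  # True
print(total_runoff(clustered))        # 20.0
print(total_runoff(spread))           # 0.0
```

A static model driven only by total rainfall would give both sequences the same answer, which is the kind of compression the paragraph above warns against.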
Summary
- Analytical models are strongest when the mechanism is known and interpretable.
- GIS workflows are strongest when the task is a spatial transformation or relation.
- Statistical models are strongest when the relationship must be learned from data.
- Simulation is strongest when dynamics, feedbacks, and interaction matter.
- Good modelling often combines families, but good projects still begin by choosing the right backbone.