Raster Classification and Reclassification
Converting continuous values to discrete categories
Before You Start
You should know
That raster values can be continuous, and that turning them into categories requires drawing boundaries that are not already present in the data.
You will learn
How raster classification schemes work, what breakpoint choices imply, and why classification is always partly an analytical decision rather than a neutral conversion.
Why this matters
Much of practical GIS depends on turning continuous measurements into actionable categories like slope classes, risk classes, and land-cover types.
If this gets hard, focus on…
The central question: where are we choosing to place the boundaries between classes, and why?
Canada’s national land cover dataset — 30-metre resolution, updated roughly every five years from Landsat imagery — classifies every pixel in the country into one of seventeen categories: water, wetland, cropland, urban, treed upland, and so on. The classification is the product of a decision process applied to hundreds of millions of pixels, each represented by a stack of spectral reflectance values in multiple bands. The output — discrete categories on a continuous landscape — is what land managers, carbon accountants, and biodiversity assessors actually work with. But the categories are not in the data; they are imposed on it. And the choice of where to draw the boundaries between classes has a larger effect on the results than most users appreciate.
Raster classification converts continuous fields (elevation, slope, temperature, spectral reflectance, vegetation index) into discrete categories. The conversion requires two decisions: how many classes to create, and where to place the breakpoints between them. Equal-interval classification (divide the range into equal steps) is intuitive but produces classes with very unequal numbers of pixels if the data is skewed. Quantile classification (equal numbers of pixels per class) balances the class counts but may split a cluster of very similar values. Natural breaks (Jenks optimisation) minimises within-class variance and often produces the most cartographically satisfying result, but the standard dynamic-programming algorithm is quadratic in the number of values being classified and requires care on large datasets. This model derives and compares all three schemes, introduces the confusion matrix for evaluating classification accuracy, and discusses when each approach is appropriate.
1. The Question
How do you convert a continuous elevation raster into slope classes: “flat”, “gentle”, “steep”, “very steep”?
Reclassification transforms raster values using decision rules:
Examples:
- Slope categories: 0-5° = flat, 5-15° = gentle, 15-30° = steep, >30° = very steep
- Land cover from NDVI: <0.2 = bare, 0.2-0.4 = sparse veg, 0.4-0.6 = moderate, >0.6 = dense
- Habitat suitability: Combine elevation + slope + aspect into “suitable” vs “unsuitable”
- Fire risk zones: Temperature + humidity + vegetation → low/medium/high risk
The mathematical question: Given continuous input values, how do we assign them to discrete classes efficiently and meaningfully?
Key decisions:
- Number of classes: Too few → information loss; too many → complexity
- Breakpoints: Where to split? Equal intervals? Natural breaks? Quantiles?
- Edge handling: Is 15.0° “gentle” or “steep”?
2. The Conceptual Model
Classification vs. Reclassification
Classification: Assign raw values to meaningful categories
- Satellite imagery → land cover classes
- Temperature values → climate zones
Reclassification: Transform one categorical raster to another
- 10 land cover types → 3 broad categories (urban/forest/agriculture)
- Detailed soil types → simplified drainage classes
Both use the same mathematical framework.
Classification Schemes
1. Equal Interval
Divide value range into equal-width bins.
\text{Class } i: \left[\min + i \cdot \frac{\max - \min}{n}, \min + (i+1) \cdot \frac{\max - \min}{n}\right)
Example: Elevation 0-1000m, 5 classes → each class spans 200m
Pros: Simple, intuitive
Cons: May have empty classes or very unbalanced distribution
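A minimal Python sketch of the equal-interval formula above. The function name `equal_interval_breaks` and the sample elevations are illustrative assumptions, not part of any library.

```python
def equal_interval_breaks(values, n_classes):
    """Return n_classes + 1 breakpoints splitting [min, max] into equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    # Use hi itself as the last break so rounding cannot push it below the maximum.
    return [lo + i * width for i in range(n_classes)] + [hi]


elevations = [0, 120, 340, 560, 780, 1000]  # hypothetical sample values in metres
print(equal_interval_breaks(elevations, 5))
```

The returned list holds the minimum, the interior breaks, and the maximum, so n classes always yield n + 1 values.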
2. Quantiles (Equal Count)
Each class contains an equal number of pixels.
k-th percentile: Value below which k% of the data falls.
Example: 4 classes → breakpoints at 25th, 50th, 75th percentiles
Pros: Balanced class sizes
Cons: Breakpoints may not align with natural boundaries
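Quantile breaks can be sketched with Python's standard library. The wrapper name `quantile_breaks` and the sample values are assumptions for illustration; `statistics.quantiles` returns only the interior breakpoints.

```python
import statistics


def quantile_breaks(values, n_classes):
    """Interior breakpoints giving roughly equal pixel counts per class."""
    # method="inclusive" treats the data as the whole population,
    # so the minimum and maximum count as the 0th and 100th percentiles.
    return statistics.quantiles(values, n=n_classes, method="inclusive")


pixels = [10, 12, 14, 15, 16, 18, 20, 22, 24]  # hypothetical sample values
print(quantile_breaks(pixels, 4))  # [14.0, 16.0, 20.0]
```

When many pixels share the same value, ties can still leave the classes unequal in size.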
3. Natural Breaks (Jenks)
Minimize within-class variance, maximize between-class variance.
Objective: Find breaks that create most homogeneous classes.
Algorithm: Dynamic programming to optimize:
\min \sum_{i=1}^{k} \sum_{x \in \text{class}_i} (x - \bar{x}_i)^2
Pros: Respects data distribution
Cons: Computationally expensive, breakpoints change with data
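One way to sketch the dynamic programme is shown below. This is a plain, unoptimised illustration that takes O(k·n²) time for k classes and n values, and it assumes there are at least as many values as classes. The name `jenks_breaks` is hypothetical; on a real raster this would usually be run on a sample of pixels rather than on every cell.

```python
def jenks_breaks(values, n_classes):
    """Breaks minimising the within-class sum of squared deviations (1-D)."""
    x = sorted(values)
    n = len(x)
    # Prefix sums give the squared deviation of any slice in constant time.
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(x):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def ssd(i, j):
        """Sum of squared deviations from the mean of x[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best cost of splitting the first j values into c classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for c in range(1, n_classes + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):  # the last class is x[i:j]
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    split[c][j] = i

    # Walk back through the split table to recover each class's lowest value.
    starts = []
    j = n
    for c in range(n_classes, 0, -1):
        i = split[c][j]
        starts.append(x[i])
        j = i
    return starts[::-1] + [x[-1]]


print(jenks_breaks([1, 2, 3, 10, 11, 12, 50, 51], 3))  # [1, 10, 50, 51]
```

The result lists the lowest value of each class followed by the overall maximum, so the three clusters in the sample are recovered as 1–3, 10–12, and 50–51.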
4. Standard Deviation
Classes based on deviations from mean.
\text{Class boundaries: } \mu - 2\sigma, \mu - \sigma, \mu, \mu + \sigma, \mu + 2\sigma
Pros: Statistical meaning (normal distribution)
Cons: Assumes normal distribution (often violated)
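The boundaries above follow directly from the mean and standard deviation. A short sketch, assuming the population standard deviation is wanted (every pixel is observed) and using the hypothetical name `std_dev_breaks`:

```python
import statistics


def std_dev_breaks(values, max_k=2):
    """Boundaries at mu - max_k*sigma, ..., mu, ..., mu + max_k*sigma."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    return [mu + k * sigma for k in range(-max_k, max_k + 1)]


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values with mean 5 and sigma 2
print(std_dev_breaks(sample))  # [1.0, 3.0, 5.0, 7.0, 9.0]
```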
5. Manual/Expert
Domain expert specifies meaningful thresholds.
Example: Slope classes from geomorphology literature
- 0-2°: Flat (flooding possible)
- 2-5°: Gentle (easy to build on)
- 5-15°: Moderate (erosion risk increases)
- 15-30°: Steep (difficult access)
- >30°: Very steep (landslide risk)
Pros: Incorporates domain knowledge
Cons: Subjective, may not fit specific dataset
The Same Raster Can Be Partitioned In Different Ways
Classification does not just label values. It chooses where to place boundaries in a distribution, and different schemes emphasize different stories in the same data.
- Same width bins: Easy to explain, but class sizes can become very unbalanced when values cluster.
- Same count per class: Balances the number of pixels in each class, even if some breaks split visually similar values.
- Break where the data changes: Often matches the data or domain logic better, but the breaks depend on judgment or optimization choices.
3. Building the Mathematical Model
Simple Threshold Classification
Binary classification:
z_{\text{out}} = \begin{cases} 1 & \text{if } z_{\text{in}} \geq T \\ 0 & \text{if } z_{\text{in}} < T \end{cases}
Example: Water detection from elevation
- Threshold T = 0 m (sea level)
- Output: 1 = land, 0 = water
Multi-Class Range Classification
Define breakpoints: b_0 < b_1 < b_2 < \cdots < b_n
Classification function:
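\text{class}(z) = \begin{cases} 1 & \text{if } b_0 \leq z < b_1 \\ 2 & \text{if } b_1 \leq z < b_2 \\ \vdots \\ n & \text{if } b_{n-1} \leq z \leq b_n \end{cases}
The last class is closed on the right so that the maximum value b_n is still assigned to a class.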
Implementation:
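def classify(value, breaks):
    """Return the 1-based class of value for breaks [b_0, ..., b_n]."""
    # Compare against the interior breaks b_1 .. b_{n-1} only.
    for i, break_value in enumerate(breaks[1:-1]):
        if value < break_value:
            return i + 1
    # Everything else, including the maximum b_n, belongs to the top class n.
    return len(breaks) - 1
Lookup Table Reclassification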
Map specific input values to output values.
Lookup table:
| Input Value | Output Value |
|---|---|
| 1 (Forest) | 1 (Vegetation) |
| 2 (Grass) | 1 (Vegetation) |
| 3 (Crops) | 1 (Vegetation) |
| 4 (Urban) | 2 (Developed) |
| 5 (Water) | 3 (Water) |
Function:
z_{\text{out}} = \text{LUT}[z_{\text{in}}]
Efficient with arrays/dictionaries.
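The lookup table above maps directly onto a Python dictionary. In this sketch the `reclassify` helper, the sample raster, and the `NODATA` code for unmapped inputs are illustrative assumptions.

```python
# Land-cover code -> broad category, taken from the lookup table above.
LUT = {1: 1, 2: 1, 3: 1, 4: 2, 5: 3}
NODATA = 0  # hypothetical output for input codes missing from the table


def reclassify(raster, lut, nodata=NODATA):
    """Apply the lookup table to every cell of a 2-D list."""
    return [[lut.get(value, nodata) for value in row] for row in raster]


land_cover = [[1, 2, 4],
              [3, 5, 5],
              [4, 4, 9]]  # 9 is deliberately not in the table
print(reclassify(land_cover, LUT))  # [[1, 1, 2], [1, 3, 3], [2, 2, 0]]
```

Deciding explicitly what happens to unmapped codes avoids silent errors when the input raster contains a category the table did not anticipate.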
Fuzzy Classification
Instead of hard boundaries, use membership functions.
Example - “Moderate slope” membership:
\mu_{\text{moderate}}(s) = \begin{cases} 0 & s < 5 \\ \frac{s - 5}{10} & 5 \leq s < 15 \\ 1 & 15 \leq s < 25 \\ \frac{35 - s}{10} & 25 \leq s < 35 \\ 0 & s \geq 35 \end{cases}
Value between 0 and 1 indicates degree of membership.
Advantage: Represents uncertainty at boundaries.
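The piecewise membership function above translates line by line into code. A small sketch; the function name is an illustrative choice.

```python
def moderate_slope_membership(s):
    """Degree (0 to 1) to which a slope of s degrees counts as 'moderate'."""
    if s < 5 or s >= 35:
        return 0.0
    if s < 15:
        return (s - 5) / 10   # rising edge between 5 and 15 degrees
    if s < 25:
        return 1.0            # full membership between 15 and 25 degrees
    return (35 - s) / 10      # falling edge between 25 and 35 degrees


for slope in (2, 10, 20, 30, 40):
    print(slope, moderate_slope_membership(slope))
```

A slope of 10° or 30° gets a membership of 0.5, so cells near the edges of the class are marked as only partly "moderate" instead of being forced to one side of a hard break.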
4. Worked Example by Hand
Problem: Classify this temperature raster (°C) into 3 categories using equal intervals.
Input:
j=0 j=1 j=2 j=3
i=0 10 15 20 25
i=1 12 18 22 28
i=2 14 16 24 30
i=3 11 19 26 32
Categories:
- Cold (1)
- Moderate (2)
- Hot (3)
Solution
Step 1: Find range
\min = 10°C, \quad \max = 32°C
\text{range} = 32 - 10 = 22°C
Step 2: Calculate interval width
\text{width} = \frac{22}{3} \approx 7.33°C
Step 3: Define breakpoints
- b_0 = 10
- b_1 = 10 + 7.33 = 17.33
- b_2 = 10 + 2 \times 7.33 \approx 24.67
- b_3 = 32
Classes:
- Cold (1): [10, 17.33)
- Moderate (2): [17.33, 24.67)
- Hot (3): [24.67, 32]
Step 4: Classify each cell
Row 0:
- 10 < 17.33 → 1 (Cold)
- 15 < 17.33 → 1
- 20 ∈ [17.33, 24.67) → 2 (Moderate)
- 25 ≥ 24.67 → 3 (Hot)
Row 1:
- 12 → 1, 18 → 2, 22 → 2, 28 → 3
Row 2:
- 14 → 1, 16 → 1, 24 → 2, 30 → 3
Row 3:
- 11 → 1, 19 → 2, 26 → 3, 32 → 3
Output:
j=0 j=1 j=2 j=3
i=0 1 1 2 3
i=1 1 2 2 3
i=2 1 1 2 3
i=3 1 2 3 3
Class counts:
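- Cold (1): 6 cells
- Moderate (2): 5 cells
- Hot (3): 5 cells
Not perfectly balanced (an even split of 16 cells would be 5.33 each) because we used equal intervals, not quantiles. The counts are close only because these values happen to be spread fairly evenly.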
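The hand calculation can be checked with a few lines of Python. This is a sketch using the standard-library `bisect` module; the variable and function names are illustrative.

```python
from bisect import bisect_right
from collections import Counter

temps = [[10, 15, 20, 25],
         [12, 18, 22, 28],
         [14, 16, 24, 30],
         [11, 19, 26, 32]]

n_classes = 3
flat = [t for row in temps for t in row]
lo, hi = min(flat), max(flat)
width = (hi - lo) / n_classes
# Interior breaks only (b_1 and b_2); the maximum then falls in the top class.
interior = [lo + i * width for i in range(1, n_classes)]


def classify_cell(t):
    # bisect_right counts the interior breaks <= t, so a value equal to a
    # break is assigned to the upper class, matching the [a, b) intervals.
    return bisect_right(interior, t) + 1


classified = [[classify_cell(t) for t in row] for row in temps]
counts = Counter(c for row in classified for c in row)

for row in classified:
    print(row)
print(sorted(counts.items()))
```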
5. Computational Implementation
Below is an interactive raster classification tool.
Try this:
- Equal interval: Fixed-width bins (may be unbalanced)
- Quantile: Balanced class sizes (breaks at data percentiles)
- Standard deviation: Statistical bins (assumes normal distribution)
- Manual: Set your own thresholds (red lines on histogram)
- Adjust class count: See how distribution changes
- Histogram: Red lines show where breaks occur in data
Key insight: Method choice dramatically affects results—no single “correct” classification.
6. Interpretation
Slope Classification Example
From DEM to actionable information:
1. Calculate slope (degrees) from DEM
2. Classify:
- 0-2°: Suitable for farming, flooding risk
- 2-5°: Good for construction
- 5-15°: Moderate difficulty, erosion control needed
- 15-30°: Forestry, recreation only
- >30°: Hazard zones, protect from development
Result: Planning tool, not just numbers.
NDVI to Land Cover
Thresholds from literature:
NDVI < 0.1: Water, barren land
0.1-0.2: Sparse vegetation (desert)
0.2-0.4: Grassland, shrubland
0.4-0.6: Cropland, mixed vegetation
0.6-0.8: Dense vegetation (forest)
>0.8: Very dense vegetation (rainforest)
Validated against ground truth from field surveys.
Multi-Criteria Suitability
Combine multiple factors:
# Evaluated for one cell; slope and aspect are in degrees
slope_class = classify(slope, [0, 5, 15, 30, 90])
aspect_class = classify(aspect, [0, 90, 180, 270, 360])
soil_class = lookup_table[soil_type]
suitability = (slope_class == 1
               and aspect_class in (2, 3)
               and soil_class in (1, 2))
Boolean result: Suitable (1) or not (0).
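A self-contained sketch of the same overlay on a tiny raster. All input values, the soil codes, and the `soil_lut` table are made up for illustration; `classify` here takes only the interior breaks.

```python
from bisect import bisect_right


def classify(value, interior_breaks):
    """1-based class index; a value equal to a break goes to the upper class."""
    return bisect_right(interior_breaks, value) + 1


# Hypothetical 2 x 2 rasters (one value per cell).
slope = [[2, 12], [4, 3]]            # degrees
aspect = [[100, 200], [300, 180]]    # degrees clockwise from north
soil_type = [[10, 20], [10, 30]]     # made-up soil codes
soil_lut = {10: 1, 20: 2, 30: 3}     # made-up drainage classes

suitability = []
for i in range(2):
    row = []
    for j in range(2):
        slope_class = classify(slope[i][j], [5, 15, 30])
        aspect_class = classify(aspect[i][j], [90, 180, 270])
        soil_class = soil_lut[soil_type[i][j]]
        ok = slope_class == 1 and aspect_class in (2, 3) and soil_class in (1, 2)
        row.append(int(ok))
    suitability.append(row)

print(suitability)  # [[1, 0], [0, 0]]
```

Only the top-left cell passes all three tests; the others fail on slope, aspect, and soil respectively, which shows how a Boolean overlay is only as permissive as its strictest criterion.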
7. What Could Go Wrong?
Arbitrary Breakpoints
Equal interval on skewed data:
Data: [1, 1, 2, 2, 2, 3, 3, 50]
Equal intervals (4 classes):
[1, 13.25): 7 values → Class 1
[13.25, 25.5): 0 values → Class 2
[25.5, 37.75): 0 values → Class 3
[37.75, 50]: 1 value → Class 4
Problem: Empty classes, unbalanced.
Solution: Use quantiles or remove outliers first.
Sensitivity to Outliers
One extreme value shifts all breakpoints:
Data: [10, 12, 14, 15, 16, 18, 20, 1000]
Equal intervals (4 classes) with outlier → bin width 247.5, so seven of the eight values land in Class 1
Solution:
- Remove outliers before classification
- Use robust statistics (median, IQR)
- Clip extreme values
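The effect of clipping can be sketched as follows. The fences at Q1 − 1.5·IQR and Q3 + 1.5·IQR are one common convention and are an assumption here, as are the function names; the right limits for a real raster depend on the data.

```python
import statistics


def equal_interval_counts(values, n_classes):
    """Number of values falling in each equal-width class."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    counts = [0] * n_classes
    for v in values:
        # min(...) keeps the maximum value in the top class.
        idx = min(int((v - lo) / width), n_classes - 1)
        counts[idx] += 1
    return counts


def clip_to_iqr_fences(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lo), hi) for v in values]


data = [10, 12, 14, 15, 16, 18, 20, 1000]
print(equal_interval_counts(data, 4))                      # [7, 0, 0, 1]
print(equal_interval_counts(clip_to_iqr_fences(data), 4))  # [2, 3, 2, 1]
```

After clipping, the outlier is pulled back to 26 and the four classes are all populated.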
Loss of Information
Continuous to categorical loses detail:
Original: 15.2°, 15.8° (0.6° difference)
Classified: Both → Class 3 "steep" (appear identical)
Original: 14.9°, 15.1° (0.2° difference)
Classified: 14.9° → Class 2 "gentle", 15.1° → Class 3 "steep" (appear very different)
Problem: Boundary artifacts.
Solution: Use buffer zones or fuzzy classification.
Inappropriate Method
Quantiles on categorical data:
Land cover codes: [1, 1, 1, 2, 2, 3, 3, 3]
Quantile classification → meaningless
Solution: Only classify continuous data. Reclassify categorical via lookup tables.
8. Extension: Unsupervised Classification
Automated clustering finds natural groups in data.
K-means algorithm:
1. Initialize k cluster centers randomly
2. Assign each pixel to nearest center
3. Recompute centers as mean of assigned pixels
4. Repeat 2-3 until convergence
For multi-band imagery:
pixel = [band1, band2, band3, ..., bandN]
distance = sqrt(sum((pixel - center)²))
Advantage: No manual thresholds needed.
Disadvantage: Classes may not align with semantic categories.
Example: Cluster a Landsat image (7 bands) into 10 spectral classes automatically, then label each cluster with a land cover type.
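The four k-means steps can be sketched in plain Python. This is a minimal illustration on six made-up two-band pixels, not a production implementation; the function name, the fixed seed, and the iteration cap are all assumptions.

```python
import math
import random


def kmeans(pixels, k, n_iter=20, seed=0):
    """Cluster pixels (tuples of band values); returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(pixels, k)  # step 1: random initial centres
    labels = None
    for _ in range(n_iter):
        # Step 2: assign each pixel to the nearest centre (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in pixels
        ]
        if new_labels == labels:  # step 4: stop once assignments are stable
            break
        labels = new_labels
        # Step 3: recompute each centre as the mean of its assigned pixels.
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centers[c] = tuple(sum(band) / len(members) for band in zip(*members))
    return labels, centers


# Hypothetical reflectances: three dark pixels and three bright pixels.
pixels = [(0.10, 0.10), (0.12, 0.11), (0.11, 0.13),
          (0.80, 0.82), (0.83, 0.80), (0.81, 0.84)]
labels, centers = kmeans(pixels, k=2)
print(labels)
```

The cluster numbers themselves are arbitrary: which group is labelled 0 and which 1 depends on the random start, which is why clusters still need to be matched to semantic categories afterwards.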
9. Math Refresher: Quantiles and Percentiles
Definition
p-th quantile (Q_p): Value below which fraction p of data falls.
Example: Median = 0.5 quantile (50th percentile)
Calculation
For sorted data x_1 \leq x_2 \leq \cdots \leq x_n:
Position:
\text{pos} = p \times (n - 1) + 1
If position is integer: Q_p = x_{\text{pos}}
If fractional: Interpolate between x_{\lfloor\text{pos}\rfloor} and x_{\lceil\text{pos}\rceil}
Example: Find 0.25 quantile of [1, 2, 3, 4, 5]
\text{pos} = 0.25 \times (5 - 1) + 1 = 2
Q_{0.25} = x_2 = 2
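The position formula and the interpolation rule can be written out directly. A short sketch; the function name `quantile` is an illustrative choice.

```python
import math


def quantile(values, p):
    """p-th quantile using pos = p * (n - 1) + 1 with linear interpolation."""
    x = sorted(values)
    pos = p * (len(x) - 1) + 1  # 1-based position in the sorted data
    lower = math.floor(pos)
    frac = pos - lower
    if frac == 0:
        return x[lower - 1]     # integer position: take the value itself
    # Fractional position: interpolate between the two neighbouring values.
    return x[lower - 1] + frac * (x[lower] - x[lower - 1])


print(quantile([1, 2, 3, 4, 5], 0.25))  # 2
print(quantile([1, 2, 3, 4], 0.5))      # 2.5
```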
For Classification
Divide into k equal-count classes:
Breakpoints at quantiles: Q_{1/k}, Q_{2/k}, \ldots, Q_{(k-1)/k}
Example: 4 classes → breaks at 0.25, 0.5, 0.75 quantiles
Summary
- Classification converts continuous rasters to discrete categories
- Five methods: Equal interval, quantile, natural breaks, standard deviation, manual
- Equal interval: Fixed-width bins, simple but may be unbalanced
- Quantile: Equal-count bins, balanced but may group dissimilar values
- Natural breaks: Minimizes within-class variance, computationally expensive
- Manual thresholds: Domain expert knowledge, most meaningful for applications
- Reclassification: Uses lookup tables to transform categorical rasters
- Fuzzy classification: Membership functions instead of hard boundaries
- Applications: Slope classes, land cover mapping, suitability analysis
- Challenges: Outliers, information loss, boundary artifacts
- Method choice depends on data distribution and application requirements