Beyond Mapping
III
|
Map
Analysis book with companion CD-ROM for hands-on exercises and further reading |
Computer Processing Aids Spatial Neighborhood
Analysis — discusses
approaches for calculating slope and profile
Milking Spatial Context Information — describes
a procedure for deriving a customer density surface
Spatially Aggregated Reporting: The Probability is
Good — discusses
techniques for smoothing “salt and pepper” results and deriving
probability surfaces from aggregated incident records
Nearby Things Are More Alike — use of decay functions in weight-averaging
surrounding conditions
Filtering for the Good Stuff — investigates a couple of spatial filters for
assessing neighborhood connectivity and variability
Author’s Notes: The figures in this topic use MapCalcTM software. An educational CD with online text, exercises
and databases for “hands-on” experience in these and other grid-based analysis
procedures is available for US$21.95 plus shipping and handling (www.farmgis.com/products/software/mapcalc/
).
<Click here> right-click to
download a printer-friendly version of this topic (.pdf).
(Back to the Table of Contents)
______________________________
Computer Processing Aids Spatial
Neighborhood Analysis
(GeoWorld, October 2005, pg. 18-19)
This and the
following sections investigate a set of analytic tools concerned with
summarizing information surrounding a map location. Technically stated, the processing involves
“analysis of spatially defined neighborhoods for a map location within the
context of its neighboring locations.”
Four steps are involved in neighborhood analysis— 1) define the
neighborhood, 2) identify map values within the neighborhood, 3) summarize the
values and 4) assign the summary statistic to the focus location. Then repeat the process for every location in
a project area.
The
neighborhood values are obtained by a “roving window” moving about a map. To conceptualize the process, imagine a
French window with nine panes looking straight down onto a portion of the
landscape. If your objective was to
summarize terrain steepness from a map of digital elevation values, you would
note the nine elevation values within the window panes, and then summarize the
3-dimensional surface they form.
Now imagine the
nine values become balls floating at their respective elevation. Drape a sheet over them like the magician
places a sheet over his suspended assistant (who says
Figure
1. At a
location, the eight individual slopes can be calculated for a 3x3 window and
then summarized for the maximum, minimum, median and average slope.
Figure 1 shows a
small portion of a typical elevation data set, with each cell containing a
value representing its overall elevation.
In the highlighted 3x3 window there are eight individual slopes, as
shown in the calculations on the right side of the figure. The steepest slope in the window is 52%
formed by the center and the NW neighboring cell. The minimum slope is 11% in the NE
direction.
To get an
appreciation of this processing, shift the window one column to the right and,
on your own, run through the calculations using the focal elevation value of
459. Now imagine doing that a million
times as the roving window moves an entire project area—whew!!!
But what about
the general slope throughout the entire 3x3 analysis window? One estimate is 29%, the arithmetic average
of the eight individual slopes. Another
general characterization could be 30%, the median of slope values. But let's stretch the thinking a bit more. Imagine that the nine elevation values become
balls floating above their respective locations, as shown in Figure 2. Mentally insert a plane and shift it about
until it is positioned to minimize the overall distances from the plane to the
balls. The result is a "best-fitted
plane" summarizing the overall slope in the 3x3 window.
Figure 2.
Best-Fitted Plane and Vector
Algebra can be used to calculate overall slope.
Techy-types
will recognize this process as similar to fitting a regression line to a set of
data points in two-dimensional space. In
this case, it’s a plane in three-dimensional space. There is an intimidating set of equations
involved, with a lot of Greek letters and subscripts to "minimize the sum
of the squared deviations" from the plane to the points. Solid geometry calculations, based on the
plane's "direction cosines," are used to determine the slope (and
aspect) of the plane.
Another
procedure for fitting a plane to the elevation data uses vector algebra, as
illustrated in the right portion of Figure 2.
In concept, the mathematics draws each of the eight slopes as a line in
the proper direction and relative length of the slope value (individual
vectors). Now comes the fun part. Starting with the NW line, successively
connect the lines as shown in the figure (cumulative vectors). The civil engineer will recognize this
procedure as similar to the latitude and departure sums in "closing a
survey transect." The length of the
“resultant vector” is the slope (and direction is the aspect).
In addition to
slope and aspect, a map of the surface profiles can be computed (see figure
3). Imagine the terrain surface as a
loaf of bread, fresh from the oven. Now
start slicing the loaf and pull away an individual slice. Look at it in profile concentrating on the line
formed by the top crust. From left to
right, the line goes up and down in accordance with the valleys and ridges it
sliced through.
Use your arms
to mimic the fundamental shapes along the line.
A 'V' shape with both arms up for a valley. An inverted 'V' shape with both arms down for
a ridge. Actually there are only nine
fundamental profile classes (distinct positions for your two arms). Values one through nine will serve as our
numerical summary of profile.
Figure 3.
A 3x1 roving window is used to summarize surface profile.
The result of
all this arm waving is a profile map— the continuous distribution terrain
profiles viewed from a specified direction.
Provided your elevation data is at the proper resolution, it's a big
help in finding ridges and valleys running in a certain direction. Or, if you look from two opposing directions
(orthogonal) and put the profile maps together, a location with an inverted 'V'
in both directions is likely a peak.
There is a lot
more to neighborhood analysis than just characterizing the lumps and bumps of
the terrain. What would happen if you
created a slope map of a slope map? Or a
slope map of a barometric pressure map?
Or of a cost surface? What would
happen if the window wasn't a fixed geometric shape? Say a ten minute drive window. I wonder what the average age and income is
for the population within such a bazaar window?
Keep reading for more on neighborhood analysis.
Milking Spatial Context Information
(GeoWorld, November 2005, pg. 18-19)
The previous
discussion focused on procedures for analyzing spatially-defined neighborhoods
to derive maps of slope, aspect and profile. These techniques fall
into the first of two broad classes of neighborhood analysis—Characterizing
Surface Configuration and Summarizing map values (see figure 1).
Figure 1.
Fundamental classes of neighborhood analysis operations.
It is important
to note that all neighborhood analyses involve mathematical or statistical
summary of values on an existing map that occur within a roving window. As the window is moved throughout a project
area, the summary value is stored for the grid location at the center of the
window resulting in a new map layer reflecting neighboring characteristics or
conditions.
The difference
between the two classes of neighborhood analysis techniques is in the treatment
of the values—implied surface
configuration or direct numerical
summary. Figure 2 shows a direct
numerical summary identifying the number of customers within a quarter of a
mile of every location within a project area.

Figure 2. Approach used in deriving a
Customer Density surface from a map of customer locations.
The procedure
uses a “roving window” to collect neighboring map values and compute the total
number of customers in the neighborhood.
In this example, the window is positioned at a location that computes a
total of 91 customers within quarter-mile.
Note that the
input data is a discrete placement of customers while the output is a
continuous surface showing the gradient of customer density. While the example location does not even have
a single customer, it has an extremely high customer density because there are
a lot of customers surrounding it.
The map
displays on the right show the results of the processing for the entire
area. A traditional vector

Figure 3. Calculations involved in
deriving customer density.
Figure 3
illustrates how the information was derived.
The upper-right map is a display of the discrete customer locations of
the neighborhood of values surrounding the “focal” cell. The large graphic on the right shows this
same information with the actual map values superimposed. Actually, the values are from an Excel
worksheet with the column and row totals indicated along the right and bottom
margins. The row (and column) sum
identifies the total number off customers within the window—91 total customers
within a quarter-mile radius.
This value is
assigned to the focal cell location as depicted in the lower-left map. Now imagine moving the “Excel window” to next
cell on the right, determine the total number of customers and assign the
result—then on to the next location, and the next, and the next, etc. The process is repeated for every location in
the project area to derive the customer density surface.
The processing
summarizes the map values occurring within a location’s neighborhood (roving
window). In this case the resultant
value was the sum of all the values. But
summaries other than Total can be
used—Average, StDev, CoffVar, Maximum,
Minimum, Median, Majority, Minority, Diversity, Deviation, Proportion, Custom
Filters, and Spatial Interpolation.
The remainder of this series will focus on how these techniques can be
used to derive valuable insight into the conditions and characteristics
surrounding locations—analyzing their spatially-defined neighborhoods.
Spatially Aggregated
Reporting: The Probability is Good
(GeoWorld, January 2006, pg. 16-17)
A couple of the procedures used in the wildfire
modeling warrant “under-the-hood” discussion neighborhood operations—1)
smoothing the results for dominant patterns and 2) deriving wildfire ignition
probability based on historical fire records.
Figure 1. Smoothing
eliminates the “salt and pepper” effect of isolated calculations to uncover
dominant patterns useful for decision-making.
Figure 1 illustrates the effect smoothing
raw calculations of wildfire risk. The
map in the upper-left portion depicts the fragmented nature of the results
calculated for a set individual 30m grid cells.
While the results are exacting for each location, the resulting “salt
and pepper” condition is overly detailed and impractical for
decision-making. The situation is akin
to the old adage that “you can’t see the forest for the trees.”
The remaining three panels show the
effect of using a smoothing window of increasing radius to average the
surrounding conditions. The two-cell
reach averages the wildfire risk within a 13-cell window of slightly more than
2.5 acres. Five and ten-cell reaches
eliminate even more of the salt-and-pepper effect. An eight-cell reach (44 acre) appears best for
wildfire risk modeling as it represents an appropriate resolution for management.
Another use of a neighborhood operator is
establishing fire occurrence probability based on historical fire records. The first step in solving this problem is to
generate a continuous map surface identifying the number of fires within a
specified window reach from each map location.
If the ignition locations of individual fires are recorded by geographic
coordinates (e.g., latitude/longitude) over a sufficient time period (e.g.,
10-20 years) the solution is straightforward.
An appropriate window (e.g., 1000 acres) is moved over the point data
and the total number of fires is determined for the area surrounding each grid
cell. The window is moved over the area
to allow for determination of the likelihood of fire ignition over an area
based on the historic ignition location data.
The derived fire density surface is divided by the number of cells in
window (fires per cell) and then divided by the number of years (fires per cell
per year). The result is a continuous
map indicating the likelihood (annualized frequency) that any location will
have a wildfire ignition.
The reality of the solution, however, is
much more complex. The relative
precision of recording fires differs for various reporting units from specific
geographic coordinates, to range/township sections, to zip codes, to entire
counties or other administrative groupings.
The spatially aggregated data is particularly aggravating as all fires
within the reporting polygon are represented as occurring at the centroid of a
reporting unit. Since the actual
ignition locations can be hundreds of grid cells away from the centroid, a bit
of statistical massaging is needed.
Figure 2
summarizes the steps involved. The
reporting polygon is converted to match the resolution of the grid used by the wildfire
risk model and each location is assigned the total number of fires occurring
within its reporting polygon (#Fires). A
second grid layer is established that assigns the total number of grid cells
within its reporting polygon (#Cells).
Dividing the two layers uniformly distributes the number of fires within
a reporting unit to the each grid cell comprising the unit (Fires/Cell).
Figure 2. Fires per
cell is calculated for each location within a reporting unit then a roving
window is used to calculate the likelihood of ignition by averaging the
neighboring probabilities.
The final step
moves a roving window over the map to average the fires per cell as depicted on
the right side of the figure. The result
is a density surface of the average number of fires per cell reflecting the
relative size and pattern of the fire incident polygons falling within the
roving window. As the window moves from
a location surrounded by low probability values to one with higher values the
average probability increases as a gradient that tracks the effect of the
area-weighted average.
Figure 3. A map of
Fire Occurrence frequency identifies the relative likelihood that a location
will ignite based on historical fire incidence records.
Figure 3 shows
the operational results of stratifying the area into areas of uniform
likelihood of fire ignition. The
reference grid identifies PLSS sections used for fire reporting, with the dots
indicating total number of fires for each section. The dark grey locations identify non-burnable
areas, such as open water, agriculture lands, urbanization, etc. The tan locations identify burnable areas
with a calculated probability of zero.
Since zero probability is a result of the short time period of the
recorded data the zero probability is raised to a minimum value. The color ramp indicates increasing fire
ignition probability with red being locations having very high likelihood of
ignition.
It is important to note that
interpolation of incident data is inappropriate and simple density function
analysis only works for data that is reported with specific geographic
coordinates. Spatially aggregated
reporting requires the use of the area-weighted frequency technique described
above. This applies to any discrete
incident data reporting and analysis, whether wildfire ignition points, crime
incident reports, product sales. Simply
assigning and mapping the average to reporting polygons just won’t cut it as
geotechnology moves beyond mapping.
______________________________
Author’s Note: Discussion based on
wildfire risk modeling by Sanborn, www.sanborn.com/solutions/fire_management.htm. For more information on wildfire risk
modeling, see GeoWorld, December 2005, Vol.18, No. 12, 34-37, posted online at http://www.geoplace.com/uploads/FeatureArticle/0512ds.asp
or click here for
article with enlarged figures and .pdf hardcopy.
(GeoWorld, February 2006, pg. 16-17)
Neighborhood
operations summarize the map values surrounding a location based on the implied
Surface Configuration (slope, aspect, profile) or the Statistical Summary of
the values. The summary procedure, as
well as the shape/size of the roving window, greatly affects the results.
The previous
section investigated these effects by changing the window size and the summary
procedure to derive a statistical summary of neighbor conditions. An interesting extension to these discussions
involves using spatial filters that
change the relative weighting of the values within the window based on standard
decay function equations.
Figure 1 shows
graphs of several decay functions. A
Uniform function is insensitive to distance with all of the weights in the
window the same (1.0). The other
equations involve the assumption that “nearby things are more alike” and
generate increasingly smaller weights with greater distances. The Inverse Distance Squared function is the
most extreme resulting in nearly zero weighting within less than a 10 cell
reach. The Inverted D^2 function, on the
other hand, is the least limiting function with its weights decreasing at a
much slower rate to a reach of over 35 cells.
Figure 1. Standard mathematical decay
functions where weights (Y) decrease with increasing distance (X).
Decay functions
like these often are used by mathematicians to characterize relationships among
variables. The relationships in a
spatial filter require extending the concept to geographical space. Figure 2 shows 2D and 3D plots of the results
of evaluating the Inverse Distance, Linear Negative-Slope, Inverted
Distance-Squared and Uniform functions to the X,Y coordinates in a grid-based
system. The result is a set of weights
for a roving window (technically referred to as a “kernel”) with a radius of 38
cells.
Note the sharp
peak for the Inverse Distance filter that rapidly declines from a weight of 1.0
(blue) for the center location to effectively zero (yellow) for most of the
window. The Linear Negative-Slope
filter, on the other hand, decreases at a constant rate forming a cone of
declining weights. The weights in the
Inverted Distance-Squared filter are much more influential throughout the
window with a sharp fall-off toward the edge of the window. The Uniform filter is constant at 1.0
indicating that all values in the window are equally weighted regardless of
their distance from the center location.
Figure 2.
Example spatial filters depicting the fall-off of weights (Z) as a
function of geographic distance (X,Y).
These spatial
filters are the geographic equivalent to the standard mathematical decay functions
shown in figure 1. The filters can be
used to calculate a weighted average by 1) multiplying the map values times the
corresponding weights within a roving window, 2) summing the products, 3) then
dividing by the sum of the weights and 4) assigning the calculated value to the
center cell. The procedure is repeated
for each instance of the roving window as it passes throughout the project
area.
Figure 3
compares the results of weight-averaging using a Uniform spatial filter (simple
average) and a Linear Negative-Slope filter (weighted average) for smoothing
model calculated values. Note that the
general patterns are similar but that the ranges of the smoothed values are
different as the result of the weights used in averaging.
The use of spatial
filters enables a user to control the summarization of neighboring values. Decay functions that match user knowledge or
empirical research form the basis of distance weighted averaging. In addition, filters that affect the shape of
the window can be used, such as using direction to summarize just the values to
the north—all 0’s except for a wedge of 1’s oriented toward the north.
Figure 3.
Comparison of simple average (Uniform weights) and weighted average
(Linear weights) smoothing results.
“Dynamic spatial filters” that change with
changing geographic conditions define an active frontier of research in
neighborhood summary techniques. For
example, the shape and weights could be continuously redefined by just
summarizing locations that are uphill as a function of elevation (shape) and
slope (weights) with steep slopes having the most influence in determining
average landslide potential. Another
example might be determining secondary source pollution levels by considering
up-wind locations as a function of wind direction (shape) and speed (weights)
with values at stronger wind locations having the most weight.
The digital
nature of modern maps supporting such map-ematics
is taking us well beyond traditional mapping and our paper-map legacy. As
(GeoWorld, December 2005, pg. 18-19)
The last couple
of sections discussed procedures for analyzing spatially-defined neighborhoods
through the direct numerical summary of values within a roving window. An interesting group of extended operators
are referred to as spatial filters.
A useful
example of a spatial filter involves analysis of a Binary Progression Window (BPW) that summarizes the diagonal and
orthogonal connectivity within the window.
The left side of figure 1 shows the binary progression (multiples of 2)
assignment for the cells in a 3 by 3 window that increases left to right, top
to bottom.
Figure 1. Binary Progression Window
summarizes neighborhood connectivity by summing values in a roving window.
The interesting
characteristic of the sum of a binary progression of numbers is that each
unique combination of numbers results in a unique value. For example if a condition does not occur in
a window, the sum is zero. If all cells
contain the condition, the sum is 511.
The four example configurations on the right identify the unique sum
that characterizes the patterns shown.
The result is that all possible patterns can be easily recognized by the
computer and stored as a map.
A more
sophisticated example of a spatial filter is the Binary Comparison Matrix (
Consider the
3x3 window in figure 2 where "M" represents meadow classified
locations and "F" represents forest.
The simplest summary of neighborhood variability is to note there are
just two classes. If there was only one
class in the window, you would say there is no variability (boring); if there
were nine classes, there would be a lot of different vegetation conditions
(exciting).
The count of
the number of different classes (Diversity) is the broadest measure of
neighborhood variability. The measure of
the relative frequency of occurrence of each class (Interspersion) is a
refinement on the simple diversity count and notes that the window contains
less M’s than F’s. If the example's
three "M's" were more spread out like a checkerboard, you would
probably say there was more variability due to the relative positioning of the
classes (Juxtapositioning). The
final variability measure (Proportion) is two because there are 2 similar cells
of the 8 total adjoining cells.
Figure 2. Binary Comparison Matrix
summarizes neighborhood variability by summing various groups of matrix
pairings identified in a roving window.
A computer
simply summarizes the values in a Binary Comparison Matrix to categorize all
variability that you see. First,
"Binary" means it only uses 0's and 1's. "Comparison" says it will compare
each element in the window with every other element. If they are the same, a 1 is assigned; if
different, a 0 is assigned.
"Matrix" dictates how the data the binary data is organized
and summarized.
In figure 2,
the window elements are numbered from one through nine. In the window, is the class for cell 1 the
same as for cell 2? Yes (both are M), so
1 is assigned at the 1,2 position in the table.
How about elements 1 and 3? No,
so assign a 0 in the second position of column one. How about 1 and 4? No, then assign another 0. Repeat until all of the combinations in the
matrix contain a 0 or a 1 as depicted in the figure.
While you are
bored already, the computer enjoys completing the table for every grid
location… thousands and thousands of
Within the
table there are 36 possible comparisons.
In our example, eighteen of these are similar by summing the entire
matrix— Interspersion= 18. Orthogonal
adjacency (side-by-side and top-bottom) is computed by summing the
vertical/horizontal cross-hatched elements in the table— Juxtaposition= 9. Comparison of the center to its neighbors
computes the sum for all pairs involving element 5 having the same condition
(5,1 and 5,2 only)— Proportion= 2.
You can easily
ignore the mechanics of the computations and still be a good user of
While BPW’s
neighborhood connectivity and
______________________________
Author’s Note: This and other
“Beyond Mapping” columns have been compiled into an online book Map Analysis
posted at www.innovativegis.com/basis/. Student and instructor materials with
hands-on exercises including software are available.