Topic 3 – Basic Techniques in Spatial Analysis
Map Analysis book/CD
Use a Map-ematical Framework for GIS Modeling — describes a conceptual structure for map analysis operations and GIS modeling
Options Seem Endless When Reclassifying Maps — discusses the basic reclassifying map operations
Overlay Operations Feature a Variety of Options — discusses the basic overlaying map operations
Computers Quickly Characterize Spatial Coincidence — discusses several human considerations in implementing GIS
Further Reading — two additional sections
Use a Map-ematical Framework for GIS Modeling
(GeoWorld, March 2004)
While map analysis tools might at first seem uncomfortable, they simply are extensions of traditional analysis procedures brought on by the digital nature of modern maps. Since maps are “number first, pictures later,” a map-ematical framework can be used to organize the analytical operations. Like basic math, this approach uses sequential processing of mathematical operations to perform a wide variety of complex map analyses. By controlling the order in which the operations are executed, and using a common database to store the intermediate results, a mathematical-like processing structure is developed.
This “map algebra” is similar to traditional algebra, where basic operations such as addition, subtraction and exponentiation are logically sequenced for specific variables to form equations; however, in map algebra the variables represent entire maps consisting of thousands of individual grid values. Most traditional mathematical capabilities, plus an extensive set of advanced map processing operations, comprise the map analysis toolbox.
In grid-based map analysis, the spatial coincidence and juxtapositioning of values among and within maps create new analytical operations, such as coincidence, proximity, visual exposure and optimal routes. These operators are accessed through general purpose map analysis software available in many
There are two fundamental conditions required by any map analysis package—a consistent data structure and an iterative processing environment. Previous discussion (Beyond Mapping columns July and September, 2002) described the characteristics of a grid-based data structure by introducing the concepts of an analysis frame, map stack and data types. This discussion extended the traditional discrete set of map features (points, lines and polygons) to map surfaces that characterize geographic space as a continuum of uniformly-spaced grid cells.
The second condition is the focus of this and the next couple of sections. It provides an iterative processing environment by logically sequencing map analysis operations and involves: 1) retrieval of one or more map layers from the database, 2) processing that data as specified by the user, 3) creation of a new map containing the processing results, and 4) storage of the new map for subsequent processing.
Each new map derived as processing continues aligns with the analysis frame so it is automatically geo-registered to the other maps in the database. The values comprising the derived maps are a function of the processing specified for the “input map(s).”
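The four-step cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the API of any GIS package: each map is assumed to be a list of rows of numbers on a shared analysis frame, the "database" is a plain dictionary keyed by map name, and `run_step` is a hypothetical helper.

```python
# A minimal sketch of the retrieve/process/create/store cycle.
# Assumptions (not from the source): maps are lists of rows of numbers, and the
# "database" is a plain dict keyed by map name. run_step is a hypothetical helper.

def run_step(stack, inputs, operation, output):
    """Run one processing cycle and store the derived map under `output`."""
    layers = [stack[name] for name in inputs]                 # 1) retrieve input map(s)
    rows, cols = len(layers[0]), len(layers[0][0])
    for layer in layers:                                       # all maps share one analysis frame
        if len(layer) != rows or any(len(row) != cols for row in layer):
            raise ValueError("input maps are not registered to the same analysis frame")
    new_map = [
        [operation(*(layer[r][c] for layer in layers)) for c in range(cols)]  # 2) process each cell
        for r in range(rows)
    ]                                                          # 3) create the new map
    stack[output] = new_map                                    # 4) store it for later steps
    return new_map


stack = {"A": [[1, 2], [3, 4]], "B": [[10, 20], [30, 40]]}
run_step(stack, ["A", "B"], lambda a, b: a + b, "SUM")
run_step(stack, ["SUM"], lambda s: s * 2, "DOUBLED")   # reuses the stored intermediate map
print(stack["DOUBLED"])   # [[22, 44], [66, 88]]
```

Because every derived map is stored under a name, any later step can retrieve it as input, which is what allows short operations to be chained into longer models.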
This cyclical processing provides an extremely flexible structure similar to “evaluating nested parentheticals” in traditional math. Within this structure, one first defines the values for each variable and then solves the equation by performing the mathematical operations on those numbers in the order prescribed by the equation. For example, the equation for calculating percent change in your investment portfolio (B the current value and C the earlier one),
$$A = \frac{B - C}{C} \times 100$$
identifies that the variables B and C are first defined, then subtracted, and the difference stored as an intermediate solution. The intermediate solution is divided by variable C to generate another intermediate solution that, in turn, is multiplied by 100 to calculate the solution variable A, the percent change value.
The same mathematical structure provides the framework for computer-assisted map analysis. The only difference is that the variables are represented by mapped data composed of thousands of organized numbers. Figure 1 shows a similar solution for calculating the percent change in animal activity for an area. Maps of activity in two periods serve as input; a difference map is calculated, then divided by the earlier period's map and multiplied by 100. The procedure uses the same equation but derives a different form of output: a map of percent change.
Figure 1. An iterative processing environment, analogous to basic math, is used to derive new map variables.
The processing steps shown in the figure are identical to the traditional solution except the calculations are performed for each grid cell in the study area and the result is a map that identifies the percent change at each location (a decrease of 8.51% for the example location; red tones indicate decreased and green tones indicate increased animal activity).
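The same percent change equation, A = (B - C) / C × 100, can be applied cell by cell as a short sketch. The activity values below are hypothetical (they are not the figure's data), and returning `None` where the earlier value is zero is an assumption, since percent change has no baseline there.

```python
def percent_change(earlier, later):
    """Cell-by-cell A = (B - C) / C * 100, where C is the earlier map and B the later one."""
    result = []
    for row_c, row_b in zip(earlier, later):
        out_row = []
        for c, b in zip(row_c, row_b):
            difference = b - c                          # intermediate map 1: B - C
            if c == 0:
                out_row.append(None)                    # percent change undefined without a baseline
            else:
                out_row.append(difference / c * 100)    # intermediate map 2, then scale by 100
        result.append(out_row)
    return result


# Hypothetical activity counts for a 2 x 2 study area.
period_1 = [[40, 40], [0, 20]]
period_2 = [[30, 50], [5, 20]]
print(percent_change(period_1, period_2))   # [[-25.0, 25.0], [None, 0.0]]
```

Negative values mark decreased activity and positive values increased activity, mirroring the red and green tones of the figure.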
The analysis identifies what kind of change (termed the thematic attribute) occurred where (termed the spatial attribute). The characterization of what and where provides information needed
Within this iterative processing structure, four fundamental classes of map analysis operations can be identified. These include:
- Reclassifying Maps – involving the reassignment of the values of an existing map as a function of its initial value, position, size, shape or contiguity of the spatial configuration associated with each map category.
- Overlaying Maps – resulting in the creation of a new map where the value assigned to every location is computed as a function of the independent values associated with that location on two or more maps.
- Measuring Distance and Connectivity – involving the creation of a new map expressing the distance and route between locations as straight-line length (simple proximity) or as a function of absolute or relative barriers (effective proximity).
- Characterizing and Summarizing Neighborhoods – resulting in the creation of a new map based on the consideration of values within the general vicinity of target locations.
Reclassification operations merely repackage existing information on a single map. Overlay operations, on the other hand, involve two or more maps and result in the delineation of new boundaries. Distance and connectivity operations are more advanced techniques that generate entirely new information by characterizing the relative positioning of map features. Neighborhood operations summarize the conditions occurring in the general vicinity of a location. See the Author’s Notes for links to more detailed discussions of the types of map analysis operations.
The reclassifying and overlaying operations based on point processing are the backbone of current
The mathematical structure and classification scheme of Reclassify, Overlay, Distance and Neighbors form a conceptual framework that is easily adapted to modeling spatial relationships in both physical and abstract systems. A major advantage is flexibility. For example, a model for siting a new highway can be developed as a series of processing steps. The analysis might consider economic and social concerns (e.g., proximity to high housing density, visual exposure to houses), as well as purely engineering ones (e.g., steep slopes, water bodies). The combined expression of both physical and non-physical concerns within a quantified spatial context is another major benefit.
Moreover, the ability to simulate various scenarios (e.g., steepness is twice as important as visual exposure and proximity to housing is four times more important than all other considerations) provides an opportunity to fully integrate spatial information into the decision-making process. By noting how often and where the proposed route changes as successive runs are made under varying assumptions, the analysis reveals how sensitive the siting of a highway in a particular locale is to those assumptions.
In the old environment, decision-makers attempted to interpret results, bounded by vague assumptions and system expressions of a specialist. Grid-based map analysis, on the other hand, engages decision-makers in the analytic process, as it both documents the thought process and encourages interaction. It’s sort of like a “spatial spreadsheet” containing map-ematical equations (or recipes) that encapsulate the spatial reasoning of a problem and solve it using digital map variables.
Options Seem Endless When Reclassifying Maps
(GeoWorld, April 2004)
The previous section described a map-ematical framework for GIS modeling.
The reassignment of existing values can be made as a function of the initial value, position, contiguity, size, or shape of the spatial configuration of the individual map categories. Each reclassification operation involves the simple repackaging of information on a single map, and results in no new boundary delineation. Such operations can be thought of as the purposeful "re-coloring" of maps.
Figure 1 shows the result of simply reclassifying a map as a function of its initial thematic values. For display, a unique symbol is associated with each value. In the figure, the cover type map has categories of Open Water, Meadow and Forest.
The binary map on the right side of the figure isolates the Open Water locations by simply assigning zero to the areas of Meadow and Forest and displaying those categories as grey. Although the operation seems trivial by itself, it has map analysis implications far beyond simply re-coloring the map categories. It also graphically demonstrates the basic characteristic of reclassify operations: values change, but the spatial pattern of the data doesn’t.
Figure 1. Areas of meadow and forest on a COVERTYPE map can be reclassified to isolate large areas of open water.
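A renumbering operation like the one in Figure 1 amounts to a lookup table applied to every cell. The sketch below assumes hypothetical category codes (1 = Open Water, 2 = Meadow, 3 = Forest); the same `reclassify` helper, also hypothetical, could assign suitability ranks to soil types by supplying a different table.

```python
def reclassify(grid, table, default=None):
    """Replace each cell's value using a lookup table; the spatial pattern stays the same."""
    return [[table.get(value, default) for value in row] for row in grid]


# Hypothetical codes: 1 = Open Water, 2 = Meadow, 3 = Forest.
covertype = [
    [1, 1, 2, 3],
    [1, 2, 2, 3],
    [3, 3, 1, 1],
]
open_water = reclassify(covertype, {1: 1, 2: 0, 3: 0})
print(open_water)   # [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1]]
```

Only the values change; every cell keeps its position, which is exactly the property of reclassify operations noted above.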
A similar reclassification operation might involve the ranking or weighting of qualitative map categories to generate a new map with quantitative values. For example, a map of soil types could be assigned values that indicate the relative suitability of each soil type for residential development.
Quantitative values also might be reclassified to yield new quantitative values. This might involve a specified reordering of map categories (e.g., given a map of soil moisture content, generate a map of suitability levels for plant growth). Or, it could involve the application of a generalized reclassifying function, such as "level slicing," which splits a continuous range of map category values into discrete intervals (e.g., derivation of a contour map of just 10 contour intervals from an elevation surface composed of thousands of specific elevation values).
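Level slicing can be sketched as follows. The text does not say how the intervals are chosen, so equal-width intervals are an assumption here, as are the elevation values and the `level_slice` helper name.

```python
def level_slice(grid, low, high, n_intervals):
    """Split [low, high] into n equal-width intervals and number them 1..n."""
    width = (high - low) / n_intervals

    def interval(value):
        k = int((value - low) // width) + 1
        return min(max(k, 1), n_intervals)   # values at or beyond the ends go to the end classes

    return [[interval(v) for v in row] for row in grid]


elevation = [[500, 525, 580], [610, 700, 999]]   # hypothetical elevations in feet
print(level_slice(elevation, 500, 1000, 10))     # [[1, 1, 2], [3, 5, 10]]
```

Other slicing schemes (equal-count or user-specified breaks) would only change how `interval` maps a value to a class.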
Other quantitative reclassification functions include a variety of arithmetic operations involving map category values and a specified or computed constant. Among these operations are addition, subtraction, multiplication, division, exponentiation, maximization, minimization, normalization and other scalar mathematical and statistical operators. For example, an elevation surface expressed in feet could be converted to meters by dividing each map value by the conversion factor of 3.28083 feet per meter.
Reclassification operations can also relate to location rather than purely to thematic values. One such characteristic is position. An overlay category represented by a single "point" location, for example, might be reclassified according to its latitude and longitude. Similarly, a line segment or area feature could be reassigned values indicating its center or general orientation.
A related operation, termed parceling, characterizes category contiguity. This procedure identifies individual "clumps" of one or more points that have the same numerical value and are spatially contiguous (e.g., generation of a map identifying each lake as a unique value from a generalized map of water representing all lakes as a single category).
Another location characteristic is size. In the case of map categories associated with linear features or point locations, overall length or number of points might be used as the basis for reclassifying the categories. Similarly, an overlay category associated with a planar area could be reclassified according to its total acreage or the length of its perimeter.
A map of water types, for example, could be reassigned values to indicate the area of individual lakes or the length of stream channels. The same sort of technique also could be used to deal with volume. Given a map of depth to bottom for a group of lakes, each lake might be assigned a value indicating total water volume based on the area of each depth category.
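The lake volume example can be sketched by summing depth × cell area over each lake's cells, which is equivalent to summing area × depth for each depth category. The lake IDs, depths and 100 m² cell size below are hypothetical, and `volume_by_feature` is an illustrative name.

```python
def volume_by_feature(feature_ids, depth, cell_area):
    """Total volume of each feature: sum of depth x cell area over its cells (ID 0 = not water)."""
    volumes = {}
    for id_row, depth_row in zip(feature_ids, depth):
        for feature, d in zip(id_row, depth_row):
            if feature != 0:
                volumes[feature] = volumes.get(feature, 0) + d * cell_area
    return volumes


lakes = [[1, 1, 0], [1, 0, 2], [0, 0, 2]]   # hypothetical lake IDs (e.g., from a clump step)
depth = [[2, 3, 0], [1, 0, 4], [0, 0, 5]]   # hypothetical depths in meters
print(volume_by_feature(lakes, depth, cell_area=100))   # {1: 600, 2: 900}
```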
Figure 2. A sequence of reclassification operations (renumber, clump, size and renumber) can be used to isolate large water bodies from a cover type map.
Figure 2 identifies a similar processing sequence using the information derived in Figure 1. Although your eye sees two distinct blobs of water on the OPEN WATER map, the computer only “sees” distinctions through different map category values. Because both water bodies are assigned the same value of 1, there isn’t a map-ematical distinction, so the computer cannot tell them apart.
The Clump operation is used to identify the contiguous features as separate values: clump #1 (Larry’s
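The renumber, clump, size and renumber sequence of Figure 2 can be sketched in Python. Treating cells as contiguous only through shared edges (4-connectivity) is an assumption; some systems also count diagonal neighbors. The cover type grid, the water code and the `min_cells` threshold are hypothetical.

```python
from collections import deque


def clump(binary):
    """Number each 4-connected group of non-zero cells 1, 2, ...; background stays 0."""
    rows, cols = len(binary), len(binary[0])
    ids = [[0] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] and not ids[r][c]:
                next_id += 1
                ids[r][c] = next_id
                queue = deque([(r, c)])
                while queue:                                   # breadth-first flood fill
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < rows and 0 <= nx < cols and binary[ny][nx] and not ids[ny][nx]:
                            ids[ny][nx] = next_id
                            queue.append((ny, nx))
    return ids


def size(ids):
    """Replace each clump number with the count of cells in that clump."""
    counts = {}
    for row in ids:
        for v in row:
            if v:
                counts[v] = counts.get(v, 0) + 1
    return [[counts.get(v, 0) for v in row] for row in ids]


def large_water(covertype, water_code, min_cells):
    open_water = [[1 if v == water_code else 0 for v in row] for row in covertype]  # renumber
    clumps = clump(open_water)                                                      # clump
    sizes = size(clumps)                                                            # size
    return [[1 if s >= min_cells else 0 for s in row] for row in sizes]             # renumber


covertype = [[1, 1, 2, 3], [1, 2, 2, 3], [3, 3, 1, 1]]   # hypothetical: 1 = Open Water
print(clump([[1 if v == 1 else 0 for v in row] for row in covertype]))
# [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 2, 2]]
print(large_water(covertype, water_code=1, min_cells=3))
# [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
```

After clumping, the two water bodies carry different values, so the size step can tell them apart and the final renumber keeps only the larger one.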
In addition to the initial value, position, contiguity, and size of features, shape characteristics also can be used as the basis for reclassifying map categories. Shape characteristics associated with linear forms identify the patterns formed by multiple line segments (e.g., dendritic stream pattern). The primary shape characteristics associated with polygonal forms include feature integrity, boundary convexity, and nature of edge.
Feature integrity relates to an area’s “intact-ness.” A category that is broken into numerous "fragments" and/or contains several interior "holes" is said to have less spatial integrity than categories without such violations. Feature integrity can be summarized as the Euler Number, computed as the number of holes within a feature minus one less than the number of fragments, that is, holes - (fragments - 1). An Euler Number of zero indicates a feature that is spatially balanced, whereas larger negative or positive numbers indicate less spatial integrity: either broken into more pieces (negative values) or poked with more holes (positive values).
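The Euler Number formula can be sketched for a binary grid by counting fragments (connected groups of feature cells) and holes (connected groups of background cells that never touch the map edge). Using 4-connectivity for both the feature and the background is a simplifying assumption; a production tool might pair 4- and 8-connectivity to avoid diagonal ambiguities. The example grids are hypothetical.

```python
from collections import deque


def components(mask):
    """Return the 4-connected groups of truthy cells, each as a list of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    groups = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                group, queue = [], deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    group.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                groups.append(group)
    return groups


def euler_number(feature):
    """holes - (fragments - 1) for a binary map where 1 marks the feature."""
    rows, cols = len(feature), len(feature[0])
    fragments = len(components(feature))
    background = [[0 if v else 1 for v in row] for row in feature]
    # A background group counts as a hole only if it never touches the map edge.
    holes = sum(
        1
        for group in components(background)
        if all(0 < y < rows - 1 and 0 < x < cols - 1 for y, x in group)
    )
    return holes - (fragments - 1)


solid = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
ring = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
two_pieces = [[1, 0, 1]]
print(euler_number(solid), euler_number(ring), euler_number(two_pieces))   # 0 1 -1
```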
Convexity and edge are other shape indices that relate to the configuration of polygonal features’ boundaries. The Convexity Index for a feature is computed as the ratio of its perimeter to its area. The most regular configuration is that of a circle, which is totally convex and, therefore, not enclosed by the background at any point along its boundary.
Comparing a feature's computed convexity to that of a circle of the same area results in a standard measure of boundary regularity. The nature of the boundary at each point can be used for a detailed description of boundary configuration. At some locations the boundary might be an entirely concave intrusion, whereas at others it might be an entirely convex protrusion. Depending on the "degree of edginess," each point can be assigned a value indicating the actual boundary convexity at that location.
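One way to express the comparison with a circle is sketched below: divide the feature's perimeter-to-area ratio by that of a circle with the same area. Because the areas cancel, this reduces to P / (2√(πA)), where 1.0 would be a perfect circle. This normalization is one reasonable reading of the text rather than a published index, and counting exposed cell edges as the perimeter is an assumption; the stair-stepped edges of a grid inflate the perimeter, so even a rasterized circle scores somewhat above 1.

```python
import math


def perimeter_and_area(feature):
    """Area = number of feature cells; perimeter = cell edges facing background or the map edge."""
    rows, cols = len(feature), len(feature[0])
    area = perimeter = 0
    for r in range(rows):
        for c in range(cols):
            if feature[r][c]:
                area += 1
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if not (0 <= nr < rows and 0 <= nc < cols) or not feature[nr][nc]:
                        perimeter += 1
    return perimeter, area


def convexity_ratio(feature):
    """(P / A) of the feature divided by (P / A) of a circle with the same area A.

    The areas cancel, leaving P / (2 * sqrt(pi * A)); a perfect circle would score 1.0.
    """
    perimeter, area = perimeter_and_area(feature)
    return perimeter / (2 * math.sqrt(math.pi * area))


square = [[1, 1], [1, 1]]
strip = [[1, 1, 1, 1]]
print(round(convexity_ratio(square), 3))   # 1.128
print(round(convexity_ratio(strip), 3))    # 1.41
```

The elongated strip scores higher than the compact square of equal area, which matches the idea that less regular boundaries depart further from the circle.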
This explicit use of cartographic shape as an analytic parameter is unfamiliar to most GIS users. However, a non‑quantitative consideration of shape is implicit in any visual assessment of mapped data. Particularly promising is the potential for applying quantitative shape analysis techniques in the areas of digital image classification and wildlife habitat modeling.
A map of forest stands, for example, could be reclassified such that each stand is characterized according to the relative amount of forest edge with respect to total acreage and the frequency of interior forest canopy gaps. Stands with a large proportion of edge and a high frequency of gaps will generally indicate better wildlife habitat for many species. In any event, reclassify operations simply assign new values to old category values, sometimes seemingly trivial and sometimes conceptually complex.
Overlay Operations Feature a Variety of Options
(GeoWorld, May 2004)
The general class of overlay operations can be characterized as "light‑table gymnastics." These involve the creation of a new map where the value assigned to every point, or set of points, is a function of the independent values associated with that location on two or more existing map layers.
In location‑specific overlaying, the value assigned is a function of the point‑by‑point coincidence of the existing maps. In category‑wide composites, values are assigned to entire thematic regions as a function of the values on other overlays that are associated with the categories. Whereas the first overlay approach conceptually involves the vertical spearing of a set of map layers, the latter approach uses one map to identify boundaries by which information is extracted from other maps.
Figure 1 shows an example of location‑specific overlaying. Here, maps of COVERTYPE and topographic SLOPE_CLASSES are combined to create a new map identifying the particular cover/slope combination at each location. The specific function used to compute new category values from those of the existing maps being overlaid can vary according to the nature of the data being processed and the specific use of that data within a modeling context.
Environmental analyses typically involve the manipulation of quantitative values to generate new values that are likewise quantitative in nature. Among these are the basic arithmetic operations such as addition, subtraction, multiplication, division, roots, and exponentials. Functions that relate to simple statistical parameters such as maximum, minimum, median, mode, majority, standard deviation or weighted average also can be applied. The type of data being manipulated dictates the appropriateness of the mathematical or statistical procedure used.
The addition of qualitative maps such as soils and land use, for example, would result in mathematically meaningless sums, since their thematic values have no numerical relationship. Other map overlay techniques include several that might be used to process either quantitative or qualitative data and generate values which can likewise take either form. Among these are masking, comparison, calculation of diversity, and permutations of map categories (as depicted in figure 1).
Figure 1. Point-by point overlaying operations summarize the coincidence of two or more maps, such as assigning a unique value identifying the COVERTYPE and SLOPE_CLASS conditions at each location.
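A point-by-point combination like the one in Figure 1 can be sketched by giving each cell a two-part code. The tens-and-ones encoding below is an assumption for illustration (the figure's actual coding scheme is not given); it stays unique only while every slope class is smaller than `base`. The category codes are hypothetical.

```python
def combine(cover, slope_class, base=10):
    """Two-digit code per cell: cover category in the tens place, slope class in the ones place."""
    result = []
    for cover_row, slope_row in zip(cover, slope_class):
        row = []
        for cv, sc in zip(cover_row, slope_row):
            if not 0 <= sc < base:
                raise ValueError("slope class must be smaller than base for a unique code")
            row.append(cv * base + sc)
        result.append(row)
    return result


cover = [[1, 2], [3, 3]]          # hypothetical COVERTYPE codes
slope_classes = [[1, 1], [2, 3]]  # hypothetical SLOPE_CLASSES codes
combo = combine(cover, slope_classes)
print(combo)                      # [[11, 21], [32, 33]]
print(divmod(combo[1][0], 10))    # (3, 2): cover 3, slope class 2
```

Because each code can be split back into its parts with `divmod`, no information from either input map is lost in the combination.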
More complex statistical techniques may also be applied in this manner, assuming that the inherent interdependence among spatial observations can be taken into account. This approach treats each map as a variable, each point as a case, and each value as an observation. A predictive statistical model can then be evaluated for each location, resulting in a spatially continuous surface of predicted values. The mapped predictions contain additional information over traditional non‑spatial procedures, such as direct consideration of coincidence among regression variables and the ability to spatially locate areas of a given level of prediction. Topic 12 investigates the considerations in spatial data mining derived by statistically overlaying mapped data.
Figure 2. Category-wide overlay operations summarize the spatial coincidence of map categories, such as generating the average SLOPE for each COVERTYPE category.
An entirely different approach to overlaying maps involves category‑wide summarization of values. Rather than combining information on a point‑by‑point basis, this group summarizes the spatial coincidence of entire categories shown on one map with the values contained on another map(s). Figure 2 contains an example of a category‑wide overlay operation. In this example, the categories of the COVERTYPE map are used to define an area over which the coincidental values of the SLOPE map are averaged. The computed values of average slope within each category area are then assigned to each of the cover type categories.
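The category-wide operation in Figure 2 can be sketched as a two-pass calculation: accumulate the SLOPE values within each COVERTYPE category, then paint each category's average back onto its cells. The grids are hypothetical. Note that the accumulation relies on addition, which gives the same result regardless of the order in which cells are visited.

```python
def category_average(zones, values):
    """Average `values` over each category of `zones`, then assign that average to the category."""
    totals, counts = {}, {}
    for zone_row, value_row in zip(zones, values):
        for z, v in zip(zone_row, value_row):
            totals[z] = totals.get(z, 0) + v
            counts[z] = counts.get(z, 0) + 1
    means = {z: totals[z] / counts[z] for z in totals}
    return means, [[means[z] for z in row] for row in zones]


covertype = [[1, 1, 2], [1, 2, 2]]      # hypothetical cover categories
slope = [[4, 6, 10], [8, 20, 30]]       # hypothetical percent slope
means, avg_slope_map = category_average(covertype, slope)
print(means)           # {1: 6.0, 2: 20.0}
print(avg_slope_map)   # [[6.0, 6.0, 20.0], [6.0, 20.0, 20.0]]
```

Swapping the mean for a maximum, minimum or total only changes how the accumulated values are summarized.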
Summary statistics which can be used in this way include the total, average, maximum, minimum, median, mode, or minority value; the standard deviation, variance, or diversity of values; and the correlation, deviation, or uniqueness of particular value combinations. For example, a map indicating the proportion of undeveloped land within each of several counties could be generated by superimposing a map of county boundaries on a map of land use and computing the ratio of undeveloped land to the total land area for each county. Or a map of zip code boundaries could be superimposed over maps of demographic data to determine the average income, average age, and dominant ethnic group within each zip code.
As with location-specific overlay techniques, data types must be consistent with the summary procedure used. Also of concern is the order of data processing. Operations such as addition and multiplication are independent of the order of processing. Other operations, such as subtraction and division, however, yield different results depending on the order in which a group of numbers is processed. This latter type of operation, termed non-commutative, cannot be used for category-wide summaries.
Computers Quickly Characterize Spatial Coincidence
(GeoWorld, June 2004)
knows the 'bread and butter' of a
Compare how you and your computer might approach the task of identifying coincidence. Your eye moves randomly about the stack, pausing for a nanosecond at each location and mentally establishing the conditions by interpreting the color. Your summary might conclude that the northeastern portion of the area is unfavorable as it has "kind of a magenta tone." This is the result of visually combining steep slopes portrayed as bright red with unstable soils portrayed as bright blue with minimal vegetation portrayed as dark green. If you want to express the result in map form, you would tape a clear acetate sheet on top, delineate globs of color differences and label each parcel with your interpretation. Whew!
No wonder you want a
A raster system has things a bit easier. As all locations are predefined as a consistent set of cells within a matrix, the computer merely 'goes' to a location, retrieves the information stored for each map layer and assigns a value indicating the combined map conditions. The result is a new set of values for the matrix identifying the coincidence of the maps.
The big difference between ocular and computer approaches to map overlay is not so much in technique as in the treatment of the data. If you have several maps to overlay, you quickly run out of distinct colors and the whole stack of maps goes to an indistinguishable dark, purplish hue. One remedy is to classify each map layer into just two categories, such as suitable and unsuitable: leave the suitable areas as clear acetate (good) and shade the unsuitable areas light grey (bad). The resulting stack avoids the ambiguities of color combinations and depicts the best areas as lighter tones. However, in making the technique operable you have severely limited the content of the data to just good and bad.
The computer can mimic this technique by using binary maps. A "0" is assigned to good conditions and a "1" is assigned to bad conditions. The sum of the maps has the same information as the brightness scale you observe: the smaller the value, the better. The two basic forms of logical combination can be computed. "Find those locations which have good slopes .
How would you handle, "Find those locations which have good slopes .OR. good
any combination is easy to identify.
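Summing binary maps and testing the sum can be sketched as follows, using the text's convention of 0 = good and 1 = bad. With three maps, a sum of 0 means good on all of them (the .AND. case) and a sum below 3 means good on at least one (the .OR. case). Weighting the maps by 1, 2 and 4 before adding is one common way to give every good/bad combination its own code. The map values and helper names are hypothetical.

```python
def add_maps(*maps):
    """Cell-by-cell sum of several maps on the same grid."""
    return [[sum(cell) for cell in zip(*rows)] for rows in zip(*maps)]


# Hypothetical binary maps: 0 = good, 1 = bad.
slopes = [[0, 0, 1], [1, 0, 1]]
soils = [[0, 1, 1], [0, 0, 1]]
cover = [[0, 0, 1], [1, 0, 0]]

total = add_maps(slopes, soils, cover)                               # [[0, 1, 3], [2, 0, 2]]
good_and = [[1 if t == 0 else 0 for t in row] for row in total]     # good on all three maps
good_or = [[1 if t < 3 else 0 for t in row] for row in total]       # good on at least one map

# Weighting the binary maps by 1, 2 and 4 gives every combination its own code (0-7).
combination = add_maps(
    slopes,
    [[2 * v for v in row] for row in soils],
    [[4 * v for v in row] for row in cover],
)
print(combination)   # [[0, 2, 7], [5, 0, 3]]
```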
Let's say we expand our informational scale and redefine each map from just good and bad to not suitable (0), poor (1), marginal (2), good (3) and excellent (4). We could ask the computer to INTERSECT SLOPES WITH SOILS WITH COVER COMPLETELY FOR
way of combining these maps is by asking to COMPUTE SLOPES MINIMIZE SOILS MINIMIZE COVER FOR WEAK-
What would happen if, for each location (be it a polygon or a cell), we computed the sum of the three maps, then divided by the number of maps? That would yield the average rating for each location. Those with the higher averages are better. Right? You might want to take it a few steps further. First, in a particular application, some maps may be more important than others in determining the best areas. Ask the computer to AVERAGE SLOPES TIMES 5 WITH SOILS TIMES 3 WITH COVER TIMES 1 FOR WEIGHTED-AVERAGE. The result is a map whose average ratings are more heavily influenced by slope and soil conditions.
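The simple average, the 5-3-1 weighted average and the cell-by-cell minimum (the MINIMIZE combination mentioned above) can be sketched on hypothetical 0 to 4 ratings. The helper names are illustrative only.

```python
def weighted_average(maps, weights):
    """Cell-by-cell weighted average of rating maps (equal weights give the simple average)."""
    total_weight = sum(weights)
    return [
        [sum(w * v for w, v in zip(weights, cell)) / total_weight for cell in zip(*rows)]
        for rows in zip(*maps)
    ]


def minimum(maps):
    """Cell-by-cell minimum rating across maps."""
    return [[min(cell) for cell in zip(*rows)] for rows in zip(*maps)]


# Hypothetical ratings: 0 = not suitable ... 4 = excellent.
slopes = [[4, 1]]
soils = [[2, 3]]
cover = [[0, 4]]
maps = [slopes, soils, cover]
print(weighted_average(maps, [1, 1, 1]))   # about [[2.0, 2.67]]
print(weighted_average(maps, [5, 3, 1]))   # about [[2.89, 2.0]]
print(minimum(maps))                       # [[0, 1]]
```

In this small example, weighting slope and soils more heavily reverses which of the two cells looks better, which is why the choice of weights deserves as much scrutiny as the maps themselves.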
Just to get a handle on the variability of ratings at each location, you can determine the standard deviation—either simple or weighted. Or for even more information, determine the coefficient of variation, which is the ratio of the standard deviation to the average, expressed as a percent. What will that tell you? It hints at the degree of confidence you should put into the average rating. A high COFFVAR indicates wildly fluctuating ratings among the maps and you might want to look at the actual combinations before making a decision.
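The standard deviation and coefficient of variation can be computed per cell in the same way. This sketch uses the population form (dividing by the total weight), which is an assumption, and returns `None` for the coefficient of variation where the average is zero. The ratings are hypothetical.

```python
import math


def spread(maps, weights=None):
    """Per-cell (optionally weighted) standard deviation and coefficient of variation in percent."""
    weights = weights or [1] * len(maps)
    total_weight = sum(weights)
    sd_map, cv_map = [], []
    for rows in zip(*maps):
        sd_row, cv_row = [], []
        for cell in zip(*rows):
            mean = sum(w * v for w, v in zip(weights, cell)) / total_weight
            variance = sum(w * (v - mean) ** 2 for w, v in zip(weights, cell)) / total_weight
            sd = math.sqrt(variance)
            sd_row.append(sd)
            cv_row.append(None if mean == 0 else sd / mean * 100)   # undefined for a zero mean
        sd_map.append(sd_row)
        cv_map.append(cv_row)
    return sd_map, cv_map


# Hypothetical ratings for two cells: one with agreeing maps, one with disagreeing maps.
slopes, soils, cover = [[2, 0]], [[2, 2]], [[2, 4]]
sd_map, cv_map = spread([slopes, soils, cover])
print(cv_map[0][0], round(cv_map[0][1], 1))   # 0.0 81.6
```

Both cells average 2, yet the second one's high coefficient of variation flags that its maps disagree.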
A statistical way to summarize coincidence between maps is a cross-tab table. If you CROSSTAB FORESTS WITH SOILS, a table results identifying how often each forest type jointly occurs with each soil type. In a vector system, this is the total area in each forest/soil combination. In a raster system, this is simply a count of all the cell locations for each forest/soil combination.
For example, reading across the first row of the table in Figure 1 shows that Forest category 1 (Deciduous) contains 303 cells distributed throughout the map. The total count for Soils category 1 (Lowland) is 427 cells. The next section of the table notes that the joint condition of Deciduous/Lowland occurs 299 times, for 47.84 percent of the total map area. Contrast this result with the Deciduous/Upland occurrence on the row below, indicating only four “crosses” for less than one percent of the map. The coincidence statistics for the Conifer category are more balanced, with 128 cells (20.48%) occurring with the Lowland soil and 194 cells (31.04%) occurring with the Upland soil.
Figure 1. A cross-tab table statistically summarizes the coincidence among the categories on two maps.
These data may cause you to jump to some conclusions, but you had better consider the last section of the table before you do. This section normalizes the coincidence count to the total number of cells in each category. For example, the 299 Deciduous/Lowland coincidences account for 98.68 percent of all occurrences of Deciduous trees ((299/303)*100). That's a very strong relationship. However, from the perspective of Lowland soil, the same 299 coincidences are a bit weaker, as they account for only 70.02 percent of all occurrences of Lowland soils ((299/427)*100). In a similar vein, the Conifer/Upland coincidence is very strong, as it accounts for 97.98 percent of all Upland soil occurrences ((194/198)*100). Both columns of coincidence percentages must be considered, as a single high percentage might merely be the result of the other category occurring just about everywhere.
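The cross-tab counts and both columns of coincidence percentages can be sketched as follows. The joint counts are taken from the cross-tab example in the text (they sum to 625 cells); the helper names are illustrative, and the small `crosstab` function shows how such counts would be tallied from two category grids.

```python
from collections import Counter


def crosstab(map_a, map_b):
    """Count how often each (category on map A, category on map B) pair occurs cell by cell."""
    counts = Counter()
    for row_a, row_b in zip(map_a, map_b):
        for a, b in zip(row_a, row_b):
            counts[(a, b)] += 1
    return counts


def coincidence_percents(counts):
    """For each pair: percent of the whole map, of category A's cells, and of category B's cells."""
    total = sum(counts.values())
    a_totals, b_totals = Counter(), Counter()
    for (a, b), n in counts.items():
        a_totals[a] += n
        b_totals[b] += n
    return {
        (a, b): (
            round(100 * n / total, 2),
            round(100 * n / a_totals[a], 2),
            round(100 * n / b_totals[b], 2),
        )
        for (a, b), n in counts.items()
    }


# Joint counts as reported in the text's cross-tab example (625 cells in total).
counts = Counter({
    ("Deciduous", "Lowland"): 299, ("Deciduous", "Upland"): 4,
    ("Conifer", "Lowland"): 128, ("Conifer", "Upland"): 194,
})
pcts = coincidence_percents(counts)
print(pcts[("Deciduous", "Lowland")])   # (47.84, 98.68, 70.02)
print(pcts[("Conifer", "Upland")])      # (31.04, 60.25, 97.98)
```

The Conifer/Upland row illustrates the caution in the text: the pair accounts for 97.98 percent of Upland soils but only about 60 percent of Conifer cells.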
There are still a couple of loose ends before we can wrap up point-by-point overlay summaries. One is direct map comparison, or change detection. For example, if you encode a series of land use maps for an area, then subtract each successive pair of maps, the locations that underwent change will appear as non-zero values for each time step.
If you are really tricky and think map-ematically, you will assign a binary progression to the land use categories (1, 2, 4, 8, 16, etc.), as the differences will automatically identify the nature of the change. The only way you can get a 1 is 2-1; a 2 is 4-2; a 3 is 4-1; a 6 is 8-2; etc., since no two pairs of codes produce the same difference. A negative sign indicates the opposite change, and now all bases are covered.
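The binary progression trick can be sketched by subtracting the earlier map from the later one and decoding each difference with a lookup table. The land use names and codes are hypothetical, and the later-minus-earlier sign convention is an assumption; the lookup builder also checks that the chosen codes really give unique differences.

```python
from itertools import permutations


def change_map(earlier, later):
    """Later minus earlier, cell by cell; 0 means the category did not change."""
    return [[b - a for a, b in zip(row_a, row_b)] for row_a, row_b in zip(earlier, later)]


def change_lookup(codes):
    """Map every possible non-zero difference back to its (from, to) pair of codes."""
    lookup = {}
    for a, b in permutations(codes, 2):
        if b - a in lookup:
            raise ValueError("codes do not give unique differences")
        lookup[b - a] = (a, b)
    return lookup


# Hypothetical land use codes in a binary progression.
FOREST, MEADOW, RESIDENTIAL, COMMERCIAL = 1, 2, 4, 8
earlier = [[FOREST, MEADOW], [RESIDENTIAL, FOREST]]
later = [[MEADOW, MEADOW], [FOREST, COMMERCIAL]]

diff = change_map(earlier, later)     # [[1, 0], [-3, 7]]
lookup = change_lookup([FOREST, MEADOW, RESIDENTIAL, COMMERCIAL])
print(lookup[diff[1][0]])             # (4, 1): residential became forest
```

With sequential codes such as 1, 2, 3 the check fails, because 2-1 and 3-2 both give 1; that collision is what the binary progression avoids.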
The last point-by-point operation is a weird one: covering. This operation is truly spatial and has no traditional math counterpart. Imagine you prepared two acetate sheets by coloring all of the forested areas opaque green on one sheet and all of the roads opaque red on the other sheet. Now overlay them on a light-table. If you place the forest sheet down first, the red roads will “cover” the green forests and you will see the roads passing through the forests. If the roads map goes down first, the red lines will stop abruptly at the green forest globs.
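The light-table experiment can be sketched as a covering function in which any non-transparent cell on the top map hides the cell beneath it. Treating 0 as "clear acetate" is an assumption, and the forest (1) and road (2) grids are hypothetical.

```python
def cover(top, bottom, transparent=0):
    """Where the top map has a value it hides the bottom map; elsewhere the bottom shows through."""
    return [
        [t if t != transparent else b for t, b in zip(top_row, bottom_row)]
        for top_row, bottom_row in zip(top, bottom)
    ]


# Hypothetical codes: 1 = forest, 2 = road, 0 = clear acetate.
forest = [[1, 1, 0], [1, 1, 0]]
roads = [[0, 2, 0], [0, 2, 2]]
print(cover(roads, forest))   # roads on top:  [[1, 2, 0], [1, 2, 2]]
print(cover(forest, roads))   # forest on top: [[1, 1, 0], [1, 1, 2]]
```

Swapping the argument order changes the result, just as the order of the acetate sheets does; with the forest on top, the road survives only where it leaves the forest.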
Key Concepts Characterize Unique Conditions — describes a technique for handling unique combinations of map layers (April 2006)
Use “Shadow Maps” to Understand Overlay Errors — describes how shadow maps of certainty can be used to estimate error and its propagation (September 2004)