Topic 8 – The Anatomy of a GIS Model
Spatial Reasoning book
From Recipes to Models: describes basic Binary and Rating model expressions using a simple Landslide Susceptibility model
Extending Basic Models through Logic Modifications: describes logic extensions to a simple Landslide Susceptibility model that add criteria and change the model's structure
Evaluating Map-ematical Relationships: discusses the differences and similarities between the two basic types of GIS models (Cartographic and Spatial) using the Universal Soil Loss Equation as an example
From Recipes to Models
(GeoWorld, December 1995)
So what's the difference between a recipe and a model? Both seem to mix a bunch of things together to create something else. Both result in a synergistic amalgamation that's more than the sum of the parts. Both start with basic ingredients and describe the processing steps required to produce the desired result, be it a chocolate cake or a landslide susceptibility map.
In a GIS, the ingredients are base maps and the processing steps are spatial handling operations. For example, a simple recipe for locating areas susceptible to landslides uses ingredients such as terrain steepness, soil type, and vegetation cover; areas that are steep, unstable, and bare are the most susceptible.
Before computers, identifying areas of high susceptibility required tedious manual map analysis procedures. A transparency was taped over a contour map of elevation, and areas where contour lines were spaced closely (steep) were outlined and filled with a dark color. Similar transparent overlays were interpreted for areas of unstable soils and sparse vegetation from soil and vegetation base maps. When the three transparencies were overlaid on a strong light source, the combination was deciphered easily— clear = not susceptible, and dark = susceptible. That basic recipe has been with us for a long time. Of course, the methods changed as modern drafting aids replaced the thin parchment, quill pens, and stained glass windows of the 1800s, but the conceptual approach remains the same.
In a typical vector GIS, a logical combination is achieved by first generating a topological overlay of the three maps (SLOPE, SOILS, COVERTYPE), then querying the resultant table (TSV_OVL) for susceptible areas. The Structured Query Language (SQL) query might look like the following:
Select columns: %slope, Soil_stability, Covertype
from tables: TSV_OVL
where condition: %slope > 13 AND Soil_stability = "Unstable" AND Covertype = "Bare"
into table named: L_HAZARD
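The same logical combination can be tried against a tiny overlay table with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions: the column names (slope_pct, soil_stability, covertype) and the four sample polygons are hypothetical stand-ins for TSV_OVL, since "%slope" is not a legal SQL identifier without quoting.

```python
import sqlite3

# Hypothetical overlay table standing in for TSV_OVL (one row per overlay polygon).
rows = [
    (1, 34, "Stable", "Forest"),
    (2, 20, "Unstable", "Bare"),
    (3, 8, "Unstable", "Bare"),
    (4, 45, "Unstable", "Grass"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TSV_OVL (poly_id INTEGER, slope_pct REAL, soil_stability TEXT, covertype TEXT)"
)
conn.executemany("INSERT INTO TSV_OVL VALUES (?, ?, ?, ?)", rows)

# The "AND" of the three criteria: only polygons meeting all three land in L_HAZARD.
conn.execute(
    """CREATE TABLE L_HAZARD AS
       SELECT poly_id, slope_pct, soil_stability, covertype
       FROM TSV_OVL
       WHERE slope_pct > 13 AND soil_stability = 'Unstable' AND covertype = 'Bare'"""
)
hazard_ids = [r[0] for r in conn.execute("SELECT poly_id FROM L_HAZARD ORDER BY poly_id")]
print(hazard_ids)  # [2]
```

Polygon 3 is unstable and bare but fails the steepness test, and polygon 4 is steep and unstable but not bare, so only polygon 2 survives the joint condition.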
The flowchart in figure 1 depicts an alternative raster-based binary model (only two states of either Yes or No), which mimics the manual map analysis process and achieves the same result as the overlay/SQL query. A slope map is created by calculating the change in elevation throughout the project area (first derivative of the elevation surface).
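The slope step can be sketched on a small elevation grid. This is an illustration under assumptions, not TMAP's own SLOPE algorithm: it estimates the first derivative of the elevation surface with central differences in x and y and reports rise over run as a percent. The function name percent_slope and the sample grid are hypothetical.

```python
import math

def percent_slope(elev, cell_size):
    """Percent slope for interior cells of an elevation grid.

    Uses central differences in x and y as the first-derivative estimate;
    edge cells are left as None because they lack a neighbor on one side.
    """
    nrows, ncols = len(elev), len(elev[0])
    out = [[None] * ncols for _ in range(nrows)]
    for r in range(1, nrows - 1):
        for c in range(1, ncols - 1):
            dz_dx = (elev[r][c + 1] - elev[r][c - 1]) / (2 * cell_size)
            dz_dy = (elev[r + 1][c] - elev[r - 1][c]) / (2 * cell_size)
            out[r][c] = 100 * math.hypot(dz_dx, dz_dy)  # rise over run, as a percent
    return out

# Hypothetical 3 x 3 tilted plane: elevation rises 3.4 m per 10 m cell eastward.
ELEVATION = [[3.4 * c for c in range(3)] for _ in range(3)]
SLOPES = percent_slope(ELEVATION, cell_size=10)
print(round(SLOPES[1][1], 6))  # 34.0
```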
Figure 1. Binary, ranking and rating models of landslide susceptibility. The location indicated by the piercing arrow contains 34 percent slope, a fairly stable soil and sparse forest cover.
A Simple Binary model solution codes as "1" all of the susceptible areas on each of the factor maps (13 percent or greater slope, unstable soils, bare vegetative cover), whereas the non-susceptible areas are coded as "0." The product of the three binary maps (SL_HAZARD (binary), SO_HAZARD (binary), CO_HAZARD (binary)) creates a final map of landslide potential: 1 = susceptible, and 0 = not susceptible. Only locations susceptible on all three maps retain the "susceptible" classification (1 * 1 * 1 = 1). In all other instances, multiplying 0 by any number forces the product to 0 (not susceptible). The map-ematical model corresponding to the flowchart (Simple Binary model) in figure 1 might be expressed (in TMAP modeling language) as:
SLOPE ELEVATION FOR SLOPES
   …creates a map of percent slope from the elevation surface
RENUMBER SLOPES FOR SL_BINARY ASSIGNING 0 TO 1 THRU 12
   ASSIGNING 1 TO 13 THRU 1000
   …identifies slopes of 13% and above as steep, 1 = susceptible
RENUMBER SOILS FOR SO_BINARY ASSIGNING 0 TO 0 THRU 2
   ASSIGNING 1 TO 3 THRU 4
   …identifies soils 3 & 4 as unstable, 1 = susceptible
RENUMBER COVERTYPE FOR CO_BINARY ASSIGNING 0 TO 1
   ASSIGNING 0 TO 3 ASSIGNING 1 TO 2
   …identifies cover type 2 as bare, 1 = susceptible
COMPUTE SL_BINARY TIMES SO_BINARY TIMES CO_BINARY FOR L_HAZARD
   …computes 1 * 1 * 1 = 1 to identify hazardous areas
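For readers without TMAP, the RENUMBER and COMPUTE ... TIMES steps of the macro can be mimicked with plain Python lists. This is a minimal sketch assuming small hypothetical grids; the helpers renumber and times are illustrative names, not a TMAP or library API, and the class codes follow the macro (soils 3 and 4 unstable, cover type 2 bare).

```python
import math

# Hypothetical 2 x 3 grids: SLOPES (percent), SOILS (class 0-4), COVERTYPE (class 1-3).
SLOPES = [[34, 5, 20], [15, 40, 2]]
SOILS = [[2, 3, 4], [4, 3, 1]]
COVERTYPE = [[1, 2, 2], [2, 2, 3]]

def renumber(grid, rule):
    """Reclassify every cell with a rule, in the spirit of TMAP's RENUMBER."""
    return [[rule(v) for v in row] for row in grid]

def times(*grids):
    """Cell-by-cell product of equally sized grids, like COMPUTE ... TIMES."""
    return [[math.prod(cells) for cells in zip(*rows)] for rows in zip(*grids)]

sl_binary = renumber(SLOPES, lambda s: 1 if s >= 13 else 0)     # steep
so_binary = renumber(SOILS, lambda k: 1 if k in (3, 4) else 0)  # unstable
co_binary = renumber(COVERTYPE, lambda c: 1 if c == 2 else 0)   # bare

l_hazard = times(sl_binary, so_binary, co_binary)
print(l_hazard)  # [[0, 0, 1], [1, 1, 0]]
```

Multiplication acts as a logical AND: a single 0 on any factor map zeroes the cell.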
In the multiplicative case, the arithmetic combination of the maps yields the original two states: dark or 1 = susceptible, and clear or 0 = not susceptible (at least one data layer is not susceptible). It's analogous to the "AND" condition of the logical combination in the SQL query. However, other combinations can be derived. For example, the visual analysis could be extended by interpreting the various shades of gray on the stack of transparent overlays: clear = not susceptible, light gray = low susceptibility, medium gray = moderate susceptibility, and dark gray = high susceptibility.
In an analogous map-ematical approach, the computed sum of the three binary maps yields a similar ranking: 0 = not susceptible, 1 = low susceptibility, 2 = moderate susceptibility, and 3 = high susceptibility (1 + 1 + 1 = 3). That approach is called a Binary Ranking model, because it develops an ordinal scale of increasing landslide potential: a value of 2 is more susceptible than a value of 1, but not necessarily twice as susceptible.
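The difference between the product and the sum shows up clearly for a single cell. The sketch below applies both to the example location in figure 1 (34 percent slope, fairly stable soil, forest cover). The soil and cover class codes chosen for that cell (2 and 1) are assumptions, since the figure's coding is not given.

```python
RANK_LABELS = {0: "not susceptible", 1: "low", 2: "moderate", 3: "high"}

def binary_factors(slope_pct, soil_class, cover_class):
    """0/1 susceptibility on each factor, using the macro's class codes."""
    sl = 1 if slope_pct >= 13 else 0        # steep
    so = 1 if soil_class in (3, 4) else 0   # unstable soils
    co = 1 if cover_class == 2 else 0       # bare cover
    return sl, so, co

def simple_binary(slope_pct, soil_class, cover_class):
    sl, so, co = binary_factors(slope_pct, soil_class, cover_class)
    return sl * so * co                      # logical AND: 1 only if all three are 1

def binary_ranking(slope_pct, soil_class, cover_class):
    sl, so, co = binary_factors(slope_pct, soil_class, cover_class)
    return RANK_LABELS[sl + so + co]         # ordinal count of susceptible factors

# Example cell: 34% slope, assumed soil class 2 (stable), assumed cover class 1 (forest).
print(simple_binary(34, 2, 1), binary_ranking(34, 2, 1))  # 0 low
```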
A rating model is different, because it uses a consistent scale with more than two states to characterize the relative landslide potential for various conditions on each factor map. For example, a value of 1 is assigned to the least susceptible steepness condition (e.g., from 0 percent to 5 percent slope), while a value of 9 is assigned to the most susceptible condition (e.g., >30 percent slope). The intermediate conditions are assigned appropriate values between the landslide susceptibility extremes of 1 and 9. That calibration results in three maps with relative susceptibility ratings (SL_HAZARD (rate), SO_HAZARD (rate), CO_HAZARD (rate)) based on the 1-9 scale of relative landslide susceptibility.
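A calibration of this kind amounts to a lookup from factor values to the 1-9 scale. In the sketch below, only the two endpoints come from the text (0 to 5 percent slope rates 1, more than 30 percent rates 9). The roughly even spread of ratings 2 through 8 across the intermediate slopes is an illustrative assumption, not the figure's calibration.

```python
def rate_slope(slope_pct):
    """Map percent slope to a 1-9 relative landslide susceptibility rating.

    Endpoints follow the text (0-5% -> 1, >30% -> 9); spreading ratings
    2-8 roughly evenly across 5-30% is an illustrative assumption.
    """
    if slope_pct <= 5:
        return 1
    if slope_pct > 30:
        return 9
    return min(8, 2 + int((slope_pct - 5) * 7 / 25))

print([rate_slope(s) for s in (3, 10, 17, 25, 34)])  # [1, 3, 5, 7, 9]
```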
Computing the simple average (Simple Rating model) of the three rate maps determines an overall landslide potential based on the relative ratings for each factor at each map location. For example, a particular grid cell might be rated 9 because it's steep, 3 because its soil is fairly stable, and 3 because its forest cover is sparse. The average landslide susceptibility rating under these conditions is (9 + 3 + 3) / 3 = 5, indicating a moderate landslide potential.
A weighted average of the three maps (Weighted Rating model) expresses the relative importance of each factor in determining overall susceptibility. For example, steepness might be identified as five times more important than either soils or vegetative cover in estimating landslide potential. For the example grid cell described previously, the weighted average computes to [(9 * 5) + 3 + 3] / 7 = 51 / 7 ≈ 7.29 (the divisor 7 is the sum of the weights 5 + 1 + 1), which is closer to a high overall rating. The weighted average is preferentially influenced by the high rating on the SL_HAZARD (rate) map, yielding a much higher overall rating than the simple average.
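Both rating computations for the example cell fit in a few lines. In this sketch the function names are illustrative. The weights (5, 1, 1) encode "steepness five times as important," and dividing by their sum keeps the result on the 1-9 scale.

```python
def simple_rating(ratings):
    """Simple Rating model: unweighted mean of the factor ratings."""
    return sum(ratings) / len(ratings)

def weighted_rating(ratings, weights):
    """Weighted Rating model: sum of rating * weight divided by the sum of weights."""
    return sum(r * w for r, w in zip(ratings, weights)) / sum(weights)

CELL = (9, 3, 3)      # SL, SO, CO ratings for the example cell
WEIGHTS = (5, 1, 1)   # steepness five times as important as soils or cover
print(simple_rating(CELL), round(weighted_rating(CELL, WEIGHTS), 2))  # 5.0 7.29
```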
All that may be a bit confusing. The four different "recipes" for landslide potential produced strikingly different results for the example grid cell in figure 1, from not susceptible to high susceptibility. It's like baking banana bread. Some folks follow the traditional recipe; some add chopped walnuts or a few cranberries. By the time diced dates and candied cherries are tossed in, you can't tell the difference between your banana bread and last year's fruitcake.
So back to the main point: what's the difference between a recipe and a model? Merely semantics? Simply marketing jargon? The real difference between a recipe and a model isn't in the ingredients or the processing steps themselves. It's in the conceptual fabric of the process …but more on that later.
Extending Basic Models through Logic Modifications
(GeoWorld, January 1996)
The previous section described various renderings of a landslide susceptibility model. It related the results obtained for an example location using manual, logical combination, binary, ranking, and rating models. The results ranged from not susceptible to high susceptibility. Two factors in model expression were at play: the type of model and its calibration.
However, the model structure, which identified the factors considered and how they interact, remained constant. In the example, the logic was constrained to jointly considering terrain steepness, soil type, and vegetation cover. One could argue that other factors might contribute to landslide potential. What about depth to bedrock? Or previous surface disturbance? Or slope length? Or precipitation frequency and intensity? Or gopher population density? Or just about anything else you might dream up?
That's it. You've got the secret to seat-of-the-pants GISing. First you address the critical factors, and then extend your attention to other contributing factors. In the abstract, it means adding boxes and arrows to the flowchart to reflect the added logic. In practice, it means expanding the GIS macro code and, most importantly, wrestling with the model's calibration.
For example, it's easy to add a fourth row to the landslide flowchart, identifying the additional criterion of depth to bedrock, and tie it to the other three factors. It's even fairly easy to add the new lines of code to the GIS macro (Binary Ranking model):
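RENUMBER BR_DEPTH FOR BR_BINARY ASSIGNING 0 TO 0 THRU 4
   ASSIGNING 1 TO 5 THRU 15
   …identifies depth to bedrock >4 m as susceptible, 1 = susceptible
   …(BR_DEPTH, a map of depth to bedrock in meters, is an assumed name)
COMPUTE SL_BINARY PLUS SO_BINARY PLUS CO_BINARY PLUS BR_BINARY FOR L_HAZARD
   …computes 1 + 1 + 1 + 1 = 4 to identify extremely hazardous areas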
Things get a lot tougher when you have to split hairs about precisely which soil depths increase landslide susceptibility (is >4 meters a good guess?).
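One way to see why the threshold matters is to expose it as a parameter and watch a borderline cell change rank. This is a hypothetical sketch: sl, so and co are the 0/1 values from the basic model, and depth_threshold_m defaults to the macro's 4 m.

```python
def binary_ranking_4(sl, so, co, depth_to_bedrock_m, depth_threshold_m=4.0):
    """Four-factor Binary Ranking: steep + unstable + bare + deep-soil flags.

    The bedrock-depth threshold (4 m in the macro) is a parameter so its
    effect on the ranking can be tested directly.
    """
    br = 1 if depth_to_bedrock_m > depth_threshold_m else 0
    return sl + so + co + br

# A steep, unstable, bare cell with 4.5 m of soil over bedrock.
print(binary_ranking_4(1, 1, 1, 4.5))                         # 4
print(binary_ranking_4(1, 1, 1, 4.5, depth_threshold_m=5.0))  # 3
```

A half-meter shift in the calibration drops the same cell from "extremely hazardous" to the next rank down.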
The previous discussions focused on the hazard of landslides, but not their risk. Do we really care about landslides unless there is something valuable in the way? Risk implies the threat a hazard imposes on something valuable. Common sense suggests that a landslide hazard distant from important features represents a much smaller threat than a similar hazard adjacent to a major road or school.
Figure 1. Extension of the basic landslide susceptibility model to isolate hazards around roads (simple proximity “mask”).
The top portion of figure 1 shows the flowchart and commands for the basic binary landslide model. The lower portion identifies a risk extension to the basic model that considers proximity to important features as a risk indicator. In the flowchart, a map of proximity to roads (R_PROX) is generated that identifies the distance from every location to the nearest road. Increasing map values indicate locations farther from a road. A binary map of buffers around roads (R_BUFFERS) is created by renumbering the distance values near roads to 1 and those far from roads to 0. By multiplying this “masking” map by the landslide susceptibility map (L_HAZARD), the landslide threat is isolated for just the areas around roads (risk).
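The proximity, buffer and mask steps can be sketched on a toy grid. Assumptions: distance is straight-line in cell widths and computed by brute force, the road and hazard cells are hypothetical, and the buffer reach of 2 cells is arbitrary.

```python
import math

def proximity(road_cells, nrows, ncols):
    """Straight-line distance, in cell widths, from every cell to its nearest road cell."""
    return [[min(math.hypot(r - rr, c - rc) for rr, rc in road_cells)
             for c in range(ncols)]
            for r in range(nrows)]

def times(a, b):
    """Cell-by-cell product of two equally sized grids."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Hypothetical inputs: a road down the left edge of a 3 x 4 grid and a binary L_HAZARD map.
ROAD_CELLS = [(0, 0), (1, 0), (2, 0)]
L_HAZARD = [[1, 0, 1, 1],
            [0, 1, 1, 0],
            [1, 1, 0, 1]]

R_PROX = proximity(ROAD_CELLS, 3, 4)
R_BUFFERS = [[1 if d <= 2 else 0 for d in row] for row in R_PROX]  # reach of 2 cells
RISK = times(L_HAZARD, R_BUFFERS)
print(RISK)  # [[1, 0, 1, 0], [0, 1, 1, 0], [1, 1, 0, 0]]
```

The hazardous cells in the rightmost column drop out because they lie beyond the buffer's reach.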
A further extension to the model involves variable-width buffers as a function of slope (figure 2). The logic of that refinement is that the buffer widens in steep areas, where a landslide poses a greater threat, and contracts in gently sloped areas, where the threat diminishes. The weighted buffer extension calibrates the slope map into an impedance map (FRICTION), which guides the proximity measurement.
Figure 2. Weighted buffer extension to the basic landslide susceptibility model.
As the computer calculates distance, it assigns smaller effective distance values for a given geographic step in steep areas (low impedance) than it does in gently sloped areas (high impedance). That results in an effective proximity map (R_WPROX), with increasing values indicating locations that are effectively farther away from the road. The buffer map derived from these data is radically different from the simple buffer in the previous model extension.
Instead of a constant geographic reach around the roads, the effective buffer varies in width, as a function of slope, throughout the map area. As before, the buffer can be used as a binary mask to isolate the hazards within the variable reach of the roads.
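A common way to compute such an effective proximity is an accumulated-cost spread, sketched here with Dijkstra's algorithm. Several choices are assumptions, not TMAP's documented method: four-neighbor moves only, a step cost equal to the mean friction of the two cells it joins, and the friction values and reach of 3 in the example. R_WBUFFERS is a hypothetical name.

```python
import heapq
import math

def effective_proximity(friction, sources):
    """Accumulated-cost distance from source cells across a friction (impedance) grid.

    Dijkstra-style spread over the four orthogonal neighbors; each one-cell
    step costs the mean friction of the two cells it joins.
    """
    nrows, ncols = len(friction), len(friction[0])
    dist = [[math.inf] * ncols for _ in range(nrows)]
    heap = []
    for r, c in sources:
        dist[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale queue entry; a shorter route was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < nrows and 0 <= nc < ncols:
                nd = d + (friction[r][c] + friction[nr][nc]) / 2
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return dist

# Hypothetical FRICTION map from slope: top row steep (low impedance 1),
# bottom row gentle (high impedance 3); the road runs down the left edge.
FRICTION = [[1, 1, 1, 1, 1],
            [3, 3, 3, 3, 3]]
ROAD = [(0, 0), (1, 0)]
R_WPROX = effective_proximity(FRICTION, ROAD)
R_WBUFFERS = [[1 if d <= 3 else 0 for d in row] for row in R_WPROX]
print(R_WBUFFERS)  # [[1, 1, 1, 1, 0], [1, 1, 0, 0, 0]]
```

With the same effective reach, the buffer extends four cells along the steep row but only two along the gentle one.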
That iterative refinement characterizes a typical approach to GIS modeling: from simple to increasingly complex. Most applications first mimic manual map analysis procedures and are then extended to include more advanced spatial analysis tools. For example, a more rigorous map-ematical approach to the previous extension might use a mathematical function to combine the effective proximity (R_WPROX) with the relative hazard rating (L_HAZARD) to calculate a risk index for each location.
For your enjoyment, some additional extensions are suggested below. Can you modify the flowchart to reflect the changes in model logic? If you have TMAP, can you develop the additional code? If you're a malleable undergraduate, you have to if you want to pass the course. But if you're a professional, you need not concern yourself with such details. Just ask the 18-year-old GIS hacker down the hall to do your spatial reasoning.
HAZARD SUBMODEL MODIFICATIONS
- Consideration of other physical factors, such as bedrock type, depth to bedrock, faulting, etc.
- Consideration of disturbance factors, such as construction cuts and fills
- Consideration of environmental factors, such as recent storm frequency, intensity and duration
- Consideration of seasonal factors, such as freezing and thawing cycles in early spring
- Consideration of historical landslide data and earthquake frequency
RISK SUBMODEL MODIFICATIONS
- Consideration of additional important features, such as public, commercial, and residential structures
- Extension to differentially weight the uphill and downhill slopes from a feature to calculate the effective buffer
- Extension to preferentially weight roads based on traffic volume, emergency routes, etc.
- Extension to include an economic valuation of threatened features and potential resource loss
Evaluating Map-ematical Relationships
(GeoWorld, February 1996)
As noted in the two previous sections, GIS applications come in a variety of forms. The differences aren't so much in the ingredients (maps) or the processing steps (command macros) as in the conceptual fabric of the process. As the landslide susceptibility model evolved through its extensions, differences in approach arose through model logic and/or model expression. A Simple Binary susceptibility model (only two states of either Yes or No) is radically different from a Weighted Rating model using a weighted average of relative susceptibility indices. In mathematical terms, the rating model is more robust, because it provides a continuum of system responses. It also provides a foothold for extending the model even further.
There are two basic types of GIS models: cartographic and spatial. In short, a cartographic model focuses on automating manual map analysis techniques and reasoning, and a spatial model focuses on expressing mathematical relationships. In the landslide example, the logical combination and the binary map algebra solutions are obviously cartographic models. Both could be solved manually using file cabinets and transparent overlays: tedious, but feasible for the infinitely patient. The weighted average rating model, however, smacks of down-and-dirty map-ematics and looks like a candidate spatial model. But is it?
As with most dichotomous classifications there is a gray area of overlap between cartographic and spatial model extremes. If the weights used in rating model averaging are merely guess-timates, then the application lacks all of the rights, privileges, and responsibilities of an exalted spatial model. The model may be mathematically expressed, but the logic isn't mathematically derived, or empirically verified. In short, "Where's the science?"
One way to infuse a sense of science is to perform some data mining. That involves locating a lot of areas with previous landslides, then pushing a predictive statistical technique through a stack of potential driving variable maps. For example, you might run a regression on landslide occurrence (dependent mapped variable) with %slope, %clay, %silt, and %cover (independent mapped variables). If you get a good fit, then substitute the regression equation for the weighted average in the rating model. That approach is at the threshold of science, but it presumes your database contains just the right set of maps over a large area. An alternative is to launch a series of "controlled" experiments under various conditions (%slope, soil composition, cover density, etc.) and derive a mathematical model through experiment. That's real science, but it consumes a lot of time, money, and energy.
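The regression step can be sketched as ordinary least squares over sample cells, solved through the normal equations. This is a minimal sketch, not a prescription: the predictors (%slope, %clay) and sample values are hypothetical, and the response is built to be exactly linear so the recovered coefficients can be checked. With real 0/1 landslide occurrence data this is a linear probability model; logistic regression is a common alternative for binary responses.

```python
def ols(X, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations.

    X is a list of predictor rows (one per sample location); an intercept
    column of 1s is prepended automatically.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    p = len(A[0])
    # Augmented normal-equation system [X^T X | X^T y].
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)]
         + [sum(a[i] * yi for a, yi in zip(A, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]

# Hypothetical sample cells: (%slope, %clay) with an exactly linear response.
X = [(10, 5), (20, 30), (30, 10), (40, 20), (25, 40), (15, 25)]
y = [0.1 + 0.02 * s + 0.005 * c for s, c in X]
print([round(b, 4) for b in ols(X, y)])  # [0.1, 0.02, 0.005]
```

The fitted coefficients would then replace the guessed weights in the rating model.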
A potential shortcut involves reviewing the scientific literature for an existing mathematical model and using it. That approach is used in figure 1, a map-ematical evaluation of the Revised Universal Soil Loss Equation (RUSLE)— kind of like landslides from a bug's perspective. The expected soil loss per acre from an area, such as a farmer's field, is determined from the product of six factors: the rainfall, the erodibility of the soil, the length and steepness (gradient) of the ground slope, the crop grown in the soil, and the land practices used. The RUSLE equation and its variable definitions are shown in figure 1. The many possible numerical values for each factor require extensive knowledge and preparation. However, a soil conservationist normally works in a small area, such as a single county, and often needs only one or two rainfall factors (R), values for only a few soils (K), and only a few cropping/practices systems (C and P). The remaining terrain data (L and S) are tabulated for individual fields.
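Written out, the product of the six factors is the following. A is the predicted average annual soil loss per acre (conventionally tons per acre per year). R is the rainfall factor, K the soil erodibility, L the slope length factor, S the slope steepness factor, C the cover (crop) factor, and P the support practice factor.

```latex
A = R \cdot K \cdot L \cdot S \cdot C \cdot P
```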
Figure 1. Basic GIS model of the Revised Universal Soil Loss Equation and extensions.
The RUSLE model can be evaluated in two ways: aggregated or disaggregated. An aggregated model uses a spatial database management system (DBMS) to store the six factors for each field, and then solves the equation through a database query. A map of predicted soil loss by individual field can be displayed, and the total loss for an entire watershed can be calculated by summing each of the constituent field losses (loss per acre multiplied by number of acres). That RUSLE implementation provides several advancements over current procedures, such as geo-query access, automated acreage calculations, and graphic display.
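The aggregated evaluation reduces to one record per field. In this sketch the field names and factor values are hypothetical and chosen only to keep the arithmetic easy to follow; they are not published RUSLE values.

```python
def rusle(R, K, L, S, C, P):
    """Predicted soil loss per acre: the product of the six RUSLE factors."""
    return R * K * L * S * C * P

# Hypothetical aggregated records: one set of factor values per field.
FIELDS = {
    "field_1": {"acres": 40, "R": 100, "K": 0.3, "L": 1.0, "S": 2.0, "C": 0.2, "P": 1.0},
    "field_2": {"acres": 80, "R": 100, "K": 0.2, "L": 1.5, "S": 1.0, "C": 0.1, "P": 0.5},
}

loss_per_acre = {
    name: rusle(f["R"], f["K"], f["L"], f["S"], f["C"], f["P"])
    for name, f in FIELDS.items()
}
# Watershed total: loss per acre times acres, summed over fields.
watershed_total = sum(loss_per_acre[name] * f["acres"] for name, f in FIELDS.items())
print({k: round(v, 2) for k, v in loss_per_acre.items()}, round(watershed_total, 2))
# {'field_1': 12.0, 'field_2': 1.5} 600.0
```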
However, it also raises serious questions. Many fields don't fit the assumptions of an aggregated model. Field boundaries reflect ownership rather than uniformly distributed RUSLE variables. Just ask any farmer about field variability (particularly if their field's predicted soil loss puts them out of compliance). A field might have two or more soils, and it might be steep at one end and flat on the other. Such spatial variation is known to the GIS (e.g., soil and slope maps), but not used by the aggregated model. A disaggregated model breaks an analysis unit (farmer's field in this case) into spatially representative subunits. The equation is evaluated for each of the subunits, and then combined for the parent field.
In a vector system, the subunits are derived by overlaying maps of the six RUSLE factors, independent of ownership boundaries. In a raster system, each cell in the analysis grid serves as a subunit. The equation is evaluated for each "composite polyglet" or "grid cell," then weight-averaged by area for the entire field. If a field contains three different factor conditions, the predicted soil loss proportionally reflects each subunit's contribution. The aggregated approach requires the soil conservationist to fudge the parameters for each of the conditions into generally representative values, and then run the equation for the whole field. Also, the aggregated approach loses spatial guidance for the actual water drainage— a field might drain into two or more streams in different proportions.
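The disaggregated alternative evaluates the equation per subunit and then area-weights the results for the parent field. The three subunits below (a steep end, a middle, and a flat end on a different soil) are hypothetical, as are their factor values.

```python
def rusle(R, K, L, S, C, P):
    """Soil loss per acre as the product of the six RUSLE factors."""
    return R * K * L * S * C * P

def field_loss(subunits):
    """Area-weighted average loss per acre for a field, plus its total loss.

    subunits: list of (acres, R, K, L, S, C, P), one entry per grid cell or
    composite polygon inside the field.
    """
    total_acres = sum(u[0] for u in subunits)
    total_loss = sum(u[0] * rusle(*u[1:]) for u in subunits)
    return total_loss / total_acres, total_loss

# Hypothetical field split into three subunits.
FIELD = [
    (10, 100, 0.3, 1.0, 3.0, 0.2, 1.0),  # steep end: 18 per acre
    (20, 100, 0.3, 1.0, 1.0, 0.2, 1.0),  # middle: 6 per acre
    (10, 100, 0.2, 1.0, 0.5, 0.2, 1.0),  # flat end, different soil: 2 per acre
]
avg, total = field_loss(FIELD)
print(round(avg, 2), round(total, 2))  # 8.0 320.0
```

Each subunit contributes in proportion to its area, so no single "representative" set of factors has to be fudged for the whole field.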
Figure 1 shows several extensions to the disaggregated model. Inset 1 depicts the basic spatial computations for soil loss. Inset 2 uses field boundaries to calculate the average soil loss for each field based on its subunits. Inset 3 provides additional information not available with the aggregated approach. Areas of high soil loss (AMAX) are isolated from the overall soil loss map (A), and then combined with the FIELDS map to locate areas out of compliance. That directs the farmer's attention to portions of the field that might require different management action.
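Isolating the high-loss subunits amounts to a threshold test followed by a lookup of the fields they fall in. In this sketch the list AMAX stands in for the map of the same name. The subunit keys, factor values and the tolerance of 5 are hypothetical.

```python
def rusle(R, K, L, S, C, P):
    """Soil loss per acre as the product of the six RUSLE factors."""
    return R * K * L * S * C * P

# Hypothetical subunits keyed by (field, cell), each holding its six RUSLE factors.
SUBUNITS = {
    ("field_1", "a"): (100, 0.3, 1.0, 3.0, 0.2, 1.0),
    ("field_1", "b"): (100, 0.3, 1.0, 1.0, 0.2, 1.0),
    ("field_2", "c"): (100, 0.2, 1.0, 0.5, 0.2, 1.0),
}
TOLERANCE = 5.0  # hypothetical allowable loss per acre

# Subunits whose predicted loss exceeds the tolerance, then the fields containing them.
AMAX = [key for key, factors in SUBUNITS.items() if rusle(*factors) > TOLERANCE]
fields_out = sorted({field for field, _ in AMAX})
print(AMAX, fields_out)  # [('field_1', 'a'), ('field_1', 'b')] ['field_1']
```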
Inset 4 enables the farmer to reverse calculate the RUSLE equation. In this case, a soil loss tolerance (T) is established for an area, such as a watershed, and then the combinations of soil loss factors meeting the standard are derived. Because the climatic and physiographic factors of R, K, L, and S are beyond a farmer's control, attention is focused on vegetation cover (C) and control practices (P). In short, the approach generates a map of the set of crop and farming practices that keep the field within soil loss compliance— good information for decision making.
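Reversing the equation means solving A ≤ T for the controllable product C · P, which gives C · P ≤ T / (R · K · L · S). The sketch screens candidate crop and practice combinations against that limit for one location. The C and P values are illustrative only, not published factors. Applying the same test cell by cell would yield the map the text describes.

```python
def compliant_systems(T, R, K, L, S, options):
    """(crop, practice) pairs whose C * P keeps R*K*L*S*C*P at or below T."""
    max_cp = T / (R * K * L * S)  # largest C * P that still meets the tolerance
    return [(crop, practice) for crop, practice, C, P in options if C * P <= max_cp]

# Hypothetical C and P values, for illustration only.
OPTIONS = [
    ("row crop", "up-and-down", 0.40, 1.0),
    ("row crop", "contour", 0.40, 0.5),
    ("small grain", "contour", 0.25, 0.5),
    ("hay", "up-and-down", 0.02, 1.0),
]
print(compliant_systems(T=5.0, R=100, K=0.3, L=1.0, S=1.0, options=OPTIONS))
# [('small grain', 'contour'), ('hay', 'up-and-down')]
```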
OK, what's wrong with the disaggregated approach? Two things: our databases and our science. For example, our digital maps of elevation may be too coarse to capture the subtle tilts and turns that water follows. And the science behind the RUSLE equation may be too coarse (modeling scale) to be applied to quarter-acre polyglets or cells. These limitations, however, tell us what we need to do: improve our data and redirect our science. From that perspective, GIS is more of a revolution in spatial reasoning than an evolution of current practice into a graphical form.
Author's Note: Let me apologize for this brief treatise on an extremely technical subject. How water cascades over a surface, or penetrates and loosens the ground, is directed by microscopic processes. The application of GIS (or any other expansive model) by its nature muddles the truth. The case studies presented are intended to illustrate various GIS modeling approaches and stimulate discussion about alternatives.