Beyond Mapping
Lumpers and Splitters Propel GIS - describes the two camps of GIS (GeoExploration and GeoScience)
The Softer Side of GIS - describes a Manual GIS (circa 1950) and relates the conceptual frameworks of Understanding and Judgment (natural and social science paradigms) to GIS modeling
Is GIS Technology Ahead of Science? - discusses several issues surrounding the differences in the treatment of non-spatial and spatial data
______________________________
Lumpers and Splitters Propel GIS
(GeoWorld, December, 2007)
The last few columns (September-November, 2007; Topic 7 in the online Beyond Mapping III compilation at http://www.innovativegis.com/basis/MapAnalysis) have focused on the numerical nature of GIS data. Early discussion challenged the traditional assumption that all data are “normally” distributed, suggesting instead that most spatial data are skewed and that the Median and Quartile Range often are better descriptive statistics than the Mean and Standard Deviation.
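To make the point concrete, here is a minimal Python sketch using only the standard library `statistics` module. The sample values are hypothetical, chosen to be right-skewed like many field-collected spatial data; `statistics.quantiles` (Python 3.8 or later) supplies the quartile cut points used for the interquartile range.

```python
import statistics


def describe(values):
    """Return both the normal-theory and the robust summary of a sample."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "median": statistics.median(values),
        "iqr": q3 - q1,  # interquartile range
    }


# Hypothetical right-skewed field samples: most readings are low,
# but two very high values pull the mean upward.
samples = [12, 14, 15, 15, 16, 17, 18, 19, 21, 22, 24, 95, 140]

for name, value in describe(samples).items():
    print(f"{name:>6}: {value:.1f}")
```

With these values the mean (about 32.9) sits far above the median (18), while the median and interquartile range describe the bulk of the readings.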
Such heresy was followed by an assertion that any central tendency statistic tends to overgeneralize and often conceals inherent spatial patterns and relationships within nearly all field-collected data. In most applications, Surface Modeling techniques, such as density analysis and spatial interpolation, can be applied to derive the spatial distribution of a set of point-sampled data.
Figure 1 outlines the major points of the earlier discussion. The left side of the figure depicts Desktop Mapping’s approach that reduces a set of field data to a single representative value that is assumed to be everywhere the same within each polygon (Discrete Spatial Object). Each parcel is “painted” with an appropriate color indicating the typical value—with darker green indicating a slightly lower average value derived from numerous samples falling within the polygon.
Map Analysis’s approach, on the other hand, establishes a spatial gradient based on the relative positions and values of the point-sampled data (Continuous Spatial Distribution). A color ramp is used to display the continuum of estimated values throughout each parcel—light green (low) to red (high). Note that the continuous representation identifies a cluster of extremely high values in the upper center portion of the combined parcels that is concealed by the discrete thematic mapping of the averages.
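The contrast between the two approaches can be sketched in a few lines of Python. This is a minimal illustration, assuming hypothetical sample points on a 10 x 10 grid of cells and using inverse distance weighting (IDW), one common spatial interpolation technique and not necessarily the one behind figure 1. The discrete view collapses the samples to one average; the continuous view estimates every cell.

```python
def idw(samples, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y) from (sx, sy, value) samples."""
    num = den = 0.0
    for sx, sy, value in samples:
        d2 = (x - sx) ** 2 + (y - sy) ** 2
        if d2 == 0:
            return value  # the cell holds a sample point; use it directly
        weight = 1.0 / d2 ** (power / 2)  # closer samples get more influence
        num += weight * value
        den += weight
    return num / den


# Hypothetical point samples (column, row, value) within a combined parcel.
samples = [
    (1, 1, 20), (4, 1, 22), (8, 1, 19),
    (1, 5, 21), (4, 5, 24), (5, 6, 87),  # (5, 6) and (6, 5): a local hot spot
    (6, 5, 80), (8, 5, 23),
    (1, 8, 18), (4, 8, 25), (8, 8, 20),
]

# Discrete Spatial Object: one value "painted" across the whole parcel.
parcel_average = sum(v for _, _, v in samples) / len(samples)

# Continuous Spatial Distribution: an estimate for every grid cell.
surface = [[idw(samples, col, row) for col in range(10)] for row in range(10)]
peak = max(max(row) for row in surface)

print(f"parcel average: {parcel_average:.1f}")
print(f"surface peak:   {peak:.1f}")
```

The single average (about 32.6) gives no hint of the cluster of high values that the interpolated surface exposes around cells (5, 6) and (6, 5).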
Figure 1. A data set can be characterized both discretely and continuously to derive different perspectives of spatial patterns and relationships.
OK, so much for review …what about the big picture? The discussion points to today’s convergent trajectory of two GIS camps: GeoExploration and GeoScience. Traditional computer companies like Google, Microsoft and Yahoo are entering the waters of geotechnology at the GeoExploration shallow end. Conversely, GIS vendors with deep keels in GeoScience are capitalizing on computer science advances for improved performance, interoperability and visualization.
An important lesson learned by the GeoScience camp is that data has to be integrated with a solution and not left as an afterthought for users to cobble together. Another lesson has been that user interfaces need to be intuitive, uncluttered and consistent across the industry. Additionally, the abstract 2D pastel map is giving way to 3D visualization and virtual reality renderings, a bit of influence from our CAD cousins and the gaming industry.
But what are the take-aways for traditional computer science vendors? First and foremost is an active awareness of the breadth of geotechnology, both in terms of its technical requirements and its business potential. Under the current yardstick of “eyeball contacts,” GeoExploration tools have been wildly successful.
But at the core, have recent technological advancements really changed mapping? …or has the wave of GeoExploration tools just changed mapping’s expression and access? …has the GIS evolution topped (or bottomed) out? …what about the future?
Current revolutionary steps in analytics and concepts are underway, like the energized paddling beneath a seemingly serene swan. As a broad-brush framework for discussing where we are heading, recall from your academic days the Philosopher’s Progression of Understanding shown in figure 2. It suggests that there are differences between the spatial Data/Information describing geographic phenomena and the Knowledge/Wisdom needed for prescribing management actions that solve complex spatial problems.
Figure 2. The two broad camps of geotechnology occupy different portions of the philosopher’s progression of understanding.
Most GeoExploration applications simply assemble spatial data into graphic form. While it might be a knock-your-socks-off graphic, the distillation of the data to information is left to visceral viewing and human interpretation and judgment (emphasizing Data and Information).
For example, a mash-up of a set of virtual pins representing crimes in a city can be poked into a Google Earth display. Interpretation and assessment of the general pattern, however, is left for the brain to construe. But there is a multitude of analytics that can be brought into play to translate the spatial data into the information, knowledge and wisdom needed for decision-making. Geo-query can segment by the type of crime; density analysis can isolate unusually high and low pockets of crime; coincident statistics can search for correlation with other data layers; effective distance can determine proximity to key features; spatial data mining can derive prediction models.
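Two of these analytics, geo-query by crime type and density analysis, can be sketched in plain Python. The incident locations below are hypothetical grid coordinates, and the density measure is a simple count of incidents within an assumed search radius around each cell (a roving-window count), only one of several ways density surfaces are computed.

```python
import math

# Hypothetical incidents: (x, y, crime type) on a 10 x 10 grid.
incidents = [
    (2, 3, "burglary"), (2, 4, "burglary"), (3, 3, "burglary"),
    (3, 4, "burglary"), (8, 8, "burglary"),
    (7, 1, "assault"), (1, 9, "assault"),
]


def geo_query(records, crime_type):
    """Segment the incidents by type (the geo-query step)."""
    return [(x, y) for x, y, kind in records if kind == crime_type]


def point_density(points, width, height, radius):
    """Count points within `radius` of each grid cell (a roving-window count)."""
    return [
        [sum(1 for px, py in points if math.hypot(px - col, py - row) <= radius)
         for col in range(width)]
        for row in range(height)
    ]


burglaries = geo_query(incidents, "burglary")
density = point_density(burglaries, 10, 10, radius=1.5)

# Locate the cell(s) with the highest burglary count: a candidate "hot pocket".
peak = max(max(row) for row in density)
hot_cells = [(c, r) for r in range(10) for c in range(10) if density[r][c] == peak]
print(peak, hot_cells)
```

Instead of leaving the pattern for the eye to construe, the density grid points directly at the four-incident cluster near (2, 3) to (3, 4).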
While the leap from mapping to map analysis might be well known to those in GeoScience, it represents a bold new frontier to the GeoExploration camp. It suggests future development of solutions that stimulate spatial reasoning through “thinking with maps” (Information and Knowledge) rather than just visualizing data, a significant movement beyond mapping.
In part, the differences between the GeoExploration and GeoScience camps parallel society’s age-old dichotomy of problem perception: lumpers and splitters. A "lumper" takes a broad view, assuming that details of a problem are not as important as overall trends ...a picture is worth a thousand words (holistic). A "splitter" takes a detailed view of the interplay among problem elements ...a model links thousands of pieces (atomistic).
So how does all this play out in geotechnology’s future? The two camps are symbiotic and can’t survive without each other, sort of like Ralph and Alice Kramden in The Honeymooners. GeoExploration fuels the fire of mass acceptance and, in large part, finances technology development through billions of mapping clicks (General User; access and visualization). GeoScience lubricates the wheels of advancement by developing new data structures, analytical tools and applications (Application Specialist; spatial reasoning and understanding).
It’s important to note that neither camp is stationary; both are continually evolving as we move beyond traditional mapping. Much of the mystique and influence held by application specialists just a few years ago is now commonplace on the desks (and handheld devices) of the general public. Similarly, the flat, pastel-colored maps of just a few years ago have given way to interactive 3D displays. While lumpers and splitters will always differ in perspective, their contributions to the stone soup of geotechnology are equally valuable (actually, invaluable).
The Softer Side of GIS
(GeoWorld, January, 2008)
While computer-based procedures supporting Desktop Mapping seem revolutionary, the idea of linking descriptive information (What) with maps (Where) has been around for quite a while. For example, consider the manual GIS that my father used in the 1950s, outlined in figure 1.
The heart of the system was a specially designed index card that had a series of numbered holes around its edge with a comment area in the middle. In a way it was like a 3x5 inch recipe card, just a little larger with more room for entering information. For my father, a consulting forester, that meant recording timber stand information, such as area, dominant tree type, height, density, soil type and the like, for the forest parcels he examined in the field (What). Aerial photos were used to delineate the forest parcels on a corresponding map tacked to a nearby wall (Where).
Figure 1. Outline of the processing flow of a manual GIS, circa 1950.
What went on between the index card and the map was revolutionary for the time. The information in the center was coded and transferred to the edge by punching out (notching) the appropriate numbered holes. For example, hole #11 would be notched to identify a Douglas fir timber stand. Another card would be notched at hole #12 to indicate a different parcel containing ponderosa pine. The trick was to establish a mutually exclusive classification scheme that corresponded to the numbered holes for all of the possible inventory descriptors and then notch each card to reflect the information for a particular parcel.
Cards for hundreds of timber stands were placed in a tray in no particular order. Passing a long needle through an appropriate hole and then lifting and shaking the stack caused all of the parcels with a particular characteristic to fall out, a result analogous to a simple SQL query of a digital database. Realigning the subset of cards, passing the needle through another hole and shaking again would execute a sequenced query, such as Douglas fir (#11) AND Cohasset soil (#28).
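The needle-and-tray procedure maps neatly onto set filtering. The sketch below is a hypothetical Python model of the manual system: each card carries the set of hole numbers notched for its parcel (holes #11, #12 and #28 follow the examples above; hole #30 and the parcel numbers are invented for illustration), and each pass of the needle keeps only the cards that drop out.

```python
# Hypothetical edge-notched cards: parcel ID and the set of notched holes.
# Hole 11 = Douglas fir, 12 = ponderosa pine, 28 = Cohasset soil (from the text);
# hole 30 is an invented stand-in for some other soil type.
tray = [
    {"parcel": 101, "notches": {11, 28}},
    {"parcel": 102, "notches": {12, 28}},
    {"parcel": 103, "notches": {11, 30}},
    {"parcel": 104, "notches": {11, 28}},
]


def needle(cards, hole):
    """Pass the needle through `hole` and shake: notched cards fall out."""
    return [card for card in cards if hole in card["notches"]]


# Sequenced query: Douglas fir (#11) AND Cohasset soil (#28).
douglas_fir = needle(tray, 11)
fir_on_cohasset = needle(douglas_fir, 28)

print([card["parcel"] for card in fir_on_cohasset])
# Roughly what a query against a hypothetical "stands" table would express:
# SELECT parcel FROM stands WHERE species = 'Douglas fir' AND soil = 'Cohasset'
```

Chaining two `needle` passes mirrors realigning the fallen cards and running the needle through a second hole.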
The resultant card set identified the parcels satisfying a specific query (What). The parcel ID# on each card corresponded to a map parcel on the wall. A thin paper sheet was placed over the base map and the boundaries of the selected parcels were traced and color-filled (Where), a “database-entry geo-query.” A “map-entry geo-query,” such as identifying all parcels abutting a stream, was achieved by viewing the map, noting the ID#s of the abutting parcels and then searching with the needle to pull their cards and retrieve their characteristics.
The old days wore out a lot of shoe leather running between the index card tray and the map tacked to the wall. Today, it’s just electrons scurrying about in a computer at gigahertz speed. However, the bottom line is that the geo-query/mapping approach hasn’t changed substantially: it still links “What is Where” for a set of pre-defined parcels and their stored descriptors. But the future of GIS holds entirely new spatial analysis capabilities way outside our paper map legacy.
Figure 2 graphically relates the softer (human dimensions) and harder (technology) sides of GIS. The matrix is the result of musing over some things lodged in my psyche years ago when I was a grad student (see Author’s Note 1). Last month’s column (December 2007) described the Philosopher’s Levels of Understanding (first column), which move thinking from descriptive Data, to relevant Information, to Knowledge of interrelationships and finally to prescriptive Wisdom that forms the basis for effective decision-making. The dotted horizontal line in the progression identifies the leap from visualization and visceral interpretation of Data and Information in GeoExploration to the map analysis ingrained in GeoScience for gaining Knowledge and Wisdom for problem solving.
Figure 2. Conceptual framework for moving maps from Description to Prescription application.
The second column extends the gradient of Understanding to the stark reality of Judgment that complicates most decision-making applications of GIS. The basic descriptive level for Facts is analogous to that of Data and includes things that we know, such as the circumference of the earth, Britney Spears’ birth date, her age and today’s temperature. Relevant Facts correspond to Information, encompassing only those facts that pertain to a particular concern, such as today’s temperature of 32°F.
It is at the next two levels that the Understanding and Judgment frameworks diverge and translate into radically different GIS modeling environments. Knowledge implies certainty of relationships and forms the basis of science: discovery of scientific truths. The concept of Perception, however, is a bit mushier as it involves beliefs and preferences based on experience, socialization and culture: development of perspective. For example, a Floridian might feel that 32° is really cold, while an Alaskan feels it certainly is not cold, in fact rather mild. Neither of the interpretations is wrong and both diametrically opposing perceptions are valid.
The highest level of Opinion/Values implies actionable beliefs that reflect preferences, not universal truths. For example, the Floridian might hate the 32° weather, whereas the Alaskan loves it. This stark dichotomy of beliefs presents a real problem for many GIS technologists as the bulk of their education and experience was on the techy side of campus, where mapping is defined as precise placement of physical features (description of facts). But the other side of campus is used to dealing with opposing “truths” in judgment and sees maps as more fluid, cognitive drawings (prescription of relationships).
The columns on the right attempt to relate the dimensions of Understanding and Judgment to Map Types and Spatial Processing used in prescriptive mapping. The descriptive levels are well known to GIS’ers—Base maps from field collected data (e.g., elevation) and Derived maps calculated by analytical tools (e.g., slope from elevation).
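As an example of a Derived map, slope can be calculated from an elevation grid. The Python sketch below is one simple approach, assuming a square-cell grid and central differences over the four direct neighbors; GIS packages use a variety of slope algorithms (often over all eight neighbors), so treat this as illustrative rather than as any particular product's method. The 30-meter cell size and the tilted-plane elevation values are hypothetical.

```python
import math


def slope_percent(elev, cell_size):
    """Derive a percent-slope map from an elevation grid (edge cells left as None)."""
    rows, cols = len(elev), len(elev[0])
    slope = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Central differences: rise over run along columns (x) and rows (y).
            dz_dx = (elev[r][c + 1] - elev[r][c - 1]) / (2 * cell_size)
            dz_dy = (elev[r + 1][c] - elev[r - 1][c]) / (2 * cell_size)
            slope[r][c] = 100 * math.hypot(dz_dx, dz_dy)
    return slope


# Hypothetical Base map: a plane rising 3 m per 30 m cell toward the east.
elevation = [[100 + 3 * c for c in range(5)] for _ in range(5)]
slope_map = slope_percent(elevation, cell_size=30)
print(slope_map[2][2])
```

On this uniform plane every interior cell comes out at 10% slope (a 3 m rise over a 30 m run).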
Interpreted maps, on the other hand, calibrate Base/Derived map layers in terms of their perceived impact on a spatial solution. For example, gentle slopes might be preferred for powerline routing (assigned a value of 1), with increasing steepness less preferred (assigned values 2 through 9) and very steep slopes prohibitive (assigned 0). Similar preference scales might be calibrated to avoid locations with high Visual Exposure, in or near Sensitive Areas, far from Roads or with high Housing Density. In turn, the model criteria are weighted in terms of their relative importance to the overall solution, such as a homeowner’s perception that Housing Density and Visual Exposure preference ratings are ten times more important than Sensitive Areas and Road Proximity ratings (see Author’s Note 2).
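A minimal Python sketch of this calibrate-then-weight step follows. The slope break points (5% steps, with anything over 40% prohibitive), the weight of 1 given to slope, and the example cell ratings are all hypothetical; only the 1-to-9 scale with 0 as prohibitive, and the ten-to-one homeowner weighting, come from the text. Lower scores are better, matching the calibration where 1 is most preferred.

```python
def calibrate_slope(percent):
    """Hypothetical calibration: 1 = gentle (preferred), 2-9 = steeper, 0 = prohibitive."""
    if percent > 40:  # assumed cutoff for "very steep"
        return 0
    return min(9, 1 + int(percent // 5))  # assumed 5% steps


# Homeowner weights from the text (10:1); the slope weight of 1 is an assumption.
WEIGHTS = {"slope": 1, "visual": 10, "sensitive": 1, "roads": 1, "housing": 10}


def suitability(ratings, weights):
    """Weighted average rating for one cell; None if any criterion is prohibitive."""
    if any(ratings[name] == 0 for name in weights):
        return None
    total = sum(weights[name] * ratings[name] for name in weights)
    return total / sum(weights.values())


cell = {"slope": calibrate_slope(12), "visual": 3, "sensitive": 5, "roads": 2, "housing": 4}
steep_cell = dict(cell, slope=calibrate_slope(45))

print(suitability(cell, WEIGHTS))        # about 3.48 on the 1-9 scale
print(suitability(steep_cell, WEIGHTS))  # None: excluded by the slope criterion
```

Treating 0 as a veto rather than averaging it in keeps a prohibitive criterion from being diluted by favorable ratings elsewhere.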
Interpreted maps provide a foothold for tracking divergent assumptions and interpretations surrounding a spatially dependent decision. Modeled maps put it all together by simulating an array of opinions and values held by different stakeholder groups involved with a particular issue, such as the concerns of homeowners, power companies and environmentalists about routing a new powerline.
The Understanding progression assumes common truths/agreement at each step (more a natural science paradigm), whereas the Judgment progression allows differences in opinion/beliefs (more a social science paradigm). GIS modeling needs to recognize and embrace both perspectives for effective spatial solutions tuned to different applications. From the softer side perspective, GIS isn’t so much a map as it is the change in a series of maps reflecting valid but differing sets of perceptions, opinions and values. Where these maps agree and disagree becomes the fodder for enlightened discussion, and eventually an effective decision. Judgment-based GIS modeling tends to fly in the face of traditional mapping: maps that change with opinion sound outrageous and are radically different from our paper map legacy and the manual GIS of old. It suggests a fundamental change in our paradigm of maps, their use and conjoined impact. Are you ready?
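Comparing the modeled maps of different groups can also be sketched. In this hypothetical Python example, three map cells carry preference ratings (1 = best, 9 = worst), the homeowner weights follow the ten-to-one example above, the environmentalist weights are invented for contrast, and a cell counts as agreement only when both groups score it on the same side of an assumed midpoint of 5.

```python
CELLS = {  # hypothetical preference ratings per map cell
    "A": {"housing": 2, "visual": 3, "sensitive": 8, "roads": 2},
    "B": {"housing": 7, "visual": 8, "sensitive": 1, "roads": 3},
    "C": {"housing": 2, "visual": 2, "sensitive": 2, "roads": 4},
}

STAKEHOLDERS = {
    "homeowners": {"housing": 10, "visual": 10, "sensitive": 1, "roads": 1},
    "environmentalists": {"housing": 1, "visual": 1, "sensitive": 10, "roads": 1},  # assumed
}


def score(ratings, weights):
    """Weighted average rating (lower = more preferred)."""
    return sum(weights[k] * ratings[k] for k in weights) / sum(weights.values())


def compare(cells, stakeholders, midpoint=5):
    """Label each cell by whether all groups fall on the same side of the midpoint."""
    labels = {}
    for cell_id, ratings in cells.items():
        preferred = {score(ratings, w) <= midpoint for w in stakeholders.values()}
        if preferred == {True}:
            labels[cell_id] = "agree: preferred"
        elif preferred == {False}:
            labels[cell_id] = "agree: avoid"
        else:
            labels[cell_id] = "disagree"
    return labels


print(compare(CELLS, STAKEHOLDERS))
```

Here cell C is acceptable to both groups, while A and B are where the discussion needs to happen.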
_____________________________
Author’s Notes:
1) Ross Whaley, Professor Emeritus at SUNY-Syracuse and member of my
doctoral committee, in a plenary presentation at the New York State
Is GIS Technology Ahead of Science?
(GeoWorld, February 1999, pg. 28-29)
The movement from mapping to map analysis marks a turning point in the collection and processing of geographic data. It changes our perspective from “spatially-aggregated” descriptions and images of an area to “site-specific” evaluation of the relationships among mapped variables. The extension of the basic map elements from points, lines and areas to map surfaces, and the quantitative treatment of these data, fuel the transition. However, this new perspective challenges the conceptual differences between spatial and non-spatial data, their analysis and scientific foundation.
For many, it appears to propagate as many questions as it answers. I recently had the opportunity to reflect on the changes in spatial technology and its impact on science for a presentation* before a group of scientists. Five foundation-shaking questions emerged.
Is the “scientific method” relevant in the data-rich age of knowledge engineering?
The first step in the scientific method is the statement of a hypothesis. It reflects a “possible” relationship or new understanding of a phenomenon. Once a hypothesis is established, a methodology for testing it is developed. The data needed for evaluation is collected and analyzed and, as a result, the hypothesis is accepted or rejected. Each completion of the process contributes to the body of science, stimulates new hypotheses, and furthers knowledge.
The scientific method has served science well. Above all else, it is efficient in a data-constrained environment. However, technology has radically changed the nature of that environment. A spatial database is composed of thousands upon thousands of spatially registered locations relating a diverse set of variables. In this data-rich environment, the focus of the scientific method shifts from efficiency in data collection and analysis to the derivation of alternative hypotheses. Hypothesis building results from “mining” the data under various spatial, temporal and thematic partitions. The radical change is that the data collection and initial analysis steps precede the hypothesis statement, in effect turning the traditional scientific method on its head.
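A toy version of such mining: partition the records, measure the association between two mapped variables within each partition, and flag strong associations as candidate hypotheses for later testing. The records, the partition names and the 0.8 threshold are hypothetical, and Pearson correlation stands in for whatever mining technique an actual analysis might use.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical records: (partition, variable_1, variable_2) for sampled locations.
records = [
    ("north", 1, 2), ("north", 2, 4), ("north", 3, 6), ("north", 4, 8),
    ("south", 1, 5), ("south", 2, 3), ("south", 3, 6), ("south", 4, 4),
]

groups = defaultdict(lambda: ([], []))
for part, v1, v2 in records:
    groups[part][0].append(v1)
    groups[part][1].append(v2)

# Flag partitions with a strong association as candidate hypotheses.
candidates = [part for part, (xs, ys) in groups.items() if abs(pearson(xs, ys)) >= 0.8]
print(candidates)
```

The output is a hypothesis to test ("the variables are related in the north partition"), not a conclusion.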
Is the “random thing” pertinent in deriving mapped data?
A cornerstone of traditional data analysis is randomness. In data collection it seeks to minimize the effects of spatial autocorrelation and dependence among variables. Historically, a scientist could measure only a few plots and randomness was needed to provide an unbiased sample for estimating the typical state of a variable (i.e., average and standard deviation). For questions of central tendency, randomness is essential as it supports the basic assumptions about analyzing data in numeric space, devoid of “unexplained” spatial interactions.
However, in geographic space, randomness rarely exists and spatial relationships are fundamental to site-specific management and research. Adherence to the “random thing” runs counter to continuous spatial expression of variables. This is particularly true in sampling design. While efficiently establishing the central tendency, random sampling often fails to consistently examine the spatial pattern of variations. An underlying systematic sampling design, such as systematic unaligned (see
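Systematic unaligned sampling is commonly described as follows: divide the area into a grid of strata, draw one random x-offset for each row of strata and one random y-offset for each column, and combine them so that every stratum receives exactly one point without the points lining up. The Python sketch below follows that description; the grid dimensions, cell size and seed are arbitrary choices for illustration.

```python
import random


def systematic_unaligned(n_rows, n_cols, cell_w, cell_h, rng):
    """One sample point per stratum; x-offsets vary by row, y-offsets by column."""
    x_offsets = [rng.uniform(0, cell_w) for _ in range(n_rows)]  # shared along a row
    y_offsets = [rng.uniform(0, cell_h) for _ in range(n_cols)]  # shared down a column
    return [
        (col * cell_w + x_offsets[row], row * cell_h + y_offsets[col])
        for row in range(n_rows)
        for col in range(n_cols)
    ]


points = systematic_unaligned(4, 5, cell_w=100.0, cell_h=100.0, rng=random.Random(42))
for x, y in points[:5]:  # the first row of strata
    print(f"{x:7.1f} {y:7.1f}")
```

Every stratum is visited, so the spatial pattern is covered evenly, yet the random offsets keep the points from falling on a regular lattice that could align with periodic features.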
Are geographic distributions a natural extension of numerical distributions?
To characterize a variable in numeric space, density functions, such as the standard normal curve, are used. They translate the pattern of discrete measurements along a “number line” into a continuous numeric distribution. Statistics describing the functional form of the distribution determine the central tendency of the variable and ultimately its probability of occurrence. Consideration of additional variables results in an N-dimensional numerical distribution visualized as a series of scattergrams.
The geographic distribution of a variable can be derived from discrete sample points positioned in geographic space. Map generalization and spatial interpolation techniques can be used to form a continuous distribution, in a manner analogous to deriving a numeric distribution (see
Although the conceptual approaches are closely aligned, the information contained in numeric and geographic distributions is different. Whereas numeric distributions provide insight into the central tendency of a variable, geographic distributions provide information about the geographic pattern of variations. Generally speaking, non-spatial characterization supports a “spatially-aggregated” perspective, while spatial characterization supports “site-specific” analysis. It can be argued that research using non-spatial techniques provides minimal guidance for site-specific management; in fact, it might even be dysfunctional.
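The difference in information content is easy to demonstrate. In this minimal Python sketch, two hypothetical transects contain exactly the same nine values, so their numeric distributions (mean and standard deviation) are identical; only their arrangement in geographic space differs, which a simple neighbor-to-neighbor difference (an assumed, deliberately crude measure of spatial pattern) reveals.

```python
import statistics

# Two hypothetical transects holding identical values in different positions.
gradient = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # values change smoothly along the line
scattered = [5, 1, 9, 3, 7, 2, 8, 4, 6]  # same values, no spatial trend


def neighbor_difference(transect):
    """Mean absolute difference between adjacent locations."""
    pairs = zip(transect, transect[1:])
    return sum(abs(a - b) for a, b in pairs) / (len(transect) - 1)


for name, values in [("gradient", gradient), ("scattered", scattered)]:
    print(
        name,
        statistics.mean(values),
        round(statistics.stdev(values), 3),
        neighbor_difference(values),
    )
```

Both transects report a mean of 5 and the same standard deviation, but adjacent locations differ by 1.0 on average in the gradient and by 4.875 in the scattered arrangement.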
Can spatial dependencies be modeled?
Non-spatial modeling, such as linear regression derived from a set of sample points, assumes spatially independent data and seeks to implement the “best overall” action everywhere. Site-specific management, on the other hand, assumes spatially dependent data and seeks to evaluate “IF <spatial condition> THEN <spatial action>” rules for the specific conditions throughout a management area. Although the underlying philosophies of the two approaches are at odds, the “mechanics” of their expression spring from the same roots.
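The contrast between a single “best overall” action and site-specific IF/THEN rules can be sketched as follows. The soil nitrogen map, the thresholds and the application rates are all hypothetical; the point is only that the same rule applied to the field average yields one uniform action, while applied cell by cell it yields a prescription map.

```python
# Hypothetical soil nitrogen map (ppm) for a 3 x 3 management area.
soil_n = [
    [12, 18, 35],
    [22, 45, 50],
    [15, 30, 60],
]


def rate_rule(n_ppm):
    """IF <spatial condition> THEN <spatial action>: assumed thresholds and rates."""
    if n_ppm < 20:
        return 150  # low nitrogen: high application rate
    if n_ppm < 40:
        return 100
    return 50  # high nitrogen: low application rate


# Non-spatial: apply the rule once to the field average, then everywhere.
cells = [n for row in soil_n for n in row]
uniform_rate = rate_rule(sum(cells) / len(cells))
uniform_map = [[uniform_rate] * len(row) for row in soil_n]

# Site-specific: evaluate the rule at every location.
prescription_map = [[rate_rule(n) for n in row] for row in soil_n]

print(uniform_map)
print(prescription_map)
```

The field average (about 31.9 ppm) triggers the middle rate everywhere, over-applying on the high-nitrogen cells and under-applying on the low ones.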
Within a traditional mathematical context, each map represents a “variable,” each spatial unit represents a “case” and the value at that location represents a “measurement.” In a sense, the map locations can be conceptualized as a bunch of sample plots; it is just that the sample plots are everywhere (viz. cells in a gridded map surface). The result is a data structure that tracks spatial autocorrelation and spatial dependency. The structure can be conceptualized as a stack of maps with a vertical pin spearing a sequence of values defining each variable for that location, sort of a data shish kebab. Regression, rule induction or a similar technique can be applied to the data to derive a spatially dependent model of the relationship among the mapped variables. Admittedly, imprecise, inaccurate or poorly modeled surfaces can incorrectly track the spatial relationships. But, given good data, the “map-ematical” approach has the capability of modeling the spatial character inherent in the data. What is needed is a concerted effort by the scientific community to identify guidelines for spatial modeling and develop techniques for assessing the accuracy of mapped data and the results of its analysis.
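The map-stack idea translates directly into code: each map is a grid, each cell is a case, and spearing the stack at a location yields that case's measurements. The Python sketch below uses two hypothetical 2 x 2 maps and fits an ordinary least-squares line, the simplest form of the regression mentioned above, then applies the fitted model back to every cell to produce a predicted map. Real applications would involve many more cells and usually several predictor maps.

```python
def spear(stack, row, col):
    """The 'shish kebab': every map's value at one grid location."""
    return tuple(layer[row][col] for layer in stack)


def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


# Hypothetical mapped variables on the same 2 x 2 grid.
elevation = [[100, 120], [140, 160]]
crop_yield = [[40, 38], [36, 34]]
stack = [elevation, crop_yield]

# Every cell is a "sample plot": spear the stack to collect its measurements.
cases = [spear(stack, r, c) for r in range(2) for c in range(2)]
a, b = fit_line([x for x, _ in cases], [y for _, y in cases])

# Apply the fitted relationship back to the map to get a predicted-yield map.
predicted = [[a + b * z for z in row] for row in elevation]
print(round(a, 3), round(b, 3))
```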
How can “site-specific” analysis contribute to the scientific body of knowledge?
Traditionally, research has focused on intensive investigations composed of a limited number of samples. These studies are well designed and executed by researchers who are close to the data. As a result, the science performed is both rigorous and professional. However, it is extremely tedious and limited in both time and space. The findings might accurately reflect relationships for the experimental plots during the study period, but offer minimal information for a land manager 70 miles away under different conditions, such as biological agents, soil, terrain and climate.
Land managers, on the other hand, supervise large tracts of land for long periods of time, but are generally unaccustomed to administering scientific projects. As a result, general operations and scientific studies have been viewed as different beasts. Scientists and managers each do their own thing, and a somewhat nebulous step of “technology transfer” hopefully links the two.
Within today’s data-rich environment, things appear to be changing. Managers now have access to databases and analysis capabilities far beyond those of scientists just a few years ago. Also, their data extends over a spectrum of conditions that can’t be matched by traditional experimental plots. But often overlooked is the reality that these operational data sets form the scientific fodder needed to build the spatial relationships demanded by site-specific management.
Spatial technology has forever changed land management operations; now it is destined to change research. A close alliance between researchers and managers is the key. Without it, constrained research (viz. esoteric) mismatches the needs of evolving technology, and heuristic (viz. unscientific) rules-of-thumb are substituted.
Although mapping and “free association” geo-query clearly stimulate thinking, they rarely contain the rigor needed to materially advance scientific knowledge. Under these conditions, a data-rich environment can be an information-poor substitute for good science.
So where do we go from here?
In the new world of spatial technology the land manager has the
comprehensive database and the researcher has the methodology for its analysis—
both are key to successfully unlocking the relationships needed for
site-specific management. In a sense, technology is ahead of science, sort
of the cart before the horse. A