|
| Data Analysis |
 |
This
chapter reviews data manipulation and analysis
capabilities within a GIS. The focus is on reviewing
spatial data analytical functions. This chapter
categorizes analytical functions within a GIS and will be
of most interest to technical staff and GIS operators.
 |
Manipulation and
Transformation of Spatial Data |
 |
Integration
and Modeling of Spatial Data |
 |
Integrated Analytical
Functions |
The major difference between
GIS software and CAD mapping software is the provision of
capabilities for transforming the original spatial data
in order to answer particular queries. Some
transformation capabilities are common to both GIS and
CAD systems. However, GIS software provides a larger
range of analysis capabilities that can
operate on the topology or spatial aspects of the
geographic data, on the non-spatial attributes of these
data, or on both.
The
main criterion used to define a GIS is its
capability to transform and integrate spatial
data.
|
Manipulation and Transformation of
Spatial Data
The maintenance and transformation
of spatial data concern the ability to input,
manipulate, and transform data once it has been created.
While many different interpretations exist with respect
to what constitutes these capabilities, some specific
functions can be identified. These are reviewed below.
| Coordinate Thinning
Coordinate thinning involves the weeding or reduction
of coordinate pairs, e.g. X and Y, from arcs.
This function is often required when data has
been captured with too many vertices for the
linear features. This can result in redundant
data and large data volumes. The weeding of
coordinates is required to reduce this
redundancy.
The thinning of coordinates
is also required in the map generalization
process of linear simplification. Linear
simplification is one component of generalization
that is required when data from one scale, e.g.
1:20,000, is to be used and integrated with data
from another scale, e.g. 1:100,000. Coordinate
thinning is often done on features such as
contours, hydrography, and forest stand
boundaries.
|
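As an illustration of coordinate thinning, the sketch below uses the Douglas-Peucker line simplification algorithm, a widely used weeding method that the text above does not specifically name. The function names and the tolerance values are assumptions chosen for the example.

```python
import math


def perpendicular_distance(pt, start, end):
    """Distance from pt to the line through start and end."""
    (x, y), (x1, y1), (x2, y2) = pt, start, end
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        # Degenerate chord: fall back to point-to-point distance.
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def thin_arc(coords, tolerance):
    """Return a thinned copy of an arc, always keeping its two end points."""
    if len(coords) < 3:
        return list(coords)
    # Find the interior vertex farthest from the chord joining the end points.
    index, max_dist = 0, 0.0
    for i in range(1, len(coords) - 1):
        d = perpendicular_distance(coords[i], coords[0], coords[-1])
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        # Every interior vertex is redundant at this tolerance.
        return [coords[0], coords[-1]]
    # Otherwise keep the farthest vertex and thin each half separately.
    left = thin_arc(coords[: index + 1], tolerance)
    right = thin_arc(coords[index:], tolerance)
    return left[:-1] + right


arc = [(0, 0), (1, 0), (2, 2), (3, 0), (4, 0)]
thinned = thin_arc(arc, tolerance=1.0)
```

A larger tolerance removes more vertices, which is the behaviour wanted when data captured at a large scale is generalized for use at a smaller scale.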
| Geometric Transformations
This function is concerned with the
registering of a data layer to a common
coordinate scheme. This usually involves
registering selected data layers to a standard
data layer already registered. The term rubber
sheeting is often used to describe this
function. Rubber sheeting involves stretching one
data layer to meet another based on predefined
control points of known locations. Two other
functions may be categorized under geometric
transformations. These involve warping a
data layer stored in one data model, either
raster or vector, to another data layer stored in
the opposite data model. For example,
classified satellite imagery often requires warping
to fit an existing forest inventory layer, or a
poor quality vector layer may require warping to
match a more accurate raster layer.
|
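Registration to control points can be sketched with a single affine transformation fitted to exactly three control points. This is a simplified stand-in for rubber sheeting, which typically stretches a layer piecewise using many control points. The function names and coordinates below are assumptions made for the example.

```python
def solve_3x3(a, b):
    """Solve the 3x3 linear system a * x = b using Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det(a)
    if abs(d) < 1e-12:
        raise ValueError("control points must not be collinear")
    solution = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        solution.append(det(m) / d)
    return solution


def fit_affine(source_pts, target_pts):
    """Return a function mapping source coordinates onto the target layer."""
    a = [[x, y, 1.0] for x, y in source_pts]
    cx = solve_3x3(a, [tx for tx, _ in target_pts])
    cy = solve_3x3(a, [ty for _, ty in target_pts])

    def transform(pt):
        x, y = pt
        return (cx[0] * x + cx[1] * y + cx[2],
                cy[0] * x + cy[1] * y + cy[2])

    return transform


# Hypothetical control points: digitizer units -> registered map units.
to_map = fit_affine([(0, 0), (1, 0), (0, 1)], [(10, 20), (12, 20), (10, 23)])
registered = to_map((1, 1))
```

With more than three control points a least-squares fit would normally be used, and the residuals at the control points give a measure of registration quality.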
| Map Projection Transformations
This functionality concerns the
transformation of data in geographic coordinates
from an existing map projection to another map
projection. Most GIS software requires that data
layers be in the same map projection for
analysis. Accordingly, if data is acquired in a
different projection than the other data layers,
it must be transformed. Typically 20 or more
different map projections are supported in a GIS
software offering.
|
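To make the idea of a projection transformation concrete, the sketch below converts geographic coordinates to and from the spherical Mercator projection. Mercator is chosen only because its formulas are short. The sphere radius and the function names are assumptions, and production software would use an ellipsoidal model and a projection library.

```python
import math

EARTH_RADIUS_M = 6378137.0  # assumed sphere radius in metres


def geographic_to_mercator(lon_deg, lat_deg, radius=EARTH_RADIUS_M):
    """Forward spherical Mercator: degrees in, metres out."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = radius * lam
    y = radius * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y


def mercator_to_geographic(x, y, radius=EARTH_RADIUS_M):
    """Inverse spherical Mercator: metres in, degrees out."""
    lon = math.degrees(x / radius)
    lat = math.degrees(2 * math.atan(math.exp(y / radius)) - math.pi / 2)
    return lon, lat


x, y = geographic_to_mercator(-115.5, 51.2)
lon, lat = mercator_to_geographic(x, y)
```

Transforming between two projections is commonly done in this two-step manner: the inverse of the source projection back to geographic coordinates, followed by the forward formulas of the target projection.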
| Conflation - Sliver Removal
Conflation is formally defined as
the procedure of reconciling the positions of
corresponding features in different data layers.
More commonly this is referred to as sliver
removal. Often two layers that contain the
same feature, e.g. soils and forest stands both
with a specific lake, do not have exactly the
same boundaries for that feature, e.g. the lake.
This may be caused by a lack of coordination or
data prioritization during digitizing or by a
number of different manipulation and analysis
techniques. When the two layers are combined,
e.g. normally in polygon overlay, they will not
match precisely and small sliver polygons will be
created. Conflation is concerned with the process
for removing these slivers and reconciling the
common boundary.
There are several
approaches for sliver removal. Perhaps the most
common is allowing the user to define a priority
for data layers in combination with a tolerance
value. Considering the soils and forest stand
example, the user could define a layer that takes
precedence, e.g. forest stands, and a size
tolerance for slivers. After polygon overlay, if a
polygon is below the size tolerance it is
classified as a sliver. To reconcile the situation,
the arcs of the data layer that has higher
priority will be retained and the arcs of the
other data layer will be deleted. Another
approach is to simply divide the sliver down the
centre and collapse the arcs making up the
boundary. The important point is that all GIS
software must have the capability to resolve
slivers. Remember that it is generally much less
expensive to reconcile maps manually in the map
preparation and digitizing stage than afterwards.
|
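The size tolerance test described above can be sketched as follows. The example only identifies candidate slivers by area. It does not perform the arc reconciliation itself, and the polygon records and tolerance value are hypothetical.

```python
def polygon_area(ring):
    """Area of a simple polygon given as a list of (x, y) vertices (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def classify_slivers(polygons, size_tolerance):
    """Split overlay polygons into (slivers, kept) using an area tolerance."""
    slivers, kept = [], []
    for poly in polygons:
        if polygon_area(poly["ring"]) < size_tolerance:
            slivers.append(poly)
        else:
            kept.append(poly)
    return slivers, kept


# Hypothetical overlay output: one genuine polygon and one thin sliver.
overlay_result = [
    {"id": 1, "ring": [(0, 0), (10, 0), (10, 10), (0, 10)]},
    {"id": 2, "ring": [(0, 0), (10, 0), (10, 0.1), (0, 0.1)]},
]
slivers, kept = classify_slivers(overlay_result, size_tolerance=5.0)
```

Area alone can misclassify small but legitimate polygons, so some implementations also consider shape, for example the ratio of area to perimeter.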
| Edge Matching
Edge matching is simply the procedure to adjust the
position of features that extend across typical
map sheet boundaries. Theoretically data from
adjacent map sheets should meet precisely at map
edges. However, in practice this rarely occurs.
Misalignment of features can be caused by several
factors including digitizing error, paper
shrinkage of source maps, and errors in the
original mapping. Edge matching always requires
some interactive editing. Accordingly, GIS
software differs considerably in the degree of
automation provided.
|
| Interactive Graphic Editing
Interactive graphic editing
functions involve the addition, deletion, moving,
and changing of the geographic position of
features. Editing should be possible at any time.
Most graphic editing occurs during the data
compilation phase of any project. Remember that
typically 60 to 70% of the time required to
complete any project involves data compilation.
Accordingly, the level of sophistication and ease
of use of this capability is vitally important
and should be rated highly by those evaluating
GIS software. Much of the editing that is
undertaken involves the cleaning up of
topological errors identified earlier. The
capability to snap to existing elements,
e.g. nodes and arcs, is critical.
The functionality of
graphic editing does not differ greatly across
GIS software offerings. However, the user
interface and ease of use of the editing
functions usually do. Editing within a GIS
software package should be as easy as using a CAD
system. A cumbersome or incomplete graphic
editing capability will lead to much frustration
among the users of the software.
|
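Snapping to existing nodes can be sketched as a nearest-node search within a tolerance. The function name and tolerance are assumptions, and a real editor would also snap to arcs and would use a spatial index rather than a linear scan.

```python
import math


def snap_to_node(point, nodes, tolerance):
    """Return the nearest node within tolerance, or the point unchanged."""
    best, best_dist = None, tolerance
    for node in nodes:
        d = math.dist(point, node)
        if d <= best_dist:
            best, best_dist = node, d
    return best if best is not None else point


existing_nodes = [(0.0, 0.0), (10.0, 10.0)]
snapped = snap_to_node((0.3, 0.4), existing_nodes, tolerance=1.0)
```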
Integration
and Modelling of Spatial Data
The integration of data provides
the ability to ask complex spatial questions that could
not be answered otherwise. Often, these are inventory or
locational questions such as "how much?" or
"where?". Answers to locational and quantitative questions
require the combination of several different data layers
to provide a more complete and realistic
answer. The ability to combine and integrate data is the
backbone of GIS.
Often, applications require a
more sophisticated approach to answer complex spatial
queries and "what if?" scenarios. The technique
used to solve these questions is called spatial
modelling. Spatial modelling implies the use of
spatial characteristics and methods in manipulating data.
Methods exist to create an almost unlimited range of
capabilities for data analysis by stringing together sets
of primitive analysis functions. While some explicit
analytical models do exist, especially in natural
resource applications, most modelling formulae
(models) are determined based on the needs of a
particular project. The capability to undertake complex
modelling of spatial data, on an ad hoc basis, has helped
to further the resource specialist's understanding of the
natural environment, and the relationship between
selected characteristics of that environment.
The use of GIS spatial modelling
tools in several traditional resource activities has
helped to quantify processes and define models for
deriving analysis products. This is particularly true in
the area of resource planning and inventory compilation.
Most GIS users are able to better organize their
applications because of their interaction with, and use
of, GIS technology. The utilization of spatial modelling
techniques requires a comprehensive understanding of the
data sets involved, and the analysis requirements.
The
critical function for any GIS is the integration
of data.
|
The raster data
model has become the primary spatial data source for
analytical modeling with GIS. The raster data model is
well suited to the quantitative analysis of numerous data
layers. To facilitate these raster modeling techniques
most GIS software employs a separate module specifically
for cell processing.
 |
| (from
Berry) |
| The
following diagram represents a logic flowchart of
a typical natural resource model using GIS raster
modeling techniques. The boxes represent raster
maps in the GIS, while the connection lines imply
an analytical function or technique. |
 |
| (from
Berry) |
Integrated
Analytical Functions in a GIS
Most GIS's provide the capability
to build complex models by combining primitive analytical
functions. Systems vary as to the complexity provided for
spatial modelling, and the specific functions that are
available. However, most systems provide a standard set
of primitive analytical functions that are accessible to
the user in some logical manner. Aronoff identifies four
categories of GIS analysis functions. These are:
 |
Retrieval,
Reclassification, and Generalization; |
 |
Topological
Overlay Techniques; |
 |
Neighbourhood
Operations; and |
 |
Connectivity
Functions. |
The range of
analysis techniques in these categories is very large.
Accordingly, this section of the book focuses on
providing an overview of the fundamental primitive
functions that are most often utilized in spatial
analyses.
| Retrieval, Reclassification and Generalization
Perhaps the initial GIS analysis
that any user undertakes is the retrieval
and/or reclassification of data. Retrieval
operations occur on both spatial and attribute
data. Often data is selected by an attribute
subset and viewed graphically. Retrieval involves
the selective search, manipulation, and output of
data without the requirement to modify the
geographic location of the features involved.
Reclassification
involves the selection and presentation of a
selected layer of data based on the classes or
values of a specific attribute, e.g. cover group.
It involves looking at an attribute, or a series
of attributes, for a single data layer and
classifying the data layer based on the range of
values of the attribute. Accordingly,
features adjacent to one another that have a
common value, e.g. cover group, but differ in
other characteristics, e.g. tree height, species,
will be treated and appear as one class. In
raster based GIS software, numerical values are
often used to indicate classes. Reclassification
is an attribute generalization technique.
Typically this function makes use of polygon
patterning techniques such as crosshatching
and/or color shading for graphic representation.
In a vector based GIS,
boundaries between polygons of common reclassed
values should be dissolved to create a
cleaner map of homogeneous continuity. Raster
reclassification intrinsically involves boundary
dissolving. The dissolving of map boundaries
based on a specific attribute value often results
in a new data layer being created. This is often
done for visual clarity in the creation of
derived maps. Almost all GIS software provides
the capability to easily dissolve boundaries
based on the results of a reclassification. Some
systems allow the user to create a new data layer
for the reclassification while others simply
dissolve the boundaries during data output.
One can see how the
querying capability of the DBMS is a necessity in
the reclassification process. The ability and
process for displaying the results of
reclassification, a map or report, will vary
depending on the GIS. In some systems the
querying process is independent from data display
functions, while in others they are integrated
and querying is done in a graphics mode. The
exact process for undertaking a reclassification
varies greatly from GIS to GIS. Some will store
results of the query in query sets independent
from the DBMS, while others store the results in
a newly created attribute column in the DBMS. The
approach varies drastically depending on the
architecture of the GIS software.
|
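Raster reclassification can be sketched as a lookup from detailed cell values to generalized classes. The stand codes, cover group numbers, and function name below are hypothetical.

```python
# Hypothetical lookup: detailed stand code -> generalized cover group.
COVER_GROUP = {11: 1, 12: 1, 21: 2, 22: 2}


def reclassify(grid, lookup, nodata=0):
    """Return a new grid with each cell replaced by its class value."""
    return [[lookup.get(cell, nodata) for cell in row] for row in grid]


stands = [
    [11, 12, 21],
    [11, 22, 99],
]
cover = reclassify(stands, COVER_GROUP)
```

Adjacent cells that receive the same class value become indistinguishable in the output grid, which is why raster reclassification dissolves boundaries implicitly, as noted above.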
| Topological Overlay
The capability to overlay multiple data layers in a
vertical fashion is the most required and common
technique in geographic data processing.
In fact, the use of a topological data structure
can be traced back to the need for overlaying
vector data layers. With the advent of the
concepts of mathematical topology, polygon
overlay has become the most popular
geoprocessing tool, and the basis of any
functional GIS software package.
Topological
overlay is predominantly
concerned with overlaying polygon data with
polygon data, e.g. soils and forest cover.
However, there are requirements for overlaying
point, linear, and polygon data in selected
combinations; point in polygon, line in
polygon, and polygon on polygon are the most
common. Vector and raster based software differ
considerably in their approach to topological
overlay.
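The point in polygon case is the simplest of these overlays to illustrate. The sketch below uses the ray casting (even-odd) test, one common approach that is an assumption here rather than a method named in the text. Points lying exactly on a boundary are not treated specially.

```python
def point_in_polygon(point, ring):
    """Even-odd ray casting test against a simple polygon ring."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # X coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


stand = [(0, 0), (10, 0), (10, 10), (0, 10)]
sample_plot_inside = point_in_polygon((5, 5), stand)
```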
Raster based
software is oriented towards arithmetic overlay
operations, e.g. the addition, subtraction,
division, and multiplication of data layers. The
nature of the one attribute
map approach, typical of the
raster data model, usually provides a more
flexible and efficient overlay capability. The
raster data model affords a strong numerical
modelling (quantitative analysis)
capability. Most sophisticated spatial modelling
is undertaken within the raster domain.
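Arithmetic overlay of raster layers amounts to applying an operator cell by cell to grids that share the same dimensions. The layer names and scores in this sketch are hypothetical.

```python
import operator


def raster_overlay(layer_a, layer_b, op):
    """Apply op cell by cell to two grids with identical dimensions."""
    if len(layer_a) != len(layer_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(layer_a, layer_b)
    ):
        raise ValueError("layers must share the same grid dimensions")
    return [
        [op(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(layer_a, layer_b)
    ]


# Hypothetical single-attribute layers on the same 2 x 2 grid.
slope_score = [[1, 2], [3, 1]]
soil_score = [[2, 2], [1, 3]]
suitability = raster_overlay(slope_score, soil_score, operator.add)
```

Because each layer carries a single attribute, chaining several such calls is a direct way to build the weighted or additive models common in raster analysis.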
In vector based systems
topological overlay is achieved by the creation
of a new topological network from two or more
existing networks. This requires the rebuilding
of topological tables, e.g. arc, node, polygon,
and therefore can be time consuming and CPU
intensive. The result of a topological overlay in
the vector domain is a new topological network
that will contain attributes of the original
input data layers. In this way, selected queries
can then be undertaken on the original layers,
e.g. soils and forest cover, to determine where
specific situations occur, e.g. deciduous forest
cover where drainage is poor.
Most GIS software makes use
of a consistent logic for the overlay of multiple
data layers. The rules of Boolean logic
are used to operate on the attributes and spatial
properties of geographic features. Boolean
algebra uses the operators AND, OR, XOR, and NOT to
see whether a particular condition is true or
false. Boolean logic represents all possible
combinations of spatial interaction between
different features. The implementation of Boolean
operators is often transparent to the user.
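The four Boolean operators can be sketched on two true/false layers covering the same grid. The layers below, deciduous cover and poor drainage, follow the example used earlier in this section. The cell values and function names are hypothetical.

```python
def boolean_overlay(layer_a, layer_b, operator_name):
    """Combine two true/false grids with AND, OR, or XOR."""
    ops = {
        "AND": lambda a, b: a and b,
        "OR": lambda a, b: a or b,
        "XOR": lambda a, b: a != b,
    }
    op = ops[operator_name]
    return [
        [op(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(layer_a, layer_b)
    ]


def boolean_not(layer):
    """Invert a true/false grid."""
    return [[not cell for cell in row] for row in layer]


deciduous = [[True, True], [False, False]]
poor_drainage = [[True, False], [True, False]]

# Cells that are deciduous AND poorly drained.
target = boolean_overlay(deciduous, poor_drainage, "AND")
```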
To date
the primary analysis technique used in
GIS applications, vector and raster, is
the topological overlay of selected data
layers.
|
Generally,
GIS software implements the overlay of different
vector data layers by combining the spatial and
attribute data files of the layers to create a
new data layer. Again, different GIS software
packages utilize varying approaches for the display and
reporting of overlay results. Some systems
require that topological overlay occur on only
two data layers at a time, creating a third
layer. This pairwise approach requires the
nesting of multiple overlays to generate a final
overlay product, if more than two data layers are
involved. This can result in numerous
intermediate or temporary data layers. Some
systems create a complete topological structure
at the data verification stage, and the user
merely submits a query string for the combined
topological data. Other systems allow the user to
overlay multiple data layers at one time. Each
approach has its drawbacks depending on the
application and the nature of the implementation.
Determining the most appropriate method is based
on the type of application, practical
considerations such as data volumes and CPU
power, and other considerations such as personnel
and time requirements. Overall, the flexibility
provided to the operator and the level of
performance varies widely among GIS software
offerings.
| The following
diagram illustrates a typical overlay
requirement where several different
layers are spatially joined to create a
new topological layer. By combining
multiple layers in a topological fashion,
complex queries can be answered
concerning attributes of any layer. |

|
| Neighbourhood Operations
Neighbourhood operations evaluate the characteristics of an
area surrounding a specific location.
Virtually all GIS software provides some form of
neighbourhood analysis. A range of different
neighbourhood functions exist. The analysis of
topographic features, e.g. the relief of the
landscape, is normally categorized as being a
neighbourhood operation. This involves a variety
of point interpolation techniques
including slope and aspect calculations, contour
generation, and Thiessen polygons. Interpolation
is defined as the method of predicting unknown
values using known values of neighbouring
locations. Interpolation is utilized most often
with point based elevation data.
| This
example illustrates a continuous surface
that has been created by interpolating
sample data points. |
 |
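Interpolation from sample points can be sketched with inverse distance weighting (IDW), one simple technique among many that is chosen here as an assumption rather than taken from the text. The sample values and the power parameter are hypothetical.

```python
import math


def idw(x, y, samples, power=2.0):
    """Estimate a value at (x, y) from (sx, sy, value) samples by IDW."""
    numerator = 0.0
    denominator = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            # The location coincides with a sample: return it exactly.
            return value
        weight = 1.0 / d ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Hypothetical spot elevations: (x, y, elevation in metres).
spot_heights = [(0.0, 0.0, 100.0), (10.0, 0.0, 200.0)]
estimate = idw(5.0, 0.0, spot_heights)
```

Evaluating such a function at every cell centre of a grid produces a continuous surface of the kind shown in the figure above.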
Elevation
data usually takes the form of irregularly or
regularly spaced points. Irregularly spaced points
are stored in a Triangulated Irregular Network (TIN).
A TIN is a vector topological network of
triangular facets generated by joining the
irregular points with straight line segments. The
TIN structure is utilized when irregular data is
available, predominantly in vector based systems.
TIN is a vector data model for 3-D data.
An alternative for
storing elevation data is the regular point
Digital Elevation Model (DEM).
The term DEM usually refers to a grid of
regularly spaced elevation points. These points
are usually stored with a raster data model. Most
GIS software offerings provide three dimensional
analysis capabilities in a separate module of the
software. Again, they vary considerably with
respect to their functionality and the level of
integration between the 3-D module and the other
more typical analysis functions.
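A slope calculation on a gridded DEM can be sketched with central differences over the four neighbouring cells. This is one of several possible finite difference schemes and is an assumption of the example, as are the function name and grid values. It applies to interior cells only.

```python
import math


def slope_degrees(dem, row, col, cell_size):
    """Slope at an interior DEM cell, in degrees, from central differences."""
    dz_dx = (dem[row][col + 1] - dem[row][col - 1]) / (2 * cell_size)
    dz_dy = (dem[row + 1][col] - dem[row - 1][col]) / (2 * cell_size)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


# Hypothetical 3 x 3 DEM rising 10 m per 10 m cell towards the east.
dem = [
    [0.0, 10.0, 20.0],
    [0.0, 10.0, 20.0],
    [0.0, 10.0, 20.0],
]
centre_slope = slope_degrees(dem, 1, 1, cell_size=10.0)
```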
Without doubt the
most common neighbourhood function is buffering.
Buffering involves the ability to create
distance buffers around selected features, be they
points, lines, or areas. Buffers are created as
polygons because they represent an area around a
feature. Buffering is also referred to as corridor
or zone generation
with the raster data model. Usually, the results
of a buffering process are utilized in a
topological overlay with another data layer. For
example, to determine the volume of timber within
a selected distance of a cutline, the user would
first buffer the cutline data layer. They would
then overlay the resultant buffer data layer, a
buffer polygon, with the forest cover data layer
in a clipping fashion. This would result in a new
data layer that only contained the forest cover
within the buffer zone. Since all attributes are
maintained in the topological overlay and
buffering processes, a map or report could then
be generated.
Buffering is typically used
with point or linear features. The generation of
buffers for selected features is frequently based
on a distance from that feature, or on a specific
attribute of that feature. For example, some
features may have a greater zone of influence due
to specific characteristics, e.g. a primary
highway would generally have a greater influence
than a gravel road. Accordingly, different size
buffers can be generated for features within a
data layer based on selected attribute values or
feature types.
|
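Attribute-driven buffering can be sketched as a distance test against each segment of a linear feature, with the buffer width looked up from the feature type. The widths, road records, and function names below are hypothetical. The sketch answers whether a point falls inside the buffer rather than constructing the buffer polygon itself.

```python
import math


def distance_to_segment(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the closest point on the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


# Hypothetical buffer widths in metres, keyed by feature type.
BUFFER_WIDTH_M = {"primary highway": 500.0, "gravel road": 100.0}


def within_buffer(point, road):
    """True if point lies inside the buffer implied by the road's type."""
    width = BUFFER_WIDTH_M[road["type"]]
    coords = road["coords"]
    return any(
        distance_to_segment(point, a, b) <= width
        for a, b in zip(coords, coords[1:])
    )


gravel = {"type": "gravel road", "coords": [(0, 0), (1000, 0)]}
highway = {"type": "primary highway", "coords": [(0, 0), (1000, 0)]}
```

The same location can therefore fall outside the buffer of a gravel road yet inside the buffer of a primary highway that follows the identical alignment.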
| Connectivity Analysis
The distinguishing feature of connectivity operations
is that they use functions that accumulate values
over an area being traversed. Most often these
include the analysis of surfaces and networks.
Connectivity functions include proximity
analysis, network analysis, spread
functions, and three dimensional surface analysis
such as visibility and perspective
viewing. This category of analysis techniques
is the least developed in commercial GIS
software. Consequently, there is often a great
difference in the functionality offered between
GIS software offerings. Raster based systems
often provide the more sophisticated surface
analysis capabilities while vector based systems
tend to focus on linear network analysis
capabilities. However, this appears to be
changing as GIS software becomes more
sophisticated, and multi-disciplinary
applications require a more comprehensive and
integrated functionality. Some GIS offerings
provide both vector and raster analysis
capabilities. Only in these systems will one find
a full range of connectivity analysis techniques.
|
Proximity
analysis techniques are primarily concerned
with the proximity of one feature to another.
Usually proximity is defined as the
ability to identify any feature that is near any
other feature based on location, attribute value,
or a specific distance. A simple example is
identifying all the forest stands that are within
100 metres of a gravel road, but not necessarily
adjacent to it. It is important to note that
neighbourhood buffering is often categorized as
being a proximity analysis capability. Depending
on the particular GIS software package, the data
model employed, and the operational architecture
of the software, it may be difficult to
distinguish between proximity analysis and buffering.
| Proximity
analysis is often used in urban based
applications to consider areas of
influence, and ownership queries.
Proximity to roads and engineering
infrastructure is typically important for
development planning, tax calculations,
and utility billing. |
The
identification of adjacency is another
proximity analysis function. Adjacency is defined
as the ability to identify any feature having
certain attributes that exhibits adjacency with
other selected features having certain
attributes. A typical example is the ability to
identify all forest stands of a specific type,
e.g. species, adjacent to a gravel road.
|
| Network
analysis is a widely used analysis technique.
Network analysis techniques can be characterized
by their use of feature networks. Feature
networks are composed almost entirely of linear
features. Hydrographic hierarchies and
transportation networks are prime examples. Two
example network analysis techniques are the allocation
of values to selected features within the
network to determine capacity zones, and the
determination of the shortest path between
connected points or nodes within the network
based on attribute values. This is often referred
to as route optimization. Attribute values
may be as simple as minimal distance, or more
complex involving a model using several
attributes defining rate of flow, impedance, and
cost. |
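Route optimization over a network can be sketched with Dijkstra's algorithm, a standard shortest path method that the text does not specifically name. The network below is hypothetical. Each arc carries a single non-negative impedance value, which could equally represent distance, travel time, or cost.

```python
import heapq


def shortest_path(network, start, goal):
    """Return (total impedance, node list) or None if goal is unreachable."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, impedance in network.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(
                    queue, (cost + impedance, neighbour, path + [neighbour])
                )
    return None


# Hypothetical road network: node -> {connected node: impedance}.
roads = {
    "A": {"B": 4, "C": 1},
    "B": {"D": 3},
    "C": {"B": 2, "D": 7},
    "D": {},
}
route = shortest_path(roads, "A", "D")
```

A more complex model would compute each impedance from several attributes, such as rate of flow and cost, before the search is run.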
| Three
dimensional analysis involves a range of
different capabilities. The most utilized is the
generation of perspective surfaces. Perspective
surfaces are usually represented by a wire frame
diagram reflecting profiles of the landscape,
e.g. every 100 metres. These profiles viewed
together, with the removal of hidden lines,
provide a three dimensional view. As previously
identified, most GIS software packages offer 3-D
capabilities in a separate module. Several other
functions are normally available. These include the following
functions:
 |
user definable vertical
exaggeration, viewing azimuth,
and elevation angle; |
 |
identification of viewsheds,
e.g. seen versus unseen areas; |
 |
the draping of
features, e.g. points, lines, and shaded
polygons onto the perspective surface; |
 |
generation of shaded relief
models simulating illumination; |
 |
generation
of cross section profiles; |
 |
presentation
of symbology on the 3-D surface; and |
 |
line
of sight perspective views from user
defined viewpoints. |
|
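The seen versus unseen determination behind viewsheds and line of sight views can be sketched along a single cross section profile. The profile values, spacing, and observer height are hypothetical. A full viewshed would repeat this test along many rays radiating from the viewpoint.

```python
def visible_along_profile(profile, spacing, observer_height=0.0):
    """Flag which profile points are visible from the first point.

    profile is a list of elevations at a regular spacing; the observer
    stands at profile[0], raised by observer_height.
    """
    eye = profile[0] + observer_height
    visible = [True]  # the observer's own position
    max_slope = float("-inf")
    for i in range(1, len(profile)):
        # Slope of the sight line from the eye to this point.
        slope = (profile[i] - eye) / (i * spacing)
        # Visible only if no nearer point has a steeper sight line.
        visible.append(slope >= max_slope)
        max_slope = max(max_slope, slope)
    return visible


# Hypothetical profile sampled every 100 metres.
profile = [100.0, 90.0, 120.0, 110.0, 150.0]
seen = visible_along_profile(profile, spacing=100.0, observer_height=2.0)
```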
While the primitive
analytical functions have been presented, the reader
should be aware that a wide range of more specific and
detailed capabilities exists.
The
overriding theme of all GIS software is that the
analytical functions are totally integrated with
the DBMS component. This integration provides the
necessary foundation for all analysis techniques.
|
|