The Precision Farming Primer
Overview of the Case Study
“Some Assembly Required” in Precision Ag
Mapped Data Visualization and Summary
Preprocessing and Map Normalization
Exchanging Mapped Data
Comparing Yield Maps
Comparing Yield Surfaces
From Point Data to Map Surfaces
Benchmarking Interpolation Results
Assessing Interpolation Performance
Calculating Similarity Within A Field
Identifying Data Zones
Mapping Data Clusters
Predicting Maps
Stratifying Maps for Better Predictions
Spatial Analysis Operations
Overview of the Case Study
Author’s note: An educational version of the MapCalc
software by Red Hen Systems used in the case study provides “hands-on”
exercises for the topics discussed in this appendix. See www.redhensystems.com for more
information and ordering; US$21.95 for Student CD and US$495.00 for Instructor
CD with multi-seat license for a computer lab (prices subject to change).
Now
that precision farming is beginning its second decade, where are we? Yield mapping is gaining commonplace status
for many crops and locales. Site-specific
management of field fertilization has a growing number of users. Remote sensing applications from satellite
imagery to aerial photos and on the ground “proximal” sensing are coming
onboard. Irrigation control, field
leveling, variable rate seeding, on-farm studies, disease/pest modeling, stress
maps and a myriad other computer mapping uses are on the horizon.
Fundamental
to all of these applications are the digital nature of a computer map and the
procedures required to turn these data into useful information for
decision-making. The bottom line for
precision farming appears to be an understanding of the numbers and what one
can do with them to enhance their interpretation.
The
next several columns will apply map analysis techniques in a case study
format. We will begin with a series of
techniques used to visualize and summarize field data and end with several
techniques that develop maps to guide management action. The focus will be on methodology (a.k.a., the
precision farming “toolkit”) and not on the scientific results or
generalizations about the complex biological, edaphic or physiographic factors
driving crop production in the example field.
These intricate relationships are left to agricultural scientists to
scrutinize, interpret and put into context with numerous other crops and
fields.
The
objective of the series of articles simply is to demonstrate the analysis
procedures on a consistent data set. The
intent is to build a case study for a single field that illustrates the wealth
of emerging analysis capabilities that are available to scientists,
agronomists, crop consultants and producers.
One
might argue that we have “the technical cart in front of the scientific
horse.” In many instances that appears
true. We can collect data, view the
numbers as maps or summary statistics, but struggle to make sense of it
all. In part, the disconnect isn’t technology versus science but the predicament of “the chicken or the egg coming
first.”
We
have a radically different set of analysis tools than those available just ten
years ago. Concurrently, we have a
growing number of farmers collecting geo-registered data, such as crop yield,
soil nutrients and meteorological conditions.
The computer industry is turning out software and machines that are
increasingly powerful, cost less and less, and are easier to use. The ag equipment companies are developing
implements and instruments that respond to “on the go” movements throughout a
field. In short, the pieces are coming
together.
The
accompanying table lists the topics that will be covered in the series. Most of the details of the topics have been
discussed in previous “GIS Toolbox” articles.
What awaits is the demonstration of their application to a consistent
data set. Hopefully, the case study will
stimulate innovative thinking as much as it outlines how the procedures can be
applied. All of the processing will be
done with readily available software for a PC environment.
The
next decade in precision farming will focus on making sense of the
site-specific relationships we measure.
The technology is here (analysis toolbox) to bridge the gap between maps
and decisions. Its widespread
application will be determined by how effective we are in applying it.
Case Study in Agricultural Map Analysis
Mapped Data Visualization and Summary
Preprocessing and Map Normalization
Comparing Maps and Temporal Analysis
Interpolating and Assessing Map Surfaces
Integrating Remote Sensing Data
Delineating Management Zones
Clustering Groups of Similar Data
Measuring Correlation Among Maps
Developing Predictive Models
Generating Management Action Maps
“Some Assembly Required” in Precision Ag
Before
we start wrestling with the details of spatial data analysis it makes sense to
consider some larger issues surrounding their application—namely, the interaction
between technology and science, the relevant expression of results,
and the current field-centric orientation of analysis.
The underlying concept of
precision agriculture is that site-specific actions are better than whole-field
ones. In applying fertilizer, for
example, changing the rate and mixture of the nutrients throughout a field puts
more than the average where it is needed and less where it is not needed. However, precise science in the formulation
of prescription maps is critical.
To
date the infusion of site-specific science has not kept pace with the
technologies underlying site-specific agriculture. We can produce maps of yield variation and a
host of potential driving factors, from soil properties and nutrients to
electrical conductivity and topographic relief.
We can generate prescription maps that guide the precise application of
various materials. Where we used to use
one recommendation, we now have several for different locations in a
field. So what’s the problem?
It’s
not that there is a problem; it’s just that there is far more potential. Introduction of spatial technology has
forever altered agriculture—for the researcher, as well as the farmer. The traditional research paradigm involves
sampling a very small portion of a field then analyzing these data without reference
to their spatial context and relationships.
The most frequently used method develops a “regression model” of one
variable on another(s), then extrapolates the results for large geographic
regions.
A classic example is a production curve that relates various levels of a nutrient
to yield, e.g., the Yield vs. Phosphorus plots you encountered in Agronomy
101. The interpretation of the curve is
that if a certain level of phosphorus is present, then a certain amount of
yield is expected. This science, plus a
whole lot of field experience (windage), is what goes into a
prescription, whole-field and site-specific alike. If a map of current phosphorus levels
exists, then the application rate can be adjusted so that each location
receives just the right amount to put it at the right spot on the production
curve.
Current
precision ag technology enables just that…site-specific application, but
based on state-wide equations (science). In fact, the “rules” used in most systems
were developed years ago at experiment stations miles away, indifferent to
geographic patterns of topographic, edaphic and biological conditions, and
analyzed with non-spatial statistics that intentionally ignore spatial
autocorrelation.
Today,
an opportunity exists for site-specific science to “tailor” the decision rules
for different conditions within a field.
For example, more phosphorus might be applied to south-facing toe
slopes than north-facing head-slopes.
But the least amount would be applied in flat or depressed areas. Of course the spatially-specific rule set
would change for different soil types and in accordance with the level of
organic matter.
The
primary difference between “non-spatial science” and its spatial cousin is the
analysis of map variables versus a small set of sample plots attempting to replicate
all possible conditions and their extent within a field. Until recent developments there wasn’t a
choice, as the enabling spatial technologies didn’t exist ten years ago.
Now
that we have spatial tools we need to infuse them into agricultural science. Ideally, these activities will gravitate from
the experiment station model to extensive on-farm studies that encompass actual
conditions on a farmer’s land. In
addition, the focus will expand from identifying “management zones” to the
analysis of continuous map surfaces.
Without shifts in perspective, the “whole-field” approach simply will be
replaced by “smaller-field-chunks” (management zones) that blindly apply
non-spatial rule sets. Without fully
engaged science, the full potential of site-specific agriculture will be
aggregated into a series of half steps dictated by technology.
In
addition to a closer science/technology marriage, the results of its
application need to be expressed in new and more informative ways. The introduction of maps has enlightened
farmers and researchers alike to the variability within agricultural
fields. Interactive queries of a stack
of maps identify conditions at specific locations. However, simply presenting base map data
doesn’t always translate into information for decision-making. Further processing is needed to decode these
data into a decision context. For
example, bushels and pounds per acre could be rendered into revenue, cost and
net profit maps under specified market conditions. As the assumptions are changed the economic
maps would change similar to a spreadsheet.
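The rendering of yield into economic maps amounts to simple cell-by-cell arithmetic. The Python sketch below is illustrative only: the function name, the 2x2 grids, the per-acre costs and the two price scenarios are all hypothetical and are not drawn from the case study data.

```python
def economic_maps(yield_grid, cost_grid, price_per_bu):
    """Derive revenue and net profit grids ($/ac) from yield (bu/ac) and cost ($/ac).

    All inputs are hypothetical; grids are lists of rows on the same analysis frame.
    """
    revenue = [[y * price_per_bu for y in row] for row in yield_grid]
    net = [[r - c for r, c in zip(rev_row, cost_row)]
           for rev_row, cost_row in zip(revenue, cost_grid)]
    return revenue, net


# Hypothetical 2x2 field: yield in bu/ac and production cost in $/ac
yield_grid = [[150.0, 200.0], [120.0, 180.0]]
cost_grid = [[300.0, 320.0], [310.0, 315.0]]

# Changing the assumed price changes the economic maps, much like a spreadsheet
revenue_low, net_low = economic_maps(yield_grid, cost_grid, price_per_bu=2.5)
revenue_high, net_high = economic_maps(yield_grid, cost_grid, price_per_bu=3.0)
```

In this made-up example the lower-left cell shows a loss at the lower price and a profit at the higher one, which is the kind of shift a static yield map cannot reveal.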
Broadening
the geographic focus of precision agriculture is a third force directing the
future development. To date, most
spatial analysis of farm inputs and outputs stops at the edge of the field,
as if fields were islands surrounded by nothing. This myopic mindset disregards the larger
geographic context of agriculture.
Increasing environmental concern in farm practices could radically alter
the field focus in the not too distant future.
For
example, the focus might be extended to the surface flows and transport of
materials (fine soil particles, organic matter, chemicals, etc.) that connect a
field to its surroundings. As in forestry,
the cumulative environmental effects within an airshed or watershed might
need to be considered before certain farming activities can take place. The increasing urban/farm interface will
surely alter the need and methods of communication between agriculture and
other parts of society. Spatial analysis
can play a key role in how agricultural stewardship is effectively
communicated.
Precision
agriculture is not going away. The blend
of modern science and technology is like a locomotive with momentum—impossible
to stop. But the direction of the tracks
determines where it goes. Current
decisions by software developers, equipment manufacturers, service providers
and farmers themselves, are laying the groundwork for tomorrow’s direction. Thinking “out of the box” at this stage might
be the best route.
Mapped Data Visualization and Summary
The next several columns
will apply map analysis techniques in a case study format. Discussion begins with a series of procedures
used to visualize and summarize field data and ends with several approaches
that develop maps to guide management action.
The map-ematical procedures used to translate data into useful
information are demonstrated using a subset of the data collected for a large
multidisciplinary project of the Water Management Unit of ARS and the
Experiment Station at Colorado State University. The MapCalc software by Red Hen Systems is
used throughout the case study for analysis and display of the data. A set of “hands-on” exercises based on the
case study is available for classroom and self-study (see author’s note).
The most fundamental
concept in analyzing precision farming data is the recognition that all of the
maps are “numbers first, pictures later.” While the human mind is good at a lot of
things, noting subtle differences within large, complex sets of spatial data
isn’t one of them. To visualize these
data we first must aggregate the detailed information into a few discrete
categories then assign colors to each of the groupings.
Figure 1. Two
different renderings (categorizations) of yield data.
For example, the yield maps
shown in figure 1 appear dramatically different but were generated from the
same harvest data. Both maps have five
categories extending from low to high yield and use the same color scheme.
The differences in
appearance arise from the methods used in determining the breakpoints of the
categories: the one on the left uses “Equal Ranges” while the one on the right uses
“Equal Count.” Equal Ranges
is similar to
an elevation contour map since it divides the data range (2.33 to 295 bu/ac)
into five steps of about 60 bushels each. Note that the middle interval
(119.3 to 177.8) has more than 100 acres and comprises over half of the field
(54%). Equal Count, on the other hand, divides the same
data into five classes, but this time each category represents about the same
area (20%) and contains a similar number of samples (about 660).
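The two breakpoint schemes can be sketched in a few lines of Python. This is a minimal illustration, not the MapCalc implementation: the function names are assumptions, and the twenty sample values are hypothetical apart from sharing the field's reported minimum and maximum (2.33 and 295 bu/ac).

```python
def equal_range_breaks(values, n_classes=5):
    """Breakpoints that split the data range into equal-width intervals."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_classes
    return [lo + step * i for i in range(1, n_classes)]


def equal_count_breaks(values, n_classes=5):
    """Breakpoints that put roughly the same number of samples in each class."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // n_classes] for i in range(1, n_classes)]


def classify(value, breaks):
    """Return the 1-based class number for a value given ascending breakpoints."""
    return 1 + sum(value >= b for b in breaks)


# Hypothetical yield samples (bu/ac), bunched around the middle of the range
yields = [2.33, 45.0, 98.0, 131.0, 142.0, 150.0, 155.0, 158.0, 161.0, 165.0,
          170.0, 174.0, 179.0, 185.0, 192.0, 201.0, 215.0, 238.0, 262.0, 295.0]

range_breaks = equal_range_breaks(yields)
count_breaks = equal_count_breaks(yields)

# Number of samples falling in each of the five classes under each scheme
range_counts = [sum(classify(v, range_breaks) == k for v in yields) for k in range(1, 6)]
count_counts = [sum(classify(v, count_breaks) == k for v in yields) for k in range(1, 6)]
```

With these made-up samples the Equal Ranges scheme piles nearly half of the values into the middle class, while Equal Count spreads them evenly, mirroring the contrast between the two maps.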
Figure 2.
Statistical summary of yield data.
The differences between the
two approaches are best visualized from a combined statistical/graphical
perspective. The left panel in figure 2
shows the basic Descriptive Statistics for the data set—min,
max, range, mean, median, standard deviation, variance and total area. While these statistics report a fairly good
harvest overall (an average yield of 158 bu/ac), they fail to map the yield
patterns (spatial variability) throughout the field.
The two histograms on the
right side of figure 2 link the statistical perspective with the map
displays. Recall that a histogram is
simply a plot of the data range (X-axis) versus the number of samples
(Y-axis). The “vertical bars”
superimposed on the histograms identify the breakpoints for the
categories: evenly spaced along the X-axis for Equal Ranges but variably
spaced for Equal Count. Note the direct
relationship between the placement of the bars and the category intervals
in the map legends of figure 1.
For example, the first interval begins at 2.3 bushels for both maps but
breaks at 60.8 for Equal Ranges and 140.9 for Equal Count. The differences in the placement of the
breakpoints (vertical bars) account for the differences in the appearance in
the two yield maps.
Similarly, changes in the number
of intervals, and to a lesser degree, changes in the color scheme
will produce different looking maps. By
its very nature, discrete contour mapping aggregates the spatial variability
within field-collected data. The
assignment into categories is inherently a subjective process and any changes
in the contouring parameters result in different map renderings. Since decisions are based on the appearance
of spatial patterns, different actions could be taken depending on which
visualization of the actual data was chosen.
Figure 3. A reference grid is used to directly link the
map display and the stored data.
The
map on the left-side of figure 3 shows the Equal Count contour map of yield
composed of irregular polygons that identify the discrete categories. The inset directly above the contour map
characterizes the underlying lattice data structure for a portion
of the field. A yield value is stored
for each of the line intersections of the superimposed regular grid. The breakpoint values for the map categories
are plotted along the lines connecting the actual data. The contours are formed by connecting the
breakpoints, smoothing the boundary lines and filling the polygons with the
appropriate color.
The
grid data structure utilizes a superimposed reference grid as
well. However, as depicted in the grid
map and inset on the right side of figure 3, the centroid of each grid cell is
assigned the actual value and the entire cell is filled with the appropriate
color. Note the similarities and
differences in the yield patterns between the contour and grid displays—best
yields generally in the southeast and northwest portions of the field.
Next month’s column will investigate the important considerations in representing precision farming data as continuous map surfaces, viewing these data as 3-D displays and pre-processing procedures used to “normalize” maps for subsequent analysis.
Preprocessing and Map Normalization
Preprocessing and normalization are critical steps in
map analysis that are often overlooked when the sole objective is to generate
graphical output. In analysis, however, map
values and their numerical relationships take center stage.
Preprocessing involves conversion of raw
data into consistent units that accurately represent field conditions. For example, a yield monitor’s calibration
is used to translate electronic signals into measurements of crop production
units, such as bushels per acre (measure of volume) or tons per hectare
(measure of mass). The appropriate units
(e.g., bushels vs. tons) and measurement system (English acres vs. metric
hectares) are determined and applied to the raw data. While this point is important to software
developers, it is somewhat moot in most practical applications after the
initial installation of software preferences.
A
bit more cantankerous preprocessing concern involves raw data adjustments and
corrections. Like calibrations, adjustments
involve “tweaking” of the values… sort of like a slight turn on that bathroom
scale to alter the reading to what you know is your true weight.
Corrections, on the other hand,
dramatically change measurement values, both numerically and spatially. For example, as discussed in previous columns
the time lag from a combine’s header to the yield monitor can require
considerable repositioning of yield measurements. Also, a combine can pass over the same place
more than once and the duplicate records must be corrected. Even more difficult are corrections for
readings from a partial header width or for the effects of emptying and filling the
threshing bin as a combine exits one pass and enters another.
The
old adage “garbage in, garbage out” holds true for any data collection
endeavor. Procedures for calibration,
adjustments and corrections are used to translate raw data measurements into
the most accurate map values possible.
But even the best data often needs a further refinement before it is
ready for analysis.
Normalization involves standardization of
a data set, usually for comparison among different types of data. For example, crop rotation can provide
different views of the productivity of a field.
While direct visual comparison of onion and corn yield maps might
provide insight, some form of normalization of the data is needed prior to any
quantitative analysis. In a sense,
normalization techniques allow you to “compare apples and oranges” using a
standard “mixed fruit scale of numbers.”
The
most basic normalization procedure uses a goal to adjust map
values. For example, a goal of 250
bushels per acre might be used to normalize a yield map for corn. The equation,
Norm_GOAL = (mapValue / 250) * 100
derives the percentage of the goal achieved by each
location in a field. In evaluating the
equation, the computer substitutes a map value for a field location, completes
the calculation, stores the result, and then repeats the process for all of the
other map locations.
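The cell-by-cell evaluation described above can be sketched in Python as follows. The function name and the tiny 2x2 grid are hypothetical; only the 250 bu/ac goal and the equation itself come from the text.

```python
def normalize_to_goal(grid, goal):
    """Express every cell of a yield grid as a percentage of the yield goal."""
    # Substitute each map value into the equation, store the result,
    # then repeat for every other location in the grid.
    return [[(value / goal) * 100 for value in row] for row in grid]


# Hypothetical 2x2 yield grid (bu/ac)
yield_grid = [[2.33, 158.0], [250.0, 295.0]]
norm_goal = normalize_to_goal(yield_grid, goal=250)
```

A cell that exactly meets the goal maps to 100, and cells that exceed it map to values above 100.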
Inset
B in figure 1 shows the results of goal normalization. Note the differences in the descriptive
statistics between the original and normalized data: a data range of 2.33 to
295 with an average of 158 bushels per acre for the original data versus 0.934
to 118 with an average of 63.3 percent of the goal.
Figure 1. Comparison of original and “goal normalized”
data.
Now note that the two histograms look nearly identical
and that the same holds for the two maps.
In theory, the histogram and map patterns are identical. The slight differences in the figure are
simply an artifact of rounding the display intervals for the rescaled
graphics. While the descriptive
statistics are different, the relationships (patterns) in the normalized
histogram and map are identical to the original data.
That’s an important point: both the numeric and spatial relationships in the data are preserved during normalization. In effect, normalization simply “rescales” the values like changing from one set of units to another (e.g., switching from bushels per acre to cubic meters per hectare). The significance of the goal normalization is that the new scale allows comparison among different fields and even crop types, the “mixed fruit” expression of apples and oranges.
Other
commonly used normalization expressions are 0-100 and SNV. 0-100 normalization forces a
consistent range of values by spatially evaluating the equation
Norm_0-100 = ((mapValue - min) * 100) / (max - min)
where max and min are the maximum and minimum values of the original data.
The
result is a rescaling of the data to the range 0 to 100 while retaining the
same relative numeric and spatial patterns of the original data.
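A minimal Python sketch of the 0-100 expression follows. The function name and sample values are hypothetical, and the guard against a zero data range is an added assumption the text does not discuss.

```python
def normalize_0_100(values):
    """Rescale values so the minimum maps to 0 and the maximum maps to 100."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant map cannot be rescaled, so report it.
        raise ValueError("all map values are identical; range is zero")
    return [((v - lo) * 100) / (hi - lo) for v in values]


# Hypothetical yield values (bu/ac): minimum, midpoint and maximum
norm_0_100 = normalize_0_100([2.33, 148.665, 295.0])
```

Because the rescaling is linear, the ordering and relative spacing of the values are unchanged.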
While
goal normalization benchmarks a standard value without regard to the original
data, the 0-100 procedure rescales the original data range to a fixed, standard
range. The third normalization
procedure, standard normal variable (SNV), uses yet another
approach. It rescales the data based on
the central tendency of the data by applying the equation
Norm_SNV = ((mapValue - mean) / stdev) * 100
where mean is the average and stdev is the standard deviation of the original data.
The result is a rescaling of the data to the percent variation from the average. Mapped data expressed in this form enables you to easily identify “statistically unusual” areas: +100% locates areas that are one standard deviation above the typical yield; -100% locates areas that are one standard deviation below.
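The SNV expression can be sketched the same way. The function name and the four sample values are hypothetical. The text does not say whether a population or sample standard deviation is intended; the population form is assumed here.

```python
import statistics


def normalize_snv(values):
    """Rescale values to percent of one standard deviation from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # assumption: population standard deviation
    if stdev == 0:
        raise ValueError("standard deviation is zero; SNV is undefined")
    return [((v - mean) / stdev) * 100 for v in values]


# Hypothetical yields (bu/ac) with a mean of 150 and a standard deviation of 10
snv = normalize_snv([140.0, 160.0, 140.0, 160.0])
```

Here the 160 bu/ac cells come out at +100 and the 140 bu/ac cells at -100, one standard deviation above and below the average.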
Map
preprocessing and normalization are often forgotten steps in a rush to make a
map, but they are critical precursors to a host of subsequent analyses. For precision farming to move beyond pretty
maps, the map values themselves must become the focus, both their accuracy and their
appropriate scaling.
Exchanging Mapped Data
Map normalization is often a
forgotten step in the rush to make a map, but is critical to a host of
subsequent analyses from visual map comparison to advanced data analysis. The ability to easily export the data in a
universal format is just as critical.
Instead of a “do-it-all” GIS system, data exchange exports the
mapped data in a format that is easily consumed and utilized by other software
packages.
Figure 1. The
map values at each grid location form a single record in the exported table.
Figure 1 shows the process
for grid-based data. Recall that a
consistent analysis frame is used to organize the data into map layers. The map values at each cell location for selected
layers are reformatted into a single record and stored in a standard export
table that, in turn, can be imported into other data analysis software.
The example in the figure
shows the procedure for exporting a standard comma-separated values (CSV)
file with each record containing the selected data for a single grid cell. The user selects the map layers for export
and specifies the name of the output file.
The computer accesses the data and constructs a standard text line with
commas separating each data value. Note
that the column and row of the analysis frame and the latitude and longitude earth
position are contained in each record. In
the example, the export file is brought into Excel for further processing.
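The export step can be sketched with Python's standard csv module. This is an illustration rather than the MapCalc export routine: the function name, the column headings, the layer values, the grid origin and the cell size in degrees are all hypothetical, and row 1 is assumed to lie at the northern edge of the analysis frame.

```python
import csv
import io


def export_layers_csv(layers, origin_lat, origin_lon, cell_deg, out):
    """Write one CSV record per grid cell: col, row, lat, lon, then layer values.

    layers maps a layer name to a grid (list of rows); all grids are assumed
    to share the same analysis frame.
    """
    names = list(layers)
    first = layers[names[0]]
    n_rows, n_cols = len(first), len(first[0])
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["col", "row", "lat", "lon"] + names)
    for r in range(n_rows):
        for c in range(n_cols):
            lat = round(origin_lat - r * cell_deg, 6)  # assumed: row 1 is northernmost
            lon = round(origin_lon + c * cell_deg, 6)
            values = [layers[name][r][c] for name in names]
            writer.writerow([c + 1, r + 1, lat, lon] + values)


# Hypothetical 2x2 soil nutrient layers
layers = {
    "P": [[12, 15], [9, 11]],
    "K": [[180, 175], [160, 190]],
    "N": [[22, 25], [19, 30]],
}
buffer = io.StringIO()
export_layers_csv(layers, origin_lat=40.5, origin_lon=-104.9, cell_deg=0.0001, out=buffer)
csv_text = buffer.getvalue()
```

Passing an open file in place of the in-memory buffer would produce a table that a spreadsheet or statistical package can import directly.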
Figure
2. Mapped data can be imported into
standard statistical packages for further analysis.
Figure 2 shows the agricultural
data imported into the JMP statistical package (by SAS). Area (1) shows the histograms and descriptive
statistics for the P, K and N map layers shown in figure 1. Area (2) is a “spinning 3D plot” of the data
that you can rotate to graphically visualize relationships among the map
layers. Area (3) shows the results of
applying a multiple linear regression model to predict crop yield from the soil
nutrient maps. These are but a few of
the tools beyond mapping that are available through data exchange between GIS
and traditional spreadsheet, database and statistical packages—a perspective
that integrates maps with other technologies.
Modern statistical packages
like JMP “aren’t your father’s” stat experience and are fully interactive with
point-n-click graphical interfaces and wizards to guide appropriate
analyses. The analytical tools, tables
and displays provide a whole new view of traditional mapped data. While a map picture might be worth a thousand
words, a gigabyte or so of digital map data is a whole revelation and foothold
for site-specific decisions.
Comparing Yield Maps
One of the most fundamental operations in map analysis is the comparison of two maps. Questions like “how different are the maps?”, “how are they different?” and “where are they different?” immediately spring to mind. Quantitative answers are needed because visual comparison cannot fully consider all of the detail in an objective manner.
Recall
that there are two basic forms of mapped data used in precision farming:
discrete maps (vector) and map surfaces (grid).
Let’s consider discrete map comparison first. The two maps shown in figure 1 identify corn
yield for successive seasons on a center-pivot field in Colorado. Note that the maps have been normalized to
300 bu/ac and displayed with a common color palette… but how different are they;
how are they different; and where are they different?
Figure
1. Discrete Yield Maps for Consecutive
Years.
While your eyes flit back and forth in an attempt to compare the maps, the computer approaches the problem more methodically. The first step converts the vector contour lines to a grid value for each cell. An analysis grid resolution is chosen (50-foot cells are used in this example) and geometrically aligned with the maps. The dominant yield class within each cell is assigned its interval value (values 1 through 5 corresponding to the color range). The grid mesh used is superimposed on the yield maps for visual reference.
Figure 2. Coincidence Map Identifying the Conditions
for Both Years.
The
next step, as shown in figure 2, combines the two maps into a single map that
indicates the “joint condition” for both years.
Since the two maps have identical gridding, the computer simply
retrieves the two class assignments for a grid location then converts them to a
single number.
The map-ematical procedure merely computes the “first value times ten plus the second value” to form a compound number. In the example shown in the figure, the value “forty-three” is interpreted as class 4 in the first year but decreasing to class 3 in the next year.
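The compound-number step can be sketched in Python as follows. The function names and the two small class grids are hypothetical. The encoding relies on each class being a single digit (1 through 5 in this example), so the two years can be recovered unambiguously from the tens and ones places.

```python
from collections import Counter


def coincidence_map(first_year, second_year):
    """Combine two class grids into one: first value times ten plus the second."""
    return [[a * 10 + b for a, b in zip(row1, row2)]
            for row1, row2 in zip(first_year, second_year)]


def decode(code):
    """Split a compound number back into (first-year class, second-year class)."""
    return divmod(code, 10)


# Hypothetical yield classes (1 = low ... 5 = high) on identical grids
year1 = [[4, 2, 5], [3, 3, 1]]
year2 = [[3, 2, 5], [4, 3, 2]]

joint = coincidence_map(year1, year2)

# Tally how many cells fall into each joint condition
counts = Counter(code for row in joint for code in row)
```

In this example the upper-left cell carries the value 43, read as class 4 in the first year and class 3 in the next. Multiplying the tallies by the cell area would summarize how much of the field changed class.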