The Precision Farming Primer  
   Appendix D:
Case Study in Precision Farming


© 1999
Precision Farming Primer


Overview of the Case Study
“Some Assembly Required” in Precision Ag
Mapped Data Visualization and Summary
Preprocessing and Map Normalization
Exchanging Mapped Data
Comparing Yield Maps
Comparing Yield Surfaces
From Point Data to Map Surfaces
Benchmarking Interpolation Results
Assessing Interpolation Performance

Calculating Similarity Within A Field
Identifying Data Zones
Mapping Data Clusters
Predicting Maps
Stratifying Maps for Better Predictions
Spatial Analysis Operations



Overview of the Case Study

Author’s note:  An educational version of the MapCalc software by Red Hen Systems used in the case study provides “hands-on” exercises for the topics discussed in this appendix.   See www.redhensystems.com for more information and ordering; US$21.95 for Student CD and US$495.00 for Instructor CD with multi-seat license for a computer lab (prices subject to change).

 

Now that precision farming is beginning its second decade, where are we?  Yield mapping is gaining commonplace status for many crops and locales.  Site-specific management of field fertilization has a growing number of users.  Remote sensing applications from satellite imagery to aerial photos and on the ground “proximal” sensing are coming onboard.  Irrigation control, field leveling, variable rate seeding, on-farm studies, disease/pest modeling, stress maps and a myriad other computer mapping uses are on the horizon.

 

Fundamental to all of these applications are the digital nature of a computer map and the procedures required to turn these data into useful information for decision-making.  The bottom line for precision farming appears to be an understanding of the numbers and what one can do with them to enhance their interpretation. 

 

The next several columns will apply map analysis techniques in a case study format.  We will begin with a series of techniques used to visualize and summarize field data and end with several techniques that develop maps to guide management action.  The focus will be on methodology (a.k.a. the precision farming “toolkit”) and not on the scientific results or generalizations about the complex biological, edaphic or physiographic factors driving crop production in the example field.  These intricate relationships are left to agricultural scientists to scrutinize, interpret and put into context with numerous other crops and fields.

 

The objective of the series of articles simply is to demonstrate the analysis procedures on a consistent data set.  The intent is to build a case study for a single field that illustrates the wealth of emerging analysis capabilities that are available to scientists, agronomists, crop consultants and producers. 

 

One might argue that we have “the technical cart in front of the scientific horse.”  In many instances that appears true.  We can collect data and view the numbers as maps or summary statistics, but we struggle to make sense of it all.  In part, the disconnect isn’t technology versus science but the predicament of “the chicken or the egg coming first.”

 

We have a radically different set of analysis tools than those available just ten years ago.  Concurrently, we have a growing number of farmers collecting geo-registered data, such as crop yield, soil nutrients and meteorological conditions.  The computer industry is turning out software and machines that are increasingly powerful, cost less and less, and are easier to use.  The ag equipment companies are developing implements and instruments that respond to “on the go” movements throughout a field.  In short, the pieces are coming together.

 

The accompanying table lists the topics that will be covered in the series.  Most of the details of the topics have been discussed in previous “GIS Toolbox” articles.  What awaits is the demonstration of their application to a consistent data set.  Hopefully, the case study will stimulate innovative thinking as much as it outlines how the procedures can be applied.  All of the processing will be done with readily available software for a PC environment.

 

The next decade in precision farming will focus on making sense of the site-specific relationships we measure.  The technology is here (analysis toolbox) to bridge the gap between maps and decisions.  Its widespread application will be determined by how effective we are in applying it.

 

 

Case Study in Agricultural Map Analysis

 

Mapped Data Visualization and Summary

Preprocessing and Map Normalization

Comparing Maps and Temporal Analysis

Interpolating and Assessing Map Surfaces

Integrating Remote Sensing Data

Delineating Management Zones

Clustering Groups of Similar Data

Measuring Correlation Among Maps

Developing Predictive Models

Generating Management Action Maps

 

 

“Some Assembly Required” in Precision Ag

 

Before we start wrestling with the details of spatial data analysis it makes sense to consider some larger issues surrounding their application—namely, the interaction between technology and science, the relevant expression of results, and the current field-centric orientation of analysis.

 

The underlying concept of precision agriculture is that site-specific actions are better than whole-field ones.  In applying fertilizer, for example, changing the rate and mixture of the nutrients throughout a field puts more than the average where it is needed and less where it is not needed.  However, precise science in the formulation of prescription maps is critical.

 

To date the infusion of site-specific science has not kept pace with the technologies underlying site-specific agriculture.  We can produce maps of yield variation and a host of potential driving factors, from soil properties and nutrients to electric conductivity and topographic relief.  We can generate prescription maps that guide the precise application of various materials.  Where we used to use one recommendation, we now have several for different locations in a field.  So what’s the problem?

 

It’s not that there is a problem; it’s just that there is far more potential.   Introduction of spatial technology has forever altered agriculture—for the researcher, as well as the farmer.  The traditional research paradigm involves sampling a very small portion of a field then analyzing these data without reference to their spatial context and relationships.  The most frequently used method develops a “regression model” of one variable on another(s), then extrapolates the results for large geographic regions. 

 

A classic example is a production curve that relates various levels of a nutrient to yield, e.g., the Yield vs. Phosphorus plots you encountered in Agronomy 101.  The interpretation of the curve is that if a certain level of phosphorus is present, then a certain amount of yield is expected.  This science, plus a whole lot of field experience (windage), is what goes into a prescription, whole-field and site-specific alike.  If a map of current phosphorus levels exists, then the application rate can be adjusted so that each location receives just the right amount to put it at the right spot on the production curve.

 

Current precision ag technology enables just that: site-specific application, but based on state-wide equations (science).  In fact, the “rules” used in most systems were developed years ago at experiment stations miles away, indifferent to geographic patterns of topographic, edaphic and biological conditions, and analyzed with non-spatial statistics that intentionally ignore spatial autocorrelation.

 

Today, an opportunity exists for site-specific science to “tailor” the decision rules for different conditions within a field.  For example, more phosphorus might be applied to south-facing toe slopes than north-facing head slopes.  But the least amount would be applied in flat or depressed areas.  Of course the spatially-specific rule set would change for different soil types and in accordance with the level of organic matter.

 

The primary difference between “non-spatial science” and its spatial cousin is the analysis of map variables versus a small set of sample plots attempting to replicate all possible conditions and their extent within a field.  Until recent developments there wasn’t a choice, as the enabling spatial technologies didn’t exist ten years ago. 

 

Now that we have spatial tools we need to infuse them into agricultural science.  Ideally, these activities will gravitate from the experiment station model to extensive on-farm studies that encompass actual conditions on a farmer’s land.  In addition, the focus will expand from identifying “management zones” to the analysis of continuous map surfaces.  Without shifts in perspective, the “whole-field” approach simply will be replaced by “smaller-field-chunks” (management zones) that blindly apply non-spatial rule sets.  Without fully engaged science, the full potential of site-specific agriculture will be aggregated into a series of half steps dictated by technology.

 

In addition to a closer science/technology marriage, the results of its application need to be expressed in new and more informative ways.  The introduction of maps has enlightened farmers and researchers alike to the variability within agricultural fields.  Interactive queries of a stack of maps identify conditions at specific locations.  However, simply presenting base map data doesn’t always translate into information for decision-making.  Further processing is needed to decode these data into a decision context.  For example, bushels and pounds per acre could be rendered into revenue, cost and net profit maps under specified market conditions.  As the assumptions are changed the economic maps would change similar to a spreadsheet.
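The yield-to-economics translation described above amounts to a cell-by-cell calculation.  The Python sketch below is only an illustration of that idea, not the MapCalc procedure; the tiny grids, the price and cost figures and the function name `net_profit_map` are all made-up assumptions.

```python
def net_profit_map(yield_map, cost_map, price_per_bu):
    """Return (revenue, profit) grids in $/ac from a yield grid in bu/ac."""
    revenue = [[y * price_per_bu for y in row] for row in yield_map]
    profit = [[r - c for r, c in zip(rev_row, cost_row)]
              for rev_row, cost_row in zip(revenue, cost_map)]
    return revenue, profit


# Hypothetical 2x3 grids: yield in bu/ac and input cost in $/ac
yield_map = [[120.0, 160.0, 200.0],
             [80.0, 150.0, 180.0]]
cost_map = [[300.0, 320.0, 350.0],
            [300.0, 310.0, 340.0]]

revenue, profit = net_profit_map(yield_map, cost_map, price_per_bu=2.50)

# Changing the assumed market price re-derives the maps, spreadsheet style
_, profit_low = net_profit_map(yield_map, cost_map, price_per_bu=2.00)
```

Under these assumed numbers some cells break even or lose money at one price and not at another, which is the kind of pattern a raw bushels-per-acre map does not show directly.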

 

Broadening the geographic focus of precision agriculture is a third force directing future development.  To date, most spatial analysis of farm inputs and outputs stops at the edge of the fields, as if the fields were islands surrounded by nothing.  This myopic mindset disregards the larger geographic context of agriculture.  Increasing environmental concern about farm practices could radically alter the field focus in the not too distant future.

 

For example, the focus might be extended to the surface flows and transport of materials (fine soil particles, organic matter, chemicals, etc.) that connect a field to its surroundings.  As is the case for forestry companies, the cumulative environmental effects within an airshed or watershed might need to be considered before certain farming activities can take place.  The increasing urban/farm interface will surely alter the need for and methods of communication between agriculture and other parts of society.  Spatial analysis can play a key role in how agricultural stewardship is effectively communicated.

 

Precision agriculture is not going away.  The blend of modern science and technology is like a locomotive with momentum—impossible to stop.  But the direction of the tracks determines where it goes.  Current decisions by software developers, equipment manufacturers, service providers and farmers themselves, are laying the groundwork for tomorrow’s direction.  Thinking “out of the box” at this stage might be the best route.

 

 

Mapped Data Visualization and Summary

 

The next several columns will apply map analysis techniques in a case study format.  Discussion begins with a series of procedures used to visualize and summarize field data and ends with several approaches that develop maps to guide management action.  The map-ematical procedures used to translate data into useful information are demonstrated using a subset of the data collected for a large multidisciplinary project of the Water Management Unit of ARS and the Experiment Station at Colorado State University.  The MapCalc software by Red Hen Systems is used throughout the case study for analysis and display of the data.  A set of “hands-on” exercises based on the case study is available for classroom and self-study (see author’s note).

 

The most fundamental concept in analyzing precision farming data is the recognition that all of the maps are “numbers first, pictures later.”  While the human mind is good at a lot of things, noting subtle differences within large, complex sets of spatial data isn’t one of them.  To visualize these data we first must aggregate the detailed information into a few discrete categories and then assign colors to each of the groupings.

 

Figure 1.  Two different renderings (categorizations) of yield data.

 

For example, the yield maps shown in figure 1 appear dramatically different but were generated from the same harvest data.  Both maps have five categories extending from low to high yield and use the same color scheme. 

 

The differences in appearance arise from the methods used in determining the breakpoints of the categories: the one on the left uses “Equal Ranges” while the one on the right uses “Equal Count.”  Equal Ranges is similar to an elevation contour map since it divides the data range (2.33 to 295 bu/ac) into five steps of about 60 bushels each.  Note that the middle interval (119.3 to 177.8) has more than 100 acres and comprises over half of the field (54%).  Equal Count, on the other hand, divides the same data into five classes, but this time each category represents about the same area (20%) and contains a similar number of samples (about 660).
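The two breakpoint rules can be sketched in a few lines of Python.  This is a minimal illustration rather than the algorithm MapCalc uses; the function names and the ten sample values are assumptions chosen so the arithmetic is easy to follow.

```python
def equal_range_breaks(values, n_classes=5):
    """Breakpoints that split the data range into equal-width steps."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_classes
    return [lo + step * i for i in range(1, n_classes)]


def equal_count_breaks(values, n_classes=5):
    """Breakpoints that put roughly the same number of samples in each class."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // n_classes] for i in range(1, n_classes)]


# Made-up yield samples (bu/ac) with most values bunched in the middle
samples = [0, 10, 60, 62, 64, 66, 68, 70, 72, 100]

range_breaks = equal_range_breaks(samples)   # evenly spaced along the data range
count_breaks = equal_count_breaks(samples)   # crowded where the samples crowd
```

Applied to just the field's reported extremes (2.33 and 295 bu/ac), the equal-range step works out to about 58.5 bushels and the first break falls near 60.9, in line with the first legend interval quoted for the Equal Ranges map.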

 

Figure 2.  Statistical summary of yield data.

 

The differences between the two approaches are best visualized from a combined statistical/graphical perspective.  The left panel in figure 2 shows the basic Descriptive Statistics for the data set—min, max, range, mean, median, standard deviation, variance and total area.  While these statistics report a fairly good harvest overall (an average yield of 158 bu/ac), they fail to map the yield patterns (spatial variability) throughout the field. 

 

The two histograms on the right side of figure 2 link the statistical perspective with the map displays.  Recall that a histogram is simply a plot of the data range (X-axis) versus the number of samples (Y-axis).  The “vertical bars” superimposed on the histograms identify the breakpoints for the categories: evenly spaced along the X-axis for Equal Ranges but variably spaced for Equal Count.  Note the direct relationship between the placement of the bars and the category intervals in the map legends of figure 1.  For example, the first interval begins at 2.3 bushels for both maps but breaks at 60.8 for Equal Ranges and 140.9 for Equal Count.  The differences in the placement of the breakpoints (vertical bars) account for the differences in the appearance of the two yield maps.

 

Similarly, changes in the number of intervals, and to a lesser degree, changes in the color scheme will produce different looking maps.  By its very nature, discrete contour mapping aggregates the spatial variability within field-collected data.  The assignment into categories is inherently a subjective process and any changes in the contouring parameters result in different map renderings.  Since decisions are based on the appearance of spatial patterns, different actions could be taken depending on which visualization of the actual data was chosen.

 

Figure 3.  A reference grid is used to directly link the map display and the stored data.

 

The map on the left-side of figure 3 shows the Equal Count contour map of yield composed of irregular polygons that identify the discrete categories.  The inset directly above the contour map characterizes the underlying lattice data structure for a portion of the field.  A yield value is stored for each of the line intersections of the superimposed regular grid.  The breakpoint values for the map categories are plotted along the lines connecting the actual data.  The contours are formed by connecting the breakpoints, smoothing the boundary lines and filling the polygons with the appropriate color.

 

The grid data structure utilizes a superimposed reference grid as well.  However, as depicted in the grid map and inset on the right side of figure 3, the centroid of each grid cell is assigned the actual value and the entire cell is filled with the appropriate color.  Note the similarities and differences in the yield patterns between the contour and grid displays—best yields generally in the southeast and northwest portions of the field.

 

Next month’s column will investigate the important considerations in representing precision farming data as continuous map surfaces, viewing these data as 3-D displays and pre-processing procedures used to “normalize” maps for subsequent analysis. 

 

 

Preprocessing and Map Normalization

 

Preprocessing and normalization are critical steps in map analysis that are often overlooked when the sole objective is to generate graphical output.  In analysis, however, map values and their numerical relationships take center stage.

 

Preprocessing involves conversion of raw data into consistent units that accurately represent field conditions.  For example, a yield monitor’s calibration is used to translate electronic signals into measurements of crop production units, such as bushels per acre (measure of volume) or tons per hectare (measure of mass).  The appropriate units (e.g., bushels vs. tons) and measurement system (English acres vs. metric hectares) are determined and applied to the raw data.  While this point is important to software developers, it is somewhat moot in most practical applications after the initial installation of software preferences.

 

A somewhat more cantankerous preprocessing concern involves raw data adjustments and corrections.  Like calibrations, adjustments involve “tweaking” of the values, sort of like a slight turn on that bathroom scale to alter the reading to what you know is your true weight.

 

Corrections, on the other hand, dramatically change measurement values, both numerically and spatially.  For example, as discussed in previous columns, the time lag from a combine’s header to the yield monitor can require considerable repositioning of yield measurements.  Also, a combine can pass over the same place more than once and the duplicate records must be corrected.  Even more difficult are the corrections of a reading from a partial header width or of the effects of emptying and filling the threshing bin as a combine exits from one pass and enters another.
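As a rough illustration of the time-lag repositioning mentioned above, the Python sketch below pairs each yield reading with the position the combine occupied a fixed number of seconds earlier.  It is a simplified assumption-laden sketch, not a vendor's correction routine: the one-second logging interval, the two-second lag, the record layout and the name `apply_time_lag` are all hypothetical.

```python
def apply_time_lag(records, lag_seconds):
    """Pair each yield reading with the position logged lag_seconds earlier.

    records: list of (time_s, x, y, yield_value) tuples with integer times.
    Readings that have no earlier logged position are dropped.
    """
    position_at = {t: (x, y) for t, x, y, _ in records}
    corrected = []
    for t, _, _, yld in records:
        source_time = t - lag_seconds
        if source_time in position_at:
            x, y = position_at[source_time]
            corrected.append((x, y, yld))
    return corrected


# Hypothetical one-second log: (time_s, easting_ft, northing_ft, yield_bu_ac)
log = [(0, 0.0, 0.0, 100),
       (1, 5.0, 0.0, 110),
       (2, 10.0, 0.0, 120),
       (3, 15.0, 0.0, 130)]

shifted = apply_time_lag(log, lag_seconds=2)
```

A real correction would also have to deal with variable travel speed, duplicate passes and partial header widths, none of which this sketch attempts.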

 

The old adage “garbage in, garbage out” holds true for any data collection endeavor.  Procedures for calibration, adjustments and corrections are used to translate raw data measurements into the most accurate map values possible.  But even the best data often needs a further refinement before it is ready for analysis.

 

Normalization involves standardization of a data set, usually for comparison among different types of data.  For example, crop rotation can provide different views of the productivity of a field.  While direct visual comparison of onion and corn yield maps might provide insight, some form of normalization of the data is needed prior to any quantitative analysis.  In a sense, normalization techniques allow you to “compare apples and oranges” using a standard “mixed fruit scale of numbers.”

 

The most basic normalization procedure uses a goal to adjust map values.  For example, a goal of 250 bushels per acre might be used to normalize a yield map for corn.  The equation,


    Norm_GOAL = (mapValue / 250 ) * 100

 

derives the percentage of the goal achieved by each location in a field.  In evaluating the equation, the computer substitutes a map value for a field location, completes the calculation, stores the result, and then repeats the process for all of the other map locations.
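The cell-by-cell evaluation just described can be mimicked in Python as follows.  This is a minimal sketch that stores a map as a list of rows; the function name `normalize_to_goal` and the four sample values are illustrative assumptions, with 250 bu/ac taken from the example goal above.

```python
def normalize_to_goal(grid, goal):
    """Express every map value as a percentage of the goal."""
    return [[value / goal * 100 for value in row] for row in grid]


# Tiny hypothetical yield grid (bu/ac)
yield_map = [[2.33, 158.0],
             [250.0, 295.0]]

percent_of_goal = normalize_to_goal(yield_map, goal=250)
```

A cell yielding exactly 250 bu/ac comes out at 100 percent, and a cell at 295 bu/ac comes out at 118 percent of the goal.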

 

Inset B in figure 1 shows the results of goal normalization.  Note the differences in the descriptive statistics between the original and normalized data: a data range of 2.33 to 295 with an average of 158 bushels per acre for the original data versus 0.934 to 118 with an average of 63.3 percent of the goal.

 

Figure 1.  Comparison of original and “goal normalized” data.

 

Now note that the two histograms look nearly identical and that the same holds for the two maps.  In theory, the histogram and map patterns are identical.  The slight differences in the figure are simply an artifact of rounding the display intervals for the rescaled graphics.  While the descriptive statistics are different, the relationships (patterns) in the normalized histogram and map are identical to the original data.

 

That’s an important point: both the numeric and spatial relationships in the data are preserved during normalization.  In effect, normalization simply “rescales” the values like changing from one set of units to another (e.g., switching from bushels per acre to cubic meters per hectare).  The significance of the goal normalization is that the new scale allows comparison among different fields and even crop types, the “mixed fruit” expression of apples and oranges.

 

Other commonly used normalization expressions are 0-100 and SNV.  0-100 normalization forces a consistent range of values by spatially evaluating the equation

 

    Norm_0-100 = ((mapValue - min) * 100) / (max - min)

where max and min are the maximum and minimum values of the original data.

 

 

The result is a rescaling of the data to the range 0 to 100 while retaining the same relative numeric and spatial patterns of the original data.
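A small Python sketch of the 0-100 rescaling follows.  The function name `normalize_0_100` and the sample grid are assumptions made for illustration, and the sketch presumes the map holds at least two different values so that max - min is not zero.

```python
def normalize_0_100(grid):
    """Rescale map values so the minimum becomes 0 and the maximum becomes 100."""
    values = [v for row in grid for v in row]
    lo, hi = min(values), max(values)
    # assumes hi > lo; a constant map would divide by zero here
    return [[(v - lo) * 100 / (hi - lo) for v in row] for row in grid]


# Tiny hypothetical yield grid (bu/ac)
yield_map = [[50.0, 100.0],
             [150.0, 250.0]]

rescaled = normalize_0_100(yield_map)
```

The ordering and relative spacing of the cells are unchanged; only the scale differs.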

 

While goal normalization benchmarks a standard value without regard to the original data, the 0-100 procedure rescales the original data range to a fixed, standard range.  The third normalization procedure, standard normal variable (SNV), uses yet another approach.  It rescales the data based on the central tendency of the data by applying the equation

 

    Norm_SNV = ((mapValue - mean) / stdev) * 100

where mean is the average and stdev is the standard deviation of the original data.

 

The result is a rescaling of the data to the percent variation from the average.  Mapped data expressed in this form enables you to easily identify “statistically unusual” areas: +100% locates areas that are one standard deviation above the typical yield; -100% locates areas that are one standard deviation below.
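The SNV rescaling can be sketched with Python's standard `statistics` module.  The text does not say whether the population or the sample standard deviation is intended; the sketch assumes the population form (`pstdev`), and the function name and sample grid are likewise illustrative.

```python
from statistics import mean, pstdev


def normalize_snv(grid):
    """Rescale map values to percent of one standard deviation from the mean."""
    values = [v for row in grid for v in row]
    avg = mean(values)
    sd = pstdev(values)   # assumption: population standard deviation
    return [[(v - avg) / sd * 100 for v in row] for row in grid]


# Hypothetical yield grid (bu/ac) with mean 150 and standard deviation 20
yield_map = [[120.0, 140.0, 140.0, 140.0],
             [150.0, 150.0, 170.0, 190.0]]

snv = normalize_snv(yield_map)
```

In this made-up grid the 170 bu/ac cell maps to +100 (one standard deviation above the average) and the 120 bu/ac cell to -150.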

 

Map preprocessing and normalization are often forgotten steps in a rush to make a map, but they are critical precursors to a host of subsequent analyses.  For precision farming to move beyond pretty maps, the map values themselves must become the focus, both their accuracy and their appropriate scaling.

 

 

Exchanging Mapped Data

 

Map normalization is often a forgotten step in the rush to make a map, but is critical to a host of subsequent analyses from visual map comparison to advanced data analysis.  The ability to easily export the data in a universal format is just as critical.  Instead of a “do-it-all” GIS system, data exchange exports the mapped data in a format that is easily consumed and utilized by other software packages.

 

Figure 1. The map values at each grid location form a single record in the exported table.

 

Figure 1 shows the process for grid-based data.  Recall that a consistent analysis frame is used to organize the data into map layers.  The map values at each cell location for selected layers are reformatted into a single record and stored in a standard export table that, in turn, can be imported into other data analysis software.

 

The example in the figure shows the procedure for exporting a standard comma-separated values (CSV) file with each record containing the selected data for a single grid cell.  The user selects the map layers for export and specifies the name of the output file.  The computer accesses the data and constructs a standard text line with commas separating each data value.  Note that the column and row of the analysis frame and the cell’s latitude, longitude earth position are contained in each record.  In the example, the export file is brought into Excel for further processing.
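The export step can be approximated with Python's standard `csv` module.  The sketch below is not the MapCalc exporter: the column order, the header names, the way latitude and longitude are derived from an origin and a cell size, and the sample nutrient values are all assumptions made for illustration.

```python
import csv
import io


def export_layers_csv(layers, origin_lat, origin_lon, cell_deg, out):
    """Write one CSV record per grid cell: col, row, lat, lon, then layer values.

    layers: dict mapping a layer name to a grid (list of rows) of equal shape.
    """
    names = list(layers)
    first = layers[names[0]]
    writer = csv.writer(out)
    writer.writerow(["col", "row", "lat", "lon"] + names)
    for r in range(len(first)):
        for c in range(len(first[0])):
            # hypothetical georeferencing: simple offsets from a grid origin
            lat = round(origin_lat + r * cell_deg, 6)
            lon = round(origin_lon + c * cell_deg, 6)
            writer.writerow([c + 1, r + 1, lat, lon]
                            + [layers[name][r][c] for name in names])


# Made-up 2x2 soil nutrient layers
layers = {"P": [[12, 14], [11, 15]],
          "K": [[200, 210], [190, 220]],
          "N": [[30, 28], [32, 25]]}

buffer = io.StringIO()   # stands in for an output file
export_layers_csv(layers, origin_lat=40.0, origin_lon=-105.0,
                  cell_deg=0.001, out=buffer)
lines = buffer.getvalue().splitlines()
```

Writing to a real file only requires passing `open("export.csv", "w", newline="")` in place of the in-memory buffer; the file name here is again hypothetical.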

 

Figure 2.  Mapped data can be imported into standard statistical packages for further analysis.

 

Figure 2 shows the agricultural data imported into the JMP statistical package (by SAS).  Area (1) shows the histograms and descriptive statistics for the P, K and N map layers shown in figure 1.  Area (2) is a “spinning 3D plot” of the data that you can rotate to graphically visualize relationships among the map layers.  Area (3) shows the results of applying a multiple linear regression model to predict crop yield from the soil nutrient maps.  These are but a few of the tools beyond mapping that are available through data exchange between GIS and traditional spreadsheet, database and statistical packages, a perspective that integrates maps with other technologies.
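To give a feel for what the regression in Area (3) does with the exported records, the sketch below fits a multiple linear model by ordinary least squares in plain Python.  It is not how JMP computes its results, and the P, K, N and yield numbers are fabricated for illustration (the yields were generated from an exact linear rule so that the fit can be checked); none of them come from the case-study field.

```python
def fit_linear_model(predictors, response):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    rows = [[1.0] + list(p) for p in predictors]   # leading 1.0 is the intercept term
    k = len(rows[0])
    # augmented matrix [X'X | X'y]
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * y for r, y in zip(rows, response))]
           for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Fabricated (P, K, N) records, one per grid cell, and matching yields
# generated as 50 + 2*P + 0.1*K + 1.5*N
nutrients = [(10, 200, 20), (15, 180, 25), (20, 220, 15),
             (12, 250, 30), (18, 210, 10)]
yields = [120.0, 135.5, 134.5, 144.0, 122.0]

coefficients = fit_linear_model(nutrients, yields)
predicted = [coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], p))
             for p in nutrients]
```

Real field data will not fit exactly, and a statistical package adds the diagnostics (residuals, significance tests) that this sketch leaves out.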

 

Modern statistical packages like JMP “aren’t your father’s” stat experience and are fully interactive with point-n-click graphical interfaces and wizards to guide appropriate analyses.  The analytical tools, tables and displays provide a whole new view of traditional mapped data.  While a map picture might be worth a thousand words, a gigabyte or so of digital map data is a whole revelation and foothold for site-specific decisions.

 

 

Comparing Yield Maps

 

One of the most fundamental operations in map analysis is the comparison of two maps.  Questions like “how different are the maps?”, “how are they different?” and “where are they different?” immediately spring to mind.  Quantitative answers are needed because visual comparison cannot fully consider all of the detail in an objective manner.

 

Recall that there are two basic forms of mapped data used in precision farming: discrete maps (vector) and map surfaces (grid).  Let’s consider discrete map comparison first.  The two maps shown in figure 1 identify corn yield for successive seasons on a center-pivot field in Colorado.  Note that the maps have been normalized to 300 bu/ac and displayed with a common color palette… but how different are they; how are they different; and where are they different?

 

Figure 1.  Discrete Yield Maps for Consecutive Years.

 

While your eyes flit back and forth in an attempt to compare the maps, the computer approaches the problem more methodically.  The first step converts the vector contour lines to a grid value for each cell.  An analysis grid resolution is chosen (50ft cells are used in this example) and geometrically aligned with the maps.  The dominant yield class within each cell is assigned its interval value (values 1 through 5 corresponding to the color range).  The grid mesh used is superimposed on the yield maps for visual reference. 

 

Figure 2.  Coincidence Map Identifying the Conditions for Both Years. 

 

The next step, as shown in figure 2, combines the two maps into a single map that indicates the “joint condition” for both years.  Since the two maps have identical gridding, the computer simply retrieves the two class assignments for a grid location then converts them to a single number.

 

The map-ematical procedure merely computes the “first value times ten plus the second value” to form a compound number.  In the example shown in the figure, the value “forty-three” is interpreted as class 4 in the first year but decreasing to class 3 in the next year.
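The compound-number calculation is simple enough to sketch directly in Python.  The function name and the two small class grids below are illustrative assumptions; the approach works as written only while each class number is a single digit, as it is for the five yield classes here.

```python
def coincidence_map(first_year, second_year):
    """Combine two class grids into compound codes: first value * 10 + second value."""
    return [[a * 10 + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(first_year, second_year)]


# Hypothetical 2x2 grids of yield classes (1 = lowest, 5 = highest)
year1 = [[4, 5],
         [1, 3]]
year2 = [[3, 5],
         [2, 3]]

joint = coincidence_map(year1, year2)

# Decode a compound value back into its two class assignments
first_class, second_class = divmod(joint[0][0], 10)

# Count the cells whose class did not change between years
unchanged = sum(1 for row in joint for code in row if code // 10 == code % 10)
```

Here the upper-left cell carries the code 43, which decodes to class 4 in the first year and class 3 in the second, matching the interpretation given above.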