Comparison of Results for Interpolation and Extrapolation Test Sets
The following table extends the Residual Analysis discussed in the GIS Toolbox column (How Good Is My Map?, September, 1966). It compares the results for interpolated data ("Interp": estimated locations within the geographic bounds of the set of samples used in generating the map surfaces) and extrapolated data ("Extrap": estimated locations outside those bounds).
                     Average         Inverse         Kriging         MinCurve
Sample            Interp  Extrap  Interp  Extrap  Interp  Extrap  Interp  Extrap
#17*                  --      23      --       8      --       2      --       6
#18*                  --     -25      --      -6      --      -2      --     -20
#19*                  --     -41      --     -12      --       1      --       1
#20*                  --     -42      --     -11      --      -9      --     -17
#21                  -11      --      -1      --      -4      --      -3      --
#22                   23      --       2      --      -1      --       1      --
#23                   17      --       1      --      -5      --      -5      --
#24                  -56      --     -12      --      -9      --     -10      --
#25                  -41      --     -12      --       4      --      26      --
#26*                  --      15      --       0      --      -1      --      -2
#27                    4      --       3      --       0      --      -2      --
#28*                  --      17      --       2      --      -3      --     -12
#29                   11      --       3      --       2      --       7      --
#30                    6      --       2      --       3      --     -10      --
#31*                  --      14      --      -1      --      -3      --     -25
#32                    9      --       5      --      -3      --     -21      --

Sum Residuals        -38     -39      -9     -20     -13     -15     -17     -69
Avg Unsigned        19.8    25.3     4.6     5.7     3.4     3.0     9.4    11.9
Normalized           .71     .91     .16     .21     .12     .11     .34     .43

Grouped (Interp and Extrap): (see table with article)
Sum Residuals            -77             -29             -28             -86
Avg Unsigned            22.2             5.1             3.3            10.5
Normalized               .80             .18             .12             .38

* = extrapolated sample (its residuals appear only in the Extrap columns)
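For readers who want to check the summary rows, the sketch below recomputes the Sum Residuals and Avg Unsigned lines directly from the residuals in the table above. The names ROWS, TECHNIQUES and summarize are made up for this illustration and are not from the original column; the Normalized row is left out because the normalizing value is not given here.

```python
# Residuals transcribed from the table above.
# Each row: (sample number, extrapolated?, Average, Inverse, Kriging, MinCurve)
ROWS = [
    (17, True, 23, 8, 2, 6),
    (18, True, -25, -6, -2, -20),
    (19, True, -41, -12, 1, 1),
    (20, True, -42, -11, -9, -17),
    (21, False, -11, -1, -4, -3),
    (22, False, 23, 2, -1, 1),
    (23, False, 17, 1, -5, -5),
    (24, False, -56, -12, -9, -10),
    (25, False, -41, -12, 4, 26),
    (26, True, 15, 0, -1, -2),
    (27, False, 4, 3, 0, -2),
    (28, True, 17, 2, -3, -12),
    (29, False, 11, 3, 2, 7),
    (30, False, 6, 2, 3, -10),
    (31, True, 14, -1, -3, -25),
    (32, False, 9, 5, -3, -21),
]
TECHNIQUES = ["Average", "Inverse", "Kriging", "MinCurve"]


def summarize(rows, extrapolated=None):
    """Return {technique: (sum of residuals, average unsigned residual)}.

    extrapolated=False selects the Interp group, True the Extrap group,
    and None the grouped (Interp and Extrap) set.
    """
    selected = [r for r in rows if extrapolated is None or r[1] == extrapolated]
    result = {}
    for i, name in enumerate(TECHNIQUES):
        values = [r[2 + i] for r in selected]
        total = sum(values)
        avg_unsigned = sum(abs(v) for v in values) / len(values)
        result[name] = (total, avg_unsigned)
    return result


for label, flag in (("Interp", False), ("Extrap", True), ("Grouped", None)):
    for name, (total, avg_unsigned) in summarize(ROWS, flag).items():
        print(f"{label:8s}{name:10s}sum={total:5d}  avg unsigned={avg_unsigned:6.2f}")
```

The printed averages are shown to two decimals; the table rounds them to one.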
Now let's take a look at the numbers
The first thing you should note is that the sample sizes of the two populations are too small to be confident in any analysis comparing them (#_Interp= 9 and #_Extrap= 7). Putting that aside, note that three of the four techniques (Average, Inverse and MinCurve) show better "performance" for the Interpolated estimates than for the Extrapolated ones. The Kriging technique is the exception, showing slightly better performance for the Extrapolated test set (Normalized .11 versus .12), but the slight difference might simply be an artifact of the small sample size.
The Average and Inverse techniques show about an eleven percent improvement for the Interpolated estimates versus the Interp/Extrap grouped results ((.80 - .71) / .80 * 100 = 11.25%; (.18 - .16) / .18 * 100 = 11.1%). The MinCurve technique shows slightly less improvement ((.38 - .34) / .38 * 100 = 10.5%). The Kriging technique shows no improvement for the Interpolated set (.12 versus .12 grouped), but an eight percent improvement for the Extrapolated set ((.12 - .11) / .12 * 100 = 8.3%). Since this technique uses trends in the data, the results seem to confirm that the trend extends beyond the geographic region of the interpolation data set.
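The percentages above all come from one small formula: the drop in the Normalized value relative to the grouped Normalized value. A minimal sketch, using the Normalized figures from the table (the names NORMALIZED and percent_improvement are invented for this example):

```python
# Normalized values from the table: technique -> (Interp, Extrap, Grouped)
NORMALIZED = {
    "Average": (0.71, 0.91, 0.80),
    "Inverse": (0.16, 0.21, 0.18),
    "Kriging": (0.12, 0.11, 0.12),
    "MinCurve": (0.34, 0.43, 0.38),
}


def percent_improvement(grouped, subset):
    """Percent reduction of a subset's normalized residual versus the grouped value.

    Positive means the subset did better (smaller residuals) than the grouped set.
    """
    return (grouped - subset) / grouped * 100


for name, (interp, extrap, grouped) in NORMALIZED.items():
    print(
        f"{name:10s}"
        f"Interp {percent_improvement(grouped, interp):6.2f}%  "
        f"Extrap {percent_improvement(grouped, extrap):6.2f}%"
    )
```

A negative result simply means that subset did worse than the grouped set, which is what most of the Extrap columns show.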
The tendency to underestimate was relatively balanced for the Average and Kriging techniques, with their Sum of the Residuals bias fairly equally split (-38 and -39 of -77; -13 and -15 of -28). However, the biases of the Inverse and MinCurve techniques were relatively unbalanced (-9 and -20 of -29; -17 and -69 of -86), with an increased tendency to underestimate the Extrapolated values.
I wonder if there is a "significant difference" between the residuals for the Interpolation and Extrapolation estimates for each of the techniques? That's a fair question, but it requires a bit of calculation. The formulae for a "t-Test" to see if the means of the two populations are different are:
                   (Interp_Avg - Extrap_Avg)
t = -------------------------------------------------------------------
    SQRT ((Pool_Var * (#_Interp + #_Extrap)) / (#_Interp * #_Extrap))
where, the Pooled Variance is
             SUMSQ_Interp + SUMSQ_Extrap
Pool_Var = ---------------------------------
            (#_Interp - 1) + (#_Extrap - 1)
and the variables are
Interp_Avg = average of interpolated residuals
Extrap_Avg = average of extrapolated residuals
#_Interp = number of interpolated residuals (9 in this case)
#_Extrap = number of extrapolated residuals (7 in this case)
SQRT = square root
SUMSQ_Interp = sum of the squares of the interpolated residuals
SUMSQ_Extrap = sum of the squares of the extrapolated residuals
The calculated t-Test value is compared to values in a "Distribution of t" table for the degrees of freedom (9-1 + 7-1 = 14, in this case) at various levels of significance. Let's see if there is a significant difference for the MinCurve Interp/Extrap populations:
             1405 + 1499         2904
Pool_Var = ----------------- = ------ = 207.43
             (9-1) + (7-1)         14

        (-1.89) - (-9.86)           7.97
t = --------------------------- = ------ = 1.10
    SQRT ((207.43 * 16) / 63)       7.26
In this case, tabular t with 14 degrees of freedom at the 0.05 level is 2.145. Since our sample value (1.10) is less than this, the difference is not significant at the 0.05 level. Anyone out there willing to test the Average, Inverse and Kriging techniques to see if the Interp/Extrap estimates are "significantly" different? (Show your work).
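For anyone taking up that challenge, here is a minimal sketch of the calculation as the formulae above define it, applied to the MinCurve residuals from the table. The function name column_t and its about_mean switch are invented for this illustration. One caution: SUMSQ is defined in the text as the sum of the squared residuals themselves, whereas the textbook pooled variance uses squared deviations from each group's own mean; the about_mean=True option shows that variant, so results from a statistics package may not match the hand calculation exactly.

```python
import math

# MinCurve residuals from the table (samples without and with an asterisk)
MINCURVE_INTERP = [-3, 1, -5, -10, 26, -2, 7, -10, -21]
MINCURVE_EXTRAP = [6, -20, 1, -17, -2, -12, -25]
TABULAR_T_14DF_05 = 2.145  # "Distribution of t" value quoted in the text


def column_t(interp, extrap, about_mean=False):
    """Return (Pool_Var, t) following the formulae in the text.

    about_mean=False: SUMSQ is the sum of squared residuals (as in the text).
    about_mean=True:  SUMSQ is the sum of squared deviations from each group
                      mean (the usual textbook pooled variance; an assumption
                      added for comparison, not part of the column).
    """
    n_interp, n_extrap = len(interp), len(extrap)
    interp_avg = sum(interp) / n_interp
    extrap_avg = sum(extrap) / n_extrap
    center_i = interp_avg if about_mean else 0.0
    center_e = extrap_avg if about_mean else 0.0
    sumsq_interp = sum((v - center_i) ** 2 for v in interp)
    sumsq_extrap = sum((v - center_e) ** 2 for v in extrap)
    pool_var = (sumsq_interp + sumsq_extrap) / ((n_interp - 1) + (n_extrap - 1))
    t = (interp_avg - extrap_avg) / math.sqrt(
        pool_var * (n_interp + n_extrap) / (n_interp * n_extrap)
    )
    return pool_var, t


pool_var, t = column_t(MINCURVE_INTERP, MINCURVE_EXTRAP)
print(f"Pool_Var = {pool_var:.2f}, t = {t:.2f}")
print("significant at 0.05" if abs(t) > TABULAR_T_14DF_05 else "not significant at 0.05")
```

Swapping in the Average, Inverse or Kriging columns only requires replacing the two lists. With about_mean=True the MinCurve t rises to roughly 1.26, still below 2.145.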
Although the t-Test is a good procedure to test for significant differences in results, three requirements must be met for it to be valid: 1) the residuals must be a random sample, 2) the residual values must be normally distributed, and 3) each group must have similar variances. Requirement 1) was met, as the residuals were randomly sampled. Requirement 2) is a problem, as the number of samples in each group is small (9 and 7). Requirement 3) can be checked by Bartlett's Test for homogeneity, and if the group variances are deemed unequal, a slightly different t-Test is used.
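The variance check can be sketched in a few lines as well. The example below computes Bartlett's statistic for the two MinCurve groups and compares it to the tabular chi-square value for one degree of freedom at the 0.05 level (3.841). The column does not say which "slightly different t-Test" it has in mind; Welch's unequal-variance t-test is assumed here as the common choice. All function names are made up for this illustration.

```python
import math

# MinCurve residuals from the table
MINCURVE_INTERP = [-3, 1, -5, -10, 26, -2, 7, -10, -21]
MINCURVE_EXTRAP = [6, -20, 1, -17, -2, -12, -25]
CHI2_1DF_05 = 3.841  # tabular chi-square, 1 degree of freedom, 0.05 level


def mean(values):
    return sum(values) / len(values)


def sample_variance(values):
    """Variance about the group mean, with n - 1 in the denominator."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def bartlett_statistic(groups):
    """Bartlett's statistic; compare to chi-square with len(groups) - 1 df."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = [sample_variance(g) for g in groups]
    dof = sum(sizes) - k
    pooled = sum((n - 1) * s2 for n, s2 in zip(sizes, variances)) / dof
    numerator = dof * math.log(pooled) - sum(
        (n - 1) * math.log(s2) for n, s2 in zip(sizes, variances)
    )
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / dof) / (3 * (k - 1))
    return numerator / correction


def welch_t(a, b):
    """Welch's t and its approximate degrees of freedom (assumed alternative)."""
    va = sample_variance(a) / len(a)
    vb = sample_variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


stat = bartlett_statistic([MINCURVE_INTERP, MINCURVE_EXTRAP])
if stat > CHI2_1DF_05:
    t, df = welch_t(MINCURVE_INTERP, MINCURVE_EXTRAP)
    print(f"Bartlett = {stat:.3f}: variances differ; Welch t = {t:.2f}, df = {df:.1f}")
else:
    print(f"Bartlett = {stat:.3f}: no evidence of unequal variances at 0.05")
```

For the MinCurve residuals the statistic comes out well under 3.841, so the pooled-variance form of the test is the one that applies. Bartlett's Test is itself sensitive to departures from normality, so with groups this small the result is a rough guide at best.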
WHEW!!! The upshot of all this is that there "appears" to be a difference in the level of performance (residuals) between the Interpolated and Extrapolated estimates, but we can't say that the difference is "statistically significant" for the MinCurve map surface. Personally, I would attempt to limit the proportion of extrapolated estimates in generating a map from point data, particularly if I were using the Inverse or MinCurve techniques. This means that the sampling design should "push" samples toward the edge of the field and not start well within the field "just for symmetry."