Tbox39_Supplement

Supplement for GIS Toolbox column
December, 1998, "Mapping Spatial Dependency"

The following discussion and Excel worksheet are posted at www.innovativegis.com/basis under Column Supplements…

_______________________________

Using the F-test to Evaluate Localized Spatial Autocorrelation

by John Gins, HyperParallel, Inc. (jgins@earthlink.com)

Note: The designation "Berry IT" for this procedure in the Beyond Mapping column (GIS World, 1/99) was made in jest, but the approach is genuine. As explained in the article, the procedure uses a "doughnut neighborhood" formed by a center cell and its adjacent cells (inside) and the surrounding distant cells (outside). This "roving window" is passed over gridded data, successively evaluating each map location. John Gins suggests using the F-test to compare the variances of the two neighborhoods directly.
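A minimal Python sketch of the roving doughnut window follows. It assumes (hypothetically; the column's exact layout may differ) a 5x5 window in which the adjacent neighborhood A is the center cell plus its four orthogonal neighbors and the doughnut D is the other 20 cells, which matches the n(A) = 5 and n(D) = 20 quoted for interior cells later in this supplement. The helper names `doughnut_neighborhoods` and `roving_window` are illustrative, not part of any published tool, and edge cells are simply skipped rather than handled with truncated windows.

```python
import numpy as np

# Hypothetical window layout (offsets from the center cell of a 5x5 block):
# A = center plus its four orthogonal neighbors (5 cells),
# D = every other cell of the 5x5 block (20 cells).
OFFSETS_A = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
OFFSETS_D = [(dr, dc) for dr in range(-2, 3) for dc in range(-2, 3)
             if abs(dr) + abs(dc) > 1]


def doughnut_neighborhoods(grid, row, col):
    """Return the (A, D) cell values for the window centered at (row, col).

    Only interior cells (at least two cells from every edge) are accepted,
    so both neighborhoods are always complete.
    """
    grid = np.asarray(grid, dtype=float)
    nrows, ncols = grid.shape
    if not (2 <= row < nrows - 2 and 2 <= col < ncols - 2):
        raise ValueError("window would extend past the map edge")
    a = np.array([grid[row + dr, col + dc] for dr, dc in OFFSETS_A])
    d = np.array([grid[row + dr, col + dc] for dr, dc in OFFSETS_D])
    return a, d


def roving_window(grid):
    """Yield (row, col, A, D) for each interior map location in turn."""
    nrows, ncols = np.shape(grid)
    for row in range(2, nrows - 2):
        for col in range(2, ncols - 2):
            a, d = doughnut_neighborhoods(grid, row, col)
            yield row, col, a, d
```

Each yielded pair (A, D) is what the F-test below compares at one map location.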

An easy way to assess localized correlation is to compare the variances. Formulate the question (the hypothesis) this way:

"Is the variance of the adjacent neighborhood A, Var(A), the same as
the variance of the doughnut neighborhood D, Var(D)?"

Define the sample size of the adjacent neighborhood A as n(A) and the sample size of the doughnut neighborhood D as n(D). If we can assume that the populations of A and D are each normally distributed, that the two samples are independent, and that the hypothesis holds (the two population variances are equal), then the ratio of the sample variances, Var(A)/Var(D), follows the Fisher distribution, F, with n(A)-1 and n(D)-1 degrees of freedom. Evaluating F(Var(A)/Var(D), n(A)-1, n(D)-1) gives the upper-tail probability: the chance of observing a ratio at least this large when the variances really are equal.

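Note that the cumulative (lower-tail) probability, 1 - F, is close to 50% when Var(A) = Var(D) (exactly 50% only when n(A) = n(D)); larger ratios (Var(A) > Var(D)) push it toward 100%, and smaller ratios (Var(A) < Var(D)) push it toward 0%. The upper-tail value F itself moves in the opposite direction.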
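The direction of these values can be checked with SciPy's F distribution (`scipy.stats.f`), whose `sf` method gives the upper-tail probability and whose `cdf` method gives the cumulative probability. The sketch below uses the interior-cell degrees of freedom of 4 and 19; with them the cumulative probability at a ratio of 1 comes out near 0.57 rather than 0.5, while equal degrees of freedom put a ratio of 1 exactly at 0.5. The wrapper names are hypothetical.

```python
from scipy.stats import f as f_dist


def upper_tail(ratio, df1, df2):
    """P(F > ratio), the upper-tail probability."""
    return f_dist.sf(ratio, df1, df2)


def lower_tail(ratio, df1, df2):
    """P(F <= ratio), the cumulative probability (equal to 1 - upper_tail)."""
    return f_dist.cdf(ratio, df1, df2)


# Interior cells: n(A) = 5 and n(D) = 20, so 4 and 19 degrees of freedom.
for ratio in (0.5, 1.0, 2.0):
    print(f"ratio {ratio}: upper tail {upper_tail(ratio, 4, 19):.3f}, "
          f"cumulative {lower_tail(ratio, 4, 19):.3f}")

# With equal degrees of freedom a ratio of 1 sits exactly at the median.
print(f"equal df, ratio 1: cumulative {lower_tail(1.0, 4, 4):.3f}")
```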

This value of F can be obtained in Excel using the following function:

FDIST(x, degrees_freedom1, degrees_freedom2)

x = Var(A)/Var(D)

degrees_freedom1 = n(A)-1

degrees_freedom2 = n(D)-1
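For readers working outside Excel, a rough Python counterpart of this FDIST calculation is sketched below. It assumes that Var() means the sample variance with an n - 1 divisor (as Excel's VAR computes) and uses SciPy's `scipy.stats.f.sf`, which returns the same upper-tail probability as FDIST. The function names and the sample values are illustrative only.

```python
import numpy as np
from scipy.stats import f as f_dist


def fdist(x, degrees_freedom1, degrees_freedom2):
    """Upper-tail probability P(F > x), the quantity Excel's FDIST returns."""
    return f_dist.sf(x, degrees_freedom1, degrees_freedom2)


def variance_ratio_test(a_values, d_values):
    """Return (Var(A)/Var(D), FDIST value) for the two neighborhood samples.

    Variances use the n - 1 divisor, as Excel's VAR does (an assumption
    about which variance the worksheet uses).
    """
    a = np.asarray(a_values, dtype=float)
    d = np.asarray(d_values, dtype=float)
    x = a.var(ddof=1) / d.var(ddof=1)
    return x, fdist(x, a.size - 1, d.size - 1)


# Hypothetical values for one interior window: 5 adjacent, 20 doughnut cells.
ratio, p = variance_ratio_test([1, 2, 3, 4, 5], range(1, 21))
print(f"Var(A)/Var(D) = {ratio:.4f}, FDIST = {p:.4f}")
```

Here the adjacent cells vary much less than the doughnut cells, so the ratio is small and the upper-tail probability is close to 1.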

Typically, in the interior of the map, n(A) = 5 and n(D) = 20. If we are willing to accept a 5% chance of being wrong (rejecting the hypothesis when it is actually true, with 2.5% in each tail), then we would reject our hypothesis that the variance of the adjacent neighborhood equals the variance of the doughnut neighborhood if the ratio of the variances, Var(A)/Var(D), is less than 0.1166 or greater than 3.5587.

These limits can be obtained in Excel using the following function:

FINV(probability, degrees_freedom1, degrees_freedom2)

probability = 0.975 (lower limit) and 0.025 (upper limit), since FINV inverts the upper-tail probability returned by FDIST

degrees_freedom1 = n(A)-1

degrees_freedom2 = n(D)-1
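The same limits can be sketched in Python with `scipy.stats.f.isf`, the inverse of the upper-tail probability, which plays the role of FINV here. As with FINV, the 0.975 probability gives the lower limit and 0.025 the upper one. The helper names are hypothetical; for n(A) = 5 and n(D) = 20 the result should reproduce the 0.1166 and 3.5587 limits quoted above.

```python
from scipy.stats import f as f_dist


def finv(probability, degrees_freedom1, degrees_freedom2):
    """x such that P(F > x) = probability; the quantity Excel's FINV returns."""
    return f_dist.isf(probability, degrees_freedom1, degrees_freedom2)


def acceptance_limits(n_a, n_d, alpha=0.05):
    """Two-sided limits for Var(A)/Var(D) at significance level alpha."""
    df1, df2 = n_a - 1, n_d - 1
    lower = finv(1.0 - alpha / 2, df1, df2)  # like FINV(0.975, ...)
    upper = finv(alpha / 2, df1, df2)        # like FINV(0.025, ...)
    return lower, upper


lower, upper = acceptance_limits(5, 20)
print(f"reject equal variances if ratio < {lower:.4f} or > {upper:.4f}")
```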

Using the data from Joe’s example, I have supplied a "downloadable" Excel spreadsheet (FDIST.XLS, VERY big file... be patient) with all of the computational steps delineated. Each Worksheet Tab contains a portion of the calculation.
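As a rough Python analogue of the spreadsheet's steps (not a transcription of FDIST.XLS), the sketch below passes the window over a whole grid and returns a map of FDIST values. It rests on hypothetical assumptions: A is the center cell plus its four orthogonal neighbors, D is the rest of the 5x5 block, variances use the n - 1 divisor, and cells within two cells of the edge (or whose doughnut has zero variance) are left as NaN. The name `fdist_map` is illustrative.

```python
import numpy as np
from scipy.stats import f as f_dist

# Hypothetical window layout: offsets within a 5x5 block; A holds the
# center cell and its four orthogonal neighbors, D holds the other 20.
_dr, _dc = np.meshgrid(np.arange(-2, 3), np.arange(-2, 3), indexing="ij")
IN_A = (np.abs(_dr) + np.abs(_dc)) <= 1


def fdist_map(grid):
    """Return a grid of upper-tail F probabilities for Var(A)/Var(D).

    Cells within two cells of an edge, or whose doughnut has zero
    variance, are left as NaN.
    """
    grid = np.asarray(grid, dtype=float)
    nrows, ncols = grid.shape
    out = np.full(grid.shape, np.nan)
    for r in range(2, nrows - 2):
        for c in range(2, ncols - 2):
            block = grid[r - 2:r + 3, c - 2:c + 3]
            a, d = block[IN_A], block[~IN_A]
            var_d = d.var(ddof=1)
            if var_d > 0:
                ratio = a.var(ddof=1) / var_d
                out[r, c] = f_dist.sf(ratio, a.size - 1, d.size - 1)
    return out
```

For example, `fdist_map(np.random.default_rng(1).normal(size=(50, 50)))` returns a 50x50 array whose interior holds one FDIST value per map location.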

Another thought… Heteroscedasticity, the lack of consistent variance over groups, is usually measured via analysis of variance over combinations of data from independent groups or cells. The usual methods do not apply to the doughnut versus adjacent neighborhoods because the roving window reuses each cell's data in many neighborhoods. We can, however, analyze the Fdist values derived from the ratios as follows:

If the variances are random, then I would expect to see the Fdist values uniformly distributed from 0 to 1 (5% of the values should fall in each 0.05 interval). If the variances are the same over the map, then all of the Fdist values of the ratios would be clustered around 0.5. If there are many features, then I would expect to see values less than 0.05 or greater than 0.95.
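To see which of these patterns a map follows, the Fdist values can be tallied in 0.05-wide intervals and compared with the 5% per interval expected under a uniform distribution. This is a minimal NumPy sketch; `interval_shares` is a hypothetical helper, and NaN entries (such as undefined edge cells) are dropped before counting.

```python
import numpy as np


def interval_shares(fdist_values, width=0.05):
    """Share of finite Fdist values in each interval of `width` on [0, 1].

    Returns (edges, shares); under a uniform distribution every share
    should be close to `width` (5% for 0.05-wide intervals).
    """
    values = np.asarray(fdist_values, dtype=float).ravel()
    values = values[np.isfinite(values)]  # drop NaN edge cells
    n_bins = int(round(1.0 / width))
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    counts, _ = np.histogram(values, bins=edges)
    return edges, counts / values.size
```

A map with no structure should give shares near 0.05 throughout; a pile-up in the middle intervals or in the two end intervals points to the other two patterns described above.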

A good measure of how far the Fdist values depart from a uniform distribution is the Kolmogorov-Smirnov goodness-of-fit test. I would want to use the intervals <= 0.01, 0.01 to <= 0.025, 0.025 to <= 0.075, ..., 0.475 to <= 0.525, ..., 0.925 to <= 0.975, 0.975 to <= 0.99, and > 0.99.
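A sketch of two versions, assuming SciPy: `scipy.stats.kstest` applies the usual Kolmogorov-Smirnov test to the unbinned values against a uniform distribution on [0, 1], and a hypothetical `binned_ks_statistic` takes the largest gap between the observed and uniform cumulative shares at the interval boundaries listed above. The middle boundaries elided by "..." are assumed, not stated in the text, to continue in steps of 0.05. Because the roving window reuses cells, the independence behind the KS p-value does not strictly hold here.

```python
import numpy as np
from scipy.stats import kstest

# Interval boundaries from the text. The boundaries hidden by "..." are
# ASSUMED to continue in 0.05 steps: 0.025, 0.075, ..., 0.925, 0.975.
EDGES = np.concatenate(([0.01], 0.025 + 0.05 * np.arange(20), [0.99]))


def _finite(values):
    """Flatten to 1-D and drop NaN or infinite entries."""
    values = np.asarray(values, dtype=float).ravel()
    return values[np.isfinite(values)]


def ks_uniform(fdist_values):
    """Standard (unbinned) KS test of the values against Uniform(0, 1)."""
    result = kstest(_finite(fdist_values), "uniform")
    return result.statistic, result.pvalue


def binned_ks_statistic(fdist_values):
    """Largest gap between observed and uniform cumulative shares at EDGES."""
    values = np.sort(_finite(fdist_values))
    # Share of values <= each boundary; the uniform CDF at a boundary is the boundary itself.
    observed = np.searchsorted(values, EDGES, side="right") / values.size
    return float(np.max(np.abs(observed - EDGES)))
```

Either statistic is a single number per map: near 0 when the Fdist values are close to uniform, and larger as they cluster around 0.5 or pile up in the tails.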

This would give us a single measure over an entire map. If this is of interest I will work up an example.