GIS Modeling Email Dialog

…an introduction to grid-based map analysis and modeling

GEOG 3110, University of Denver, Geography, Winter Term 2011

Thursdays 6:00-8:50 pm, GIS Lab, Room 126, Boettcher (West)

Keep in mind that for all the lab exercises you have several “life lines” if you need them—

1) send me an email with a specific question,

2) arrange for a phone call via email for tutorial walk-thru (you need to be at a computer with MapCalc or Surfer),

3) an arranged eyeball meeting in the GIS Lab on Thursdays between 10:00 am and 3:00 pm, or

4) open door office hours 3:00 to 5:00 pm (or as specially arranged for Friday mornings).

Send an email with your question(s) and I will respond and then post the response if your question has general class interest—

___________________________

Dear Dr. Berry, I have a question regarding Question #9. Is there a way to look at the R square or Adjusted R square for the multiple linear regression model to see statistically how it fits the data, in addition to looking at the error surface?  Best, Qing

3/7 Qing— alas, the old stat tool we used for Regress doesn’t provide for a traditional “R square or Adjusted R square” evaluation option and we didn’t extend beyond the basic tool …possibly room for future enhancement.

However, you can get the correlation matrix information using the Correlate command.  Another way is to Export the set of maps as a CSV file and do the statistical analysis (regression and R-square) in a “grown-up” stat package like JMP or SAS.

One interesting feature of Regress, however, is the…

For                   <newMap>

The resulting map contains predicted values for the dependent map using the regression equation.

…that evaluates the equation for the set of map data which you can compare (subtract) to the actual dependent map for an error map that gives you insight into both the spatial pattern and overall levels of error (Shading Manager table summary).  While it is not a “traditional non-spatial” evaluation of regression fit, the “biased performance evaluation” with error map/summary can be useful.  Possibly it could be argued to be more useful (except with traditional statisticians) as R-squared is an aggregate, non-spatial evaluator and this is spatial statistics …making R-squared sort of off the mark as it ignores the spatial pattern of the relationship.  -Joe
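As a rough illustration of the two evaluations discussed above, the sketch below subtracts a predicted map from the actual map to get an error map and also computes a conventional R-squared over all cells. It assumes the maps have already been exported (for example via CSV) into nested Python lists; the function names and the tiny sample grids are hypothetical and are not part of MapCalc.

```python
def error_map(actual, predicted):
    """Cell-by-cell error: actual value minus predicted value."""
    return [[a - p for a, p in zip(row_a, row_p)]
            for row_a, row_p in zip(actual, predicted)]


def r_squared(actual, predicted):
    """Aggregate, non-spatial fit: 1 - SS_residual / SS_total over all cells."""
    a = [v for row in actual for v in row]
    p = [v for row in predicted for v in row]
    mean_a = sum(a) / len(a)
    ss_res = sum((x - y) ** 2 for x, y in zip(a, p))
    ss_tot = sum((x - mean_a) ** 2 for x in a)
    return 1.0 - ss_res / ss_tot


# Hypothetical 2x2 maps standing in for the dependent map and the Regress output
actual = [[10.0, 12.0], [14.0, 16.0]]
predicted = [[11.0, 12.0], [13.0, 16.0]]

errors = error_map(actual, predicted)   # keeps the spatial pattern of error
fit = r_squared(actual, predicted)      # collapses everything to one number
```

The error map retains where the model over- or under-predicts, while R-squared reduces the same residuals to a single aggregate value.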

Regress

Regress performs linear regression analysis by using the "least squares" method to fit a line through a set of data points in multiple maps. Each grid location identifies a series of values. You can analyze how a single map (the dependent variable) is affected by the values of one or more other maps (independent variables). For example, you can determine how crop yield is affected by such factors as phosphorous, potassium and pH levels.

Regression is used for developing a prediction model based on a set of sampled data. The relationship between the dependent and independent variables is determined by fitting a line to the data that minimizes the deviations between the line and the data. The mathematical equation for the line is used to estimate the dependent variable for any given set of values contained in the independent variables.

Note: Regress does not work with maps containing categorical information, such as a soil classification map.

Regress            <dependentMap>

With                  <independentMap>

If using more than one independent map, select as needed from the drop-down list. Click Add after each selection.

Del                   Click to delete a highlighted independent map from the command line.

To                    <newTextFile>

For                   <newMap>

The resulting map contains predicted values for the dependent map using the regression equation.

___________________________

Joe, I am working on number 3, and when I get the change in percent yield map, the "percents" range from -80 to 4160.  Is there really a 4000 percent change?  …my faith in all things computer suggests there really is a 4160% change—check it out  …but be sure to comment on whether you believe it is real or just something to do with a data collection artifact involving small numbers.  I put the formula in exactly as you had it in the homework.  Thanks, Eric

Eric—yep, that’s the dilemma when working with small numbers as percents.  4000 percent says there is a 40-fold increase in yield which is likely from 1 to 40 bushels …which I suspect is occurring along the edge of the field and is likely “data collection noise” as the harvester moves in and out of the crop it is harvesting.

An interesting extended discussion (captures your interest, right?) would be to use your pointer to find out where this unusually large change occurred, note the two values and manually solve the percent change equation to confirm the calculation.  A follow-on extended discussion could comment on why one might want to get rid of whacko areas in a data set (termed “eliminating outliers” in stat-speak) before developing any statistical models.  -Joe
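The manual check suggested above can be sketched in a few lines. The homework formula is not shown in this dialog, so the sketch assumes the usual definition, 100 * (new - old) / old; the function name and sample values are hypothetical.

```python
def percent_change(old, new):
    """Percent change from old to new (assumed form of the homework formula)."""
    return 100.0 * (new - old) / old


# The same absolute gain of 39 bushels on two very different starting values
small_base = percent_change(1, 40)     # tiny starting yield, e.g. a field edge
large_base = percent_change(100, 139)  # typical starting yield
```

A gain of 39 on a base of 1 produces a percent change in the thousands, while the same gain on a base of 100 is modest, which is why small-number cells can dominate a percent change map.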

___________________________

Dear Dr. Berry, I have a question about Exercise 9 Question #1. By saying "use the same legend" for the two maps (1997_Fall_P & 1997_Fall_K), do you mean use the same intervals for the color ranges for the two maps? Yes.  I noticed that the 1997_Fall_P map has values ranging from 5-102, and 1997_Fall_K has values ranging from 88-310. The common value range (88-102) for the two maps is very small. No—the combined range to use is 5 to 310 but you might adjust to 0 to 320 and use 16 User Defined intervals of 20 ppm.  In 1997_Fall_P only 17 out of 3288 cells are within the 88-102 interval, and 1997_Fall_K has only 12 cells that fall into the 88-102 range. So, I am not sure whether it would be helpful to use the same ranges for the two maps to make a comparison.

If you're trying to get us to "use the same legend" for the two maps, how do I define the range intervals? Thanks, Qing

3/6 Qing—when visually comparing two maps the legends should be the same.  That means you need to determine the minimum and maximum for both maps then set up a legend that covers the combined range—from the “minimum” of the two minimums to the “maximum” of the two maximums –a data range that encompasses the individual data ranges on both maps.  In practice it helps to make the range a bit bigger such that the ranges can be set up in sensible steps.  The result is a legend that has the same “color-coding” for the same data steps enabling the viewer to easily “walk” between both maps.

For example, if one map had values from 10 to 75 and the other had values from 45 to 90, the combined range would be 10 to 90, but it would make sense to set up the legend from 0 to 100 with steps of 5 (20 intervals) or 10 (10 intervals) and set a color inflection at 50 for the color ramp.

Keep in mind that this technique only works for map surfaces that have the same units.  If the maps are of two different variables (apples and oranges) you would need to normalize the mapped data to a common scale and then set a common legend …reasonable “extended discussion” fodder.  -Joe
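The common-legend procedure described above (minimum of the minimums, maximum of the maximums, then widened to sensible steps) can be sketched as follows. The function name and the choice of a 20 ppm step are illustrative assumptions; the two ranges are the ones reported for the phosphorus and potassium maps.

```python
import math


def common_legend(ranges, step):
    """Return legend break points covering every (min, max) pair in `ranges`,
    widened outward to whole multiples of `step`."""
    lo = min(r[0] for r in ranges)          # minimum of the minimums
    hi = max(r[1] for r in ranges)          # maximum of the maximums
    start = math.floor(lo / step) * step    # round down to a step boundary
    end = math.ceil(hi / step) * step       # round up to a step boundary
    count = int(round((end - start) / step))
    return [start + i * step for i in range(count + 1)]


# Ranges reported for 1997_Fall_P (5-102) and 1997_Fall_K (88-310)
breaks = common_legend([(5, 102), (88, 310)], step=20)
```

The break points would then be entered as User Defined ranges for both map displays so that the same color always means the same data interval.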

___________________________

Joe, Sarah and I are having some difficulty converting our point data (towns) into raster data. ArcMap doesn't complete the operation, and spits out a generic error that effectively says that something unknown is wrong.  Do you have any ideas as to why the "point to raster" tool is not working for us? Can you potentially give us any other raster conversion tool to use?  Thanks.  Hope all is well.  Cheers

2/28 Mark and Sarah—yep, the “base map” data preparation is always the difficult step.  By copy of this email, I am asking any of you “ArcGIS-sperts” with vector ↔ raster skills to contact Mark (Mark.Janko@du.edu) or Sarah (smiller07@gmail.com) with your advice.

It sounds simple …simply convert X,Y point locations (in vector Lat/Lon WGS 84, I believe) into a raster map of specified cell size, row/column configuration and geo-positioning.  My ancient experience used the PolyGrid command in AML but I am not sure what the command and specifications are in the current ArcGIS GUI tools.  Thanks, Joe

___________________________

Hi Joe, I am looking at question 2b that says: “Embed a screen grab of your “color-filled” 5-foot contour map with data posting below” …What does with "with data posting" mean?  Thanks!   Eric

Eric—in Surfer you can “post” the original point data in a map display.  In Surfer select Help from the main menu items and search on “Post” to get help on how to post the sample points’ data values to a map surface display.  -Joe

___________________________

2/23 Folks— on the possible Optional Paper front, a popular topic in the past is to compare some of the commands in Grid/Spatial Analyst to corresponding MapCalc commands, such as Costdistance and Spread and Pathdistance and Stream.  Folks in the past were most interested in Spatial Analysis (vs. Spatial Statistics) and had prior experience with ArcGIS.  A cross-reference of the commands is posted at http://www.innovativegis.com/basis/MapCalc/MCcross_ref.htm (provided the class website is up and running).   Joe

___________________________

2/23 Folks— last class I noted that several of you were interested in the Standard Normal Variable (SNV) and other normalization techniques that are useful in pre-processing mapped data before Descriptive and Predictive statistics are employed.  As warm-up for the next two lectures/exercises you might want to add the following to your “readings”…

Normalizing Maps for Data Analysis  describes map normalization and data exchange with other software packages

Comparing Apples and Oranges  describes a Standard Normal Variable (SNV) procedure for normalizing maps for comparison

In addition, the “Compare” command in MapCalc calculates a bunch of comparison statistics between two maps (need to normalize if the data are not in the same units …apples and oranges).

Compare creates a summary table of various comparison statistics between two maps. The comparison table summarizes the percent difference between the two specified maps on a cell-by-cell basis. The statistical indices test for significant differences between the two sets of data.

Example: COMPARE Slope WITH Slope_max TO SSm_compare.txt

Compare <existingMap>

With <anotherMap>

To <newTextFile>

Example Output (Note: SCAN is used first for this example)

SCAN Elevation Average WITHIN 5 FOR Elevation_smoothed

COMPARE Elevation WITH Elevation_smoothed TO Compare_table.txt

…see page 76 of the MapCalc User’s Manual for explanation/interpretation of the statistics generated.   -Joe
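For the normalization idea mentioned in this note, here is a minimal sketch of a standard-score style transformation, (value - mean) / stdev, applied to every cell. This is the generic statistical form and is only an assumption about what the SNV readings describe; the exact scaling used there may differ (for example, a constant multiplier). The function name and sample grid are hypothetical.

```python
import math


def normalize_map(grid):
    """Rescale a map so its values have mean 0 and standard deviation 1."""
    values = [v for row in grid for v in row]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    stdev = math.sqrt(variance)
    if stdev == 0:
        raise ValueError("map has no variation, so it cannot be normalized")
    return [[(v - mean) / stdev for v in row] for row in grid]


# Hypothetical 2x2 map; units drop out after normalization
normalized = normalize_map([[2.0, 4.0], [4.0, 6.0]])
```

Once two maps in different units are both normalized this way, they share a common unitless scale and can be compared or given a common legend.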

___________________________

2/21 – Folks, I am cheerfully reading through the midterms …mostly good so far.  However, there were two of the first-part questions that showed a bit of general confusion—

1) Question comparing Traditional GIS vs. Spatial Analysis and Traditional Statistics vs. Spatial Statistics:

Traditional GIS: involves discrete spatial objects (points, lines, polygons) primarily for geo-query and mapping (inventory focus)

Spatial Analysis: involves continuous map surfaces primarily for analysis of “contextual” spatial patterns and relationships (analysis focus)

Traditional Statistics: involves characterizing non-spatial data to determine the “typical” response (Mean and Stdev) considering the data’s numerical distribution alone

Spatial Statistics: involves characterizing spatial data to determine both the numeric and geographic distributions (maps the Variance) to analyze “numerical” spatial patterns and relationships

2) Question to identify and briefly describe the differences in information contained in the following types of visibility maps:

Net-Weighted Visual Exposure Density Surface

…the viewer map values are assigned positive weights for “pretty” things (beautiful Profile Rock) and negative weights for “ugly” things (unsightly Joe’s Junkyard) such that the sum of the weights indicates the net-weighted arithmetic total.  A negative sign of the net-weighted value identifies locations connected to mostly ugly things; positive, mostly pretty things.  The magnitude of the Net Weighted VEDS indicates how pretty or ugly the overall visual connections are at every map location.

Joe

___________________________

Joe- in reference to Question #4 on Exercise 5, below:  Are you suggesting one selects the Square option within SCAN when you talk about the 3x3 roving window?

Use Scan and the Covertype map to identify the proportion of a roving window (3x3) that has the same cover type (Covertype_proportion map).  Thanks, Pete

2/9 Pete— …yep, 3x3 square window.  That results in 9 cells (center cell and eight surrounding cells).  The Proportion calculations will “note” the number of similar cover type category (map value) cells in the roving window …expressed as a proportion of the total # of cells.  For extended discussion (A stuff), what do you think are the minimum and maximum values that could result within a 3x3 window?  What about the min/max for a 4x4 window?  -Joe
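The proportion calculation described above can be sketched as a roving square window that counts how many cells match the center cell. This is not MapCalc's implementation; in particular, clipping the window at the map edge is an assumption, and the function name and sample map are hypothetical.

```python
def proportion_same(grid, reach=1):
    """For each cell, the share of window cells with the same value as the
    center cell. reach=1 gives a 3x3 square window (clipped at map edges)."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [grid[rr][cc]
                      for rr in range(max(0, r - reach), min(rows, r + reach + 1))
                      for cc in range(max(0, c - reach), min(cols, c + reach + 1))]
            same = sum(1 for v in window if v == grid[r][c])
            out_row.append(same / len(window))
        out.append(out_row)
    return out


# Hypothetical cover type map with three categories
covertype = [[1, 1, 2],
             [1, 3, 2],
             [1, 1, 2]]
proportion = proportion_same(covertype)
```

Because the center cell always matches itself, the count of similar cells in any window is never zero.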

___________________________

Joe-- I am working on question 5 and am getting a little confused with the first bit.  I completed the first command:

Use Scan and the Covertype map to create and display a Covertype_diversity map within 100 meter reach (one cell radius).

Which resulted in the covertype diversity map which has continuous data.  However, when using the reassignment values for the following command:

Use Renumber to isolate the areas of high cover type diversity (Assigning 1 to 3 and 0 to 1 thru 2).

The map looks really weird with multiple colors and I think 9 or 10 ranges.  Using these assignments, it appears to me that anything valued between two and three hasn't been incorporated and that is where all the weird colors are coming from.  However, if I "Assign 1 to 2 thru 3, and Assign 0 to 0 to 2" I get a nice binary map that makes me really happy.  Is it okay to adjust the assignment values or am I missing something important and embarrassingly obvious?  Thanks, Eric

2/9 Eric— I think you are the victim of “default map” display.  In coding Scan, we had to set a default map display when the operation is completed.  Since there are lots of “quantitative” options for summarizing the data in the window (e.g., average, StDev, etc.) we decided to set the default display to “continuous.”  Since the diversity summary is simply a count of the number of different types in the window, you have to press the display Data Type button to switch to “discrete.”

…always think about the type of data created (qualitative/quantitative, discrete/continuous) then set the Display Type, Display Form and Display Data Type for the best display—“…rarely is the default map display the best one.”

[Figures: default display (left) versus correct display (right)]
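To see why the diversity result is discrete, the sketch below counts the number of different cover types in a roving window and then applies the reassignment from the exercise (1 to 3, 0 to 1 thru 2). It is only an illustration under assumptions: a square 3x3 window clipped at the edges stands in for the one-cell reach, and the function names and sample map are hypothetical rather than MapCalc code.

```python
def diversity(grid, reach=1):
    """Count of distinct values in the window around each cell."""
    rows, cols = len(grid), len(grid[0])
    return [[len({grid[rr][cc]
                  for rr in range(max(0, r - reach), min(rows, r + reach + 1))
                  for cc in range(max(0, c - reach), min(cols, c + reach + 1))})
             for c in range(cols)]
            for r in range(rows)]


def renumber(grid, assignments):
    """Apply (new_value, low, high) rules with inclusive ranges; values that
    match no rule are left unchanged."""
    out = []
    for row in grid:
        out_row = []
        for v in row:
            new_v = v
            for new_value, low, high in assignments:
                if low <= v <= high:
                    new_v = new_value
                    break
            out_row.append(new_v)
        out.append(out_row)
    return out


covertype = [[1, 1, 2],
             [1, 3, 2],
             [1, 1, 2]]
div = diversity(covertype)
high_diversity = renumber(div, [(1, 3, 3), (0, 1, 2)])
```

Every diversity value is a whole-number count, so there are no values "between two and three" to reassign; the extra colors come from displaying integer counts with a continuous legend.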

___________________________

2/8 Pete-- you need to view and reply to my emails in “HTML”, NOT “Plain Text,” because I embed graphics and other stuff requiring the retention of formatting and special fonts/characters/text/links.  When you reply, however, the formatting is automatically stripped and you send only “Plain text” in default Times New Roman 10 font that isn’t very exciting (Arial, Calibri or Verdana might be a good “new look” for you).

Also, it is professional to set Spell Checking on.  I am not sure how to set the default to HTML and Spell Checking in other email readers (e.g., gmail, DU direct access reader, etc.) but there ought to be a way.  -Joe

From Outlook’s main menu bar, select Tools → Options → Mail Format tab and set the “Message format” to HTML

___________________________

Hi Dr. Berry- We have a question about the 'Orient' function and the map it displays.  From our perspective, it looks like the areas in pink should be relatively flat; however, as you can see from our included image where azimuth degrees are draped over elevation, that's not the case.  Can you explain why that is?  Thanks!  Kylee + friends

…I am not sure what happened here???

2/8 Kylee—I am not sure what went wrong with your display.  When I entered “ORIENT Elevation Precisely FOR Elev_azimuth” and then displayed the result using 9 User Defined Ranges as shown …

…this is what I get.

…However, close inspection shows a strange “Count” of 0 for the 61 “flat” cells.  Can anyone explain why this “apparent” error occurs?

Joe

___________________________

Ok, Joe, then with reference to question 23 on the midterm study guide.  My answer to the question describe how accumulation surface is used to determine an optimal path between two locations is: accumulation surface values increase continuously as they move from a given starting location out and away toward a destination.  This pattern results in a bowl-like surface where each "steepest downhill line over the surface" represents estimated travel time for every location from the given starting point.  Am I close to answering this question, correctly?  Thanks, Pete

2/7 Pete—you need to mention the three basic steps to Least Cost Path (LCP) routing—1) create a Discrete Cost Map that contains absolute barriers (avoidance) and relative barriers (preferences) to movement; 2) create an Accumulation Cost Surface from a starting location(s) to everywhere; and, 3) identify the steepest downhill path from a desired end point(s) over the Accumulated Cost Surface to delineate the “best” (most preferred) route.

Mentioning the three steps, plus the “continuously increasing” configuration of the Accumulation Cost Surface (“bowl-like” with varying steepness as a function of the intervening absolute/relative barriers) approaches a complete answer.  You also should mention that the steepest downhill path retraces the movements of the wave-front that got to the end point first.  -Joe
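The three LCP steps above can be sketched in generic terms: a discrete cost map, an accumulation cost surface grown outward from the start, and a steepest downhill walk back from the end point. This is a textbook Dijkstra-style sketch, not the SPREAD or STREAM algorithm; the four-neighbor movement, the use of None for absolute barriers, and all names and sample values are assumptions.

```python
import heapq

STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # orthogonal moves only (assumption)


def accumulate_cost(cost, start):
    """Step 2: least accumulated cost from `start` to every cell.
    Entering a cell costs cost[r][c]; None marks an absolute barrier."""
    rows, cols = len(cost), len(cost[0])
    acc = [[float("inf")] * cols for _ in range(rows)]
    acc[start[0]][start[1]] = 0.0
    heap = [(0.0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > acc[r][c]:
            continue  # stale entry: a cheaper wave already reached this cell
        for dr, dc in STEPS:
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols and cost[rr][cc] is not None:
                nd = d + cost[rr][cc]
                if nd < acc[rr][cc]:
                    acc[rr][cc] = nd
                    heapq.heappush(heap, (nd, (rr, cc)))
    return acc


def steepest_downhill_path(acc, end):
    """Step 3: walk from `end` to the lowest neighbor until the start (0)."""
    r, c = end
    if acc[r][c] == float("inf"):
        raise ValueError("end point cannot be reached from the start")
    path = [end]
    while acc[r][c] > 0:
        neighbors = [(r + dr, c + dc) for dr, dc in STEPS
                     if 0 <= r + dr < len(acc) and 0 <= c + dc < len(acc[0])]
        r, c = min(neighbors, key=lambda rc: acc[rc[0]][rc[1]])
        path.append((r, c))
    return path


# Step 1: hypothetical discrete cost map (9 = relative barrier, None = absolute)
cost = [[1, 1, 1],
        [1, None, 1],
        [1, 9, 1]]
surface = accumulate_cost(cost, start=(0, 0))
route = steepest_downhill_path(surface, end=(2, 2))
```

With strictly positive costs the surface rises away from the start in every direction, so the downhill walk always ends at the starting cell.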

___________________________

Joe- in reference to question 23 from the midterm study questions, is the term accumulation surface synonymous with the term accumulation cost surface? Thanks, Pete

2/7 Pete—while the terms Accumulation Surface and Accumulation Cost Surface are often interchanged in practice, there is a big distinction …an Accumulation Cost Surface is technically reserved for “Least Cost Path (LCP)” analysis for routing that uses a Discrete Cost Surface to guide the effective distance waves (absolute and relative barriers).  The more general term is simply Accumulation Surface that describes any effective distance map, regardless of whether it is used for routing.

For example in Exercise #4, Question 1, the command “SPREAD Roads TO 20 Simply FOR Roads_simpleprox” just created an Accumulation Surface (simple proximity) and didn’t take the analysis any further.  On the other hand, the “Ranch_prox” map you created in Question 2 and coupled with “STREAM Cabin OVER Ranch_prox Simply Steepest Downhill Only FOR Cabin_route” used an Accumulation Cost Surface (Ranch_prox) to route the quickest path between the Ranch and the Cabin.  -Joe

___________________________

Hi Joe— for the Map Analysis "Mini Exercises" are you looking for map output and descriptions of that output or more of a narrative of the steps you have provided? Thanks!  Eric

2/7 Eric—yep, sort of mini-exercises where you “solve” the problem to include your commands and screen grabs of important maps.  For example, check out the A- solution below …all of the solution’s elements are clearly identified and well presented; however, there is ample room for extended discussion.

Joe

__________

Given the base map of Total_customers (smallville.rgs database) create a map that identifies “pockets of high customer density” with over 35 customers within a quarter of a mile (6 cell reach).  Note: use MapCalc to implement and SnagIt to capture your solution and embed below.  Be sure to identify the input maps, processing procedure, and output map with an interpretation of the map values.

Figure 1a.  3D Grid Input map identifying the number of customers at each grid location forming discrete quantitative data.

Figure 1b.  Scan command that summarizes map values within a roving window.

Figure 1c.  2D Grid Output map determining the total number of customers within .25 miles of each grid location forming continuous quantitative data.

SCAN Total_customers Total IGNORE 0.0 WITHIN 6 CIRCLE FOR Total_customers_within6

The Scan command is a Neighborhood class of map analysis operators that summarizes map values within a “roving window” and assigns the summary value to the center cell of the window.  In this case, the total number of customers within a .25 mile (6-cell) radius is calculated.  The warmer tones in the Output map indicate increasing number of customers within reach from 0 to 92.

Figure 2a.  3D Lattice Input map identifying the total number of customers within .25 miles of each grid location forming continuous quantitative data (see Figure 1c above).

Figure 2b.  Renumber command that isolates areas of interest.

Figure 2c.  2D Grid Output map that locates areas with more than 35 customers within .25 miles of each grid location forming discrete binary data.

RENUMBER Total_customers_within6 ASSIGNING 0 TO 1 THRU 35  ASSIGNING 1 TO 35 THRU 92  FOR High_pockets

The Renumber command is a Reclassify class of map analysis operators that enables a user to specify new map values for old values or range of values on an existing map.  In this case, a binary map is produced that identifies 0 with low customer levels from 0 to 35, and 1 identifying “pockets of high customer density” from 35 to 92 customers within a .25 mile reach.
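The two-step logic of this solution (total within a circular reach, then a binary reclassification) can be sketched in plain Python. It is only an approximation under assumptions: the circle is taken as all cells whose center lies within `reach` cells of the window center, edge windows are clipped, and the names and the small sample map are hypothetical.

```python
def total_within(grid, reach, ignore=0.0):
    """Sum of map values inside a circular roving window of radius `reach`,
    skipping cells equal to `ignore`."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for rr in range(max(0, r - reach), min(rows, r + reach + 1)):
                for cc in range(max(0, c - reach), min(cols, c + reach + 1)):
                    inside = (rr - r) ** 2 + (cc - c) ** 2 <= reach ** 2
                    if inside and grid[rr][cc] != ignore:
                        total += grid[rr][cc]
            out[r][c] = total
    return out


def high_pockets(totals, threshold=35):
    """Binary map: 1 where the total is over the threshold, otherwise 0."""
    return [[1 if v > threshold else 0 for v in row] for row in totals]


# Hypothetical customer counts; reach=1 keeps the example small (exercise uses 6)
customers = [[0, 20, 0],
             [10, 10, 0],
             [0, 0, 0]]
totals = total_within(customers, reach=1)
pockets = high_pockets(totals)
```

Using a strict greater-than test is one way to match the "over 35 customers" wording and keep a total of exactly 35 from falling into both classes.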

___________________________

Joe - I don't get a slope_fitted option under MAP > OVERLAY after I have performed the slope_fitted function.

2/7 Pete— slope isn’t an Overlay operator …the “Neighbors” drop-down contains all of the neighborhood operations.  Select the Slope command to pop-up its dialog box and choose the “fitted” mode for calculating slope.  But I included a few more “Helpful Hints” below that might be useful in preparing your report.  -Joe

Use the Slope command under the “Neighbors” menu button to create and capture 2D displays of the Slope_fitted, Slope_max, Slope_min and Slope_avg maps by using the appropriate option button.

If you intend to “visually compare” maps (as directed in this question) you must use a consistent legend.  Once you have calculated the four slope maps determine the maximum range of values considering all four calculation techniques and then make the best 2D display using User Defined for the display calculation mode like above …map analysis “rule:” rarely can you use default displays for report figures.

Once you have the “perfect” legend, click on the Templates tab in the Shading Manager, give it a name (e.g., slope_0to65) and save it.  Then apply the stored legend to all four slope map 2D displays by recalling it in the drop-down “Template Selector.”  Finally “drape” the different slope maps over the Elevation surface (see Email Dialog item on the class website for the last email describing how to drape) and screen-grab each of them for your report.

Be sure your discussion explains the similarities/differences in the four maps of slope and why it is necessary to use a common legend when visually comparing map displays.  -Joe

___________________________

Hi Joe— I have a question about one of my assigned study-guide questions (#37).  What I think you're trying to get us to do is to explain how the spread command calculates simple proximity for point, line, and polygon data.  However, I can distinguish the difference between the point and polygon data (ranches vs. housing).  Where is the ranch data so that I can actually do this command? -Nashwa

2/6 Nashwa-- the question under questions is...

37.          Using the analogy to tossing an object(s) into a pond, describe how a simple proximity map is created for the following MapCalc commands…

SPREAD RanchMap TO 100 for Ranch_Prox

SPREAD HousingMap TO 100 for Housing_Prox

…the ranch is on the Locations map (need to Renumber to isolate it for the RanchMap).  Note that the three different Starter(s) maps contain different map features—a single Point, a set of Points and a set of Lines.

I think your proposed answer has most of the required elements.  But keep in mind, the waves from a single point simply propagate outward; waves from a set of points or lines propagate outward but interact, such that the distance to the closest Starter location is retained. The discussions in…

Beyond Mapping III, Topic 25:  Calculating Effective Distance and Connectivity

discusses the basic concepts of distance and proximity

describes how simple proximity is calculated
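The pond analogy can be sketched as waves that spread one ring of cells at a time from every starter cell, with each location keeping the distance of the first wave to reach it. This is a simplified illustration, not the SPREAD algorithm: distance is counted in orthogonal cell steps rather than true geographic distance, and the function name and sample maps are hypothetical.

```python
from collections import deque


def simple_proximity(starters, max_dist):
    """Distance (in cell steps) from each cell to its closest starter cell.
    Non-zero cells in `starters` are the objects tossed into the pond."""
    rows, cols = len(starters), len(starters[0])
    dist = [[None] * cols for _ in range(rows)]
    queue = deque()
    for r in range(rows):
        for c in range(cols):
            if starters[r][c]:
                dist[r][c] = 0          # every starter begins its own wave
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        if dist[r][c] >= max_dist:
            continue                    # wave has reached the TO distance
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols and dist[rr][cc] is None:
                dist[rr][cc] = dist[r][c] + 1   # first wave to arrive wins
                queue.append((rr, cc))
    return dist


single_point = simple_proximity([[1, 0, 0, 0, 0]], max_dist=100)
two_points = simple_proximity([[1, 0, 0, 0, 1]], max_dist=100)
```

With one starter the distances simply increase outward; with two, the waves meet in the middle and each cell keeps the smaller distance.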

___________________________

Hi Joe— I have a question about exercise 4, question 4.  When draping the Elev_Smoothed_Difference Map over the 3D surface, I used the Cover function to Cover Elev_Smoothed_Difference over Elevation.  My question is do I want to ignore zero or not.

By ignoring zero the original elevation values are visible depicting a 3D map that is not very distinguishable from the elevation map.  However, maybe that is a good thing since this map is supposed to show areas that have noticeable changes in elevation as determined by the difference between actual values and the average value computed by neighboring cells.  So overall, there aren't too many areas that have major differences between actual and average.

On the other hand, if you want to see where these changes actually occur, it's easier to see in a 3D map that does not ignore zeros.  Am I off base here?  Thanks, Nashwa

2/6 Nashwa-- Draping is a graphical overlay (just makes a cool map), whereas Cover creates a new map (that one could use in further Map Analysis).  In the exercise,

Question 4.  Capture, embed, clearly label the 2D and 3D (draped over Elevation) displays of the maps you created then briefly discuss the procedure you used to create the “Smoothed Difference” and “Coffvar” maps and interpret the meaning of the output map values.

To drape a map, first display the 3D surface you want to drape over (e.g., Elevation), then select Map from the main menu, Overlay from the drop-down, and choose the map you want to drape (e.g., Slope in the example below; Elev_ElevSmoothed_difference in the Exercise Q4).

Once you have created the enhanced displays, visually interpret what you see …does the result make sense?  -Joe
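For interpreting the two maps in Question 4, the sketch below shows one plausible way such values arise: the difference between each cell and its neighborhood average, and a coefficient of variation (standard deviation as a percent of the mean) within the same window. The sign convention (actual minus average), the square clipped window, and all names and sample values are assumptions, not the exercise's exact commands.

```python
import math


def window(grid, r, c, reach):
    """Values in the square window around (r, c), clipped at map edges."""
    rows, cols = len(grid), len(grid[0])
    return [grid[rr][cc]
            for rr in range(max(0, r - reach), min(rows, r + reach + 1))
            for cc in range(max(0, c - reach), min(cols, c + reach + 1))]


def smoothed_difference(grid, reach=1):
    """Actual value minus the neighborhood average at each cell."""
    out = []
    for r in range(len(grid)):
        row = []
        for c in range(len(grid[0])):
            vals = window(grid, r, c, reach)
            row.append(grid[r][c] - sum(vals) / len(vals))
        out.append(row)
    return out


def coffvar(grid, reach=1):
    """Coefficient of variation in the window: 100 * stdev / mean."""
    out = []
    for r in range(len(grid)):
        row = []
        for c in range(len(grid[0])):
            vals = window(grid, r, c, reach)
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            row.append(100.0 * math.sqrt(var) / mean if mean else 0.0)
        out.append(row)
    return out


# Hypothetical elevations: a flat area with one bump in the middle
elevation = [[100, 100, 100],
             [100, 109, 100],
             [100, 100, 100]]
difference = smoothed_difference(elevation)
variation = coffvar(elevation)
```

Under these assumptions, positive differences mark cells higher than their surroundings, negative values mark cells lower than their surroundings, and values near zero indicate locally flat or evenly sloping terrain.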

___________________________

Joe - re ex4: when entering the spread operation in MapCalc from part 1, question 1, the null value is PMAP_NULL.  However, in the command line from the exercise 4 document, PMAP_NULL is omitted.  Will the map results be impacted due to this discrepancy? Thanks, Pete

1/30 Pete-- PMAP_NULL is set to "infinity" in MapCalc and is used if the user wants to exclude an area from processing.  For example, if one only wants to process an irregularly shaped town boundary you could create a discrete binary Null_mask (assigning 1 to all town cells and PMAP_NULL to the outside cells) that identifies all areas outside of the boundary to the full extent of the rectangular analysis frame.  The PMAP_NULL cells will be ignored during processing and displayed as a blank in the result…

In this case we want to process everything within the rectangular project area so either PMAP_NULL or no value in the ignoring phrase will cause the computer to consider all locations in its processing—what we want.  -Joe

___________________________

1/29 Folks— while grading Exercise # 3, I see where I led you astray about the differences in Display Type and Display Data Type

-          2D/3D Toggle: Display Form

-          Use Cells: Display Type, either Lattice (generally used with continuous quantitative data type) or Grid (generally used with discrete qualitative data type)

-          Layer Mesh: Display Analysis Frame

-          Data Type: Display Data Type, either Discrete (choropleth) or Continuous (isopleth) for Geographic distribution; Qualitative (nominal, ordinal, binary) or Quantitative (interval, ratio) for Numeric distribution

(you must set on your own)

…Thematic Mapping (#Ranges, Calculation Mode, Color Palette ramp) for continuous maps and (Color assignments and Labeling) for discrete maps you generate.    Note: you can assign labels to discrete maps by clicking on the Category column of the Shading Manager and entering a brief text description.

…hopefully the above revisions are helpful.  Keep in mind that mapped Data Type always has two parts to its specification—the nature of both its Numeric distribution and its Geographic distribution.  -Joe

___________________________

1/28 Folks—below is a simple flowchart identifying the Hugag Habitat binary suitability model that was created in PowerPoint (for the cheap ones among us without Visio, or who want flowcharts a bit more interesting).  The version on the right “soups” it up a bit with SnagIt screen grabs of the maps generated by the map analysis processing.  For added effect, the map graphics are grouped then assigned “animation” settings so they appear as you press the down arrow to advance through the model logic steps.  You can download and view the two-slide animated PowerPoint from…

If any of you are interested in a practicum on creating “fancy” graphics like these using just SnagIt and PowerPoint, I would be delighted to hold a short workshop on techniques I have learned before class 5:00-6:00pm.  Drop by class early if you are interested …particularly useful for those who are getting ready for thesis or dissertation defense as the procedures are generic to making effective graphics, regardless of whether you use GIS modeling or not.  -Joe