Tuesday, December 12, 2017

Dasymetric Mapping

In the final week we looked at how to apply more information to overcome the limitations of spatial data aggregation and how to disaggregate data within boundaries. In our lab we used dasymetric mapping to go beyond basic areal weighting, which assumes that population counts are evenly distributed across an area. We clipped the water bodies out of the study area to eliminate places where people are assumed not to reside; that ancillary information helps improve the areal weighting results. This is an example of how land cover can be used to overcome the limitations of spatial data aggregation.
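The binary dasymetric idea, removing uninhabitable water area before computing density, can be sketched in a few lines; the tract figures below are hypothetical, not the lab's data.

```python
# Binary dasymetric refinement: population is spread only over land,
# not over the full tract area. All values below are hypothetical.

def dasymetric_density(population, total_area, water_area):
    """Persons per unit area, assuming nobody lives on water."""
    inhabited = total_area - water_area
    if inhabited <= 0:
        raise ValueError("tract has no inhabitable area")
    return population / inhabited

# A hypothetical 10 sq km tract, 4 sq km of which is lake:
naive = 5000 / 10.0                            # 500 people per sq km
refined = dasymetric_density(5000, 10.0, 4.0)  # ~833 people per sq km
print(naive, refined)
```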


In the lab we estimated the population of high-school-aged children using census data, but the census boundaries didn't match the school boundaries, so we used areal weighting to assign population estimates to the school boundaries after eliminating the areas of water features. My results did not match the actual counts given, but areal weighting is only an estimate, and my results were closer than with areal weighting alone.
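The areal weighting step can be sketched as follows; the tract populations, areas, and overlaps are hypothetical, standing in for the census-to-school-zone apportionment done in ArcMap.

```python
# Areal weighting: each census tract's count is apportioned to a school
# zone in proportion to the fraction of the tract's area that falls
# inside the zone. All numbers here are made up for illustration.

def areal_weighted_estimate(tracts, zone_overlaps):
    """tracts: {tract_id: (population, tract_area)}
    zone_overlaps: {tract_id: area of that tract inside the zone}"""
    total = 0.0
    for tract_id, overlap in zone_overlaps.items():
        pop, area = tracts[tract_id]
        total += pop * (overlap / area)
    return total

tracts = {"A": (1200, 4.0), "B": (800, 2.0)}     # population, sq km
overlaps = {"A": 1.0, "B": 2.0}                  # sq km inside the zone
print(areal_weighted_estimate(tracts, overlaps)) # 1200*0.25 + 800*1.0 = 1100.0
```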


This lab was meant to deepen our understanding of impervious surfaces (a land cover product) as an ancillary data source for reducing error in population estimates when boundaries are spatially incongruent. I used the values of the impervious surfaces raster to split areas into those at or below 50% impervious and those 50-100% impervious, assuming a linear relationship between imperviousness and population. With this weighting I cut the estimated population of kids ages 5-14 in half for areas that were 50% or less impervious and left the values above 50% alone. This gave me a higher overall error, with overestimates, than areal weighting alone; the error should have been lower, and better-tuned weights and a more detailed reclassification would help reduce it.
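The two-class weighting described above can be sketched directly; the estimates and impervious percentages below are hypothetical.

```python
# Two-class dasymetric weighting on impervious-surface percentage:
# areas at or below 50% impervious get half weight, areas above 50%
# keep full weight. The threshold mirrors the lab; data are made up.

def weight_population(estimate, impervious_pct):
    """Adjust an areal-weighted child-population estimate."""
    return estimate * 0.5 if impervious_pct <= 50 else estimate

blocks = [(120.0, 30), (200.0, 80), (50.0, 50)]  # (estimate, % impervious)
adjusted = sum(weight_population(est, pct) for est, pct in blocks)
print(adjusted)  # 60 + 200 + 25 = 285.0
```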


Below is a screenshot with the school boundaries as colored polygons, the census boundaries as outlines, and the kids ages 5 to 14 in a point layer that I created. With various table and spatial joins, I was able to look up the sum of kids ages 5-14 for each school, as shown in the table below. This areal weighting approach is a close enough approximation when more data is not available or when cost/time prohibits more accuracy.
I later color coded the census outlines and population dots to reflect under or over 50% imperviousness for the last part of the lab.











 

School          Count   Estimate   Error   Abs(Error)
Hagerty          4706       7212    2506       53.25%
Lake Brantley    6313       6137    -176        2.8%
Seminole        11776      12389     613        5.2%
Winter Springs   5693       5896     203        3.6%
Lyman            7853       6905    -948       12.1%
Oviedo           4780       4209    -571       11.9%
Lake Howell      8585       6298   -2287       26.6%
Lake Mary        5014       5674     660       13.2%

(Error = Estimate - Count; e.g., Hagerty: 7212 - 4706 = 2506.)

Monday, December 4, 2017

Modifiable Areal Unit Problem

This week we worked with the MAUP and looked at political districting as one of its consequences. An ecological fallacy comes from assuming that the average value for an area applies to any individual within it; for example, the average house on a block may be worth $150,000, but no single house necessarily has that exact price.


Gerrymandering is the manipulation of district or zone boundaries based on knowledge of, for example, voter demographics. It results in biased and unfair districts that may influence voting results.


Gerrymandering can be measured by looking at district compactness and community. I measured compactness with an isoperimetric quotient that compares a district's shape to a circle:
ISO = 4 * pi * AREA / PERIMETER^2, where AREA is the district's total area (land + water). I created a field in the district table called ISO and sorted it to find the 10 least compact districts. I wish I had population density data!
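The isoperimetric quotient above can be sketched in a few lines; the district figures are hypothetical.

```python
import math

# Isoperimetric quotient: 4*pi*A / P^2. It equals 1 for a circle and
# approaches 0 for long, contorted shapes, so low values flag
# potentially gerrymandered districts. Example figures are made up.

def isoperimetric_quotient(area, perimeter):
    return 4 * math.pi * area / perimeter ** 2

square = isoperimetric_quotient(1.0, 4.0)   # unit square -> pi/4, ~0.785
sprawl = isoperimetric_quotient(1.0, 40.0)  # same area, 10x the perimeter
print(square, sprawl)
```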

How I measured community:
I made my counties layer hollow, thickened the borders, and overlaid it on my districts layer categorized by GEOID, then looked for GEOID colors overlapping across county boundaries; I found several! I also used my multipart-to-singlepart selection layers to look at which districts had been broken into pieces, to see whether a split made sense simply because the district crossed a county line.

Then, I used Select By Location with the relationship "crossed by the outline of" counties to find the districts that cross county borders, and sorted descending to find the 10 worst. I found 66 districts using this tool.
Here are a few examples of what this looks like:



Here is poor community: the colored district extends past the county outline.



Below, these western states have poor compactness, but that isn't necessarily a sign of gerrymandering since these are low-population areas; it would be wise to look at more data to decide whether these are fair and equal districts.


Here the Select By Location tool finds districts (in color) that overlap the county boundaries (black outlines), showing poor community since counties are overlapped and divided. Some counties are divided because they are highly populated while the districts stay within the county, which is fine. The image in green below shows counties split by a single district that overlaps three counties; that looks suspiciously like bias!


Here the Select By Location tool found the areas colored by GEOID that overlap counties; I found 66 districts that overlap county boundaries this way.

Here I used the Multipart To Singlepart tool to search for districts that were broken into multiple parts for no clear reason. The districts in blue look very odd and might indicate a biased district boundary.


Tuesday, November 28, 2017

GIS 4035 Scale and Resolution

This week we looked at how scale and resolution affect area, length, and the overall mapping of places. A smoothed, generalized coastline will measure shorter than a highly detailed, high-resolution one at a large scale. Below is an example of water bodies at different resolutions: the light blue is medium resolution at 1:100,000 scale, and the dark blue is at 1:200 scale with much less smoothing and more detail overall.
For our lab, we had two rasters, one at 1-meter resolution and one at 90-meter, and we were tasked with comparing the two.
After making sure they were in the same projection, I tried the raster calculator, batch Calculate Statistics, and Build Pyramids And Statistics to start my comparison.
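The length effect described above (sometimes called the coastline paradox) can be sketched by measuring the same line at two vertex densities; the zig-zag "coast" below is hypothetical.

```python
import math

# Measuring the same curve with fewer vertices (a generalized,
# smoothed line) yields a shorter length.

def polyline_length(points):
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

# A zig-zag "coast" sampled at full detail...
detailed = [(x, x % 2) for x in range(11)]
# ...and generalized by keeping every 5th vertex.
smoothed = detailed[::5]

print(polyline_length(detailed) > polyline_length(smoothed))  # True
```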

Wednesday, November 22, 2017

GIS 4035 Week 12 OLS and GWR Analysis

This week we ran ordinary least squares (OLS) and geographically weighted regression (GWR) in ArcMap. We read about the virtues of GWR and spatial normalization in our discussion. These local methods help account for non-stationarity, and GWR is suited to modeling spatially heterogeneous processes. A residual is simply the difference between an observed and a predicted value. Once you fit a regression model you know the residual for each observation, and you can map them to look for spatial autocorrelation: where do the over- and under-predictions cluster? Once a pattern is identified, the results can be fine-tuned with a local model.
For the first part of the lab we ran GWR on 911 calls. I identified low education as having a positive correlation with 911 call frequency; here is a map showing that dynamic.
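The residual idea can be sketched directly; the observed and predicted call counts below are hypothetical.

```python
# Residual = observed - predicted. Mapping these per observation is
# what reveals spatial clustering of model error. Values are made up.

observed  = [12, 7, 30, 18]
predicted = [10, 9, 25, 20]

residuals = [obs - pred for obs, pred in zip(observed, predicted)]
print(residuals)  # [2, -2, 5, -2]
```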

I had to redo Part B a few times since the data-analysis add-in in Excel on my remote connection wouldn't give me the data tools, but GWR is interesting because you can take an independent variable, like the percentage of renter-occupied homes, and model its correlation with auto thefts; this lets you see trends and is a new way of looking at data.
 
Once I did my Excel correlation matrix, I chose the number of Black residents per population, renter-occupied units per housing unit, and median value of owner-occupied homes to relate to auto thefts (rate of incidents per 10,000 population).
Then I ran an OLS and a GWR and used Moran's I to check for spatial autocorrelation in the standardized OLS and GWR residuals. The map below shows the effect of the percentage of renters on auto theft, followed by maps of my OLS and GWR regression residuals.


Overall, I had the same adjusted R-squared value for OLS and GWR, so they explain variability equally well, and I had similar AIC values of 2497.51 (OLS) and 2497.96 (GWR). I did have a lower Moran's I value for GWR, at 0.127 versus 0.462 for OLS, so the OLS residuals showed more spatial autocorrelation and the GWR residuals less.
GWR is probably the better tool to use here because auto theft is not evenly distributed across all areas, and building the spatial geography into the model makes it a better fit for this data. Note that a lower AIC indicates better goodness of fit, so the slightly higher GWR AIC does not favor GWR on its own; its real advantage shows up in the reduced residual autocorrelation. It would be interesting to run more independent variables from the data and try to find ones with a coefficient higher than 0.40, just to see the results. I also suspect that time might have an impact (at night, before paydays, when the weather is nicer and people are more likely to leave car windows down, etc.), so adding a temporal pattern to the data would be interesting.
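Moran's I, used above to compare residual autocorrelation, can be sketched for a tiny hypothetical dataset; the binary neighbor weights and residual values below are made up, and ArcMap's tool additionally reports significance.

```python
# Moran's I: I = (n / S0) * sum_ij w_ij*(x_i - xbar)*(x_j - xbar)
#                         / sum_i (x_i - xbar)^2
# where w_ij is a spatial weight and S0 is the sum of all weights.
# Values near 0 suggest little spatial autocorrelation in residuals.

def morans_i(values, weights):
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * num / den

# Four locations in a row; each is a neighbor of the adjacent ones.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
i_val = morans_i([2.0, 1.5, -1.0, -2.5], w)
print(i_val)  # positive: neighboring values are similar
```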


Monday, November 13, 2017

GIS 4035 Lab 10 Week 11 Supervised Image Classification

Last week we did an unsupervised classification in ERDAS and ArcMap; this week we are doing a supervised image classification. We started with a lake within mountains in ERDAS and added an annotation layer. Once you have this AOI layer, you can draw polygons on features like roads and add each class to the signature file. Then you run a supervised classification with a distance output file to use later, recode the main file, and open the attribute table to change the class names; the table tab lets you add area in the units of your choice, and I used square miles.
I had to check my histograms a few times to find signatures that were well separated, to tell which bands had the least spectral confusion. I found bands 1, 4, and 6 were the most separated, so I used those. I later changed the colors in ArcMap, set my symbology to unique values, and used the attribute table to look up the areas in square miles that I had created in ERDAS.
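The area-per-class step can be sketched as pixel counting; the tiny classified grid, the class labels, and the 30 m cell size below are all hypothetical.

```python
from collections import Counter

# After classification, class area = pixel count * pixel area.
# For a 30 m cell, each pixel covers 900 square meters.

SQ_M_PER_SQ_MILE = 2_589_988.11

def class_areas_sq_miles(classified, pixel_size_m):
    """classified: 2-D list of class labels; pixel_size_m: cell edge."""
    cell_area = pixel_size_m ** 2
    counts = Counter(label for row in classified for label in row)
    return {label: n * cell_area / SQ_M_PER_SQ_MILE
            for label, n in counts.items()}

grid = [["water", "water", "forest"],
        ["urban", "forest", "forest"]]
areas = class_areas_sq_miles(grid, 30)
print(areas)
```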
Below is my map. I still had some spectral confusion between grasses and water, and in urban areas, which caused me some problems. I also tried band combinations 2,3,6 and 4,3,2 and had trouble clearly seeing all the features as well as in the original image.

Thursday, November 9, 2017

GIS 5935 Regression and OLS

This week we did a regression analysis in both Excel and ArcMap; then we ran an OLS (Ordinary Least Squares) model in ArcMap and checked it with the Moran's I spatial autocorrelation tool. We wanted to run the six checks on our OLS model to make sure the data are relevant, not redundant, and not biased, and we used various statistics to do this.


The map below shows standard deviations from my best-fit model. I ran the OLS tool three times to look at 911 calls with the independent variables median income, distance to urban center, alcohol, population density, and low education.
My coefficients were all positive except for distance to urban centers at -0.002, so there is a slight negative relationship: as distance to urban centers goes down, 911 calls go up. My p-value was 0.00, so there may be some bias. My AIC was 726 and my adjusted R-squared value was 0.72, so the model explains 72% of the variation in 911 calls, a pretty good fit. When I ran the Moran's I tool my data was clustered a bit, but my histogram was a near-perfect bell curve, another good sign. The p-value is a bit concerning; a value of 0.05 would be better.
My Jarque-Bera statistic was 7.16; a p-value under 0.01 for this test would mean the residuals are not normally distributed and the model is biased, so I passed this test. The Koenker statistic was 15.42; here a p-value under 0.01 would show the results are not consistent, so I passed this test, too. My VIF values ranged from 0.000041 to 1.160976 at the highest, another good sign because values over 7.5 indicate redundant variables; I tried to pick variables that I thought might be relevant to an increase in 911 calls without overlapping each other, so I passed this test. My intercept was 21.54. The p-values were the only thing that stood out as a potential problem, but my data passed the other tests and the adjusted R-squared shows a good fit.
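Adjusted R-squared, one of the figures reported above, penalizes plain R-squared for the number of explanatory variables; the n, k, and R² figures below are hypothetical, chosen only to land near the lab's reported fit.

```python
# Adjusted R-squared:
#   R2_adj = 1 - (1 - R2) * (n - 1) / (n - k - 1)
# with n observations and k independent variables. Adding a useless
# variable raises k and drags R2_adj down.

def adjusted_r_squared(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(adjusted_r_squared(0.74, 200, 5))  # slightly below the plain R2
```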


Looking at these residuals tells you whether you can trust your model: did you use good variables and inputs, or too few, too many, or the wrong ones? This helps you both set up a model and evaluate its results when you are predicting or checking correlation in ArcMap. These reports are great; you just need to look at the values above and ask how they relate to your map.

GIS Internship Portfolio

This week we created a GIS portfolio. I made mine into a PDF that is available on the I drive on UWF's Argo App under my name, in the subfolder called portfolio.
My portfolio can also be accessed on my main page on LinkedIn under Education.
This portfolio showcases some maps and map-based projects along with my current resume. My favorite map is the Bob-White Transmission Line; I did a more in-depth analysis on this my first semester, but this map was created using a Python script that I wrote! I am really proud of the Superfund Sites map of Casper, as it is my hometown and I've always wanted to see where these sites were in relation to residential areas. Using the buffer shows the high impact these sites have had on Casper, as most of the town is within a mile of a site! As residents, we tend to think of these sites as being "on the other end of town" and dismiss their potential impact; this map really shows the overlap of these areas with our neighborhoods, homes, and schools. This is the power of GIS: seeing data in a new light by adding spatial context to traditional data. This is where patterns, impacts, and greater understanding of an issue come together in an easy-to-understand map, which is what I love about GIS!




https://www.linkedin.com/in/ingrid-jourgensen-9097bb74/