Thursday, April 9, 2015

Correlation and Spatial Autocorrelation

Part I: Correlation

Figure 1

Figure 2

The scatter plot (Figure 1) of distance against sound level shows a clear pattern. Even before looking at the correlation chart, a strong negative correlation is easy to see, and the trend line makes the relationship even clearer: noise drops the farther away you are from the source. In other words, as distance increases, noise levels decrease. If the pattern were not so easy to see on the scatter plot, the correlation chart would be especially helpful. A Pearson correlation test was run, and the resulting coefficient describes the relationship between the two variables. The value is -.896, which gives both the strength and the direction of the correlation. The negative sign matches the relationship seen in the scatter plot: the larger the distance, the smaller the noise value. Because -.896 is very close to -1, there is a strong to very strong negative linear correlation between the variables. The null hypothesis is that there is no linear correlation between sound level and distance from the source. The alternative is that there is a linear correlation between sound level and distance from the source. The reported significance value of .000 (that is, p < .001) is less than .05, so the null hypothesis is rejected: there is a linear correlation between the two variables.
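To make the Pearson coefficient more concrete, here is a minimal sketch of how r can be computed by hand. The function name `pearson_r` and the distance and sound readings are made up for illustration; they are not the data behind Figure 1 and they are not how SPSS is implemented.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of the deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical readings, NOT the assignment data
distance_ft = [10, 20, 30, 40, 50, 60]
sound_db = [92, 85, 83, 74, 72, 65]

r = pearson_r(distance_ft, sound_db)
print(f"r = {r:.3f}")
```

The sign of r gives the direction of the relationship and its distance from 0 gives the strength, which is how the -.896 above is read.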


Figure 3
 
According to this chart, the white population is very likely to make more money and not be below the poverty line. It shows a strong negative linear correlation between the white population and the population below the poverty level. The significance value is below .05, so the null hypothesis, that there is no linear correlation between white population and the poverty line, should be rejected. That means there is a linear correlation between white population and the poverty line. As a general trend, the white population appears to be better off in basically every category in this chart: they go to high school and graduate, many of them get a college education, and most of them are above the poverty line. For the Black and Hispanic populations, the chart shows that both are less likely to go to high school, graduate from high school, go to college, and be above the poverty line. The overall trend is that the white population has a better education and more money.

Part II: Spatial Autocorrelation 

Introduction

For this portion of the assignment we were given data from the Texas Election Commission (TEC) for the 1980 and 2008 presidential elections. The data included the percent Democratic vote and the voter turnout for each election. We also downloaded Hispanic population data at the county level from the U.S. Census. With this data we are supposed to look at the patterns of the elections and determine whether there is clustering in voting patterns as well as in voter turnout. The TEC is interested in whether the election patterns have changed between 1980 and 2008. The analysis is done with GeoDa and SPSS, two statistical computing programs.

Methods

The first step in the analysis was to bring a shapefile of the counties in Texas, downloaded from the U.S. Census website, into ArcMap. The Hispanic census data was also downloaded. For mapping purposes, the voting data then had to be joined to the shapefile in ArcMap. Next, in order to look at patterns in voter turnout and changes over time, spatial weights had to be set so that spatial autocorrelation could be calculated. With the weights in place, LISA cluster maps could be made in GeoDa as visual representations of the voting data. Before the maps were made, Moran's I was used to create a scatter plot for each of the voting data sets. These scatter plots let us visualize the patterns or correlation, if there is any, in each data set. Below are the results from the spatial autocorrelation and LISA cluster maps.
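As a rough illustration of what the weights and Moran's I are doing, here is a minimal sketch of the global Moran's I formula on a toy example. Everything in it is an assumption for illustration: the helper names `line_contiguity` and `morans_i`, the four regions in a row, and the simple binary (0/1) neighbor weights. GeoDa offers other weight types and may standardize them, so its numbers would not necessarily match this sketch.

```python
def line_contiguity(n):
    """Binary weights for n regions in a row: each region neighbors the next one."""
    w = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = 1
        w[i + 1][i] = 1
    return w

def morans_i(values, weights):
    """Global Moran's I: (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]          # deviations from the mean (z_i)
    s0 = sum(sum(row) for row in weights)     # total of all weights
    if s0 == 0:
        raise ValueError("weights matrix has no neighbors")
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    ss = sum(d * d for d in dev)
    if ss == 0:
        raise ValueError("Moran's I is undefined when all values are equal")
    return (n / s0) * (cross / ss)

w = line_contiguity(4)
trend = morans_i([1, 2, 3, 4], w)        # similar values sit next to each other
alternating = morans_i([1, 4, 1, 4], w)  # high and low values alternate
print(f"trend: {trend:.3f}, alternating: {alternating:.3f}")
```

In this toy case the smoothly increasing values give a positive Moran's I (clustering) and the alternating values give a negative one (dispersion), which is the same reading applied to the scatter plots in the results.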

Results

Figure 4 Scatter plot of Hispanic Population in 2010
Figure 5 LISA map of Hispanic population 2010
Looking at the Moran's I (Figure 4) for the Hispanic population in 2010, there is definitely a clustered pattern. The value of .77 is getting close to 1, which means there are many counties in Texas in High-High or Low-Low situations. In other words, many counties with a high or low Hispanic population sit next to other counties with the same high or low population. On the map (Figure 5), clusters of high Hispanic population (red) appear in the southern part of Texas along the border, and clusters of low Hispanic population (blue) appear in the northeastern part of the state. The white counties are those with no significant clustering: their Hispanic population is neither distinctly high nor distinctly low relative to their neighbors, so they are the more mixed counties.
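The High-High and Low-Low labels can be sketched in a few lines. The example below sorts each region into a Moran scatter plot quadrant by comparing its own value and the average of its neighbors to the overall mean. The function name `moran_quadrants`, the toy values, and the neighbor lists are all hypothetical. Note also that this only assigns quadrants; a LISA cluster map additionally runs a significance test, which is why many counties on the real maps stay white.

```python
def moran_quadrants(values, neighbors):
    """Label each region by (own value)-(neighbor average) relative to the mean.

    neighbors[i] is the list of indices of the regions adjacent to region i.
    Values exactly at the mean are treated as Low in this sketch.
    """
    mean = sum(values) / len(values)
    dev = [v - mean for v in values]
    labels = []
    for i, nbrs in enumerate(neighbors):
        if not nbrs:
            labels.append("No neighbors")
            continue
        lag = sum(dev[j] for j in nbrs) / len(nbrs)  # spatial lag of region i
        own = "High" if dev[i] > 0 else "Low"
        around = "High" if lag > 0 else "Low"
        labels.append(f"{own}-{around}")
    return labels

# Four hypothetical regions in a row: 0-1-2-3
neighbors = [[1], [0, 2], [1, 3], [2]]

clustered = moran_quadrants([1, 2, 8, 9], neighbors)
mixed = moran_quadrants([1, 9, 1, 9], neighbors)
print(clustered)
print(mixed)
```

The first call gives two Low-Low regions followed by two High-High regions, the kind of pattern shown in blue and red on the maps. The second gives alternating Low-High and High-Low labels, which correspond to the lighter colors.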
Figure 6 Scatter plot of percentage of Democratic vote in 2008
Figure 7 LISA map of percentage of Democratic vote 2008
For the percentage of Democratic vote in 2008, the Moran's I value (Figure 6) is again fairly high, which means there is a decent amount of clustering. In this case the clusters are areas of high Democratic vote (red) next to other areas of high vote, and areas of low Democratic vote (blue) next to other areas of low vote. A few counties are in the lighter colors, which mark a county of high Democratic vote next to counties of low vote (light red) or the other way around (light blue). The white areas are counties with a mixed vote and no significant clustering. On the map (Figure 7), the southern part of the state votes primarily Democratic while the northern section tends to be low in Democratic vote.

Figure 8 Scatter plot of voter turnout in 2008

Figure 9 LISA map of voter turnout 2008
The statistic considered here is voter turnout by county. The Moran's I value (Figure 8) shows that there is less clustering in this data than in the previous two data sets. There is an interesting pattern between this map (Figure 9) and the previous map (Figure 7): the southern tip of Texas shows a large cluster of low voter turnout on this map, while on the previous map that very same area had a high percentage of Democratic votes. One possible explanation is that this is an agricultural area where farmers do not want to drive the distance to a voting location, so turnout is low, but the people who do vote tend to vote Democratic, possibly in support of immigration for employees to work on their farms.


Figure 10 Scatter plot of percentage of Democratic vote in 1980

Figure 11 LISA map of percentage of Democratic vote 1980
The next data set is the percentage of Democratic vote in 1980. According to the Moran's I value (Figure 10), there was more clustering and spatial autocorrelation in the Democratic vote for the 1980 election than for the 2008 election. One possible reason is that people's views on who they want in office, and what values that person should have, have changed. This change over time leads to more counties with a mixed vote that is neither high nor low Democratic. Comparing Figure 11 and Figure 7, the areas of clustering are pretty similar. The high Democratic votes are in the south, especially the tip of Texas. The low Democratic votes are in the northern part of the state, although they have moved eastward a bit. The areas of high Democratic vote have moved out of the eastern part of the state in 1980 toward the west and the border by 2008.

Figure 12 Scatter plot of voter turnout in 1980
Figure 12 LISA map of voter turnout 1980

Finally, the results for the 1980 voter turnout are shown in the scatter plot and LISA map, both labeled Figure 12. Comparing the 1980 LISA map to Figure 9 for 2008, there is more clustering in 1980 than in 2008. The southern tip stays consistent with very low turnout values, and the northern part of the state has high turnout in both elections but less clustering in 2008. The central part of the state has areas of high turnout in 1980 and even more so in 2008.

Conclusion

Based on the LISA maps and the scatter plots created to explore the above variables, a couple of patterns were observed. First, comparing the Hispanic population in 2010 (Figure 5) with both elections, 1980 (Figure 11) and 2008 (Figure 7), it is very obvious that areas of high Hispanic population match up very well with areas of high Democratic vote. This overlap is mainly in the southern tip of Texas and could all be fueled by agriculture. The more farms there are, the more jobs are available, which attracts Hispanic workers who need to support their families, so you get a cluster of high Hispanic population. The farmers in these areas tend to vote Democratic, possibly to keep their workers at the farms and keep their cheap labor. Another pattern that makes sense is the low voter turnout in this same area for both elections. These counties may be full of migrant workers who cannot vote or choose not to, so even though the counties may have lots of people in them, voter turnout will be low. Overall, the Hispanic population in Texas is much more concentrated in the southern part of the state than in the north.
The clustering patterns for both elections are fairly similar, with low voter turnout in the south and higher turnout in the middle and northern parts of the state. The high Democratic clusters are mostly in the south as well, while the north has low Democratic clusters. Looking at all three variables together, an overall pattern seems to emerge: the higher the Hispanic population in an area, the more Democratic the area votes and the lower the voter turnout will be (Figure 13). So voter turnout and Hispanic population have a negative correlation: as the Hispanic population increases, voter turnout decreases (Figure 14). With the Democratic vote we see a positive correlation: as the Hispanic population increases, so does the percent Democratic vote (Figure 15). In Figures 13 through 15 we would reject the null hypothesis and state that there is a linear correlation between the variables. Overall, the clustering patterns did not change very much between the two elections. They moved slightly and had slightly higher or lower Moran's I values, but for the most part the two elections show similar clustering patterns of voter turnout and Democratic vote.
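The decision to reject the null hypothesis can also be illustrated with a permutation test, sketched below. This is a swapped-in technique for illustration, not the significance test SPSS reports, and the county percentages are invented, not the Texas data. The idea is to shuffle one variable many times and count how often a shuffled correlation is at least as strong as the observed one; a small share means the observed correlation is unlikely under the null hypothesis of no linear correlation.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the null of no linear correlation."""
    rng = random.Random(seed)        # fixed seed so the sketch is repeatable
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)        # break any real pairing between x and y
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add 1 to both counts so the p-value is never exactly zero
    return (hits + 1) / (n_perm + 1)

# Hypothetical county values, NOT the Texas data
hispanic_pct = [5, 12, 18, 25, 33, 41, 56, 68, 80, 91]
turnout_pct = [62, 60, 58, 59, 52, 50, 44, 41, 35, 30]

r = pearson_r(hispanic_pct, turnout_pct)
p = permutation_p_value(hispanic_pct, turnout_pct)
print(f"r = {r:.3f}, p = {p:.4f}")
if p < 0.05:
    print("Reject the null hypothesis of no linear correlation")
```

With these made-up numbers the correlation is strongly negative and the p-value falls below .05, which mirrors the reasoning used for Figures 13 through 15.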
Figure 13 Democratic vote vs voter turnout 2008


Figure 14 Hispanic population vs voter turnout 2008

Figure 15 Hispanic population vs Democratic vote 2008
