If you're new to this blog or my posts on election fraud, my two prior posts serve as a good primer, found here and here.
A quick thanks to one commenter on my blog, jimrtex, who provided great commentary on prior results and a thick description of the underlying precinct design factors, which may descend from demographic and historical structures and could be causing the correlation we found. I recommend using his comments as background on the new variables I define below.
This post may get a little nerdy in a second, and while I recognize some of us live in the world of regression coefficients, I also understand that others just want to know what I found. Here's a short synopsis:
- I changed data sets to the 2008 Presidential election. This is valid because the correlation of interest (Republicans doing better in larger precincts) still holds up.
- Nearly every demographic covariate I threw at the equation was statistically significant and more important than the "number of voters" variable.
- If I build a larger predictive model using other variables such as population density, county size, and relationships between a precinct and its county, the number of voters voting in the precinct becomes statistically insignificant.
Ok, those points may still seem a little technical, but here's what they mean: The original analysis was looking at a very small relationship in a world where much more important relationships exist. And if we look at the data in a way where we simultaneously account for multiple factors, the correlation from Clarkson's original commentary is simply non-existent.
And now on to the technical analysis. I switched to the 2008 presidential election in Kansas because I found a file with additional demographic attributes, and the base relationship still holds up. I also added some attributes that compare each individual precinct to its county of membership as a whole. Here's a list of variables defined:
- pres_perc_r: Percent voting Republican for President.
- t_08: total votes in 2008 presidential elections
- perc_voting: percent of voting age population voting in election
- aland10: land area of the precinct
- perc_vote_age: percent of precinct that is voting age
- area_to_county: relationship of precinct size in area to average precinct in county
- pop_county: population of the county of membership
- change: population change of precinct between elections
- perc_change: change as percent of initial population
- pop_dense: density of population
- perc_county: percent growth in relation to other precincts in the county
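For readers who want to see how variables like these could be derived, here is a minimal sketch in Python. It is not the code behind this analysis. The raw field names (votes_r, vap, pop, pop_prev, county) are hypothetical, the toy numbers are invented, values are kept as fractions rather than percentages, and the ratio used for area_to_county is only one plausible reading of the definition above.

```python
from collections import defaultdict

def derive_variables(precincts):
    """Add derived columns to precinct dicts (raw field names are hypothetical)."""
    # Mean land area of the precincts in each county, needed for area_to_county.
    areas = defaultdict(list)
    for p in precincts:
        areas[p["county"]].append(p["aland10"])
    mean_area = {c: sum(a) / len(a) for c, a in areas.items()}

    out = []
    for p in precincts:
        row = dict(p)
        row["pres_perc_r"] = p["votes_r"] / p["t_08"]      # share voting Republican
        row["perc_voting"] = p["t_08"] / p["vap"]          # turnout among voting-age pop.
        row["perc_vote_age"] = p["vap"] / p["pop"]         # share of precinct of voting age
        row["change"] = p["pop"] - p["pop_prev"]           # population change
        row["perc_change"] = row["change"] / p["pop_prev"] # change relative to initial pop.
        row["pop_dense"] = p["pop"] / p["aland10"]         # people per unit of land area
        # Assumption: precinct area divided by the county's mean precinct area.
        row["area_to_county"] = p["aland10"] / mean_area[p["county"]]
        out.append(row)
    return out

# Two made-up precincts in one made-up county.
toy = [
    {"county": "A", "votes_r": 60, "t_08": 100, "vap": 200,
     "pop": 250, "pop_prev": 200, "aland10": 10.0},
    {"county": "A", "votes_r": 300, "t_08": 600, "vap": 800,
     "pop": 1000, "pop_prev": 1000, "aland10": 30.0},
]
rows = derive_variables(toy)
```

Computing the county-level mean first keeps the per-precinct loop simple, and the same pattern would extend to any other precinct-versus-county comparison.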
So, first things first: did Clarkson's initial simple correlation hold up? Absolutely, and here's the evidence.
But what else correlates with the % that votes Republican in an election? A lot of our variables, it turns out. Here's a correlation matrix. Notice that size of precinct actually has the lowest absolute correlation value. Also of note, many of the variables are cross-correlated with size of precinct, pointing towards multicollinearity.
So, what happens if I throw some of these variables that a priori make sense at a model? Number of voters in the precinct is no longer significant, but other variables end up highly significant. One variable of note is the percent voting: as the percent of voters voting increases, the percent Republican increases significantly. This is partial verification of a prior concern I had, namely that higher turnout in Republican districts is the underlying cause of Clarkson's correlation.
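As a rough illustration of what a correlation matrix like this involves, here is a small pure-Python sketch. The numbers are invented for illustration and are not the Kansas data, and only three of the variables are shown; the helper names pearson and correlation_matrix are just labels for this example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Return {name_a: {name_b: r}} for every pair of named columns."""
    return {a: {b: pearson(xa, xb) for b, xb in columns.items()}
            for a, xa in columns.items()}

# Toy values, made up for this example only.
columns = {
    "pres_perc_r": [0.70, 0.65, 0.55, 0.60, 0.45],
    "t_08":        [120, 300, 450, 800, 1500],
    "pop_dense":   [5.0, 20.0, 60.0, 150.0, 900.0],
}
corr = correlation_matrix(columns)
for name, row in corr.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```

The row for pres_perc_r is the one to compare against the outcome, while the off-diagonal entries among the predictors are where cross-correlation between covariates would show up.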
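To make the mechanism concrete, here is a small sketch of how a variable can look predictive on its own and then drop out once a correlated variable enters the model. The data are synthetic and built by construction so that the outcome depends only on turnout, while precinct size merely correlates with turnout. This is not the model or the data from this post, and it only shows coefficients, not the significance tests a real regression package would report.

```python
def ols(X, y):
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Synthetic precincts: size (voters, in hundreds) tends to rise with turnout.
turnout = [0.40, 0.50, 0.55, 0.60, 0.70, 0.75]
size = [1.0, 3.0, 2.0, 5.0, 4.0, 7.0]
# By construction, the Republican share depends on turnout only.
y = [0.2 + 0.5 * t for t in turnout]

b_size_only = ols([[s] for s in size], y)
b_both = ols([[s, t] for s, t in zip(size, turnout)], y)

print("size alone:     ", [round(b, 4) for b in b_size_only])
print("size + turnout: ", [round(b, 4) for b in b_both])
```

In the size-only fit the size coefficient is positive because size is standing in for turnout. Once turnout is included, the size coefficient collapses towards zero. Real data are noisier, so in practice the comparison is made with standard errors and p-values rather than exact zeros.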
Conclusion
Statistics: I spent quite a bit of effort here demonstrating that the small correlation found by Clarkson and previous authors is most likely due to other correlated variables. These variables generally measure demographic factors and precinct design concerns (and correlate conceptually with the ideas from commenters on this blog and elsewhere).
Politics: Given the statistics of this, it is still of concern to me that a statistician goes to the media with an anomaly that is almost completely untested, and (the way it was reported in the media) can lead to massive accusations of fraud. Given the nature of our electoral democracy, this has a tendency to call the entire system into question, and is certainly a reminder for all statisticians to be very careful in reporting results.