Monday, April 20, 2015

Kansas Election Fraud: Pt. 3 ... The end?

My prior posts on Kansas election fraud continue to generate quite a bit of traffic, so I thought I should revisit the issue and bring it to some kind of conclusion. Those posts left quite a few open questions, and I was curious how well other metrics might predict voting outcomes by precinct.

If you're new to this blog or my posts on election fraud, my two prior posts serve as a good primer, found here and here.

A quick thanks to one commenter on my blog, jimrtex. This commenter provided great commentary on prior results, along with a thick description of underlying precinct design factors, rooted in demographic and historical structures, that could cause the correlation we found. I recommend using his comments as background to the new variables defined below.

Summary Results


This post may get a little nerdy in a second, and while I recognize some of us live in the world of regression coefficients, I also understand that others just want to know what I found.  Here's a short synopsis:
  1. I changed data sets to the 2008 presidential election. This is valid because the correlation of interest (Republicans doing better in larger precincts) still holds up.
  2. Nearly every demographic covariate I threw at the equation was statistically significant and more important than the "number of voters" variable.
  3. If I build a larger predictive model, using other variables such as population density, county size, and the relationship between each precinct and its county, the number of voters voting in the precinct becomes statistically insignificant.

Ok, those points may still seem a little technical, but here's what they mean: The original analysis was looking at a very small relationship in a world where much more important relationships exist.  And if we look at the data in a way where we simultaneously account for multiple factors, the correlation from Clarkson's original commentary is simply non-existent.


Nerdiness


And now on to the technical analysis. I switched to the 2008 presidential election in Kansas because I found a file with additional demographic attributes, and the base relationship still holds up. I also added some attributes that compare each precinct to its county as a whole. Here are the variables I defined:

  • pres_perc_r: percent voting Republican for president
  • t_08: total votes in the 2008 presidential election
  • perc_voting: percent of voting age population voting in election
  • aland10: land area of the precinct
  • perc_vote_age: percent of precinct that is voting age
  • area_to_county: relationship of precinct size in area to average precinct in county
  • pop_county: population of the county of membership
  • change: population change of precinct between elections
  • perc_change: change as percent of initial population
  • pop_dense: density of population
  • perc_county: percent population growth relative to other precincts in the county
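Most of the derived variables above are simple ratios of raw precinct data. Here is a minimal Python sketch of how a few of them could be computed, assuming a per-precinct record with land area and population at two points in time. The `Precinct` fields `pop_prev` and `pop_now`, the toy precincts, and the choice to compute county population by summing precinct populations are all hypothetical, not necessarily the exact definitions used in this analysis. `perc_county` is left out because its exact formula isn't spelled out here.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Precinct:
    name: str        # hypothetical precinct identifier
    county: str      # hypothetical county label
    aland10: float   # land area of the precinct
    pop_prev: float  # hypothetical: population at the earlier election
    pop_now: float   # hypothetical: population at the later election

def derive(precincts):
    """Compute a few of the derived variables for each precinct."""
    area_by_county = defaultdict(list)
    pop_by_county = defaultdict(float)
    for p in precincts:
        area_by_county[p.county].append(p.aland10)
        # Assumption: county population = sum of its precincts' populations
        pop_by_county[p.county] += p.pop_now
    mean_area = {c: sum(a) / len(a) for c, a in area_by_county.items()}

    derived = {}
    for p in precincts:
        change = p.pop_now - p.pop_prev
        derived[p.name] = {
            "change": change,
            "perc_change": change / p.pop_prev,
            "pop_dense": p.pop_now / p.aland10,
            # Precinct area relative to the average precinct in its county
            "area_to_county": p.aland10 / mean_area[p.county],
            "pop_county": pop_by_county[p.county],
        }
    return derived

# Toy input, for illustration only
precincts = [
    Precinct("A", "X", 2.0, 100, 110),
    Precinct("B", "X", 6.0, 300, 300),
    Precinct("C", "Y", 10.0, 50, 40),
]
derived = derive(precincts)
```

In this toy input, precinct A has half the area of the average precinct in county X, so its `area_to_county` is 0.5.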
I won't delve deeply into the a priori theory behind each variable, but generally speaking, the variables were designed to measure underlying demographic factors, as well as the precinct design concerns raised previously by commenters on this blog.


So, first things first: did Clarkson's initial simple correlation hold up? Absolutely, and here's the evidence.


But what else correlates with the percentage voting Republican? A lot of our variables, it turns out. Here's a correlation matrix. Notice that precinct size actually has the lowest absolute correlation. Also of note, many of the variables are cross-correlated with precinct size, pointing towards multicollinearity.


So, what happens if I put the variables that make sense a priori into a single model? The number of voters in the precinct is no longer significant, but other variables end up highly significant. One variable of note is the percent voting: this one is important because, as the percent of voters voting increases, the percent Republican increases significantly. This partially verifies a concern I raised earlier, namely that higher turnout in Republican districts is the underlying cause of Clarkson's correlation.
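To see how a variable can look meaningful on its own and then drop out of a fuller model, here is a minimal pure-Python sketch on synthetic data (not the Kansas data). In this hypothetical toy world, Republican share depends only on turnout, and precinct size merely tracks turnout plus some unrelated variation. Size correlates positively with Republican share by itself, but an ordinary least-squares fit with both variables gives size a coefficient of essentially zero. The sketch reports coefficients only, not significance tests.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ols(columns, y):
    """Least-squares coefficients via the normal equations (intercept first)."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[a] * r[b] for r in X) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(X, y))] for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Synthetic precincts (hypothetical)
n = 20
turnout = [0.40 + 0.02 * i for i in range(n)]
# Size tracks turnout, plus variation unrelated to it
size = [500 + 1000 * t + 200 * (-1) ** i for i, t in enumerate(turnout)]
# Republican share depends only on turnout in this toy world
perc_r = [0.20 + 0.50 * t for t in turnout]

simple_r = pearson(size, perc_r)
intercept, b_turnout, b_size = ols([turnout, size], perc_r)
```

Here `simple_r` comes out around 0.44, while `b_size` is zero up to rounding error: the apparent size effect is carried entirely by the correlated turnout variable.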


Conclusion

Statistics: I spent quite a bit of effort here demonstrating that the small correlation found by Clarkson and previous authors is most likely due to other correlated variables.  These variables generally measure demographic factors and precinct design concerns (and correlate conceptually with the ideas from commenters on this blog and elsewhere).

Politics: Given these statistics, it still concerns me that a statistician went to the media with an anomaly that was almost completely untested and that, the way it was reported in the media, can lead to sweeping accusations of fraud. Given the nature of our electoral democracy, this tends to call the entire system into question, and it is certainly a reminder for all statisticians to be very careful in reporting results.

6 comments:

  1. It does call the entire system into question. It casts doubt on two brands of models of voting machines. The same brands and models of voting machines are producing statistically significantly different results than other voting machines and/or paper ballots. She looked at several elections in different states and in different years. Maybe these models of voting machines do not work right for totally innocent reasons, but whatever the reason, if they are counting the votes wrong, every single election comes into question, and they should be replaced.

    Replies
    1. Thank you for your comment. The general point is that there isn't good statistical evidence that the machines are working incorrectly. The evidence is that there is an underlying correlation, but once we recognize that the world is complex and that the creation and existence of precincts is not a random, stochastic process, we see that correlation disappear completely.

  2. The statistician would not have gone to the media if the state of Kansas had allowed her to access the data she needs to do her research. You can blame Kris Kobach for the fact that this made the news.

    If there is no foul play here then those who oversee elections should welcome the scrutiny. The data won't lie. If she gets access to the tapes, and her audit checks out, we can all have more confidence in using electronic voting machines (which many have demonstrated are vulnerable).

    Replies
    1. I fully agree with you that she should get access, and that will be the final decider. My analysis does not conclude that the voting machines MUST be working correctly, nor that, if they are working incorrectly, the correlation she found is even evidence of it.

  3. I did an analysis comparing registration statistics and turnout for Sedgwick County.

    Kansas election geography is complex due to a requirement that voting precincts must be contiguous and respect city, township, legislative district, and other electoral boundaries. Annexations by Wichita and other cities have chopped up many townships, and legislative districts may force further divisions. Of Sedgwick County's 440 precincts, 130 have no registered voters and 199 have fewer than 50. To reduce the chance of vote disclosure, 100 precincts were consolidated into 40 consolidated precincts, each sharing a polling place and a common ballot style. This produces a total of 250 precincts with individual vote tallies, which I used in my analysis (251 were reported, but one had no registered voters and no votes cast). Despite the consolidations, 17 of the 250 had fewer than 50 registered voters. These were generally due to chopped-up areas. Only Erie township, in the extreme southwestern corner, has a registration (52) that indicates a truly rural area. While statewide a small precinct may indicate a rural area, in Sedgwick County it indicates the fringe of an expanding city, leaving township fragments.

    Sedgwick County used 64 polling places, but the vote totals were separately reported for the 250 consolidated precincts.

    Sedgwick County reported registered voters for the consolidated precincts for the 2014 election, but did not include a party breakout. A May 22, 2015 report does include party information for the unconsolidated precincts, and that is what I used, after summing the 2015 data to match the consolidated precincts.

    The correlation between the 2015 registration data and the 2014 registration data (total numbers) is 0.999. The slope of the least-squares fit of 2015 registration vs. 2014 registration is 0.9796, and total 2015 registration is 98.11% of total 2014 registration. Conclusion: the 2015 registration is a reasonable facsimile of registration in 2014, including party distribution.
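    The three numbers in the paragraph above (correlation, least-squares slope, and ratio of totals) can be computed as in the following Python sketch. The function name and the four-precinct input are made up for illustration; they are not Sedgwick County figures. It assumes the 2015 counts have already been summed to match the consolidated precincts, so the two lists line up precinct by precinct.

```python
def compare_registration(reg_2014, reg_2015):
    """Pearson r, least-squares slope (2015 on 2014), and ratio of totals."""
    n = len(reg_2014)
    mean_14 = sum(reg_2014) / n
    mean_15 = sum(reg_2015) / n
    sxy = sum((x - mean_14) * (y - mean_15) for x, y in zip(reg_2014, reg_2015))
    sxx = sum((x - mean_14) ** 2 for x in reg_2014)
    syy = sum((y - mean_15) ** 2 for y in reg_2015)
    r = sxy / (sxx * syy) ** 0.5
    slope = sxy / sxx  # slope of the least-squares line of 2015 on 2014
    ratio = sum(reg_2015) / sum(reg_2014)
    return r, slope, ratio

# Hypothetical registration counts for four consolidated precincts
r, slope, ratio = compare_registration([100, 200, 300, 400], [98, 196, 294, 392])
```

    In this toy input every 2015 count is exactly 98% of the 2014 count, so all three statistics agree (r = 1, slope = ratio = 0.98); real data would show small departures, as in the 0.999 / 0.9796 / 98.11% reported above.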

    We can sort the precincts by 2015 registration, smallest to largest, and calculate the cumulative Republican registration percentage. This is similar to the test Clarkson performed, but with some differences.

    Clarkson: Ordered by votes cast; Cumulative vote percentage for Republican candidate;
    Jimrtex: Ordered by registered voters; cumulative Republican registration percentage.
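    The registration-ordered version of the test can be sketched in Python as follows. Each precinct is assumed to be a hypothetical `(registered, republican)` pair; passing `(votes_cast, republican_votes)` pairs instead would give the vote-ordered variant Clarkson used. The three-precinct input is illustrative only.

```python
def cumulative_rep_share(precincts):
    """Cumulative Republican share after sorting precincts smallest to largest.

    precincts: list of (registered, republican) pairs.
    """
    ordered = sorted(precincts, key=lambda p: p[0])
    total = rep = 0
    shares = []
    for registered, republican in ordered:
        total += registered
        rep += republican
        shares.append(rep / total)  # running Republican percentage so far
    return shares

# Hypothetical precincts: (registered voters, Republican registrants)
shares = cumulative_rep_share([(300, 120), (100, 60), (200, 60)])
```

    The running share swings most over the first few (smallest) precincts and settles toward the overall value as larger precincts are added.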

    The cumulative Republican registration percentage quickly gets close to its final value and stays there, rising and falling only according to whether there is a run of Democratic precincts. For the 210 largest precincts, the mean Republican registration is 40.6%, while the median is 43.2%. Democratic-leaning precincts tend to be farther below the mean than Republican precincts are above it, but there are fewer of them. Upticks will be more common than downticks, but the downticks will tend to be larger.

    That is, there does not appear to be any relationship between total registration and Republican registration percentage. The correlation for the 210 largest precincts is -0.034.

  4. ... continuing ...

    I then added the number of votes cast in 2014. This is not specific to any race, but simply the total number of voters. I calculated turnout as votes cast/2015 registration. Turnout varied greatly and is strongly correlated with the Republican registration share.

    The correlation between turnout percentage and Republican registration for the 210 largest precincts is 0.885. For all 250 precincts it is 0.657, so there is merit in Clarkson dropping smaller precincts, but the 500 cutoff is too aggressive. The correlations for the 196 and 146 largest precincts are 0.891 and 0.884, respectively.
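    Restricting the correlation to the largest precincts, as in the comparison above, might look like the following Python sketch. The `(registered, rep_share, turnout)` tuple layout, the function name, and the five toy precincts are hypothetical; turnout is assumed to be votes cast divided by registration, as described earlier in this comment.

```python
def corr_largest(precincts, k):
    """Pearson r between Republican share and turnout for the k largest precincts."""
    # Keep the k precincts with the most registered voters
    largest = sorted(precincts, key=lambda p: p[0], reverse=True)[:k]
    xs = [p[1] for p in largest]
    ys = [p[2] for p in largest]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical precincts: (registered voters, Republican share, turnout)
precincts = [
    (1000, 0.30, 0.45),
    (900, 0.50, 0.59),
    (800, 0.40, 0.52),
    (20, 0.90, 0.10),  # tiny precincts add noise
    (10, 0.10, 0.90),
]
r_top3 = corr_largest(precincts, 3)
r_all = corr_largest(precincts, 5)
```

    In the toy data the three large precincts lie on a straight line, so `r_top3` is essentially 1, while including the two tiny, noisy precincts drags the correlation down; the same idea motivates comparing cutoffs like 250, 210, 196, and 146 precincts.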

    A least-squares fit of turnout vs. Republican registration percentage for the 210 largest precincts gives a slope of 67.6% and an intercept of 25.5%. For every 1-percentage-point increase in Republican registration share, turnout increases by 0.676 percentage points. A precinct with 0% Republicans would be projected to have 25.5% turnout; a precinct with 100% Republicans would be projected to have 93.1% turnout.
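    The fitted line can be written as a one-line function. This Python sketch just plugs in the reported intercept and slope as fractions; the function and constant names are illustrative, and the line summarizes the 210 largest precincts rather than modeling any individual precinct.

```python
# Intercept and slope reported above, expressed as fractions
INTERCEPT = 0.255
SLOPE = 0.676

def projected_turnout(rep_share):
    """Projected turnout (fraction) for a Republican registration share (fraction)."""
    return INTERCEPT + SLOPE * rep_share

for share in (0.0, 0.4, 1.0):
    print(f"{share:.0%} Republican -> {projected_turnout(share):.1%} projected turnout")
```

    The endpoints reproduce the figures above: 25.5% at 0% Republican and 25.5% + 67.6% = 93.1% at 100% Republican.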

    The strong relationship between Republican vote share and turnout has the effect of shifting Republican-leaning precincts to the right in Clarkson's analysis. A precinct with 1,000 registered voters and 40% turnout will cast 400 votes. With 60% turnout it will cast 600 votes, and those voters will be more Republican.
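    The reordering effect can be illustrated with a short Python sketch that assumes, for illustration only, that turnout follows the fitted line exactly. Hypothetical precinct B has fewer registered voters than A but is more Republican, so it casts more votes and moves to the right of A when precincts are ordered by votes cast, as in Clarkson's analysis. The names and numbers are made up.

```python
def votes_cast(registered, rep_share, intercept=0.255, slope=0.676):
    """Expected votes cast, using the turnout fit as a stand-in model (assumption)."""
    turnout = intercept + slope * rep_share
    return registered * turnout

# Hypothetical precincts: name -> (registered voters, Republican share)
precincts = {"A": (1200, 0.30), "B": (1000, 0.60)}

# Ordering by registration vs. ordering by votes cast
by_registration = sorted(precincts, key=lambda name: precincts[name][0])
by_votes = sorted(precincts, key=lambda name: votes_cast(*precincts[name]))
```

    Here A is larger by registration (1,200 vs. 1,000) but casts roughly 549 votes to B's roughly 661, so the more Republican precinct lands further right in a votes-cast ordering.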

    If we then order the precincts by turnout, but continue to calculate the cumulative Republican registration, it will rise as we add in larger precincts. The cumulative Republican registration percentage hits a minimum of 34.1% after we add a precinct that is only 16.8% Republican. But turnout there was only 24.9%: only 365 of its 1,466 registered voters showed up. Pat Roberts got 35.6% of the vote.

    The cumulative Republican percentage increased as we added in precincts with more votes cast.

    The simple conclusion is that in Sedgwick County, Republicans were more likely to show up to vote.

    An alternative explanation is that the voter rolls were salted with Republican "voters". If this were the case, then Clarkson's proposed audit won't show anything, unless the machines were programmed to simply add in Republican votes at the start of the day.

    But what if the number of voters who signed the voting roll matches the number of votes cast, the names signed on the voting roll were actual registered voters (particularly Republicans), and they actually live at the address in the registration record?
