Wednesday, April 8, 2015

Kansas Election Fraud pt. 2

Yesterday's post on election fraud issues in Kansas got quite a bit of response, so I thought I'd follow up with an additional analysis. Also, R was still open, with my data frame loaded when I got up this morning, so... what the hell.

Oh, and this analysis moves quickly into some fairly technical areas, but I think most of them can be understood in general terms. If you have any methodological questions, please post in the comments.

Yesterday's Analysis

My biggest contention about yesterday's model (which was really my implementation of Clarkson's model) was that underlying, unmeasured demographic factors were likely causing the correlation. So my goal here (and, in theory, over a series of future posts) is to systematically look at other possibilities.

Also, the R-squared (a common metric for how well a regression model fits the data) was VERY low (about 0.02). The model is still statistically significant; it's just not very predictive.


I only had one additional variable I could use in my data frame: the county each precinct was located in. Election results vary a lot by county, and counties also vary in their demographics. For those reasons, county can serve as a proxy for demographic and regional variation.

In this case, suppose demographic factors that vary significantly by county are responsible for the correlation between precinct size and Republican vote share. Then adding county to the model should decrease the importance of the precinct size variable. That's not what happened.

New Model

Methodology: I fit the same model as yesterday, but added a "fixed effect" for the county of each precinct (in effect, a separate intercept for each county). That's it. And by the way, I probably shouldn't use the term "fixed effect," because it's confusing and every statistician uses it differently.
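Here is a minimal sketch of what adding a county term does mechanically. It is written in Python rather than the R used for the actual model, and it runs on a handful of made-up precincts; the counties "A" and "B" and every number are hypothetical. The sketch relies on the Frisch-Waugh-Lovell result. That result says the precinct-size slope from a regression with one dummy per county equals the slope from regressing county-demeaned vote share on county-demeaned precinct size.

```python
from collections import defaultdict


def _mean(values):
    return sum(values) / len(values)


def pooled_fit(x, y):
    """Simple OLS of y on x (with intercept). Returns (slope, r_squared)."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    # For simple regression, the explained sum of squares is sxy^2 / sxx.
    return slope, (sxy * sxy / sxx) / syy


def county_fe_fit(x, y, county):
    """OLS of y on x plus one dummy per county, via within-county demeaning.

    By Frisch-Waugh-Lovell, the slope on x and the residuals equal those from
    regressing county-demeaned y on county-demeaned x.
    """
    groups = defaultdict(list)
    for i, c in enumerate(county):
        groups[c].append(i)
    xd = [0.0] * len(x)
    yd = [0.0] * len(y)
    for idx in groups.values():
        gx = _mean([x[i] for i in idx])
        gy = _mean([y[i] for i in idx])
        for i in idx:
            xd[i] = x[i] - gx
            yd[i] = y[i] - gy
    slope = sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)
    ssr = sum((b - slope * a) ** 2 for a, b in zip(xd, yd))
    my = _mean(y)
    sst = sum((b - my) ** 2 for b in y)  # total variation around the overall mean
    return slope, 1 - ssr / sst


# Hypothetical precincts: votes cast, GOP share, and county label.
votes = [600, 700, 800, 900, 200, 300, 400, 500]
gop_share = [0.41, 0.42, 0.43, 0.47, 0.38, 0.41, 0.41, 0.44]
county = ["A", "A", "A", "A", "B", "B", "B", "B"]

pooled_slope, pooled_r2 = pooled_fit(votes, gop_share)
fe_slope, fe_r2 = county_fe_fit(votes, gop_share, county)
```

On this toy data the county terms raise both the slope and the R-squared, but that is only an illustration of the mechanism, not a reproduction of the Kansas model. In R, the equivalent specification would be something like `lm(gop_share ~ votes + factor(county), data = precincts)`, where the column names are hypothetical.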

Here's a summary of what happened in the models:

  1. The R-squared shot through the roof (this is expected), from 2% to 46%.
  2. The effect (parameter estimate) of the precinct size variable increased significantly (not expected).
  3. The statistical significance of the precinct size variable increased significantly (not expected).
R output:

Essentially, I expected the county-based control to decrease the correlation of interest, but precinct size actually increased in importance. In simple terms, this generally means the county controls cleared up some of the unexplained variance. That gave a clearer view of the data, and in that view precinct size is even more important.

What does this mean?

The weird correlation is still there, and it is stronger once we control for some other exogenous factors. I'll need to dig into additional data to figure out the real root cause.

Also, I thought of an additional possibility, though it's still in the early phases of ideation.  Here are the basics: 
  • There's a potential endogenous relationship between % Republican voters and number of voters.
  • If precincts are formed based on census/geographic tracts, then turnout is a key intervening variable.
  • If conservative turnout is statistically "better," then a more Republican precinct could cause our independent variable (voters in precinct) to rise significantly.
I'm still thinking this one through, though, so any input is appreciated.
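Here is a minimal sketch of that turnout mechanism, under loudly hypothetical assumptions. Every precinct has the same 1,000 registered voters. Republicans turn out at 60% and everyone else at 50% (made-up rates). Voters vote strictly along those lines. Even with no fraud and identical precinct populations, votes cast and GOP share then rise together.

```python
def precinct_outcome(registered, rep_share, rep_turnout, other_turnout):
    """Return (votes_cast, gop_share) for one hypothetical precinct.

    Simplifying assumption: every Republican who votes votes GOP, and every
    other voter who votes does not.
    """
    rep_votes = registered * rep_share * rep_turnout
    other_votes = registered * (1 - rep_share) * other_turnout
    votes_cast = rep_votes + other_votes
    return votes_cast, rep_votes / votes_cast


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical: five equal-size precincts that differ only in Republican share.
outcomes = [precinct_outcome(1000, share, rep_turnout=0.60, other_turnout=0.50)
            for share in (0.3, 0.4, 0.5, 0.6, 0.7)]
votes_cast = [v for v, _ in outcomes]
gop_share = [s for _, s in outcomes]
corr = pearson(votes_cast, gop_share)
```

The point is only that differential turnout alone can produce a positive precinct size / GOP share relationship. Whether Kansas turnout actually behaves this way is exactly the open question.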


  1. Clarkson's claim is that the Republican percentage declines as a function of votes cast and then increases, with a transitional point around 500. But what if there were no initial decline, or the transition were at a different point than 500? Limiting your analysis to precincts with 500+ votes does not allow you to rule out these possibilities.

    Clarkson used the cumulative GOP percentage to identify the precinct size where the cumulative GOP percentage stops declining and begins to increase. But this is not necessarily where the GOP percentage in individual precincts begins to increase.

    This is comparable to the relationship between the deficit and the national debt. The deficit has been declining; that is, the difference between revenue and spending is increasing in value. It is still negative, but its first derivative is positive. The debt continues to increase (becoming more negative), because the debt is the cumulative deficit. Not until there is a surplus will the debt begin to decline. What Clarkson is identifying is equivalent to finding where the debt begins to decline (e.g., 2034), but the deficit began to improve in 2009. Clarkson's method will therefore place the transitional point further to the right (at more votes cast) than it really is.
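    To make the deficit/debt analogy concrete, here is a small sketch with eight made-up precincts; the numbers are hypothetical, not Kansas data. Precincts are sorted by votes cast. At each size, the cumulative GOP share is total GOP votes divided by total votes cast over every precinct up to that size. The per-precinct share bottoms out at 400 votes, but the cumulative share keeps falling until 600. It only turns upward once a precinct's own share exceeds the running cumulative figure.

```python
def cumulative_gop_share(precincts):
    """Sort (votes_cast, gop_votes) pairs by votes cast and return
    (votes_cast, per_precinct_share, cumulative_share) tuples."""
    rows = []
    total_votes = 0
    total_gop = 0
    for votes, gop in sorted(precincts):
        total_votes += votes
        total_gop += gop
        rows.append((votes, gop / votes, total_gop / total_votes))
    return rows


# Hypothetical precincts as (votes_cast, gop_votes).
precincts = [(100, 60), (200, 110), (300, 150), (400, 180),
             (500, 235), (600, 288), (700, 385), (800, 520)]
rows = cumulative_gop_share(precincts)

precinct_turn = min(rows, key=lambda r: r[1])[0]    # lowest per-precinct share
cumulative_turn = min(rows, key=lambda r: r[2])[0]  # lowest cumulative share
print(precinct_turn, cumulative_turn)  # 400 600
```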

    Clarkson claims the relationship happens everywhere in the USA. If this is true, then it must happen everywhere in Kansas. We can repeat her analysis for individual counties.

    The location of the minimum cumulative percentage varies quite a lot. In Wyandotte County it is at 124 votes cast; in Sedgwick County, it is at 754. Wyandotte County has small precincts (or low 2010 turnout), but there are identifiable demographic patterns. Nine of the 10 largest precincts in Kansas City are in Ward 14. Only 9 of the 111 precincts in the county are outside Kansas City, including 2 of the 3 smallest, 4 of the 19 largest, and 3 of the 89 in the middle.
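    Repeating the analysis county by county amounts to two steps. First, group precincts by county. Then, within each group, find the precinct size at which the cumulative GOP share is lowest. Here is a sketch with two hypothetical counties, "X" and "Y"; the data are invented and are not the Wyandotte or Sedgwick figures.

```python
from collections import defaultdict


def cumulative_minimum_by_county(precincts):
    """For each county, sort its precincts by votes cast and return the
    votes-cast value at which the cumulative GOP share is lowest."""
    by_county = defaultdict(list)
    for county, votes, gop in precincts:
        by_county[county].append((votes, gop))
    result = {}
    for county, rows in by_county.items():
        total_votes = total_gop = 0
        best_share, best_votes = None, None
        for votes, gop in sorted(rows):
            total_votes += votes
            total_gop += gop
            share = total_gop / total_votes
            if best_share is None or share < best_share:
                best_share, best_votes = share, votes
        result[county] = best_votes
    return result


# Hypothetical data: (county, votes_cast, gop_votes).
precincts = [
    ("X", 50, 20), ("X", 100, 30), ("X", 150, 90), ("X", 200, 140),
    ("Y", 300, 210), ("Y", 600, 360), ("Y", 900, 450), ("Y", 1200, 780),
]
minima = cumulative_minimum_by_county(precincts)
print(minima)  # {'X': 100, 'Y': 900}
```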

    In Sherman County, the cumulative GOP percentage declines across the entire range of votes cast. Is this the pattern Clarkson's model expects? The reason: the four Goodland wards had the four highest vote totals, and Goodland is about 6 points more Democratic-leaning than the rest of the county (78% vs. 84%).

    In Shawnee County, 61 of the 66 smallest precincts (those with 218 or fewer votes) are in Topeka. Collectively, those 66 were 42.75% for Brownback. 51 of the 68 middle precincts (219-336 votes) were in Topeka; collectively, they were 50.20% for Brownback. 31 of the 67 largest precincts were in Topeka; collectively, those 67 were 57.0% for Brownback.
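    The "collectively" figures above read most naturally as pooled shares: total Brownback votes divided by total votes cast within each size band, rather than an average of precinct percentages. Here is a sketch using the same cutoffs (218 and 336 votes) on six invented precincts. The function name and data are hypothetical.

```python
from bisect import bisect_left


def pooled_share_by_size(precincts, cutoffs):
    """Pooled GOP share per size band.

    precincts: iterable of (votes_cast, gop_votes) pairs.
    cutoffs: sorted upper bounds; votes_cast <= cutoffs[0] goes in band 0,
    <= cutoffs[1] in band 1, and anything larger in the last band.
    """
    totals = [[0, 0] for _ in range(len(cutoffs) + 1)]  # [votes, gop] per band
    for votes, gop in precincts:
        band = bisect_left(cutoffs, votes)  # count of cutoffs strictly below votes
        totals[band][0] += votes
        totals[band][1] += gop
    return [gop / votes if votes else None for votes, gop in totals]


# Hypothetical precincts as (votes_cast, gop_votes).
precincts = [(120, 50), (200, 90), (250, 120), (300, 155), (400, 230), (500, 290)]
shares = pooled_share_by_size(precincts, cutoffs=(218, 336))
```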

    In Shawnee County, the precincts with the largest turnout tended to be outside Topeka and were more likely to favor Brownback. Clarkson's assumption that precincts in cities tend to be larger is not true; in Shawnee County, the opposite is true.

    In Riley County, the precincts with the most votes cast tend to be in Manhattan, and the turning point is 498 votes cast.

    In Leavenworth County, 19 of the 22 precincts with the fewest votes cast are in the city of Leavenworth, while only 5 of the 23 with the most votes cast are in the city.

    Lawrence precincts are fairly evenly spread across the range of precinct sizes in Douglas County, but the precincts below 20% tended to be smaller. I suspect these either have a larger Black population or are closer to KU.

    The largest precincts in Butler County are definitely not in El Dorado, and the El Dorado precincts are a shade less Republican. The larger precincts are in the areas closer to Wichita.

    Great Bend in Barton County has 4 wards, each with 3 precincts. If the 12 precincts in Great Bend are ordered by votes cast, they align almost perfectly with the 4 wards: Ward 2 has the three highest, then Ward 3, Ward 1, and Ward 4. The wards with more votes cast tended to be more Republican. If the wards have equal populations, then being more Republican correlates with more votes being cast.
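    One simple way to check "more votes cast goes with more Republican" on a handful of units is a Spearman rank correlation. Here is a sketch on four hypothetical wards (invented numbers, not Great Bend's actual results), assuming no tied values.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman_rho(xs, ys):
    """Spearman rank correlation, 1 - 6*sum(d^2) / (n(n^2 - 1)); no ties."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical ward totals: votes cast and GOP share.
votes = [2400, 2100, 1800, 1500]
gop_share = [0.70, 0.62, 0.66, 0.55]
rho = spearman_rho(votes, gop_share)
```

    With only four wards, even a rho near 1 is weak statistical evidence on its own. Its main use is as a quick, outlier-resistant summary of the ordering described above.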
