Wednesday, September 2, 2015

Kansas Election Fraud: Part 6 Sedgwick County Suburbs

And yet another post in my continuing series on Kansas election fraud.  Why do I keep posting on this?  First, there is still a lot of interest in the media: each time I open social media, and occasionally when reading local news, this story seems to pop up.  Second, no one else is doing due diligence on the numbers, and this kind of strategic trend information may be useful for understanding both how our democracy works and what it takes to win elections.  Today I will cover three subjects:
  • Data Availability (I have a huge gripe here)
  • Sedgwick County Mapping
  • Bucketization Analysis


Someone needs to call Shawnee County, Kansas, and let them know it's freaking 2015.  I have a device on my wrist that tracks my steps and sleep and syncs that data to my phone; from there I can dump it to a MySQL database via an API and analyze my activity level hourly.  I analyze a data warehouse with 50+ terabytes of data.  I have code that can download tweets, turn the text into numeric, analyzable data, model it, and return topics, visualizations, and sentiments, all in about 20 seconds.  I can take precinct results data, join it to geospatial map data (freely available online), and create visualizations of spatial voting patterns.  Big data and numbers are everywhere.  Except:

Shawnee County, Kansas, can't provide its 2014 by-precinct election results in a numeric format.
I have the results from every other county, either in an Excel document from the Secretary of State's office, in a digital online format (SG, Sedgwick), or as PDFs with a text layer that lets me easily scrape the underlying data (WY and JO, i.e. Wyandotte and Johnson). Yesterday I contacted the Shawnee County elections office to ask for the precinct data in some kind of numeric format (Excel, a PDF with selectable text, anything).

No dice.  The response I received was that the only existing forms of this data are a PDF (with no selectable text layer) or paper.  Nothing in Excel, nothing analyzable.  Yes, I know I could OCR the PDF, and I've started doing that, but it's not a high-quality scan, so the OCR produces a lot of errors.
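For the counties whose PDFs have a text layer (or once a scan has been OCR'd), the scraping step can be sketched in Python. This is a minimal sketch under stated assumptions: the one-line-per-precinct layout (precinct name, Brownback votes, Davis votes, total ballots) and the helper names are hypothetical, not the actual format of any county's report. Since OCR mostly garbles digits, the sketch routes any line that doesn't parse, or whose candidate votes exceed the reported total, to a manual-review list instead of trusting it.

```python
import re
from typing import List, NamedTuple, Tuple


class PrecinctRow(NamedTuple):
    precinct: str
    brownback: int
    davis: int
    total: int


# Hypothetical layout: "<precinct name> <brownback> <davis> <total>",
# e.g. "Wichita 101   412   388   830"; thousands separators are allowed.
LINE_RE = re.compile(
    r"^(?P<name>.+?)\s+(?P<b>\d[\d,]*)\s+(?P<d>\d[\d,]*)\s+(?P<t>\d[\d,]*)$"
)


def to_int(text: str) -> int:
    return int(text.replace(",", ""))


def parse_precinct_lines(lines: List[str]) -> Tuple[List[PrecinctRow], List[str]]:
    """Split extracted text into parsed rows and lines that need manual review."""
    rows: List[PrecinctRow] = []
    review: List[str] = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            # Headers, footers, or lines mangled beyond the assumed pattern.
            review.append(line)
            continue
        row = PrecinctRow(match["name"], to_int(match["b"]),
                          to_int(match["d"]), to_int(match["t"]))
        # Candidate votes may fall short of the total (third parties,
        # undervotes) but can never exceed it; an excess suggests an OCR error.
        if row.brownback + row.davis > row.total:
            review.append(line)
        else:
            rows.append(row)
    return rows, review
```

The review list is deliberately conservative: a bad digit that still leaves the candidate votes under the total will slip through, so spot-checking against the PDF is still needed.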

While I don't agree with Beth Clarkson's conclusions, I can see where she and the people who agree with her are coming from.  It feels as though the system was not designed to be analyzed after elections.


In my last blog post, I found that Sedgwick County also demonstrated "Clarkson's Correlation," where larger precincts tended to be more Republican.  I wondered whether the visualization technique I applied to Johnson County would also work for Sedgwick County.  The answer was yes.

First, a look at Sedgwick County voting patterns by precinct.  Blue (Davis-favoring) precincts are concentrated in the center city, while the suburbs and outer rural areas tend more Republican, as expected.
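A map like this comes from joining precinct results to precinct boundary data. Here is a minimal Python sketch of that join, assuming the boundaries are a GeoJSON FeatureCollection whose features carry a `precinct_id` property matching the keys of a results dictionary (both that property name and the `join_results_to_geojson` helper are hypothetical). Each precinct gets Brownback's two-party share and a red/blue fill, and precincts with no matching result are left grey rather than guessed.

```python
import copy
from typing import Dict


def join_results_to_geojson(collection: dict, results: Dict[str, dict],
                            key: str = "precinct_id") -> dict:
    """Return a copy of a GeoJSON FeatureCollection with vote shares attached."""
    joined = copy.deepcopy(collection)  # leave the caller's map data untouched
    for feature in joined["features"]:
        props = feature["properties"]
        votes = results.get(props.get(key))
        two_party = votes["brownback"] + votes["davis"] if votes else 0
        if two_party == 0:
            # No matching results row (or no votes): don't colour it.
            props["rep_share"] = None
            props["fill"] = "grey"
            continue
        share = votes["brownback"] / two_party
        props["rep_share"] = share
        props["fill"] = "red" if share > 0.5 else "blue"
    return joined
```

The output can be handed to any GeoJSON-aware mapping tool; a real join usually also needs some cleanup so precinct IDs are spelled identically in both sources.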

Now on to our overlay of precincts by size.  Sedgwick County has a lot of precincts with 500+ voters, but the largest of those are not in the center city; they are in the suburban ring.  This is an area we know to be, overall, whiter, more elite, and more Republican-leaning than the center city.

All of this complements the evidence in my prior posts that Clarkson's theory rests on a broken a priori notion: that above 500 voters, there should be no correlation between precinct size and the percentage of the vote going Republican.  The notion breaks because precincts were not created at random; suburbanization put the largest precincts in whiter, richer, and more Republican-leaning areas.
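To see how a non-random precinct layout can produce the correlation with no fraud at all, here is a toy simulation. The two groups, their size ranges, and their Republican shares are invented for illustration, not fitted to Kansas data. Within each group, size and vote share are drawn independently, yet pooling the groups yields a strong positive correlation, because the large precincts all sit in the more Republican group.

```python
import random
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(seed: int = 2015, n: int = 200):
    """Toy precincts as (ballots, Republican share); all numbers are invented."""
    rng = random.Random(seed)
    # Urban core: smaller, Democratic-leaning precincts; size and share independent.
    urban = [(rng.uniform(500, 900), rng.gauss(0.40, 0.05)) for _ in range(n)]
    # Suburban ring: larger, Republican-leaning precincts; again independent.
    suburban = [(rng.uniform(1000, 2500), rng.gauss(0.65, 0.05)) for _ in range(n)]
    return urban, suburban


urban, suburban = simulate()
for label, group in (("urban", urban), ("suburban", suburban),
                     ("pooled", urban + suburban)):
    sizes, shares = zip(*group)
    print(f"{label:>8}: r = {pearson(sizes, shares):+.2f}")
```

The within-group values should hover near zero while the pooled value comes out strongly positive; the exact digits depend on the seed.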

But I have only demonstrated this for Sedgwick and Johnson Counties.  How much do those two counties actually matter?


Let's take a deeper look at large precincts.  An easy way is to break precincts into buckets by size and compare the buckets.  Here are the size buckets I am using:

  • Regular Precincts: 0-500 voters (Clarkson Ignored These)
  • Large Precincts: 500-1000 voters
  • Super-Large Precincts: 1000+ voters

So, first, how did Brownback do by each size-grouping of precincts?  Here's a chart:

This chart actually backs up Clarkson's correlation.  Brownback did best in regular and super-large precincts, and the fact that he did better in super-large precincts than in large precincts is exactly the correlation Clarkson is talking about.  This is just another validation that the correlation exists.
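The bucketing and the per-bucket result can be sketched in Python. Two assumptions are mine, not the post's: the cut points (the list gives 0-500, 500-1000, and 1000+, so exactly 500 is treated as "large" and exactly 1000 as "super-large"), and pooling ballots within each bucket rather than averaging precinct percentages. The `total` and `brownback` field names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def size_bucket(ballots: int) -> str:
    # Assumed cut points: <500 regular, 500-999 large, 1000+ super-large.
    if ballots < 500:
        return "regular"
    if ballots < 1000:
        return "large"
    return "super-large"


def brownback_share_by_bucket(precincts: Iterable[dict]) -> Dict[str, float]:
    """Pooled Brownback share of all ballots cast in each size bucket."""
    votes: Dict[str, int] = defaultdict(int)
    ballots: Dict[str, int] = defaultdict(int)
    for p in precincts:
        bucket = size_bucket(p["total"])
        votes[bucket] += p["brownback"]
        ballots[bucket] += p["total"]
    return {b: votes[b] / ballots[b] for b in ballots}
```

Pooling weights each precinct by its ballots; averaging precinct percentages instead would weight every precinct equally and can give somewhat different bucket values.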

But how much do suburbanization patterns in JoCo and Sedgwick County matter here?  A lot, as a series of pie charts shows.  First, JO/SG make up only 14% of the regular-sized precincts.
But they make up almost two-thirds of large precincts.
And they make up 97% of super-large precincts, with 66% of those in Sedgwick County.
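The pie-chart percentages are counts of precincts per bucket. A minimal Python sketch of that tally, assuming hypothetical `county` and `total` fields on each precinct record and the same assumed bucket cut points as above:

```python
from collections import Counter
from typing import Dict, Iterable


def size_bucket(ballots: int) -> str:
    # Assumed cut points: <500 regular, 500-999 large, 1000+ super-large.
    if ballots < 500:
        return "regular"
    if ballots < 1000:
        return "large"
    return "super-large"


def county_share_by_bucket(precincts: Iterable[dict],
                           counties=("Johnson", "Sedgwick")) -> Dict[str, float]:
    """Fraction of each size bucket's precincts that lie in the given counties."""
    in_counties = Counter()
    everywhere = Counter()
    for p in precincts:
        bucket = size_bucket(p["total"])
        everywhere[bucket] += 1
        if p["county"] in counties:
            in_counties[bucket] += 1
    return {b: in_counties[b] / everywhere[b] for b in everywhere}
```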

If we look at Clarkson's analysis, over two-thirds of the sample can be attributed to JoCo or Sedgwick County, where we know her a priori assertion is broken.  Moreover, when we run the correlation on the remaining third, we see no correlation.  The effect is observable only in urban/suburban counties.  Effectively, Sedgwick and Johnson Counties are all that matter to the observed correlation.  Here's an R output for the other 101 counties:
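The check on the remaining counties amounts to filtering out Johnson and Sedgwick, keeping precincts with 500+ ballots, and computing Pearson's r between size and Republican share. A Python sketch, assuming hypothetical `county`, `total`, and `rep` fields; it also reports the t statistic that R's `cor.test()` uses for a Pearson test (t = r·sqrt((n − 2)/(1 − r²)) on n − 2 degrees of freedom), leaving the p-value lookup to R or a stats library.

```python
from typing import Iterable, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def correlation_outside(precincts: Iterable[dict],
                        excluded=("Johnson", "Sedgwick"),
                        min_ballots: int = 500) -> Tuple[int, float, float]:
    """Return (n, r, t) for size vs. Republican share outside `excluded`.

    Assumes at least three precincts survive the filter and |r| < 1.
    """
    sample = [p for p in precincts
              if p["county"] not in excluded and p["total"] >= min_ballots]
    sizes = [p["total"] for p in sample]
    shares = [p["rep"] / p["total"] for p in sample]
    n = len(sample)
    r = pearson(sizes, shares)
    t = r * ((n - 2) / (1 - r * r)) ** 0.5
    return n, r, t
```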

One quick side note: something else increases the correlation when we aggregate results.  Because most super-large precincts are in Sedgwick County, some of these precincts carry extra leverage in the regression.  And because Wichita, all in all, is a more conservative region than Johnson County, that leverage serves to increase the correlation, though not because of any nefarious or unexplained phenomenon.
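In a simple regression of Republican share on precinct size, a precinct's leverage is h_i = 1/n + (x_i − x̄)² / Σ_j (x_j − x̄)², which depends only on how far its size sits from the average size. A short Python sketch, with made-up sizes in the usage line, shows one very large precinct taking most of the leverage:

```python
from typing import List, Sequence


def leverages(sizes: Sequence[float]) -> List[float]:
    """Hat values for a simple linear regression with intercept on `sizes`."""
    n = len(sizes)
    mean = sum(sizes) / n
    sxx = sum((x - mean) ** 2 for x in sizes)
    return [1 / n + (x - mean) ** 2 / sxx for x in sizes]


# Made-up precinct sizes: three ordinary precincts and one super-large one.
print([round(h, 3) for h in leverages([500, 600, 700, 2000])])
# → [0.386, 0.332, 0.292, 0.99]
```

High leverage only becomes high influence when the point's vote share also departs from the rest of the data, which is why it matters that the extreme-size precincts come mostly from one county.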


  • Shawnee County: GET. WITH. THE. PROGRAM.
  • Sedgwick County: Though it differs greatly from Johnson County, suburbanization produced a similar pattern there: the largest precincts are in the suburbs.  This pattern subverts Clarkson's a priori assumption that precincts are created stochastically.
  • Bucketization: An interesting illustration of how Brownback did well in very large precincts, which are mostly located in Johnson and Sedgwick Counties.  


  1. Fabulous work, Levi!

    I have been obsessing over this today, generating the same sorts of analysis, and then found your series of posts while searching Google for other work on the correlation between precinct size and percent Republican vote. Your two maps above (along with the scatter plots in your previous posts showing percent Republican versus precinct size) show clearly that precinct size itself has little to do with how precincts vote. Larger precincts happen to go Republican because many urban precincts do not have a large number of voters; maybe they are split into smaller groups because urban polling places can accommodate fewer voters, who knows.

    I also found a paper by G.F. Webb at Vanderbilt that shows a correlation between precinct size and party votes, but your work does a much better job of graphically exploring the cause and not just the correlation.

    Before I found all of your awesome plots, I explored something I'd love to see you tackle too, maybe in more depth than I have time for: exit polls and age demographics! I was only able to explore this by county, but maybe you can find precinct-level data for it. I started with Sedgwick to see whether Beth Clarkson would find anything fishy. Follow me here:

    First, I found some great data at CBS News showing exit poll data with various demographics.

    I was looking at the 2014 Senate race, since that was what Clarkson was looking at. After seeing a CBS graph of Republican and Democratic votes by age group (select State Results and click 'Age' on the exit poll plot for senator), my hypothesis was that the vote in a given county was driven primarily by age demographics. The CBS exit poll data has a ton of other breakdowns as well, such as income, sex, education, and approval ratings, but I saw a very strong age correlation. A lot of older people turned out to vote. By age group (Roberts/Orman/Batson): 18-29: 39%/57%/4%; 30-44: 46%/48%/7%; 45-64: 56%/40%/4%; 65+: 62%/36%/2%. Make sense?

    So, second, I sought out the Sedgwick county age demographics data and found some here:

    I broke the Sedgwick age demographics into the same four groups used by the CBS exit polls. The only one I had to estimate was the lowest group, since the Sedgwick data covered ages 15-19 and I only needed 18-19. One assumption I make here is that the percentage of residents who are registered voters is the same across all age groups, so that will add some error to this estimate. 18-29: 23.74%; 30-44: 26%; 45-64: 34.6%; 65+: 15.66%.

    Third, I combined the exit poll vote shares with Sedgwick's actual age demographics to predict the Roberts, Orman, and Batson vote distribution for Sedgwick. For Roberts, I get 18-29: 39% × 23.74% = 9.3%; 30-44: 46% × 26% = 12%; 45-64: 56% × 34.6% = 19.4%; and 65+: 62% × 15.66% = 9.7%.

    When you add that all up, you get the age-weighted prediction that Sedgwick should go 9.3%+12%+19.4%+9.7%=50.3% for Roberts. Election results show Roberts got 51.1%, which is darn close to this very roughly estimated prediction. The same method predicts the other two candidates to within one percent as well.
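    The arithmetic above is a weighted sum: each age group's exit-poll share times that group's share of the county. A short Python sketch reproduces it for all three candidates, using only the figures quoted in this comment and inheriting the same assumption that registration doesn't vary by age:

```python
from typing import Dict, Tuple

# CBS exit-poll shares by age group (Roberts, Orman, Batson), as quoted above.
EXIT_POLL: Dict[str, Tuple[float, float, float]] = {
    "18-29": (0.39, 0.57, 0.04),
    "30-44": (0.46, 0.48, 0.07),
    "45-64": (0.56, 0.40, 0.04),
    "65+": (0.62, 0.36, 0.02),
}
# Estimated Sedgwick County age mix, as quoted above (sums to 100%).
AGE_MIX: Dict[str, float] = {"18-29": 0.2374, "30-44": 0.26, "45-64": 0.346, "65+": 0.1566}
CANDIDATES = ("Roberts", "Orman", "Batson")


def age_weighted_prediction(exit_poll, age_mix) -> Dict[str, float]:
    """Predicted share per candidate: sum over groups of (group mix x group vote share)."""
    return {
        name: sum(age_mix[group] * exit_poll[group][i] for group in age_mix)
        for i, name in enumerate(CANDIDATES)
    }


for name, share in age_weighted_prediction(EXIT_POLL, AGE_MIX).items():
    print(f"{name}: {share:.1%}")
```

    This prints Roberts: 50.3%, Orman: 45.5%, Batson: 4.5%. The three add up to slightly more than 100% because the exit-poll rows are themselves rounded (the 30-44 row sums to 101%).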

    I also made Beth Clarkson's cumulative Republican vote percentage plot using county votes instead of precinct votes, then compared it with county registration data ordered the same way, and the cumulative vote percentage trends match very closely.
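    A sketch of the cumulative construction, assuming it works as it is usually described: sort the units (precincts, or counties here) from smallest to largest by ballots cast, then track the running Republican share as each one is added. The `rep` and `total` field names are hypothetical:

```python
from itertools import accumulate
from typing import Iterable, List, Tuple


def cumulative_rep_share(units: Iterable[dict]) -> List[Tuple[int, float]]:
    """(cumulative ballots, cumulative Republican share), smallest units first."""
    ordered = sorted(units, key=lambda u: u["total"])
    rep_running = accumulate(u["rep"] for u in ordered)
    total_running = accumulate(u["total"] for u in ordered)
    return [(total, rep / total) for rep, total in zip(rep_running, total_running)]
```

    Plotting the second element against the first gives the curve. For the registration comparison, the same running ratio would be computed from registered Republicans over total registrations, keeping the units in the ballots-cast order.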

    1: Demographics explain the trends in cumulative vote percentages no matter how you slice them.
    2: The registration data can be used to predict the shapes to be expected when such cumulative plots are made.

    If you get a chance, I’d love to see you work in the Exit Poll and Age Demographic data in another post.

    Thanks for reading: Matt Harris

    1. The bucketized age-weighted predictions are fascinating, in that the data likely subsumes other demographic factors (urban/rural, racial, gender). I'm not really much of a demographer, but I may explore this in more detail, especially the age buckets and their ability to predict on their own (what's the r-squared for that model alone?). I've looked at mean/median income before, but that metric obviously "loses" a lot of information in the transformation.
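      For the r-squared question, one simple check, assuming county-level observed and age-predicted Republican shares are on hand, is R² = 1 − SS_res/SS_tot computed directly on the predictions. Because the age model isn't fitted to these results, this version can go negative when the predictions do worse than simply guessing the mean; it is not the same number a fitted regression's summary would report.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """1 - SS_res / SS_tot for predictions that were not fitted to `observed`."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot
```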

      When I come back to this issue, I will certainly take a look at that.

  2. By the way, the page has by-precinct voting results for the 2014 general and primary U.S. Senate elections. I don't know if you saw that, but Shawnee is there, too. This was the race Beth Clarkson was analyzing.

    I've also since found by-precinct age and party registration demographics here (stupid .pdf format, but it's data):

    Happy data science-ing. -Matt

    1. Yes, some of my earlier work was on the Senate general election, and that was the one where I first *verified* Clarkson's result. It's a bit amazing to me that that data went online months ago, while the complete governor's election data still isn't available in a single place. Maybe different federal reporting requirements? I'm not sure, but it's certainly weird.