And yet another post in my continuing series on Kansas election fraud. Why do I keep posting on this? First, there is still a lot of interest in the media. Each time I open up social media, and occasionally when I look at local news, this story seems to pop up. Second, no one else is doing due diligence on the numbers, and this type of strategic trend information may be useful in understanding both how our democracy works and what it takes to win elections. Today I will cover three subjects:
- Data Availability (I have a huge gripe here)
- Sedgwick County Mapping
- Bucketization Analysis
DATA AVAILABILITY
Someone needs to call Shawnee County Kansas and let them know it's freaking 2015. I have a device on my wrist that tracks my steps and sleep, syncs that data to my phone, which then I can dump to a MySQL database via API and analyze my activity level hourly. I analyze a data warehouse with 50+ terabytes of data. I have code that can download tweets, turn text data into numeric analyzable data, and model that data, and return topics, visualizations, and sentiments all in about 20 seconds. I can take precinct results data, join it to geospatial map data (freely available online) and create visualizations of spatial voting patterns. Big data, numbers, are everywhere. Except:
Shawnee County Kansas can't provide me a numeric format of their 2014 by-precinct election results. I have the results from every other county, either in an Excel document from the Secretary of State's office, in a digital online format (SG), or in PDFs with text metadata that lets me easily scrape the underlying data (WY, JO). Yesterday I contacted the Shawnee County elections office to ask for some kind of numeric format (Excel, PDF with selectable text, anything) of the precinct data.
No dice. The response I received was that the only existing forms of this data are PDF (no selectable text metadata) or paper. Nothing in Excel, nothing analyzable. Yes, I know I could OCR the PDF, and I've started doing that, but it's not a high quality PDF, so the OCR produces a lot of errors.
While I don't agree with Beth Clarkson's conclusions, I can see where she and the people who agree with her are coming from. It feels as though the system was not designed to be analyzed after elections.
SEDGWICK COUNTY MAPPING
In my last blog post, I found that Sedgwick County also demonstrated "Clarkson's Correlation" where larger precincts tended more Republican. I wondered if the same visualization technique as applied to Johnson County could be applied to Sedgwick County. The answer was yes.
First, a look at Sedgwick County voting patterns by precinct. Blue (Davis-favoring) precincts are in the center city, while the suburbs and outer-rural areas tend more Republican, as expected.
Now on to our overlay of precincts by size. There are a lot of 500+ voter precincts in Sedgwick County, but the largest of those are not in the center city. They are in the suburban ring, an area we know to be, overall, whiter, more elite, and more Republican-leaning than the center city.
All of this is additional complementary evidence to my prior posts on Clarkson's theory: it is effectively based on a broken a priori notion, namely that after 500 voters there should be no correlation between precinct size and % of vote Republican. The specific reason it is broken is that precinct creation was not random, and in fact suburbanization caused the largest of the precincts to be in whiter, richer, and more Republican-leaning areas.
But I have only demonstrated this for Sedgwick and Johnson Counties. How much do those two counties actually matter?
JOCO, SG, AND BUCKETIZATION
Let's take a deeper look into large precincts. An easy way is to break precincts into buckets by size, and talk about them in this way. Here are the size buckets I am using:
- Regular Precincts: 0-500 voters (Clarkson Ignored These)
- Large Precincts: 500-1000 voters
- Super-Large Precincts: 1000+ voters
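For anyone who wants to follow along, here is a rough Python sketch of the bucketing and of a per-bucket Republican share. It is not the code behind my charts. The function names, the choice to put a precinct with exactly 500 or 1000 voters in the smaller bucket, and the example numbers are all assumptions made for illustration.

```python
from collections import defaultdict


def size_bucket(voters):
    """Assign a precinct to a size bucket.

    Assumption: boundary values (exactly 500 or 1000) go to the smaller bucket.
    """
    if voters <= 500:
        return "regular"
    if voters <= 1000:
        return "large"
    return "super-large"


def share_by_bucket(precincts):
    """Pooled Republican vote share per bucket.

    precincts: iterable of (total_votes, republican_votes) pairs.
    """
    totals = defaultdict(lambda: [0, 0])  # bucket -> [republican, total]
    for total, rep in precincts:
        bucket = size_bucket(total)
        totals[bucket][0] += rep
        totals[bucket][1] += total
    return {b: rep / total for b, (rep, total) in totals.items() if total > 0}


# Made-up numbers for illustration only, NOT real Kansas results.
example = [(300, 180), (450, 230), (700, 330), (900, 400), (1200, 660)]
shares = share_by_bucket(example)
for bucket, share in shares.items():
    print(f"{bucket}: {share:.1%}")
```

Pooling votes within a bucket (rather than averaging precinct percentages) weights each precinct by its size, which is one reasonable choice among several.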
So, first, how did Brownback do by each size-grouping of precincts? Here's a chart:
This chart actually backs up Clarkson's correlation. Effectively Brownback did best in regular and super-large precincts. The fact that he did better in super-large precincts than large precincts is the exact correlation that Clarkson is talking about. This is just another validation that the correlation exists.
But how much do suburbanization patterns in JoCo and Sedgwick County matter in this? A lot. A series of pie charts. First, JO/SG make up only 14% of the regular sized precincts.
But they make up almost two thirds of large precincts.
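The pie-chart comparison boils down to a simple count: within each size bucket, what fraction of precincts sit in Johnson or Sedgwick County? A minimal Python sketch, assuming each precinct has already been labeled with a county code and a bucket (the function name, the "XX" county code, and the sample rows are hypothetical):

```python
from collections import Counter


def county_share(precincts, counties=("JO", "SG")):
    """Fraction of each bucket's precincts that fall in the given counties.

    precincts: iterable of (county_code, bucket_label) pairs.
    """
    total = Counter()
    inside = Counter()
    for county, bucket in precincts:
        total[bucket] += 1
        if county in counties:
            inside[bucket] += 1
    return {bucket: inside[bucket] / total[bucket] for bucket in total}


# Made-up rows for illustration only, NOT real precinct data.
rows = [
    ("JO", "regular"), ("WY", "regular"), ("XX", "regular"), ("XX", "regular"),
    ("JO", "large"), ("SG", "large"), ("WY", "large"),
]
composition = county_share(rows)
```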
If we look at Clarkson's analysis, over 2/3rds of the sample can be attributed to JoCo or Sedgwick County, where we know that her a priori assertion is broken. Moreover, when we run the correlation on the other 1/3rd, we see no correlation. The effect is only observable in urban/suburban counties. Effectively: Sedgwick and Johnson Counties are all that matter to the observed correlation. Here's an R output for the other 101 counties:
One quick side note. There's something else that increases the correlation when we aggregate results. Because the majority of super-large precincts are in Sedgwick County, some of these precincts get extra leverage. And because, all in, Wichita is a more conservative region than Johnson County, that leverage serves to increase the correlation, though not because of any nefarious or unexplained phenomenon.
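The "drop JoCo and Sedgwick, then re-run the correlation" check described above can be approximated in a few lines of Python. This is a sketch and not the R code I actually ran: it assumes a plain Pearson correlation between precinct size and Republican share, restricted to precincts with more than 500 voters, and the function names, the "XX" and "YY" county codes, and the sample rows are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def size_vs_share(precincts, exclude=("JO", "SG"), min_voters=500):
    """Correlate precinct size with Republican share, skipping some counties.

    precincts: iterable of (county_code, total_votes, republican_votes).
    """
    kept = [
        (total, rep / total)
        for county, total, rep in precincts
        if county not in exclude and total > min_voters
    ]
    sizes = [size for size, _ in kept]
    shares = [share for _, share in kept]
    return pearson(sizes, shares)


# Made-up rows for illustration only, NOT real precinct data.
rows = [
    ("JO", 1200, 720), ("SG", 1500, 975),
    ("WY", 600, 240), ("XX", 800, 480), ("YY", 1000, 450),
]
r_other = size_vs_share(rows)
r_all = size_vs_share(rows, exclude=())
```

Comparing `r_all` with `r_other` on real data is the point of the exercise: if the correlation disappears once the two suburbanized counties are removed, they are driving it.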
CONCLUSION
- Shawnee County: GET. WITH. THE. PROGRAM.
- Sedgwick County: Though much different from Johnson County, its suburbanization produced a similar result: the largest precincts are in the suburbs. This pattern subverts Clarkson's a priori assumption of stochastic creation of precincts.
- Bucketization: An interesting illustration of how Brownback did well in very large precincts, which are mostly located in Johnson and Sedgwick Counties.