Big news: Beth Clarkson finally engaged with me in a debate over her statistical analysis. Oddly enough, the discussion occurred on Esquire.com's forums. Yes, in the forums of a "men's" magazine. Weird.
I would describe Clarkson's argument online as follows:
- a bizarre focus on a null hypothesis test.
- an admission that she hasn't looked that deeply into the statistics that are the basis of a lawsuit.
- a refocus on principles of open government, and that she needs access.
More on the Esquire conversation later, but first, some updates on data.
Sometimes I blog about data issues and don't adequately explain them to my lay audience. One Twitter commenter has even said I should start a series on basic statistics so people can understand these types of things. Here's an example of my recent commentary that possibly wasn't the best explanation:
One quick side note. There's something else that increases correlation when we aggregate results. Because the majority of super large precincts are in Sedgwick County, it gives leverage to some of these precincts. And because all-in Wichita is a more conservative region than Johnson County, that leverage serves to increase the correlation, though due to no nefarious or unexplained phenomena.
The biggest problem here is that the concept of leverage, as well as "mix" (i.e., the mix of counties), needs to be shown graphically. Luckily, this is easy with the ggplot2 R library. Here is a scatter plot of Clarkson's correlation with counties color coded. In it you will notice that the more liberal counties tend to have mid-to-large precincts (Wyandotte, Douglas, even Johnson County), while the more conservative counties (Sedgwick, Other Rural) make up a disproportionate share of the largest precincts. This enhances Clarkson's correlation when counties are combined, simply due to the mix of counties, not in-county nefarious action by voting machines.
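For readers who want to see the mix effect in numbers, here is a minimal sketch in Python (not the R/ggplot2 code behind the chart). The two counties and every precinct value are made up for illustration. Within each hypothetical county, precinct size has no relationship to Republican share, yet pooling the two counties produces a strong positive correlation, simply because the conservative county has the larger precincts.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def size_share_correlation(precincts):
    """Correlation between precinct size and Republican share."""
    sizes = [size for size, _ in precincts]
    shares = [share for _, share in precincts]
    return pearson(sizes, shares)


# Hypothetical precincts as (votes cast, Republican share).
# These are invented numbers, not real Kansas data.
liberal_county = [(400, 0.42), (500, 0.38), (600, 0.38), (700, 0.42)]
conservative_county = [(900, 0.62), (1000, 0.58), (1100, 0.58), (1200, 0.62)]

r_liberal = size_share_correlation(liberal_county)
r_conservative = size_share_correlation(conservative_county)
r_pooled = size_share_correlation(liberal_county + conservative_county)

print(f"within liberal county:      {r_liberal:.3f}")
print(f"within conservative county: {r_conservative:.3f}")
print(f"both counties pooled:       {r_pooled:.3f}")
```

In this toy data the within-county correlations are essentially zero while the pooled correlation comes out at roughly 0.9. The real data is messier, but the mechanism is the same one the color-coded scatter is meant to show.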
I'm still working on this data. To be honest the OCR worked horribly, and I received no adequate response from the County. I sent a followup email to the Secretary of State's office this morning. I will post their response here.
OTHER WORK
I am not the only person working on this. Some of the better conversation regarding this topic has occurred over on DailyKos (I know, I know), with several "diaries" devoted to the subject. One of the better ones recently was by user HudsonValleyMark, exploring the same correlations I reviewed. Specifically, his work traces the correlation back to original voter registration data.
What does that mean? It means that party affiliation at registration is also correlated with the total number of voters, long before we get to the voting machine. I suggest reading his work, but I also validated it using 2004 Johnson County data (see chart below), first validating Clarkson's correlation, then replicating HudsonValleyMark's work.
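The logic of that check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the precinct figures below are invented, not the Johnson County data, and the variable names are hypothetical. The idea is to compare three correlations with precinct size: Republican share of registrations, Republican share of votes, and the gap between the two.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical precincts (invented numbers, one entry per precinct).
registered_voters = [200, 400, 600, 800, 1000, 1200]
rep_registration_share = [0.30, 0.35, 0.37, 0.43, 0.45, 0.50]
rep_vote_share = [0.36, 0.39, 0.42, 0.48, 0.49, 0.56]

# Gap between how a precinct voted and how it registered.
vote_minus_registration = [
    vote - reg for vote, reg in zip(rep_vote_share, rep_registration_share)
]

r_registration = pearson(registered_voters, rep_registration_share)
r_vote = pearson(registered_voters, rep_vote_share)
r_gap = pearson(registered_voters, vote_minus_registration)

print(f"size vs. registration share: {r_registration:.3f}")
print(f"size vs. vote share:         {r_vote:.3f}")
print(f"size vs. (vote - reg) gap:   {r_gap:.3f}")
```

In this made-up example the registration correlation is about as strong as the vote correlation, and the gap between votes and registration has no relationship to precinct size. A pattern like that points at who lives in large precincts rather than at anything happening inside the machines.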
ARGUMENT WITH CLARKSON
Remember my three points from earlier in this post on Beth Clarkson's arguments in the Esquire forum? Let's revisit and go through those one by one. Here's her final response to me, which encompasses all three arguments:
We seem to be in agreement that the null in my case isn't true. I disagree that it invalidates my work because I feel the cause is what is under debate. Your suggestion of assuming a particular prior distribution may or may not be appropriate. I haven't looked at it deeply enough to know for sure. In short, I'm agreeing that you could well be right about that.
That our electronic voting machines are eminately hackable and have no post-election audit procedures in place are established facts and are equally concerning to me. Do you diagree about that aspect? Are you satisfied with assuming a distribution that fits the pattern? Or do you agree that our voting system should be (but isn't) transparent enough for citizens to feel confident that the results are accurate?
Here's my response one by one:
- On the NULL case not being true. I agree with Clarkson that we can "reject the NULL hypothesis"; in fact, in my first post on the subject (and above in this post) I replicated her results. But all Clarkson is saying by claiming the null case is false is that she found a non-zero correlation. I agree, there is a non-zero positive correlation, but if we dive deeper, why are we testing a null hypothesis at all? And if we can reject it, have we done the research to say that there aren't reasonable alternate explanations (I have, and there are)? Keep in mind my prior work on this subject, which shows that demographics and the way precincts are created produce this correlation. In essence, rejecting the null hypothesis here is in no way meaningful, because it only tests the false assumption that there should not be a correlation. That has been the point of this blog's work on the subject: the null hypothesis is irrelevant. For more information on the flaws in null hypothesis testing, see here from Nate Silver.
- On her admission that she hasn't looked deeply into this. She concedes that I may be right. A lot of thoughts here. Essentially, she has been threatening lawsuits and doing newspaper interviews over something she hasn't deeply reviewed. She also said earlier in her comments that she hasn't had access to demographic or mapping data. I have been able to compile that data, usually in a matter of minutes, whenever I have wanted to look at it. Access to data is easy, and it's the job of a modern statistician or data scientist to acquire it and test the work; that is due diligence. Effectively, she admits she's done less work on the subject than I have, and admits I may be right.
- On open government concerns. I have always agreed with her on this concern, on this blog, and publicly, multiple times. I have also offered to help, if I can, should she get access to that data.
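To make the first point above concrete, here is a small sketch of the standard t-statistic for testing whether a Pearson correlation differs from zero, t = r * sqrt((n - 2) / (1 - r^2)). The correlation of 0.1 and the sample sizes are hypothetical values chosen for illustration, and 1.96 is the usual large-sample approximation to the two-sided 5% cutoff.

```python
from math import sqrt


def correlation_t_statistic(r, n):
    """t-statistic for the null hypothesis that the true correlation is zero."""
    return r * sqrt((n - 2) / (1 - r ** 2))


# Approximate two-sided 5% critical value (normal approximation).
CRITICAL_VALUE = 1.96

# Hypothetical weak correlation, tested at increasing sample sizes.
weak_r = 0.1
for n_precincts in (100, 1000, 10000):
    t = correlation_t_statistic(weak_r, n_precincts)
    rejected = abs(t) > CRITICAL_VALUE
    print(f"n={n_precincts:>5}  t={t:6.2f}  reject null: {rejected}")
```

The same weak correlation goes from "not significant" to "significant" purely as the number of precincts grows. Rejecting the null tells us the correlation is not zero; it says nothing about why.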
Quick summary of what we've talked about in this blog post:
- I gave you a somewhat better view into the "mix" and leverage issues that enhance Clarkson's correlation but are not indicative of fraud.
- Shawnee County is still living in the data dark ages.
- HudsonValleyMark's work over on DailyKos (which I validated) demonstrates another way to disprove that the Clarkson correlation is related to fraud.
- Finally, in my argument with Clarkson over on Esquire.com, she admits she hasn't looked into this issue deeply, and that my analysis may be correct.