Tuesday, November 24, 2015

The Different Ways We Talk About Candidates

A couple of months ago I found a website with extremely rich data, an event that usually makes me very happy.  This website didn't have that effect on me.  I was trying to figure out the weight of a specific baseball player and stumbled upon a database of detailed celebrity body measurements (all women, of course), found here.  Later I found that the data included political candidates, which raised a question in my mind about the different ways we talk about men and women in politics.

Simultaneously, I was looking for a way to measure the presence of certain ideas across the internet.  I can already measure sentiments and topics on Twitter, but Twitter is only a portion of the internet, and most people access new information on the internet through a Google search.  Could I write code that would start my text mining operations from Google Search?

THE TEST

(NON-Nerds Skip this)

I had a social idea to test (how we talk about candidates based on gender) and a coding/statistical concept to test: mining Google Search results.  I went forward with a formal test plan:
  • I would use the Google Search API to pull results for "candidate's name" + "body measurements".
  • I would capture the data and turn it into mine-able text.
  • I would compare the top words returned for each candidate.  (Note: rate limits on the Google API, as well as some other Google restrictions, slow me down; in the future I may apply more sophisticated text mining techniques.)
I wrote some code to pull the Google Search results.  The Google API only allows us to pull four results at a time, so I wrote a loop to pull them four at a time.  Here's what that looks like (building it up step by step for ease of understanding):
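In rough Python terms, the core of it is something like the sketch below.  Treat this as an outline rather than my exact code: the endpoint, parameter names, and response fields are assumptions based on the old AJAX-style search API, and you'd swap in whichever Google search API and key you actually have access to.

    # Pull Google results for '"<candidate name>" body measurements',
    # four results per request, looping until we have roughly 44 per candidate.
    import time
    import requests

    def google_results(query, n_results=44):
        # Endpoint and parameters follow the old AJAX-style search API;
        # adjust for whichever Google search API (and key) you actually use.
        url = "https://ajax.googleapis.com/ajax/services/search/web"
        snippets = []
        for start in range(0, n_results, 4):  # the API hands back four results per call
            resp = requests.get(url, params={"v": "1.0", "q": query, "start": start})
            data = resp.json().get("responseData") or {}
            for result in data.get("results", []):
                snippets.append(result.get("title", "") + " " + result.get("content", ""))
            time.sleep(1)  # be polite about the rate limit
        return snippets

    candidates = ["Ben Carson", "Hillary Clinton", "Carly Fiorina", "Bernie Sanders"]
    corpus = {name: google_results('"%s" body measurements' % name) for name in candidates}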



DATA RESULTS

So what are the results of googling candidate names + body measurements?  I googled four candidates: two men, two women.  My observations:
  • Men: The men's results were generally about the campaign, with each returning a few references to BMI (Body Mass Index).
  • Women: The women's results were heavily focused on the size of their bodies.  In fact, the top four words for each woman were the same: size, weight, height, and bra.  


This table shows the top 10 words returned for each candidate.  This is obviously a small sample (four candidates, only the top 44 Google results for each), but it's interesting nonetheless.  
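Counting those top words is the simple part once the snippets are in hand.  A minimal Python sketch, reusing the corpus dictionary from the scraping sketch above (the tiny stopword list here is just illustrative):

    # Count the most common words per candidate from the scraped snippets.
    import re
    from collections import Counter

    STOPWORDS = {"the", "and", "of", "to", "a", "in", "is", "her", "his", "for", "on", "with"}

    def top_words(snippets, n=10):
        words = re.findall(r"[a-z']+", " ".join(snippets).lower())
        words = [w for w in words if w not in STOPWORDS and len(w) > 2]
        return Counter(words).most_common(n)

    for name, snippets in corpus.items():
        print(name, top_words(snippets))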


And because I know everyone likes wordclouds (sigh), I created wordclouds for each candidate at the bottom of this post, below the conclusion.
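If you want to roll your own, the clouds themselves take only a few lines; here's one way to do it in Python with the third-party wordcloud package (pip install wordcloud), again reusing the scraped corpus:

    # Generate one wordcloud image per candidate from the scraped snippets.
    from wordcloud import WordCloud

    for name, snippets in corpus.items():
        cloud = WordCloud(width=800, height=400, background_color="white")
        cloud.generate(" ".join(snippets))
        cloud.to_file(name.replace(" ", "_") + ".png")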

CONCLUSION

Some final takeaways from this analysis:
  • It's definitely possible to text mine Google results to measure how prevalent an idea is on the internet.  I'll need to refine my methodology and implement more sophisticated techniques in the future, but the basic scraping method works.  
  • There is relatively little information on the internet regarding the body measurements of male candidates.  And I really wanted to know Ben Carson's waist-to-hip ratio!
  • Female candidates are talked about online far more in terms of their bodies.  I'm not an expert in feminist discourse analysis, or even really qualified to give an opinion here, but I have certainly measured a difference in the way candidates are talked about online.




BEN CARSON

HILLARY CLINTON

CARLY FIORINA

BERNIE SANDERS


Friday, November 20, 2015

Corrected Polling Numbers

A few weeks ago I posted a fairly hefty critique of a survey conducted by Fort Hays State University researchers on the political climate in Kansas.  The survey claimed a lot of things, but the issue receiving the most press was that Kansas Governor Brownback had an 18% approval rate.  I took issue with that number for various reasons, largely due to demographic skews in the data that hinted at sampling or response bias.

ACTUAL APPROVAL RATE?

Sometime later a Twitter user asked me: if not 18%, what did I really think Brownback's approval rating might be?  I looked again at the skews, did some quick math adjusting for prior demographic distributions and likely errors, and came up with a range.  This was really just me trying to back into a number from bad polling data.  Here's my response on Twitter:

WARNING: "I TOLD YOU SO" coming. 

This week another survey was published that reviews the approval ratings of all US governors.  You can find that study here.  I haven't fully vetted the methodology, but it indicates the researchers at least tried to deal with demographic issues.

What did that study tell us?
Brownback's approval rate is 26%.  LOOK THAT'S IN MY RANGE!
But that dataset also provides information on other governors' approval ratings; what can those tell us?

COMPARISON TO OTHER GOVERNORS

While I was correct that Brownback's likely approval rate is above 18%, it is still dismal compared to other governors'.  In fact, Brownback sits nine percentage points below any other governor, a huge outlier.  I could bore you with p-values and z-scores (-2.8) and other statistical nerdery, but two charts can easily describe how bad his approval rate is.  (Brownback in red)
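For anyone who wants to reproduce the nerdery, the z-score is just how many standard deviations Brownback sits from the average governor.  A quick sketch (plug in the full list of approval ratings from the linked study; I'm not reprinting them here):

    # How many standard deviations below the average governor is a 26% approval?
    from statistics import mean, stdev

    def z_score(value, values):
        return (value - mean(values)) / stdev(values)

    # approvals = [...]  # the governor approval ratings from the linked study
    # print(z_score(26, approvals))  # comes out around the -2.8 quoted above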




CONCLUSION

Takeaway bullets:
  • Brownback's approval rate is likely above 18%, closer to 26% (read: I was right).
  • Brownback has the lowest approval rate among US governors.
  • Brownback's approval rating is an extreme low outlier.  

Tuesday, November 3, 2015

Testing Opinion Polls: Do they really measure what they say they do?

**edited 2015-11-05 to include additional demographic information

Generally, I am not a fan of survey research; I prefer economic numbers or other data not measured by "calling people and asking them how they feel."  Polls can bring in a lot of bias: not just the normal sampling error that some statisticians are obsessed with measuring and testing against, but also response bias, sampling bias, biases from the way you ask questions, and so on.  That's not to say that opinion polls and surveys are all worthless (if you want to do one, I know a guy, his name is Ivan, he's great with this stuff).

This is why, when developing political models, I only partially rely on recent opinion polls and also heavily weight historic voting trends.  Remember how I used a model with additional data to predict the Brownback re-election within the margin?  (I'll be bringing this up for at least another three years.)

A new poll has been making the rounds in Kansas and national media, making claims such as "Obama is more popular in Kansas than Brownback."  Keep in mind that Obama lost Kansas by 21 percentage points in 2012, while Brownback just won re-election in Kansas by about four percentage points.  This is obviously quite a claim, but how seriously should we take it?  And are there some basic steps we can use to vet how good opinion surveys are?

BACKGROUND: TYPES OF BIAS

So what makes a survey accurate versus inaccurate? The truth is, there are a lot of good ways to mess up a survey.  Here are the general ways surveys are incorrect:
  • Sampling error.  Many statisticians spend a majority of their careers measuring sampling error (this is part of the frequentist versus Bayesian debate, and a topic for another post).  Sampling error is the error caused simply by using a sample smaller than the entire population.  Even if the sample is randomly selected from the population, there will still be a small amount of error.  This is the (+/-) 3% you see in most public opinion polls, though it varies with the size of the sample (there's a quick worked example of that calculation below).
  • Sampling bias.  Sampling bias is different from sampling error, though the distinction can be hard to grasp.  It is the bias introduced by problems in the process of choosing a random sample of the population.  How does this kind of bias crop up?
    • Bad random number generators.
    • Bad randomization strategies (just choosing the top 20% of a list).
    • Bad definition of the population (a list of phone numbers with people systematically missing).
  • Response bias.  Response bias occurs when certain groups of survey recipients respond at different rates than others, due to varying "propensity to respond" across demographic or opinion groups within a population.  Examples of how that happens:
    • Women can be more willing to respond 
    • Older people (retirees) can have more time to respond
    • Minority groups can be less trusting of authorities, and less willing to respond
    • Certain political groups may not trust polling institutions and be less willing to respond
Once again, this is just a starter list of what can go wrong with taking samples within surveys, and I may add to this list as we go, but this is a good primer.
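To make that first bullet concrete: the margin of error at 95% confidence for a proportion near 50% is 1.96 * sqrt(0.5 * 0.5 / n), so it shrinks as the sample grows.

    # Margin of error (95% confidence) for a proportion, as a function of sample size.
    from math import sqrt

    def margin_of_error(n, p=0.5, z=1.96):
        return z * sqrt(p * (1 - p) / n)

    print(round(margin_of_error(1000), 3))  # ~0.031, the familiar "+/- 3%"
    print(round(margin_of_error(638), 3))   # ~0.039, the 3.9% reported in the survey below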

DATA: THE FHSU SURVEY


Let's jump right into the study at hand.  It was conducted by the Docking Institute of Public Affairs at Fort Hays State University, and can be found here.

Did the authors of this study consider these error and bias issues?  Absolutely, and they reference them in the study.  Here's a snapshot from their methodology.  



A few things stand out from this first section:

  • First, they reference a sampling error of +/- 3.9% for the sample.  That means any number in the survey can be considered accurate to within 3.9 percentage points, assuming a truly random, unbiased sample.  
  • Second, they make a passing reference to response bias and assume it away.  But how can we test whether there really is no response bias?  Elsewhere in the paper they say they contacted 1,252 Kansans, and 638 responded.  If the roughly half that responded are "different" demographically from the half that didn't respond, the conclusions of the survey could be misleading.
  • Third, there's no reference here to sampling bias, but they address it de facto elsewhere, talking about how they pulled the sample.  The report says: "The survey sample consists of random Kansas landline telephone numbers and cellphone numbers. From September 14th to October 5th, a total of 1,252 Kansas residents were contacted through either landline telephone or cellphone."

Looking at these three sources of potential bias: sampling error is simple math (based on sample size), response bias is assumed away by the researchers, and it's impossible to know whether the list of phone numbers used could produce an unbiased sample of Kansans.  So can we be sure that this sample is an accurate representation of Kansans?

We can never be certain that this is a good sample free of response and sampling bias, but we can do some due diligence to determine if everything looks correct, specifically through fixed values testing.  In essence, there are some numbers that we know about the population through other sources (census data, population metrics, etc) that we can test to make sure the sample matches up.  Let's start with gender.

In the paper on page 39 there's a summary of respondents by gender, for both the population and the sample.  Keep in mind that the margin of error for this sample is 3.9%, so we would expect the gender ratios to fall within that margin.  They do not (a 5 percentage point difference), meaning the gender differences in this survey cannot be attributed to random sampling error.
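Formally, that gender check is a one-sample test of proportions.  A sketch is below; the numbers in the commented example are placeholders standing in for the page 39 figures, not the report's exact values.

    # Is the sample's share of women further from the census share than chance allows?
    from math import sqrt

    def proportion_z(sample_share, population_share, n):
        se = sqrt(population_share * (1 - population_share) / n)
        return (sample_share - population_share) / se

    # Placeholder example: a 5-point gap on 638 respondents
    # print(proportion_z(0.55, 0.50, 638))  # about 2.5, beyond the ~1.96 cutoff at 95% confidence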



Also on page 39 is a summary of sample and population by income bracket.  Reported income brackets are a bit fuzzier than reported gender, but the chart below shows how those line up.  Because there are multiple categories here, we can't do a simple (+/- 3.9%) calculation (technically a binomial test of proportions).  Instead we rely on a chi-squared goodness-of-fit test to determine whether the differences are due to sampling error or to an underlying bias.  If the values were statistically similar, we would expect a chi-square value under 14.1.  The test finds that the differences exceed the limits of sampling error and indicate an underlying bias in the results.  
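For the statistically inclined, that test is a one-liner with scipy.  The sketch below also shows where the 14.1 cutoff comes from: it's the 5% critical value at seven degrees of freedom, which lines up with eight income brackets.

    # Chi-squared goodness-of-fit: do the respondent counts per income bracket
    # match what the census shares would predict?
    from scipy.stats import chisquare, chi2

    def income_gof(observed_counts, population_shares):
        # expected counts = population share of each bracket times total respondents
        expected = [share * sum(observed_counts) for share in population_shares]
        return chisquare(observed_counts, f_exp=expected)

    # The 5% critical value with seven degrees of freedom (eight brackets):
    print(round(chi2.ppf(0.95, df=7), 2))  # 14.07, the ~14.1 cutoff above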


We also have fixed numbers for party affiliation in Kansas from the most recent registration figures from the Secretary of State.  Those numbers are shown on the left side of the chart below; about 45% of Kansans are currently registered as Republican.  On page 39 of the survey we see the reported party affiliations of survey takers.  This comparison is a bit fuzzier because the way people identify doesn't always match their actual party registration, but we wouldn't expect that to cause the level of deviation observed in the chart.  As shown below, more of the sample respondents identified as unaffiliated, with about 12 percentage points fewer Republicans and 5 percentage points fewer Democrats than the registration numbers.  This also suggests the sample was significantly less conservative than the registered voters of Kansas.




CONCLUSION


All of the data above speaks to how different the sample was from the general population of Kansas, but what are the takeaways from that?

  • The significant differences between population and sample demographics undermine the 3.9% margin of error, making the true margin unknown and potentially much larger.  More concerning, the direction of the skews makes it appear that the survey was biased in a way that favored Democrats.
  • Significant differences between sample and population on measured values can indicate other underlying problems with the sample on unmeasured values.  We know the sample was more female, more affluent, and more left-leaning than the population; could that mean it was also biased in a way that made it more urban?  That's unknowable with the available data, but certainly problematic.
  • The researchers released the paper with metrics outside the margin of error and didn't talk about it.  This is the most troubling part, because statistical issues like this crop up in research all the time, but they can often be tested away or their impact on the margin of error quantified.  Here, they simply weren't addressed.
My last thought on this: I agree that Sam Brownback likely has a low approval rate; however, the 18% figure, as well as the other numbers reported in the survey, is likely an understatement of his true approval rate, given the bias described above.

**Added 2015-11-05 
After seeing more people cite this survey, I realized it has been conducted in Kansas since 2010, so I thought I would check whether the demographic skews were consistent across years.  They were, and some additional demographics were reported, specifically age, race, and education level.  I'm not going to do a full write-up on these, but they were also significantly inconsistent with population demographics.


Oh, and one last thing: I didn't talk about how the way questions are framed can impact results, and this survey had one really wonky question in it that I'm not a fan of.  Specifically:

Thinking about what you paid in sales tax, property tax and state income tax together, compared to two years ago, the amount you pay in state taxes has increased, remained the same or decreased?

This question has received some press time, with the byline being that 74% of Kansans now pay more in taxes under Brownback's policies.  Because the question asks about the "amount" rather than the "rate," I would count myself part of the 74%, but not because of Brownback's policy changes.  I pay more now because I make more money and live in a bigger house, which is actually an indication of success over the past four years.  I certainly don't think that's what the researchers or the press are purporting to measure.