Tuesday, November 3, 2015

Testing Opinion Polls: Do they really measure what they say they do?

**edited 2015-11-05 to include additional demographic information

Generally, I am not a fan of survey research; I prefer economic numbers or other data measured by something other than calling people and asking them how they feel.  Polls can bring in a lot of bias: not just the normal sampling error that some statisticians are obsessed with measuring and testing against, but also response bias, sampling bias, biases from the way questions are asked, etc.  That's not to say that opinion polls and surveys are all worthless (if you want to do one, I know a guy, his name is Ivan, he's great with this stuff).

This is why, when developing political models, I only partially rely on recent opinion polls and heavily weight historic voting trends as well.  Remember how I used a model with additional data to predict Brownback's re-election within the margin?  (I'll be bringing this up for at least another three years.)

A new poll has been making the rounds in Kansas and national media, making claims such as "Obama is more popular in Kansas than Brownback."  Keep in mind that Obama lost Kansas by 21 percentage points in 2012, and Brownback just won re-election in Kansas by about four percentage points.  This is obviously quite a claim, but how seriously should we take it?  Moreover, are there some basic steps we can use to vet how good opinion surveys are?


So what makes a survey accurate versus inaccurate? The truth is, there are a lot of ways to mess up a survey.  Here are the general ways surveys go wrong:
  • Sampling error.  Many statisticians spend the majority of their careers measuring sampling error (this is part of the frequentist versus Bayesian debate, and a topic for another post).  Sampling error is the error caused simply by using a sample smaller than the entire population.  Even assuming the sample is randomly selected from the population, there will still be a small amount of error.  This is the (+/-) 3% you see in most public opinion polls, though it varies with the size of the sample.
  • Sampling bias. Sampling bias is different from sampling error, though the distinction can be difficult to grasp.  This is the bias introduced through problems in the process of choosing a random sample of a population.  How does this kind of bias crop up?
    • Bad random number generators.
    • Bad randomization strategies (just choosing top 20% of a list).
    • Bad definition of the population (e.g., a list of phone numbers with certain people systematically missing).
  • Response bias.  Response bias occurs when certain groups of survey recipients respond at different rates than others, due to varying "propensity to respond" across demographic or opinion groups within a population.  Examples of how that occurs:
    • Women can be more willing to respond 
    • Older people (retirees) can have more time to respond
    • Minority groups can be less trusting of authorities, and less willing to respond
    • Certain political groups may not trust polling institutions and be less willing to respond
Once again, this is just a starter list of what can go wrong when taking samples for surveys, and I may add to it as we go, but it is a good primer.
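To make the sampling-error bullet concrete, here's a minimal sketch of the standard worst-case margin-of-error calculation for a proportion (assuming simple random sampling and a 95% confidence level; the function name is my own):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Worst-case 95% margin of error for a proportion from a random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# A sample of roughly 1,100 gives the familiar "+/- 3%" seen in most polls...
print(round(margin_of_error(1100) * 100, 1))  # 3.0
# ...and 638 completed interviews (this survey's count) gives its reported 3.9%.
print(round(margin_of_error(638) * 100, 1))   # 3.9
```

Note that this formula only captures sampling error; none of the bias sources above shrink just because the arithmetic checks out.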


Let's jump right into the study at hand.  The study was conducted by the Docking Institute of Public Affairs at Fort Hays State University.  The study can be found here.

Did the authors of this study consider these error and bias issues?  Absolutely, and they reference them in the study.  Here's a snapshot from their methodology.

A few observations, tying back to my first section:

  • First, they reference a sampling error of (+/-) 3.9% for the sample.  That means we can treat any number in the survey as accurate to within 3.9 percentage points.
  • Second, they make a passing reference to response bias, assuming it away.  But how can we test whether there is response bias?  Elsewhere in the paper they say that they contacted 1,252 Kansans, and 638 responded.  That means if the roughly 50% that responded are "different" demographically from the roughly 50% that didn't respond, the conclusions of the survey could be misleading.
  • Third, there's no reference here to sampling bias, but they de facto address it elsewhere, talking about how they pulled the sample.  The report says: "The survey sample consists of random Kansas landline telephone numbers and cellphone numbers. From September 14th to October 5th, a total of 1,252 Kansas residents were contacted through either landline telephone or cellphone."

Looking at these three sources of potential error: sampling error is simple math (based on sample size), response bias is assumed away by the researchers, and it's impossible to know whether the list of phone numbers used can create an unbiased sample of Kansans.  So can we be sure that this sample is an accurate representation of Kansans?

We can never be certain that this is a good sample free of response and sampling bias, but we can do some due diligence to determine whether everything looks correct, specifically through fixed-values testing.  In essence, there are some numbers that we know about the population through other sources (census data, population metrics, etc.) that we can check the sample against.  Let's start with gender.

In the paper, on page 39, there's a summary of respondents by gender, for both population and sample.  Keep in mind that the margin of error for this sample is 3.9%, so we would expect the gender ratios to fall within this margin.  They do not (a 5-point variance), meaning that the gender differences in this survey cannot be attributed to random sampling error.
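The gender check is, at bottom, a one-sample test of proportions.  A minimal sketch, using illustrative numbers rather than the report's exact page-39 figures (a sample 5 points off a ~51% population share, with the survey's n = 638):

```python
import math

def proportion_z(p_hat, p0, n):
    """z statistic comparing a sample proportion to a known population value."""
    se = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se

# Illustrative figures, NOT the report's exact numbers: a sample that is
# 5 percentage points more female than a ~51% population share, n = 638.
z = proportion_z(0.56, 0.51, 638)
print(z > 1.96)  # True: the gap exceeds what sampling error alone allows at 95%
```

Any z beyond (+/-) 1.96 means the deviation is outside the 95% sampling-error band, which is exactly the situation the gender numbers present.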

Also on page 39 is a summary of sample and population by income bracket.  Reported income brackets are a bit fuzzier than reported gender, but the chart below shows how those line up.  Because there are multiple categories here, we can't do a simple (+/-) 3.9% calculation (technically a binomial test of proportions).  Instead we rely on a chi-squared goodness-of-fit test to determine whether the differences are due to sampling error or an underlying bias.  If the values were statistically similar, we would expect a chi-squared value under 14.1.  The test finds that the results exceed the limits of sampling error and indicate an underlying bias in the results.
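The chi-squared goodness-of-fit statistic is simple enough to compute by hand.  Here's a sketch with illustrative eight-bracket counts (not the report's actual income data) tested against a hypothetical even population split:

```python
def chi_square_gof(observed_counts, expected_props):
    """Chi-squared goodness-of-fit statistic: sum of (O - E)^2 / E over categories."""
    n = sum(observed_counts)
    expected = [p * n for p in expected_props]
    return sum((o - e) ** 2 / e for o, e in zip(observed_counts, expected))

# Illustrative counts (n = 638), NOT the report's actual income data:
observed = [50, 60, 70, 80, 90, 100, 110, 78]
population_props = [0.125] * 8  # hypothetical even split across brackets
stat = chi_square_gof(observed, population_props)
# With 8 categories (7 degrees of freedom), the 5% critical value is ~14.07,
# which is where the "under 14.1" threshold above comes from.
print(stat > 14.07)  # True for this skewed sample: beyond sampling error
```

A statistic above the critical value means the bracket-by-bracket gaps are collectively too large to be random sampling noise.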

We also have fixed numbers for party affiliation in Kansas from the most recent registration numbers from the Secretary of State.  Those numbers are shown on the left side of the chart below: about 45% of Kansans are currently registered as Republican.  On page 39 of the survey we see the reported party affiliations of survey takers.  This analysis is a bit fuzzier because the way people identify doesn't always match their actual party registration, but we wouldn't expect that to cause the level of deviation observed in the chart.  As shown below, more of the sample respondents identified as unaffiliated, about 12 percentage points fewer as Republican, and 5 percentage points fewer as Democrat.  This suggests the sample was significantly less conservative than the registered voters of Kansas.


All of the data above speaks to how different the sample was from the general population of Kansas, but what are the takeaways?

  • The significant differences between population and sample demographics undermine the 3.9% margin of error, making the true margin unknown and potentially much larger.  More concerning, the direction of these deviations makes it appear that the survey was biased in a way that favored Democrats.
  • Significant differences between sample and population on measured values can indicate other underlying problems with the sample on unmeasured values.  We know that the sample was more female, more affluent, and more left-leaning than the population; could that mean the sample was also biased in a way that made it more urban?  Unknowable with the available data, but certainly problematic.
  • The researchers released the paper with metrics outside the margin of error and didn't address it.  This is the most troubling part, because statistical issues like this crop up often in research, but they can usually be tested away or have their impact on the margin of error quantified.
My last thought on this: I agree that Sam Brownback likely has a low approval rating; however, the 18% approval rate, as well as the other numbers reported in the survey, is likely an understatement of his true approval rate, given the bias presented.

**Added 2015-11-05 
After seeing more people cite this survey, I realized it has been conducted in Kansas since 2010, so I checked whether the demographic trends were consistent over time.  They were, and earlier editions included some additional demographics, specifically age, race, and education level.  I'm not going to do a full write-up on these, but they were also significantly inconsistent with population demographics.

Oh, and one last thing: I didn't talk about how the way questions are framed can impact results, and this survey had one really wonky question in it that I'm not a fan of.  Specifically:

Thinking about what you paid in sales tax, property tax and state income tax together, compared to two years ago, the amount you pay in state taxes has increased, remained the same or decreased?

This question has received some press time, with the headline being that 74% of Kansans now pay more in taxes under Brownback's policies.  Because the question asks about the "amount" rather than the "rate," I would count myself part of the 74%, but not because of Brownback's policy changes.  I pay more now because I make more money and live in a bigger house, which is actually an indication of success over the past four years.  I certainly don't think this is what the researchers or the press are purporting to measure.
