sorry, darren sherkat, low response rates are not bullshit

Scatterplot has a discussion on one of my favorite topics: low response rates. The observation is that political polls have low response rates but still produce decent answers, contrary to standard sociological advice. For years, I have argued that low response rates do not logically entail biased data. It is simply a logical fallacy to conclude that survey data is biased solely because of the response rate. Here are two examples that show the fallacy of inferring bias from response rates alone:

  1. High response rate, very biased: Let’s say that I fielded a survey that everyone responded to, except for Jews. They didn’t respond at all because I printed a swastika on the envelope, and every single Jewish recipient just threw it in the trash. The result? A response rate of about 97%. High response rate? Yes, textbook perfect. Bias? Yes: any question regarding Judaism (e.g., is R Jewish?) will be biased.
  2. Low response rate, no bias: Let’s say that I fielded a survey on Oct 1, 2012 in New York City, and all 1,000 people who got the survey responded. Great! On October 21, I decided to use research funds to draw an extra sample of 9,000 names and send them the same survey. Oh no! Hurricane Sandy hit and nobody responded. Response rate? 10%. Biased? No, because whether people responded depended only on when their survey went out, not on who they were. The people in wave 1 were randomly chosen, so the 1,000 respondents are still a random sample of the city.
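
The two thought experiments above can be simulated in a few lines of Python. This is a minimal sketch, not anything from an actual study: the population size (100,000), the 3% Jewish share, and the random seeds are all assumptions chosen to match the numbers in the examples, and the only variable "measured" is whether a respondent is Jewish.

```python
import random

def make_population(size=100_000, jewish_share=0.03, seed=1):
    """Hypothetical city: one boolean per person, True if the person is Jewish."""
    rng = random.Random(seed)
    return [rng.random() < jewish_share for _ in range(size)]

def share(flags):
    """Proportion of True values in a list of booleans."""
    return sum(flags) / len(flags)

def swastika_survey(population):
    """Example 1: everyone is surveyed; every Jewish recipient refuses."""
    respondents = [is_jewish for is_jewish in population if not is_jewish]
    return len(respondents) / len(population), share(respondents)

def sandy_survey(population, seed=2):
    """Example 2: random sample of 10,000; only the 1,000 in wave 1 respond."""
    rng = random.Random(seed)
    sample = rng.sample(population, 10_000)  # random order, so any slice is random
    respondents = sample[:1_000]             # wave 1 answers; wave 2 lost to the storm
    return len(respondents) / len(sample), share(respondents)

population = make_population()
rate1, est1 = swastika_survey(population)
rate2, est2 = sandy_survey(population)
print(f"true % Jewish: {100 * share(population):.1f}")
print(f"example 1: response rate {100 * rate1:.0f}%, estimated % Jewish {100 * est1:.1f}")
print(f"example 2: response rate {100 * rate2:.0f}%, estimated % Jewish {100 * est2:.1f}")
```

Example 1 reports a response rate near 97% and an estimate of exactly 0% Jewish; example 2 reports a 10% response rate and an estimate that differs from the true share only by ordinary sampling error.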

The issue isn’t the response rate; it’s selection into the study. If selection is correlated with the data (a religion survey that alienates a religious group), then the data is biased. If selection is random, then you have no bias. Selection bias can occur, or not occur, anywhere in the range of response rates from 1% to 99%.
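
One common way to formalize this point (a standard deterministic-propensity model from the survey nonresponse literature, added here as a sketch rather than taken from the post) gives each of the $N$ people in the population a response propensity $\rho_i \in [0, 1]$ and an answer $y_i$. Ignoring sampling error, the respondent mean is approximately the propensity-weighted mean, and its bias reduces to a covariance:

```latex
\begin{aligned}
\bar{\rho} &= \frac{1}{N}\sum_{i=1}^{N} \rho_i, \qquad
\bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i, \qquad
\bar{y}_r \approx \frac{\sum_{i=1}^{N} \rho_i y_i}{\sum_{i=1}^{N} \rho_i} \\
\bar{y}_r - \bar{y} &\approx
  \frac{\frac{1}{N}\sum_{i=1}^{N} \rho_i y_i - \bar{\rho}\,\bar{y}}{\bar{\rho}}
  = \frac{\operatorname{Cov}(\rho, y)}{\bar{\rho}}
\end{aligned}
```

The response rate enters only through $\bar{\rho}$ in the denominator. When propensities are unrelated to the answers, as in the Hurricane Sandy example, the covariance and hence the bias are essentially zero even at a 10% response rate. In the swastika example ($\rho_i = 1$ for non-Jews, $0$ for Jews, $y_i = 1$ if Jewish, 3% Jewish), $\operatorname{Cov}(\rho, y) = 0 - 0.97 \times 0.03 = -0.0291$ and the bias is $-0.0291 / 0.97 = -0.03$: the estimated share Jewish drops from 3% to 0% at a 97% response rate.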

Ok, you say, but maybe it’s not a logic issue. Sure, logically a low response rate doesn’t *have* to lead to bias. But in practice, low response is empirically related to bias. Maybe low response rates mean that only really weird people answer the phone or send back the survey.

This is actually a fair point, but it’s wrong. The link between low response rates and bias is an assumption that can be tested. And guess what? Public opinion researchers have tested it through a number of studies. For example, Public Opinion Quarterly in 2000 published the results of an experiment in which the same survey was run twice. The first time, the researchers just let people do whatever they wanted (response rate 30%). The second time, they really, really bugged people (response rate 60%). The result? The same answers on both surveys. Follow-up studies often find the same result.
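
For a single yes/no question, that kind of comparison might be checked with a two-proportion z-test. The sketch below assumes that test (a standard choice, not necessarily what the study's authors did) and uses invented counts purely for illustration. It ignores design effects from weighting or clustering, and a non-significant difference is consistent with, but does not prove, the absence of nonresponse bias.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 == p2, using the pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return p1 - p2, z, p_value

# Hypothetical counts (NOT from the published experiment): the same item asked
# in a standard-effort survey and in a high-effort survey.
standard = (312, 600)   # 52.0% answer "yes"
rigorous = (618, 1200)  # 51.5% answer "yes"
diff, z, p = two_proportion_z_test(*standard, *rigorous)
print(f"difference = {diff:+.3f}, z = {z:.2f}, p = {p:.2f}")
```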

In fact, in discussing this issue with John Kennedy, our recently retired director of survey research, I found out that this is an open secret among survey professionals. Response rates are a completely bogus measure of bias in survey data. It’s a shame that social scientists have held on to this erroneous belief, despite the work being done in public opinion research.

Awesome books: From Black Power/Grad Skool Rulz

Written by fabiorojas

November 9, 2012 at 3:52 am

Posted in fabio, mere empirics

4 Responses


  1. I’m mostly with you. However, the claim that “response rates are a completely bogus measure of bias in survey data” is a little extreme. After Jeremy’s post I decided to read a bunch of the public opinion research on this, and it shows that response rate CAN be a measure of bias, but that there are lots of other measures of bias that are potentially more important and get ignored because response rate is taken as their ultimate proxy. And just because it can be a source of bias doesn’t mean it always is one. In most instances, if the design is good (that is, if the other sources of bias are addressed through the design), then response-rate bias isn’t nearly as much of a concern as we assume.


    Shamus Khan

    November 9, 2012 at 5:00 pm

  2. Shamus, at first I was going to agree with you and blame the strong language on late-night blogging delirium. But I realized that I stand by the wording. The issue isn’t response rate. The issue is nonresponse bias. At best, response rates might, in some cases, indicate serious nonresponse bias. But not always, and in many cases there is no nonresponse bias.

    So, given that a measure of your data (the response rate) may or may not be linked to nonresponse bias, that the bias itself is often modest, and that only severe selection bias will change many estimates, what should we do about response rates?

    My answer: ignore it and focus on the real issue, selection bias. In many cases, you can directly model selection bias. For samples of the population, we can estimate the differences between the sample and standard data sets like the census. If you have the resources, you can measure selection bias by comparing people who didn’t need to be prompted with those who needed prompts. If you are in a big research area, different research groups will use different sampling methods, and you can see whether they make a difference and by how much. Then, if you actually find some bias, you can adjust your models by creating sample weights or by using a two-stage model.

    Sadly, sociologists refuse to actually think through the problem and just rely on response rates. It’s logically wrong, empirically suspect, and intellectually lazy.



    November 9, 2012 at 6:44 pm

  3. (Note to young survey researchers: if you actually do accidentally print a swastika on the envelope of your survey, you will have bigger problems than just a compromised response rate.)

    I agree with most of your argument, of course, but extremely high response rates do set a bound on the possible nonresponse bias for an evenly distributed outcome. To take your toy example: if a survey had 100% response among non-Jews and 0% response among Jews, Jews vote 2:1 Democratic in an otherwise 50-50 electorate, and Jews are 3% of the population, then you’d miss the % Democrat by only about half a percentage point. By contrast, a sample that had a 3% response rate due to 100% Jewish participation and 0% non-Jewish participation would miss the true answer by about 16 percentage points.

    The thing is, what we consider high-response-rate studies of public opinion (GSS/ANES) do not have response rates high enough to rule out a considerable amount of bias, if bias is going to be a problem.



    November 9, 2012 at 8:37 pm

  4. @jeremy: Good point, but your comment shows how subtle the response rate/bias link can be. Returning to the example, the answer for % Democrat is only a little biased. The answer for % Jewish is quite biased and highly misleading. With a large N, you’d probably get a very tight confidence interval around 0%.

    Lesson: bias is selection + data, and that combination has a subtle and often weak link to response rates. If you are interested in nonresponse bias, it is better to investigate it directly than to judge it from response rates.



    November 9, 2012 at 8:55 pm
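
The weighting adjustment suggested in the second comment can be sketched as simple post-stratification: each respondent is weighted by their group's population share divided by that group's share of the sample. Everything here is hypothetical (the age groups, the "census" shares, and the responses), and the adjustment only removes bias if, within each group, respondents answer the way nonrespondents would have. The two-stage (selection model) alternative is not shown.

```python
from collections import Counter

def poststratification_weights(sample_groups, population_shares):
    """Map each group to (population share) / (share of the sample in that group)."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return {g: population_shares[g] / (counts[g] / n) for g in counts}

def weighted_mean(values, groups, weights):
    """Mean of values, weighting each respondent by their group's weight."""
    total = sum(weights[g] * v for v, g in zip(values, groups))
    return total / sum(weights[g] for g in groups)

# Hypothetical survey: older people are over-represented among respondents
# (60% of the sample vs. an assumed 40% in the census), and they answer "yes"
# more often (75%) than younger people (25%).
groups = ["older"] * 60 + ["younger"] * 40
values = [1] * 45 + [0] * 15 + [1] * 10 + [0] * 30
census = {"older": 0.40, "younger": 0.60}

weights = poststratification_weights(groups, census)
unweighted = sum(values) / len(values)
weighted = weighted_mean(values, groups, weights)
print(f"unweighted % yes: {100 * unweighted:.0f}")
print(f"weighted % yes:   {100 * weighted:.0f}")
```

Here the unweighted estimate is 55% and the weighted one 45%, which matches 0.40 × 75% + 0.60 × 25%.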

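The arithmetic in the third and fourth comments can be checked directly. This is a sketch under stated assumptions: the population share and vote splits come from the toy example in the third comment, the 1,000-respondent sample size is an assumption, and the worst-case bounds (treating every nonrespondent as a 0 or a 1) and the Wilson score interval are standard tools swapped in for illustration rather than anything the commenters computed.

```python
import math

JEWISH_SHARE = 0.03   # Jews as a share of the population (from the toy example)
DEM_JEWISH = 2 / 3    # "Jews vote 2:1 Democratic"
DEM_OTHER = 0.50      # "an otherwise 50-50 electorate"

true_dem = JEWISH_SHARE * DEM_JEWISH + (1 - JEWISH_SHARE) * DEM_OTHER

# Scenario A: only non-Jews respond (response rate 97%).
miss_a = abs(DEM_OTHER - true_dem)
# Scenario B: only Jews respond (response rate 3%).
miss_b = abs(DEM_JEWISH - true_dem)

def worst_case_bounds(respondent_share, response_rate):
    """Range for a population proportion if nonrespondents were all 0s or all 1s."""
    low = response_rate * respondent_share
    return low, low + (1 - response_rate)

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion, clipped to [0, 1]."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z / denom * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return max(0.0, centre - half), min(1.0, centre + half)

print(f"true % Democrat: {100 * true_dem:.1f}")
print(f"scenario A misses by {100 * miss_a:.1f} points")
print(f"scenario B misses by {100 * miss_b:.1f} points")
print("worst-case bounds, A:", worst_case_bounds(DEM_OTHER, 1 - JEWISH_SHARE))
print("worst-case bounds, B:", worst_case_bounds(DEM_JEWISH, JEWISH_SHARE))

# Width of the worst-case bounds at a few hypothetical response rates.
for rate in (0.97, 0.70, 0.50):
    low, high = worst_case_bounds(0.5, rate)
    print(f"response rate {rate:.0%}: bounds {high - low:.0%} wide")

# % Jewish in scenario A, assuming 1,000 respondents: none of them is Jewish.
ci_low, ci_high = wilson_interval(0, 1_000)
print(f"95% CI for % Jewish: {100 * ci_low:.2f} to {100 * ci_high:.2f}")
```

With these inputs, scenario A misses by about half a point and scenario B by about 16 points. The worst-case bounds are only 3 points wide at a 97% response rate but span almost the entire range at 3%. The interval for % Jewish in scenario A runs from 0% to under 0.4%, tightly excluding the true 3%: a precise but badly biased estimate.
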