the joy of unrepresentative samples
Let’s start with a question: Why should you believe that representative samples are good? The answer: random samples produce estimates of the mean that are unbiased and, thanks to the central limit theorem, approximately normally distributed around the true mean. In plain English, a nice, big random sample will produce an estimate that is probably close to the real answer.
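You can see this in a quick simulation. The population below is hypothetical (an arbitrary skewed distribution, invented for illustration), but the point generalizes: means of random samples pile up tightly around the true mean even when the population itself looks nothing like a bell curve.

```python
import random
import statistics

random.seed(42)

# Hypothetical population: 100,000 values from a skewed (exponential)
# distribution -- deliberately non-normal.
population = [random.expovariate(1 / 50) for _ in range(100_000)]
true_mean = statistics.mean(population)

# Draw many random samples and record each sample's mean.
sample_means = [
    statistics.mean(random.sample(population, 500)) for _ in range(1_000)
]

# The sample means cluster tightly around the true population mean,
# even though the individual values are wildly spread out.
print(round(true_mean, 1), round(statistics.mean(sample_means), 1))
```

Run it and the average of the sample means lands within a fraction of a point of the true mean, which is exactly what the conditional above promises.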
Follow-up question: Does it logically follow that all unrepresentative samples produce systematically biased estimates? No, it doesn’t. To see why, you need a little Logic 101. In logic, it is well known that (A –> B) does not automatically entail (not A –> not B), which is called the inverse. To see why (not A –> not B) might be false, think about this conditional: “If it is a bat, it is a mammal.” Obviously, “if it is not a bat, then it is not a mammal” is not true: a dog is not a bat, yet it is a mammal. In general, you can’t infer the inverse from a conditional. They are simply different animals.
Let’s return to social science research methods. Our original conditional is: (the sample is random –> the estimates are normally distributed around the real mean). It doesn’t automatically follow that (the sample is not random –> the estimates are not normally distributed around the real mean). It *might* be true, but it’s not automatically true. It requires a separate argument.
So far, I have not read a general argument showing that unrepresentative or biased samples in *all* cases lead to systematic biases in the estimated parameters. That’s probably because no such argument exists: samples can be biased for all kinds of reasons. In some cases, the bias may matter, but in other cases it may be irrelevant.
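A small simulation makes the distinction concrete. The data here are entirely made up (a hypothetical population where opinion depends on age), but it shows the two cases side by side: a selection rule correlated with the quantity you care about produces a badly biased estimate, while a selection rule unrelated to it produces an estimate that is unrepresentative in form yet still lands near the truth.

```python
import random
import statistics

random.seed(0)

# Hypothetical population: each person has an age and an opinion score,
# and opinion depends on age (invented relationship, for illustration).
people = [{"age": random.randint(18, 80)} for _ in range(50_000)]
for p in people:
    p["opinion"] = 0.5 * p["age"] + random.gauss(0, 10)

true_mean = statistics.mean(p["opinion"] for p in people)

# Biased sample 1: over-represents the young. The selection rule is
# *correlated* with opinion, so the estimate is systematically off.
young_sample = [p["opinion"] for p in people if p["age"] < 40]

# Biased sample 2: keep only every 7th person on the list. Arguably
# unrepresentative, but the rule has nothing to do with opinion.
every_seventh = [p["opinion"] for i, p in enumerate(people) if i % 7 == 0]

print(round(true_mean, 1),
      round(statistics.mean(young_sample), 1),
      round(statistics.mean(every_seventh), 1))
```

The first biased sample misses the true mean by a wide margin; the second comes in close. Same label, “unrepresentative sample,” completely different consequences.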
What’s the point? The point is that social scientists should abandon their knee-jerk rejection of unrepresentative samples. Instead, we should take a case-by-case approach: individually investigate each type of unrepresentative sample to determine whether it can be used to estimate a parameter.
If you can accept that, then you open up a whole new world of data. For example, a lot of people can’t accept that futures markets are accurate forecasters of events. One of the arguments is that traders do not accurately resemble voters. True, but that doesn’t logically entail that it is impossible for trader behavior to mimic voter preferences. It *might* be true, but it is not automatically true based on the principle of random sampling. So what does the research say? Well, trading markets predict presidential election vote tallies better than random samples of voters (polls) 74% of the time. Not bad.
Bottom line: Yes, random samples are good, but they aren’t the last word. Social scientists should be on the lookout for data sources that perform well despite biases in sampling.