take the pill

Today, I’ll directly address Sam Lucas’ article, “Beyond the Existence Proof: Ontological conditions, epistemological implications, and in-depth interview research,” published in 2012 in Quality & Quantity. In it, he argues that there is no basis at all for generalizing conclusions from the types of unrepresentative samples that are used by interview researchers. The best you can do is use the sample to document some fact (“an existence proof”), not make any out-of-sample generalizations. You can read Andrew Perrin’s commentary here.

To illustrate his argument, let’s return to yesterday’s hypothetical about unrepresentative samples. I said: “What if Professor Lucas suddenly found out that his heart medication was tested with an unrepresentative sample of white people from Utah? Should he continue taking the medication?” I’ll outline two answers to this question.

1. According to Professor Lucas, the reason that you can’t generalize from unrepresentative samples is that the world is “lumpy.” What does this mean? It means this:

All analysts confront a social world that is lumpy. By “lumpy” I mean that in the large-dimensioned social space there are concentrations of entities, and sparse locales; some constellations of characteristics are common, others rare; hills and mountains rise from some spots on the social terrain, valleys and ravines mark others.

In other words, individual cases are not linearly or normally distributed in the vector space that describes the variables we care about. The density map of the social world looks like this:


Thus, a sample that comes from one “island” (or near a critical point in the surface, to use our sophomore calculus terminology) would yield data that would have a large error term when extrapolated. Professor Lucas would tell his doctor: “Thanks for the medication, but African Americans might be in a cluster that is very far away from the sample of White men in Utah. Since this is expensive and I might have serious side effects, I’ll discontinue the treatment.”
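To make the extrapolation worry concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the cluster means, the noise level, the sample sizes, and the function names are mine, not anything from Lucas’ article. It builds one “lumpy” world, where clusters respond very differently, and one “smooth” world, where they respond alike, then measures how far a sample drawn from a single cluster lands from the true population mean.

```python
import random
import statistics


def simulate_population(cluster_means, size_per_cluster, noise_sd, rng):
    """Return (cluster_id, outcome) pairs; each cluster has its own mean outcome."""
    population = []
    for cluster_id, mean in enumerate(cluster_means):
        for _ in range(size_per_cluster):
            population.append((cluster_id, rng.gauss(mean, noise_sd)))
    return population


def extrapolation_error(population, sampled_cluster, sample_size, rng):
    """Absolute gap between a one-cluster sample mean and the population mean."""
    pool = [outcome for cluster, outcome in population if cluster == sampled_cluster]
    sample = rng.sample(pool, sample_size)
    true_mean = statistics.fmean(outcome for _, outcome in population)
    return abs(statistics.fmean(sample) - true_mean)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical worlds: the numbers are illustrative, not estimates of anything.
lumpy = simulate_population([0.0, 3.0, 6.0], 2000, 1.0, rng)
smooth = simulate_population([1.0, 1.0, 1.0], 2000, 1.0, rng)

# Sample only from cluster 0 (the "Utah" island) in both worlds.
lumpy_error = extrapolation_error(lumpy, sampled_cluster=0, sample_size=200, rng=rng)
smooth_error = extrapolation_error(smooth, sampled_cluster=0, sample_size=200, rng=rng)

print(f"error when the world is lumpy:  {lumpy_error:.2f}")
print(f"error when the world is smooth: {smooth_error:.2f}")
```

The same one-island sample is badly off in the first world and close in the second, and nothing inside the sample itself tells you which world you are in.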

2. My response is that Professor Lucas has an unstated assumption, which may or may not be supported. The assumption is that all degrees of lumpiness are equally likely. In other words, the world I showed you is just as likely as this one:


In other words, Professor Lucas correctly points out that it is possible that you live in a lumpy world (an obviously correct theoretical point), but then tacitly assumes that there is a good chance that you actually live in a lumpy world. The second point is empirical and can only be determined via study. If it’s true, then you are, in general, allowed to dismiss unrepresentative samples. If not, then you can start thinking about how good or bad the sample is.

So, here’s what I would say, which is similar to some of the commenters: “I recognize that the study testing the efficacy of Berkleyflaxin is flawed in an important way. But I also recognize that it has some information. What I’ll do is then look for evidence that I live in a ‘lumpy’ world where there is huge variation in the correlation between ethnicity and response to other similar medications. If I find it, I’ll discontinue. If I find evidence that I live in a less lumpy ‘linear’ world, then I’ll take my chances. If I find no evidence, I’ll probably continue because there aren’t *that* many drugs where there is huge variation in effectiveness across ethnic groups.”
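One way to read that decision rule is as a simple Bayesian update. The sketch below is only an illustration under assumptions of my own: a Beta prior over the share of comparable drugs that show large variation across ethnic groups, a count of comparable drugs as the evidence, and a 0.5 cutoff for discontinuing. The prior, the counts, the cutoff, and the function names are all hypothetical.

```python
def posterior_lumpy_share(prior_lumpy, prior_smooth, drugs_with_variation, drugs_without):
    """Posterior mean of a Beta(prior_lumpy, prior_smooth) prior after
    counting comparable drugs with and without large cross-group variation."""
    a = prior_lumpy + drugs_with_variation
    b = prior_smooth + drugs_without
    return a / (a + b)


def decide(p_lumpy, threshold=0.5):
    """Discontinue only if a lumpy world looks more likely than the cutoff."""
    return "discontinue" if p_lumpy > threshold else "continue"


# Hypothetical prior: large cross-group variation is thought to be uncommon.
PRIOR_LUMPY, PRIOR_SMOOTH = 1, 9

# Case 1: no evidence found, so the prior alone drives the choice.
no_evidence = posterior_lumpy_share(PRIOR_LUMPY, PRIOR_SMOOTH, 0, 0)

# Case 2: 18 of 20 comparable drugs vary a lot across groups (made-up counts).
lumpy_evidence = posterior_lumpy_share(PRIOR_LUMPY, PRIOR_SMOOTH, 18, 2)

# Case 3: none of 20 comparable drugs vary much across groups (made-up counts).
smooth_evidence = posterior_lumpy_share(PRIOR_LUMPY, PRIOR_SMOOTH, 0, 20)

for label, p in [("no evidence", no_evidence),
                 ("lumpy evidence", lumpy_evidence),
                 ("smooth evidence", smooth_evidence)]:
    print(f"{label}: P(lumpy) = {p:.2f} -> {decide(p)}")
```

The three cases line up with the three branches of the quote: discontinue on evidence of lumpiness, continue on evidence of a smoother world, and fall back on the prior when there is no evidence either way.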

Now, I want to make a few things clear. In many ways, I strongly agree with the gist of Lucas’ argument. It is often the case that in-depth interview researchers pretend to limit their claims and then jump to broad conclusions. I can easily imagine the Berkeley cultural sociology student who wants to make claims about schools based on their in-depth interviews of twelve immigrant kids in the Mission district. But still, that doesn’t mean that unrepresentative samples have no value for inferential social science. Instead, we have to start figuring out the different types of processes that produce data and build systematic theories of when the data is useful or not.


Written by fabiorojas

October 11, 2013 at 12:01 am

3 Responses


  1. Interesting debate. Two points I would like to mention:

    1) Experimental economics generally uses small and unrepresentative samples, i.e. students (they are preferred subjects because low opportunity costs make it cheaper to create sufficient incentives), to make claims about how humans generally behave in markets. I realize not everyone reading orgtheory will agree, but I believe such research still managed to produce a lot of insightful knowledge about human decision making. Major results have also been replicated across multiple different, equally unrepresentative samples, with remarkable consistency; and have been corroborated by ‘real world’ empirical data. Moreover, rather than being flustered by canned criticisms about representativeness (i.e. ‘analyzing what a few students do in a lab contributes nothing to our knowledge of society’), potential biases are investigated and charted (e.g. experienced economics students tend to act more selfishly/rationally in certain situations).

    2) Indiscriminately holding all samples up to a gold standard of representativeness is, in my opinion, bound to smother scientific progress. I’m all for trying to get the best data you possibly can. But as has been mentioned in previous posts and comments, it is simply not always possible to have a representative sample, or to know in what way a sample differs from the relevant population. Many questions in the social sciences call for data for which this is unfortunately the case. In fact, if it were easy to get good data, we would already have good answers to these questions. With this in mind, I do not understand why it is so common that reviewers and editors criticize innovative (presumably best possible given the circumstances) data for lack of representativeness without explicitly indicating how they believe the reported associations are biased.



    October 11, 2013 at 7:55 am

  2. I think we’re converging on some sort of consensus – essentially, if you have prior knowledge of the distribution that allows you to evaluate how, and in what ways, a sample is nonrepresentative, you can use that knowledge to adjust for the nonrepresentativeness. My point is just that such an operation requires that you first know something about the population, which you can know only through representative sampling.

    You’re wrong, IMHO, about the assumption involved in Lucas’s lumpiness claim. You say that his case rests on an assumption that the world is lumpy. But that’s not true – his case rests only on the assumption that the world could be lumpy. If the population is lumpy, a representative sample succeeds and a nonrepresentative sample fails. If the population is not lumpy, a representative sample succeeds and a nonrepresentative sample succeeds. So on those grounds alone, representative wins. But in the case of no prior knowledge about the population, a nonrepresentative sample fails even if the population is not lumpy because it provides no adequate information that would let us distinguish between an (accidentally) correct inference and an incorrect inference.



    October 11, 2013 at 2:14 pm

  3. […] around this paper by Samuel Lucas. Two other sociologists — Andrew Perrin here, and Fabio Rojas here and here — provide responses. I feel no need to disagree with anything yet said. However, I […]

