race and genomics: comments on shiao et al.
Last week, I argued that many sociologists make a strong claim. Not only are social classifications of race a convention, but there is no meaningful clustering of people that can be derived from physical or biological traits. To test this claim, I suggested that one would need to discuss which traits count as meaningful, get a huge sample of people, and then see if there are indeed clusters. The purpose of Shiao et al (2012) is to claim that when someone conducts such an exercise, there is some clustering.
Before I offer my own view of the evidence that Shiao et al offer, we need to set some ground rules. What are the logically possible outcomes of such an exercise?
- The null hypothesis: your clustering methods yield no clusters (i.e., there are no detectable sub-groups of people).
- The weak hypothesis: clustering algorithms yield ambiguous results. It’s like getting a small correlation with p = .07 in a regression analysis. This outcome matters because it should still shift your prior moderately.
- The “conventional” strong hypothesis: unambiguous groups that correspond to social classifications of people. E.g., there really is a “White” group of people corresponding to people from Europe.
- The “unconventional” strong hypothesis: unambiguous groups that do not correspond to common social classifications of people. For example, there might be an extremely well defined group of people that combines Hawaiians and Albanians.
A few technical points, which are important. First, any such exercise will need to incorporate robustness checks because clustering methods require the user to set up initial parameters. Clustering algorithms do not tell you how many groups there are. Instead, they answer the question of how well the model fits the hypothesis that you have X groups. Second, sociologists tend to mix up these possible outcomes. They correctly point out that there is a social construction called “race” which is real in its effects and influence on people. But that doesn’t logically entail anything about the presence or absence of human populations that are differentiated due to random variation of inherent physical traits over time. Also, they fail to consider outcome #4, the “unconventional” strong hypothesis. There might be actual differences, but they might not match up to our common beliefs.
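To make the first technical point concrete, here is a minimal sketch in Python, assuming a plain one-dimensional k-means. It is not the method used in any study Shiao et al cite; the function `kmeans_1d` and both toy datasets are hypothetical and made up for illustration. The point is only that the user supplies the number of groups, and the algorithm hands back that many groups plus a measure of fit.

```python
def kmeans_1d(values, k, iterations=100):
    """Plain k-means (Lloyd's algorithm) on one variable; k is chosen by the caller."""
    data = sorted(values)
    n = len(data)
    # Deterministic start: spread the initial centers across the sorted data.
    centers = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            groups[nearest].append(x)
        # Each center moves to the mean of its group (kept in place if the group is empty).
        updated = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    # Within-cluster sum of squares: smaller means a tighter fit for this k.
    wcss = sum((x - centers[j]) ** 2 for j, g in enumerate(groups) for x in g)
    return centers, groups, wcss


# Toy data, invented for illustration only.
gradient = [i / 99 for i in range(100)]  # evenly spread values, no gaps anywhere
clumped = ([0.15 + 0.1 * i / 49 for i in range(50)]
           + [0.75 + 0.1 * i / 49 for i in range(50)])  # two well-separated clumps

for name, values in (("gradient", gradient), ("clumped", clumped)):
    for k in range(1, 5):
        centers, groups, wcss = kmeans_1d(values, k)
        sizes = [len(g) for g in groups]
        print(f"{name:8s} k={k} sizes={sizes} wcss={wcss:.3f}")
```

In both datasets the algorithm returns exactly as many groups as it was asked for. What differs is the fit: moving from one group to two improves the fit modestly on the gradient and far more sharply on the clumped data. That comparison across values of k is the kind of robustness check the exercise needs.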
So what do Shiao et al offer and where does it lie in this spectrum of possibilities? Well, the article is not a systematic review of genomic research that searches for clusters of people. Rather, it offers a few important points drawn from anthropology and genomics. First, Shiao et al point out that there is a now undisputed (among academics) human history. Humans originated in East Africa and then spread out (“Out of Africa thesis”). Second, as people spread out, genomic variation emerges as people mate with people close by. Third, genetic drift implies that geography will predict variations in genes. As you move from X to Y, you will see measurable differences in people. Fourth, these differences are gradual in character.
Shiao et al then switch gears and talk about clustering of people using genomic data. They tell us that there are statistically detectable and stable group differences and that these do not rigidly determine behavior. They also cite research suggesting these statistical groups correlate with self-described racial groupings. Then, the authors discuss a “bounded” approach to social theory where biology imposes some constraints on the variation in behavior but in a non-deterministic fashion.
I’ll get to the symposium next week, but here’s my response:
1. There is a real tension. At some points, Shiao et al suggest a world of gradual variation, which suggests no distinct racial groups (outcome #1), but then there’s a big focus on clusters.
2. If we do live in a world of gradual, but real, variation in human biology, then the whole clustering approach is misleading. Instead, we might live in a world that’s like a contour map. It’s all connected, there are no groups, but you see some variables increase as you move along the map.
3. If that’s true, we need an outcome #5 – “race is not real but biology is real.”
4. I definitely need more detail on the clustering methods and procedures. Some critics have pointed out that the clusters found in research are endogenously produced, which makes me suspect that the underlying science might be hovering around outcome #1 (it all depends on the algorithm and its parameters) or outcome #2 (there might be some clustering, but it is very poorly defined).
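The contour map idea can be sketched with a toy example. This is a hypothetical illustration in Python, not real genomic data and not an analysis from Shiao et al: the names `positions`, `trait`, and `split_gap`, and the numbers 0.10 and 0.008, are all assumptions chosen to describe a trait that rises smoothly from one end of a map to the other.

```python
# Hypothetical smooth gradient: a trait frequency that rises steadily with position.
positions = list(range(101))                   # 0 = one end of the map, 100 = the other
trait = [0.10 + 0.008 * p for p in positions]  # illustrative numbers, not real data

# Neighbors are nearly identical, yet the two ends of the map differ a lot.
largest_step = max(b - a for a, b in zip(trait, trait[1:]))
end_to_end = trait[-1] - trait[0]


def mean(xs):
    return sum(xs) / len(xs)


def split_gap(cut):
    """Difference in mean trait value between the two sides of an arbitrary cut."""
    left = [t for p, t in zip(positions, trait) if p < cut]
    right = [t for p, t in zip(positions, trait) if p >= cut]
    return mean(right) - mean(left)


print("largest step between neighbors:", round(largest_step, 3))
print("difference between the two ends:", round(end_to_end, 3))
for cut in (25, 50, 75):
    print("cut at", cut, "-> gap between group means:", round(split_gap(cut), 3))
```

Any cut produces two “groups” with clearly different averages, even though no two neighbors differ by more than a tiny step. In this linear toy case the gap between group means is the same wherever the line is drawn, so nothing in the data marks one boundary as more natural than another. That is what “biology is real but race is not” would look like in the simplest possible setting.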