bad reporting on bad science
This Guardian piece about bad incentives in science was getting a lot of Twitter mileage yesterday. “Cut-throat academia leads to natural selection of bad science,” the headline screams.
The article is reporting on a new paper by Paul Smaldino and Richard McElreath, and features quotes from the authors like, “As long as the incentives are in place that reward publishing novel, surprising results, often and in high-visibility journals above other, more nuanced aspects of science, shoddy practices that maximise one’s ability to do so will run rampant.”
Well. Can’t disagree with that.
But when I clicked through to read the journal article, the case didn’t seem nearly so strong. The article has two parts. The first is a review of review pieces published between 1962 and 2013 that examined the levels of statistical power reported in studies in a variety of academic fields. The second is a formal model of an evolutionary process through which incentives for publication quantity will drive the spread of low-quality methods (such as underpowered studies) that increase both productivity as well as the likelihood of false positives.
The formal model is kind of interesting, but just shows that the dynamics are plausible — something I (and everyone else in academia) was already pretty much convinced of. The headlines are really based on the first part of the paper, which purports to show that statistical power in the social and behavioral sciences hasn’t increased over the last fifty-plus years, despite repeated calls for it to do so.
Well, that part of the paper basically looks at all the papers that reviewed levels of statistical power in studies in a particular field, focusing especially on papers that reported small effect sizes. (The logic is that such small effects are not only most common in these fields, but also more likely to be false positives resulting from inadequate power.) There were 44 such reviews. The key point is that average reported statistical power has stayed stubbornly flat. The conclusion the authors draw is that bad methods are crowding out good ones, even though we know better, through some combination of poor incentives and selection that rewards researcher ignorance.
The problem is that the evidence presented in the paper is hardly strong support for this claim. This is not a random sample of papers in these fields, or anything like it. Nor is there other evidence to show that the reviewed papers are representative of papers in their fields more generally.
More damningly, though, the fields that are reviewed change rather dramatically over time. Nine of the first eleven studies (those before 1975) review papers from education or communications. The last eleven (those after 1995) include four from aviation, two from neuroscience, and one each from health psychology, software engineering, behavioral ecology, international business, and social and personality psychology. Why would we think that underpowering in the latter fields at all reflects what’s going on in the former fields in the last two decades? Maybe they’ve remained underpowered, maybe they haven’t. But statistical cultures across disciplines are wildly different. You just can’t generalize like that.
The news article goes on to paraphrase one of the authors as saying that “[s]ociology, economics, climate science and ecology” (in addition to psychology and biomedical science) are “other areas likely to be vulnerable to the propagation of bad practice.” But while these fields are singled out as particularly bad news, not one of the reviews covers the latter three fields (perhaps that’s why the phrasing is “other areas likely”?). And sociology, which had a single review in 1974, looks, ironically, surprisingly good — it’s that positive outlier in the graph above at 0.55. Guess that’s one benefit of using lots of secondary data and few experiments.
The killer is, I think the authors are pointing to a real and important problem here. I absolutely buy that the incentives are there to publish more — and equally important, cheaply — and that this undermines the quality of academic work. And I think that reviewing the reviews of statistical power, as this paper does, is worth doing, even if the fields being reviewed aren’t consistent over time. It’s also hard to untangle whether the authors actually said things that oversold the research or if the Guardian just reported it that way.
But at least in the way it’s covered here, this looks like a model of bad scientific practice, all right. Just not the kind of model that was intended.
[Edited: Smaldino points on Twitter to another paper that offers additional support for the claim that power hasn’t increased in psychology and cognitive neuroscience, at least.]