Archive for the ‘research’ Category
Some people might want to hand wave the problem away or jump to the conclusion that science is broken. There’s a more intuitive explanation – science is “brittle.” That is, once you get past some basic and important findings, you get to findings that are small in size, require many technical assumptions, or rely on very specific laboratory/data collection conditions.
There should be two responses. First, editors should reject submissions that depend on "local conditions" or report very small effects, or send them to lower-tier journals. Second, other researchers should feel free to try to replicate research. This is appropriate work for early-career academics who need to learn how work is done. Of course, people who publish in top journals, or obtain famous results, should expect replication requests.
Science just published a piece showing that only a third of articles from major psychology journals can be replicated. That is, when the experiments were rerun, only a third yielded statistically significant results. The details of the studies matter as well: the higher the original p-value, the less likely the result was to replicate, and "flashy" results were less likely to replicate.
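The mechanism behind that p-value pattern can be sketched with a toy simulation (my construction, not the reproducibility project's actual method): only studies that clear p < 0.05 get published, and originals that barely cleared the bar replicate less often than originals with very small p-values.

```python
import random
import math

def simulate(n_studies=20000, n=50, effect_range=(0.0, 0.6), seed=1):
    """Toy model: draw a true effect for each study, keep only originals
    significant at p < .05, then attempt one independent replication."""
    random.seed(seed)
    results = {"strong": [0, 0], "marginal": [0, 0]}  # [replicated, attempted]
    for _ in range(n_studies):
        d = random.uniform(*effect_range)    # unknown true effect size
        se = 1 / math.sqrt(n)                # s.e. of a mean with sd = 1
        z_orig = random.gauss(d / se, 1)     # original study's z statistic
        if z_orig < 1.96:                    # not significant: stays unpublished
            continue
        z_rep = random.gauss(d / se, 1)      # independent replication attempt
        # "strong" originals had p < .001; "marginal" had .001 <= p < .05
        bucket = "strong" if z_orig > 3.29 else "marginal"
        results[bucket][1] += 1
        results[bucket][0] += int(z_rep > 1.96)
    return {k: rep / tot for k, (rep, tot) in results.items()}

rates = simulate()
print(rates)  # marginal originals replicate at a visibly lower rate
```

The selection effect does the work: a barely-significant original is more likely to have been lucky noise on a small true effect, so its replication has worse odds.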
Inside Higher Ed spoke to me and other sociologists about the replication issue in our discipline. A major issue is that there is no incentive to actually assess research, since it seems to be nearly impossible to publish replications and statistical criticisms in our major journals:
Recent research controversies in sociology also have brought replication concerns to the fore. Andrew Gelman, a professor of statistics and political science at Columbia University, for example, recently published a paper about the difficulty of pointing out possible statistical errors in a study published in the American Sociological Review. A field experiment at Stanford University suggested that only 15 of 53 authors contacted were able or willing to provide a replication package for their research. And the recent controversy over the star sociologist Alice Goffman, now an assistant professor at the University of Wisconsin at Madison, regarding the validity of her research studying youths in inner-city Philadelphia lingers — in part because she said she destroyed some of her research to protect her subjects.
Philip Cohen, a professor of sociology at the University of Maryland, recently wrote a personal blog post similar to Gelman’s, saying how hard it is to publish articles that question other research. (Cohen was trying to respond to Goffman’s work in the American Sociological Review.)
“Goffman included a survey with her ethnographic study, which in theory could have been replicable,” Cohen said via email. “If we could compare her research site to other populations by using her survey data, we could have learned something more about how common the problems and situations she discussed actually are. That would help evaluate the veracity of her research. But the survey was not reported in such a way as to permit a meaningful interpretation or replication. As a result, her research has much less reach or generalizability, because we don’t know how unique her experience was.”
Readers can judge whether Gelman's or Cohen's critiques are correct. But the broader issue is serious. Sociology journals simply aren't publishing error corrections or replications, with the honorable exception of Sociological Science, which published a replication/critique of the Brooks/Manza (2006) ASR article. For now, debate on the technical merits of particular research seems to be the purview of blog posts and book reviews that are quickly forgotten. That's not good.
Cristobal Young is an assistant professor at Stanford’s Department of Sociology. He works on quantitative methods, stratification, and economic sociology. In this post co-authored with Aaron Horvath, he reports on the attempt to replicate 53 sociological studies. Spoiler: we need to do better.
Do Sociologists Release Their Data and Code? Disappointing Results from a Field Experiment on Replication.
Replication packages – releasing the complete data and code for a published article – are a growing currency in 21st century social science, and for good reasons. Replication packages help to spread methodological innovations, facilitate understanding of methods, and show confidence in findings. Yet, we found that few sociologists are willing or able to share the exact details of their analysis.
We conducted a small field experiment as part of a graduate course in statistical analysis. Students selected sociological articles that they admired and wanted to learn from, and asked the authors for a replication package.
Out of the 53 sociologists contacted, only 15 of the authors (28 percent) provided a replication package. This is a missed opportunity for the learning and development of new sociologists, as well as an unfortunate marker of the state of open science within our field.
Some 19 percent of authors never replied to repeated requests, or first replied but never provided a package. More than half (56 percent) directly refused to release their data and code. Sometimes there were good reasons. Twelve authors (23 percent) cited legal or IRB limitations on their ability to share their data. But only one of these authors provided the statistical code to show how the confidential data were analyzed.
Why So Little Response?
A common reason for not releasing a replication package was that the author had lost the data, often due to reported computer or hard-drive malfunctions. Many authors also said they were too busy or felt that providing a replication package would be too complicated. One author said they had never heard of a replication package. The solution here is simple: compiling a replication package should be part of a journal article's final copy-editing and page-proofing process.
More troubling is that a few authors openly rejected the principle of replication, saying in effect, “read the paper and figure it out yourself.” One articulated a deep opposition, on the grounds that replication packages break down the “barriers to entry” that protect researchers from scrutiny and intellectual competition from others.
The Case for Higher Standards
Methodology sections of research articles are, by necessity, broad and abstract descriptions of their procedures. However, in most quantitative analyses, the exact methods and code are on the author’s computer. Readers should be able to download and run replication packages as easily as they can download and read published articles. The methodology section should not be a “barrier to entry,” but rather an on-ramp to an open and shared scholarly enterprise.
When authors released replication packages, it was enlightening for students to look "under the hood" on research they admired, and see exactly how results were produced. Students finished the process with a deeper understanding of, and greater confidence in, the research. Replication packages also serve as a research accelerator: their transparency instills practical insight and confidence, bridging the gap between chalkboard statistics and actual cutting-edge research, and invites younger scholars to build on the shoulders of success. As Gary King has emphasized, replications have become first publications for many students, and have helped launch many careers, all while ramping up citations to the original articles.
In our small sample, little more than a quarter of sociologists released their data and code. Top journals in political science and economics now require on-line replication packages. Transparency is no less crucial in sociology for the accumulation of knowledge, methods, and capabilities among young scholars. Sociologists – and ultimately, sociology journals – should embrace replication packages as part of the lasting contribution of their research.
Table 1. Response to Replication Request

| Response | N | % |
| --- | --- | --- |
| Yes: released data and code for paper | 15 | 28% |
| No: did not release | 38 | 72% |
| *Reasons for "No"* | | |
| IRB / legal / confidentiality issue | 12 | 23% |
| No response / no follow-up | 10 | 19% |
| Don't have data | 6 | 11% |
| Don't have time / too complicated | 6 | 11% |
| Still using the data | 2 | 4% |
| 'See the article and figure it out' | 2 | 4% |
Note: For replication and transparency, a blinded copy of the data is available on-line. Each author's identity is blinded, but the journal name, year of publication, and response code are available. Half of the requests addressed articles in the top three journals, and more than half of the articles were published in the last three years.
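As a quick arithmetic check (mine, not the authors'), the counts in Table 1 hang together: the six "no" reasons sum to 38, and the percentages are computed against all 53 requests.

```python
# Counts from Table 1; percentages should be each count over 53 requests.
counts = {
    "yes": 15,
    "irb_legal": 12,
    "no_response": 10,
    "no_data": 6,
    "no_time": 6,
    "still_using": 2,
    "figure_it_out": 2,
}
total = sum(counts.values())                  # all 53 requests accounted for
pct = {k: round(100 * v / total) for k, v in counts.items()}
print(total, pct)  # e.g. pct["yes"] rounds to 28, matching the table
```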
Figure 1: Illustrative Quotes from Student Correspondence with Authors:
- “Here is the data file and Stata .do file to reproduce [the] Tables…. Let me know if you have any questions.”
- “[Attached are] data and R code that does all regression models in the paper. Assuming that you know R, you could literally redo the entire paper in a few minutes.”
- “While I applaud your efforts to replicate my research, the best guidance I can offer is that the details about the data and analysis strategies are in the paper.”
- “I don’t keep or produce ‘replication packages’… Data takes a significant amount of human capital and financial resources, and serves as a barrier-to-entry against other researchers… they can do it themselves.”
Plummeting grant funding rates are back in the news, this time in the U.K., where success rates in the Economic and Social Research Council—a rough equivalent to NSF’s SBE division—have dropped to 13%. In sociology, it’s even lower—only 8% of applications were funded in 2014-15.
I’ve written before about the waste of resources associated with low funding rates. But this latest round prompted me to do some back-of-the-envelope calculations. Disclaimer: these numbers are total guesses based on my experience in the U.S. system. I think they are pretty conservative. But I would love to see more formal estimates.
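The post's own numbers aren't shown here, but the form of such a back-of-the-envelope calculation is easy to sketch. Every input below except the 8% success rate is an assumption of mine for illustration, not a figure from the post or from ESRC/NSF data:

```python
# Hedged sketch: person-hours consumed per funded grant at a given
# success rate. Only SUCCESS_RATE comes from the post; the rest are
# illustrative assumptions.
SUCCESS_RATE = 0.08         # ESRC sociology, 2014-15 (from the post)
HOURS_PER_PROPOSAL = 120    # assumed: applicant time to draft one proposal
HOURS_PER_REVIEW = 8        # assumed: reviewer time per proposal, per reviewer
REVIEWERS_PER_PROPOSAL = 3  # assumed

proposals_per_award = 1 / SUCCESS_RATE
hours_per_award = proposals_per_award * (
    HOURS_PER_PROPOSAL + REVIEWERS_PER_PROPOSAL * HOURS_PER_REVIEW
)
print(f"~{proposals_per_award:.1f} proposals and ~{hours_per_award:.0f} "
      f"person-hours of writing and reviewing per funded grant")
```

Even with conservative inputs, a 1-in-12.5 success rate implies on the order of a person-year of collective effort behind each award, which is the waste-of-resources point in concrete terms.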
book spotlight: remaking college: the changing ecology of higher education, edited by kirst and stevens
Recent orgtheory posts excepted, we pay way too much attention to a tiny handful of higher education institutions in the U.S. (Not to mention too much attention to the U.S. relative to the rest of the world.)
Academic chatter often assumes research universities are the prototypical higher ed organization, even though only 23% of students are enrolled in such universities (RU/VH or RU/H). By comparison, more than a third are enrolled in community colleges, and nearly 10% in for-profit institutions.
At the level of public attention, focus gets even narrower. A New York Times search gets 310 hits for “community college,” versus nearly 13,000 for “Harvard.” Recently historian David Perry surveyed two months of NYT op-eds containing the word “professor” and found
zero by community college or lower-status teaching school profs, zero by branch campus public profs, and a handful by top liberal arts schools (Smith, Dickinson) or lower-tier R1 publics (Colorado State, South Carolina).
So kudos to Michael Kirst and Mitchell Stevens for noticing that the world of higher ed is bigger than that. Remaking College: The Changing Ecology of Higher Education, published a couple of months ago by Stanford UP, focuses on the institutions that are underappreciated by the media and scholars: comprehensive colleges, community colleges, for-profit colleges. By bringing together a diverse group of academics — several of whom take an explicitly organizational approach — to focus on broad-access institutions, they have done the field a real service.
The essays cover a lot of ground and a range of approaches. Several, including an orienting one by W. Richard Scott, conceptualize higher ed as an ecology or field. I'll just highlight a couple I particularly enjoyed here.
In “The Classification of Organizational Forms: Theory and Application to the Field of Higher Education,” Martin Ruef and Manish Nag use topic models based on IPEDS data to generate new sets of categories for U.S. postsecondary institutions. From mission statements, for example, they infer not only two distinct clusters of liberal arts schools and two of community colleges, but several additional types of institutions — globally-oriented colleges, Christian colleges, medical tech schools, student-oriented universities — that might otherwise go unnoticed. Like other good work that identifies patterns from texts, it prompts a rethinking of cultural identity beyond assumed categories.
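For readers curious what the topic-model approach looks like mechanically, here is a toy sketch, not Ruef and Nag's actual pipeline: invented mission statements, scikit-learn's LDA, and two topics loosely echoing the liberal-arts versus community-college distinction.

```python
# Toy topic model over hypothetical mission statements (requires scikit-learn).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Invented mission statements, not real IPEDS text.
missions = [
    "liberal arts education in the humanities and critical inquiry",
    "critical thinking through the liberal arts and sciences",
    "workforce training and career skills for the local community",
    "affordable career and technical training for community members",
]

counts = CountVectorizer(stop_words="english").fit_transform(missions)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
doc_topics = lda.transform(counts)  # each row is one school's topic mixture
print(doc_topics.round(2))
```

The method's appeal is that the categories fall out of the text: with real mission statements and more topics, clusters like "Christian colleges" or "globally-oriented colleges" can emerge without anyone defining them in advance.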
Regina Deil-Amen makes a significant contribution just by hammering home how atypical the "typical" college student really is. Nearly three-quarters of first-year undergraduates are enrolled in community colleges or for-profit institutions. 53% are not enrolled full-time. Only 13% live on campus. 13 percent! Her quotes from interviews with lower-income and Latino students, who are dealing with family stresses and financial struggles, are telling:
My family has a lot of financial problems, so that’s another stress that I’m constantly dealing with. I have to call them like, ‘Mom, are you gonna be able to pay rent this month?’…I’ve actually used some of my loans to help them pay their rent this year. (p. 146)
These firsthand accounts reinforce how inaccurate the picture of a dependent 18-year-old striking out on her own for the first time actually is.
I also enjoyed Richard Arum and Josipa Roksa’s reflection on measuring college performance, where they emphasize that they
have vehemently argued against the desirability of an externally imposed accountability schema. We are deeply skeptical of increased centralized regulation of this character—fearing that the unintended consequences would far outweigh any benefits—and have instead called for institutions themselves to assume enhanced responsibility for monitoring and improving student outcomes. (p. 170)
I’m not sure they know how to measure college quality either, but it’s a thoughtful piece.
Higher ed really is a diverse organizational ecology, and it’s going to take a lot of work to map out the whole landscape. But I’m very glad that people like Kirst and Stevens are moving us in that direction.