First, a quick introduction: I am a sociologist from Stanford working in organizational theory at the Tepper B-school, CMU. My interest is primarily in information within social networks (i.e., how people and firms find it, change it, share it, learn from it, etc.). Currently, I am investigating corruption in communication networks.
Given my interest in information and corruption, coupled with the recent data frauds in academia, I have been thinking a lot about reproducible science, particularly for the social sciences. Creating norms or policies that enforce reproducible science may not only be cheap insurance against academic fraud but may also improve our field.
Obviously, I am not the first to see the benefits of making data and methods available. There are quite a few wonderful archives out there that provide the infrastructure, but they are frightfully empty. For example, a quick search on ICPSR gave me a total of 8,369 studies. Harvard’s IQSS also has the wonderful Dataverse, but I don’t know of many scholars who actually use it.
What we have is a good old-fashioned social dilemma. Why should I make all my data and syntax available if you don’t? Here, I wonder if the journals themselves shouldn’t shoulder more of this effort, requiring data and syntax files from authors. (I do recognize that this is easy to suggest since I am not an editor and never have been.) But I would be in complete favor of a journal requiring that I produce the data and the syntax file as a condition of publishing my work. Now, I know this won’t deter particularly crafty fraudsters, but it’s just too easy for some researchers to fake findings. And I think the pressure is only going to increase. It’s Merton’s innovation: scholars feel pressure to succeed, but the “legitimate” avenues don’t produce the right p-values, so they commit “creative data analysis.” As researchers, shouldn’t we really look to create an atmosphere of openness and transparency with our data and methods?
There are other benefits as well. At Stanford, Bill Barnett teaches this wonderful methods course. Through his own connections and personal solicitations, Bill has gathered the data files for many important org theory papers. In order to get the files, he did assure the authors that the data would only be used for pedagogical purposes. In his course, Bill asks students to reproduce the findings from these papers and maybe improve upon the studies with new statistical techniques or methods. It’s a wonderful way to teach graduate students a number of important skills. But couldn’t all graduate students benefit from reproducing important studies?
Okay, that was far preachier than I intended. Maybe for my next post I can report on drunken scholar sightings at AoM.