itty bitty error tanks major network paper?

Attention, networkers: According to the soc net email list, there is a small but important coding error in the 2004 General Social Survey that casts doubt on a well-known empirical result of McPherson, Smith-Lovin, and Brashears. You might remember that they reported, using the 2004 GSS, that people’s networks had shrunk since 1985. A vigorous debate ensued.

Turns out that this might be attributable to a mistake in the coding of the survey data: 41 people who were missing data were coded as having zero social contacts. It’s got to be emphasized that this was a NORC error, not an error by the authors. The original research team issued this note over the Soc Net list this morning:

Since our 2006 ASR paper using these data got so much attention, it’s worth noting our initial take on the impact of this discovery. The unweighted, uncorrected percentage of NUMGIVEN=0 in 2004 is about 27%, as opposed to about 9% in 1985. If we recode the 41 mis-coded 2004 respondents from 0’s to missing and use the correct weights for the sampling frame of the study (as we did in the 2006 paper), we get about 22% isolated in 2004 and about 10% isolated in 1985. Our re-estimated models from the 2006 ASR paper, which also continue to correct for fatigue, uncooperativeness and other factors (as well as demographic shifts), still show a substantively and statistically significant difference between 1985 and 2004. None of our results from the ASR models change substantively because of the newly discovered data problem, but we will publish corrected tables and figures as soon as ASR allows. We are also analyzing the results of a re-interview of some of the 2004 respondents, which will be forthcoming later.
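For readers who want to see the mechanics of the correction described in that note, here is a minimal sketch in Python. It is not the authors’ code and it does not use GSS data: the ten respondents, their weights, and the set of mis-coded positions are invented for illustration, and the helper names (`recode_to_missing`, `weighted_share_isolated`) are hypothetical. Only the idea comes from the note: turn the mis-coded zeros into missing values, then compute a weighted share of respondents with NUMGIVEN=0 among those with valid answers.

```python
from typing import List, Optional, Set


def recode_to_missing(numgiven: List[Optional[int]],
                      miscoded: Set[int]) -> List[Optional[int]]:
    """Return a copy in which the listed positions are set to missing (None)."""
    return [None if i in miscoded else n for i, n in enumerate(numgiven)]


def weighted_share_isolated(numgiven: List[Optional[int]],
                            weights: List[float]) -> float:
    """Weighted share of valid respondents who named zero contacts."""
    valid_weight = 0.0
    isolated_weight = 0.0
    for n, w in zip(numgiven, weights):
        if n is None:  # missing answers do not count in the denominator
            continue
        valid_weight += w
        if n == 0:
            isolated_weight += w
    return isolated_weight / valid_weight


# Toy data (assumption, not GSS values): number of names given and a weight.
numgiven = [0, 0, 0, 2, 3, 1, 0, 5, 2, 4]
weights = [1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0]
miscoded = {1, 6}  # hypothetical positions that should have been missing

before = weighted_share_isolated(numgiven, weights)
after = weighted_share_isolated(recode_to_missing(numgiven, miscoded), weights)
print(f"isolated before recode: {before:.2f}")
print(f"isolated after recode:  {after:.2f}")
```

The recode removes the affected respondents from both the numerator and the denominator, which is why a handful of mis-coded cases can move the estimated share of isolates without making it disappear.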

Of course, the skeptics quickly came back. Claude Fischer writes on the same list:

The 2004 GSS Finding of Shrunken Social Networks: An Artifact? ABSTRACT: In 2006, McPherson, Smith-Lovin, and Brashears (MS-LB) reported that Americans’ social networks had shrunk precipitously from 1985 to 2004. They found that respondents to the 2004 General Social Survey (GSS) provided dramatically fewer names when asked to list the people with whom they discussed important matters than respondents to the 1985 GSS had given to the same question. Critically, the percentage of respondents who provided no names at all increased from about 10 percent in 1985 to about 25 percent in 2004. In this memo, I present anomalies found in the 2004 GSS network item which strongly imply that this dramatic increase in apparent social isolation is an artifact. I speculate that the artifact may be the result of random error. With as yet no complete explanation for these anomalies, scholars at this time should draw no inference from this GSS question as to whether American social networks changed substantially from 1985 to 2004 – they probably did not – and should be cautious in using the 2004 network data.

Fischer says he’ll post his paper on his personal website:

Written by fabiorojas

September 24, 2008 at 3:31 pm

4 Responses

  1. Update:
    The paper referred to is up at the top of this page:
    Note: My analysis suggests that there is much more error in the data than the 41 newly-discovered mis-coded cases, which I have taken into account.

    C. Fischer



    September 24, 2008 at 5:17 pm

  2. It can happen. I had a similar miscoding (my fault, entirely) on a paper in Forces in 1991. It didn’t change much substantively. And hopefully that’s the case here. I haven’t read the Fischer paper yet, but I can see that happening. Doug Eckberg bought a GSS dataset from Gallup (he’s at a small college with no ICPSR), and it contained a very serious error across some items. The only reason he caught it is that he’d worked on previous years’ data, and what he had didn’t match for those years (it was a column error from Gallup in the program to convert to a SAS file). I wonder if Miller and Lynn were privy to an early release of the network module, and the initial conversion program had an error in the missing statements….



    September 26, 2008 at 12:36 am

  3. Fischer’s paper is a huge service to the field, and should be required reading in methods courses (at least those on [social network] surveys), both for the care with which he attacks the problem and the measured approach to assigning blame. I think, though, that it is unfortunate that the authors of the original study were so quick to send what looks like a pre-emptive strike before Fischer posted his paper. It is very clear from reading it that the fixing of the 41 cases does not solve the problem. And there is little reason for the authors to be defensive since they do not seem culpable, except perhaps for being too quick to publicize the results as real.


    Ezra Zuckerman

    September 26, 2008 at 3:00 pm

  4. […] I recently read Scott Long’s new book The Workflow of Data Analysis Using Stata and I highly recommend it. One of the ironies of graduate education in the social sciences is that we spend quite a bit of time trying to explain things like standard error but largely ignore that on a modal day quantitative research is all about data management and programming. Although Long is too charitable to mention it, one of the reasons to emphasize these issues is that many of the notorious horror stories of quantitative research do not involve modeling but data management. For instance, “88” was an unnoticed missing value code not actual data on senescent priapism, it was a weighting error that led to wildly exaggerated estimates of post-divorce income effects, and, most recently, findings about anomie were at least in part an artifact of a NORC missing data coding error. […]


