As discussed here before, computational research has incredible potential for social scientists interested in studying patterns of communication, the emergence of networks and beliefs, etc. The big problems with this kind of research for the time being are getting access to the data, figuring out how to process it, and dealing with privacy issues. The privacy issue heated up last year when it was revealed that data gathered from university students’ Facebook networks could be used to identify the supposedly anonymous students. This is a thorny issue. How does one use rich online data about people’s communication and consumption habits without exposing them?

The Facebook drama continues, although most recently it has been amplified by Facebook itself. As many of you probably know, Facebook recently redefined its Terms of Use. The changes effectively gave Facebook ownership over any piece of information posted on its website. Those of you who use Facebook would suddenly have given away your rights to any content posted on your profiles, including pictures and personal information. Some Facebook users were, of course, outraged over this. The Consumerist blog called for a boycott of the website. Facebook, recognizing the image crisis it suddenly had, quickly took action and changed the Terms of Use back to their September 2008 form.

While this is all well and good for the hundreds of thousands of Facebook users who don’t want to lose ownership of their online content entirely, it creates a headache for scholars who want to use Facebook to study online networks. Syracuse University’s Ines Mergel highlights the problems (and potential solutions) of Facebook data collection on the Complexity and Social Networks blog.

  • Facebook does not allow researchers (or anyone) to store data for more than 24 hours, which makes it difficult to clean, analyze, and of course eventually publish the data
  • Data needs to be anonymous (which is a particular problem in SNA, where network data cannot be anonymous – we need to know what kind of actors are nominating other actors – and longitudinal data analysis seems to be impossible)
  • So far I have identified three different ways to collect/use Facebook data, although at this point it is unclear how people can comply with the first two bullet points.

1. Bernie Hogan at the Oxford Internet Institute, University of Oxford, UK, has created a Facebook application available on iTunesU to analyze Facebook data (open iTunes -> iTunes -> Oxford University).

2. The Dataverse project at Harvard’s Berkman Center has made Facebook data available.

3. Create an application or a group on Facebook where you can find a way to have people give their consent to collect data on their online behavior and contacts.

The last solution seems problematic to me because any network whose members know beforehand that they’re giving someone permission to analyze it will have some obvious selection bias. The whole point of analyzing Facebook data, as I see it, is that you can look at the emergent properties of networks as they take place in a natural(?) environment. The moment you start getting people’s informed consent before they join the network, you destroy the natural setting.

Still, I’m very interested in how one might overcome this problem. My belief is that Facebook and other online network sites hold enormous potential for studying a range of emergent social phenomena, including collective action and the spread of beliefs. We just have to figure out how to deal with the privacy issue.
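
To make the tension in the anonymity bullet concrete, here is a minimal sketch of one partial measure: replacing names in an edge list with keyed pseudonyms, so ties (and the same person across waves) stay linkable while the names themselves are dropped. The function name, the sample names, and the edge-list format are all hypothetical, and this is not Facebook's API or anyone's published method. It also does not solve the problem described at the top of this post, since the network structure alone may still be enough to re-identify people.

```python
import hashlib
import hmac
import secrets


def pseudonymize_edges(edges, key):
    """Replace each name in an edge list with a keyed hash, preserving ties."""
    def pseudonym(name):
        # HMAC with a secret key: the same name always maps to the same
        # pseudonym, but the mapping cannot be rebuilt without the key.
        digest = hmac.new(key, name.encode("utf-8"), hashlib.sha256).hexdigest()
        return digest[:12]

    return [(pseudonym(a), pseudonym(b)) for a, b in edges]


# Hypothetical friendship ties; the names are invented for illustration.
edges = [("alice", "bob"), ("bob", "carol"), ("alice", "carol")]
key = secrets.token_bytes(32)  # kept private by the researcher, never published
anonymous_edges = pseudonymize_edges(edges, key)
```

Reusing the same key for a later wave of data would give each person the same pseudonym again, which is what longitudinal analysis needs; throwing the key away afterwards makes the mapping unrecoverable.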


Written by brayden king

February 23, 2009 at 3:45 pm

4 Responses

  1. As I see it, Facebook and other online network sites could not only be a research tool for studying collective action but are also an interesting research object in this regard, especially as examples of private authority regulators who struggle with the “voice” of their (not-only) customers.

    The German Facebook-clone “StudiVZ” experienced similar troubles with its users after changing the terms of service about a year ago. The pattern I see here is that network effects prevent dissatisfied users from exiting and leave them with voice as the only option…



    February 23, 2009 at 4:47 pm

  2. Brayden, this is a really interesting question. I think that one solution could be to develop a sample of respondents and get informed consent much like in a non-online setting. One of the great things about Facebook and other social networking sites (e.g. LinkedIn) is that there is basically a roster of the population to which you are trying to generalize. Furthermore, even based on the publicly available data, you can develop a pretty sophisticated sampling scheme – you could stratify on gender, relationship, age, geographic location, type of school attended. Then, theoretically, instead of applying a Facebook app that would simply ask people to join (analogous to a convenience sample), you could have the app be the informed consent form and really target those people hard — potentially with incentives, just like in face-to-face interviewing.

    You could create analogues to traditional forms of sampling — snowball sampling, clustered sampling (if, for instance, you are interested in generalizing to a population from a certain geographic region or type of school), or even convenience sampling where it may be appropriate for things like focus groups.

    Of course, you wouldn’t have the density of networks to be able to work with, but at a certain point, too much data can become almost as crippling as not enough data.
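
A minimal sketch of the stratified scheme described in this comment, assuming the publicly available roster has already been loaded as a list of records. The field names (`gender`, `school`), the made-up roster, and the 10% sampling fraction are hypothetical, chosen only for illustration; the people drawn would then be the ones targeted with the consent app.

```python
import random
from collections import defaultdict


def stratified_sample(roster, strata_keys, fraction, seed=0):
    """Draw the same fraction of people from every stratum of the roster."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced

    # Group people by their combination of stratifying attributes.
    strata = defaultdict(list)
    for person in roster:
        strata[tuple(person[k] for k in strata_keys)].append(person)

    sample = []
    for members in strata.values():
        # Take at least one person per stratum so no group is left out.
        n = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, n))
    return sample


# Hypothetical roster of 60 profiles with two public attributes.
roster = [
    {
        "id": i,
        "gender": "f" if i % 2 else "m",
        "school": "public" if i % 3 else "private",
    }
    for i in range(60)
]
invited = stratified_sample(roster, ["gender", "school"], fraction=0.1)
```

Sampling the same fraction within each stratum keeps the invited group proportional to the roster on the chosen attributes; oversampling small strata would be an equally reasonable design if those groups are the focus.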



    February 23, 2009 at 7:04 pm

  3. […] a comment » Bernie Hogan, who I mentioned in an earlier post, has written a very cool application for Facebook that creates network matrices of your friends […]


  4. There are more than privacy issues at hand. We don’t know enough about who uses these services in the first place, who tends to use them for what, what the online network is a reflection of (e.g., different people have very different approaches to “friending” folks) and how behavior on these systems reflects other behavior. There is some research out there on this, but not much, as many people are more eager to simply jump on the data than to think about – never mind actually research! – these challenges (and potential limitations).

    Here’s a paper I wrote that suggests that selection into use of these sites is not random, at least not in one particular community (and there is no reason to think it would be in others).



    February 26, 2009 at 6:30 pm

