orgtheory.net

a new era in research on the media and social movements

If you are a social movement researcher, you often want data from the media. But there are serious logistical problems, not to mention the usual problems of interpreting media data. Obtaining media data is hard. You need a lot of resources to do any but the most basic analyses. Doug McAdam’s group had a large NSF grant to support a detailed coding of the NY Times. In my own research, I had a team of undergraduates work for a year to scour three major newspapers for reports of Black student protest events.

That era is now over. As long as the media you are interested in is digitized and accessible, you can compile a data set in days, if not hours. There are two general approaches. First, you can use search engines to generate lists of articles containing key words, which human coders then read and code. Second, if you are merely counting words that clearly tag a concept (e.g., “the Tea Party”), then you can write (or pay someone to write) a program called a “web scraper” to load websites and extract the text you need. For older media, such as newspapers from, say, before 1990, this is hard. But if you have a question about a recent movement, then it’s orders of magnitude easier. I foresee an era where sociologists routinely partner with computer science geeks to generate powerful data sets cheaply and complete research in months, rather than years.
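
To make the second approach concrete, here is a minimal sketch in Python of the counting step: strip the markup from a page and count how often a phrase appears. It uses only the standard library. The names TextExtractor, count_mentions, and SAMPLE_HTML are made up for illustration, and the sample page is invented rather than taken from a real site. A real project would also have to download the pages and cope with each site's layout, which this sketch leaves out.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from an HTML page, skipping script and style blocks."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def count_mentions(html, phrase):
    """Return how many times `phrase` occurs in the page text, ignoring case."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so "tea   party" still matches "tea party".
    text = " ".join(" ".join(parser.chunks).split())
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    return len(pattern.findall(text))


# Invented sample page standing in for a downloaded article.
SAMPLE_HTML = """
<html><head><title>Rally coverage</title><style>p {color: red}</style></head>
<body><p>Supporters of the Tea Party gathered downtown.</p>
<p>A tea   party spokesperson addressed the crowd.</p>
<script>var label = "the Tea Party";</script></body></html>
"""

if __name__ == "__main__":
    print(count_mentions(SAMPLE_HTML, "tea party"))
```

The mention inside the script block is deliberately not counted, since it is not text a reader would see. Counting of this kind only tells you that a phrase appears; deciding what an article says about a movement still takes human coders.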

Written by fabiorojas

March 5, 2013 at 12:41 am

10 Responses

  1. Don’t forget that a lot of sites (e.g. Twitter, Facebook, the NYT, some Google sites) have APIs that can facilitate this kind of data collection.

    JD

    March 5, 2013 at 1:40 am

  2. Foresee an era? Actually, this is already trivial. There are simple scraper programs in R, Stata, Python, and a variety of other languages that are being used by sociologists, economists, psychologists, and folks in public health (just to name a few recent examples) to grab all manner of data from the web. A basic Google search will turn up any number of these programs. You don’t need someone in computer science, just a basic understanding of programming to adapt the programs to your use.

    anon

    March 5, 2013 at 3:11 am

  3. @anon – I’m not sure that drawing attention to a form of data collection that has only existed for a few years, and that probably fewer than 2% of sociologists are actively using, is trivial.

    JD

    March 5, 2013 at 3:42 am

  4. I am definitely enthusiastic about web-scraping, APIs, and big data more broadly. But I think it is also important to keep in mind that with big data come big challenges. For example, tools like Latent Dirichlet Allocation are powerful, but cannot yet capture many of the fine-grained objects of study of interest to many students of social movements (e.g. frames, ideologies, identities, etc.). Another limitation is that much of the big data wave contains little information about the social context in which texts are produced. This does not apply to newspaper documents as much as to blogs, Twitter, and Facebook, but it is an important issue: just how many people tweet? How many of them tweet about everything that comes to their minds? Finally, while I agree with anon above that the entry costs are coming down, high-powered big data analysis still requires significant skills in either R or Python, neither of which has really caught on within mainstream sociology. Even for those who already know R, there are a variety of new challenges, such as how to interface with APIs, deal with SSL errors, etc. Gary King is working on some new web-based tools that will make big data tools more user-friendly, but for now, people who want to jump on board the big data movement need some chops, unless they only want big data for purely descriptive purposes (and even then, extracting simple information like dates or times from big data can be challenging depending upon the structure of the data).

    I have a new working paper that attempts to provide a vision for how to integrate big data and big data analysis tools with current theoretical debates within cultural sociology that touches on some social movements issues in passing. If anyone is interested, please email me. Also, Neal Caren is editing a special issue of Mobilization right now that looks at new methods for social movements research that will cover many of these issues, I expect.

    Chris Bail

    March 5, 2013 at 3:43 pm

  5. Because these tools are relatively new to many of us, I thought it might be helpful to know that many sites have policies requesting that you use their API and not a scraper/crawler. Theoretically, violating the policy could lead to them restricting access to their site for you, and even for others from your U. Whether that’s myth or reality, it’s probably best to be respectful of their wishes if they’re kind enough to provide their info.

    You can discover any site’s policy by going to xxx/robots.txt

    more info here:
    http://www.robotstxt.org/robotstxt.html

    py

    March 5, 2013 at 8:21 pm

  6. Good point, py. Many sites, such as Google, will also actively shut you down if you make too many repeated requests. Unfortunately, the last time I checked, Google’s API didn’t cover many of its services (such as Google News). Facebook and Twitter, however, have fairly easy-to-use APIs once you get the hang of their idiosyncratic terminology. Without authentication, however, one can’t get much from Facebook (Twitter is a much different story, of course). Mostly, I was just trying to throw a little bit of cold water on the big data movement and note that one can’t simply get all the data one might possibly want with the click of a button (or even by hiring a programmer for a few hours). I’ve started a new project that draws heavily upon Facebook’s Graph API and Google’s App Engine, for example, and it’s taken me months just to clean the data (let alone analyze it).

    Chris Bail

    March 5, 2013 at 9:22 pm

  7. Twitter’s API is notorious for logging people out over repeated requests, even when repeated requests are necessary to keep up with the volume of new tweets. I think they are relaxing that some, but it’s still a major issue.

    Noah

    March 6, 2013 at 4:12 am

  8. Speaking as someone who’s been scraping/presenting/publishing from Twitter data for the past year and a half, Chris’s points about the challenges of big data are right on, but I suspect that a consensual big data method with standardized best practices will be hammered out rather quickly over the next couple of years. (Especially since many of the folks who are doing this are on Twitter sharing their ideas with each other in real time.)

    I think a more concerning limitation is the divide between “online” and “offline” researchers. Fabio’s post unintentionally provides an example of it, as he is essentially predicting a future that has been the present for at least five years now among us online folks. On the flip side, I’m still kind of amazed at how little social movement theory has made it into studies of social movements online; it’s more common for tech researchers to posit sexy new dynamics happening online than to pick up existing theories and show how they’re tweaked a bit as a result of digital mediation.

    What I hope will happen eventually is this divide will dissolve as sociologists increasingly realize that no matter what you study, part of the action is happening online and part of the action is happening offline, and it won’t do to simply bracket the one you don’t want to focus on, because they’re simply too permeable and influence each other too much. If you want the full account, you have to look at both. Then we can have a marriage (perhaps a tenuous one of convenience, but a union nevertheless) between online/offline theories/methods that will benefit all involved.

    Randy

    March 6, 2013 at 4:41 pm

  9. Great points. I have long wondered why sociology has been so slow to get on board with online research. Perhaps part of the explanation is concern about digital divides, though there are a variety of high quality studies which show such divides have decreased dramatically (e.g. DiMaggio). I also very much appreciate your call for more attention to the relationship between online and offline social action. It would be great to have some studies of precisely which types of people use social media, and when, where, and why. The Pew studies that are out there only give baseline figures on usage, which don’t really get at the deeper questions that would make for good sociology (e.g. presentation of self in public vs. private; role of social networks; transformation of institutions because of social media). To be fair, I don’t think other social science disciplines have made much progress in theorizing the relationship between online and offline social spaces, even if they’ve been much quicker to incorporate social media data into their studies. One exception may be Gary King, who has just written a brilliant paper on internet repression in China that uses very innovative automated data extraction methods to identify micro-blogs and then develops a model of whether and when they are shut down by the government. Still, the study relies almost exclusively upon online data. Does anyone else know of other good examples? Why are there so few ethnographies of the internet? Perhaps because internet research is seen as “too easy” or “lazy”? It would be particularly interesting to see a study that follows people’s activity online and offline, I think.

    Chris Bail

    March 7, 2013 at 5:11 pm

  10. Good question. This AJS piece is one of the better comparisons between off- and online interaction: http://www.jstor.org/discover/10.1086/590650?uid=2&uid=4&sid=21101785504661
    Polletta had a paper using message board data in the ASR too, I think.

    Soc blogger

    March 8, 2013 at 1:37 am
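
As a rough illustration of py's point about robots.txt (comment 5), Python's standard library can read a site's robots.txt rules and report whether a given address may be fetched. The sketch below is only that, a sketch: the rules in SAMPLE_ROBOTS_TXT are invented, example.org is a placeholder, and "research-bot" and is_allowed are hypothetical names chosen for illustration. It also says nothing about a site's terms of service or API policy, which robots.txt does not cover.

```python
from urllib.robotparser import RobotFileParser

# Invented rules standing in for what a site might publish at /robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Crawl-delay: 10
"""


def build_rules(robots_txt):
    """Parse the text of a robots.txt file into a RobotFileParser."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def is_allowed(rules, url, user_agent="research-bot"):
    """Return True if the parsed rules permit `user_agent` to fetch `url`."""
    return rules.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_rules(SAMPLE_ROBOTS_TXT)
    # The search pages are disallowed by the sample rules; ordinary posts are not.
    print(is_allowed(rules, "https://example.org/search?q=protest"))
    print(is_allowed(rules, "https://example.org/2013/03/05/some-post/"))
    # Seconds the site asks crawlers to wait between requests, if it says so.
    print(rules.crawl_delay("research-bot"))
```

Against a live site, the same parser can load the rules itself through its set_url and read methods instead of being handed the text. Honoring the requested delay between requests also speaks to the concern in comments 6 and 7 about being shut down for making too many repeated requests.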

