orgtheory.net

more tweets, more votes: social media as a quantitative indicator of political behavior

Unit of analysis: US House elections in 2010 and 2012. X-Axis: (# of tweets mentioning the GOP candidate)/(# of tweets mentioning either major party candidate). Y-axis: GOP margin of victory.
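
For readers who want to see the X-axis measure concretely, here is a minimal Python sketch of the tweet-share ratio described in the caption. It is an illustration only, not the code used for the paper: the function name `gop_tweet_share`, the candidate names, the sample tweets, and the simple case-insensitive substring matching are all assumptions made for the example.

```python
from typing import Iterable, Optional


def gop_tweet_share(tweets: Iterable[str], gop_name: str, dem_name: str) -> Optional[float]:
    """Return (# tweets mentioning the GOP candidate) /
    (# tweets mentioning either major party candidate).

    Matching is a naive case-insensitive substring test (an assumption
    for this sketch). Returns None when no tweet mentions either candidate.
    """
    gop = 0
    either = 0
    for text in tweets:
        lowered = text.lower()
        has_gop = gop_name.lower() in lowered
        has_dem = dem_name.lower() in lowered
        if has_gop or has_dem:
            either += 1  # a tweet naming both candidates counts once here
        if has_gop:
            gop += 1
    return gop / either if either else None


# Made-up tweets and hypothetical candidate names, for illustration only.
sample = [
    "Smith spoke at the fair today",
    "Jones and Smith debate tonight",
    "Voting for Jones",
    "Nice weather",
]
share = gop_tweet_share(sample, gop_name="Smith", dem_name="Jones")
print(share)
```

A tweet that names both candidates adds one to the denominator and one to the numerator, which is one reading of "mentioning either major party candidate"; other counting rules are possible.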

I have a new working paper with Joe DiGrazia*, Karissa McKelvey and Johan Bollen asking if social media data actually forecasts offline behavior. The abstract:

Is social media a valid indicator of political behavior? We answer this question using a random sample of 537,231,508 tweets from August 1 to November 1, 2010 and data from 406 competitive U.S. congressional elections provided by the Federal Election Commission. Our results show that the percentage of Republican-candidate name mentions correlates with the Republican vote margin in the subsequent election. This finding persists even when controlling for incumbency, district partisanship, media coverage of the race, time, and demographic variables such as the district’s racial and gender composition. With over 500 million active users in 2012, Twitter now represents a new frontier for the study of human behavior. This research provides a framework for incorporating this emerging medium into the computational social science toolkit.

The working paper (short!) is here. I’d appreciate your comments.

* Yes, he’ll be in the market in the Fall.

Written by fabiorojas

April 23, 2013 at 2:41 am

33 Responses

  1. Don’t forget that the largest coefficients in Ray C. Fair’s classic paper predicting presidential outcomes are on economic growth and on a lagged “economic optimism or pessimism” variable he constructed. I’d throw those in the pot. Also, do you lag the Twitter observations?

    Graham Peterson

    April 23, 2013 at 3:02 am

  2. Thanks, Graham, for the comments. 1. The data are lagged, i.e., all tweets are from before the election. 2. We do have controls in the full model for district SES, partisanship, and incumbency. Economic variables by district are rare.

    fabiorojas

    April 23, 2013 at 3:11 am

  3. Take a peek at Fair’s paper. The story that unfolds is that if the economy is underperforming, people vote in a challenger. The trend is party-blind: the median voter seems to just want a change, period, when the pie isn’t growing. To use national-level data for regional elections you have to assume that people are thinking about the economy on a national scale and hoping that regional officials can have an incremental influence. I think that’s a good assumption, as evidenced by the revealed preference that people, as you note, don’t often measure and talk about regional economic performance.

    I’m also doing some hand waving about whether materially-felt economic impacts in terms of income are independent of dialogue about those feelings revealed in discourse variables. There is a map between those variables that we don’t have a good functional form for. Big endogeneity issue in trying to control for material variables in a textual regression.

    Graham Peterson

    April 23, 2013 at 3:24 am

  4. I do believe that argument and it’s consistent with other research. But in terms of this analysis, we’d need some more fine grained data. We have good data on socio-demographics, but no reliable data on economic sentiment at the local level.

    fabiorojas

    April 23, 2013 at 3:28 am

  5. We do have district level data on some economic measures in the model, but traditionally these aren’t frequently used in House models. Incumbency and baseline district partisanship get you most of the way there in terms of controls.

    Joe DiGrazia

    April 23, 2013 at 3:28 am

  6. In any case, we can see that twitter is capturing behavior, whether or not the actions are driven by exogenous factors. The paper shows foundational evidence for using Twitter to study politics and electoral outcomes.

    Karissa McKelvey

    April 23, 2013 at 3:30 am

  7. Yeah, you guys are probably capturing the effect with your incumbency controls. I suppose I’m arguing for raw economic controls because I have projects in mind where I set traditional economic indicators up against ideational variables and see which predict economic outcomes better — but that project has a totally different outcome of interest than elections.

    Graham Peterson

    April 23, 2013 at 3:34 am

  8. Have you seen the Livne et al. paper on the rise of the Tea Party on Twitter? It could be a useful point of reference.

    My concern with any study that uses Twitter data is that we do not yet understand the types of bias that lead people to Tweet about their political preferences. This is to say nothing of the various studies by Pew and other organizations which report that less than 9% of the U.S. population uses Twitter, and only 3% actively Tweet… Of those, there is apparently a strong bias towards younger users and Democrats (and minorities).

    The other major problem to my mind is the relative lack of info available from Twitter itself. Networks of followers are interesting of course, but can only go so far if one does not have a clear sense of basic demographics. I’ve seen some folks who try to impute these demographics using the geocodes, but only about 1 in 400 Twitter users enable GPS tracking… which means that one can only obtain such data for 1/400th of the 3 percent of the population that uses Twitter. To be sure, this sample is still larger than the average public opinion survey, but if the Pew data are correct, I think we should be concerned about precisely who these people are.

    Chris Bail

    April 23, 2013 at 2:53 pm

  9. @Chris Thanks for the feedback. These sorts of studies are what motivated the paper, actually. Another paper you could look at that shows conservative bias is by a co-author of mine: Conover et al., on Political Polarization on Twitter.

    Despite these, and a variety of measured biases, Twitter –is– capturing something relevant over these elections and time spans, which makes the results interesting.

    krmckelv

    April 23, 2013 at 3:11 pm

  10. Cool, I’ll have to check out your paper. In brief, though, what “is” the “something relevant”?

    Chris Bail

    April 23, 2013 at 3:13 pm

  11. @Chris Something we are putting together right now. Stay tuned for a couple more papers coming out this year with more theoretical underpinning, including why some other measures (like @handles) don’t work as well.

    My current hunch is that this is just an easier way to measure who is getting attention in politics, something that has been going on all along. Prehistoric social media like movies, radio, and TV seemed to work for Reagan, Sonny Bono, Schwarzenegger, and Al Franken.

    krmckelv

    April 23, 2013 at 3:25 pm

  12. Sounds promising- I’ll look forward to it. I’d love to chat with you guys more (I’m now writing up a study of social media diffusion by non-profit/civil society organizations that I hope you might find interesting: http://www.findyourpeople.org). I am collecting insights data from Facebook fan pages that are surprisingly rich in terms of demographics, audience, and online behavior. I also collect a variety of measures of offline activity and integrate my app with a variety of Google APIs to collect contextual data.

    Chris Bail

    April 23, 2013 at 3:37 pm

  13. P.S. do you guys have gardenhose access or are you working with a 1% sample?

    Chris Bail

    April 23, 2013 at 3:39 pm

  14. @Chris – we do have garden hose access. We are arguing that Twitter is still a valid indicator of political behavior despite the fact that Twitter users are not representative of the US population as a whole. While the proportion of tweets a candidate receives relative to his or her opponent may not equal the proportion of votes he or she receives (due to the biases in Twitter use), we still find that higher tweet shares are reliably correlated with higher vote margins.

    Joe DiGrazia

    April 23, 2013 at 3:56 pm

  15. @Joe- it’s a nice finding. I wonder whether the results would generalize to non-competitive districts, however. One could tell a story about how this selection drives the expression of political opinion, no? Also, since this is a new frontier, it would be nice to see more simple descriptive measures of what is going on (such as the percentage of the population that is tweeting, or the number of people mentioning Republican candidates’ names as a percentage of the total voting-age population). It would also be nice to have campaign expenditures in your model. There are a variety of recent studies (such as Daniel Kreiss’s book) that suggest most of the people tweeting are part of the campaigns. If you could show that campaign expenditures do not erase the finding, this would address this potential objection…

    Chris Bail

    April 23, 2013 at 4:17 pm

  16. “I wonder whether the results would generalize to non-competitive districts, however” – we actually address this issue explicitly in the paper. Short story: Twitter under performs as an indicator in some non-competitive districts.

    Joe DiGrazia

    April 23, 2013 at 4:25 pm

  17. Joe: like I told Rojas, that result is I think the most interesting in the whole paper. Really insightful stuff.

    Textual analysis seems, by the nature of the exercise, to pick up (or be biased toward) only social variation: moments of social change when people are self-consciously talking about and negotiating one meaning or another. Like the linguists have been saying for a long time, though, a great deal of social structure is tacitly embedded in the language itself. It seems like once you have a population that has successfully negotiated a consensus coordinating device, a consensus social norm, that is precisely the element of the culture you won’t be able to pick up in a text analysis. And those tacit consensus norms arguably constitute the majority of social structure.

    Graham Peterson

    April 23, 2013 at 4:34 pm

  18. Along with what Chris is saying, I am curious whether you ran the analysis with a nonscaled measure of tweets, since I would think that it’s easier to run up a high tweetshare in low-tweeting districts (i.e., older, whiter, Republican-er districts). I guess the most intuitive story to me is that tweet-share is proxying for savvy campaigning (which might actually be decoupled from expenditures) mediated by the demographics of a district.

    winston

    April 23, 2013 at 4:45 pm

  19. –which might also help explain why Twitter “under performs as an indicator” in less competitive districts since these are unlikely to have savvy social media campaigns

    winston

    April 23, 2013 at 4:55 pm

  20. @Joe- this squares with what I’ve seen in some analysis of voter turnout and twitter usage (as a percentage of the total population) in non-competitive regions of North Carolina and Ohio. Paradoxically, however, I’ve also seen some regions with high twitter usage and low turnout. Presumably you have turnout in the model as well? What do you find?

    One other thought- I’m aware of a few studies that show inconsistent Twitter use by U.S. lawmakers– I’ve also seen a study that shows Twitter usage is more prevalent among Democrats… This probably works in your favor, but it might be worth citing these studies.

    Chris Bail

    April 23, 2013 at 5:00 pm

  21. @winston- Might it be the opposite? I would expect Twitter usage to be highest in non-competitive districts (specifically urban areas with highly educated populations, greater % Democrat, and greater % minority).

    In any event, they probably won’t be able to come up with a measure of the savviness of social media campaigns. One neat proxy for this might be to calculate centrality measures for the people doing the tweeting… if they are all following each other this would suggest some type of organizing effect, potentially.

    Chris Bail

    April 23, 2013 at 5:02 pm

  22. Sorry, I didn’t mean a measure of total tweets (twitter usage), but a nonscaled count of Republican tweets. So my gut explanation would be that smart campaigns tweet a lot in competitive districts, but this shows up as high Republican tweetshare only in districts with low twitter usage (hence more Republican). So what the correlation is catching is an interaction between savvy campaigning and the underlying political demographics of a district.

    winston

    April 23, 2013 at 5:11 pm

  23. @winston- gotcha. That definitely seems plausible.

    Chris Bail

    April 23, 2013 at 5:23 pm

  24. Hi Winston, first to clarify, our measures are not of tweets by candidates. They’re measures of people’s tweets about the candidates. Second, we also included another model where we use the number of users tweeting to construct our share measure rather than the number of tweets. We see no meaningful difference between the two models.

    Joe DiGrazia

    April 23, 2013 at 5:30 pm

  25. [...] more tweets, more votes: social media as a quantitative indicator of political behavior | orgtheory…. [...]

  26. great stuff guys… and I love that the new operating procedure on orgtheory is: “don’t feed the troll!”

    can’t wait to see what comes of the paper.

    james

    April 23, 2013 at 8:58 pm

  27. Can you elaborate on what you mean by that, james? Everyone here seems to be commenting on the paper – except you.

    Josef

    April 24, 2013 at 6:09 am

  28. Very cool study. What software did you use to collect and analyze the data?

    Amy

    April 24, 2013 at 1:52 pm

  29. We use traditional computer science tools over here to collect the data — good ol’ python, bash, and gzip.

    We then used R, sometimes stata, to analyze it.

    Karissa McKelvey

    April 24, 2013 at 3:37 pm

  30. Will you be making the data available? (Just the data behind the points in the graphs plus some identifiers, not the data you used to compute the points. So, x value, y value, year, candidate ID, party ID, state and district and other election info).

    Amy

    April 24, 2013 at 5:04 pm

  31. Once the paper is published, we plan on releasing the data. Thanks for asking.

    fabiorojas

    April 24, 2013 at 5:09 pm

  32. [...] “More tweets, more votes: social media as a quantitative indicator of political behavior” http://papers.ssrn.com/… via http://orgtheory.wordpress.com/… [...]

  33. [...] There is a robust conversation going on with one of the authors here. [...]


Comments are closed.
