incumbents, transparency, and social media data

At last week’s PLEAD conference on social media and political processes, Alex Hanna tweeted a summary of a talk by Mark Huberty of UC Berkeley political science, which raised some questions about using social media data to forecast electoral results. Alex suggested that we could have a good discussion about Mark’s talk. In these comments, I rely on Alex’s summary. If I mis-characterized a point, please email me or correct me in the comments.

1. Huberty noted, correctly, that incumbency highly correlates with electoral wins. The implication is that social media data is not valuable, or important, or accurate, because incumbency accounts for a lot of the variance in electoral outcomes.

Well, it depends on what your goals are. If you are making a claim that “A causes B”, then finding out that C accounts for much of the variance is extremely important. It shows that A isn’t causing B. However, if your claim is that “A is a decent measurement of B,” then finding out that C is a strong correlate of B is simply irrelevant. The claim isn’t about what fundamentally causes B, just about what tracks with B.

Different claim, different standard of proof. That’s why we care about polls. Incumbency predicts elections better than polls, but as long as we don’t claim that polls cause election outcomes, we remain satisfied with the well-documented correlation between voter surveys and final votes.
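To make the measurement-versus-cause distinction concrete, here is a toy simulation. It is only a sketch: the districts, the effect sizes, and the noise levels are all invented, and nothing here comes from Huberty’s data or anyone else’s. Incumbency (C) drives vote share (B), and tweet share (A) is just a noisy readout of B with no causal role.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the toy numbers are reproducible

# Hypothetical districts: every number below is invented for illustration.
# C: +1 if the candidate is the incumbent, -1 if the challenger.
incumbent = [rng.choice([-1, 1]) for _ in range(2000)]
# B: vote share, driven mostly by incumbency plus district-level noise.
vote = [50 + 10 * c + rng.gauss(0, 5) for c in incumbent]
# A: tweet share, a noisy readout of B. A does not cause B here.
tweets = [v + rng.gauss(0, 8) for v in vote]

r_incumbency = pearson(incumbent, vote)
r_tweets = pearson(tweets, vote)

# Hold incumbency fixed: does A still track B among incumbents only?
inc_votes = [v for v, c in zip(vote, incumbent) if c == 1]
inc_tweets = [t for t, c in zip(tweets, incumbent) if c == 1]
r_tweets_within = pearson(inc_tweets, inc_votes)

print(f"incumbency vs. vote share:          {r_incumbency:.2f}")
print(f"tweet share vs. vote share:         {r_tweets:.2f}")
print(f"tweet share vs. vote (incumbents):  {r_tweets_within:.2f}")
```

By construction, incumbency is the stronger correlate in this setup, yet tweet share still tracks vote share, even among incumbents alone. That is all the measurement claim needs.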

Also, incumbency is not a reasonable variable to benchmark against because incumbency is simply a word for “the person who won last time in the same election with a very similar group of voters.” As good social scientists know, a lot of human behavior is seriously auto-correlated. What I ate yesterday is the best predictor of what I’ll eat tomorrow. Politics is no different.

Thus, in a lot of social science, we aren’t interested in these sorts of time series because we know the answer already. X_t is almost certainly strongly correlated with X_{t-1}. The interesting question is why the time series is X_1, X_2, … and not Y_1, Y_2, … Similarly, we might be interested in “extracting a signal” from some new source of data to help us measure X_i or build a causal explanation that doesn’t fall back on trivial auto-correlated time series explanations. In other words, “The guy is an incumbent because there are a lot of Black voters” is a much more meaningful statement than “The guy won this time because he won last time.”
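A small sketch of the auto-correlation point, using two simulated series (the means, the persistence parameter, and the series themselves are assumptions made up for the example). Both series are strongly correlated with their own past, so “X_t follows X_{t-1}” describes both equally well and says nothing about why one sits near 60 and the other near 40.

```python
import random


def lag1_correlation(series):
    """Pearson correlation between a series and itself shifted by one step."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(mean, phi, n, rng):
    """Simple AR(1) process around `mean`; `phi` sets how persistent it is."""
    x = mean
    out = []
    for _ in range(n):
        x = mean + phi * (x - mean) + rng.gauss(0, 1)
        out.append(x)
    return out


rng = random.Random(1)  # fixed seed for reproducibility

# Two hypothetical series with the same persistence but different levels.
x_series = simulate(60.0, 0.9, 5000, rng)
y_series = simulate(40.0, 0.9, 5000, rng)

r_x = lag1_correlation(x_series)
r_y = lag1_correlation(y_series)
mean_x = sum(x_series) / len(x_series)
mean_y = sum(y_series) / len(y_series)

print(f"X: lag-1 correlation {r_x:.2f}, mean {mean_x:.1f}")
print(f"Y: lag-1 correlation {r_y:.2f}, mean {mean_y:.1f}")
```

The lag-1 correlation is high for both series, but it cannot distinguish them. Whatever explains the gap in levels has to come from somewhere other than the series’ own history.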

That is ultimately why I remain interested in social media and electoral outcomes. Social media is a record of what people think that is different from polls and traditional print or broadcast media. It deserves a serious examination as a signal. And given the work by Huberty himself, Tumasjan, Jungherr, Beauchamp, the Indiana group, and others, the “social media as measurement of political sentiment” hypothesis is important and, as far as I can tell, supported to varying degrees by the Twitter data. Incumbency is a non-issue as long as researchers and political professionals avoid claims of causation.

2. Alex also indicated that Mark Huberty was concerned about how social media data is created. Here, I also agree. Transparency is important. All data is imperfect: people lie on polls, surveys have selection biases, etc. There is an ongoing discussion about the properties of the samples that Twitter produces for researchers, and it suggests there might be an issue. The more we know about the way social media samples are generated, the better.

Still, the issue is *how much* of a problem this is. On this point, I urge Mr. Huberty to be bluntly empirical. The blunt empiricist, I would argue, would just put it to the test. The empiricist would look for natural experiments in the data (transparent data vs. others) or well-chosen comparisons to see how much the sampling process affects the social media-vote correlation. Rather than point to possible problems, research would actually identify them. It might not matter, or it might be a big deal. Let’s figure it out!
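One way such a comparison might look in practice, sketched with a standard Fisher z-test for the difference between two independent correlations. The function name and the example numbers are hypothetical. The idea is to compute the tweet-vote correlation in races sampled one way, compute it again in races sampled another way, and ask whether the gap is bigger than chance would produce.

```python
import math


def compare_correlations(r1, n1, r2, n2):
    """Fisher z-test for two correlations from independent samples.

    Returns (z, two_sided_p). Assumes each n is larger than 3.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher transformation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided p-value under a standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Made-up inputs: tweet-vote correlation of 0.8 in 103 races drawn from one
# kind of sample, versus 0.5 in 103 races drawn from another.
z, p = compare_correlations(0.8, 103, 0.5, 103)
print(f"z = {z:.2f}, two-sided p = {p:.5f}")
```

A small p-value would suggest the way the sample was generated matters for the correlation. A large one would suggest the worry, while reasonable, does not change much.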


Written by fabiorojas

November 5, 2013 at 12:01 am

3 Responses


  1. Here’s the actual Huberty piece (paywalled, unfortunately):



    November 5, 2013 at 12:10 am

  2. Of course, if one finds that social media posts are caused by incumbency (name recognition, patronage, hired tweeters, or other mechanisms), that is, C causes A, then tracking social media is not warranted. On the other hand, if A, B, and C have a common cause, that would be most interesting.



    November 5, 2013 at 10:02 pm

  3. Au contraire, Randy. There are tons of races without incumbents: primaries, new districts, special elections, and term-limited seats. Additionally, it is of real scientific interest to see how talk tracks with votes.


    Fabio Rojas

    November 6, 2013 at 3:37 am
