more tweets, more votes – q&a and erratum
This week, there has been substantial media coverage of the More Tweets, More Votes paper, which was presented on Monday at the ASA meeting in New York. Scholars and campaign professionals have been asking questions about the draft of the paper, which can be found here. Since we have received many requests for clarification, I will address the comments in this blog post.
1. Your tweets/votes R-squared is small. The correlation between tweets and votes is actually really small when compared with other factors (such as incumbency).
Commenters have asked about the size of the Twitter correlation in comparison with other factors. First, no claim was made about this issue, and it is not relevant to the major point of the paper. The point of the paper is that social media contains important information. This information may be correlated with other data. Still, we can compare the Twitter bivariate correlation with other correlations. The Twitter correlation with Republican vote margin, for example, is .53. Incumbency has a correlation of .73 with vote margin. The proportion of people with a college education has a correlation of .15. Thus, the Twitter measure is in the middle of the range of the variables we look at.
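For readers less familiar with the term, a bivariate correlation here means a Pearson correlation between a single variable and vote margin. The following is a minimal sketch of that calculation. The function name `pearson` and the district values are hypothetical, chosen only for illustration; they are not the paper's data or code.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values for five districts, NOT the paper's data.
tweet_share = [0.30, 0.45, 0.50, 0.62, 0.80]   # Republican share of tweets
vote_margin = [-12.0, -3.0, 4.0, 2.0, 15.0]    # Republican vote margin (points)

r = pearson(tweet_share, vote_margin)
print(round(r, 2))
```

The sketch does not handle a constant input (zero variance), which would divide by zero. Each coefficient quoted above (.53, .73, .15) would come from one such pairwise calculation on the actual data.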
2. 404 out of 406?: In your SSRN draft, the analysis does not predict the winner in 404 out of 406 competitive races, which is what Fabio Rojas said in the WaPo op-ed. (http://www.washingtonpost.com/opinions/how-twitter-can-predict-an-election/2013/08/11/35ef885a-0108-11e3-96a8-d3b921c0924a_story.html?wpisrc=emailtoafriend)
A number of commenters have asked about the number of correctly predicted races. In the original paper, we do not perform this analysis. For the purposes of presenting the research to the public, we computed the rate of correct predictions (within the data), which was about 92.5%. I then multiplied this rate by the total number of races (435). Therefore, the extrapolated number of correctly predicted races is 404 out of 435. If we use only the contested race subsample, we get 375 races out of 406 contested races. This is a correction of what I wrote in the op-ed, which accidentally combined these two estimates into "404 out of 406." The op-ed now contains the correction.
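The difference between the two estimates, and the effect of mixing them, can be seen by working backward from the counts. This is only a sketch of the arithmetic: the exact in-sample rate is not given here beyond "about 92.5%," so the code computes the rate implied by each reported count rather than reproducing the original calculation. The helper name `implied_rate` is hypothetical.

```python
TOTAL_RACES = 435      # all races, as stated above
CONTESTED_RACES = 406  # contested-race subsample, as stated above


def implied_rate(correct, total):
    """Share of races predicted correctly, given a count and a denominator."""
    return correct / total


rate_all = implied_rate(404, TOTAL_RACES)            # extrapolation to all races
rate_contested = implied_rate(375, CONTESTED_RACES)  # contested races only
rate_mixed = implied_rate(404, CONTESTED_RACES)      # the accidental combination

print(f"{rate_all:.3f} {rate_contested:.3f} {rate_mixed:.3f}")
```

The first two rates both land within about half a percentage point of the quoted 92.5%, while pairing the numerator from one estimate with the denominator from the other implies a rate above 99%.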
3. You don’t predict an election. “[...] just in case someone is paying attention: You, Have, To, Predict, In, Advance. If you don’t want to follow my advice follow that of Lewis-Beck (2005): ‘the forecast must be made before the event. The farther in advance [...] the better’.” Gayo-Avello (http://di002.edv.uniovi.es/~dani/PFCblog/)
Professor Gayo-Avello and other commenters have raised the issue of prediction. He is correct in that we didn’t use contemporary data to predict elections in the future. Rather, we use “predict” in the statistical sense. We use social media data to estimate a dependent variable within the sample.
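To make the distinction concrete, here is a minimal sketch of "prediction" in this within-sample sense: a model is fit to a set of races, and its fitted values are then compared with the outcomes of those same races. It assumes a single predictor and ordinary least squares, which is a simplification and not the paper's specification. The function names and the toy numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope


def in_sample_hit_rate(xs, ys):
    """Fraction of races where the fitted margin has the same sign as the
    actual margin. The fit and the check use the SAME races, so this is
    within-sample estimation, not a forecast made in advance."""
    intercept, slope = fit_line(xs, ys)
    fitted = [intercept + slope * x for x in xs]
    hits = sum((f > 0) == (y > 0) for f, y in zip(fitted, ys))
    return hits / len(ys)


# Hypothetical toy values for five districts, NOT the paper's data.
tweet_share = [0.30, 0.45, 0.50, 0.62, 0.80]
vote_margin = [-12.0, -3.0, 4.0, 2.0, 15.0]

print(in_sample_hit_rate(tweet_share, vote_margin))
```

A forecast in Professor Gayo-Avello's sense would instead fit the model on earlier elections and apply it to races whose outcomes were not yet known.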
4. The Pollyanna effect is unsubstantiated. There is no support to say negative tweets are a good thing for a candidate.
The Pollyanna effect is merely a hypothesized explanation for what we find. It requires further research and study. We make no claim that it has been established.
5. Twitter user base is not representative of the population, self-selection bias, spam, propaganda, lack of geolocation of tweets.
A number of commenters have focused on the fact that we know little about the people who write tweets and that we do not estimate whether tweets are positive or negative. This is true, but the point of the paper is not to estimate who these people are or to interpret what they say. Rather, it is simply to show that social media contains informative signals of what people might do. Remarkably, the data show a correlation even though Twitter users are not a random sample of the population. We are simply measuring the relative attention given to a political candidate.
6. Vote share is a more natural way than vote margin to analyze and present the results, as well as consistent with prior Political Science research. (http://themonkeycage.org/2013/04/24/the-tweets-votes-curve/)
Some readers noted that traditional political science uses vote share rather than vote margin. Our updated paper corrects that. The original paper is a non-peer-reviewed draft. It is in the process of being corrected, updated, and revised for publication. Many of these criticisms have already been incorporated into the current draft of the paper, which will be published within the next few months.