Archive for the ‘poll’ Category

polls: your hillary vs. trump predictions

Note: X = 52% means X gets 52% or higher but less than 53%.

50+ chapters of grad skool advice goodness: Grad Skool Rulz ($2!!!!)/From Black Power/Party in the Street


Written by fabiorojas

May 6, 2016 at 1:42 am

iowa caucus poll 2016


Written by fabiorojas

January 29, 2016 at 3:07 am

reader poll 2014

It’s been a while. Let’s see who is reading this blog.


Written by fabiorojas

November 17, 2014 at 1:57 am

Posted in fabio, poll

public access poll


On Wednesday, we discussed the ASA’s opposition to the federal Open Access policy. What do you think?


Written by fabiorojas

January 24, 2014 at 12:07 am

Posted in academia, fabio, poll

the end of polling is near

In my Washington Post column, I discussed the possibility that social media data might displace the traditional political poll. After writing the column, I thought I might have gone overboard. But after reading some recent research, I realized that I am really onto something: social media data, when modeled correctly, provides very good measurements of public opinion trends.

Nick Beauchamp is a political scientist at Northeastern University. He has a new working paper called “Predicting and Interpolating State-level Polling using Twitter Textual Data.” This paper is the vital intermediate step between noticing that tweets correlate with votes and using social media data by itself to forecast elections. The abstract:

Presidential, gubernatorial, and senatorial elections all require state-level polling, but continuous real-time polling of every state during a campaign remains prohibitively expensive, and quite neglected for less competitive states. This paper employs a new dataset of over 500GB of politics-related Tweets from the final months of the 2012 presidential campaign to interpolate and predict state-level polling at the daily level. By modeling the correlations between existing state-level polls and the textual content of state-located Twitter data using a new combination of time-series cross-sectional methods plus Bayesian shrinkage and model averaging, it is shown through forward-in-time out-of-sample testing that the textual content of Twitter data can predict changes in fully representative opinion polls with a precision currently unfeasible with existing polling data. This could potentially allow us to estimate polling not just in less-polled states, but in unpolled states, in sub-state regions, and even on time-scales shorter than a day, given the immense density of Twitter usage. Substantively, we can also examine the words most associated with changes in vote intention to discern the rich psychology and speech associated with a rapidly shifting national campaign.

In other words, if you do some sensible model fitting and combine it with content analysis, social media time series mimic the trends produced by polls. The next step is obvious: combine election results and social media data, model the error, and if the results are reasonable, you will no longer need big polls.
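To make the “Bayesian shrinkage” idea concrete, here is a toy sketch (my own illustration, not Beauchamp’s actual model): a noisy daily estimate derived from tweets is shrunk toward the most recent poll, with each source weighted by its precision. All numbers are hypothetical.

```python
# Toy precision-weighted (Bayesian) shrinkage: combine a sparse but
# reliable poll with a dense but noisy tweet-based estimate.

def shrink(tweet_est, tweet_var, poll_est, poll_var):
    """Precision-weighted average of two noisy estimates of the same quantity."""
    w_tweet = 1.0 / tweet_var   # precision = 1 / variance
    w_poll = 1.0 / poll_var
    return (w_tweet * tweet_est + w_poll * poll_est) / (w_tweet + w_poll)

# A poll last week said 52% (variance ~2.25); today's tweet share implies
# 56% but is much noisier (variance ~9.0). The combined daily estimate is
# pulled most of the way back toward the poll.
daily = shrink(tweet_est=56.0, tweet_var=9.0, poll_est=52.0, poll_var=2.25)
print(round(daily, 1))  # 52.8
```

The noisier source gets the smaller weight, which is why a daily tweet signal can update a poll without swamping it.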

Adverts: From Black Power/Grad Skool Rulz

Written by fabiorojas

October 23, 2013 at 12:01 am

more tweets, more votes – media summary

If you are interested in reading the media coverage of More Tweets, More Votes, here are the links to selected coverage:

Thanks for checking in.


Written by fabiorojas

August 19, 2013 at 12:03 am

more tweets, more votes – q&a and erratum

This week, there has been substantial media coverage of the More Tweets, More Votes paper, which was presented on Monday at the ASA meeting in New York. Scholars and campaign professionals have been asking questions about the draft of the paper, which can be found here. Since we have received many requests for clarification, I will address comments through this blog post.

1. Your tweets/votes R-squared is small. The correlation between tweets and votes is actually really small when compared with other factors (such as incumbency).

Commenters have asked about the size of the Twitter correlation in comparison with other predictors. First, no claim was made about this issue, and it is not relevant to the major point of the paper, which is that social media contains important information. That information may be correlated with other data. Still, we can compare the Twitter bivariate correlation with other correlations. The Twitter correlation with Republican vote margin, for example, is .53. Incumbency has a correlation of .73 with vote margin. The proportion of people with a college education has a correlation of .15. Thus, the Twitter measure is in the middle of the range of the variables we examined.
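For readers who want to see what a bivariate correlation of this kind looks like computationally, here is a small illustration. The district-level numbers below are made up; the .53 / .73 / .15 figures in the text come from the paper, not from this data.

```python
# Pearson correlation between two variables, e.g. a candidate's share of
# tweet mentions and their vote margin. Data here is purely hypothetical.
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical districts: Republican tweet share vs. Republican vote margin
tweet_share = [0.35, 0.48, 0.55, 0.60, 0.72]
vote_margin = [-0.12, -0.03, 0.04, 0.02, 0.15]
print(round(pearson_r(tweet_share, vote_margin), 2))
```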

2. 404 out of 406? In your SSRN draft, the analysis does not predict the winner in 404 out of 406 competitive races, which is what Fabio Rojas said in the WaPo op-ed.

A number of commenters have asked about the number of correctly predicted races. In the original paper, we do not perform this analysis. For the purposes of presenting the research to the public, we computed the rate of correct predictions (within the data), which was about 92.5%. I then multiplied this by all races (435). Therefore, the extrapolated number of correctly predicted races is 404 out of 435. If we use only the contested race subsample, we get 375 races out of 406 contested races. This is a correction of what I wrote in the op-ed, which accidentally combined these two estimates. The op-ed now contains the correction.
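Running the extrapolation above backwards shows what accuracy rates the two reported counts imply; both sit near the “about 92.5%” in-sample figure:

```python
# Implied accuracy rates for the two corrected counts in the erratum.
rate_all = 404 / 435        # correct predictions extrapolated to all races
rate_contested = 375 / 406  # restricted to the contested-race subsample
print(round(rate_all, 3), round(rate_contested, 3))
```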

3. You don’t predict an election. “[…] just in case someone is paying attention: You, Have, To, Predict, In, Advance. If you don’t want to follow my advice, follow that of Lewis-Beck (2005): ‘the forecast must be made before the event. The farther in advance […] the better.’” – Gayo-Avello

Professor Gayo-Avello and other commenters have raised the issue of prediction. He is correct that we didn’t use contemporary data to predict future elections. Rather, we use “predict” in the statistical sense: we use social media data to estimate a dependent variable within the sample.
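The distinction is easy to see in code. In the statistical sense, “prediction” means reading fitted values off a model estimated on data you already have; a forecast would apply that fit to a race whose outcome is not yet known. The tiny dataset below is hypothetical.

```python
# "Predict" in the statistical (in-sample) sense: fit a line, then read off
# fitted values for the very observations used in the fit.
def ols_fit(xs, ys):
    """Simple least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

xs = [0.3, 0.5, 0.7]       # tweet share (hypothetical)
ys = [-0.10, 0.00, 0.10]   # vote margin (hypothetical)
slope, intercept = ols_fit(xs, ys)
in_sample = [slope * x + intercept for x in xs]  # in-sample "predictions"
print([round(v, 2) for v in in_sample])  # [-0.1, 0.0, 0.1]
```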

4. The Pollyanna effect is unsubstantiated. There is no support for the claim that negative tweets are a good thing for a candidate.

The Pollyanna effect is merely a hypothesized explanation for what we find. It requires further research and study. We make no claim that it has been established.

5. Twitter user base is not representative of the population, self-selection bias, spam, propaganda, lack of geolocation of tweets.

A number of commenters have focused on the fact that we know little about the people who write tweets and that we do not estimate whether tweets are positive or negative. This is true, but the point of the paper is not to estimate who these people are or to interpret what they say. Rather, it is simply to show that social media contains informative signals about what people might do. Remarkably, the data show a correlation even though Twitter users are not a random sample of the population. We are simply measuring the relative attention given to a political candidate.
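In its simplest form, the “relative attention” measure is just one candidate’s share of all name mentions in a district’s tweets, valence ignored. The counts below are hypothetical, not from the paper.

```python
# Relative-attention measure: the Republican candidate's share of all
# name mentions, regardless of whether the tweets are positive or negative.
def tweet_share(rep_mentions, dem_mentions):
    total = rep_mentions + dem_mentions
    return rep_mentions / total if total else 0.5  # no mentions -> no signal

print(tweet_share(1200, 800))  # 0.6
```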

6. Vote share is a more natural way than vote margin to analyze and present the results, and it is consistent with prior political science research.

Some readers noted that traditional political science uses vote share rather than vote margin. Our updated paper corrects this. The original paper is a non-peer-reviewed draft that is being corrected, updated, and revised for publication. Many of these criticisms have already been incorporated into the current draft, which will be published within the next few months.


Written by fabiorojas

August 16, 2013 at 8:27 pm