Archive for the ‘mere empirics’ Category
A few years ago, I was in the RWJ public health research seminar at Michigan, when I raised my hand and said something like this: “According to these tables, high BMI individuals are *not* at higher risk for mortality or morbidity. Maybe obesity isn’t the problem we think it is.” Silence ensued. And then, in a very British manner, the conversation just moved on, as if the little boy at the end of the table made a walrus face with the carrot sticks.
Turns out that I am not crazy. In a Time magazine article, sociologist Abigail Saguy reviews recent research on weight and health and finds that BMI, and other measures of weight, are not well correlated, if at all, with health:
Yes, there are certain health risks associated with having an elevated BMI, such as type 2 diabetes and heart disease. More broadly, a higher BMI is associated with a greater risk of cardiometabolic abnormalities, as measured by blood pressure, triglycerides, cholesterol, glucose, insulin resistance and inflammation. Nonetheless, almost one quarter of “normal weight” people also have metabolic abnormalities, and more than half of “overweight” and almost one third of “obese” people have normal profiles, according to a 2008 study. That’s 16 million normal-weight Americans who have metabolic abnormalities and 20 million obese (or 56 million overweight and obese) Americans who have no such abnormalities.
If the AMA’s goal is to address the serious diseases of type 2 diabetes and heart disease, it would be more productive and accurate for the association to urge doctors to focus on cardiometabolic risk, recognizing that there are both metabolically healthy and metabolically unhealthy individuals in all categories of weight. Rather than promote weight loss per se, doctors should instead encourage their patients of all sizes to incorporate physical activity and a balanced diet into their lives.
In other words, doctors committed some serious selection bias errors when linking obesity with morbidity. Then, they simply ignored the evidence that was obvious from mortality tables.
Now, this doesn’t mean that one should ignore weight. Personally, I feel much better having lost some weight. I also engage in modest exercise. But the health policy point is that individual health simply isn’t correlated with weight. Instead, it’s about diet and cardiovascular health.
The Tesla has attracted a great deal of attention because it has achieved an important technical breakthrough – a fully charged battery will support 300 (!) miles of driving. In other words, daily charging is enough for most people most of the time. That’s a huge breakthrough – the Nissan Leaf only promises about 100 miles per full charge, which a lot of people would use up just commuting.
Here’s a question – what allowed Tesla to pull this off? A few hypotheses:
- Luck. Tesla isn’t any different, it just so happened that the engineers got lucky.
- Tweaking. Tesla just kept tweaking a design that was already there. Maybe they just work a bit faster, or they had more money to throw at the problem.
- Semi-marginality. Tesla is not tied to the auto industry, so it is easier for them to think outside the box.
Anyone have insight on this? Other theories?
At Pacific Standard Time, an article about interesting state politics research. They list 10 cool findings. My favorite:
05. Members of the California Assembly from moderate districts tend to give moderate answers on political surveys. However, they still largely vote the same as the most extreme members of their parties. (Jim Battista, Josh Dyck, and Megan Gall).
Check it out.
A recent report on software quality:
As projects surpass one million lines of code, there’s a direct correlation between size and quality for proprietary projects, and an inverse correlation for open source projects. Proprietary code analyzed had an average defect density of .98 for projects between 500,000 – 1,000,000 lines of code. For projects with more than one million lines of code, defect density decreased to .66, which suggests that proprietary projects generally experience an increase in software quality as they exceed that size.
In other words, open source works for small projects, but proprietary big projects have better quality. Why? A few hypotheses:
- Incentives – maybe the joy of fixing code is simply washed out for big projects. You would only bother if there was a pay off.
- Teams & management – maybe large projects simply require large teams of dedicated people, which is hard to do in the open source world.
- Selection – maybe for-profits only will support a project if it is easy to maintain and thus reduce costs.
A number of people have asked me a very important question about the More Tweets, More Votes paper. Do relative tweet rates merely correlate with elections or is there a causal link?
The paper itself does not settle the issue. The purpose of the paper is merely to document this striking correlation. Given that qualification, let me explain the argument from both sides and my priors.
- Correlation: Twitter is a passive record of how excited people are. If a candidate somehow garners the attention of the public, they get excited and start talking about it, which translates into a higher twitter presence.
- Causal: The unusual attention that a candidate attracts in social media sways undecided or weakly committed voters. In a sense, highly active twitter users are the “opinion leaders” of modern society.
My prior: 75% correlation, 25% cause. How would one tease out these arguments? For example, what variable could instrument the district level tweet counts? It would be interesting to find out.
The Chronicle of Higher Education features a study of valedictorians and finds that class background affects where they apply to college:
Poorer students remain underrepresented at America’s top colleges, research has shown. And their academic preparation isn’t the only reason, according to Radford’s study of valedictorians, who should be considered well-prepared.
“Less-affluent valedictorians were less likely to know someone who had enrolled in a most selective institution and thus had a harder time envisioning their own attendance,” Radford wrote in a summary of her research.
The theme of the research association’s meeting this year was “Education and Poverty.” And Radford was among many who presented research on class inequity in higher education, which academics say remains deeply problematic at most colleges. Her study comes at a time of increased focus on how, despite plenty of outreach efforts, much of the talent at low-income high schools isn’t getting recruited to top colleges.
Radford worked with data from the High School Valedictorian Project, a survey of 900 class valedictorians who graduated from public high schools between 2003 and 2006. She also drew from 55 in-depth interviews with the students. The University of Chicago Press soon will publish a book by Radford on her findings.
This is probably one of the key findings of recent stratification research. Class affects the life course not only through material resources, but also by changing the habitus.
Unit of analysis: US House elections in 2010 and 2012. X-Axis: (# of tweets mentioning the GOP candidate)/(# of tweets mentioning either major party candidate). Y-axis: GOP margin of victory.
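To make the two axes concrete, here is a minimal sketch of how the quantities could be computed for each district. The function names and the district numbers are made up for illustration and are not from the paper. Two assumptions are built in: mentions of each candidate are counted separately and summed for the denominator, and the margin is expressed as a share of the two-party vote.

```python
def gop_tweet_share(gop_mentions, dem_mentions):
    """X-axis: share of candidate-name mentions that name the GOP candidate."""
    total = gop_mentions + dem_mentions
    if total == 0:
        raise ValueError("no tweets mention either candidate")
    return gop_mentions / total


def gop_margin(gop_votes, dem_votes):
    """Y-axis (assumed definition): GOP margin as a share of the two-party vote."""
    return (gop_votes - dem_votes) / (gop_votes + dem_votes)


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical districts: (GOP mentions, Dem mentions, GOP votes, Dem votes).
districts = [
    (120, 80, 55_000, 45_000),
    (40, 160, 38_000, 62_000),
    (90, 110, 49_000, 51_000),
    (150, 50, 61_000, 39_000),
]

shares = [gop_tweet_share(g, d) for g, d, _, _ in districts]
margins = [gop_margin(gv, dv) for _, _, gv, dv in districts]
print(pearson(shares, margins))
```

A raw correlation like this says nothing about controls such as incumbency or district partisanship, which the paper handles separately.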
I have a new working paper with Joe DiGrazia*, Karissa McKelvey and Johan Bollen asking if social media data actually forecasts offline behavior. The abstract:
Is social media a valid indicator of political behavior? We answer this question using a random sample of 537,231,508 tweets from August 1 to November 1, 2010 and data from 406 competitive U.S. congressional elections provided by the Federal Election Commission. Our results show that the percentage of Republican-candidate name mentions correlates with the Republican vote margin in the subsequent election. This finding persists even when controlling for incumbency, district partisanship, media coverage of the race, time, and demographic variables such as the district’s racial and gender composition. With over 500 million active users in 2012, Twitter now represents a new frontier for the study of human behavior. This research provides a framework for incorporating this emerging medium into the computational social science toolkit.
The working paper (short!) is here. I’d appreciate your comments.
* Yes, he’ll be in the market in the Fall.
To summarize: Richard Biernacki claims that coding textual materials (books, speech, etc) is tantamount to committing gross logical errors that mislead social scientists. Overall, I think this point is wrong but I think that Reinventing Evidence does a great service to qualitative research by showing how coding of texts might be critiqued and evaluated. In other words, ironically, by critiquing prior work on text coding, Biernacki draws our attention to the fact that qualitative research can be subjected to the same standards as quantitative research.
What do I mean? Well, a big problem with qualitative research is that it is very hard to verify and replicate. It is rare for ethnographers to go to the same field site, or for informants to be re-interviewed by others. A lot of the strength of quantitative research lies in the fact that other researchers can replicate prior results. For example, if I claim that party ID is correlated with gay marriage attitudes in the GSS, another researcher can download the same data and check the work. If they think the GSS made a mistake in collecting the data, a second survey can be conducted.
Biernacki, in trying to prove that coding qualitative data is pointless, follows a similar strategy by choosing a few articles of note and then he tries to reproduce the results. For example, he chooses Bearman and Stovel’s “Becoming a Nazi: A Model for Narrative Networks” which appeared in Poetics. The article creates a network out of ideas and themes mentioned from the memoir of a Nazi. Assuming that Biernacki reports his results correctly, he’s persuaded me that we need better standards for coding text. For example, he finds that Bearman and Stovel use an abbreviated version of the memoir – not the whole thing. Big problem. Another issue is how the network of text is interpreted. In traditional social network analysis, centrality is often thought to be a good measure of importance. Biernacki makes the reasonable argument that this assumption is flawed for texts. Very important ideas can become “background,” which means they are coded in a way that results in a low centrality score. This leads to substantive problems. For example, the Nazi mentions anti-semitism briefly, but in important ways. Qualitatively we know it is important, but the coding misses this issue.
Next week, I’ll get to my views on Biernacki’s attack on coding. But for now, I’ll give him credit for drawing my attention to these issues. The problems with the coding of the Nazi memoir point to me that there is more work to be done. We need to first start with a theory of text and then build techniques. If you want to use network analysis, you may have to take into consideration that standard network ideas may not be suitable. That will help us address problems like how to judge a text and the way we code data. That may not be the lesson Biernacki intended, but it’s a good one.
A few weeks ago, Senator Tom Coburn of Oklahoma tried to ban the NSF from supporting political science research. And of course, a lot of folks in the academy voiced their objection. But there’s a broader question for political science: Why is the political science profession so reliant on NSF funding? Repeatedly, people said that a majority of political science projects are funded with NSF funds. Is this true? If so, then it is a precarious state of affairs.
Academic disciplines should rely on a diverse group of supporters. If Congress deems social science a worthy effort, then great. But if they don’t, then we should still be ok. Relying on the NSF is analogous to a business having a single wealthy customer. That’s usually a bad business model. Instead, social scientists should actively court different sources of funding ranging from the public sector, non-profits, individuals, and the corporate world. If you look at sociology, you see many important projects funded by all kinds of folks. The General Social Survey is your typical big project funded by the NSF. Ron Burt obtained a lot of his data from private consulting gigs. Merton’s reference group research was done for the Dept of War during WWII. A lot of Columbia sociology in the 50s and 60s was sponsored by for-profit groups in New York.
It is up to each researcher to decide what kind of funding they are willing to pursue. But collectively, we should encourage funding from many sources, or we’ll be at the mercy of the Tom Coburns of the world.
I have a new article coming out in the journal Public Opinion Quarterly. It’s called “Correcting for Postmortem Bias in Longitudinal Surveys.” I think you might enjoy it. Here’s the abstract:
Ever since the pioneering work of Heckman (1965), social scientists have been acutely aware of selection bias in surveys. This is especially true for longitudinal studies where respondents may drop out of the survey after the first wave due to deceasement. This paper proposes a solution to this problem through postmortem follow up interviews. We illustrate this technique with the Panel Survey of Income Dynamics. Through postmortem interviews, we are able to increase the response rates by approximately 15% and we can identify biases in the original PSID data. For example, we find that employment was over reported in the original PSID. Interestingly, we find that voting increased once the dead were included, an effect attributable to deceased respondents in the Chicago metropolitan statistical area. These results show the promise of including the dead in all future social research. This study was supported by NSF grant 123-666.
Check it out.
Think Progress has been digging further into the back story behind the Regnerus/gay parents paper. The news site got one of the study’s funders to admit that the conclusion was predetermined:
Tellez confirmed to The American Independent that he was referring to same-sex marriage cases. In April 2011 — a year before the study was complete — Tellez wrote in a letter that “we are confident that the traditional understanding of marriage will be vindicated by this study as long as it is done honestly and well.” He also suggested that no prior study had properly compared children raised by a mother and father and those “headed by gay and lesbian couples,” but of course the Regnerus study doesn’t even do that.
The study was submitted for publication in February 2012 before Regnerus had even completed all of the data collection and accepted just six weeks later, while many other articles published in the same issue took a year between submission and acceptance. Peer review was similarly hurried, with one social demographer admitting that he only had two weeks to review the study and offer a commentary — without even having access to all the data.
Previous Regnerus discussion on orgtheory.
Salary.com had one of those lists of majors that don’t pay very well. #8? You guessed it – sociology:
People who enter the field of sociology generally are interested in helping their fellow man. Unfortunately, that kind of benevolence doesn’t usually translate to wealth. Here are three jobs commonly held by sociology majors (click on job title and/or salary for more info):
… social worker
… corrections officer
… chemical dependency counselor
This is one of those cheesy magazine articles on careers, but it is consistent with prior research on college majors and income. Sociology is a feeder into service professions. That’s a good thing, though I do wonder how my sublime lectures on the differences between structuralism and post-structuralism help people get off of drugs.
When I posted the Sociology Department Rankings for 2013 I joked that Indiana made it to the Top 10 “due solely to Fabio mobilizing a team of role-playing enthusiasts to relentlessly vote in the survey. (This is speculation on my part.)” Well, some further work with the dataset on the bus this morning suggests that the Fabio Effect is something to be reckoned with after all.
The dataset we collected has—as best we can tell—635 respondents. More precisely it has 635 unique anonymized IP addresses, so probably slightly fewer actual people, if we assume some people voted at work, then maybe again via their phone or from home. Our 635 respondents made 46,317 pairwise comparisons of departments. Now, in any reputational survey of this sort there is a temptation to enhance the score of one’s own institution, perhaps directly by voting for them whenever you can (if you are allowed) or more indirectly by voting down potential peers whenever you can. For this reason some reputational surveys (like the Philosophical Gourmet Report) prohibit respondents from voting for their employer or Ph.D-granting school. The All our Ideas framework has no such safeguards, but it does have a natural buffer when the number of paired comparisons is large. One has the opportunity to vote for one’s own department, but the number of possible pairs is large enough that it’s quite hard to influence the outcome.
It’s not impossible, however.
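To get a feel for the buffer described above, here is a rough back-of-the-envelope sketch. It assumes pairs are drawn uniformly at random, which may not match how All Our Ideas actually selects pairs, and the figure of 100 departments is a made-up placeholder since the number of departments in the survey is not given here.

```python
from math import comb


def own_pair_fraction(n_departments):
    """Fraction of all possible pairs that include one particular department."""
    total_pairs = comb(n_departments, 2)   # n * (n - 1) / 2 possible pairs
    own_pairs = n_departments - 1          # pairs that involve "my" department
    return own_pairs / total_pairs         # simplifies to 2 / n


# Figures from the post: 635 respondents, 46,317 pairwise comparisons.
votes_per_respondent = 46_317 / 635

# Hypothetical department count, for illustration only.
n_departments = 100
expected_own_votes = votes_per_respondent * own_pair_fraction(n_departments)

print(round(votes_per_respondent, 1), round(expected_own_votes, 2))
```

Under these assumptions a typical respondent sees their own department only a handful of times at most, so moving the result takes either many votes from one person or many coordinated voters.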
Sullivan is shocked that the level is high. A few comments: First, the rejection of the war has been stable since around 2006. Second, my hypothesis is that there is a baseline level of support for the war. These are mostly strong conservatives who identify with the Republican party. Third, the wording of the question probably inflates the answer a little. Do you think the United States made a mistake? A lot of people don’t like admitting their country is ever wrong. Sullivan’s post even notes that the support for the war drops in a different wording – whether the war was worth the cost. A bit more impersonal. Bottom line: People know things went badly in Iraq, but nationalism suppresses the feeling.
Update: I updated these analyses (fixing the double-counting problem). The results changed a little, so reload to see the new figures.
Last week we launched the OrgTheory/AAI 2013 Sociology Department Ranking Survey, taking advantage of Matt Salganik’s excellent All Our Ideas service to generate sociology rankings based on respondents making multiple pairwise comparisons between departments. That is, questions of the form “In your judgment, which of the following is the better Sociology department?” followed by a choice between two departments. Amongst other advantages, this method tends to get you a lot of data quickly. People find it easier to make a pairwise choice between two alternatives than to assign a rating score or produce a complete ranking amongst many alternatives. They also get addicted to the process and keep making choices. In our survey, over 600 respondents made just over 46,000 pairwise comparisons. In the original version of this post I used the Session IDs supplied in the data, forgetting that the data file also provides non-identifying (hashed) IP addresses. I re-ran the analysis using voter-aggregated rather than session-aggregated data, so now there is no double-counting. The results are a little cleaner. Although the All Our Ideas site gives you the results itself, I was interested in getting some other information out of the data, particularly confidence intervals for departments. Here is a figure showing the rankings for the Top 50 departments, based on ability scores derived from a direct-comparison Bradley-Terry model.
The model doesn’t take account of any rater effects, but given the general state of the U.S. News ranking methodology I am not really bothered. As you can see, the gradation looks pretty smooth. The first real “hinge” in the rankings (in the sense of a pretty clean separation between a department and the one above it) comes between Toronto and Emory. You could make a case, if you squint a bit, that UT Austin and Duke are at a similar hinge-point with respect to the departments ranked above and below them. Indiana’s high ranking is due solely to Fabio mobilizing a team of role-playing enthusiasts to relentlessly vote in the survey. (This is speculation on my part.)
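For readers who have not met the Bradley-Terry model mentioned above: it assigns each department an ability score such that the probability that department i is preferred to department j is ability_i / (ability_i + ability_j). Below is a minimal sketch of fitting it with the standard iterative (minorization-maximization) update. This is not the code used for the figure; the function name and the toy votes are invented for illustration, and it produces point estimates only, with no confidence intervals and no rater effects.

```python
from collections import defaultdict


def fit_bradley_terry(comparisons, n_iter=200):
    """Fit Bradley-Terry abilities from (winner, loser) pairs.

    Returns a dict mapping each item to an ability score; scores sum to 1.
    Assumes every item wins at least once, otherwise its score drifts to 0.
    """
    wins = defaultdict(int)    # total wins per item
    games = defaultdict(int)   # number of comparisons per unordered pair
    items = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        games[frozenset((winner, loser))] += 1
        items.update((winner, loser))

    ability = {item: 1.0 / len(items) for item in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            # Sum over opponents j of n_ij / (ability_i + ability_j).
            denom = sum(
                n / (ability[i] + ability[j])
                for pair, n in games.items() if i in pair
                for j in pair - {i}
            )
            updated[i] = wins[i] / denom
        total = sum(updated.values())
        ability = {i: value / total for i, value in updated.items()}
    return ability


# Toy votes between three hypothetical departments.
votes = (
    [("Dept A", "Dept B")] * 3 + [("Dept B", "Dept A")]
    + [("Dept B", "Dept C")] * 3 + [("Dept C", "Dept B")]
    + [("Dept A", "Dept C")] * 3 + [("Dept C", "Dept A")]
)
scores = fit_bradley_terry(votes)
for dept in sorted(scores, key=scores.get, reverse=True):
    print(dept, round(scores[dept], 3))
```

Sorting by the fitted score gives the ranking; uncertainty around each score would need an extra step such as bootstrapping respondents, which is not shown here.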
While we’re running our Crowdsourced Sociology Rankings, people have been looking a little more closely at the U.S. News and World Report rankings. Over at Scatterplot, Neal Caren points out that U.S. News’s methods page has some details on the survey sample size and response rates. They’re bad:
Surveys were conducted in fall 2012 by Ipsos Public Affairs … Questionnaires were sent to department heads and directors of graduate studies (or, alternatively, a senior faculty member who teaches graduate students) at schools that had granted a total of five or more doctorates in each discipline during the five-year period from 2005 through 2009, as indicated by the 2010 "Survey of Earned Doctorates." … The surveys asked about Ph.D. programs in criminology (response rate: 90 percent), economics (25 percent), English (21 percent), history (19 percent), political science (30 percent), psychology (16 percent), and sociology (31 percent). … The number of schools surveyed in fall 2012 were: economics—132, English—156, history—151, political science—119, psychology—246, and sociology—117. In fall 2008, 36 schools were surveyed for criminology.
So, following Neal, this tells us the Sociology rankings are based on a survey of 117 Heads and Directors with a response rate of 31 percent, which is thirty six people in total. For Economics you have 33 people, for History 29 people, for Political Science 36 people, for Psychology 40 people, and for English 33 people. The methods page also notes that they calculate the scores using a trimmed mean, so they throw out two observations each time (the highest and the lowest). The upshot is that the average score of a department is likely to have rather wide confidence intervals.
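The arithmetic above is easy to reproduce. The sketch below multiplies the number of schools surveyed by the published response rate, then shows a trimmed mean of the kind described on the methods page (drop the single highest and lowest score). The function names and the example ratings are invented for illustration. Because the published rates are themselves rounded, the implied head counts can be off by a person or so.

```python
# Schools surveyed and response rates as quoted from the methods page.
surveyed = {
    "economics": 132, "English": 156, "history": 151,
    "political science": 119, "psychology": 246, "sociology": 117,
}
response_rate = {
    "economics": 0.25, "English": 0.21, "history": 0.19,
    "political science": 0.30, "psychology": 0.16, "sociology": 0.31,
}


def implied_respondents(n_surveyed, rate):
    """Approximate number of people who answered, rounded to a whole person."""
    return round(n_surveyed * rate)


def trimmed_mean(scores):
    """Mean after discarding one highest and one lowest observation."""
    if len(scores) < 3:
        raise ValueError("need at least three scores to trim two")
    kept = sorted(scores)[1:-1]
    return sum(kept) / len(kept)


for field, n in surveyed.items():
    print(field, implied_respondents(n, response_rate[field]))

# Hypothetical ratings for one department on a 1-5 scale.
print(trimmed_mean([1, 3, 4, 5, 5]))
```

With only a few dozen raters per field, and fewer still who know any given department well enough to score it, each trimmed mean rests on a small number of observations.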
But, don’t let all that get in the way of contemplating the magic numbers. The press releases from strongly-ranked departments are already coming thick and fast.
Update: These numbers are too low. Read on.
I guess it’s possible that U.S. News *might* mean that the *effective* N of, e.g., the Sociology survey is 117, and that’s the result of a larger initial survey which yielded a 31 percent response rate. On that interpretation they initially contacted 378 departments (or thereabouts). That would be a non-standard way of describing what you did. Normally, if you give a raw number for the sample size and tell us the response rate, the raw number is the N you began with, not the N you ended up with. A quick check of the Survey of Earned Doctorates suggests that there were 167 Ph.D granting Sociology programs in the United States in 2010, which suggests that 117 is about right for the number who had awarded five or more in the past five years. Same goes for Economics, which has 179 Ph.D programs in the 2010 SED. Then again, the wording in the methods can also be read as saying every department might have received two surveys (“Questionnaires were sent to department heads and directors of graduate studies … at schools that had granted a total of five or more doctorates … during the five-year period from 2005 through 2009”). Looking again at the available SED data for 2006 to 2010 (one year off the USNWR dates, unfortunately), I found that 115 Sociology Departments met the stated criteria of having awarded five or more doctorates in the previous five years. If both the Dept Head and DGS in all those departments got a survey, this makes for an initial maximum N of 230, which is still quite far from the 378 or so needed, if 117 is supposed to mean the 31 percent who responded rather than the total number initially surveyed.
It seems like the most plausible interpretation is that for Sociology the number of schools surveyed is in fact 117, that every school received two copies of the questionnaire (one to the Head, one to the DGS or equivalent), but that the 31 percent response rate means “schools from which at least one response was received”, and so the total N surveys for Sociology is somewhere between 36 and 72 people, with a similar range of between 30 and 80 for the other departments.
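The competing readings can be checked with a few lines of arithmetic. This is only a sketch of the reasoning above; the function names are invented, and the 115-department count is the SED figure quoted in the text.

```python
def initial_n_if_effective(responses, rate):
    """If `responses` is the N after non-response, how many were contacted?"""
    return responses / rate


def respondent_range(schools, rate):
    """Bounds on people responding when the rate counts schools, not people.

    Lower bound: one response per responding school.
    Upper bound: both the Head and the DGS respond at every responding school.
    """
    responding_schools = round(schools * rate)
    return responding_schools, 2 * responding_schools


# Reading 1: 117 is the effective N, so roughly 377 were contacted.
print(round(initial_n_if_effective(117, 0.31)))

# The most that could have been contacted under the two-surveys reading.
print(115 * 2)

# Reading 2: 117 schools surveyed, 31 percent of schools responded.
print(respondent_range(117, 0.31))
```

The first reading needs far more initial contacts than two surveys per eligible department could supply, which is why the second reading looks more plausible.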
Update: While I was offline dealing with other things, then looking at the SED data I’d downloaded, then writing the last few paragraphs above, I see others have come to the same conclusion as I do here by more direct and informed means.
As many of you are by now aware, U.S. News and World Report released the 2013 Edition of its Sociology Rankings this week. I find rankings fascinating, not least because of what you might call the “legitimacy ratchet” they implement. Winners insist rankings are absurd but point to their high placing on the list. Here’s a nice example of that from the University of Michigan. The message here is, “We’re not really playing, but of course if we were we’d be winning.” Losers, meanwhile, either remain silent (thus implicitly accepting their fate) or complain about the methods used, and leave themselves open to accusations of sour grapes or bad faith. They are constantly tempted to reject the enterprise and insist they should’ve been ranked higher, and so end up sounding like the apocryphal Borscht Belt couple complaining that the food here is terrible and the portions are tiny as well.
The best thing to do is to implement your own system, and do it better, if only to introduce confusion by way of additional measures. Omar Lizardo and Jessica Collett have already pointed out that U.S. News decided to cook the rankings by averaging the results from this year’s survey with the previous two rounds. They provide an estimate of what the de-averaged results probably looked like. Back in 2011, Steve Vaisey and I ran a poll using Matt Salganik’s excellent All Our Ideas website, which creates rankings from multiple pairwise comparisons. It’s easy to run and generates rankings with high face validity in a way that’s quicker, more fun, and much, much cheaper than the alternatives. So, we’re doing it again this year. Here is the OrgTheory/AOI 2013 Sociology Department Ranking Survey. Go and vote! Chicago people will be happy to hear that you can vote as often as you like. So, participate in your own quantitative domination and get voting.
A few weeks ago, I argued that the era of overt racism is over. One commenter felt that I needed to operationalize the idea. There is no simple way to measure such a complex idea, but we can offer measurements of very specific processes. For example, I could hypothesize that it is no longer legitimate to use in public words that have a clearly derogatory meaning, such as n—— or sp–.*
We can test that idea with word frequency data. Google has scanned over 4 million books from 1500 to the present and you can search that database. Above, I plotted the appearance of n—– and sp—, two words which are unambiguously slurs for two large American ethnic groups. I did not plot slurs like “bean,” which are homophones for other neutral non-racial words. Then, I plotted the appearance of the more neutral or positive words for those groups. The first graph shows the relative frequencies for African American and Latino slurs vs. other ethnic terms. Since the frequency for Asian American slurs and other words is much lower, they get a separate graph. Thus, we can now test hypotheses about printed text in the post-racial society:
- The elimination thesis: Slurs drop drastically in use.
- The eclipse thesis: Non-slur words now overwhelm racist slurs, but racist slurs remain.
- Co-evolution: The frequency of neutral and slur words move together. People talk about group X and the haters just use the slur.
- Escalation: Slurs are increasing.
This rough data indicates that #2 is correct. The dominant racial terms are neutral or positive. Most slurs that I looked up seem to maintain some base level of usage, even in the post-civil rights era. The slur use level is non-zero, but it is small in comparison to other words so it looks as if it is zero. Some slur use may be derogatory, while some of it may be artistic or “reclaiming the term.” I can’t prove it, but I think Quentin Tarantino accounts for 50% or more of post-civil rights use of the n-word.
Bottom line: Society has changed and we can measure the change. This doesn’t mean that racial status is no longer important, but it does mean that one very important aspect of pre-Civil Rights racist culture has receded in relative importance. Some people just love racial slurs, but that is likely not the modal way of talking about people. Is that progress? I think so.
* Geez, Fabio, must you censor? Well, it isn’t censoring if it’s voluntary. I just don’t want this blog to be picked up for slurs. Even my book on 1970s Black Power, when people used the n-word a bit, only uses it once, in a footnote when referring to the title of H. Rap Brown’s first book.
Like most of us in the world of organization studies, I was saddened to hear of Michael Cohen’s passing. I only met him once and he was very gracious. In the spirit of his work, let me draw your attention to his last research project – an analysis of “handoffs.” The issue is that doctors can’t continuously watch patients. Whenever a doctor leaves to go home, a new doctor comes in and there is a “handoff.” Cohen wrote a nice summary for the Robert Wood Johnson Foundation website:
1. To be effective, a handoff has to happen.
It may seem incredibly commonplace, but all too often preventable injuries or even deaths trace back to handoffs that were abbreviated, conducted in awkward conditions, or downright skipped. The easy cases to identify are things like leaving before handoff is done, or rushing the handoff in order to get out the door.
Unfortunately, many other causes are also in play. Some major examples derive from schedule or workload incompatibilities. If patients are sent from the PACU (post-anesthesia care unit) to a floor unit during its nursing report, the nurses accepting the patients will necessarily miss out on the handoff of existing patients. If a patient is moved from the Emergency Department (ED) before her doctor or nurse has time to complete phone calls to the destination unit, the patient endures some period of having been transferred without benefit of handoff. If there is a shift change in the ED just before a patient moves, the handoff is conducted by a doctor or nurse who has only second-hand familiarity with the events. To improve handoffs, we may need to teach participants to think about the organizational structures that make it hard to do them well.
I am currently working on a super cool project and I was thinking about the following distinction: modelling of data vs. prediction with data. If you give data to a physical science or engineering type, then they want prediction. They want to come up with an accurate prediction of some future state. They want tiny errors. In contrast, most social scientists are interested in modelling general trends. We understand that statistical models have error terms, so prediction is inherently hard. It’s even beside the point in some sense. If X perfectly predicts Y, you’ve probably just measured the same thing twice. Instead, you want an imperfect, but unexpected, relationship between variables. Neither approach is wrong, but they do represent different philosophies of data analysis.
Scatterplot has a discussion on one of my favorite topics, low response rates. The observation is that political polls have low response rates, but they produce decent answers, contrary to standard sociological advice. For years, I have argued that response rates do not logically entail biased data. It is simply a logical fallacy to deduce that survey data is biased only because of the response rate. Two examples that show the logical fallacy of deducing bias from response rates alone:
- High response rate, very biased: Let’s say that I fielded a survey that everyone responded to, except for Jews. They didn’t respond at all because I printed a swastika on the envelope. Every single Jewish respondent just threw it in the trash. The result? A response rate of about 97%. High response rate? Yes – textbook perfect. Bias? Yes – any question regarding Judaism (e.g., is R Jewish?) will be biased.
- Low response rate, no bias: Let’s say that I fielded a survey on Oct 1, 2012 in New York City. Say all 1,000 people who got the survey responded. Great! On October 21, I decide to use research funds to draw an extra sample of 9,000 names and send them the same survey. Oh no! Hurricane Sandy hits and nobody responds. Response rate? 10%. Biased? No – because not responding was a random event. The people in wave 1 were randomly chosen.
The issue isn’t the response rate – it’s selection into the study. If selection is correlated with the data (a religion survey that alienates a religious group), then the data is biased. If selection is random, then you have no bias. Selection biases can occur or not occur over the range of response rates from 1% to 99%.
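The two scenarios above can be sketched as a small simulation. This is only an illustration under assumptions that are not in the examples themselves: the group makes up 3% of the sample (to match the "about 97%" response rate), the sample is scaled up to 100,000 people so the estimates are stable, and the names `simulate` and `group_share` are made up for this sketch.

```python
import random


def simulate(seed=0, n=100_000, group_share=0.03):
    """Compare two nonresponse patterns on the same random sample.

    Assumed numbers: n people are sampled and a fraction group_share
    of them belongs to the group that the survey alienates.
    """
    rng = random.Random(seed)
    # Each sampled person is True if they belong to the group.
    sample = [rng.random() < group_share for _ in range(n)]
    true_share = sum(sample) / n

    # Scenario A: everyone responds except members of the group,
    # so selection depends on the very thing being measured.
    resp_a = [member for member in sample if not member]

    # Scenario B: only the first 10% of the randomly ordered sample
    # responds; the rest are lost to an event unrelated to membership.
    resp_b = sample[: n // 10]

    return {
        "true_share": true_share,
        "rate_a": len(resp_a) / n,
        "est_a": sum(resp_a) / len(resp_a),
        "rate_b": len(resp_b) / n,
        "est_b": sum(resp_b) / len(resp_b),
    }


if __name__ == "__main__":
    for key, value in simulate().items():
        print(f"{key}: {value:.3f}")
```

In scenario A the response rate is high, but the estimated share of the group is exactly zero. In scenario B the response rate is 10%, yet the estimate lands close to the true share, because who responded had nothing to do with the answer.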
Ok, you say, but maybe it’s not a logic issue. Sure, logically a low response rate doesn’t *have* to lead to bias. But in practice, low response is empirically related to bias. Maybe a low response rate means only really weird people answer the phone or send back the survey.
This is actually a fair point, but it’s wrong. You see, the connection between low response rates and bias is an assumption that can be tested. And guess what? Public opinion researchers have actually tested the assumption through a number of studies. For example, Public Opinion Quarterly in 2000 published the results of an experiment where a survey was run twice. The first time, you just let people do whatever they want (response rate 30%). The second time, you really, really bug people (response rate 60%). The result? Same answers on both surveys. Follow-up studies often find the same result.
In fact, in discussing this issue with John Kennedy, our recently retired director of survey research, I found out that this is an open secret among survey professionals. Response rates are a completely bogus measure of bias in survey data. It’s a shame that social scientists have held on to this erroneous belief, despite the work being done in public opinion research.
A graduate student asked me if the following sources for Congressional district voting data are reliable:
The only book for PhD students: Grad Skool Rulz
There’s a statistical (!) Twitter fight this evening – Jennifer Rubin tweets “when do we break it to them that averaging polls is junk?” Hilarity ensues. There are actually some important subtle points about averaging poll data:
- Averaging bad data doesn’t make it better. On this broad point, Rubin is correct.
- Averaging good data does help. The purpose is to not be swayed by outliers that are produced by sampling. If you want to know the average family income in the US, you should average things so you won’t be swayed by the time Bill Gates appeared in the sample. If you believe that the typical polling firm is doing a decent job, it’s actually intuitive to average multiple polls.
- There’s actually research showing that poll averages close to the election aren’t terribly far off from the actual final numbers. See Nate Silver’s review on the subject.
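The first two points can be illustrated with a quick simulation. This is a rough sketch, not a model of any real election: the true support of 51%, the 1,000 respondents per poll, the ten polls per average, and the shared `bias` term are all assumed for illustration, and each poll is treated as a simple random sample.

```python
import random
import statistics


def run_poll(rng, true_support, n, bias=0.0):
    """One simulated poll: n respondents, each supporting the candidate
    with probability true_support + bias (bias models a bad method)."""
    p = true_support + bias
    return sum(rng.random() < p for _ in range(n)) / n


def compare(seed=0, true_support=0.51, n=1000, polls=10, trials=200, bias=0.0):
    """Return the mean absolute error of a single poll and of a poll
    average, measured over repeated simulated polling cycles."""
    rng = random.Random(seed)
    single_err, avg_err = [], []
    for _ in range(trials):
        results = [run_poll(rng, true_support, n, bias) for _ in range(polls)]
        single_err.append(abs(results[0] - true_support))
        avg_err.append(abs(statistics.mean(results) - true_support))
    return statistics.mean(single_err), statistics.mean(avg_err)


if __name__ == "__main__":
    good_single, good_avg = compare()
    bad_single, bad_avg = compare(bias=0.03)
    print(f"good polls: single {good_single:.4f}, average {good_avg:.4f}")
    print(f"bad polls:  single {bad_single:.4f}, average {bad_avg:.4f}")
```

When the polls are unbiased, the average sits much closer to the truth than a typical single poll, because the sampling noise cancels out. When every poll shares the same bias, averaging removes the noise but leaves the bias untouched.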
A few days ago, I noted that Obama is slightly behind in the polls mainly because of the South. If it weren’t for the South, Obama would easily have about 51% of the vote in the rest of the country. Kieran went back and compared the October Gallup polls in 2008 and 2012 to produce this picture:
You’ll hear all kinds of post-hoc explanations of the election outcome in November. But they’re probably wrong unless they start with the fact that the South really, really, really hates Obama more than the rest of the country for some inexplicable reason.