selection bias in surveys – recent developments?

A common problem in social research is selection bias – the people who choose to respond to your survey may be systematically different from the population. We have some methods, like the Heckman model, for adjusting your final model if you have some data that can be used to model study participation. If you don’t have a decent selection model, you can still make some assessment using the methods suggested by Stolzenberg and Relles, which have you decompose your models and study the properties of the different parts (e.g., look at the degree of mean regression under certain conditions).
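For readers who have not seen the mechanics, here is a rough sketch of the textbook Heckman two-step on simulated data: a probit for participation, then an outcome regression on respondents with the inverse Mills ratio added as a regressor. Everything in it is an assumption made for illustration: the simulated design, the variable names, the helper functions `fit_probit` and `heckman_two_step`, and the excluded variable `z` that shifts participation but not the outcome. It uses only NumPy, and it is not a substitute for a proper implementation with corrected standard errors.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_pdf(x):
    return np.exp(-0.5 * x**2) / math.sqrt(2 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + _erf(x / math.sqrt(2)))

def fit_probit(Z, s, n_iter=25):
    """Probit MLE by Fisher scoring; Z must include a constant column."""
    g = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        idx = Z @ g
        p = np.clip(norm_cdf(idx), 1e-10, 1 - 1e-10)
        d = norm_pdf(idx)
        score = Z.T @ ((s - p) * d / (p * (1 - p)))
        info = (Z * (d**2 / (p * (1 - p)))[:, None]).T @ Z
        g = g + np.linalg.solve(info, score)
    return g

def heckman_two_step(y, X, Z, s):
    """Step 1: probit for participation. Step 2: OLS with the Mills ratio."""
    g = fit_probit(Z, s)
    sel = s == 1
    idx = Z[sel] @ g
    mills = norm_pdf(idx) / np.clip(norm_cdf(idx), 1e-10, None)
    X_aug = np.column_stack([X[sel], mills])
    beta, *_ = np.linalg.lstsq(X_aug, y[sel], rcond=None)
    return beta, g

# Simulated example (hypothetical design): the outcome error e is correlated
# with the participation error u, so respondents are a selected sample.
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
z = rng.normal(size=n)              # affects participation only (assumed)
u = rng.normal(size=n)
e = 0.7 * u + math.sqrt(1 - 0.7**2) * rng.normal(size=n)
s = (0.5 * x + z + u > 0).astype(float)
y = 1.0 + 2.0 * x + e               # true intercept 1, true slope 2

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), x, z])

# Naive OLS on respondents only, ignoring selection.
naive, *_ = np.linalg.lstsq(X[s == 1], y[s == 1], rcond=None)
beta, gamma = heckman_two_step(y, X, Z, s)

print("naive OLS (intercept, slope):", naive)
print("two-step (intercept, slope, Mills coefficient):", beta)
```

The correction here leans on two things the simulation builds in by construction: jointly normal errors and a variable that moves participation without entering the outcome equation. Real survey data may satisfy neither.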

Question for readers: What is the state of the art on this issue? Is there something better than Heckman or playing games with Mills ratios?


Written by fabiorojas

October 5, 2012 at 12:01 am

Posted in fabio, mere empirics

5 Responses


  1. If you have a population frame like a census or a probability survey of the population you want to generalize to, then you can use propensity score methods. You are basically modeling non-response, which is common in sampling. A problem that can occur is that you have some units in the population that aren’t in the sample; in that case you can either change the population, collect more data, or accept that your estimate is biased to some degree. (I have a paper on this generalization topic as it relates to experiments, but I think the same idea applies here — for some shameless self promotion, see



    October 5, 2012 at 1:02 am

  2. Agreed. In the world of social epidemiology, propensity scores are sometimes used to address selection or attrition that is due to differential mortality.


    Martin Cooke

    October 5, 2012 at 11:30 am

  3. There’s a lot of relevant work being done in machine learning under the label of “covariate-shift adaptation”. Most of it has something of the flavor of Heckman’s stuff, in that it assumes you have some information about the overall population, but it’s a lot more flexible. This proceedings volume is a good introduction.

    There’s also Manski’s work on partial identification, i.e. bounding un-identified parameters.


    Cosma Shalizi

    October 5, 2012 at 12:15 pm

  4. Among economists, there’s been a fair bit of work on this topic following Heckman, largely in terms of loosening the strong parametric assumptions used in the Heckit model. See, for example, Ahn and Powell (1993), or Das, Newey, and Vella (2003) for a fully nonparametric version. Generally speaking, these approaches are quite similar to Heckit, except in using more flexible forms for the correction equation.

    Heckman himself seems to have had mixed feelings about the strong parametric assumptions used in his model, as they in principle allow identification of the distribution without a good source of exogenous variation in selection probability. His empirical research project (read, “joke”) “The Effect of Prayer on God’s Attitude Toward Mankind” presents a cogent example of the (potentially spurious) sorts of conclusions that may be drawn using the kind of functional form assumptions used in the basic version of his selection model. He finds that quantity of prayer has a nonmonotonic effect on God’s attitude toward the individual.

    Notably, the conditions for validity for the more flexible versions of the Heckit estimator noted above would not permit inferences to be drawn from the prayer data set…


    David C.

    October 5, 2012 at 6:23 pm

  5. There’s the double-sample approach.


    Jenny Trinitapoli

    October 6, 2012 at 9:52 pm
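The propensity-score idea raised in the first comments can be sketched roughly as follows: with a population frame that records a covariate for everyone, model the probability of responding, then weight each respondent by the inverse of that probability. The simulated frame, the variable names, and the helper `fit_logit` are all assumptions made for illustration, and the sketch only works because non-response here depends on the observed covariate alone, which is a stronger assumption than the Heckman setup makes.

```python
import numpy as np

def fit_logit(X, r, n_iter=25):
    """Logistic regression by Newton-Raphson; X must include a constant."""
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ b)))
        grad = X.T @ (r - p)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        b = b + np.linalg.solve(hess, grad)
    return b

# Hypothetical population frame: x is known for every unit, y only for
# units that respond. Response depends on x, and so does y.
rng = np.random.default_rng(1)
N = 50000
x = rng.normal(size=N)
p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x)))
responded = (rng.uniform(size=N) < p_true).astype(float)
y = 2.0 + 1.5 * x + rng.normal(size=N)   # population mean of y is 2

# Step 1: estimate response propensities from the frame.
X = np.column_stack([np.ones(N), x])
b_hat = fit_logit(X, responded)
p_hat = 1.0 / (1.0 + np.exp(-(X @ b_hat)))

# Step 2: weight respondents by 1 / propensity (normalized weights).
resp = responded == 1
w = 1.0 / p_hat[resp]
naive_mean = y[resp].mean()
ipw_mean = np.sum(w * y[resp]) / np.sum(w)

print("naive respondent mean:", naive_mean)
print("propensity-weighted mean:", ipw_mean)
print("largest weight:", w.max())
```

The largest weight is worth printing because it flags the problem mentioned in the first comment: if some kinds of units almost never respond, their weights blow up, and no reweighting can stand in for units that are missing from the sample entirely.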

