the consequences of selective sampling

Important article alert! The March issue of Administrative Science Quarterly has an article by Jerker Denrell and Balázs Kovács dealing with the selective sampling of empirical settings. While they are not the first to examine the problems associated with selection bias (there is a significant literature on this dating back to the 1970s), they are among the first to draw attention to the problems associated with choosing empirical settings (rather than just cases) that have atypically positive outcomes. As examples, they discuss two very common sorts of selective sampling: studies of practice diffusion and studies examining density-dependence in organizational populations. In the former, researchers typically study practices that have diffused widely rather than practices that failed to diffuse at all or that diffused only to some intermediate level. In the latter, scholars typically study populations that have become sufficiently legitimate and have reached high density levels.

Denrell and Kovács, through both simulation and empirical demonstration, show that the selective sampling of empirical settings can lead to biased results. Diffusion studies typically underestimate the contagion effect. Ecological studies potentially find spurious density-dependent effects. That is, many of the U-shaped mortality curves observed by ecologists over the years may be completely spurious as a result of only studying populations that have survived long enough to generate a sufficiently high population density. Thus, the conclusive and consistent finding of density-dependence isn’t as conclusive as we once thought!
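
To make the intuition concrete, here is a toy simulation of my own. It is not the authors' model, and every name and parameter in it (simulate_population, run_demo, the threshold of 20, and so on) is an assumption chosen for illustration. Each population's density follows a pure random walk with no growth tendency and no density dependence built in. If we then keep only the populations that ended up with high density, the surviving sample looks as if it had a systematic early growth pattern.

```python
import random


def simulate_population(steps, start, rng):
    """Density follows a driftless random walk; zero is absorbing (extinction)."""
    density = start
    path = [density]
    for _ in range(steps):
        if density > 0:  # extinct populations stay at zero
            density += rng.choice((-1, 1))
        path.append(density)
    return path


def mean_early_growth(paths):
    """Average change in density over the first half of the observation window."""
    mid = (len(paths[0]) - 1) // 2
    return sum(p[mid] - p[0] for p in paths) / len(paths)


def run_demo(n_populations=2000, steps=100, start=10, threshold=20, seed=0):
    """Compare early growth in all populations vs. those selected on the outcome."""
    rng = random.Random(seed)
    paths = [simulate_population(steps, start, rng) for _ in range(n_populations)]
    # Selective sampling: keep only populations that ended with high density.
    selected = [p for p in paths if p[-1] >= threshold]
    return mean_early_growth(paths), mean_early_growth(selected), len(selected)


if __name__ == "__main__":
    all_early, selected_early, n_selected = run_demo()
    print(f"populations selected: {n_selected}")
    print(f"mean early growth, all populations:      {all_early:.2f}")
    print(f"mean early growth, selected populations: {selected_early:.2f}")
```

Across all populations the average early growth should sit near zero, because nothing in the process favors growth. In the selected subsample it should be clearly positive, purely as a result of conditioning on the outcome. This only illustrates the general logic of selecting settings on their outcomes; it does not reproduce the specific diffusion or density-dependence results in the article.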

This study urges caution among scholars trying to make vast generalizations when studying a limited population. In fact, the paper suggests that it often isn’t enough to study only one population at a time. These examples are important in light of the excitement generated during earlier discussions on this blog about studying unique or extreme cases. While I wholeheartedly agree with my co-bloggers that it’s important to think about the theoretical implications of the extremes (especially when building new theory), testing those theories requires data covering a broad array of outcomes.

But how does one avoid selective sampling bias? The authors offer several suggestions, including changing the dependent variable. One obvious solution is to study multiple populations chosen for reasons other than the average value of the dependent variable. For example, in the forthcoming issue of the American Journal of Sociology, Sarah Soule and I have a paper that uses resource partitioning theory to explain levels of specialization among social movement organizations in three different social movement industries. The first design advantage is that we study three different industries, rather than just one. Including multiple populations reduces the potential for selective sampling bias. Second, rather than choosing to study industries because they were more or less specialized (on average) than other industries, we chose the industries based on their overall prominence during a 30-year time span. Presumably this prominence was uncorrelated with specialization levels, and the data show that specialization actually fluctuates quite a bit over time. Finally, rather than only looking at the hazard rate of mortality among organizations (as ecological studies often do), we also included the level of tactical and goal specialization as a dependent variable in the analysis. Interestingly, some of our results support resource partitioning hypotheses (e.g., yes, industry concentration increases the probability that any single organization will specialize), but we also find some surprising results. You’ll have to read the paper to find out what they are (see an online version here).

Unfortunately, we finished our paper before I had a chance to read Denrell and Kovács’s study, so we didn’t write much in the paper about the problems of selective sampling bias and the design advantages of studying multiple industries (although we do talk about it somewhat in this paper!). My hope, though, is that the contribution will be obvious and people will cite it in the future. I can always hope, right?

Written by brayden king

June 25, 2008 at 6:09 am

4 Responses


  1. i agree that “success bias” is a big problem for diffusion studies (especially quantitative work), which is one of the reasons most of my current work focuses on aggregating information across multiple innovations. I have a methods paper (together with Ming Chiu and Joeri Mol) on this coming out in the 2008 Sociological Methodology. we just returned the proofs so it should be out in about a month. the title is “Modeling Diffusion of Multiple Innovations via Multilevel Diffusion Curves: Payola in Pop Music Radio.” Here’s the abstract:
    Diffusion curve analysis can estimate whether an innovation
    spreads endogenously (indicated by a characteristic “s-curve”) or
    exogenously (indicated by a characteristic negative exponential
    curve). Current techniques for pooling information across multiple
    innovations require a two-stage analysis. In this paper, we
    develop multilevel diffusion curve analysis, which is more statistically
    efficient and allows for more flexible specifications than do
    existing methods. To substantively illustrate this technique, we use
    data on bribery in pop radio as an example of exogenous influence
    on diffusion.



    June 25, 2008 at 1:40 pm

  2. Congrats on the AJS article!

    “we also find some surprising results. You’ll have to read the paper to find out what they are…”

    OK, quit being a tease and tell us what are the surprising results!



    June 25, 2008 at 4:27 pm

  3. Thanks Mike.

    Well, for me the biggest surprise was that specialization actually hindered organizations during times of economic downturn (measured by national GDP). One might think that SMOs that specialize would have a competitive advantage when the economy struggles because they find a secure niche in which to recruit participants (e.g., economizing on the identity of participants), but the results show the opposite. SMOs are less likely to specialize and generalists have a survival advantage during economic downturns. We speculate in the end that generalists have economies of scale that give them a competitive advantage when economic resources overall are scarce. When resources are plentiful, specialists, which lack the same economies of scale, are better suited to find specialty niches.



    June 25, 2008 at 5:15 pm

  4. Dear Brayden,

    I am not so much into social movements, but I find that your findings are intuitive or at least not that surprising. Consider the following. The success of social movement organizations depends, amidst many other things, on the availability of political opportunity to express dissent. A period of economic downturn could be viewed as the failure of the incumbent institutional logic of the society to establish social wellbeing, thus providing sufficient political opportunity to express dissent. In addition, economic downturn is characterized by widespread dissent in the society, which might manifest in new social movements and/or strengthen existing social movements. However, since the economic downturn affects the society at large, the dissent in the society would not be on singular disconnected issues that call for minor modifications of the incumbent institutional logic, but would be on multiple society-wide issues that call for a total replacement of the incumbent institutional logic with a new institutional logic. Generalist movements that address a wide array of issues tend to be viewed as being better equipped to attain this objective. Hence, it is likely that we would see more generalist social movements than specialist social movements during economic downturn. Please correct me if my arguments sound unreasonable.



