orgtheory.net

oh, bonferroni!

I was recently working on a paper and a co-author said, “Yo, let’s slow down and Bonferroni.” I had never done that statistical position before and I thought it might hurt. I was afraid of a new experience. So, I popped open my 1,000-page Greene’s econometrics text… and Bonferroni is not in there. It’s actually missing from a lot of basic texts, but it is very easy to explain:

If you are worried that testing multiple hypotheses, or running multiple experiments, will allow you to cherry-pick the best results, then you should lower the alpha for your statistical significance tests. If you test N hypotheses, your new “adjusted” alpha should be alpha/N.
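For concreteness, here is a minimal sketch of the adjustment in Python. The p-values and the baseline alpha of 0.05 are made-up numbers for illustration, not anything from an actual analysis:

```python
# Minimal sketch of a Bonferroni correction (the p-values below are invented).
p_values = [0.001, 0.013, 0.020, 0.041, 0.32]   # hypothetical results from N tests
alpha = 0.05                                    # conventional significance level
n_tests = len(p_values)
adjusted_alpha = alpha / n_tests                # Bonferroni: alpha / N

for i, p in enumerate(p_values, start=1):
    verdict = "reject" if p < adjusted_alpha else "fail to reject"
    print(f"H{i}: p = {p:.3f} -> {verdict} at adjusted alpha = {adjusted_alpha:.3f}")
```

Equivalently, you can multiply each p-value by N and compare it to the original alpha.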

Simple, ya? No. What you are doing is trading Type 1 errors for Type 2 errors: you are increasing false negatives. So what should be done? There is no consensus alternative. Andrew Gelman suggests a multilevel Bayesian approach, which shrinks estimates through partial pooling and is more robust to false positives. There are other methods. This is probably something that should be built into more analyses. Applied stat mavens, use the comments to discuss your arguments for Bonferroni-style adjustments.
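To see the trade-off concretely, here is a rough simulation sketch. Everything in it (20 tests, half of them truly non-null, an effect size of 0.3, n = 100, 2,000 replications) is an illustrative assumption, not anything from the post; the point is just that the Bonferroni threshold of 0.05/20 detects noticeably fewer of the true effects than the unadjusted 0.05 threshold:

```python
# Rough simulation of the power cost of a Bonferroni correction.
# All settings here are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_tests, n_obs, effect = 20, 100, 0.3
alpha = 0.05
bonferroni_alpha = alpha / n_tests                     # 0.0025

true_effects = np.array([0.0] * 10 + [effect] * 10)    # half null, half non-null
n_sims = 2000
detect_raw = detect_bonf = 0.0

for _ in range(n_sims):
    # One-sample t-test of each column against zero
    data = rng.normal(loc=true_effects, scale=1.0, size=(n_obs, n_tests))
    p = stats.ttest_1samp(data, popmean=0.0, axis=0).pvalue
    nonnull = true_effects != 0
    detect_raw += (p[nonnull] < alpha).mean()
    detect_bonf += (p[nonnull] < bonferroni_alpha).mean()

print(f"Share of true effects detected, unadjusted alpha: {detect_raw / n_sims:.2f}")
print(f"Share of true effects detected, Bonferroni alpha: {detect_bonf / n_sims:.2f}")
```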

Written by fabiorojas

June 11, 2015 at 12:01 am

Posted in fabio, mere empirics

5 Responses

  1. Bonferroni corrections turn out to be too conservative (in the sense that they produce too many Type 2 errors in the pursuit of reducing Type 1 errors) for most scientific applications. It is much better to control the False Discovery Rate (FDR), which is the expected proportion of discoveries that are false, times the probability of making any discoveries. Bonferroni controls the Family-Wise Error Rate (FWER), which is the probability of making even one false discovery.

    In practice, this means that if you’re testing multiple hypotheses, Bonferroni will test each hypothesis at the alpha/N level. An FDR-controlling procedure will test the first hypothesis at the Bonferroni level, then apply a looser rejection threshold as you reject more hypotheses, making the procedure more powerful (a sketch of this step-up rule appears after the comment thread). FDR control is standard in genomics, where multiple-comparisons issues are ever-present. I haven’t seen it applied much in social science, though I’m working to change that.

    Simon Jackman and I give some background on the two approaches and cover the most popular method to control the FDR in our new paper, available here (skip to page 6 for the statistics):

    https://www.dropbox.com/s/qvqtz99i4bhdore/silenced.pdf?dl=0

    If you want to learn more, Annie Franco and I will be doing a poster on this topic at Polmeth in July and a presentation at APSA in September. We also have a paper in the works.

    bcommand

    June 11, 2015 at 7:32 am

  2. ^ Comment above was from Brad Spahn. I have no idea how my comment ended up signed as bcommand.

    Bradley Spahn

    June 11, 2015 at 7:42 am

  3. Autocorrect is a bummer! But thank you, that is helpful.

    fabiorojas

    June 11, 2015 at 4:38 pm

  4. Seconding what Brad said: Bonferroni corrections are too conservative. Psychology stats books usually have a lot of information on familywise error rates, so I’m surprised econometrics books (or at least Greene’s) don’t mention them. The Wikipedia page on familywise error rates is good if you want to look at all the post-Bonferroni developments.

    Chris M

    June 18, 2015 at 4:54 pm

  5. Chris, there are two issues in play here. One is the type of error rate being controlled, and the other is the procedure for doing so.

    What you say is true: the Bonferroni correction is a conservative procedure for controlling the familywise error rate. It does, however, hold without any assumptions about the dependency structure of the hypotheses, which is extremely convenient. You can always do better with Holm’s step-down method, but for further improvements you have to start imposing assumptions on the hypotheses being tested.

    My point was a little different: you should probably control a different error rate, namely the false discovery rate. The false discovery rate is less restrictive than the familywise error rate, and using a testing procedure that controls the FDR instead of the FWER will always increase your power.

    Bradley Spahn

    June 22, 2015 at 1:56 am
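
To make the step-up idea from the first comment concrete, here is a minimal sketch of the Benjamini-Hochberg procedure, the most widely used FDR-controlling method. The function name and the p-values are my own illustrative choices, not something from the thread or from the linked paper:

```python
# Minimal sketch of the Benjamini-Hochberg (BH) step-up procedure for FDR control.
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Return a boolean array marking which hypotheses are rejected at FDR level q."""
    p = np.asarray(p_values)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order]
    # Step-up rule: find the largest k with p_(k) <= (k/m) * q
    thresholds = (np.arange(1, m + 1) / m) * q
    below = ranked <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])    # largest rank meeting its threshold
        reject[order[: k + 1]] = True       # reject every hypothesis up to that rank
    return reject

p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.36]
print(benjamini_hochberg(p_vals, q=0.05))
```

Note that the smallest p-value is compared against q/m, which is exactly the Bonferroni cutoff, while the k-th smallest is compared against the looser threshold (k/m)*q; that is the step-up behavior described in the first comment.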


Comments are closed.
