the p-value and the cancer patient


Loyal orgheads know that I think p-values are abused in applied statistical work. Here’s a story I made up to illustrate my point of view: Let’s say you have contracted a malignant cancer that will certainly kill you (e.g., pancreatic cancer), but you have discovered a report about treatment X, which scientists believe might be very helpful. Upon reading the clinical trial report, you discover an event history regression table showing that, controlling for other relevant factors, cancer patients randomly assigned X were twice as likely to survive until the end of the clinical trial – a huge impact on the survival odds. However, it turns out that the p-value is .06.* According to modern social science standards, the authors conclude that there is no significant effect, that the FDA shouldn’t approve this treatment, and that people shouldn’t be encouraged to pursue it any further. My questions for you are:

  1. Would you take treatment X for yourself?
  2. Would you recommend that other people be allowed to try it out? If your answer to question #2 is different from your answer to #1, why?
  3. If you were a peer reviewer, would you say that the clinical trial report authors have drawn a bad conclusion? Assuming the study was competently done, would you suggest that the clinical trial report be published anyway?
  4. If you were a research grant agency, would you suggest that we should keep working on treatment X? If you were the FDA, would you allow it to go to market?
  5. If you answered “yes” to any of these questions, how can you in good conscience apply the “alpha=.05” standard in your own social science research? Why do you support the “asterisk system” at all? Doesn’t it make you a hypocrite?

Comments are open!

* If you are from a field using the alpha=.10 standard, make the p-value=.11.
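The scenario can be made concrete with a quick normal-approximation sketch. The standard error below is a made-up number chosen so that a doubled survival odds (log odds ratio of ln 2) lands at a two-sided p-value near .06; the same effect in a trial with roughly twice the sample would clear the .05 bar, which is the point of the story:

```python
import math

def two_sided_p(estimate, se):
    """Two-sided p-value for a normal z-test of estimate/se against zero."""
    z = abs(estimate) / se
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

log_or = math.log(2)   # doubled survival odds
se = 0.369             # hypothetical SE, picked so p lands just above .05

p = two_sided_p(log_or, se)
# Halving the sampling variance (roughly doubling n) shrinks the SE by sqrt(2):
p_bigger_trial = two_sided_p(log_or, se / math.sqrt(2))
```

Same estimated effect, same data-generating process; only the sample size differs, and the "significant/not significant" verdict flips.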

Written by fabiorojas

October 22, 2007 at 12:06 am

14 Responses


  1. a few years ago Scott Lynch pointed out to me a brilliant bit of cynical software. one of the several dozen test statistics in M+ is “critical n,” which tells you how large a sample you’d need to make your effects significant.
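(Editor's sketch: the intuition behind a "critical n" statistic — the actual M+ formula may differ — is that the standard error shrinks roughly as 1/√n, so one can solve for the smallest sample size at which the current point estimate would cross the conventional 1.96 threshold. All numbers below are illustrative:)

```python
import math

def critical_n(estimate, se, n, z_crit=1.96):
    """Smallest sample size at which the same point estimate would become
    'significant', assuming the standard error shrinks as 1/sqrt(n)."""
    z = abs(estimate) / se
    return math.ceil(n * (z_crit / z) ** 2)

# e.g., a doubled odds ratio with a hypothetical SE of 0.369 from n = 200:
n_needed = critical_n(math.log(2), 0.369, 200)
```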



    October 22, 2007 at 1:35 am

  2. I didn’t know about that M+ feature… it sickens my p-value hatin’ heart!


    Fabio Rojas

    October 22, 2007 at 6:31 am

  3. I would take it! Well, depends on side effects…

    What if statistics took a less doctrinaire and a more finance-like approach to p-values? In this case, there is a 6% chance of seeing an effect this large if the treatment actually did nothing. Given the stakes, isn’t that a risk worth taking? Or at least, worth letting informed decision makers make their own choices? If the treatment were dropped or never introduced, we would never know whether we were seeing a real effect or a false positive.
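(Editor's sketch: the finance-like reading of the decision can be made explicit as an expected-value calculation. Every number below is made up for illustration; the point is only that the decision depends on stakes, not on a significance threshold:)

```python
# Hypothetical stakes: probability the treatment truly works, the survival
# gain if it does, and the side-effect cost if it does not (same units).
p_works = 0.5
gain_if_works = 0.30   # absolute increase in survival probability
harm_if_not = 0.02     # cost borne when the treatment does nothing

expected_gain = p_works * gain_if_works - (1 - p_works) * harm_if_not
take_it = expected_gain > 0   # decision rule: positive expected value
```

With an asymmetric payoff like this, even a coin-flip belief in the treatment makes taking it the rational choice — no alpha level appears anywhere in the calculation.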



    October 22, 2007 at 12:47 pm

  4. […] sourced here […]


  5. The p-value is a convention and it must be treated as such. So I think the question should be: is this particular convention useful, or has it outlived its purposes? All societies have conventions that end up having important effects as they become part of legal rules. When is somebody “speeding”? When is a teenager “an adult”? For all of this stuff, the boundary criteria are always arbitrary (why 65 mph? why 18 years old? etc.).

    The problem with all of these conventions is that they attempt to create a categorical partition on what is essentially a non-categorical continuum. So the “significance” of a result, or the “confidence” that we have in its validity, really is an interval measure that goes from 0% (absolutely no confidence, p=0.999999) to 100% (very confident, p=0.000001…).

    We have “decided” (not really “we,” since if you are born into the system it is a fait social in Durkheim’s sense) to make that continuum into a categorical state, “significant versus non-significant,” by imposing a conventional boundary upon the continuous stream. This is always bound to produce pathologies (e.g., fake ID cards), but are the pathologies worse than not having the convention? I suppose the haters would say yes. Those who think that a clear-cut boundary is useful for judging the worthiness and “significance” (in the vernacular sense) of scientific findings, given the usual flood of scientific information we are bombarded with, say no. The physicists, of course, say that if we had really powerful mathematical representations of the relevant processes, then this would not matter, since we would not need statistics to test our theories.



    October 22, 2007 at 1:48 pm

  6. Although I suspect Fabio has said so in other posts, the real issue here is the point null hypothesis to which any classical p-value is pegged. Testing against 0 all of the time is silly, and that is an issue of hypothesis testing, not of p-values themselves. In Fabio’s example, the effect is just as likely to be twice as large as the point estimate as it is to be 0. This is a much more devastating critique than the contingencies created by sample size. (And, yes, we’ll all be Bayesians before too long … In Fabio’s case, everyone has a non-symmetric loss function, and thus a point-valued null is totally at odds with sound judgment.)
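(Editor's sketch: the claim that the effect is "just as likely to be twice the point estimate as it is to be 0" follows from the symmetry of the sampling distribution under a normal approximation — the observed estimate is exactly as probable when the true effect is 0 as when it is twice the estimate. The estimate and SE below are hypothetical numbers consistent with p ≈ .06:)

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-((x - mean) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))

est, se = math.log(2), 0.369  # hypothetical point estimate and standard error

# The sampling density of the estimate is symmetric around the true effect,
# so the data are equally likely under beta = 0 and under beta = 2 * est:
lik_null = normal_pdf(est, 0.0, se)
lik_double = normal_pdf(est, 2 * est, se)
```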


    Steve Morgan

    October 22, 2007 at 3:30 pm

  7. steve,
    in a weird way medicine is starting to (almost) do this through the back door. in my understanding there’s a move to test new drugs against existing drugs rather than placebos. this is functionally equivalent to doing a placebo test and setting the old drug’s beta (rather than zero) as the null hypothesis.
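(Editor's sketch: testing against an existing drug instead of a placebo amounts to moving the null hypothesis from zero to the old drug's coefficient. A minimal normal-approximation sketch, with made-up effect sizes:)

```python
import math

def z_test_against(beta_new, se_new, null_value=0.0):
    """Two-sided p-value for H0: beta = null_value, normal approximation."""
    z = (beta_new - null_value) / se_new
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Hypothetical numbers: new drug's estimated effect and the old drug's effect.
beta_new, se_new, beta_old = 0.90, 0.30, 0.55

p_vs_placebo = z_test_against(beta_new, se_new)             # null: no effect
p_vs_old_drug = z_test_against(beta_new, se_new, beta_old)  # null: old drug's beta
```

The same estimate can be decisively "significant" against a placebo null yet indistinguishable from the existing treatment, which is exactly the shift in the null that the comment describes.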



    October 22, 2007 at 4:05 pm

  8. Steve, I wish we *were* becoming Bayesian, but I don’t see soc journals chock full of bayesian papers… See my “bayes or bust?” post:

    What can we (at least the tenured) do to promote Bayesian thinking?



    October 22, 2007 at 8:06 pm

  9. Fabio: I am optimistic. Things move gradually, and we are entering a phase of soft-bayesianism. People are doing Bayesian things without realizing it. And, I see fewer and fewer journal articles where the findings hang only on tests of significance. Twenty years ago, it was easy to find journal articles where the substantive size of coefficients was never discussed, only p-values! (That being said, I don’t see sociologists ever giving full Bayesian treatments to all questions, in part because it is just not worth the effort in most cases. Implicit adoption of mean-squared error loss functions with flat priors is probably the best default position anyway.)

    But here are two things that I think all researchers in our generation should implement:

    1. We should not use stars/asterisks in tables. Just give point estimates and standard errors. Then, interpret properly. If one feels the need to put in p-values, then do it in the text. (Sometimes Editors compel us to put in stars/asterisks. Resist as much as possible.)

    2. We should always reject papers that we review when they are based on authors’ evaluations of statistical significance and comparatively little evaluation of substantive size.

    If our generation does just these two things, the journals will begin to look better on this issue.
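(Editor's sketch: point #1 above is trivial to implement — table rows carrying point estimates and standard errors, with no asterisks anywhere. Variable names and numbers are illustrative:)

```python
def table_row(name, est, se):
    """One regression-table row: point estimate with its standard error
    in parentheses, and no significance stars."""
    return f"{name:<12} {est:>8.3f}  ({se:.3f})"

rows = [
    table_row("treatment", 0.693, 0.369),
    table_row("age", -0.021, 0.008),
]
```

Readers who want a p-value can compute one from the estimate and SE; the table itself stays agnostic about any alpha threshold.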

    Causal inference is a whole different issue. That should be your next blog posting …


    Steve Morgan

    October 22, 2007 at 8:39 pm

  10. Steve, I am statistically in love with you. When I finished graduate school, I actually wrote papers with only coefficients and s.e.’s, but reviewers insisted on asterisks. Fortunately, a lot of reviewers are now onto point #2. There’s better attention being paid to substantive issues.

    Hmmm… post on causal inference… that sounds pretty cool.



    October 22, 2007 at 8:48 pm

  11. […] the p-value and the cancer patient « (tags: stats) Filed under: Linkage   |   Search […]


  12. […] Even if you disagree with my unusual opinions on p-values, you gotta love this photo from […]


  13. I do not think the opinion reflected here is unusual. At least not in Western European countries like France, Germany, or the UK. It is also known that histograms of published p-values show a huge self-inflicted bias toward p ≈ .05.

    The comparison with cancer treatment is imperfect, because it leaves out one dimension (cost-effectiveness) of the ‘significance’ frame used in drug approval. For regulators, the point of tests like the odds ratio is less to know whether a treatment is effective than to rank treatments, by survival rates for instance. In sociology, few articles compete with each other in that way.



    March 16, 2008 at 10:43 am


Comments are closed.
