illustrating correlation versus causation

Tomorrow’s topic for my org theory MBA class (yes, we already started this week!) is the importance of theory (even for die-hard practitioners), including a fun discussion of correlation versus causation. I have a few tricks up my sleeve (a coin toss experiment, a discussion of the NFC-stock market link and other weird correlations, etc.; I’ll post these, along with the readings, in the comments tomorrow). But what tools or experiments do you know of that highlight correlation versus causation and the associated importance of theory?

Written by teppo

January 7, 2009 at 5:05 am

Posted in education, just theory

9 Responses

  1. I should just show an episode of ‘House’ — each episode I’ve seen seems to have a segment with the medical team stewing over correlation/causation and symptoms on the white board.



    January 7, 2009 at 5:16 am

  2. Kieran

    January 7, 2009 at 5:50 am

  3. Kieran: Hmmm, not so sure about that.

    The above networks post also nicely illustrates the correlation-causation problem.



    January 7, 2009 at 6:07 am

  4. As someone pointed out in that CT thread, at least correlation is correlated with causation.



    January 7, 2009 at 1:47 pm

  5. Teppo, that’s a course I would love to attend and steal with aplomb. Might you be willing to share a syllabus? As for the experiential component, an episode or three of House could be really good. I can’t think of another show I’ve seen that features a team of 4-5 people exploring paths of possible correlation for 50 minutes, then establishing causation. The show, by the way, derives from the column by Lisa Sanders in the NY Times Sunday Magazine, and those articles tend to be useful in the same way. I also recall that Edward Tufte has had a lot to say about the representation of correlation and causation (“Correlation isn’t causation, but it’s a good start”); The Visual Display of Quantitative Information would present some good examples that might yield interesting exercises.

    I guess if I could make one plea, it would be to emphasize that theory is hugely important (esp. to diehard practitioners who are sure they know what works), and that the relationship goes the other way as well. I’ve noticed a tendency to batter practitioners with the importance of theory without acknowledging a more… mutual relationship between the two. It might help with some of the practitioner resistance.



    January 7, 2009 at 2:04 pm

  6. Ok, this falls in the nit-picky, perhaps useless, category of blog comments: Because correlation coefficients summarize linear associations, it is theoretically possible for a genuine causal effect that is non-linear in the causal variable to show a correlation of zero in the cause-by-outcome scatterplot. So, it really should be “association is not the same as causation.” But that is not very catchy.

    On the more helpful side, I think the most interesting associations/correlations that are non-effects arise in the presence of what Pearl would call colliders (but these go back a long way and are called results of Simpson’s paradox in statistics, various types of causal forks in philosophy, and so on). Anyway, the point is that two causes of a common outcome that are unconditionally unassociated become associated (and perhaps correlated!) within levels of the outcome. There are lots of selective sampling stories out there that have this structure. Patients are admitted to hospitals for different reasons. Across the hospital patients, all sorts of bizarre associations emerge. None of these are causal, and so you need a theory of hospital admittance and illness dynamics to make sense of the observed relationships in hospitals.


    Steve Morgan

    January 7, 2009 at 3:43 pm
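
Both points in comment 6 are easy to show in class with a few lines of code. What follows is a minimal Python sketch, not anything from the commenter. The illness variables, the admission rule (admit when the two severities sum past 1.5), the sample size and the seed are all made-up assumptions for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation, written out so the sketch needs no extra packages."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(0)  # fixed seed (an arbitrary choice) so reruns match

# Point 1: y is completely determined by x, but the effect is non-linear
# and symmetric around zero, so the linear correlation vanishes.
x = [i / 100 for i in range(-100, 101)]
y = [v ** 2 for v in x]
r_nonlinear = pearson(x, y)

# Point 2: two independent causes of a common outcome (a collider).
# a and b are hypothetical severities of two unrelated illnesses.
n = 20000
a = [rng.gauss(0, 1) for _ in range(n)]
b = [rng.gauss(0, 1) for _ in range(n)]
r_population = pearson(a, b)

# Hypothetical admission rule: a patient is admitted when the combined
# severity is high. Looking only at admitted patients conditions on the
# outcome, which induces an association between a and b.
admitted = [(ai, bi) for ai, bi in zip(a, b) if ai + bi > 1.5]
r_hospital = pearson([p[0] for p in admitted], [p[1] for p in admitted])

print(f"non-linear cause, correlation:      {r_nonlinear:.3f}")
print(f"two illnesses, whole population:    {r_population:.3f}")
print(f"two illnesses, admitted patients:   {r_hospital:.3f}")
```

The first number should sit at essentially zero despite a real causal effect, the second near zero, and the third clearly negative even though neither illness affects the other. Changing the admission threshold is a quick way to let students see how the selection rule, not the biology, drives the hospital correlation.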

  7. not exactly what you’re looking for, but on a related subject, here’s a simulation of publication bias that i wrote in Stata for my intro to stats class. it’s heavily annotated so as to be human readable (especially to anyone at all familiar with Stata). you can walk them through the file then run it in real time as part of a classroom exercise. i find it’s effective to compare the graphs produced by the simulation to real meta-analyses such as the Card and Krueger thing on elasticity of the minimum wage or various meta-analyses of crackpot telekinesis and telepathy literatures.


    Gabriel Rossman

    January 7, 2009 at 3:53 pm
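
For readers without Stata, here is a rough Python sketch of the general idea behind a publication bias simulation. It is not Gabriel's file, which is not reproduced here. The assumptions are mine: the true effect is exactly zero, each study compares two groups of equal size, and a study is "published" only if its z statistic passes a two-sided 1.96 cutoff.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed (arbitrary) so the run is repeatable


def run_study(n, true_effect=0.0):
    """Simulate one two-group study with n subjects per group.

    Returns the estimated effect (difference in means) and its z statistic.
    """
    treated = [rng.gauss(true_effect, 1) for _ in range(n)]
    control = [rng.gauss(0, 1) for _ in range(n)]
    diff = statistics.fmean(treated) - statistics.fmean(control)
    se = (statistics.variance(treated) / n + statistics.variance(control) / n) ** 0.5
    return diff, diff / se


# A hypothetical literature: 2000 studies of varying size, all of a null effect.
studies = []
for _ in range(2000):
    n = rng.randint(10, 200)
    estimate, z = run_study(n)
    studies.append((n, estimate, z))

# Assumed publication filter: only "significant" results make it into print.
published = [s for s in studies if abs(s[2]) > 1.96]

share = len(published) / len(studies)
mean_abs_all = statistics.fmean(abs(s[1]) for s in studies)
mean_abs_published = statistics.fmean(abs(s[1]) for s in published)

print(f"share of studies published:          {share:.3f}")
print(f"mean |effect|, all studies:          {mean_abs_all:.3f}")
print(f"mean |effect|, published studies:    {mean_abs_published:.3f}")
```

Even though nothing is going on, the published subset reports noticeably larger effects than the full set of studies. Plotting estimate against sample size for `studies` versus `published` would give the kind of funnel picture that can be compared with real meta-analyses.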

  8. Too bad Teppo hasn’t posted his exercises (I’m particularly interested in the coin tossing example) because I’ll be teaching a bit about correlation and causation today, in the context of how scholars go about researching OB.

    I’m going to use the example of height and salary, and violence and video games (used to be violence and tv shows, remember?).



    January 7, 2009 at 6:23 pm

  9. In “The Mismeasure of Man”, Stephen Jay Gould provides some fun examples and a very clear discussion of why confusing correlation for causation is a serious error in human reasoning:

    “Correlation assesses the tendency of one measure to vary in concert with another. As a child grows, for example, both its arms and legs get longer; this joint tendency to change in the same direction is called a positive correlation. Not all parts of the body display such positive correlation during growth. Teeth, for example, do not grow after they erupt. The relationship between first incisor length and leg length from, say, age ten to adulthood would represent zero correlation – legs would get longer while teeth changed not at all. Other correlations can be negative – one measure increases while the other decreases. We begin to lose neurons at a distressingly early age, and they are not replaced. Thus, the relationship between leg length and number of neurons after mid-childhood represents negative correlation – leg length increases while number of neurons decreases. Notice that I have said nothing about causality. We do not know why these correlations do or do not exist, only that they are present or not present. […] Arm and leg length are tightly correlated because they are both partial measures of an underlying biological phenomenon, namely growth itself.

    Yet, lest anyone become too hopeful that correlation represents a magic method for the unambiguous identification of cause, consider the relationship between my age and the price of gasoline during the past ten years. The correlation is nearly perfect, but no one would suggest any assignment of cause. The fact of correlation implies nothing about cause. It is not even true that intense correlations are more likely to represent cause than weak ones, for the correlation of my age with the price of gasoline is nearly 1.0. I spoke of cause for arm and leg lengths not because their correlation was high, but because I know something about the biology of the situation. The inference of cause must come from somewhere else, not from the simple fact of correlation – though an unexpected correlation may lead us to search for causes so long as we remember that we may not find them. The vast majority of correlations in our world are, without a doubt, noncausal. Anything that has been increasing steadily during the past few years will be strongly correlated with the distance between the earth and Halley’s comet (which has also been increasing as of late) – but even the most dedicated astrologer would not discern causality in most of these relationships. The invalid assumption that correlation implies cause is probably among the two or three most serious and common errors of human reasoning.”


    Henrik Berglund

    January 12, 2009 at 10:00 am
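
Gould's point about steadily increasing quantities can be reproduced with two made-up series. The Python sketch below is only an illustration under stated assumptions: the "gasoline price" and the "other" series are invented numbers, each a straight-line trend plus independent noise over 520 weeks (roughly ten years), and neither has any influence on the other.

```python
import random


def pearson(xs, ys):
    """Pearson correlation, written out so the sketch needs no extra packages."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def changes(series):
    """Week-to-week changes, which remove a steady linear trend."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


rng = random.Random(2)  # fixed seed (arbitrary) so the run is repeatable
weeks = range(520)

# Two unrelated, hypothetical series that both happen to drift upward.
gas_price = [1.00 + 0.005 * t + rng.gauss(0, 0.10) for t in weeks]
other = [50.0 + 0.1 * t + rng.gauss(0, 2.0) for t in weeks]

r_levels = pearson(gas_price, other)
r_changes = pearson(changes(gas_price), changes(other))

print(f"correlation of the raw series:       {r_levels:.3f}")
print(f"correlation of week-to-week changes: {r_changes:.3f}")
```

The raw series correlate very strongly simply because both trend upward, while their week-to-week changes show little or no relationship. That contrast is a compact way to make the case that the explanation has to come from theory about the situation, not from the size of the coefficient.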
