You say within, I say between…Let’s call the whole thing off!

Suppose we are interested in the effects of some social psychological construct that we are theoretically devoted to (let’s say “symbolic racism”) on support (or lack thereof) for (generous) social welfare policies.  In quantitative social science we would spend a lot of money surveying people, collect some data, and ultimately specify a regression model of the form:

Y = a + bW + cX + e      (1)

where Y is some sort of scale that lines up individuals in terms of their support for social welfare policies, W is some sort of scale that lines up individuals in terms of their “symbolic racism,” X is a matrix of other “socio-demographic” stuff, and e is a random disturbance.  Suppose further that the model provides support for our theory: b is substantively and statistically significant and its sign goes in the right direction (the more symbolic racism, the less support for social welfare policies).  We would then write a paper arguing that individuals who are high in symbolic racism are less likely to support social welfare policies, and that this is a likely source of support for the Republican party in the South; we might even insinuate in the conclusion that trends in income inequality would be much less steep if it weren’t for these darn racists, etc.
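A minimal sketch of what estimating model (1) amounts to, using simulated data (all variable names and coefficient values here are hypothetical, and the matrix X is reduced to a single covariate for simplicity):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical survey data: W = "symbolic racism" scale,
# X = a single socio-demographic covariate.
W = rng.normal(size=n)
X = rng.normal(size=n)

# Coefficients used to generate Y (hypothetical values).
a_true, b_true, c_true = 1.0, -0.8, 0.3
Y = a_true + b_true * W + c_true * X + rng.normal(scale=0.5, size=n)

# Estimate (a, b, c) in model (1) by ordinary least squares.
design = np.column_stack([np.ones(n), W, X])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
a_hat, b_hat, c_hat = coef
```

With these simulated data the OLS estimate b_hat recovers the negative between-persons coefficient, which, as the rest of the post argues, is all that model (1) delivers.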

I would bet you 10,000 dollars,* however, that in actually presenting their results and their implications the authors would say things that are in fact not supported by their statistical model.  In fact we all say or imply these things, especially when W is an attitude (or some other “intra-individual” attribute) and Y is a behavior, and we desire to conclude from a model such as (1) that the attitude is a cause of the behavior (the same thing would apply if the units of analysis are organizations, W is some organizational attribute, like the implementation of a “strategy,” and Y is an organizational outcome).

Now suppose even further that W passes all of the (usual) hurdles for something to constitute a cause: it precedes Y, the model is correctly specified on the observables, etc.  My point here is that even if all that were true, it does not follow from observing a large and statistically significant b that at the individual level there is some sort of psychological (or intra-organizational) process with the same structure as our W, called “symbolic racism,” that causes the individual’s support for this or that policy.

An obscure segment of the statistical and psychometrics literature tells us why this is the case (see in particular Borsboom et al. 2003): in order to jump from information obtained from a comparison between persons to statements about the data-generating process within persons, we must make what is called the local homogeneity assumption.  This assumption is just that: an assumption.  And for the most part it is a shaky one to make.  For b in (1) only gives us information about the conditional distribution of Y responses among the population of subjects as we move across levels of W; it says nothing about causal processes at the individual level.  In fact, the model that produces responses at the individual level could be wildly different from (1) above and yet still generate the between-persons result that we observe.  In this respect, the statements:
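To make the point concrete, here is a small simulation (all numbers hypothetical) in which no individual’s Y responds to changes in their own W at all, yet the between-persons regression yields a large negative b:

```python
import numpy as np

rng = np.random.default_rng(1)
n_people, n_waves = 500, 8

# Hypothetical panel: a latent trait u_i raises each person's measured W
# and lowers their baseline Y, but within any one person changes in W
# have no effect on Y whatsoever.
u = rng.normal(size=n_people)
alpha = 2.0 - 1.0 * u                                   # person baselines
W = u[:, None] + rng.normal(scale=0.3, size=(n_people, n_waves))
Y = alpha[:, None] + 0.0 * W + rng.normal(scale=0.3, size=(n_people, n_waves))

# Between-persons regression of Y on W: a large negative slope.
w, y = W.ravel(), Y.ravel()
b_between = np.cov(w, y)[0, 1] / np.var(w)

# Within-person slope (demeaning by person): essentially zero.
Wd = W - W.mean(axis=1, keepdims=True)
Yd = Y - Y.mean(axis=1, keepdims=True)
b_within = (Wd * Yd).sum() / (Wd ** 2).sum()
```

Here the between-persons b is strongly negative even though the within-person effect is exactly zero by construction.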

1a. Our results provide support for the conclusion that in the contemporary United States a person with a high degree of symbolic racism is less likely to support social welfare policies than another person with a lower degree of symbolic racism.

1b. Our results provide support for the conclusion that a person’s support for social welfare policies would increase if their degree of symbolic racism were to decrease.

are empirically and logically independent.  Model (1) only supports 1a; it says nothing about 1b (or would only say something about 1b under the weight of a host of unsupportable assumptions).  However, whenever we write up results obtained from models such as (1), we sometimes present them as if (or insinuate that) they provide support for 1b.

Startlingly, this lack of (necessary or logical) correspondence between a between-subjects result and the DGP (data-generating process) at the individual level implies that most statistical models are useless for the sort of thing people think they are good for (drawing conclusions about mechanisms at the level of the person/organization).  Not only that, it implies that a model that provides a good explanatory fit for within-individual variation (let’s say a growth-curve model of the factors that account for individual support for social welfare across the life course) might be radically different from the one that provides the best fit in the between-persons context.  Finally, it implies a “rule” of sociological method: “whenever a within-subject explanation is extracted from a between subjects analysis we can be sure that this explanation is (probably) false (at least for most non-trivial outcomes in social science).”
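The possibility that the within-individual model is radically different from the between-persons one can be illustrated with a sketch (hypothetical parameters) in which the within-person effect of W on Y is positive while the between-persons slope is negative:

```python
import numpy as np

rng = np.random.default_rng(2)
n_people, n_waves = 400, 10

# Hypothetical life-course data: within each person, W raises Y (+0.5),
# but person-level baselines induce a negative between-persons slope.
u = rng.normal(size=n_people)
W = 2.0 * u[:, None] + rng.normal(scale=0.5, size=(n_people, n_waves))
Y = -2.0 * u[:, None] + 0.5 * W + rng.normal(scale=0.5, size=(n_people, n_waves))

# Between-persons slope computed on person means: negative.
Wbar, Ybar = W.mean(axis=1), Y.mean(axis=1)
b_between = np.cov(Wbar, Ybar)[0, 1] / np.var(Wbar)

# Within-person (growth-curve style) slope: positive.
Wd = W - W.mean(axis=1, keepdims=True)
Yd = Y - Y.mean(axis=1, keepdims=True)
b_within = (Wd * Yd).sum() / (Wd ** 2).sum()
```

The two slopes have opposite signs: fitting the between-persons data would lead you to exactly the wrong within-person model.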

*I don’t actually have 10,000 dollars.

Written by Omar

December 16, 2011 at 4:28 pm

5 Responses


  1. — Omar,

    I fully agree with you that sociologists and other social scientists frequently make causal claims from associational data without being clear about what assumptions they are making. The rest of your post I have considerable trouble with. Let me start with a smaller point. You state: “whenever a within-subject explanation is extracted from a between subjects analysis we can be sure that this explanation is (probably) false (at least for most non-trivial outcomes in social science).” But suppose I do a well-executed randomized experiment (RCT) with no compliance problem. Most randomized experiments involve a between-subjects design. Although there are always questions of internal validity, most methodologists are likely to believe, reasonably so, that an RCT is likely to give credible results.

    Of course, in the social sciences it can be difficult to do an RCT. We are often stuck with what is known as observational data. With observational data we are stuck with the fact that factors other than our treatment may be producing differences in our outcome. This is the case both with cross-sectional (i.e., between-individual) data and with panel or time-series (within-individual/unit) data. The problem of causal inference, then, is whether there are reasonable alternative explanations for why T and Y are correlated.

    The traditional way this question has been framed is to ask whether there are omitted variables that might explain your “b,” that is, the conditional association between T and Y. Recently, the work of Judea Pearl has provided a far more general way of assessing this issue. For an introduction to his thinking, see chapter 3 of my book with Steve Morgan, Counterfactuals and Causal Inference. The key idea is that we need to posit a theory of how the world works and then ask whether, within that theory, there are other plausible explanations for why T and Y are (conditionally) associated. If not, then we take the observed (conditional) association as evidence of a causal relationship. We say that (with respect to our theory) the causal effect is identified. The key idea here is that (conditional) association + a theory => causal evidence. Associational evidence by itself never provides evidence of a causal relationship. This was the mistake Hume made.
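    The identification logic can be sketched in a small simulation (hypothetical numbers, not from the book): when the posited theory says that an observed X is the only confounder of T and Y, conditioning on X recovers the effect that the marginal association gets wrong.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000

# Hypothetical theory of how the world works: X confounds T and Y,
# and X is observed. True causal effect of T on Y is 1.5.
X = rng.normal(size=n)
T = 0.8 * X + rng.normal(size=n)
Y = 1.5 * T + 2.0 * X + rng.normal(size=n)

# Marginal association between T and Y: biased by the path through X.
b_marginal = np.cov(T, Y)[0, 1] / np.var(T)

# Conditioning on X (regression adjustment) identifies the effect.
design = np.column_stack([np.ones(n), T, X])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
b_adjusted = coef[1]
```

    Note that the adjustment only works because the theory (here, the simulated DGP) says X closes every non-causal path; the data alone cannot certify that.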

    Of course, you and I may well disagree about the theory, and we then need to argue about the reasonableness of our different theories. Often this takes the form of arguing about whether there are other plausible explanations for the observed association between T and Y. Of course, other individuals may come along and offer a reasonable theory that we have not thought of. This is why theories/causal claims can never be definitive. As Popper taught us long ago, they can only be falsified, not proven.

    It is true that panel data can sometimes provide more powerful ways of testing for causal relations. Specifically, with panel data we can often control for fixed unobserved differences across individuals by using a fixed-effects analysis. By using a fixed-effects model we move from data that involve both between- and within-individual comparisons to using only within-individual comparisons.
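    A minimal sketch of that move (hypothetical panel data): demeaning each person's observations removes the fixed unobserved differences, so only within-person comparisons remain.

```python
import numpy as np

rng = np.random.default_rng(4)
n_people, n_waves = 300, 6

# Hypothetical panel: person-specific intercepts (2 * alpha_i) are
# correlated with W, which biases the pooled slope; the true
# within-person effect of W is 0.7.
alpha = rng.normal(size=n_people)
W = alpha[:, None] + rng.normal(size=(n_people, n_waves))
Y = 2.0 * alpha[:, None] + 0.7 * W + rng.normal(scale=0.5, size=(n_people, n_waves))

# Pooled estimator mixes between and within comparisons (biased upward here).
b_pooled = np.cov(W.ravel(), Y.ravel())[0, 1] / np.var(W.ravel())

# Fixed-effects (within) estimator: demean by person, then regress.
Wd = W - W.mean(axis=1, keepdims=True)
Yd = Y - Y.mean(axis=1, keepdims=True)
b_fe = (Wd * Yd).sum() / (Wd ** 2).sum()
```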

    There are, of course, other strategies. Regression, stratification, etc. work if all the potentially important variables are observed and we can condition on them. Another alternative is to use instrumental variables, which is an attempt, with observational data, to mimic a randomized experiment (see chapter 7 of my book).
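    A sketch of the instrumental-variables idea with simulated data (hypothetical setup: the instrument Z shifts T but affects Y only through T):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000

# Hypothetical setup: unobserved u confounds T and Y; the instrument Z
# shifts T but affects Y only through T. True effect of T on Y is 2.0.
u = rng.normal(size=n)
Z = rng.normal(size=n)
T = 1.0 * Z + 1.0 * u + rng.normal(size=n)
Y = 2.0 * T - 3.0 * u + rng.normal(size=n)

# Naive OLS of Y on T is badly biased by u.
b_ols = np.cov(T, Y)[0, 1] / np.var(T)

# Instrumental-variables (Wald) estimate recovers the effect.
b_iv = np.cov(Z, Y)[0, 1] / np.cov(Z, T)[0, 1]
```

    The exclusion restriction (Z touches Y only through T) is, of course, itself a theoretical assumption that the data cannot verify.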

    I would agree with the tone of your post: many of the causal claims in sociology are likely to be false, or at least to rest on very fragile empirical evidence. But your claim that between-subjects analysis never provides serious evidence for within-person causal processes is just not reasonable. Randomized experiments often work. Instrumental-variable estimates are sometimes reasonable, and in some cases regression may well be persuasive. It all depends on the substantive issue that we are investigating.

    Key take away: Observed associations + theory => causal evidence

    Chris Winship

    December 16, 2011 at 9:04 pm

  2. 1. This is a variant of an ecological inference fallacy. Whether going for a causal inference or just a descriptively accurate empirical statement, you are in a tough situation if you are using data only from a higher-up unit of analysis when the theoretical proposition is about the lower-down units. So your between/within person distinction is the same as the classic between/within voting districts examples from that literature. So, I basically agree with the main point of your post, I think.

    2. That being said, I can’t think of any causal inferences that don’t rely on assumptions. Even RCTs typically have to introduce all manner of assumptions that are relied on to link the actual intervention to the theory that is supposedly being informed, and then all sorts of things have to be assumed to deal with non-compliance. Often also you will find homogeneity assumptions throughout (those who don’t respond to the treatment that is administered are just like those who do!) I think this is Chris’ main point above, and of course we always agree on such things!

    3. I suspect that the strong local homogeneity assumption you note is sufficient but not necessary, and that weaker assumptions may still suffice to identify some average effects. Probably would be useful for orgtheory readers if you laid that out more specifically.


    Steve Morgan

    December 18, 2011 at 4:29 pm

  3. @Chris: Yes, the argument above was mainly directed at those using observational data. But more broadly, my last point is actually not about “causal” inference, but about another source of (even more problematic) inference, which (I am just going to make this up) we may call “mechanismic inference.” Like for instance, drawing the conclusion that an “attitude” characterized as a linear dimension drives people to support this or that policy, so that the attitude is (one of) the mechanism(s) that accounts for why persons support this or that party platform.

    Note that in my dreamworld I actually presumed that all of the conditions for concluding that the effect of W on Y was causal had been met, so my basic argument is about jumping to a conclusion about the within-person mechanism responsible for a given statistical effect (even if this effect meets the conditions for being “causal”) when all of the information that you have is about the between-person counterfactual.

    As I said above, you can always draw this conclusion, but the conclusion is based on an untestable assumption (that the construct in question, let’s say W, is locally homogeneous). What most people in psychometrics argue is that most individual-level attributes of interest (personality, attitudes, cognitive ability) are not locally homogeneous. Local heterogeneity entails, for instance, that different measurement models apply in the intra-individual case than in the between-person case. So even if your data allow you to conclude that something like symbolic racism is best expressed as a linear additive dimension, at the within-individual level you may find that thinking of the construct as entailing discrete classes is actually the best way to go. But more importantly, finding a between-person causal effect at the observational level does not necessarily entail that the within-person counterfactual obtains (e.g., that during the life course, as a person becomes less symbolically racist, their support for liberal policies will increase). A growth-curve model with discrete latent classes could show, for instance, that there is strong heterogeneity in the within-person effect, so that for some people becoming less racist has no effect while for others it has a strong effect, etc. This heterogeneity is necessarily masked in the between-persons result.

    So, the argument above applies specifically when (a) we have observational data, (b) our statistical model combines a “measurement model” for W (which can be as simple as a linear additive scale that assumes no measurement error), and (c) we use some type of GLM to estimate the causal effect of W on Y, as in the symbolic racism example alluded to above (which is actually more representative of everyday social-scientific data analysis than an RCT). My point was that a lot of the time we want to draw within-persons inferences from between-persons data, as when we conclude from estimating (1) above that there is an “attitude” with some form of quantitative structure (e.g., a linear dimension) that has a causal effect on a person’s behavior or political stance for all individuals. My argument is that even if you have taken care of all of the requirements for causality (as outlined, say, in Pearl or in Morgan and Winship), you are still on thin ice when you jump from the between-persons result (the attitude seems to have a causal effect) to theorizing about the within-persons mechanism that produces the effect, especially when your W happens to be the hypothesized mechanism.



    December 18, 2011 at 5:17 pm

  4. On causality and other (endogenous) demons, check out this


  5. Dean Eckles had a post a while ago on this theme with a nice illustration.


    Cosma Shalizi

    December 21, 2011 at 3:16 am

