comments on andrew gelman’s dec 21 post

Last Saturday, Andrew Gelman responded to a post about a discussion in my social network analysis course. In that post, my student asked about the different strengths of a network effect reported in a paper. Gelman and Cosma Shalizi both noted that the paper does not show a statistically significant difference. I quote the concluding paragraphs of Andrew’s commentary:

I’m doing this all not to rag on Rojas, who, after all, did nothing more than repeat an interesting conversation he had with a curious student. This is just a good opportunity to bring up an issue that occurs a lot in social science: lots of theorizing to explain natural fluctuations that occur in a random sample. (For some infamous examples, see here and here.) The point here is not that some anonymous student made a mistake but rather that this is a mistake that gets made by researchers, journalists, and the general public all the time.

I have no problem with speculation and theory. Just remember that if, as here, the data are equivocal, it would be just as valuable to give explanations that go in the opposite direction. The data here are completely consistent with the alternative hypothesis that people follow their spouses more than their friends when it comes to obesity.

Fair enough. Let me add a pedagogical perspective. When I teach network science to undergrads, I generally have a few goals. First, I want to show them how to convert social tie data into a matrix that can be analyzed. Second, I want students to learn how network concepts might operationalize social science concepts (e.g., how group cohesion might be described as high density). Third, I want to spark their imagination a little, show them how network analysis can be used to describe or analyze a wide range of phenomena, and thus encourage them to generate explanations. Given that students have very, very modest math skills and real trouble generating hypotheses, getting into the weeds of the papers often comes last.

So when I teach the week on networks and health, my discussion questions look like this: “Why do you think health might be transmitted from one person to another? How would that work?” I also try to get into basic research design: “How do you measure health? Do you know what BMI is?” So the C&F (Christakis and Fowler) paper has many upsides. The downside is that the paper has an interesting hypothesis, so you can easily get distracted from the methodological controversy the paper has generated, or even from some very sensible observations on confidence intervals. The bottom line is that when you have to teach everything (theory, methods, research design, and topic), you don’t quite cover everything. But still, if a student who self-admittedly knows little math or stats can get to the point of asking about mechanisms, that’s a teaching victory.

Post-Christmas blowout: From Black Power/Grad Skool Rulz

Written by fabiorojas

December 26, 2013 at 12:18 am

Posted in fabio, mere empirics, networks

8 Responses

Fabio:

I agree completely. As I wrote in response to a commenter on the other blog, in statistics we say that God is in every leaf of every tree: that is, any real example, if studied closely, reveals insight after insight after insight. The challenge in teaching is to get a student interested enough in an example to think hard–and clearly you succeeded in this case! Then comes a second stage of learning, in which the student realizes that he or she is chasing the noise and could just as well be giving explanations in the other direction, but it’s perhaps best to explain that to the student only after he or she has been thinking hard about the example.

Also, one other thing about this particular example. I was careful in my recent post not to get into the methodological controversies regarding that paper. My point was that, even if that paper had no methodological controversies at all, that particular comparison being discussed by the student was essentially pure noise. Even if every aspect of the paper is correct and all its coefficients could be interpreted directly, even then we’re talking about a difference that is less than the noise level stated in the paper itself.

It’s a difficult balancing act, and I don’t know the best way to teach it (indeed, the best way to teach it must surely depend on the student): on one hand, variation is important and the very essence of statistics is to distinguish signal from noise; on the other hand, the whole point of quantitative social science is to connect the numbers to real-world questions. In this particular case, your student (and two of your blog commenters) took a whack at the second task, and it could well be a distraction for you (or me) to point out that they’re basically explaining a coin flip.

Andrew Gelman

December 26, 2013 at 9:33 am
Curious question: Is it really a coin flip? Can one really not argue that one is more likely than the other, given the data (for example, if one had to bet one’s life on it)?

Anonymous

December 26, 2013 at 4:08 pm
Anon:

OK, not a coin flip, but very little evidence, comparable perhaps to taking two students in the classroom and having them take basketball free throws, 20 shots each. Student A hits 8 out of 20 successfully, student B hits 9 out of 20 successfully. It turns out that Student A is a 23-year-old white male, majoring in sociology, and Student B is a 32-year-old Asian female, majoring in business. And now we can theorize reasons for the difference: e.g., the woman tried harder than the man, the Asian was more interested in basketball, the sociology student was less focused on winning, etc. But these explanations are pretty empty, given that the outcome easily could’ve gone the other way. That’s what I think about explanations offered by Fabio’s student and commenters in the fat-contagion scenario. Lots of stories can be spun but the available data are pretty much irrelevant to the question.
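To put numbers on the free-throw example, here is a minimal sketch, assuming the two students' shots are independent and using the rough normal (Wald) approximation for a difference of two proportions, which is crude at 20 shots each. The function name `two_proportion_summary` is hypothetical, chosen for illustration; the only inputs are the 8/20 and 9/20 from the example.

```python
import math

def two_proportion_summary(hits_a, n_a, hits_b, n_b, z_crit=1.96):
    """Compare two success rates with a normal (Wald) approximation.

    Returns the observed difference (B minus A), its standard error,
    the z statistic, a two-sided p-value, and an approximate 95% interval.
    """
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    diff = p_b - p_a
    # Unpooled standard error of the difference between two independent proportions
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = diff / se
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    ci = (diff - z_crit * se, diff + z_crit * se)
    return diff, se, z, p_value, ci

# Hypothetical free throws: student A hits 8 of 20, student B hits 9 of 20
diff, se, z, p_value, ci = two_proportion_summary(8, 20, 9, 20)
print(f"difference = {diff:.2f}, SE = {se:.3f}, z = {z:.2f}, p = {p_value:.2f}")
print(f"approximate 95% interval: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The observed gap of 5 percentage points sits next to a standard error of about 16 points, so z comes out near 0.32 and the two-sided p-value near 0.75. The interval runs from roughly -0.26 to 0.36, comfortably covering zero, so the data would be about as consistent with A being the better shooter.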

Andrew Gelman

December 26, 2013 at 5:28 pm
Wow. So it is too difficult to explain to undergrads – even those with very modest math skills – that the difference is not statistically significant!?! This doesn’t require a primer in statistical theory, frequentist or Bayesian. When Very Modest Math Skills asks, “Why do friends have a bigger effect than spouses?”, the professor responds, “Actually, they don’t,” …and then takes 30 seconds to explain a confidence interval. …Or the idea of a “significant difference.” …Or maybe just talks about shooting baskets a la AG. It ain’t rocket science.

SJ

December 26, 2013 at 6:16 pm
SJ: You should cut me some slack. All teachers have moments when they should have said something different. And yes, even discussions of confidence intervals can be challenging. For example, there were actually English majors in the class. I even had to do a basic “this is a variable.” Don’t confuse the grad seminar with what most profs have to teach.

fabiorojas

December 26, 2013 at 9:00 pm
I was making a somewhat different point than Andy. He was pointing out that the difference in observed associations is not significant, because both associations were measured so imprecisely. I was pointing out that contagion/influence effects aren’t identified by the kind of data Christakis and Fowler collected. (I don’t think causal effects are always unidentified from observational data, but there are specific problems with this sort of network data — which would’ve taken much longer to uncover, I think, without the impetus given by Christakis and Fowler.) With larger samples, the uncertainties in the associations would shrink, and we’d be able to say with confidence which association was larger, which we can’t right now. Even with infinite data, though, we wouldn’t be able to say anything about which relation had more influence, because influence is unidentified. Either way, we have no evidence that there’s any real difference here. Pedagogically, if the obstacle is that students are reluctant to come up with hypotheses, I can certainly see the value of using this to get them spinning stories, but then it seems to me the next lesson is “OK, how do we know there really is something to be explained?”
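Cosma's remark that uncertainty in the associations shrinks with larger samples can be illustrated with a short sketch. The rates below (40% vs. 45%) are hypothetical, borrowed from Gelman's free-throw example rather than from the Christakis and Fowler data, and the helper names are likewise hypothetical. The sketch assumes independent trials and the normal approximation, and it only speaks to the precision of an association: as Cosma notes, no sample size makes an unidentified influence effect identified.

```python
import math

def se_of_difference(p_a, p_b, n):
    """Standard error of the difference between two independent proportions,
    each estimated from n trials (normal approximation)."""
    return math.sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / n)

def smallest_n_to_resolve(p_a, p_b, z_crit=1.96):
    """Smallest per-group n at which the approximate 95% interval for
    p_b - p_a would exclude zero, if the observed rates stayed fixed."""
    gap = abs(p_b - p_a)
    n = 1
    # The standard error falls like 1/sqrt(n), so this loop always terminates
    while z_crit * se_of_difference(p_a, p_b, n) >= gap:
        n += 1
    return n

# Hypothetical rates: 40% vs. 45%
for n in (20, 100, 500, 1000, 5000):
    print(n, round(se_of_difference(0.40, 0.45, n), 4))
print("per-group n needed:", smallest_n_to_resolve(0.40, 0.45))
```

Because the standard error scales as one over the square root of n, quadrupling the sample only halves the uncertainty. Under these assumptions a 5-point gap would need about 750 trials per group before the interval cleared zero, far more than 20.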

Cosma

December 26, 2013 at 9:58 pm
I’d also like to point out that, as a commenter on the original post, I was trying to say there is probably a better explanation for the influence of friends found in the paper. I commented without reading the paper again, so I didn’t actually look at potential statistical problems, nor did I mention methodological issues with the paper. But when reading Fabio’s post, the finding didn’t make much sense to me on the face of it, given some things I know from the literature, so I offered an alternative explanation. As I said in my comment, it has been a while since I looked at the paper and I still haven’t read it again, but I don’t know if I need to in order to offer my reasoning. It was armchair theorizing, but I think my alternative hypothesis is still fair: males are more likely to be obese than females, and males are more likely to have male friends than female friends. These two elements combined might affect the relationship found in the study between friends and obesity. Both elements of my alternative hypothesis are well-known signals (maybe even more so with male obesity rates in Massachusetts). And they probably go much further than trying to determine whether obesity is occurring through some sort of social influence or transmission.

Scott Dolan

December 27, 2013 at 3:14 pm
Apologies for the typos above; that was sent from my phone. And I hit send by accident.

Scott Dolan

December 27, 2013 at 3:16 pm

orgtheory.net