Archive for the ‘mere empirics’ Category

working with computer scientists


In North Carolina, this is called the “Vaisey Cart.”

I recently began working with a crew of computer scientists at Indiana, after being recruited to help with a social media project. It’s been a highly informative experience that has reinforced my belief that sociologists and computer scientists should team up. Some observations:

  • CS and sociology are complementary. We care about theory. They care about tools and application. Natural fit.
  • In contrast, sociology and other social sciences are competing over the same theory space.
  • CS people have a deep bucket of tools for solving all kinds of problems that commonly occur in cultural sociology, network analysis, and simulation studies.
  • CS people believe in timely problem-solving and workflow. Rather than writing over a period of years, their attitude is “yes, we can do this next week.”
  • Since their discipline runs on conferences, the work is fast and expected to be done soon.
  • Another benefit of the peer-reviewed conference system is that work is published “for real” quickly and there is much less emphasis on a few elite publication outlets. Little “development.” Either it works or it doesn’t.
  • Quantitative sociologists are really good at applied stats and can help most CS teams articulate data analysis plans and execute them, assuming that the sociologist knows R.
  • Perhaps most importantly, CS researchers may be confident in their abilities, but they are less likely to think that they know it all and need no help from others. CS is simply too messy a field; in that respect, it resembles sociology.
  • Finally: cash. Unlike the arts and sciences, there is no sense that we are broke. You still have to work extra hard to get money, but it isn’t a lost cause the way it is in sociology, where the NSF hands out only a handful of grants. There is money out there for entrepreneurial scholars.

Of course, there are downsides. CS people think you are crazy for working on a 60-page article that takes 5 years to get published. Also, some folks in data science and CS care more about tools and nice visuals than about theory and understanding. Relatedly, some CS folks may not appreciate sampling, bias, non-response, and other issues that normally inform sociological research design. But still, my experience has been excellent, the results exciting, and I think more sociologists should turn to computer science as an interdisciplinary research partner.

50+ chapters of grad skool advice goodness: Grad Skool Rulz ($2!!!!)/From Black Power/Party in the Street

Written by fabiorojas

June 24, 2015 at 12:01 am

science as a giant conscious hive mind

In a new article in Social Networks, Feng Shi, Jacob Foster, and James Evans argue that the complexity and diversity of scientific semantic networks create very high rates of innovation. From Weaving the fabric of science: Dynamic network models of science’s unfolding structure:

Science is a complex system. Building on Latour’s actor network theory, we model published science as a dynamic hypergraph and explore how this fabric provides a substrate for future scientific discovery. Using millions of abstracts from MEDLINE, we show that the network distance between biomedical things (i.e., people, methods, diseases, chemicals) is surprisingly small. We then show how science moves from questions answered in one year to problems investigated in the next through a weighted random walk model. Our analysis reveals intriguing modal dispositions in the way biomedical science evolves: methods play a bridging role and things of one type connect through things of another. This has the methodological implication that adding more node types to network models of science and other creative domains will likely lead to a superlinear increase in prediction and understanding.

Bringing soc of science and network analysis together. Love it.
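If you haven’t met the modeling tool before: a weighted random walk just moves from node to node, picking each neighbor with probability proportional to edge weight. Here is a minimal toy sketch in Python – the nodes and weights below are invented for illustration, and the paper’s hypergraph model is far richer:

```python
import random

# Toy weighted co-occurrence network: nodes stand in for biomedical
# "things" (methods, diseases, chemicals); weights stand in for how
# often two things appear together. All names and weights are made up.
graph = {
    "PCR":         {"influenza": 3, "antiviral": 1},
    "influenza":   {"PCR": 3, "antiviral": 2, "vaccination": 4},
    "antiviral":   {"PCR": 1, "influenza": 2},
    "vaccination": {"influenza": 4},
}

def weighted_random_walk(start, steps, seed=0):
    """Move from node to node, choosing each next neighbor with
    probability proportional to its edge weight."""
    rng = random.Random(seed)
    node, path = start, [start]
    for _ in range(steps):
        neighbors = list(graph[node])
        weights = [graph[node][n] for n in neighbors]
        node = rng.choices(neighbors, weights=weights, k=1)[0]
        path.append(node)
    return path

print(weighted_random_walk("PCR", steps=5))
```

That’s only the basic mechanic; the paper adds multiple node types and shows why doing so improves prediction.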

Written by fabiorojas

June 12, 2015 at 12:01 am

oh, bonferroni!

I was recently working on a paper and a co-author said, “Yo, let’s slow down and Bonferroni.” I had never done that statistical position before and I thought it might hurt. I was afraid of a new experience. So, I pulled out my 1,000-page Greene’s econometrics text… and Bonferroni is not in there. It’s actually missing from a lot of basic texts, but it is very easy to explain:

If you are worried that testing multiple hypotheses, or running multiple experiments, will let you cherry-pick the best results, then you should lower the alpha for your tests of statistical significance. If you test N hypotheses, your new “adjusted” alpha should be alpha/N.

Simple – ya? No. What you are doing is trading Type 1 errors for Type 2 errors: you are increasing false negatives. So what should be done? There is no consensus alternative. Andrew Gelman suggests a multilevel Bayesian approach, which is more robust to false positives. There are other methods. This is probably something that should be built into more analyses. Applied stat mavens, use the comments to discuss your arguments for Bonferroni-style adjustments.
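For the concrete-minded, the adjustment really is a one-liner. A minimal sketch in Python – the p-values below are made up for illustration:

```python
# Bonferroni adjustment: with N tests, compare each p-value
# against alpha/N instead of alpha. P-values here are invented.
p_values = [0.003, 0.012, 0.041, 0.049]
alpha = 0.05
adjusted_alpha = alpha / len(p_values)  # 0.05 / 4 = 0.0125

for i, p in enumerate(p_values, start=1):
    naive = p < alpha            # reject without any correction
    strict = p < adjusted_alpha  # reject with Bonferroni correction
    print(f"test {i}: p={p:.3f}  naive={naive}  bonferroni={strict}")
```

Note how tests 3 and 4, “significant” at the conventional 0.05 level, no longer clear the bar – exactly the false-positive protection, and the extra false-negative risk, described above.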

Written by fabiorojas

June 11, 2015 at 12:01 am

Posted in fabio, mere empirics

dear andrew perrin: i was wrong and you were right on the obesity and mortality correlation

A while back, Andrew and I got into an online discussion about the obesity/mortality correlation. He said it was real; I was a skeptic because I had read a number of studies that said otherwise. Also, the negative consequences of obesity can be mitigated via medical intervention. E.g., you may develop diabetes, but you can get treatment so you won’t die.

The other day, I wanted to follow up on this issue, and it turns out that the biomedical community has come up with a more definitive answer. Using standard definitions of obesity (BMI) and mortality, Katherine Flegal, Brian Kit, Heather Orpana, and Barry I. Graubard conducted a meta-analysis of 97 articles that used similar measures of obesity and mortality. Roughly speaking, many studies report a positive effect, many report no effect, and some even report a negative effect. When you add them all together, you get a correlation between high obesity and mortality, but the association does not hold at BMI ranges closer to normal weight. From the abstract of Association of All-Cause Mortality With Overweight and Obesity Using Standard Body Mass Index Categories: A Systematic Review and Meta-analysis, published in 2013 in the Journal of the American Medical Association:

Conclusions and Relevance Relative to normal weight, both obesity (all grades) and grades 2 and 3 obesity were associated with significantly higher all-cause mortality. Grade 1 obesity overall was not associated with higher mortality, and overweight was associated with significantly lower all-cause mortality. The use of predefined standard BMI groupings can facilitate between-study comparisons.

In other words, high obesity is definitely correlated with mortality (Andrew’s claim). Mild obesity and “overweight” are correlated with less mortality (a weaker version of my claim). The article does not settle the issue of causation. It may well be that less healthy people gain weight. E.g., people with low mobility may not exercise or may adopt bad diets. Or people who are very skinny may be ill as well. Still, I am changing my mind on the basic facts – high levels of obesity increase mortality.
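For the curious, the “standard BMI groupings” the abstract mentions are easy to compute. A minimal sketch in Python – the cutoffs are the standard categories, and the example person is invented:

```python
def bmi(weight_kg, height_m):
    # Body mass index: weight in kilograms divided by height in meters squared.
    return weight_kg / height_m ** 2

def bmi_category(b):
    # Standard groupings: overweight is 25 to <30, grade 1 obesity 30 to <35,
    # grade 2 obesity 35 to <40, grade 3 obesity 40 and up.
    if b < 18.5:
        return "underweight"
    if b < 25:
        return "normal weight"
    if b < 30:
        return "overweight"
    if b < 35:
        return "grade 1 obesity"
    if b < 40:
        return "grade 2 obesity"
    return "grade 3 obesity"

# Hypothetical example: 95 kg at 1.75 m works out to a BMI of about 31.
print(bmi_category(bmi(95, 1.75)))  # -> "grade 1 obesity"
```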

Written by fabiorojas

June 4, 2015 at 12:01 am

dear UK: more tweets, more votes!!!!

Previous More Tweets, More Votes coverage

The Oxford Internet Institute reports that Twitter data picked up some of the trends in last week’s election, when traditional polling did poorly. On their blog, they ask: did social media foreshadow last week’s massive upset? The answer: somewhat:

The data we produced last night produces a mixed picture. We were able to show that the Liberal Democrats were much weaker than the Tories and Labour on Twitter, whilst the SNP were much stronger; we also showed more Wikipedia interest for the Tories than Labour, both things which chime with the overall results. But a simple summing of mention counts per constituency produces a highly inaccurate picture, to say the least (reproduced below): generally understating large parties and overstating small ones. And it’s certainly striking that the clearly greater levels of effort Labour were putting into Twitter did not translate into electoral success: a warning for campaigns which focus solely on the “online” element.

One of the strengths of our original paper on voting and tweets is that we don’t simply look at aggregate social media and votes. That doesn’t work very well. Instead, what works is relative attention. So I would suggest that the Oxford Institute look at one-on-one contests between parties in specific areas and then measure relative attention. In the US, the problem is solved because each Congressional district has a clearly identified GOP and Democratic nominee. The theory is that when you are winning people talk about you more, even the haters. People ignore losers. Thus, the prediction is that relative social media attention is a signal of electoral strength. I would also note that social media is a noisy predictor of electoral strength. In our data, the “Twitter signal” varied wildly in its accuracy. The correlation was definitely there, but some cases were really far off and we discuss why in the paper.
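To make “relative attention” concrete, here is a minimal sketch in Python – the district names and mention counts are invented for illustration:

```python
# Relative attention: the share of two-party mentions going to one
# side, computed contest by contest. All counts below are hypothetical.
districts = {
    "District 1": {"dem": 1200, "gop": 800},
    "District 2": {"dem": 300,  "gop": 900},
    "District 3": {"dem": 650,  "gop": 640},
}

def dem_attention_share(dem, gop):
    """Democratic share of two-party social media mentions."""
    total = dem + gop
    return dem / total if total else 0.5  # no mentions at all: call it a toss-up

for name, counts in districts.items():
    share = dem_attention_share(counts["dem"], counts["gop"])
    print(f"{name}: Democratic attention share = {share:.2f}")
```

The claim is that this share tracks vote share, noisily; raw mention counts on their own do not.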

Finally, I have not seen any empirical evidence that online presence is a particularly good tool for political mobilization. Even the Fowler paper in Nature showed that Facebook-based recruitment was paltry. So I am not surprised that online outreach failed for Labour.

Bottom line: the Oxford Internet Institute should give us a call – we can help you sort it out!

Written by fabiorojas

May 11, 2015 at 12:01 am

alcoholism: the nuclear bomb of the life course

This is the last post, for now, about The Triumphs of Experience. In today’s post, I’d like to focus on one of the book’s major findings: the extreme damage done by alcoholism. In the study, the researchers asked respondents to describe their drinking. Using the DSM criteria and respondents’ answers, people were classified as occasional social drinkers, alcoholics, or former alcoholics; people counted as alcoholics if they indicated that drinking had interrupted their lives in any significant way. Abstainers were very few, so they receive no attention in the book.

The big finding is that alcoholism is correlated with nearly every negative outcome in the life course: divorce, early death, bad relationships with people, and so forth. I was so taken aback by the relentless destruction that I named alcoholism the “nuclear bomb” of the life course. It destroys nearly everything and even former alcoholics suffered long term effects. The exception is employment. A colleague noted that drinking is socially ordered to occur at night, so that may be a reason people can be “functioning” alcoholics during the day.

The book also deserves praise for adding more evidence to the longstanding debate over the causes of alcoholism. This is possible because the Grant Study has very rare and detailed longitudinal data. The researchers are able to test the hypotheses that the development of alcoholism is correlated with an addictive personality (“oral” personality in older jargon), depression, and sociopathy. The data do not support these hypotheses. By itself, this is an important contribution.

The two factors that do correlate with alcoholism are having an alcoholic family member and the culture of drinking in the family. The first is probably a marker of a genetic predisposition. The second is about education – people may not understand how to moderate if they come from families that hide alcohol or abuse it. In other words, families that let kids have a little alcohol here and there are probably doing them a favor by teaching moderation.

Finally, the book is to be commended for documenting the ubiquity of alcoholism. In their sample, alcoholism occurs in about 25% of the men at age 20. By the mid-40s, alcoholism reaches a peak, with about half of the men classified as alcoholics. After age 50, it declines – mainly due to death and to men becoming “former alcoholics.” If these findings generalize at all, they show that alcoholism has probably been wrecking the lives of millions and millions of people, somewhere between a quarter and half the population. That’s a profound, and shocking, finding.

Written by fabiorojas

May 8, 2015 at 12:01 am

ethnography and micro-arrays


The ethnoarray!!!

A while back, I discussed a new technique for organizing and displaying information collected through qualitative methods like interviews and ethnography. The idea is simple: the rows are cases and the columns are themes. Then, you shade the matrix with color. More intense colors indicate that a case strongly matches a theme; clusters of color indicate clusters of similar cases.
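Here is a minimal sketch of that display in Python with matplotlib – the cases, themes, and intensity scores are all invented for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np

# Rows are cases, columns are themes; a higher score means the theme
# is more central to the case. All values here are hypothetical.
cases = ["Case A", "Case B", "Case C", "Case D"]
themes = ["family support", "treatment decisions", "financial strain"]
scores = np.array([
    [3, 1, 0],
    [2, 3, 1],
    [0, 1, 3],
    [3, 2, 2],
])

fig, ax = plt.subplots()
im = ax.imshow(scores, cmap="Reds")  # darker cell = stronger match
ax.set_xticks(range(len(themes)))
ax.set_xticklabels(themes, rotation=45, ha="right")
ax.set_yticks(range(len(cases)))
ax.set_yticklabels(cases)
fig.colorbar(im, ax=ax, label="theme intensity")
fig.tight_layout()
plt.show()
```

Read down a column to see which cases share a theme; read across a row to see a case’s profile.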

Dan Dohan, who imported this technique from the biomedical sciences, has a new article out with Corey Abramson that describes the process in detail. From Beyond Text: Using Arrays to Represent and Analyze Ethnographic Data in Sociological Methodology:

Recent methodological debates in sociology have focused on how data and analyses might be made more open and accessible, how the process of theorizing and knowledge production might be made more explicit, and how developing means of visualization can help address these issues. In ethnography, where scholars from various traditions do not necessarily share basic epistemological assumptions about the research enterprise with either their quantitative colleagues or one another, these issues are particularly complex. Nevertheless, ethnographers working within the field of sociology face a set of common pragmatic challenges related to managing, analyzing, and presenting the rich context-dependent data generated during fieldwork. Inspired by both ongoing discussions about how sociological research might be made more transparent, as well as innovations in other data-centered fields, the authors developed an interactive visual approach that provides tools for addressing these shared pragmatic challenges. They label the approach “ethnoarray” analysis. This article introduces this approach and explains how it can help scholars address widely shared logistical and technical complexities, while remaining sensitive to both ethnography’s epistemic diversity and its practitioners’ shared commitment to depth, context, and interpretation. The authors use data from an ethnographic study of serious illness to construct a model of an ethnoarray and explain how such an array might be linked to data repositories to facilitate new forms of analysis, interpretation, and sharing within scholarly and lay communities. They conclude by discussing some potential implications of the ethnoarray and related approaches for the scope, practice, and forms of ethnography.


Written by fabiorojas

May 7, 2015 at 12:01 am

