Archive for the ‘mere empirics’ Category
Via Vox: A JAMA Internal Medicine article discusses how people systematically overestimate the benefits of medical treatment. This speaks to a broader issue – we undervalue things like exercise, diet, sanitation, and vaccination for health and overvalue “hero medicine” and fancy interventions.
A few weeks ago, I suggested that one can use techniques from computer science to assess, measure, and analyze the field notes and interviews that one collects during field work. The reason is that computer scientists have made progress in writing algorithms that try to pick up the emotional tenor or meaning of texts. They are not perfect by any means, but they would be a valuable tool for helping qualitative researchers identify themes and patterns in the text.
In the last round, there were two comments that I want to address. First, Krippendorf wrote: “Why call it computational ethnography and not just text analysis?” Answer: There are two existing modes of analyzing qualitative text, and techniques like sentiment analysis and topic modeling do new things in new ways. Allow me to explain:
- The traditional way of reading qualitative texts is simply for the researcher to read the texts and develop a grounded understanding of the meaning that the text represents. This is the standard mode among historians, most anthropologists, and some sociologists. Richard Biernacki in Reinventing Evidence in Social Inquiry argued that this is the only valid mode of qualitative analysis.
- The other major way to deal with qualitative materials is to conduct a two-step operation of having people code the data (using key words or other instructions) and then performing an inter-coder reliability analysis (i.e., assign codes to texts and compute Krippendorff’s alpha).
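To make the second mode concrete, here is a minimal sketch of Krippendorff’s alpha for the simplest case – two coders, nominal codes, no missing data. The example codes (“anger”, “hope”) and excerpts are hypothetical, purely for illustration:

```python
from collections import Counter

def krippendorff_alpha_nominal(pairs):
    """Krippendorff's alpha for two coders, nominal codes, no missing data.

    `pairs` is a list of (coder1_code, coder2_code) tuples, one per text unit.
    Returns 1.0 for perfect agreement; lower values mean more disagreement.
    """
    coincidences = Counter()              # coincidence matrix o[c, k]
    for a, b in pairs:                    # each unit contributes both orderings
        coincidences[(a, b)] += 1
        coincidences[(b, a)] += 1
    n_total = 2 * len(pairs)              # total number of pairable values
    margins = Counter()                   # n_c: row totals of the matrix
    for (c, _k), count in coincidences.items():
        margins[c] += count
    observed = sum(v for (c, k), v in coincidences.items() if c != k)
    expected = sum(n_c * (n_total - n_c) for n_c in margins.values())
    return 1.0 - (n_total - 1) * observed / expected

# Two coders assign hypothetical codes to four interview excerpts;
# they disagree on the last one:
codes = [("anger", "anger"), ("anger", "anger"),
         ("hope", "hope"), ("anger", "hope")]
print(krippendorff_alpha_nominal(codes))
```

With one disagreement out of four units, alpha lands at a middling value – a reminder that reliability statistics penalize disagreement relative to what chance would produce, not just raw agreement rates.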
So what is new? Techniques like topic models or sentiment analysis do not use people to code data. After you train the algorithms, it is all automated. This has advantages – speed, reproducibility, and so forth – for large data sets. Another novel aspect is that these algorithms are usually built with some sort of model of language in mind that gives you insight into how the text was coded. For example, the Stanford NLP package essentially breaks down sentences by grammar and then estimates the distribution of words with specific sentiment. Thus, there is an explanation for every output. In contrast, I can’t reproduce even my own codes over time. Give me a set of texts next week, and it will be coded a little differently.
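The reproducibility point can be made concrete with a toy lexicon-based sentiment scorer – a deliberately simple stand-in, not the Stanford package, and the word lists below are hypothetical rather than a real sentiment lexicon:

```python
# Toy lexicon-based sentiment scoring: the word lists are hypothetical,
# standing in for the large lexicons real packages ship with.
POSITIVE = {"good", "great", "useful", "valuable"}
NEGATIVE = {"bad", "angry", "useless", "broken"}

def sentiment(text: str) -> float:
    """Score in [-1, 1]: (positive hits - negative hits) / total hits."""
    tokens = [t.strip(".,!?;:").lower() for t in text.split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    matched = pos + neg
    return 0.0 if matched == 0 else (pos - neg) / matched

print(sentiment("This is a great and useful tool"))
```

The design point is the determinism: the same text always gets the same score, and every score can be traced back to specific words – exactly the auditability that human coders, including me next week, cannot offer.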
Second, a number of commenters were concerned about the open-ended nature of notes, the volume of materials, and whether the sorts of things that might be extracted would be useful to sociologists. These comments are easily addressed. Lots of projects produce tons of notes. I recently collected 194 open-ended interviews. My antiwar project resulted in dozens and dozens of interviews. We have the volume. Sometimes they are standardized, sometimes not. That’s an empirical issue – how badly do these techniques do with unstructured text? Maybe better than we expect. There is no reason for an a priori dismissal. Finally, I think a little induction is helpful. Yes, we can now pick up sentiment, which is an indicator of emotion, but why not let the data speak to us a little? In other words, there’s a whole new world around the corner. This is one step in that direction.
Vox has a nice interview with Dartmouth political scientist Brendan Nyhan about vaccine skeptics. What can be done to convince them? Brendan does research on political beliefs and has shown that in experimental settings, people don’t like to change beliefs even when confronted with correct information. His experiments show that this is true not only for political beliefs, but also controversial health beliefs like believing in the vaccine-autism link.
But there was an additional section in the interview that I found extremely interesting. Nyhan notes that it is easier to be a vaccine skeptic when you don’t actually see a lot of disease: “… many of the diseases that vaccines prevent today are essentially invisible in the US. Vaccines are a victim of their own success here.” This reminded me of a 2002 paper I wrote on STD/HIV transmission. In a model worked out by Kirby Schroeder and myself about people proposing to have risky sex with each other, we wrote that the model has an unusual prediction. If people are proposing risky sex based on how often their friends are infected, you may get unexpected outbreaks of disease:
In the models we have presented, there is no replacement; the population is stable. If we allow for replacement, then we arrive at a novel prediction: as uninfected individuals enter the population (through birth, migration, etc.) and HIV+ individuals leave (through illness), the proportion of infected individuals will decrease. Once this proportion falls, prior beliefs about the proportion of infected individuals will fall, and if this new prior belief is low enough, then HIV-negative individuals will switch from protected to unprotected sex. The long-term effect of replacement in our model, then, is an oscillation of infection rates… There is some evidence that oscillations in infection rates do occur… An intriguing avenue for research would be to link these patterns in infection rates to the behavior depicted in our model.
In other words, if your model of the world assumes that people take risk based on the infection rates of their buddies, then it is entirely possible, even predictable, that you will see sudden spikes or outbreaks because people “let their guard down.” For HIV, as more people use condoms and other measures, people may engage in more risky sex because few of their friends are infected. For measles and other childhood infections, people who live in very safe places may feel free to deviate from the standard practices that create that safety in the first place. I don’t know how to make vaccine skeptics change their minds, but I do know that movements like vaccine skepticism are somewhat predictable and we can prepare for them.
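The oscillation logic is easy to see in a back-of-the-envelope simulation. This is only a sketch of the kind of dynamic described above; the threshold, transmission, exit, and entry parameters are made up for illustration, not taken from the 2002 paper:

```python
def simulate(steps=200, threshold=0.05, beta=0.5, exit_rate=0.1, entry=0.02):
    """Infection prevalence over time when people take risks only while
    perceived prevalence is low. All parameter values are hypothetical."""
    p = 0.10                              # initial infected fraction
    history = []
    for _ in range(steps):
        risky = p < threshold             # guard drops when few friends are infected
        new_infections = beta * p * (1 - p) if risky else 0.0
        p = p * (1 - exit_rate) + new_infections   # infected exit; some transmit
        p = p / (1 + entry)               # uninfected newcomers dilute prevalence
        history.append(p)
    return history

path = simulate()
crossings = sum((a < 0.05) != (b < 0.05) for a, b in zip(path, path[1:]))
print(crossings)   # prevalence repeatedly crosses the risk threshold
```

Prevalence decays while people are cautious, dips below the threshold, risk-taking resumes, prevalence climbs back up, and the cycle repeats – the model never settles down, which is the paper’s point about outbreaks in “safe” populations.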
James Iveniuk is a doctoral candidate in sociology at the University of Chicago. He recently collected data on professors to understand how people choose their research specialty. He collected data on all professors at the 97 sociology doctoral programs ranked by US News & World Report. Click on this link: Iveniuk Discipline Analysis. Lots of fun results. In my view, this report supports the “Prada Bag hypothesis,” which suggests that the areas of cultural, political, and historical sociology are luxury items more likely to be found at higher ranked programs. Add your own interpretations in the comments.
Measuring such things is tough, but newly published research reports that telling indicators can be found in bursts of 140 characters or less. Examining data on a county-by-county basis, it finds a strong connection between two seemingly disparate factors: deaths caused by the narrowing and hardening of coronary arteries and the language residents use on their Twitter accounts.
“Given that the typical Twitter user is younger (median age 31) than the typical person at risk for atherosclerotic heart disease, it is not obvious why Twitter language should track heart disease mortality,” writes a research team led by Johannes Eichstaedt and Hansen Andrew Schwartz of the University of Pennsylvania. “The people tweeting are not the people dying. However, the tweets of younger adults may disclose characteristics of their community, reflecting a shared economic, physical, and psychological environment.”
Not a puzzle to me. I have argued that social media content is often an indicator – a smoke signal – of other trends. Thus, if people are stressed due to environmental conditions (the economy, unemployment), they will have heart attacks and write angry text. The only question is when the correlation holds. For more discussion of the more tweets/more votes/more anything phenomena, click here.
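The underlying claim is just a county-level correlation between a language measure and a mortality measure. A minimal sketch with hypothetical county data (the numbers below are invented, not from the Penn study):

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)

# Hypothetical counties: rate of angry/hostile words per 1,000 tweets,
# and heart disease deaths per 100,000 residents.
anger_rate = [1.2, 3.4, 2.1, 4.8, 0.9, 3.9]
mortality = [150, 210, 180, 240, 140, 220]
print(round(pearson_r(anger_rate, mortality), 2))
```

If the smoke-signal story is right, the correlation exists because both series respond to the same community-level stressors, so the tweeting and the dying need not involve the same people.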
Last year, Nicholas Christakis argued that the social sciences were stuck. Rather than fully embracing the massive tidal wave of theory and data from the biological and physical sciences, the social sciences are content to just redo the same analysis over and over. Christakis used the example of racial bias. How many social scientists would be truly shocked to find that people have racial biases? If we already know that (and we do, by the way), then why not move on to new problems?
Christakis was recently covered in the media for his views and for attending a conference that tries to push this idea. To further promote this view, I would like to introduce Christakis’ Query, which every researcher should ask:
Think about the major question that you are working on and what you think the answer is. Estimate the confidence in your answer. If you already know the answer with more than 50% confidence, then why are you working on it? Why not move on?
Try it out.