Archive for the ‘mere empirics’ Category

science as a giant conscious hive mind

In a new article in Social Networks, Feng Shi, Jacob Foster, and James Evans argue that the complexity and diversity of scientific semantic networks create very high rates of innovation. From Weaving the fabric of science: Dynamic network models of science’s unfolding structure:

Science is a complex system. Building on Latour’s actor network theory, we model published science as a dynamic hypergraph and explore how this fabric provides a substrate for future scientific discovery. Using millions of abstracts from MEDLINE, we show that the network distance between biomedical things (i.e., people, methods, diseases, chemicals) is surprisingly small. We then show how science moves from questions answered in one year to problems investigated in the next through a weighted random walk model. Our analysis reveals intriguing modal dispositions in the way biomedical science evolves: methods play a bridging role and things of one type connect through things of another. This has the methodological implication that adding more node types to network models of science and other creative domains will likely lead to a superlinear increase in prediction and understanding.

Bringing soc of science and network analysis together. Love it.

50+ chapters of grad skool advice goodness: Grad Skool Rulz ($2!!!!)/From Black Power/Party in the Street

Written by fabiorojas

June 12, 2015 at 12:01 am

oh, bonferroni!

I was recently working on a paper and a co-author said, “Yo, let’s slow down and Bonferroni.” I had never done that statistical position before and I thought it might hurt. I was afraid of a new experience. So, I popped out my 1,000-page Greene’s econometrics… and Bonferroni is not in there. It’s actually missing from a lot of basic texts, but it is very easy to explain:

If you are worried that testing multiple hypotheses, or running multiple experiments, will allow you to cherry-pick the best results, then you should lower the alpha for statistical tests of significance. If you test N hypotheses, your new “adjusted” alpha should be alpha/N.

Simple – ya? No. What you are doing is trading Type 1 errors for Type 2 errors. You are increasing false negatives. So what should be done? There is no consensus alternative. Andrew Gelman suggests a multi-level Bayesian approach, which is more robust to false positives. There are other methods. Probably something that should be built into more analyses. Applied stat mavens, use the comments to discuss your arguments for Bonferroni-style adjustments.
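To make the adjustment concrete, here is a minimal sketch in Python; the function names and p-values are invented for illustration:

```python
def bonferroni_adjust(alpha, n_tests):
    """Bonferroni correction: judge each of n_tests at alpha / n_tests."""
    return alpha / n_tests

def significant(p_values, alpha=0.05):
    """Flag which p-values survive the adjusted threshold."""
    threshold = bonferroni_adjust(alpha, len(p_values))
    return [p < threshold for p in p_values]

# Ten hypotheses: four would clear an unadjusted alpha of 0.05, but at the
# adjusted threshold (0.05 / 10 = 0.005) only the smallest p-value survives.
p_values = [0.001, 0.02, 0.03, 0.04, 0.06, 0.2, 0.5, 0.7, 0.8, 0.9]
print(significant(p_values))
```

The stricter cutoff is exactly the trade-off described above: fewer false positives, at the price of more false negatives.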


Written by fabiorojas

June 11, 2015 at 12:01 am

Posted in fabio, mere empirics

dear andrew perrin: i was wrong and you were right on the obesity and mortality correlation

A while back, Andrew and I got into an online discussion about the obesity/mortality correlation. He said it was true; I was a skeptic because I had read a number of studies that said otherwise. Also, the negative consequences of obesity can be mitigated via medical intervention. E.g., you may develop diabetes, but you can get treatment so you won’t die.

The other day, I wanted to follow up on this issue and it turns out that the biomedical community has come up with a more definitive answer. Using standard definitions of obesity (BMI) and mortality, Katherine Flegal, Brian Kit, Heather Orpana, and Barry I. Graubard conducted a meta-analysis of 97 articles that used similar measures of obesity and mortality. Roughly speaking, many studies report a positive effect, many report no effect, and some even report a negative effect. When you add them all together, you get a correlation between high obesity and mortality, but the association does not hold at BMI ranges closer to normal weight. From the abstract of Association of All-Cause Mortality With Overweight and Obesity Using Standard Body Mass Index Categories: A Systematic Review and Meta-analysis, published in 2013 in the Journal of the American Medical Association:

Conclusions and Relevance Relative to normal weight, both obesity (all grades) and grades 2 and 3 obesity were associated with significantly higher all-cause mortality. Grade 1 obesity overall was not associated with higher mortality, and overweight was associated with significantly lower all-cause mortality. The use of predefined standard BMI groupings can facilitate between-study comparisons.

In other words, high obesity is definitely correlated with mortality (Andrew’s claim). Mild obesity and “overweight” are correlated with less mortality (a weaker version of my claim). The article does not settle the issue of causation. It may well be that less healthy people gain weight. E.g., people with low mobility may not exercise, or may take up bad diets. Or people who are very thin may be ill as well. Still, I am changing my mind on the basic facts – high levels of obesity increase mortality.
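As a rough sketch of what “adding them all together” means in a meta-analysis, here is fixed-effect (inverse-variance) pooling in Python; the study effects are invented for illustration, not Flegal et al.’s actual data:

```python
import math

# Invented study results: (log hazard ratio, standard error). A mix of
# positive, null, and negative effects, as in the literature described above.
studies = [(0.15, 0.05), (0.00, 0.08), (-0.10, 0.10), (0.20, 0.06)]

# Fixed-effect meta-analysis: weight each study by 1 / variance, so more
# precise studies count for more in the pooled estimate.
weights = [1 / se**2 for _, se in studies]
pooled = sum(w * est for (est, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

print(round(pooled, 3), round(pooled_se, 3))
```

Studies that individually look null can still yield a pooled estimate that is clearly positive, which is how a meta-analysis can settle a mixed literature.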


Written by fabiorojas

June 4, 2015 at 12:01 am

dear UK: more tweets, more votes!!!!

Previous More Tweets, More Votes coverage

The Oxford Internet Institute reports that Twitter data picked up some of the trends in last week’s election, when traditional polling did poorly. In their blog, they ask – did social media suggest the massive upset from last week? Answer, somewhat:

The data we produced last night produces a mixed picture. We were able to show that the Liberal Democrats were much weaker than the Tories and Labour on Twitter, whilst the SNP were much stronger; we also showed more Wikipedia interest for the Tories than Labour, both things which chime with the overall results. But a simple summing of mention counts per constituency produces a highly inaccurate picture, to say the least (reproduced below): generally understating large parties and overstating small ones. And it’s certainly striking that the clearly greater levels of effort Labour were putting into Twitter did not translate into electoral success: a warning for campaigns which focus solely on the “online” element.

One of the strengths of our original paper on voting and tweets is that we don’t simply look at aggregate social media and votes. That doesn’t work very well. Instead, what works is relative attention. So I would suggest that the Oxford Institute look at one-on-one contests between parties in specific areas and then measure relative attention. In the US, the problem is solved because each Congressional district has a clearly identified GOP and Democratic nominee. The theory is that when you are winning people talk about you more, even the haters. People ignore losers. Thus, the prediction is that relative social media attention is a signal of electoral strength. I would also note that social media is a noisy predictor of electoral strength. In our data, the “Twitter signal” varied wildly in its accuracy. The correlation was definitely there, but some cases were really far off and we discuss why in the paper.
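A minimal sketch of the relative-attention idea in Python, with invented mention counts for a hypothetical two-candidate race:

```python
def relative_attention(mentions_a, mentions_b):
    """Candidate A's share of the two-candidate mention volume."""
    total = mentions_a + mentions_b
    return mentions_a / total if total else 0.5  # no data: call it a toss-up

# Hypothetical district: 6,200 mentions for A, 3,800 for B. The prediction
# is a win for whichever candidate holds the larger share of attention.
share = relative_attention(6200, 3800)
print(share)
```

The point is that the share, not the raw count, carries the signal; a noisy predictor, as noted above, but one that correlates with vote share.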

Finally, I have not seen any empirical evidence that online presence is a particularly good tool for political mobilization. Even the Fowler paper in Nature showed that Facebook-based recruitment was paltry. So I am not surprised that online outreach failed for Labour.

Bottom line: The Oxford Internet Institute should give us a call; we can help you sort it out!


Written by fabiorojas

May 11, 2015 at 12:01 am

alcoholism: the nuclear bomb of the life course

This is the last post for now about The Triumphs of Experience. In today’s post, I’d like to focus on one of the book’s major findings: the extreme damage done by alcoholism. In the study, the researchers asked respondents to describe their drinking. Using the DSM criteria and respondents’ answers, people were classified as occasional social drinkers, alcoholics, and former alcoholics. Abstainers were very few, so they receive no attention in the book. People were classified as alcoholics if they indicated that alcohol drinking interrupted their lives in any significant way.

The big finding is that alcoholism is correlated with nearly every negative outcome in the life course: divorce, early death, bad relationships with people, and so forth. I was so taken aback by the relentless destruction that I named alcoholism the “nuclear bomb” of the life course. It destroys nearly everything and even former alcoholics suffered long term effects. The exception is employment. A colleague noted that drinking is socially ordered to occur at night, so that may be a reason people can be “functioning” alcoholics during the day.

The book also deserves praise for adding more evidence to the longstanding debate over the causes of alcoholism. This is possible because the Grant Study has very rare, and detailed, longitudinal data. They are able to test the hypotheses that the development of alcoholism is correlated with an addictive personality (“oral” personality in older jargon), depression, and sociopathy. The data does not support these hypotheses. By itself, this is an important contribution.

The two factors that do correlate with alcoholism are having an alcoholic family member and the culture of drinking in the family. The first is probably a marker of a genetic predisposition. The second is about education – people may not understand how to moderate if they come from families that hide alcohol or abuse it. In other words, families that let kids have a little alcohol here and there are probably doing them a favor by teaching moderation.

Finally, the book is to be commended for documenting the ubiquity of alcoholism. In their sample, alcoholism occurs in about 25% of the men at age 20. By the mid-40s, alcoholism reaches a peak, with about half of the men classified as alcoholics. After age 50, it then declines – mainly due to death and to becoming a “former alcoholic.” If there is any generalizability at all to these findings, it shows that alcoholism has probably been wrecking the lives of millions and millions of people, somewhere between a quarter and half the population. That’s a profound, and shocking, finding.


Written by fabiorojas

May 8, 2015 at 12:01 am

ethnography and micro-arrays


The ethnoarray!!!

A while back, I discussed a new technique for organizing and displaying information collected through qualitative methods like interviews and ethnography. The idea is simple: the rows are cases and the columns are themes. Then, you shade the matrix with color. More intense colors indicate that the case strongly matches the theme. Clustering of colors indicates clusters of similar cases.
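A toy version of the array in Python; the cases, themes, and intensity codes are all invented for illustration:

```python
# Invented coded intensities (0 = absent ... 3 = strong) per case and theme.
themes = ["stigma", "coping", "family"]
intensity = {
    "Case A": {"stigma": 3, "coping": 1, "family": 0},
    "Case B": {"stigma": 2, "coping": 0, "family": 1},
    "Case C": {"stigma": 0, "coping": 3, "family": 3},
}

# Render the array as text: darker cells mean the case matches the theme more
# strongly, so similar cases show up as similar-looking rows.
SHADES = " ░▒█"
rows = {
    case: "".join(SHADES[scores[t]] for t in themes)
    for case, scores in intensity.items()
}
for case, row in rows.items():
    print(f"{case} [{row}]")
```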

Dan Dohan, who imported this technique from the biomedical sciences, has a new article with Corey Abramson out that describes this process in detail. From Beyond Text: Using Arrays to Represent and Analyze Ethnographic Data in Sociological Methodology:

Recent methodological debates in sociology have focused on how data and analyses might be made more open and accessible, how the process of theorizing and knowledge production might be made more explicit, and how developing means of visualization can help address these issues. In ethnography, where scholars from various traditions do not necessarily share basic epistemological assumptions about the research enterprise with either their quantitative colleagues or one another, these issues are particularly complex. Nevertheless, ethnographers working within the field of sociology face a set of common pragmatic challenges related to managing, analyzing, and presenting the rich context-dependent data generated during fieldwork. Inspired by both ongoing discussions about how sociological research might be made more transparent, as well as innovations in other data-centered fields, the authors developed an interactive visual approach that provides tools for addressing these shared pragmatic challenges. They label the approach “ethnoarray” analysis. This article introduces this approach and explains how it can help scholars address widely shared logistical and technical complexities, while remaining sensitive to both ethnography’s epistemic diversity and its practitioners shared commitment to depth, context, and interpretation. The authors use data from an ethnographic study of serious illness to construct a model of an ethnoarray and explain how such an array might be linked to data repositories to facilitate new forms of analysis, interpretation, and sharing within scholarly and lay communities. They conclude by discussing some potential implications of the ethnoarray and related approaches for the scope, practice, and forms of ethnography.



Written by fabiorojas

May 7, 2015 at 12:01 am

the importance of family for the entire life course

Today, we’ll continue discussing George Vaillant’s The Triumphs of Experience, the 70-year-long life course study. One of the major findings of the study is the importance of early childhood family conditions. The initial phases of the study asked participants to describe their childhood environment. Were their parents open and warm? Cold and removed? Divorced or still married? Also, the Grant Study investigators had the opportunity to interview parents and other family members on occasion. Did the interviewer think the mother was involved or removed?

Using these data, the Grant Study investigators coded a number of variables reflecting family environment. They recorded stratification variables (employed vs. unemployed, working class vs. upper class), structure (divorced vs. married), and emotional content (warm parents vs. cold parents). Then, they looked at the associations with a number of key life course variables. Three findings:

  • First, having a warm father was associated with almost every positive life course outcome – flourishing in late age, not getting divorced, income. In some cases, the association is striking. In retirement, having a warm parent is associated with tens of thousands of dollars in additional income. That is amazing once you consider that this is an insanely biased sample of male Harvard grads. To push your income even higher in a batch of doctors, executives, and attorneys is stunning.
  • Second, stratification variables don’t matter much. In other words, in this sample, having wealthy parents isn’t much of an asset.
  • Third, divorce of parents does not seem to matter either once you account for having warm parents and having positive coping strategies.

Bottom line: Social networks seem to be very crucial for the life course. Not for their direct instrumental features (aka social capital), but mainly for allowing people to maintain an emotional composure that allows them to solve problems and thrive.


Written by fabiorojas

May 6, 2015 at 12:01 am

how to live a good life, the social science answer

This week, I will spend quite a bit of time discussing a book called The Triumphs of Experience by George Vaillant. I’ve written briefly about the book before, but I didn’t appreciate the magnitude of the book until I assigned it for a class. Roughly speaking, the book follows a cohort of college men from the 1940s to the mid 2000s. Thus, the book tracks people from young adulthood to old age. It’s a powerful book in that it uses enormously rich data to analyze the life course and identify factors that contribute to our well being. You won’t find many other books that have such deep data to address one of life’s most important questions – What makes us happy? What is the good life?

In this first post, I want to briefly summarize the book and then note a few drawbacks. Later this week, I want to delve into two topics in more detail: alcoholism and parental bonds. To start: the Grant Study of Human Development randomly selected a few hundred male Harvard undergrads for a long term study on health and the life course. It’s a biased sample, but it’s well suited for studying long life and work (remember, many women became homemakers in that era) while controlling for educational attainment. The strength of this book is an ability to mine rich qualitative data on the life course and then map the associations over decades. The data is rich enough that the authors can actually consider alternative hypotheses and build multi-cause explanations.

A few drawbacks: Rhetorically, I thought the book was a bit wordier and longer than it needed to be. Also, I wish that the book had a glossary or appendix where one can look up definitions. More importantly, this book will not be convincing to folks who are obsessed with identification. It is very “1960s” in that they collect a lot of data and then channel their energies into looking at cross-group differences. But still, considering that doing an RCT with your family is not possible and the importance of the data, I’m willing to forgive. Wednesday: The importance of your family.


Written by fabiorojas

May 4, 2015 at 6:03 pm

gay identities and occupational segregation

Ray Fisman and Tim Sullivan, an emeritus guest blogger, have written an article in Slate about the clustering of LGBT workers into specific occupations. In other words, is there any truth to the view that LGBT people tend to go into specific professions like cosmetology? Fisman and Sullivan use an ASQ paper to discuss the issue. The idea is simple – LGBT people are probably attracted to jobs that either (a) require subtle interactional skills, which they have cultivated because they live in a hostile environment, or (b) allow them to work by themselves, so they don’t have to deal with hostility or with constantly trying to stay submerged. From Fisman and Sullivan’s analysis:

The central thesis of Tilcsik, Anteby, and Knight’s paper is that gays and lesbians will tend to be employed at high rates in occupations that require social perceptiveness, allow for task independence, or both. They test their theory using data from the American Community Survey—a gargantuan study of nearly 5 million Americans conducted annually by the U.S. Census Bureau—and the U.S. National Longitudinal Study of Adolescent Health (Add Health), an ongoing study that has followed the same group of Americans since 1994. All Add Health respondents were in middle or high school in the mid-1990s, so they were just beginning to settle into their careers around 2008, the year the study uses for its analyses. Both data sets include questions that can be used to infer sexual orientation, as well as information on respondents’ occupations.

The authors connected these data to assessments of the extent to which particular jobs require social perceptiveness and whether they allow for task independence, which come from ratings from the Occupational Information Network, a survey of employees on what they see as their job requirements and attributes. The survey seems particularly well-suited to the researchers’ task. One question asks the extent to which workers “depend on themselves rather than on coworkers and supervisors to get things done” (task independence), while another asks whether “being aware of others’ reactions and understanding why they react as they do is essential to the job” (social perceptiveness).

The link between these attributes and sexual orientation is immediately apparent from browsing the list of the top 15 occupations with the highest proportions of gay and lesbian workers. Every single one scores relatively high on either social perceptiveness or task independence, and most vocations score high on both. According to the authors’ calculations, the proportion of gays and lesbians in an occupation is more than 1.5 times higher when the job both has high task independence and requires social perceptiveness.

Clever paper! The paper is also an excellent contribution to studies of occupational segregation that go beyond stories of human capital. Recommended!


Written by fabiorojas

April 24, 2015 at 12:01 am

lessons from the social organization of sexuality

In 1994, The Social Organization of Sexuality was published. The authors, Ed Laumann, John Gagnon, Robert Michael, and Stuart Michaels, conducted a large-N survey of a random sample of Americans. I use the book in my freshman class to discuss sexual behavior. In today’s post, I will discuss what sociologists should take away from the book.

1. Doing a well crafted large-N survey on an important topic is a huge service to science. When we think of sociology, we often think of “high theory” as being the most important. But we often overlook the empirical studies that establish a baseline for excellence. American Occupational Structure is just as important as Bourdieu, in my book. Laumann et al is one such study and, I think, has not been surpassed in the field of sex research.

2. The book is extremely important in showing that good empiricism can abruptly change our views of specific topics. Laumann et al basically shattered the following beliefs: people stop having sex as they age; marriage means sex is less frequent; cultural change leads to massive changes in sexual behavior. Laumann et al showed that older people do keep on having sex; married people have more sex; and cultural moments (like AIDS in the 80s) have modest effects on sexual behavior. Each of these findings has resulted in more research over the last 20 years.

3. An ambitious, but well executed, research project can be the best defense against critics. The first section of Laumann et al. describes how federal funding was dropped due to pressure. Later, the data produced some papers that had politically incorrect results. In both cases, working from the high ground allowed the project to proceed. It’s a model for any researchers who will be working against the mainstream of their discipline or public opinion.

4. Quality empiricism can lead to good theory. Laumann et al’s sections on homophily motivated later theory about the structure of sexual contact networks and prompted papers like Chains of Affection. Also, by discovering that network structure affects STDs, it led to the introduction of network theory into biomedical science about a decade before Fowler/Christakis.

When we think of “glory sociology,” we think of succinct theoretical “hits” like DiMaggio and Powell or Swidler. But sociology is also profoundly shaped by these massive empirical undertakings. The lesson is that well crafted empirical research can set the agenda for decades just as much as the 25 page theory article.


Written by fabiorojas

April 23, 2015 at 12:01 am

we often over-estimate medicine


Via Vox: A JAMA Internal Medicine article discusses how people systematically overestimate the benefits of medical treatment. This speaks to a broader issue – we undervalue things like exercise, diet, sanitation, and vaccination for health and overvalue “hero medicine” and fancy interventions.




Written by fabiorojas

March 11, 2015 at 12:17 am

Posted in fabio, mere empirics

computational ethnography #3: the battle continues

A few weeks ago, I suggested that one can use techniques from computer science to assess, measure, and analyze the field notes and interviews that one collects during field work. The reason is that computer scientists have made progress in writing algorithms that try to pick up the emotional tenor or meaning of texts. Not perfect by any means, but it would be a valuable tool that can be used to help qualitative researchers identify themes and patterns in the text.

In the last round, there were two comments that I want to address. First, Krippendorf wrote: “Why call it computational ethnography and not just text analysis?” Answer: There are two existing modes of analyzing text, and techniques like sentiment analysis and topic modelling do new things in new ways. Allow me to explain:

  • The traditional way of reading qualitative texts is simply for the researcher to read the texts and develop a grounded understanding of the meaning that the text represents. This is the standard mode among historians, most anthropologists, and some sociologists. Richard Biernacki in Reinventing Evidence in Social Inquiry argued that this is the only valid mode of qualitative analysis.
  • The other major way to deal with qualitative materials is to conduct a two-step operation of having people code the data (using key words or other instructions) and then performing an inter-coder reliability analysis (i.e., assign codes to texts and compute Krippendorff alphas).

So what is new? Techniques like topic models or sentiment analysis do not use people to code data. After you train the algorithms, it is all automated. This has advantages – speed, reproducibility, and so forth – for large data. Another novel aspect is that these algorithms are usually built with some sort of model of language in mind that gives you insight into how the text was coded. For example, the Stanford NLP package essentially breaks down sentences by grammar and then estimates the distribution of words with specific sentiment. Thus, there is an explanation for every output. In contrast, I can’t reproduce even my own codes over time. Give me a set of texts next week, and it will be coded a little differently.
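As a stand-in for those automated coders, here is a toy lexicon-based sentiment scorer in Python; the lexicon and field notes are invented, and real packages work from much richer models of language:

```python
# Tiny invented sentiment lexicon; real tools use far larger resources
# plus grammatical structure.
POSITIVE = {"proud", "hopeful", "supportive", "warm"}
NEGATIVE = {"angry", "afraid", "hostile", "exhausted"}

def sentiment(fieldnote):
    """Crude score: count of positive words minus count of negative words."""
    words = fieldnote.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

notes = [
    "She sounded hopeful and proud of the group",
    "He was angry and exhausted after the meeting",
]
scores = [sentiment(n) for n in notes]
print(scores)  # [2, -2]
```

Unlike a human coder, it returns exactly the same codes on every run, which is the reproducibility advantage described above.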

Second, a number of commenters were concerned about the open-ended nature of notes, the volume of materials, and whether the sorts of things that might be extracted would be useful to sociologists. These comments are easily addressed. Lots of projects produce tons of notes. I recently collected 194 open-ended interviews. My antiwar project resulted in dozens and dozens of interviews. We have the volume. Sometimes they are standardized, sometimes not. That’s an empirical issue – how badly does it do with unstructured text? Maybe better than we expect. There is no reason for an a priori dismissal. Finally, I think a little induction is helpful. Yes, we can now pick up sentiment, which is an indicator of emotion, but why not let the data speak to us a little? In other words, there’s a whole new world around the corner. This is one step in that direction.


Written by fabiorojas

March 10, 2015 at 12:01 am

measles, HIV, brendan nyhan, and an obscure paper I wrote in 2002

Vox has a nice interview with Dartmouth political scientist Brendan Nyhan about vaccine skeptics. What can be done to convince them? Brendan does research on political beliefs and has shown that in experimental settings, people don’t like to change beliefs even when confronted with correct information. His experiments show that this is true not only for political beliefs, but also controversial health beliefs like believing in the vaccine-autism link.

But there was an additional section in the interview that I found extremely interesting. Nyhan notes that it is easier to be a vaccine skeptic when you don’t actually see a lot of disease: “… many of the diseases that vaccines prevent today are essentially invisible in the US. Vaccines are a victim of their own success here.” This reminded me of a 2002 paper I wrote on STD/HIV transmission. In a model that Kirby Schroeder and I worked out, in which people propose risky sex to each other, we noted an unusual prediction: if people propose risky sex based on how often their friends are infected, you may get unexpected outbreaks of disease:

In the models we have presented, there is no replacement; the population is stable. If we allow for replacement, then we arrive at a novel prediction: as uninfected individuals enter the population (through birth, migration, etc.) and HIV+ individuals leave (through illness), the proportion of infected individuals will decrease. Once this proportion falls, prior beliefs about the proportion of infected individuals will fall, and if this new prior belief is low enough, then HIV-negative individuals will switch from protected to unprotected sex. The long-term effect of replacement in our model, then, is an oscillation of infection rates… There is some evidence that oscillations in infection rates do occur… An intriguing avenue for research would be to link these patterns in infection rates to the behavior depicted in our model.

In other words, if your model of the world assumes that people take risks based on the infection rates of their buddies, then it is entirely possible, even predictable, that you will see sudden spikes or outbreaks because people “let their guard down.” For HIV, as more people use condoms and other measures, people may engage in more risky sex because few of their friends are infected. For measles and other childhood infections, people who live in very safe places may feel free to deviate from the standard practices that create that safety in the first place. I don’t know how to make vaccine skeptics change their minds, but I do know that movements like vaccine skepticism are somewhat predictable, and we can prepare for them.
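A minimal simulation sketch of that guard-dropping dynamic in Python; all parameters are invented, and this is not the 2002 paper’s actual model:

```python
def simulate(steps=300, threshold=0.05):
    """Toy dynamic: people take risks whenever prevalence seems low."""
    prevalence = 0.10
    path = []
    for _ in range(steps):
        risky = prevalence < threshold      # "few of my friends are infected"
        growth = 0.04 if risky else -0.03   # net infection rate in each regime
        prevalence = min(max(prevalence * (1 + growth), 1e-4), 1.0)
        path.append(prevalence)
    return path

path = simulate()
# Count how often prevalence crosses the belief threshold: repeated
# crossings are the oscillating infection rates predicted in the quote.
crossings = sum((a < 0.05) != (b < 0.05) for a, b in zip(path, path[1:]))
print(crossings)
```

Because cautious behavior drives prevalence down, and low prevalence triggers risky behavior, the system cycles around the threshold rather than settling.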


Written by fabiorojas

February 9, 2015 at 12:01 am

asian american privilege? a skeptical, but nuanced, view, and a call for more research – a guest post by raj ghoshal and diana pan

Raj Andrew Ghoshal is an assistant professor of sociology at Goucher College and Yung-yi Diana Pan is an assistant professor of sociology at Brooklyn College. This guest post is a discussion of Asian Americans and their status in American society.

As a guest post last month noted, Asian Americans enjoy higher average incomes than whites in the United States. We were critical of much in that post, but believe it raises an under-examined question: Where do Asian Americans stand in the US racial system? In this post, we argue that claims of Asian American privilege are premature, and that Asian Americans’ standing raises interesting questions about the nature of race systems.

We distinguish two dimensions of racial stratification: (1) a more formal, mainly economic hierarchy, and (2) a system of social inclusion/exclusion. This is a line of argument developed by various scholars under different names, and in some ways parallels claims that racial stereotypes concern both warmth and competence. We see Asian Americans as still behind in the more informal system of inclusion/exclusion, while close (but not equal) to whites in the formal hierarchy. Here’s why.


Written by fabiorojas

February 4, 2015 at 12:01 am

how sociology professors choose their specialties – a guest post by james iveniuk

James Iveniuk is a doctoral candidate in sociology at the University of Chicago. He recently collected data on professors to understand how people choose their research specialty. He collected data on all professors at the 97 sociology doctoral programs ranked by US News & World Report. Click on this link: Iveniuk Discipline Analysis. Lots of fun results. In my view, this report supports the “Prada Bag hypothesis,” which suggests that the areas of culture, politics, and historical sociology are luxury items more likely to be found at higher-ranked programs. Add your own interpretations in the comments.

50+ chapters of grad skool advice goodness: Grad Skool Rulz ($2!!!!)/From Black Power/Party in the Street!! 

Written by fabiorojas

February 2, 2015 at 12:01 am

more (angry) tweets, more heart attacks

PS Magazine reports on research that links tweet sentiment and health:

Measuring such things is tough, but newly published research reports telling indicators can be found in bursts of 140 characters or less. Examining data on a county-by-county basis, it finds a strong connection between two seemingly disparate factors: deaths caused by the narrowing and hardening of coronary arteries and the language residents use on their Twitter accounts.

“Given that the typical Twitter user is younger (median age 31) than the typical person at risk for atherosclerotic heart disease, it is not obvious why Twitter language should track heart disease mortality,” writes a research team led by Johannes Eichstaedt and Hansen Andrew Schwartz of the University of Pennsylvania. “The people tweeting are not the people dying. However, the tweets of younger adults may disclose characteristics of their community, reflecting a shared economic, physical, and psychological environment.”

Not a puzzle to me. I have argued that social media content is often an indicator – a smoke signal – of other trends. Thus, if people are stressed due to environmental conditions (the economy, unemployment), they will have heart attacks and write angry text. The only question is when the correlation holds. For more discussion of the more tweets/more votes/more anything phenomena, click here.

50+ chapters of grad skool advice goodness: Grad Skool Rulz ($1!!!!)/From Black Power/Party in the Street!! 


Written by fabiorojas

January 30, 2015 at 12:19 am

christakis’ query

Last year, Nicholas Christakis argued that the social sciences were stuck. Rather than fully embracing the massive tidal wave of theory and data from the biological and physical sciences, the social sciences are content to just redo the same analyses over and over. Christakis used the example of racial bias. How many social scientists would be truly shocked to find that people have racial biases? If we already know that (and we do, by the way), then why not move on to new problems?

Christakis was recently covered in the media for his views and for attending a conference that tries to push this idea. To further promote this view, I would like to introduce Christakis’ Query, which every researcher should ask:

Think about the major question that you are working on and what you think the answer is. Estimate the confidence in your answer. If you already know the answer with more than 50% confidence, then why are you working on it? Why not move on?

Try it out.

50+ chapters of grad skool advice goodness: Grad Skool Rulz ($1!!!!)/From Black Power/Party in the Street!! 

Written by fabiorojas

January 27, 2015 at 12:02 am

there’s probably a lot more cheating with the patriots

So far, the Patriots have been nailed for two cheating scandals – Deflategate in 2015 and the 2006 spying scandal. Each is interesting in its own right, but there is one implication that few are willing to utter: the Patriots are probably cheating in more ways than we imagine.

The intuition is simple. Cheating incidents are not independent. It is not likely that everyone cheats with equal probability. Rather, people inclined to cheat are the most likely to do so, over and over. Also, consider incentives: the Patriots have been caught cheating multiple times, and it hasn’t seemed to harm them much at all. The conclusion is that it is highly likely the Patriots are cheating in other ways.

I think it would be interesting for fans of vanquished teams to conduct Levitt-style analyses of the Patriots. I would guess that looking at other data, in addition to the now famous fumble analysis, will yield some interesting answers.

50+ chapters of grad skool advice goodness: Grad Skool Rulz ($1!!!!)/From Black Power/Party in the Street!! 

Written by fabiorojas

January 26, 2015 at 12:45 am

help me! this is my hot mess in teaching social networks

Question for readers who teach networks: What software should I use for low tech undergrads? So far, I am having some real challenges…

I have an undergrad class where the first major assignment is to download one’s Facebook network and analyze it. I have been using NetVizz, an app inside Facebook, to extract network data. But it suddenly disappeared! One solution is to use the Facebook importer in NodeXL. That works but… Windows 8 is highly allergic to NodeXL. And lots of people have Windows 8 and they have endless installation problems. And the Java version is an issue. Even when it does work, NodeXL gets stuck downloading data from some student accounts. No explanation. It just does.

Then one can try Gephi, which is a whole ball of wax. The issue with Gephi is that it is highly sensitive to OS version. Luckily, there are fixes but they often involve Mac esoterica (e.g., Apple support does weird things in Safari, but not Chrome). Even then, students have all kinds of unexplained Gephi problems (e.g., the visualization pane simply doesn’t work on some Macs).

I need people to download a spreadsheet of data (e.g., centrality scores for people in your network) and not just pictures, so the Wolfram App and others are of limited value. Also, Wolfram seems to stall on some machines (including a Mac I have at home). I tried installing UCINET on Windows 8 as an end run… but had installation problems.

Here are my requirements. I need software that:

  • Can be easily used by low-math undergrads
  • Is low cost or free
  • Is stable across Windows 7, Windows 8, and the various Mac OS versions
  • Can, if possible, import Facebook data and produce spreadsheets of data

The last two times I taught this course, NetVizz, Gephi, and UCINET did the trick. But there is a new generation of operating systems, and the usual software hasn’t been upgraded and thoroughly tested. In previous years, I might have had only one or two students who couldn’t get network software running. This semester, it is a third of the class. Argh.

Any advice is welcome.
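For the spreadsheet requirement, one possible fallback (a sketch, not a classroom-tested solution) is plain Python with the free networkx library, which behaves the same on Windows and Mac: given any edge-list export, a few lines produce a CSV of centrality scores. The names and edges below are invented for illustration.

```python
import csv
import networkx as nx

# Build a graph from a simple edge list; in class this would come from
# whatever the Facebook export tool produces (e.g., a two-column CSV of pairs).
edges = [("ann", "bob"), ("bob", "cat"), ("cat", "ann"), ("cat", "dee")]
G = nx.Graph(edges)

# Centrality scores students can inspect as spreadsheet columns.
degree = nx.degree_centrality(G)
between = nx.betweenness_centrality(G)

with open("centrality.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["person", "degree_centrality", "betweenness_centrality"])
    for person in sorted(G.nodes):
        writer.writerow([person, round(degree[person], 3), round(between[person], 3)])
```

The resulting file opens directly in Excel, so students never touch a GUI network package at all.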

50+ chapters of grad skool advice goodness: Grad Skool Rulz ($1!!!!)/From Black Power/Party in the Street!! 


Written by fabiorojas

January 24, 2015 at 12:19 am

Posted in fabio, mere empirics

defending computational ethnography

Earlier this week, I suggested a lot is to be gained by using computational techniques to measure and analyze qualitative materials, such as ethnographic field notes. The intuition is simple. Qualitative research uses, or produces, a lot of text. Normally, we have to rely on the judgment of the researcher. But now, we have tools that can help us measure and sort the materials, so that we have a firmer basis on which to make claims about what our research does and does not say.

The comments raised a few issues. For example, Neal Caren wrote:

This is like saying that you want your driverless cars to work for Uber while you are sleeping. While it sounds possible, as currently configured neither ethnographic practices nor quantitative text analysis are up to the task.

This is puzzling. No one made this claim. If people believe that computers will do qualitative work by collecting data or developing hypotheses and research strategies, then they are mistaken. I never said that, nor did I imply it. What I did suggest is that computer scientists are making progress on detecting meaning and content, and are doing so in ways that would help researchers map out or measure text. As with any method, the researcher is responsible for providing definitions, defining the unit of analysis, and so forth. Just as we don’t expect regression models to work “while you are sleeping,” we don’t expect automated topic models or other techniques to work without a great deal of guidance from people. It’s just a tool, not a magic box.

Another comment was meant as a criticism, but actually supports my point. J wrote:

This assumes that field notes are static and once written, go unchanged. But this is not the consensus among ethnographers, as I understand the field. John Van Maanen, for example, says that field notes are meant to be written and re-written constantly, well into the writing stage. And so if this is the case, then an ethnographer can, implicitly or intentionally, stack the deck (or, in this case, the data) in their favor during rewrites. What is “typical” can be manipulated, even under the guise of computational methods.

Exactly. If we suspect that field notes and memos change with each version, we can actually test that hypothesis. What words appear (or co-appear) in each version? Do word combinations with different sentiments or meanings change from version to version? I think it would be extremely illuminating to see what each version of an ethnographer’s notes keeps or discards. Normally, this is impossible to observe and, when reported (which is rare), hard to measure. Now, we actually have some tools.

Will computational ethnography be easy or simple? No. But instead of pretending that qualitative research is buried in a sacred and impenetrable fog of meaning, we can apply the tools that are becoming routine in other areas for studying masses of text. It’s a great frontier to be working in. More sociologists should look into it.
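To make the version-comparison idea above concrete, here is a toy sketch using only the Python standard library. The two “drafts” are invented, and real field notes would need proper tokenization and stop-word handling:

```python
from collections import Counter

def word_shift(old_text, new_text):
    """Words whose counts changed between two versions of a field note."""
    old_counts = Counter(old_text.lower().split())
    new_counts = Counter(new_text.lower().split())
    words = set(old_counts) | set(new_counts)
    return {w: new_counts[w] - old_counts[w] for w in words
            if new_counts[w] != old_counts[w]}

# Hypothetical first and second drafts of the same observation.
draft_v1 = "the students seemed bored and restless during the lecture"
draft_v2 = "the students seemed disengaged and restless during the lecture"

# Positive counts are words added in the rewrite, negative are words dropped.
shift = word_shift(draft_v1, draft_v2)
```

Run over a whole series of drafts, a table like this would show exactly which characterizations survive revision and which get edited away.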

50+ chapters of grad skool advice goodness: Grad Skool Rulz ($1!!!!)/From Black Power/Party in the Street!! 

Written by fabiorojas

January 23, 2015 at 12:01 am

computational ethnography

An important frontier in sociology is computational ethnography – the application of textual analysis, topic modelling, and related techniques to the data generated through ethnographic observation (e.g., field notes and interview transcripts). I got this idea when I saw a really great post-doc present a paper at ASA in which historical materials were analyzed using topic modelling techniques, such as latent Dirichlet allocation (LDA).

Let me motivate this with a simple example. Let’s say I am a school ethnographer and I make a claim about how pupils perceive teachers. Typically, the ethnographer would offer an example from his or her field notes that illustrates the perceptions of the teacher. Then, someone would ask, “is this a typical observation?” and then the ethnographer would say, “yes, trust me.”

We no longer have to do that. Since ethnographers produce text, one can use topic models to map out themes or words that tend to appear in field notes and interview transcripts. Then, any block quote from field notes or transcripts can be compared to the entire corpus produced during field work. Not only would this attest to the commonality of a topic, it would also show how the topic is embedded in a larger network of discourse and meaning.

Cultural sociology, the future is here.

50+ chapters of grad skool advice goodness: Grad Skool Rulz ($1!!!!)/From Black Power/Party in the Street!! 

Written by fabiorojas

January 20, 2015 at 12:01 am

more tweets, more votes: it works for TV!!!


Within informatics, there is a healthy body of research showing how social media data can be used to forecast future consumption. The latest is a study by Nielsen, which offers preliminary evidence that Twitter activity forecasts television program popularity. In their model, adding Twitter data increases the explained variance in how well a TV show will do, over and above data on promotions and network type. Here’s the summary from Adweek.

50+ chapters of grad skool advice goodness: Grad Skool Rulz ($1!!!!)/From Black Power/Party in the Street!!

Written by fabiorojas

January 14, 2015 at 12:07 am

building computational sociology: from the academic side

Before the holiday, we asked – what should computational sociologists know? In this post, I’ll discuss what sociology programs can do:

  • Hire computational sociologists. Except for one or two cases, computational sociologists have had a very tough time finding jobs in soc programs, especially the PhD programs. That has to change, or else this area will quickly be absorbed by CS/informatics. We should have an army of junior-level computational faculty, but instead the center of gravity is around senior faculty.
  • Offer courses: This is a bit easier to do, but sociology lags behind. Every sociology program at a serious research university, especially those with engineering programs, should offer undergrad and grad courses.
  • Certificates and minors: Aside from paperwork, this is easy. Hand out credentials for a bundle of soc and CS courses.
  • Hang out: I have learned so much from hanging out with the CS people. It’s amazing.
  • Industry: This deserves its own post, but we need to develop a model for interacting with industry. Right now, sociology’s model is: ignore it if we can, lose good people to industry, and repeat. I’ll offer my own ideas next week about how sociology can fruitfully interact with the for-profit sector.

Add your own ideas in the comments.

50+ chapters of grad skool advice goodness: Grad Skool Rulz ($1!!!!)/From Black Power/Party in the Street!!

Written by fabiorojas

January 2, 2015 at 5:27 am

zeynep tufekci and brayden king on data and privacy in the new york times

My co-bloggers are on a roll. Zeynep Tufekci and Brayden King have an op-ed in the New York Times on the topic of privacy and data:

UBER, the popular car-service app that allows you to hail a cab from your smartphone, shows your assigned car as a moving dot on a map as it makes its way toward you. It’s reassuring, especially as you wait on a rainy street corner.

Less reassuring, though, was the apparent threat from a senior vice president of Uber to spend “a million dollars” looking into the personal lives of journalists who wrote critically about Uber. The problem wasn’t just that a representative of a powerful corporation was contemplating opposition research on reporters; the problem was that Uber already had sensitive data on journalists who used it for rides.

Buzzfeed reported that one of Uber’s executives had already looked up, without permission, rides taken by one of its own journalists. And according to The Washington Post, the company was so lax about such sensitive data that it even allowed a job applicant to view people’s rides, including those of a family member of a prominent politician. (The app is popular with members of Congress, among others.)

Read it. Also, the Economist picked up Elizabeth and Kieran’s posts on inequality and airlines.

50+ chapters of grad skool advice goodness: Grad Skool Rulz/From Black Power

Written by fabiorojas

December 8, 2014 at 4:39 am

“chicago economics,” “chicago sociology,” and “chicago anything else”

Near the end of James Heckman’s lecture on the scholarly legacy of Gary Becker, Heckman argued that Becker was a fine addition to the legacy of “Chicago economics.” He didn’t mean that Becker was a monetarist – the “Chicago school” of Friedman and his followers. Instead, he meant that Becker fit in well with the long tradition of great Chicago economic thinkers including not only free marketers (like Friedman) but also liberals (Paul Douglas), socialists (Oscar Lange), and weirdos (Thorstein Veblen). But what does that mean? Here is what it means:

  1. People know the whole field of economics, they aren’t just narrow specialists.
  2. Economics is not a parlor game. It is important.
  3. Empirical work is important and it is not devalued.

Thumbs up. But let me extend it. This Chicago attitude should extend to the whole of social sciences. People ask me, for example, why I was so damn harsh on the critical realists and the post-modernists. Why? Because what I do is important. It is empirical and it reflects what I’ve learned from absorbing the hard earned lessons of my predecessors. So when I see scholarship sink into a miasma of words, or the toy tinkering with cuteonomics, I can only conclude that the person is here to play games, not figure out how the world works. Excuse me while I get back to work.

50+ chapters of grad skool advice goodness: Grad Skool Rulz/From Black Power

Written by fabiorojas

December 1, 2014 at 12:01 am

snow snow snow

Here in the non-Buffalo part of upstate New York, we just got our first big snow dump of the year. Okay, it was seven inches, not sixty, but enough to create that Winter Wonderland effect. Fortunately for us, my family’s not traveling till Saturday, so we’re not stuck in an airport or behind an accident on the interstate, but watching from our cozy living room.

Last year, we were living in central New Jersey. It’s only 3 1/2 hours to the south, but what a world of difference in terms of weather. 2013-14 was one of the ten snowiest winters in NJ, but it was still a bit less snowy than an average winter in Albany. (And Albany only gets two-thirds the snow of Buffalo, and just over half that of Syracuse.)

The big difference, of course, is that Albany is prepared for 60 inches of snow a year. Central New Jersey is not.

So, you know, we did all the things that northerners do when faced with the obvious weakness of those in more southerly climes — mostly mock them for closing things down at the first indication of snow. Of course, we realize that that’s just compensating for the fact that we live somewhere with six months of winter, but we’ll take what we can get.

Anyway, there was a map going around last winter that showed the inches of snow at which school is typically canceled in various places in the U.S. (It originally came from an awesome-sounding subreddit called MapPorn.)


Read the rest of this entry »

Written by epopp

November 27, 2014 at 9:09 pm

a comment about agent based models in sociology in response to freese

About a week ago, Jeremy stopped by ye olde alma mater to give a talk on some new work. I was at SocInfo 2014, but my spies told me he made a quip about me: he mentioned that I thought computer simulations were on the decline, even though his talk was about simulations. Of course, haters being haters,* the whole thing got blown out of proportion. Maybe, but it almost came to fisticuffs.**

Still, there remained a basic point – was I wrong? First, it helps to clarify. I never said that simulations were declining overall. In fact, simulations are a core technique in engineering, biology, physics, and computer science. Simulations also have a long history in *some* social science areas. Demographers, for example, have used them for population projections for decades. So, I fully admit (and have always admitted) that outside sociology, simulations are alive and well.

My point is specific to sociology, where simulations are honestly quite rare. Sure, a few folks do them. James Kitts, Kathleen Carley, and Peter Bearman are card-carrying sociologists who have routinely used simulations. But how frequent is this? Not very: I’d hazard that less than 5% of papers in our main soc journals (the top 4 plus the regionals) use them. And yes, a few famous papers have been simulations (the 1972 Cohen, March, and Olsen paper comes to mind), but that generally doesn’t trigger a wave of simulations *IN SOCIOLOGY.* For example, how many authors in the Journal of Artificial Societies and Social Simulation have been tenure-track faculty in sociology programs? Some, but not that many. How many readers of this blog have ever read JASSS?

I’d love to be wrong. I would love for there to be a large and growing contingent of social simulation in sociology programs. But right now, it’s a niche area. Why? My guess is that there is a lot of inertia, plus a selection effect. I hope that changes.

* Hater = Fabio looking for twitter action on a Sunday night.

** How old people fight.

50+ chapters of grad skool advice goodness: Grad Skool Rulz/From Black Power 

Written by fabiorojas

November 25, 2014 at 12:01 am

Posted in fabio, mere empirics

socinfo 2014

If you are in Spain, you might enjoy attending the SocInfo 2014 conference which will be held at the Yahoo Headquarters in Barcelona. The goal is to bring together people at the edge of computer science and social science. Click here for the program and details on papers.

50+ chapters of grad skool advice goodness: Grad Skool Rulz/From Black Power

Written by fabiorojas

November 5, 2014 at 12:01 am

Posted in fabio, mere empirics

waves of internet commerce

Each wave of Internet commerce “solves” a particular social issue related to computing:

  • 1980s: PC computing – how to get everyone a machine
  • 1990s: The email/Internet revolution – how to get everyone hooked up into the system
  • 1990s (late): The Amazon/Ebay/PayPal eruption – how to solve the issue of exchanging physical goods using the Internet
  • 1990s (late): The indexing revolution – how to help people quickly find information.
  • 2000s: Social networking – how to build systems of tailor made communication
  • 2000s (mid): Mobile devices – how to make computing mobile

So what is the 2010s about? My guess is that we’re in the middle of Internet as commodity monetizer. AirBnB turns extra rooms into lodging, Uber turns cars into transport for hire, and so forth. Interesting observation: Apple and Google are the only firms to successfully participate in two waves.

Add your comments below.

50+ chapters of grad skool advice goodness: Grad Skool Rulz/From Black Power 

Written by fabiorojas

November 3, 2014 at 12:22 am

Posted in fabio, mere empirics

letters of recommendation: still garbage

Long time readers know that I am a skeptic when it comes to letters of recommendation. The last time I wrote about the topic, I relied on a well-cited 1993 article by Aamodt, Bryan, and Whitcomb in Public Personnel Management that reviews the literature and shows that LoRs have very little validity. That is, they are poor predictors of future job performance. But what if the literature has changed in the meanwhile? Maybe those earlier studies were flawed, or based on limited samples, or better research methods now provide more compelling answers. So I went back and read some more recent research on the validity of LoRs. The answer? With a few exceptions, still garbage.

For example, the journal Academic Medicine published a 2014 article that analyzed LoR for three cohorts of students at a medical school. From the abstract:

Results: Four hundred thirty-seven LORs were included. Of 76 LOR characteristics, 7 were associated with graduation status (P ≤ .05), and 3 remained significant in the regression model. Being rated as “the best” among peers and having an employer or supervisor as the LOR author were associated with induction into AOA, whereas having nonpositive comments was associated with bottom of the class students.

Conclusions: LORs have limited value to admission committees, as very few LOR characteristics predict how students perform during medical school.

Translation: Almost all of the information in letters is useless, except the occasional negative comment (which academics strive to avoid making). The other exception is explicit comparison with other candidates, which is not a standard feature of many (or most?) letters in academia.

Ok, maybe this finding is limited to med students. What about other contexts? Once again, LoRs do poorly unless you torture specific data out of them. From a 2014 meta-analysis of LoR research in education, published in the International Journal of Selection and Assessment:

… Second, letters of recommendation are not very reliable. Research suggests that the interrater reliability of letters of recommendation is only about .40 (Baxter, et al., 1981; Mosel & Goheen, 1952, 1959; Rim, 1976). Aamodt, Bryan & Whitcomb (1993) summarized this issue pointedly when they noted, ‘The reliability problem is so severe that Baxter et al. (1981) found that there is more agreement between two recommendations written by the same person for two different applicants than there is between two people writing recommendations for the same person’ (Aamodt et al., 1993, p. 82). Third, letter readers tend to favor letters written by people they know (Nicklin & Roch, 2009), despite any evidence that this leads to superior judgments.

Despite this troubling evidence, the letter of recommendation is not only frequently used; it is consistently evaluated as being nearly as important as test scores and prior grades (Bonifazi, Crespy, & Reiker, 1997; Hines, 1986). There is a clear and gross imbalance between the importance placed on letters and the research that has actually documented their efficacy. The scope of this problem is considerable when we consider that there is a very large literature, including a number of reviews and meta-analyses on standardized tests and no such research on letters. Put another way, if letters were a new psychological test they would not come close to meeting minimum professional criteria (i.e., Standards) for use in decision making (AERA, APA, & NCME, 1999). This study is a step toward addressing this need by evaluating what is known, identifying key gaps, and providing recommendations for use and research. [Note: bolded by me.]

As with other studies, there is a small amount of information in LoRs. The authors note that “… letters do appear to provide incremental information about degree attainment, a difficult and heavily motivationally determined outcome.” That’s something, I guess, for a tool that would fail standard tests of validity.

50+ chapters of grad skool advice goodness: Grad Skool Rulz/From Black Power 

Written by fabiorojas

October 29, 2014 at 12:01 am

hector cordero-guzman on measuring latino ethnicity

Hector Cordero-Guzman is a sociologist at CUNY who writes extensively on immigration, ethnicity, and related topics. In relation to our post on race agnosticism, Hector reminded me that he wrote a post on measuring race for the blog Latino Rebels. In it, he describes his reaction to, and analysis of, the claim that Latinos were increasingly self-identifying as white. From the post:

Recently, a controversy about Latinos and racial classifications has led to heated debate based on a toxic mix of incomplete conclusions from research and rampant speculation.

A draft presentation at the Population Association of America (PAA) chronicled by a Pew Research senior writer was then picked up by Nate Cohn, writing for The New York Times’ “Upshot” blog.  In the eyes of Cohn, his editor David Leonhardt and the Times, and based on a report that the scientific community has not seen or evaluated, Latinos were becoming “whiter.”

Surrounding all the controversy and discussion about reporting on research that was not available for inspection or review by other academics, two explanations to the tentative result from the unavailable census study have emerged: that the people changed (Cohn, Leonhardt and The Times) or that the census questions changed (Manuel Pastor in the HuffPost).

He follows with an analysis that can be summarized as:

A second possibility is that the context where the question is asked matters and that asking about race in Puerto Rico is different than asking the same population about their race in New York City. The question is not changing and the people are not changing—what is changing is the context, the reference point, the broader racial classification schema and categories that are used, how they are interpreted, their subjective meaning, and their social and sociological role.

Cohn further argues that the reported change in the answers given to the race question suggest Hispanic assimilation into the U.S. and into its racial classification schema. If anything, comparing data from Puerto Rico and Puerto Ricans in New York City suggests that mainland Puerto Ricans develop a sense of “otherness” as they come into closer contact with the U.S. racial classification regime. In fact, it would be interesting to compare the data from Puerto Rico with data from Puerto Ricans throughout the U.S. (not just New York City), those residing in various regions, as well as looking at the more recent arrivals to see if the categories they pick are different from Puerto Ricans that have been living on the mainland for a longer period of time.

In other words, the context in which a survey is administered acts as an important cue, shaping how respondents interpret questions about race. The whole post is highly recommended.

50+ chapters of grad skool advice goodness: Grad Skool Rulz/From Black Power 


Written by fabiorojas

October 7, 2014 at 2:10 am

nussbaum on GDP alternatives

Written by fabiorojas

October 4, 2014 at 12:01 am

race agnosticism: commentary on ann morning’s research

Earlier this week, Ann Morning of NYU sociology gave a talk at the Center for Research on Race and Ethnicity in Society. Her talk summarized her work on the meaning of race in varying scientific and educational contexts. In other words, rather than study what people think about other races (attitudes), she studies what people think race is.  This is the topic of her book, The Nature of Race.

What she finds is that educated people hold widely varying views of race. Scientists, textbook writers, and college students seem to have completely independent views of what constitutes race. That by itself is a key finding, and raises numerous other questions. Here, I’ll focus on one aspect of the talk. Morning finds that experts do not agree on what race is. And by experts, she means Ph.D. holding faculty in the biological and social sciences that study human variation (biology, sociology, and anthropology). This finding shouldn’t be too surprising given the controversy of the subject.

What is interesting is the epistemic implication. Most educated people, including sociologists, have rather rigid views. Race is *obviously* a social convention, or race is *obviously* a well defined population of people. Morning’s finding suggests a third alternative: race agnosticism. In other words, if experts in human biology, genetics, and cultural studies themselves can’t agree and these disagreements are random (e.g., biologists themselves disagree quite a bit), then maybe other people should just back off and admit they don’t know.

This is not a comfortable position since fights over the nature of human diversity are usually proxies for political fights. Admitting race agnosticism is an admission that you don’t know what you’re talking about. Your entire side in the argument doesn’t know what it’s talking about. However, it should be natural for a committed sociologist. Social groups are messy and ill defined things. Statistical measures of clustering may suggest that the differences among people are clustered and nonrandom, but jumping from that observation to clearly defined groups is very hard in many cases. Even then, it doesn’t yield the racial categories that people use to construct their social worlds based on visual traits, social norms, and learned behaviors. In such a situation, “vulgar” constructionism and essentialism aren’t up to the task. When the world is that complicated and messy, a measure of epistemic humility is in order.

50+ chapters of grad skool advice goodness: Grad Skool Rulz/From Black Power

Written by fabiorojas

October 3, 2014 at 12:01 am

the declining role of simulations in sociology

Over the weekend, I got into an exchange with UMD management student Robert Vesco over the computer science/sociology syllabus I posted last week. The issue, I think, is that he was surprised that the course narrowly focused on topic modelling – extracting meaning from text. Robert thought that maybe there should be a different focus. He proposed an alternative – teaching computer science via simulations. Two reactions:

First, topic modelling may seem esoteric to computer scientists but it lies at the heart of sociology. We have interviews, field notes, media – all kinds of text. And we can move beyond the current methods of having humans slowly code the data, which is often not reliable. Also, text is “real data.” You can easily link what you extract from a topic modelling exercise to traditional statistical analysis.

Second, simulations have historically played a limited role in sociology. I find this sad because my first publication was a simulation. I think the reason is that most sociologists work with simple linear models. If you examine nearly all quantitative work, you see that most statistical analyses use OLS and its relatives (logits, event history, Tobit, Heckman, etc.). There’s always a linear model in there. Also, in the rare cases where sociologists use mathematical models for theory, they tend to use fairly simple models to express themselves.

Simulation is a form of numerical analysis – an estimate of the solutions of a system of equations, obtained by random draws from the phase space. You would only need to do this if the model is too complicated to solve analytically, or if the solution is too complex to describe in a simple fashion. In other words, if you have a lot of moving parts, it makes sense to do a simulation. Since sociological models tend to be very simple, there is little demand for simulations.
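To illustrate the contrast, here is a minimal sketch of the kind of model that would warrant simulation: a toy threshold model of diffusion on a random network, where the final adoption rate has no tidy closed form, so we estimate it by averaging over random draws. All parameters are arbitrary and chosen only for illustration:

```python
import random

def diffusion_run(n=100, p_edge=0.05, threshold=0.3, seeds=5, rng=None):
    """One run: a node adopts when the share of its neighbors who have
    adopted meets a threshold; returns the final fraction of adopters."""
    rng = rng or random.Random()
    # Draw a random (Erdos-Renyi style) network.
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p_edge:
                neighbors[i].add(j)
                neighbors[j].add(i)
    # Seed a few initial adopters, then iterate to a fixed point.
    adopted = set(rng.sample(range(n), seeds))
    changed = True
    while changed:
        changed = False
        for i in range(n):
            if i in adopted or not neighbors[i]:
                continue
            share = len(neighbors[i] & adopted) / len(neighbors[i])
            if share >= threshold:
                adopted.add(i)
                changed = True
    return len(adopted) / n

# Average over many runs to estimate the expected adoption rate.
rng = random.Random(42)
runs = [diffusion_run(rng=rng) for _ in range(50)]
mean_adoption = sum(runs) / len(runs)
```

Nothing here is solvable with pencil and paper, which is precisely when simulation earns its keep; most sociological theory never gets this many moving parts.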

Robert asked about micro-macro transitions. This proves my point. A lot of micro-macro models in sociology tend to be fairly simple and stated verbally. For example, many versions of institutionalism predict diffusion driven by elites. Thus, downward causation is described by a simple model. More complex models are possible, but people seem not to care. Overall, simulation is cool, but it just isn’t in demand. Better to teach computer science with real data.

50+ chapters of grad skool advice goodness: Grad Skool Rulz/From Black Power

Written by fabiorojas

September 30, 2014 at 12:01 am

Posted in fabio, mere empirics

a whole pile of piketty

Econlog collects a few links:

Bon appetit.

50+ chapters of grad skool advice goodness: Grad Skool Rulz/From Black Power


Written by fabiorojas

September 26, 2014 at 12:02 am