orgtheory.net

Archive for the ‘mere empirics’ Category

there’s probably a lot more cheating with the patriots

So far, the Patriots have been caught in two cheating scandals – Deflategate in 2015 and the 2007 Spygate scandal. Each is interesting in its own right, but there is one implication few are willing to utter: the Patriots are probably cheating in more ways than we imagine.

The intuition is simple. Cheating incidents are not independent. It is not likely that every person will cheat with equal probability. Rather, people who want to cheat are the most likely to cheat and do so over and over. Also, consider incentives. They have been caught cheating multiple times and that hasn’t seemed to harm them much at all. The conclusion is that it is highly likely the Patriots are cheating in other ways.

I think it would be interesting for fans of vanquished teams to conduct Levitt-style analyses of the Patriots. I would guess that looking at other data, in addition to the now famous fumble analysis, will yield some interesting answers.

50+ chapters of grad skool advice goodness: Grad Skool Rulz ($1!!!!)/From Black Power/Party in the Street!! 

Written by fabiorojas

January 26, 2015 at 12:45 am

help me! this is my hot mess in teaching social networks

Question for readers who teach networks: What software should I use for low tech undergrads? So far, I am having some real challenges…

I have an undergrad class where the first major assignment is to download one’s Facebook network and analyze it. I have been using NetVizz, an app inside Facebook, to extract network data. But it suddenly disappeared! One solution is to use the Facebook importer in NodeXL. That works but… Windows 8 is highly allergic to NodeXL. And lots of people have Windows 8 and they have endless installation problems. And the Java version is an issue. Even when it does work, NodeXL gets stuck downloading data from some student accounts. No explanation. It just does.

Then one can try Gephi, which is a whole ball of wax. The issue with Gephi is that it is highly sensitive to OS version. Luckily, there are fixes but they often involve Mac esoterica (e.g., Apple support does weird things in Safari, but not Chrome). Even then, students have all kinds of unexplained Gephi problems (e.g., the visualization pane simply doesn’t work on some Macs).

I need people to download a spreadsheet of data (e.g., centrality scores for people in your network) and not just pictures, so the Wolfram App and others are of limited value. Also, Wolfram seems to stall on some machines (including a Mac I have at home). I tried installing UCINET on Windows 8 as an end run… but had installation problems.

Here are my requirements. I need software that:

  • Is easy for low-math undergrads to use
  • Is low cost or free
  • Is stable across Windows 7, Windows 8, and various Mac OS versions
  • If possible, can import Facebook data and produce spreadsheets of data
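One low-tech fallback that sidesteps OS-specific installs entirely: have students export a plain edge list (two columns of names) from whatever importer happens to work, and compute the scores with a short script. Here is a minimal sketch in plain Python (no third-party packages); the toy network and file name are invented stand-ins for real Facebook data:

```python
import csv
from collections import defaultdict

def degree_centrality(edges):
    """Normalized degree centrality: degree / (n - 1)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}

def write_scores(path, scores):
    """Dump centrality scores to a spreadsheet-friendly CSV."""
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["name", "degree_centrality"])
        for node, score in sorted(scores.items()):
            w.writerow([node, round(score, 3)])

# A tiny invented friendship network.
edges = [("ann", "bob"), ("bob", "carl"), ("ann", "carl"), ("carl", "dee")]
scores = degree_centrality(edges)
```

Calling `write_scores("centrality.csv", scores)` then gives students a file they can open in Excel. Betweenness and the rest would take more work (or networkx, where it installs cleanly), but degree alone covers a surprising share of an intro assignment.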

The last two times I taught this course, NetVizz, Gephi, and UCINET did the trick. But there is a new generation of operating systems, and the usual software hasn't been upgraded and thoroughly tested against it. In previous years, I might have only one or two students who couldn't get network software running. This semester, it is a third of the class. Argh.

Any advice is welcome.


Written by fabiorojas

January 24, 2015 at 12:19 am

Posted in fabio, mere empirics

defending computational ethnography

Earlier this week, I suggested a lot is to be gained by using computational techniques to measure and analyze qualitative materials, such as ethnographic field notes. The intuition is simple. Qualitative research uses, or produces, a lot of text. Normally, we have to rely on the judgment of the researcher. But now, we have tools that can help us measure and sort the materials, so that we have a firmer basis on which to make claims about what our research does and does not say.

The comments raised a few issues. For example, Neal Caren wrote:

 This is like saying that you want your driverless cars to work for Uber while you are sleeping. While it sounds possible, as currently configured neither ethnographic practices nor quantitative text analysis are up to the task.
This is puzzling. No one made this claim. If people believe that computers will do qualitative work by collecting data or developing hypotheses and research strategies, they are mistaken. I never said that, nor did I imply it. What I did suggest is that computer scientists are making progress on detecting meaning and content, and are doing so in ways that would help researchers map out or measure text. As with any method, the researcher is responsible for providing definitions, defining the unit of analysis, and so forth. Just as we don't expect regression models to work "while you are sleeping," we don't expect automated topic models or other techniques to work without a great deal of guidance from people. It's a tool, not a magic box.
Another comment was meant as a criticism but actually supports my point. J wrote:
This assumes that field notes are static and once written, go unchanged. But this is not the consensus among ethnographers, as I understand the field. John Van Maanen, for example, says that field notes are meant to be written and re-written constantly, well into the writing stage. And so if this is the case, then an ethnographer can, implicitly or intentionally, stack the deck (or, in this case, the data) in their favor during rewrites. What is "typical" can be manipulated, even under the guise of computational methods.
Exactly. If we suspect that field notes and memos are changing after each version, we can actually test that hypothesis. What words appear (or co-appear) in each version? Do word combinations with different sentiments or meanings change in each version? I think it would be extremely illuminating to see what each version of an ethnographer’s notes keeps or discards. Normally, this is impossible to observe and, when reported (which is rare), hard to measure. Now, we actually have some tools.
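To make that concrete, a version-comparison test can start as simply as word counts: which vocabulary did a rewrite keep, add, and discard, and how much overlap remains? A minimal sketch (the field-note snippets are invented for illustration):

```python
import re
from collections import Counter

def tokens(text):
    """Lowercase word tokens; a crude stand-in for real preprocessing."""
    return re.findall(r"[a-z']+", text.lower())

def compare_versions(v1, v2):
    """What vocabulary did a rewrite keep, add, and drop?"""
    c1, c2 = Counter(tokens(v1)), Counter(tokens(v2))
    kept = set(c1) & set(c2)
    added = set(c2) - set(c1)
    dropped = set(c1) - set(c2)
    # Jaccard overlap: shared vocabulary / total vocabulary.
    overlap = len(kept) / len(set(c1) | set(c2))
    return {"kept": kept, "added": added, "dropped": dropped, "overlap": overlap}

v1 = "the students seemed hostile toward the teacher"
v2 = "the students seemed wary of the teacher"
report = compare_versions(v1, v2)
```

Run across every saved draft of a memo, even something this crude would show whether "hostile" quietly became "wary" between fieldwork and publication; sentiment lexicons or topic models would refine the same idea.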
Will computational ethnography be easy or simple? No. But instead of pretending that qualitative research is buried in a sacred and impenetrable fog of meaning, we can actually apply the tools that are now becoming routine in other areas for studying masses of text. It’s a great frontier to be working in. More sociologists should look into it.


Written by fabiorojas

January 23, 2015 at 12:01 am

computational ethnography

An important frontier in sociology is computational ethnography – the application of textual analysis, topic modelling, and related techniques to the data generated through ethnographic observation (e.g., field notes and interview transcripts). I got this idea when I saw a really great post-doc present a paper at ASA where historical materials were analyzed using topic modelling techniques, such as LDA.

Let me motivate this with a simple example. Let’s say I am a school ethnographer and I make a claim about how pupils perceive teachers. Typically, the ethnographer would offer an example from his or her field notes that illustrates the perceptions of the teacher. Then, someone would ask, “is this a typical observation?” and then the ethnographer would say, “yes, trust me.”

We no longer have to do that. Since ethnographers produce text, one can use topic models to map out themes or words that tend to appear in field notes and interview transcripts. Then, any block quote from field notes or transcripts can be compared to the entire corpus produced during fieldwork. Not only would this attest to how common a topic is, but also to how it is embedded in a larger network of discourse and meaning.
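A real implementation would use a topic model (LDA via gensim or scikit-learn), but the underlying "is this quote typical?" check can be sketched with nothing but term frequencies: score a block quote by its cosine similarity to the full corpus. All of the field-note text below is invented for illustration:

```python
import math
import re
from collections import Counter

def tf(text):
    """Term-frequency vector for a chunk of text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Invented field-note corpus from a hypothetical school ethnography.
corpus = [
    "students complained that the teacher graded unfairly",
    "the teacher was seen as strict but fair by some students",
    "lunch period ended early because of the assembly",
]
quote = "several students said the teacher graded them unfairly"

corpus_tf = tf(" ".join(corpus))
score = cosine(tf(quote), corpus_tf)
```

A quote about grading scores well against this corpus, while an off-topic quote scores poorly — a numeric answer to "is this a typical observation?" that doesn't reduce to "trust me."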

Cultural sociology, the future is here.


Written by fabiorojas

January 20, 2015 at 12:01 am

more tweets, more votes: it works for TV!!!


Within informatics, there is a healthy body of research showing how social media data can be used to forecast future consumption. The latest is a study by Nielsen, which offers preliminary evidence that Twitter activity forecasts television program popularity. In their model, adding Twitter data increases the explained variance in how well a TV show will perform, beyond data on promotions and network type. Here's the summary from Adweek.


Written by fabiorojas

January 14, 2015 at 12:07 am

building computational sociology: from the academic side

Before the holiday, we asked – what should computational sociologists know? In this post, I’ll discuss what sociology programs can do:

  • Hire computational sociologists. Except for one or two cases, computational sociologists have had a very tough time finding jobs in soc programs, especially PhD programs. That has to change, or else this area will quickly be absorbed by CS/informatics. We should have an army of junior-level computational faculty, but instead the center of gravity is around senior faculty.
  • Offer courses: This is a bit easier to do, but sociology lags behind. Every sociology program at a serious research university, especially those at schools with engineering programs, should offer undergrad and grad courses.
  • Certificates and minors: Aside from paperwork, this is easy. Hand out credentials for a bundle of soc and CS courses.
  • Hang out: I have learned so much from hanging out with the CS people. It’s amazing.
  • Industry: This deserves its own post, but we need to develop a model for interacting with industry. Right now, sociology’s model is: ignore it if we can, lose good people to industry, and repeat. I’ll offer my own ideas next week about how sociology can fruitfully interact with the for profit sector.

Add your own ideas in the comments.


Written by fabiorojas

January 2, 2015 at 5:27 am

zeynep tufekci and brayden king on data and privacy in the new york times

My co-bloggers are on a roll. Zeynep Tufekci and Brayden King have an op-ed in the New York Times on the topic of privacy and data:

UBER, the popular car-service app that allows you to hail a cab from your smartphone, shows your assigned car as a moving dot on a map as it makes its way toward you. It’s reassuring, especially as you wait on a rainy street corner.

Less reassuring, though, was the apparent threat from a senior vice president of Uber to spend “a million dollars” looking into the personal lives of journalists who wrote critically about Uber. The problem wasn’t just that a representative of a powerful corporation was contemplating opposition research on reporters; the problem was that Uber already had sensitive data on journalists who used it for rides.

Buzzfeed reported that one of Uber's executives had already looked up without permission rides taken by one of its own journalists. And according to The Washington Post, the company was so lax about such sensitive data that it even allowed a job applicant to view people's rides, including those of a family member of a prominent politician. (The app is popular with members of Congress, among others.)

Read it. Also, the Economist picked up Elizabeth and Kieran's posts on inequality and airlines.


Written by fabiorojas

December 8, 2014 at 4:39 am
