Archive for the ‘mere empirics’ Category

data bleg: categorical data

Please put in the comments, or link to, a data set that has the following properties:

  1. A few hundred cases, but not too many (300 < N < 1000).
  2. A longitudinal categorical variable X with the following properties:
  3. The categorical variable should NOT be ordered. States should be like {chocolate, vanilla, strawberry}, not {strongly agree, neutral, strongly disagree}.
  4. About 4-7 time periods.
  5. About 4-7 states that X can be in (e.g., five political parties, five ice cream flavors).
  6. “Legitimate data” – no one will bug me about using this data set. Decent response rate, nice set of covariates for X, data collected for a legitimate research project, etc.

This is for a methods project I’ve been working on. So I don’t need something fancy, just something that has these specific properties to highlight the strengths of the method. Feel free to email me as well.

50+ chapters of grad skool advice goodness: Grad Skool Rulz/From Black Power

Written by fabiorojas

July 25, 2014 at 12:01 am

Posted in fabio, mere empirics

computer science “brain drain”

The Committee on the Anthropology of Science, Technology, and Computing blog has an interesting recent post on the perceived “brain drain” in computer science:

But what do scientists think of big data? Last year, in a widely circulated blog post titled “The Big Data Brain Drain: Why Science is in Trouble,” physicist Jake VanderPlas made the argument that the real reason big data is dangerous is because it moves scientists from the academy to corporations.

“…But where scientific research is concerned, this recently accelerated shift to data-centric science has a dark side, which boils down to this: the skills required to be a successful scientific researcher are increasingly indistinguishable from the skills required to be successful in industry. While academia, with typical inertia, gradually shifts to accommodate this, the rest of the world has already begun to embrace and reward these skills to a much greater degree. The unfortunate result is that some of the most promising upcoming researchers are finding no place for themselves in the academic community, while the for-profit world of industry stands by with deep pockets and open arms. [all emphasis in the original]”

His argument proceeds in four steps: first, he argues that yes, new data is indeed being produced, and in stupendously large quantities. Second, processing this data (whether it’s biology or physics) requires a certain kind of scientist who is skilled at both statistics and software-building. Third, that because of this shift, “scientific software” to clean, process, and visualize data has become a key part of the research process. And finally, because this scientific software needs to be built and maintained, and because the academy evaluates its scientists not for the software they build but for the papers they publish,  all of these talented scientists who would have spent a lot of their time building software are now moving to corporate research jobs, where this work is better rewarded and appreciated. All of this, he argues, does not bode well for science.

We’ve discussed this point on the blog before. We aren’t keeping good people in the academy. Aside from the financial incentives, we are really bad in terms of career development, job security, and gender equity. No wonder we can’t keep people. We have to seriously reconsider the model where the only people who get good rewards are those who spend a decade getting their PhD dissertation published.

Written by fabiorojas

July 24, 2014 at 12:01 am

Posted in fabio, mere empirics

go to big cities, big data!

This August 15, Alex Hanna, a computational sociologist at Wisconsin, will host “Big Cities, Big Data” on the campus of UC Berkeley. BC/BD is a “hackathon” – a meeting of people who program all night long to develop new projects. The next day, the results will be presented at a workshop at ASA. From the announcement:

The theme is “big cities, big data: big opportunity for computational social science,” the idea being looking at contemporary urban issues — especially housing challenges — using data gathered and made publicly available by cities including San Francisco, New York, Chicago, Austin, Boston, Somerville, Seattle, etc.

The hacking will start at noon on August 15 and go until the next day. Sleeping is optional. We’ll have a presentation and judging session in the evening of August 16 in San Francisco, exact location TBD.

We’re working with several academic and industry partners to bring together tools and datasets which social scientists can use at the event. So stay tuned as that develops.

Check it out! It’s the place to meet the next generation of sociology hackers!

Written by fabiorojas

July 8, 2014 at 12:01 am

sociology/computer science team up: part 2

A few days ago, I suggested that sociologists should seriously consider teaming up with computer scientists. Here, I’d like to sketch out the big picture to suggest why we are in a special moment. Basically, computer science has had three major stages of development:

  • Stage 1 (1949-1970s): The construction of computers. In this stage, it was all about the engineering. How could you make a machine that (a) could be programmed, as opposed to running one command, and (b) do it in a way that didn’t require a machine the size of a house?
  • Stage 2 (1970s-1990s): Learning and theory. Could you make a machine that could, say, solve an algebra equation? Play chess? See things? CS also developed its mathematical side. Does this algorithm find an answer in a reasonable amount of time?
  • Stage 3 (1990s-present): Social computers. Can we build machines that will help people, say, trade using e-currency? Operate in secure networks? In other words, instead of making computers mimic people, we make computers extensions of people.

Of course, people still work in all streams of computer science. The issue is that the social computing stream is now huge. That means that computer scientists are building a technical system that integrates human beings and computer networks. In other words, there isn’t going to be a real sharp distinction between online behavior and “real world” behavior. They’ll be connected.

A second observation is that social computing is the engineering analog of “social action.” It’s a broad idea that encompasses a lot of behavior. This is a bit different than, say, economics, which reduces a lot to price theory, or political science, which focuses on very specific things like voting or legislation. Instead, computer scientists are dealing with something that is extremely broad. That’s why they can entertain all the different types of data: video recordings of how people use computers, text analysis, online experiments, and plain old vanilla stats.

None of this means that the CS/soc hookup will automatically happen. Rather, this post explains why this opportunity has appeared. It’s up to us to make the most of it. Otherwise, you can bet on a series of Nature and Science articles that are sociological, but lack sociology authors.

Written by fabiorojas

July 3, 2014 at 12:01 am

sociology: don’t screw this up, but we need to seriously hook up with computer science

Every once in a while, you get a free lunch. About a year and a half ago, sociology got a small free lunch. It was announced that the MCAT would now include sociology material. Awesome.

But there is a seriously huge free lunch coming up – the rise of “big data.” Ignore the naysayers. Ignore the hand-wringers who worry if Facebook is hurting our feelings. Look at the big picture. Silicon Valley has created a new social world that requires analysis. And not just the generic stuff you get from your local management consultant. They need analysis from people who understand human behavior and can build arguments. They don’t want data mining. They want theory and real research designs.

Consider this tweet from Elise Hu, a Washington reporter, who quoted Joi Ito, director of the MIT Media Lab:

In other words, the world of computer science has stumbled into social science. As usual, many think that social science is garbage, but that is slowly changing. Many social scientists are being hired at Google and Facebook. Others are striking out on their own. Many within the social sciences are using computer science.

The big message? This is a huge opportunity. It can change the discipline – but only if we constructively interact with the computer science discipline. My recommendations:

  • Reach out to your colleagues in computer science. Run a seminar or write a grant.
  • Reach out to computer science students. Create courses for them, invite them to be on projects.
  • Treat “big data” as we would other data. It has strengths and weaknesses, but in being critical we can use it in the correct way and raise the level of discussion.
  • Submit to computer science conferences. I’ll be honest, computer scientists are not statisticians. There are a lot of fascinating areas of computer science where the stats are very simple or the ideas are basic. We can add a lot of value.

The benefit? CS will get an infusion of good ideas to work through. Sociology will come into contact with some really cool people, create a bigger audience, and get more resources. We can also get answers to some great questions.

So don’t screw it up, people. This doesn’t happen very often.

Written by fabiorojas

June 30, 2014 at 1:51 am

more tweets, more votes: it works in india


A recent article in The Atlantic provides some evidence that the tweets/votes correlation holds up in the recent Indian election:

The direct comparison between volumes of tweets mentioning the different parties shows a similar movement: from a somewhat even distribution—particularly in the mid phases of the campaign between January 28 and March 3, before Kejriwal started his road show in Gujarat and his live Facebook talk—but the BJP took over in the final stages of the campaign.

They should do relative tweet measures, which help with American data.
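For readers wondering what a "relative" measure might look like, here is a minimal sketch. It assumes one plausible reading: each party's mentions as a share of all party mentions in the same window, rather than raw volume. The function name and the counts are made up for illustration and are not from the Atlantic piece or any real election.

```python
from typing import Dict


def relative_tweet_share(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw mention counts into shares of total mentions.

    Assumes `counts` holds mentions of each party over the same time
    window. The shares sum to 1, so they can be compared across windows
    with very different overall tweet volume.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mentions in this window; shares are undefined")
    return {party: n / total for party, n in counts.items()}


# Hypothetical counts, not real data.
window_counts = {"party_a": 600, "party_b": 300, "party_c": 100}
shares = relative_tweet_share(window_counts)
print(shares)
```

The point of dividing by the total is that a party's raw tweet count can rise simply because everyone is tweeting more near election day; the share only rises if that party is gaining attention relative to its rivals.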

For previous More Tweets, More Votes – click here.

Written by fabiorojas

June 9, 2014 at 12:01 am

is gordon gee responsible for the adjunct explosion?

Gordon Gee, former president of Ohio State, made more than $6 million in FY 2013, including the $1.5 million “release payment” he got in exchange for agreeing not to sue the university on his way out. Now the New York Times is reporting that the 25 public universities with the highest-paid presidents have greater increases in student debt and numbers of adjuncts than other publics.

I had a story for this, an organizational story. Ah, I thought. The NYT is implying that the high pay is taking away money that would be going to the other stuff. But really, this just reflects a new model for flagship publics: limit faculty costs (hence the adjuncts), increase the proportion of out-of-state students paying high tuition (hence the debt), and pursue corporate-style CEOs who can lead us into this brave new world (hence the salaries). The non-flagships can’t pursue this strategy successfully, so we’re seeing a divergence between the two groups.

But it turns out that the data don’t, in fact, support that story. They don’t really support any story. The NYT article is based on a report from the Institute for Policy Studies, a progressive think tank. And as I read it, things didn’t seem quite right. IPS reports on the number of adjunct faculty at these institutions, but I haven’t seen good data anywhere on the number of adjuncts. And administrative spending at publics increased 65% between FYs 2006 and 2012, as states slashed budgets?

So I went to their sources. The IPS data on adjuncts and administrators comes from the American Federation of Teachers, which in turn is taking it from IPEDS. And IPEDS data is notoriously wonky.

Yeah, basically the IPS report is just a mess. IPEDS made some major redefinitions of terms in the middle — like who falls under “Part-time/Instruction, Research and Public Service,” what IPS is calling “Adjunct Labor” — so the years aren’t comparable with each other, and AFT appears to have mislabeled some of the years entirely. The University of Minnesota’s impressively fast PR office has a debunking report up, and while I haven’t checked all the numbers, my impression is that it’s right on target.

That doesn’t disprove my theory that there will be increasing divergence between the model for flagships and the path taken by the rest of the publics. And it’s entirely possible that universities with highly paid presidents have underwhelming outcomes in other areas. But if we’re going to argue over what to do about it, it would be nice if it were based on numbers that actually mean something.

Written by epopp

May 19, 2014 at 3:05 am

