Archive for the ‘mere empirics’ Category
Last week, I argued that retractions are good for science. Thomas Basbøll correctly points out that retractions are hard. Nobody wants to retract. Good point, but my argument wasn't about how easy it is to retract. Rather, it's about the fact that science is exceptional in that it has a built-in error-correction mechanism.
In reviewing the debate, Andrew Gelman wrote:
One challenge, though, is that uncovering the problem and forcing the retraction is a near-thankless job…. OK, fine, but let’s talk incentives. If retractions are a good thing, and fraudsters and plagiarists are not generally going to retract on their own, then somebody’s going to have to do the hard work of discovering, exposing, and confronting scholarly misconduct. If these discoverers, exposers, and confronters are going to be attacked back by their targets (which would be natural enough) and they’re going to be attacked by the fraudsters’ friends and colleagues (also natural) and even have their work disparaged by outsiders who think they’re going too far, then, hey, they need some incentives in the other direction.
A few thoughts. First, fraud-busting should be done by those who have some security – the tenured folks – or by folks who don't care so much (e.g., non-tenure-track researchers in industry). Second, data and code should be made available on journal websites, along with output files. Some journals already do that, and it reduces fraud. Third, we should revive the tradition of the research note. Our journals used to publish short notes, which can be used for replications, verifications, error reports, and so forth. Fourth, we should rely on journal models like PLoS, where editors publish any competent piece of research in a low-cost and timely way. Fraud-busting and error correction will never be easy, but we can make them easier, and doing so is not hard.
A focus of network research since, say, 1999 has been to identify "laws" that generate large networks with certain properties.* For example, the small-world network is built by rewiring a ring lattice. Various processes generate power-law networks (i.e., networks whose degree distribution follows a power law).
I can see two justifications for this type of research. The first is diffusion theory. The speed at which something diffuses in a network is definitely governed by the structure. The second is a sort of physical science justification, where you think of a network as a “system” and you show that some micro-process (e.g., preferential attachment) creates that network.
Is there any other behavioral implication of studying power laws, small worlds, or other specific large-scale properties? In other words, why should I care about scale-free or small-world networks aside from diffusion theory?
* Let’s leave aside recent criticism of power-law centric research for the sake of the post.
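To make the generative-model idea concrete, here is a minimal sketch using the igraph package in R. The parameter values are arbitrary and purely illustrative: the Watts–Strogatz rewiring procedure yields a small-world network, and preferential attachment yields a heavy-tailed (power-law-like) degree distribution.

```r
library(igraph)

# Small-world network: start from a ring lattice, rewire each edge with prob. 0.05
sw <- sample_smallworld(dim = 1, size = 500, nei = 4, p = 0.05)
mean_distance(sw)   # short average path length...
transitivity(sw)    # ...but still high clustering

# Scale-free network: preferential attachment (Barabasi-Albert)
pa <- sample_pa(n = 500, m = 2, directed = FALSE)
hist(degree(pa))    # heavy-tailed degree distribution
```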
I’m still mulling over some of the issues raised at the Chicago ethnography and causal inference conference. For example, a lot of ethnographers say “sure, we can’t generalize but ….” The reason they say this is that they are making a conceptual mistake.
Ethnography is generalizable – just not within a single study. Think of it this way. Data is data, whether it comes from a survey, an experiment, or field work. The reason that surveys are generalizable lies in the sampling: the survey data is a representative subset of the larger population.
What’s the deal with ethnography? Usually, we want to say that what we observe in fieldwork is applicable in other cases. The problem is that we only have one (or a few) field sites. The solution? Increase the number of field sites. Of course, this can’t be done by one person. However, there can be teams. Maybe they aren’t officially related, but each ethnographer could contribute to the field of ethnography by randomly selecting their field site, or choosing a field site that hasn’t been covered yet.
Thus, over the years, each ethnographer would contribute to the validity of the entire enterprise. As time passes, you’d observe new phenomena, but by linking field site selection to prior questions you’d also be expanding the sample of field sites. This isn’t unheard of. The Manchester School of anthropology did exactly that – spread the ethnographers around – to great effect. Maybe it’s time that sociological ethnographers do the same.
a response to andrew gelman on the statistics discipline, but not scott because he thinks i’m a sad distraction in higher education and that like, totally, hurt my feelings
On Friday, I wrote a semi-humorous post about the interaction between statisticians and non-statisticians. The issue that brought it up was that sometimes statisticians like to work on asymptotic results. This, by itself, isn't bad. It's good to know what an estimator does when you have a nice big sample that behaves well. My beef is that sometimes small samples – the ones that most social scientists work with – are treated as an inconvenient afterthought. That rubs me the wrong way because mathematical elegance is accorded more importance than addressing the core problem of statistics – which is to accurately model, measure, and study the relationships between variables.
Andrew Gelman wrote a simple response, which is that I am hanging around with the wrong people. There is some truth to that. The last time I had the "n → ∞" argument was with a visitor. Indiana has hired some exceptional applied statisticians, like Stanley Wasserman. The program has also hired people with PhDs from outside statistics, in fields like sociology and economics. I have consulted with these folks, and it is easier to get concrete guidance on statistical practice.
But still, as multiple comments noted at orgtheory and Gelman’s blog, there are a lot of people with the title “statistician” who do treat issues of model estimation with small samples as an afterthought. This does happen, though maybe not as much as it used to.
Let me conclude this post with a comment about the sociology of the statistics profession. Statistics is a discipline that is analogous to computer science. Computer science can be math, engineering, applied science, or even philosophy (think artificial intelligence). Statistics is the same way. It can be mathematical, applied, or even visual. Consequently, there is no standardized cultural template for what a statistics department is.
Sometimes, statistics lives inside a math department. Sometimes it is distinct. At Indiana, they are trying an interdisciplinary approach where you have stat, math, and social science PhDs in the same unit. Each organizational environment creates pressures for different research.
If you live in a math department, you almost certainly can't get promoted unless you study functional analysis or numerical analysis as applied to statistical issues. That produces people who are probably incapable of interacting with others who aren't interested in the mathematics. Once you have your own department, you diverge from this model. Some statisticians are highly applied, and many PhD graduates get jobs in professional schools and social science programs. These multiple pressures mean that you probably get a wide range of people, some of whom think statistics is just a field of mathematics while others can actually help people with real-world statistical problems.
Here’s a conversation I’ve had a few times with statisticians:
Statistician: ” … and these simulations show how my results work.”
Me: “What does your research tell us about a sample of, say, a few hundred cases?”
Statistician: "That's not important. My result works as n → ∞."
Me: “Sure, that’s a fine mathematical result, but I have to estimate the model with, like, totally finite data. I need inference, not limits. Maybe the estimate doesn’t work out so well for small n.”
Statistician: “Sure, but if you have a few million cases, it’ll work in the limit.”
Me: “Whoa. Have you ever collected, like, real world network data? A million cases is hard to get.”
Statistician: “The Internet is a network with millions of nodes.”
Me: “Sure, but the Internet is one specific network. Most real world networks have hundreds or thousands of nodes. Like a school, or firms that trade with each other. Network data is expensive to collect. Some famous social science papers analyze networks of dozens of people.”
Statistician: “Um… the Internet! Scaling! Big networks! The Internet is a network! Facebook! FACE. BOOK!”
Me (rolls eyes): “What-EVER!”
This illustrates a fundamental issue in statistics (and other sciences). Once you formalize a model and work mathematically, you are tempted to focus on what is mathematically interesting instead of the underlying problem motivating the science. An economist works on another equilibrium theorem rather than, say, taxes. The physicist works on the mathematics of superstring theory, even when the experimental evidence isn't there.
We have the same issue in statistics. “Statistics” can mean “the mathematics of distributions and other functions arising in statistical models.” Or it can mean the traditional problems of statistics like inference, measurement, model estimation, sampling, data collection/management, forecasting, and description. The problem for a guy like me (a social scientist with real data) is that the label “statistician” often denotes someone who is actually a mathematician who happens to be interested in distributions. That’s why they are happy with limit theorems, because limits smooth out hard problems and produce elegant results.
What I really want is a nuts-and-bolts person to help me solve problems. I may tease economists for their bizarre obsession with identification at the expense of all else, but at least identification is a real issue that needs to be taken seriously.
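The kind of small-sample check I'm asking for is easy to describe. Here's a minimal R sketch, with made-up simulation settings, comparing logit coefficient estimates at n = 50 versus n = 5,000: an estimator that is perfectly fine asymptotically can be noticeably biased and noisy in the small sample.

```r
# Simulate a simple logit with true slope = 1 and compare sample sizes
set.seed(42)
sim_logit <- function(n, reps = 500) {
  replicate(reps, {
    x <- rnorm(n)
    y <- rbinom(n, 1, plogis(-0.5 + 1 * x))
    coef(glm(y ~ x, family = binomial))["x"]
  })
}

small <- sim_logit(50)
big   <- sim_logit(5000)
c(mean_small = mean(small), mean_big = mean(big))  # small-n estimates drift away from 1
c(sd_small   = sd(small),   sd_big   = sd(big))    # and are far more variable
```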
Let's say you are doing discrete-time logit event history analysis. You are simply pooling all cases and time periods and estimating a logit, where Y = the failure event. See Yamaguchi's (1991) book, chapter 2.
Question: why don't people use a fixed-effects kind of model, or cluster by case? There may be person-level heterogeneity that you want to account for. One way to address this is to run a logit with fixed effects for each person in the population. Another way is to try to control for within-person correlation over time (i.e., person X's observations at time T and T+K are probably correlated).
This sort of adjustment is standard in panel data. Event history data has the same basic setup and the same issues with correlated errors within cases, but most event history papers (including my own) don't deal with this. Why?
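For concreteness, here is a minimal sketch in R of the two adjustments described above, using a hypothetical person-period data frame `pp` with made-up variable names (id, time, x, failure):

```r
# Minimal sketch: discrete-time event history logit with person-level adjustments
library(survival)   # clogit: conditional (fixed-effects) logit
library(sandwich)   # vcovCL: cluster-robust variance
library(lmtest)     # coeftest

# Pooled discrete-time logit, as in Yamaguchi ch. 2
pooled <- glm(failure ~ x + factor(time), data = pp, family = binomial)

# Option 1: keep the pooled estimates, cluster the standard errors by person
coeftest(pooled, vcov = vcovCL(pooled, cluster = ~ id))

# Option 2: conditional (fixed-effects) logit, stratifying on person
fe <- clogit(failure ~ x + factor(time) + strata(id), data = pp)
summary(fe)
```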
These questions came up during orgtheory training last week. I did not have good answers:
- A lot of performativity research focuses on stock options, less on futures. Why?
- Are there good studies of the performativity of theory that aren't about the economics profession?
My lame answers: 1. Everyone is taught Black-Scholes first, but there's no reason performativity theory couldn't be applied to other types of markets. 2. Economics is the most influential intellectual group with a theory of social behavior that is inaccurate (which makes performativity possible). Post your answers in the comments.
Michael Bishop, of the Permutations blog, has set up a web site to archive R code for Add Health. Rather than have every Add Health researcher reinvent the wheel, he wants to sponsor an open source community that will provide R code.
Over at Salon, Alex Pareene made fun of people who try to guess presidential politics. Fair enough, there are a lot of lame guesses. However, there are patterns. It’s not as hard as you think. Basically, in American politics the *only* people who ever make any headway are people who have/recently had the following positions:
- Vice Presidents
- Governors
- Senators
- Cabinet Secretaries
- Representatives
- Generals
All major party nominees come from this group. Of course, not everyone in the group has an equal chance of winning. Generals only seem to get nominated if they win a big war. In the post-war era, cabinet secretaries and representatives never get nominated, though a few may get a VP nod.
Even among governors and senators, there seems to be a rule that only recently elected leaders have the energy and resources to win. So the guy who's been in Congress for 30 years is unlikely to be the nominee. The governor with a term or two under their belt is in a position to win. So you could probably produce a list of people who have a decent chance of getting the nomination. This list includes 30 or 40 recently elected governors and senators, as well as a few others, like sitting VPs or popular generals.
If you lower the bar and ask who is influential in presidential elections, you’ll find that the pool expands a little bit. Here you get the occasional rich dude who wins a state in the primary (Forbes ’96) or goes independent (Perot ’92), as well as the Representative who fights for a constituency (Chisholm ’72). Sometimes military figures step in. Wesley Clark actually won a state in the ’04 Democratic primary. But still, most of the action is in the recent governor/senator/VP pool.
The only person who has had any real impact in an election without being in this pool is Jesse Jackson, who won 3 primaries in '84 and 10 in '88. He did better than Al Gore, a well-entrenched establishment figure. Jackson represents a political type that is fairly rare in American politics: the social movement leader with a mass following. But that's truly unique – the Civil Rights movement was extremely successful and then transitioned into the Democratic party. Don't expect a similar figure any time soon. Few other movement leaders would have such a strong base that it would trump traditional party politics.
Bottom line: You never know what will happen in presidential politics, but you can come up with a reliable list of eligible bachelors.
Our friend Kieran has a series of posts on his research at Leiter Reports, the leading academic philosophy blog. Aside from writing on economic sociology, Kieran has begun an ambitious project analyzing the way that philosophers evaluate each other. Three posts so far, each well worth reading:
- The overall pattern of department evaluations.
- Descriptive analysis of who does the ratings.
- Specialties and raters.
I’ve seen this project presented in workshops. There is much more and it is very good. Can’t wait to see more posts.
Over at Evil Twin, Nicolai Foss gently chides Bloom and Van Reenen for publishing a paper in the AER proceedings called “New Approaches to Surveying Organizations.” The issue is the validity of survey data versus other types of data:
As a rule register data are not available that can be used to address numerous interesting issues in organizational economics, labor economics, productivity research and so on. Scholars working on these issues have to resort to those softy surveys and interviews that have been the workhorses of business school faculty for decades. This is a new recognition in economics. Case in point: A recent paper by Nicholas Bloom and John Van Reenen, “New approaches to surveying organizations.” There is absolutely nothing, I submit, in this short, well-written paper that would surprise virtually any empirically oriented business school professor (i.e., virtually all bschool professors) to whom this would not be anything “new” at all, but rather old hat.
This is not a critique of Profs. Bloom and Van Reenen at all (on the contrary, it is excellent that they educate their economist colleagues in this way). It is just striking and a little bit amusing, however, that we have had to wait until 2010 until empirical approaches that have been mainstream in management research for decades reach the pages of the American Economic Review.
I agree. In the comments, Bloom argues that he didn't find any papers addressing these issues. This is odd: a lot of the suggestions for surveying organizations make sense, and many are well discussed in the literature on surveying individuals. For example, did they consult Dillman's work? There are also handbooks on surveying organizations, and there's a huge industry of people who study survey bias.
A few additional comments: I have heard multiple economists express survey skepticism. The correct response is that reliability of survey responses varies and some questions are better than others. For example, people seem to be pretty good at reporting health, while they outright lie about attending church. Surveys by themselves aren’t good or bad, but individual questions can be high quality or low quality. Also, a lot of our most important data is from self-reports – like the Census, CPS, HRS, etc. I don’t see people ditching the Census.
Second, the real problem in survey research on organizations isn't bias. It's response rate. There are all kinds of tricks to boost response rates for individuals, but getting people to respond at work (or about work) is really, really hard. And it's miserable for longitudinal work. If Bloom and Van Reenen can produce a solution to low response rates from organizations, I'll be really impressed.
mass media, you so dumb, you can’t count delegates. don’t come to math class, i’ll come to your house and give you an f and save you the trip.
You're seeing a lot of headlines about how Tuesday's results somehow put Santorum back on track. Romney got spanked. It's now a two-man race. Right…
Let’s look at the box and check the rules of the game. You need 1,144 delegates to win the nomination. Ok, so let’s count the pledged delegates awarded on Tuesday.
- Romney: 11 (AL) + 6 (American Samoa) + 9 (HI) + 12 (MS) = 38.
- Santorum: 18 (AL) + 0 (American Samoa) + 5 (HI) + 13 (MS) = 36.
That’s right, Romney actually *expanded* his lead on Tuesday. By a small amount, but he was the actual winner Tuesday.
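A quick check of that arithmetic, using the pledged-delegate figures listed above:

```r
# Pledged delegates awarded Tuesday, as listed above (AL, American Samoa, HI, MS)
romney   <- c(11, 6, 9, 12)
santorum <- c(18, 0, 5, 13)
sum(romney)                  # 38
sum(santorum)                # 36
sum(romney) - sum(santorum)  # Romney nets +2 on the day
```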
We're now in a replay of the 2008 Democratic primary. The delegate math heavily favors the well-organized candidate who won some early states, racked up delegates in ignored states, and avoided delegate blowouts elsewhere. Just as Ohio and Pennsylvania didn't derail Obama after he built that healthy delegate lead in February 2008, losing Southern states isn't going to sink Romney. Toss in winner-take-all states like California, and Romney has an obvious and likely path to victory. As long as Romney limits his losses and keeps adding to his delegate lead, he'll slowly slog to the nomination, and the primary fight will have no effect on the final outcome of the election.
At the Chicago ethnography conference, I saw an excellent presentation by Dan Dohan and Corey Abramson. The ethnography addresses how cancer patients are assigned to potentially life-saving clinical trials. Dan has collected an amazing amount of data on a crucial subject. I want to talk about how he illustrated this data. He used something called “microarrays.”
The concept is simple. The microarray is a two-dimensional grid. Each row is a case. The columns represent variables, clustered according to similarity. The color of each cell represents the value of that variable for that case – red might mean "high on the scale." The combined effect is a visual map of where the action is happening in the data.
This tool was invented by geneticists. In their case, rows are individuals and columns are genes, clustered by similarity. Colors indicate whether (or how strongly) each person expresses the gene, so blocks of intense color pick out clusters of people who share genes of importance. Here's an example from the wiki.
Dan used this technique to illustrate his data. In his case, rows are people and columns are life course events. Colors indicate evaluations of the event, as reported by respondents. I have never seen ethnographic data displayed in this way before. It’s simple and intuitive. It can stand by itself, or be used as a guide for further qualitative or quantitative analysis.
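As a rough illustration of the display itself (this is toy data, not Dohan and Abramson's), here is a minimal R sketch that builds a small case-by-variable matrix of made-up scores and plots it as a clustered heatmap:

```r
# Toy case-by-variable matrix (10 cases, 6 made-up life-course variables)
set.seed(1)
m <- matrix(sample(1:5, 60, replace = TRUE), nrow = 10,
            dimnames = list(paste0("case", 1:10),
                            c("school", "work", "family", "health", "housing", "legal")))

# Base-R heatmap: rows are cases, columns are variables,
# rows and columns are clustered by similarity, color encodes the score
heatmap(m, scale = "none", col = heat.colors(5),
        main = "Cases by variables (toy data)")
```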
In the Q&A, I asked Dan if he thought that ethnography was limited by narrative and verbal description (like vignettes), and whether ethnographers should explore new ways to use all their data. The microarray shows the full power of data gathered through ethnography; maybe there are other powerful ways to display qualitative data. In response, he focused on narrative and said that in the health world people need to hear narrative. I'm not sure. Regardless, I think Dan should run with this. It opens up a whole new world for qualitative research, and it needs to be explored.
“Aren’t we all Wittgensteinians here? Yes?” – Andreas Glaeser
1. Ethnography is watching everything that is the case.
2. Maybe not. It’s hard to see everything, but you don’t see nothing. That’s gotta count for something.
3. Something is better than nothing. That’s useful.
4. Pragmatism is the new black.
5. It gets better than pragmatism. We can be positivist.
6. Positivists like mechanisms, which grandma used to call process.
7. What about Uncle Mack? He doesn’t like abstract discussion, he likes cases and mid-level theory.
8. Lunch was good. I like the sandwiches with pretzel bread.
9. I feel weird when quantizoids show up. We can be friends.
10. Screw that. We’re talking about Bhaskar and critical realism. Fight club ensues.
10.1. I have to read a book on critical realism before someone can explain to me what critical realism is.
10.2. Critical realism means never having to choose a variable, level, or mechanism.
11. Chris Winship mentions orgtheory and scatterplot.
11.1. Margarita and Cassidy tweeted. #inferenceandethnography.
12. You can communicate ethnography with great diagrams.
12.1. If you can illustrate ethnographic data, does that mean ethnographers limit themselves with verbal and narrative forms of data presentation?
12.2. Dan doesn’t take the bait. Ethnography is about narrative. Period.
12.3. Dan is wrong. There is a world where Tufte meets Whyte.
13. Diane Vaughan, patron saint of orgtheory, speaks. Ethnography can generate ideas and mechanisms. I sweat with joy.
14. I am a stranger to a group of people who defined themselves as strangers to other groups. Yet, I am not in that group of strangers. This is called “Rojas’ Paradox.”
15. You make inferences by observing things.
15.1. Talk vs. action.
15.2. Chains vs. correlations.
15.3. Variance within the field site.
15.4. Counterfactuals are out there. Ethnographers can see them.
15.5. Simple modes of inference allow you to talk of these things.
16. Whereof one cannot observe, one must remain silent.
I will be in Chicago on Thursday and Friday for the causal inference & ethnography conference. Please email me if you want to hang out. Most of the time, I will be at the UoC. We can have heady discussions in the Sem Coop. Fri afternoon is flexible. Also, I will be live tweeting (@fabiorojas) the proceedings. Hashtag: #inferenceandethnography. Email/tweet your questions. Will see if I can ask them.
The puzzle for me is not that Romney is facing resistance. Nearly all non-incumbents face resistance in presidential primaries. But still, Romney's problems puzzle me. His opposition is incoherent, underwhelming, and underfunded. Romney has the establishment backing, wealthy backers who can pour millions into Super PACs, and he actually won 11 states in 2008. So why is he now losing states that were slam dunks in 2008?
As usual, the story is complex. A lot of early primary states, like Michigan, have lost the moderate and wealthy Republicans that Romney relies on. Republicans from liberal states are decidedly unpopular with the Tea Party base. Evangelicals probably don't tolerate Mormons.
But there is one factor that has yet to be mentioned – maybe Romney is just really bad at being a conservative politician. Until 2012, he had never been forced to actually talk the talk in any serious way. During his 1994 battle against Ted Kennedy, he talked non-stop about how he wasn't conservative, a theme he picked up again as he ran for governor. In 2008, Romney won 11 states, but he only appeared conservative when compared to John McCain – the guy who thumbed his nose at the GOP base. In other words, a competent flip-flopper like Romney only appears conservative when standing next to someone who actively makes fun of the base.
2012 is the first time that Romney has had to run against other national politicians who are consistently to the right of him. Even though they are waging an uphill battle, they do have compensating factors. Santorum has always been hyper-conservative on social issues, Paul has the libertarian wing, and Gingrich … well … he’s special. Anyway, 2012 is the first time that Romney has had to fight for conservative votes with competitors who are, well, actually conservative. And the lack of experience shows.
I think it's kind of weird for someone to review a single command, but here goes. The most recent version of Stata includes a command/package called "mi impute." It is supposed to be a flexible all-purpose utility for addressing missing data using multiple imputation (e.g., filling in missing data through constrained random draws and then combining the estimates). I've used it on and off since the fall, and I want to talk about my experiences.
First, as with most Stata software, mi impute is rather impressive. When you type "mi impute," you enable a whole package of tools for doing multiple imputation analysis. It's much like "svy," "st," and other commands that allow you to do all kinds of operations needed for specific types of analyses (Cox models, time series, etc.). The documentation is extensive, and the options available would help most run-of-the-mill social scientists, like me.
Second, there are some serious drawbacks. Let's start with speed. Mi impute is very expensive in terms of time. A student of mine recently worked with a very well-known data set with 10,000 cases. The UNIX machines took hours to impute. My desktop will come to a halt for a few minutes doing 5 imputations for N = 690.
Another drawback is that mi impute is very fussy. Once you deviate from continuous, well-behaved variables, mi impute produces a multitude of errors and warnings. It is not entirely obvious that using the various fixes is the correct and proper way to do things.
Finally, as with many multiple imputation methods, you are fairly constrained with what you can do with the final model. Because mi requires you to combine data sets, there is often no confidence interval for the coefficients, which very much limits post-estimation commands.
Overall, I admire mi impute and I’m glad it’s part of Stata 11. But at the same time, the cost-benefit ratio is out of whack. I can get similar and valid answers by using much simpler imputation methods without crashing my machine or making lots of dubious choices with the options.
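For comparison, here is a minimal sketch of the same impute-then-combine workflow in R using the mice package. The data frame and variable names are made up; the point is only to show the shape of the procedure.

```r
library(mice)

# 'dat' is a hypothetical data frame with missing values in income and educ
imp  <- mice(dat, m = 5, seed = 123)                         # create 5 imputed data sets
fits <- with(imp, glm(employed ~ income + educ + age,        # fit the model in each one
                      family = binomial))
summary(pool(fits))                                          # combine estimates via Rubin's rules
```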
We've had some nice discussions of high-quality ethnography. Here's my question: which ethnographies have been responsible for introducing new theoretical ideas into sociology? For example, I do know that early in his career Bourdieu did ethnography, and his early theory was inspired by his field work. What other ideas have been brought into sociology this way? I want to distinguish between ethnography as thick/insightful description (e.g., more details on urban poverty) and ethnography as an argument for a new concept.
From the home office in Hyde Park, guest blogger emeritus Mario draws my attention to a conference that should be of special interest to Midwest ethnographers:
The University of Chicago Urban Network is sponsoring a conference, "Causal Thinking and Ethnographic Research," devoted to understanding the contributions of ethnographic research to contemporary causal thinking and scientific inference. Is counterfactual thinking useful to ethnographers? Does ethnographic research help identify its flaws? Are the deductive methods underlying QCA appropriate to a research endeavor primarily driven by induction and abduction? Do mechanism-based explanations simply push the difficulties of causal inference deeper? What approaches to inference in ethnographic research would constitute a better alternative? Many of the most interesting and promising ethnographers in sociology will be addressing these and other questions.
Here’s the conference website. Check it out.
Shamus Khan’s book, Privilege, makes an excellent point. Elites get relatively little attention in sociology. However, I think this needs an important qualification. Elites get little attention from non-organizational sociology. Folks in stratification, political sociology, and other areas love the little guy. The situation in organizational sociology is the reverse. Elite organizations get tons of attention. Think about how much attention has been paid to firms like GM. Now, think about how much attention is paid to the local auto repair shop.
Why is that? I think it goes something like this…
- Regular people are easy to find and highly accessible. Many are even excited when university researchers contact them. If you ask enough people to be part of your research, you'll get enough participants. In contrast, elite people are highly secretive. First, there are fewer of them. More importantly, they highly value their privacy. They also, in my experience, are more guarded and like to give canned answers. So: commoners – all over and open to discussion; elites – few, and they hide in the Bohemian Grove. There are a few exceptions – celebrities and politicians have a lot of public data about them. But your average hedge fund manager is harder to track down.
- Regular organizations are often hard to find, and they don't like outsiders. Small businesses may be numerous, but they don't have a lot of time to talk to you, and there isn't much public data about them. In contrast, elite organizations often generate tons of public information through litigation, journalism written about them, public filings, and high-profile leaders. If they are publicly traded firms, they disclose a lot. If they are government organizations, there is also tons of information in public archives. And don't forget disgruntled employees and customers – they'll talk to no end about the inner workings of their organizations.
Also, there is a professional incentive. It's glamorous (in sociology at least) to talk about the poor, but less glamorous to talk about the 1%. In organization studies, we care a lot about market leaders and innovators, so we focus on the elites. Shamus' excellent book and the work of Lauren Rivera show an important change among younger researchers. I hope it continues.
The Iowa GOP caucus is on Tuesday. You will win nothing but honor for a correct guess. Nate Silver raises some interesting points about Iowa. In the GOP caucus, according to political scientists, moderates underperform their poll numbers while conservatives overperform. Makes sense: the caucus is a high-commitment political act, which favors intense activists. If you believe the polls "as is," expect a Romney win. But if you weight by ideology, then someone like Paul might win. According to the polls, though, Paul is dropping while Santorum is picking up a little, possibly robbing Paul of the win that a more hard-core conservative might normally get in this contest.
UPDATE: As of 2pm EST, the orgtheory readers rank Romney > Paul > Felin > Santorum. Check in late this evening for an update.
Here's an idea. Let's say that Ron Paul has perfect timing and wins the Iowa caucus, which is on January 3, 2012. It can happen. He's got good organization, which matters in a small caucus state like Iowa, and a strong brand name. People hate the opposition. After Romney, Paul is the only presidential contender with a remotely decent track record. In 2008, he was getting somewhere between 5% and 15% in various primaries and caucuses. He even came in second place in a few states, like Nevada.
Then Paul hits the "Jesse Jackson ceiling." Where Jackson could only go so far on the civil rights coalition in the Democratic party, Paul can only go so far on an ideologically pure libertarian platform in the GOP. Fox News hates Paul, as does the GOP establishment. In a best-case scenario, Paul wins some more libertarian-leaning small states before Romney gets the Northeast, the West, and the Mountain states in some sort of Super Tuesday landslide.
Here's the twist: a semi-successful Paul 2012 run means that there is now a whole network of party activists who love the Paul brand and know the ropes. They're ready to go if Kentucky Senator Rand Paul – Ron's son – wants to run. He's also a fairly pure libertarian in many ways and could easily pick up that wing of the party. If the social conservatives burn out in 2012 and 2016 by running against Democrats during the peak of the business cycle, then the GOP may be ready to let Rand Paul run in 2020, and he might win. The real legacy of Paul's 2012 primary run may be laying the groundwork for a Rand Paul presidency.
I really thought Perry would stampede over Romney. The Tea Party just had to love a conservative Christian Texas governor. However, the evidence goes against my intuition. Social science says that early polls have little value. Endorsements matter a lot more. By this logic, Romney’s weak poll numbers aren’t important. He leads in endorsements and will likely be the GOP nominee. Recently, Perry has been sinking, fast. Social science: 1, Fabio: 0.
Dave Carney, Rick Perry's chief strategist, is the Billy Beane of politics. During Perry's 2006 campaign, he brought in academic experts to run various tests on which campaign practices actually worked and which ones didn't. They found intriguing things that run against conventional wisdom:
1) Media coverage is highly overrated. In the 2010 campaign, Perry didn’t debate Houston Mayor Bill White once or visit any editorial boards. Perry also makes very few television appearances – even Sean Hannity complained that it was hard to book Perry.
2) People make voting decisions based on neighbors organizing and convincing each other, not on media coverage. So while the media attempts to declare his campaign dead, Perry and Carney don't really care.
3) Paid media is only valuable late in the game. The professors' research found that the benefits of television advertisements dissipated after one week, and that direct mail was ineffective altogether. So the Perry campaign believes in saving its resources for a last-minute, huge television ad buy rather than trying to combat the media narrative by wasting money on ads before they drive any votes. This strategy lends itself to a last-minute rise in the polls, rather than peaking too early.
If candidates are playing a new game, then the old findings might not hold. Perry, in this view, is doing an end run around Romney. He avoids the media and focuses on grassroots mobilization. Aside from Romney, Perry is the only candidate who has the money to actually pull this off (grassroots mobilization can be expensive) and the only one who has extensively tested this tactic.
The remaining questions are: (a) Will the immigrant/college tuition issue tank Perry among activists and nullify the strategy? If there's one thing Tea Party activists hate more than government bailouts, it's assistance to undocumented immigrants. (b) The reason that endorsements help candidates is that political leaders share resources that assist with grassroots mobilization – donor lists, telephone lists, voter registration lists, etc. Romney likely already knows this – that's why he continues despite weak poll numbers and dislike from the base – and Perry is already late to the game.
I take a special interest in the Moneyball movie because I used to teach the book in a class. Before I get to the academic comments, I’ll give the movie a thumbs up. It’s a fun movie and, as usual, Brad Pitt puts in a believable performance as a conflicted manager. It’s slow for a modern film, but I liked it. If you are a sports fan and you have a tolerance for chatty films, then you’ll probably like this.
Anyway, the reason I went to see the movie is that for a while I taught IU's course on organizations and work. I used Moneyball to explain two concepts – market imperfections and organizational culture.
Markets are imperfect when buyers and sellers do not incorporate all the available knowledge. Moneyball is really about taking advantage of the fact that most sports team managers don’t use some very basic data to choose players. Organizational culture simply means the shared ideas in an organization that are used to interpret things and motivate behavior. Moneyball is about the conflict between people who think baseball can be successfully quantified and those who think that good coaching should be based on experience and gut feelings.
[link via David Lazer]
Twitter is getting lots of interest from social scientists. Here's a piece from the current issue of Science about how "social scientists wade into the tweet stream" (the figure below is from that article). And here's an NPR piece on a forthcoming Science article by Macy and Golder on affect, mood, and Twitter.
“In particular, when the ratio between number of listeners who tagged “I like” and “I don’t like” is calculated, it can be observed that almost all positively ranked songs carry some form of “universal” values, such as wisdom, compassion, love, peace, …
The visualization consists of a scatterplot, in which the size of the squares correspond to the number of users who tagged that song as a favorite. The Y-axis represents a general consensus that the songs are likeable, or not-likeable.”
The Chronicle of Higher Education has a nice review of recent work that maps scientific citation networks. The image above is a rough map of where everything is. The neat thing is that sociology, according to the data from JSTOR citations, is in the middle of things. I think Jim Moody has work showing the same thing. The article also discusses new techniques and how they can be used to map out scientific specialties as they emerge from citation patterns.
As long-time readers know, I believe that most college rankings are garbage because they use dubious measures of performance and quality. Also, the leading magazines tend to cherry-pick data so that a handful of schools (H/Y/P and Stanford) are always on top. For example, there's an old Slate article on how Caltech, perhaps the most elite science college in the world, routinely gets shafted. Then we get to the issue of bad data. College administrators are often sloppy or dishonest when submitting data for these rankings.
Bad data, favoritism, and a lack of logic. How could it get any worse? You can depend on Newsweek and the Daily Beast to rise to the occasion. They’ve now produced a ranking of the least rigorous colleges. In their own words:
To pick out the least challenging of the nation’s top colleges, we considered schools that admit students with an average Critical Reading/Math SAT score of at least 1250. We then took into account student opinion, quality and quantity of professors (which directly impacts challenge and workload), and drop-out rate. The total score for each school consisted of several components: College Prowler‘s “Most Manageable Workload” score (40%), student-to-faculty ratio (25%; from the National Center for Education Statistics), and an analysis of student-posted evaluations on RateMyProfessors.com (25%; generated by the Center for College Affordability and Productivity, an education think tank). Additionally, we plotted each school’s average SAT score for admitted students against its freshman retention rate (percent of first-years who return the following fall; from NCES) to estimate the degree to which each college’s actual retention rate differed from what the correlation would predict. We took the results as a measure of relative ease or difficulty, and factored this in as 10% of the overall score.
You read that right: RateMyProfessors scores – people voluntarily griping or praising profs. I'd flunk anyone who used that data in an intro research seminar. Some of the data is puzzling. The residual of retention on freshman SATs? Very ambiguous. Retention may be due to many things aside from rigor in the classroom, such as financial aid or the location of the school. Also, schools may teach hard material but allow people to hide or surmount the problems. For example, MIT has a fairly high retention rate because the freshman year is either pass (C or higher) or "no record" (D or F).
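To make that SAT/retention component concrete, here is a rough R sketch of the calculation the quoted methodology describes, using a made-up data frame of schools: regress retention on SAT and treat the residual as the "easier or harder than expected" piece of the composite.

```r
# Made-up data: one row per school, average SAT and freshman retention rate
set.seed(2)
schools <- data.frame(
  sat       = round(runif(20, 1250, 1550)),
  retention = round(runif(20, 0.85, 0.99), 2)
)

# Regress retention on SAT; the residual is retention relative to what SATs predict
fit <- lm(retention ~ sat, data = schools)
schools$ease_score <- resid(fit)   # positive = retains more students than SATs predict

# The quoted methodology weights this piece at 10% of the overall composite score
head(schools[order(-schools$ease_score), ])
```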
The results of the "least rigorous" study? Many top-notch engineering schools, which most observers recognize as very demanding, make the list of the 25 least rigorous schools: Rensselaer Polytechnic, Illinois-Urbana, and UC Berkeley.* Other schools that are not known as easy graders, such as Wisconsin-Madison and Hopkins, make the list too. This isn't fun, lighthearted journalism. It's an embarrassment.
* Disclaimer: I’m a Berkeley graduate. I studied the extremely easy topics of math and engineering. And boy, was it easy!
Cranked up Stata 12. I love the new mi manual – one click and you can read the whole imputation manual. I also like how mi pulls it all together. Question: There’s now, like, a gazillion missing data imputation methods. Any advice on which to pick? What is accepted in different disciplines as a reasonable and justified imputation procedure? What’s standard in soc? Econ? Policy?
Philip Cohen linked to this diagram. Mentions in books, according to Google:
Kieran does the same diagram w/upper case letters.
From Information is Beautiful. Larger circles indicate more popular supplements. The Y-axis indicates the strength of clinical evidence for each supplement – higher is better.