Picking the right metric: from college ratings to the Cold War
Two years ago, President Obama announced a plan to create government ratings for colleges—in his words, “on who’s offering the best value so students and taxpayers get a bigger bang for their buck.”
The Department of Education was charged with developing such ratings, but they were quickly mired in controversy. What outcomes should be measured? Initial reports suggested that completion rates and graduates’ earnings would be key. But critics pointed to a variety of problems—ranging from the different missions of different types of colleges, to the difficulties of measuring incomes along a variety of career paths (how do you count the person pursuing a PhD five years after graduation?), to the reductionism of valuing college only by graduates’ incomes.
Well, as of yesterday, it looks like the ratings plan is being dropped. Or rather, it’s become a “college-rating system minus the ratings”, as the Chronicle put it. The new plan is to produce a “consumer-facing tool” with which students can compare colleges on a variety of criteria, likely including data on net price, completion rates, earnings outcomes, and the percentage of Pell Grant recipients, among other metrics. In other words, it will look more like U-Multirank, a big European initiative that was similarly a response to the political difficulty of producing a single official ranking of universities.
A lot of political forces aligned to kill this plan, including Republicans (on grounds of federal mission creep), the for-profit college lobby, and most colleges and universities, which don’t want to see more centralized control.
But I’d like to point to another difficulty it struggled with—one that has been around for a really long time, and that shows up in a lot of different contexts: the criterion problem.
While this has been going on in the news, I’ve been buried in the 1950s in Santa Monica—at the RAND Corporation (or at least in its documents). The RAND Corporation played a key role in bringing economics to public policy in the 1960s, as the source of the McNamara “whiz kids” who would bring rational, quantifiable social science to first the Defense Department and eventually the whole executive branch. It’s a fascinating story, but not generally the most bloggable.
Except that in the 1950s, as these ideas were being worked out at RAND, they were basically struggling with the same exact problem the Education Department has been having this week. Plus ça change… They didn’t solve it either.
They called it the “criterion problem”. In the 1950s, RAND was conducting what they called systems analysis. During World War II, operations research had become really good at answering questions that optimized decisions at the human-technology interface. If you were going to drop mines in Japanese waters and wanted to minimize your losses along the way, at what times should your pilots fly? At what altitude? In what formation? Data on past expeditions could be used in conjunction with rapidly developing mathematical methods to identify the best solution to such problems. Methods like these, advanced especially in Britain and the U.S., contributed significantly to winning the war.
But by the 1950s, RAND analysts were trying to answer related, but broader, questions of national defense for the Air Force. What, for example, was the most efficient way for the U.S. to deliver nuclear weapons to Soviet territory? RAND used its cutting-edge mathematical techniques to identify the “best” answer: Don’t use fancy new jet bombers. Instead, buy a large number of cheap, slow, turboprop planes, and bomb the heck out of Soviet targets.
Well, this recommendation went down like a ton of bricks. RAND’s recommendation optimized based on a specific criterion: maximizing damage inflicted per dollar spent. But despite the quantity of its calculations and the sophistication of its analysis, RAND had overlooked some really big things. One, it ignored the value of pilots’ lives—which did not go over so well with Air Force brass, most of them former pilots themselves. Two, it went against a deep organizational imperative of the Air Force: to develop exciting new planes.
The failure of this massive project, the result of many man-hours (sic) of work, caused major handwringing at RAND. Much of it was centered around the criterion problem. What should they be optimizing for? What if the best thing to optimize for was hard, or impossible, to measure? What if conflicting goals seemed equally valuable? Should they “sub-optimize”—solve narrower problems, where the criterion problem was most acute—and let the big picture take care of itself? What was quantifiable, and what should they do with important, but unquantifiable, factors?
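The stakes of the criterion problem are easy to illustrate with a toy sketch (all numbers here are invented for illustration, not RAND’s actual figures): the very same data can crown different “best” options depending on which criterion you optimize.

```python
# Toy illustration of the criterion problem. Two hypothetical bomber
# options, scored on the same (invented) data under two different
# criteria. The "best" answer flips with the criterion.

options = {
    # name: (cost per sortie, expected damage, expected pilot losses)
    "turboprop": (1.0, 5.0, 0.30),
    "jet":       (4.0, 12.0, 0.05),
}

def damage_per_dollar(cost, damage, losses):
    # RAND's original criterion: damage inflicted per dollar spent.
    return damage / cost

def damage_net_of_losses(cost, damage, losses, life_weight=50.0):
    # Same data, different criterion: charge a (hypothetical) penalty
    # for each expected pilot lost before dividing by cost.
    return (damage - life_weight * losses) / cost

best_by_dollar = max(options, key=lambda k: damage_per_dollar(*options[k]))
best_by_lives = max(options, key=lambda k: damage_net_of_losses(*options[k]))

print(best_by_dollar)  # the cheap turboprop wins on damage per dollar
print(best_by_lives)   # the jet wins once pilot losses carry a cost
```

Nothing in the data says which criterion is right; that choice is prior to, and unreachable by, the optimization itself—which is exactly the bind the RAND analysts found themselves in.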
Some of the best minds of mid-century social science—Jack Hirshleifer, Armen Alchian, Charles Hitch, Charles Lindblom—published internal papers contributing to this debate. A number (including Hirshleifer, Alchian, and Lindblom) were quite skeptical that the problem was solvable, and thought that systems analysts were, more or less, barking up the wrong tree. But theirs was not the position that won out. Though RAND’s analysts were slightly chastened by this first big failure—Hitch pointed out that “calculating quantitative solutions using the wrong criteria is equivalent to answering the wrong questions” and “may prove worse than useless”—they soldiered on, believing that despite the influence of politics, organizational dynamics, and human psychology, better criteria could improve the decision-making process.
And, for the most part, they succeeded—at least at convincing others, first in the Defense Department and then elsewhere in Washington—that this was the case. There is, in fact, a long but direct line from RAND’s 1950s struggle with the criterion problem to the Education Department’s struggle with how to measure successful college outcomes today. (It is no coincidence that one of RAND’s first non-defense studies was an application of systems analysis to education, by an economist who would go on to play a key role in the War on Poverty.)
The problems today are basically the same. We haven’t really made much progress toward solving the criterion problem—because, as a fundamentally human, not technical, problem, it is not actually solvable. But our hope or faith that we can, in fact, answer it continues to take us down all sorts of interesting roads. A little more understanding of the past might usefully inform these efforts.
(Note: There are a zillion sources on RAND. The list of books in the picture is a good start; I’ll also point to Will Thomas’s new book, Rational Action: The Sciences of Policy in Britain and America, 1940–1960, as well as David Jardini’s Thinking through the Cold War, which, until it was recently self-published, was the best never-published dissertation I’d ever read.)