orgtheory.net

student evaluations are garbage and so are letters of recommendation – but NOT gre scores, haters!

For a long time, I believed that student evaluations were valid measures of teaching effectiveness. My belief rested on the following points.

  • First, there are a fair number of studies that claim a correlation between student evaluations and learning. The critics conveniently overlook this literature.
  • Second, I believed that students can spot a miserable teacher. You don’t need to be steeped in pedagogy theory to see if an instructor is disorganized, or is simply a horrid lecturer.
  • Third, most complaints about student evaluations seemed pretty self-interested. Who complains about evals the most? The professors!* Doesn’t mean they are wrong, but one should examine self-interested claims with some caution.
  • Fourth, critiques of student evaluations of faculty are often couched in bad logic. For example, if an instrument is biased against group X, it doesn’t mean automatically that the instrument is not consistent or valid. It might be the case that the instrument is less valid and consistent for group X, but still points in the right direction. You can only say that student evaluations are “worthless” if the correlation between evals and learning is zero and that is a stubbornly empirical point. Yet, critics in the popular media jump from bias to a lack of validity.

But over time, a parade of better studies has explored the link between outcomes and evaluations, and the answer is often null. So what should any seriously interested person do? Wrong answer: cherry-pick studies that confirm one’s belief. Better answer: look for a meta-analysis that combines data from new and old studies. This fall, Studies in Educational Evaluation published one such meta-analysis of student evaluations of teaching. Bob Uttl, Carmela White, and Daniela Wong Gonzalez performed the meta-analysis and came to the following conclusions:

• Students do not learn more from professors with higher student evaluation of teaching (SET) ratings.

• Previous meta-analyses of SET/learning correlations in multisection studies are not interpretable.

• Re-analyses of previous meta-analyses of multisection studies indicate that SET ratings explain at most 1% of variability in measures of student learning.

• New meta-analyses of multisection studies show that SET ratings are unrelated to student learning.
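The “at most 1% of variability” figure in the re-analyses above is just a squared correlation. As a minimal sketch of that arithmetic – using hypothetical correlation values, not figures from the Uttl et al. paper:

```python
def variance_explained(r: float) -> float:
    """Fraction of outcome variance explained by a correlation r (i.e., r squared)."""
    return r * r

# Hypothetical correlations, chosen only to illustrate the arithmetic:
for r in (0.1, 0.3, 0.5):
    print(f"r = {r:.1f} -> variance explained = {variance_explained(r):.0%}")
```

A correlation near 0.1 – which can sound respectable – still leaves about 99% of the variance in learning unexplained, which is why “non-zero correlation” and “useful for evaluating teachers” are different claims.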

Their article is not perfect, but it is enough to make me seriously reconsider my long-standing belief in student evaluations. I am very willing to consider that student evaluations are garbage.

However, I want the reader to be consistent in their intellectual practice. If you believe that student evaluations are bunk, then similar evidence suggests that letters of recommendation are garbage as well. Here is what I wrote two years ago:

I slowly realized that there are researchers in psychology, education, and management dedicated to studying employment practices. If we demanded all these letters and tolerated all these poor LoR practices, then surely there must be research showing the system works.

Wrong. With a few exceptions, LoRs are poor instruments for measuring future performance. Details are here, but here’s the summary: as early as 1962, researchers realized LoRs don’t predict performance. Then, in 1993, Aamodt, Bryan and Whitcomb showed that LoRs work – but only if they are written in specific ways. The more recent literature refines this – medical school letters don’t predict performance unless the writer mentions very specific things; letter writers aren’t even reliable – their evaluations are all over the place; and even in educational settings, letters seem to have a very small correlation with a *few* outcomes. Also, recent research suggests that LoRs are biased against women in that writers are less likely to use “standout language” for women.

The summary from one researcher in the field: “Put another way, if letters were a new psychological test they would not come close to meeting minimum professional criteria (i.e., Standards) for use in decision making (AERA, APA, & NCME, 1999).”

 

If you are the type of person who thinks student evaluations are lousy, then you should also think letters of recommendation are garbage. To believe otherwise is simply inconsistent, as the evidence is similar in both cases.

While I am at it, I also want to remind readers that similar analysis shows that standardized tests are actually not bad. When you read the literature on standardized tests, like the GRE, you find that standardized test scores and grades are actually correlated – the intended purpose. And I haven’t seen many meta-analyses that overturn that point.

To summarize: student evaluations and letters of recommendation are bunk, but standardized tests are not.

50+ chapters of grad skool advice goodness: Grad Skool Rulz ($4.44 – cheap!!!!)/Theory for the Working Sociologist (discount code: ROJAS – 30% off!!)/From Black Power/Party in the Street / Read Contexts Magazine– It’s Awesome!

*For the record, my evals range from slightly below average to very good. And I’ve actually won multiple teaching awards. So this is not a “sour grapes” issue for me.

 

Written by fabiorojas

January 5, 2018 at 4:53 am

3 Responses


  1. Unfortunately, student evaluations are not used with learning as their primary goal. They are mostly used as a customer-satisfaction tool. The real problem is considering and treating students as customers.


    Richard Diebenkorn

    January 5, 2018 at 11:28 am

  2. Course evaluations certainly don’t measure teaching effectiveness, just student experience. Letters also don’t measure a candidate’s potential. GREs measure grades? Sure, but does that correlate with success in PhD programs?


    cwalken

    January 5, 2018 at 7:46 pm

  3. Evaluations may not be useful as a metric to evaluate the quality of instructors, but I often get useful and constructive feedback in the qualitative portions. If a lot of students say something did or didn’t work well, that’s useful to know.


    JPD

    January 5, 2018 at 11:58 pm

