student evaluations work


A few days ago, my esteemed co-blogger Brayden wrote the following in a post about university teaching:

“I almost finished this post with a rant about the ineffectiveness of student evaluations in measuring teaching quality (coming from someone who has no complaints about ratings received in the past), but I don’t think it’s worth bringing up again as almost everyone I know in academia seems to agree that student evaluations are poor measures of teaching quality.”

Is this true? Answer: Not really. There’s actually a healthy scholarly and non-polemical literature showing that student evaluations of faculty (SEF) aren’t half bad. Michael Huemer (“I have the best job possible: I am a professional philosopher”), a philosophy professor at the University of Colorado at Boulder, has written a nice essay summarizing the empirical research on SEFs. Here are the main points I take away from Michael’s essay:

  • SEFs are reliable. Instructors receive similar ratings from different students.
  • They also seem to have some validity. A few studies have tested students from calculus course sections taught by different instructors; SEFs correlate with the test scores at about .4 to .5.
  • Evaluations from trained observers or other faculty are less reliable than SEFs, and Huemer notes that reliability is a precondition for validity.
  • Evaluations correlate with easy grading, race/gender of instructor (not addressed by Huemer, but it’s true according to other research), and the charisma of the professor.
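
For readers who want to see what a validity correlation of this kind involves, here is a minimal Python sketch. It assumes each course section has a list of student ratings and a list of scores on a common test, and it correlates the section averages. The section names and all numbers are made up for illustration and do not come from Huemer’s essay or the studies he cites; the function names `pearson` and `section_validity` are hypothetical as well.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


def section_validity(sections):
    """Correlate each section's mean rating with its mean test score.

    `sections` maps a section name to a pair:
    (list of student ratings, list of common-test scores).
    """
    mean_ratings = [mean(ratings) for ratings, _ in sections.values()]
    mean_scores = [mean(scores) for _, scores in sections.values()]
    return pearson(mean_ratings, mean_scores)


# Hypothetical data: five sections of one course, each with its own instructor.
sections = {
    "A": ([4.5, 4.2, 4.8], [82, 88, 79]),
    "B": ([3.1, 3.6, 3.3], [70, 75, 68]),
    "C": ([4.0, 3.8, 4.1], [85, 80, 78]),
    "D": ([2.9, 3.2, 2.7], [72, 74, 77]),
    "E": ([3.9, 4.4, 4.0], [76, 81, 74]),
}

print(round(section_validity(sections), 2))
```

The sketch works on section averages rather than individual students because the question is whether better-rated instructors have better-performing sections, not whether happier students score higher.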

The way I interpret SEF research:

  • Students aren’t dumb. They know when they are learning something. They may not articulate their views terribly well, but they seem to know when the class is working. It shouldn’t surprise us that class surveys capture these views.
  • Students may like their professor because of their charm, their race/gender and lax grading standards. I would guess that liking someone means you learn more from them. Ever take a math course from an instructor you hate? I bet you didn’t learn much!
  • SEFs are useful but they can be manipulated. You can give easy grades or otherwise pander to your class. Some effects – like race and gender – can’t be manipulated so easily. Teachers can also improve SEFs by doing simple things like preparing lectures well, speaking clearly, showing up on time, returning papers quickly and using transparent grading standards.

In the end, SEFs are here to stay. They may not be perfect, but they work decently and there is no obvious alternative. Instructors can take heart as well. Though you can’t change some features of your teaching (like your gender), people do respond well if you are enthusiastic, fair and well prepared. And that’s not bad!

Written by fabiorojas

March 8, 2007 at 6:28 pm

Posted in academia, education, fabio

7 Responses


  1. So, studies show that students who like the class also tend to get good grades in the class. This seems to be the best evidence we have that SEFs actually measure learning outcomes. I’m skeptical, though. Couldn’t it just be that students who get good grades feel more positively about their professors than students with bad grades? I have a friend who told me that he’s going to start grading his freshman course more leniently because he was getting attacked in the comments section of the SEF. What a horrible incentive to dumb down the material!

    My critique of student evaluations is based more on the face validity of the measures. If we want to know how much students learn in a class, then we should probably try to measure those learning outcomes more directly, rather than relying on measures that are confounded by a number of other factors that are mostly unrelated to learning outcomes. As to your point that we learn more from those we like: this may be partly true, yet during my time as a student I had some professors that I didn’t necessarily enjoy on a personal level but that taught me a lot. Based on my experiences, the teachers that challenged the students the most to learn were not necessarily the most likable.



    March 8, 2007 at 7:59 pm

  2. “In institutionalized organizations, then, concern with the efficiency of day-to-day activities creates enormous uncertainties. Specific contexts highlight the inadequacies of the prescriptions of generalized myths, and inconsistent structural elements conflict over jurisdictional rights. Thus the organization must struggle to link the requirements of ceremonial elements to technical activities and to link inconsistent ceremonial elements to each other” (Meyer and Rowan 1977: 356).

    Do we really want to reduce our hallowed teaching activities to some mundane technical definition of efficiency? Really, we are not calculus teachers. We are supposed to convey material, but also to challenge, deepen and, overall, upgrade our students’ critical thinking capacities. It is unclear whether we know half the time whether we are effective in such ill-defined endeavors…are we really going to let THEM decide? No freakin’ way! :)



    March 9, 2007 at 1:26 am

  3. “Couldn’t it just be that students who get good grades feel more positively about their professors than students with bad grades?”

    Michael’s page describes one of the few experiments to address this issue. One of the studies cited was an analysis of how different course sections did on a test administered at the end of class. Presumably, evaluations were taken before the end and grading was not so dependent on the course section leaders. In that case, it would be hard to argue that good grades were directly responsible for good evals.

    I think the simplest explanation is probably right: students can spot teaching talent but they also respond to pandering such as lowering standards. Two simple factors probably account for a lot of SEF variation. Is that really such an implausible model of SEFs?

    Empirically, how do we see most profs respond to low evaluations? (a) Be better teachers and (b) be easier graders/ make the class more fun. Makes sense to me.


    Fabio Rojas

    March 9, 2007 at 1:35 am

  4. I agree with Fabio’s assessment. SEFs are effective and very blunt. What bugs me is how often they are treated as thoroughly unproblematic. Does a 4.5 versus a 4.2 mean something?

    And, the lack of transparency…

    I have no idea what other faculty get, how departments differ, how SEFs vary with required versus elective classes, or how they vary with teacher experience (if I teach a required class for the first time, is there a generally lower evaluation?).



    March 9, 2007 at 2:19 am

  5. I’m thoroughly convinced that grades affect student evaluations of teaching. That’s not all there is to it obviously, but it can have a big effect–big enough to move you from the top of the heap to the bottom.

    Not to brag, but I consistently get high teaching scores on student evaluations. One semester, I decided I wanted to challenge my students a little more and try to induce them to work a little harder on their papers. This was maybe my 6th time teaching the class and I’d had very consistent teaching scores across all prior offerings. It’s a big class with 350 students and so this wasn’t a case of fluctuation from just a couple of people. I taught the class the same as always, except for one thing. I lowered the grades on the first paper by about a full letter grade. (Given grade inflation around here, that’s probably where they should have been in the first place, but that’s another topic altogether).

    That did two things–the second set of papers was WAY better than the first one. And, I got the lowest teaching evaluations I had received in any class since I started teaching. You may not want to try such an experiment prior to tenure, but it might prove interesting.


    Dan Myers

    March 9, 2007 at 3:58 am

  6. Great, that little experiment of yours, Dan, sounds like what I’ve been trying in the classroom this semester.



    March 9, 2007 at 6:27 am

  7. Maybe SEFs could be improved by somehow taking student reputations into account as well?

    And has academia ever tried anything like the 360-degree reviews some companies do, where you review your subordinates, your superiors, and your peers?



    March 14, 2007 at 12:13 am

Comments are closed.
