Philip Stark (UC Berkeley statistics) and Richard Freishtat (UC Berkeley Center for Teaching and Learning, which supports undergraduate education) have written "An Evaluation of Course Evaluations."
Stark and Freishtat observe, among other things:
- that nonresponse bias is a serious problem;
- that averaging ordinal variables doesn’t make sense (are a 3 and a 7 on a seven-point scale the same as two 5s? see the sketch after this list);
- that students can more effectively comment on some aspects of pedagogy than others;
- that student evaluations are influenced by student grade expectations, and by instructor gender, age, ethnicity, and attractiveness.
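The ordinal-averaging point is easy to see with a toy example. The sketch below is my own illustration, not something from Stark and Freishtat's paper: two hypothetical classes whose ratings have the same mean on a seven-point scale, even though one class is uniformly lukewarm and the other is sharply polarized. The mean alone can't tell them apart.

```python
from statistics import mean
from collections import Counter

# Hypothetical ratings on a seven-point scale (my own made-up numbers).
class_a = [5, 5, 5, 5, 5, 5]   # everyone circles 5: uniformly lukewarm
class_b = [3, 7, 3, 7, 3, 7]   # polarized: half give 3s, half give 7s

for name, ratings in [("A", class_a), ("B", class_b)]:
    # Both classes report a mean of 5.0, but the distributions differ sharply.
    print(f"Class {name}: mean = {mean(ratings):.1f}, "
          f"distribution = {dict(sorted(Counter(ratings).items()))}")
```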
If anybody should know how hard measurement is, it’s statisticians.
With this in mind, I’m amused to remember that when I was there, teaching evaluation averages by instructor and course were actually posted on a bulletin board outside the departmental office on the third floor of Evans Hall, alongside other information relevant to students. Apparently the department now has a more “holistic” procedure for evaluating teaching; I was not at Berkeley long enough to say how the old process worked in practice. (I was there for two academic years, as a lecturer.)
To be honest, I often found student comments more useful than the numerical scores – but those comments can be difficult to read. The format of the evaluations, and the fact that they’re usually filled out at the end of a class period, seem designed to discourage thorough comments (and Stark and Freishtat point out that comments in evaluations of technical courses tend to be less discursive). And the most critical comments tend to stick in one’s craw, which is only human nature.
Sometimes, though, those comments were “his hair is a mess” and “he dresses lousy”. They were correct, but not helpful in making me a better teacher.