Student evaluations are biased against professors teaching quantitative courses – Author interview with Bob Uttl

Today we published “Student evaluations of teaching: Teaching quantitative courses can be hazardous to one’s career” by Bob Uttl and Dylan Smibert. The study used 14,872 publicly posted class evaluations, with input from over 325,000 students, to identify differences in student evaluations of teaching between quantitative and non-quantitative subject areas. Their analysis demonstrates a bias against professors teaching quantitative courses. The study also finds that professors teaching quantitative vs. non-quantitative courses are far less likely to receive tenure, promotion, and/or merit pay when their performance is evaluated against common standards.

These findings have substantial implications for professors teaching quantitative courses, especially at a time when performance metrics have such a strong influence on university decision-making. Here we interview the corresponding author, Bob Uttl, about the apparent bias against professors teaching quantitative courses and what colleges and universities can do to be more transparent about the way Student Evaluations of Teaching are used.

PeerJ: Can you tell us a bit about yourself?

Bob Uttl: I am a professor of Psychology at Mount Royal University (MRU), a midsize undergraduate university in Calgary, Alberta, Canada. I am a cognitive psychologist with main research interests in memory, ageing, assessment, and psychometrics. In any given year, I typically teach statistics, advanced research methods, psychometrics, and cognitive psychology courses. I have also accumulated a lot of personal experience with Student Evaluation of Teaching (SET) ratings: as a faculty member whose “teaching effectiveness” was evaluated primarily if not exclusively by SET scores, as a member and later chair of promotion and tenure committees, and as a co-chair and later chair of the Mount Royal Faculty Association’s Faculty Evaluation Committee. As a result of these experiences and my background in research methods, assessment, and psychometrics, I have become interested in the uses and misuses of SET ratings.

PJ: Can you briefly explain the research you published in PeerJ?

BU: My former students and I (Uttl, White, & Wong Gonzalez, 2016) recently showed that the widely accepted evidence for SETs’ validity as a measure of professors’ teaching effectiveness – the meta-analyses of so-called multi-section studies by Cohen (1981), Feldman (1989) and Clayson (2009) – is in fact evidence of their lack of validity. We re-analyzed these previous meta-analyses and also completed a new, up-to-date meta-analysis of the multi-section studies. We found zero correlation between SETs and student achievement/learning once small-study effects and students’ prior ability and knowledge were taken into account.

Image credit: Sebastiaan ter Burg (Wikimedia Commons CC BY)

Our research published in PeerJ examines the validity of SETs from a different angle: it examines whether SET ratings depend on the courses professors are assigned to teach (quantitative vs. non-quantitative) and quantifies the impact of teaching quantitative vs. non-quantitative courses on high-stakes personnel decisions. Our results show that professors teaching quantitative vs. non-quantitative courses are far less likely to receive tenure, promotion, and/or merit pay when their performance is evaluated against common standards. Although lower SETs for professors teaching quantitative vs. non-quantitative courses are not, by themselves, evidence that SETs are biased, other well-established findings suggest that these lower SETs are due to factors unrelated to professors’ teaching effectiveness, including students’ lack of basic numeracy, lack of interest in quantitative courses, and math anxiety.

PJ: Do you have any anecdotes about this research?

BU: The way SETs are sometimes used is nothing short of astonishing to anyone who understands key concepts such as precision, central tendency, and standards. To illustrate, one department’s personnel committee concluded that a professor was unsatisfactory and not worthy of promotion and tenure because the professor’s SET score was 0.01 below the department’s mean SET rating of 4.25 on a 5-point Likert scale. Pushed to its logical conclusion, this department would have to fire approximately 50% of its professors every year, and sooner or later it would run out of professors to hire, since the department mean would ratchet upward toward 5.00 and no one would be able to exceed it.
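
To make that arithmetic concrete, here is a deliberately simplified simulation (with invented scores, not data from the study) of what happens when “at or above the department mean” is treated as the bar for satisfactory performance: roughly half the faculty are flagged each cycle, and the mean can only drift upward toward the scale ceiling.

```python
import random

def winnow(scores, cycles):
    """Each cycle, compute the department mean and dismiss everyone below it."""
    means = []
    for _ in range(cycles):
        mean = sum(scores) / len(scores)
        means.append(mean)
        scores = [s for s in scores if s >= mean]  # only at-or-above-mean survive
    return scores, means

# 20 hypothetical professors with SET scores between 3.5 and 5.0
random.seed(1)
faculty = [round(random.uniform(3.5, 5.0), 2) for _ in range(20)]
survivors, means = winnow(faculty, 5)
print(f"started with {len(faculty)} professors, left with {len(survivors)}; "
      f"mean rose from {means[0]:.2f} to {means[-1]:.2f}")
```

The survivors of each cut are, by construction, at or above the old mean, so the new mean is always at least as high; no standard defined this way can ever be met by the whole department.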

Image credit: University of Liverpool (Flickr, CC BY)

PJ: What kind of lessons do you hope universities take away from the research?

BU: Today, nearly all colleges and universities in many countries ask students to evaluate the teaching effectiveness of their professors using SETs. SET scores are then used to make high-stakes personnel decisions about faculty, including hiring, firing, re-appointment, promotion, and merit pay. Administrators, evaluation committees, the public, and educational policy makers need to realize that students’ perceptions of teaching, as measured by SETs, are not valid measures of professors’ contributions to students’ learning and instead reflect

  1. students’ characteristics including prior abilities, knowledge, interests, and motivation;
  2. situational characteristics including course subject (e.g., quantitative vs. non-quantitative), class size, class level, class time, and class physical environment (e.g., room layout, external noise);
  3. course events including the number of students caught plagiarising course work (and subsequently evaluating the professor who called them on it); and
  4. professor attributes that have nothing to do with professors’ teaching abilities such as hotness, accent, and perceived approachability.

Colleges and universities that continue to insist that SETs are valid measures of professors’ teaching effectiveness, despite all evidence to the contrary, ought to be clear and transparent in their hiring and evaluation policies. For example, they ought to state in their hiring ads that applicants must not have a foreign accent, must be “hot”, and must have facial features conveying high approachability. To increase hiring efficiency and to avoid the costs of later firings, applicants should be required to provide voice samples as well as professional full-body and head colour photos of themselves, so that hiring committees can assess each applicant’s accent, hotness, and facial approachability and screen out applicants who do not meet these job requirements. If that seems discriminatory and contrary to public policy, these colleges and universities should reconsider the use of SETs in evaluating professors’ teaching effectiveness.

PJ: How did you first hear about PeerJ and what persuaded you to submit to us?

BU: I had published in PLOS ONE, and I became aware of PeerJ when Pete Binfield visited Mount Royal University and gave a talk about open access journals. At that time, I already knew that he had previously run PLOS ONE. There were several key factors in our decision to publish in PeerJ. First, we wanted to publish in an open access journal, given that our findings are of interest to thousands of professors as well as to the general public interested in educational issues. Second, PeerJ’s publication fees are substantially lower than the alternatives. Third, PeerJ has a speedy review process.

PJ: How was your publishing experience with us and did you enjoy the process?

BU: Our experience with PeerJ has been flawless except for one little hiccup: our manuscript was initially rejected without review as out of scope, but that decision was promptly (within hours) reversed. We were very impressed with the speed of the response to our request for clarification, and with the detailed, reasoned answer and the reversal of the decision received directly from Pete Binfield.

I was also very impressed that PeerJ provides copy proofs prior to publishing; I believe it is extremely important and useful for authors to have one more look and make sure that no significant changes or mistakes were introduced into their paper by the typesetting process. We also appreciated the optional open peer review, which we believe facilitates both transparency and accountability.

PJ: How would you describe your experience of our submission/review process?

BU: Very easy and efficient. The submission platform is easy to use. The reviews were fast. The PeerJ staff responded relatively quickly to one issue that arose during the final submission.

PJ: Would you submit again, and would you recommend that your colleagues submit?

BU: Yes, I will submit again and I am recommending PeerJ to my colleagues.

You may also like...

  • Jeremiah Flintwinch

    Interesting stuff. Did you also look at differences in ratings for elective vs. compulsory courses? Being forced to take “Stats 101” vs. an elective “Applied stats for research” course will likely also greatly influence ratings.

    • Catherine Waddell

      I agree. My uncle used to be a professor, and I read some of his online reviews. Students who took his 100-level class thought he was boring; students majoring in his department who took one of his upper-level courses thought he was smart and engaging.

  • sapp56345

    We never expected this kind of problem in education. To develop the field further, mutual solutions are needed, and everybody needs to work toward them.