“The Widget Effect,” a 2009 report published by TNTP, showed that principals’ subjective ratings of teachers in the context of performance evaluation tend to be incredibly inflated. That high-stakes decisions such as continued employment depend on such ratings is an important part of this story, but a new study by researchers Jason Grissom of Vanderbilt University and Susanna Loeb of Stanford University offers some welcome nuance by combining data from principals’ ratings in both high- and low-stakes venues with value-added estimates of teachers’ impact on student achievement.
Their article in the journal Education Finance and Policy addresses three questions. First, how are principals’ teacher ratings distributed, and does this vary with the stakes attached? Second, how well do the high- and low-stakes ratings track one another, and how do they stack up against value-added estimates? Third, what principal and teacher characteristics predict differences, positive or negative, between a teacher’s high- and low-stakes ratings?
Grissom and Loeb find that the high- and low-stakes ratings are moderately correlated with one another, though the correlation masks important differences in the absolute ratings—even teachers rated “very ineffective” on classroom effectiveness in the low-stakes setting receive scores averaging above 3.0 (“effective”) in the high-stakes setting. Each of the ratings tracks value-added estimates similarly, with stronger correlations in math than in reading. Teacher and principal characteristics do not help explain differences between high- and low-stakes ratings, but there seems to be some “idiosyncrasy” or bias behind part of the differences. Some principals, for example, are prone to inflating novice teachers’ high-stakes ratings.