We are an independent, solution-oriented think tank at Georgetown University's McCourt School of Public Policy.

Exploring The Human Side of Teacher Evaluation

Over the past decade we have learned a tremendous amount about measuring teaching effectiveness, but not enough about how to design the teacher evaluation systems that use these measures. TNTP’s seminal report, The Widget Effect, unmasked for policymakers what practitioners long knew: Teacher evaluation was largely a perfunctory exercise, with few participants taking the process or the data seriously.

The Gates Foundation’s Measures of Effective Teaching (MET) project proved it is possible to measure teaching effectiveness and also demonstrated that school personnel, given proper training, could reliably assess effectiveness using the new generation of classroom observation instruments.

In the wake of MET and with the encouragement of federal competitive grants, states and districts rushed to design and implement new teacher evaluation systems that promised to be markedly different from those denounced in The Widget Effect. These new systems, it was hoped, would strengthen accountability and also help teachers improve their practice.

The new teacher evaluation systems have increased accountability, systematically tying employment to performance for the first time in public education history. They have provided school systems with information to make sounder human capital decisions. They have forced principals to prioritize classrooms over bus schedules and lunch menus, engendering much-needed conversations in many schools about the elements of good teaching.

And the best of the new measurement systems are supplying a foundation for a wide range of new, performance-based teacher roles that are making teaching more attractive.

But using the new systems to improve teaching has proven to be a more challenging task. It is a design problem. Teacher evaluation has been designed to fail at improvement in at least two ways. First, evaluation systems that expect all teachers to simultaneously fulfill accountability and improvement aims fail to recognize, let alone resolve, the tension between accountability and improvement.

Accountability vs. Improvement

Second, teachers bear too much of the burden for teaching improvement, particularly when relevant administrators and central office personnel do not share accountability for teaching effectiveness. These are design flaws because they require teachers to disregard their instinctual, and very human, tendencies to avoid loss and seek fairness.

Teachers will naturally feel conflicted when asked to fulfill accountability and improvement aims simultaneously. Improvement rewards identifying and addressing weaknesses. Accountability penalizes weaknesses. During a classroom observation, the teacher makes a decision to hide or expose weakness.

Exposing weakness may elicit feedback that leads to improvement, but it could also make it more likely that a teacher loses her job. The alternative, hiding weakness, impairs improvement but preserves employment. Given a strong human tendency to give more weight to the risk of loss (behavioral economists call this loss aversion), teachers are apt to hide, rather than expose, weaknesses in their teaching practice.

Teachers also will resent bearing more than their reasonable share of responsibility for improving teaching. Hidden within the oft-repeated mantra “the teacher is the most important in-school factor to student success” is a host of supports that together with the teacher constitute teaching. While the teacher is obviously the primary actor, teaching is buttressed by standards, curriculum, assessments, instructional leadership, professional development, school culture, and on and on.

However reasonable shared accountability may be, those responsible for providing these supports rarely share accountability for the overall quality of teaching. Yet teachers depend on these supports, and their teaching will suffer should the supports fail. Indeed, this has happened in many school systems where teachers were held accountable for student performance on the new Common Core assessments but lacked standards-aligned curriculum materials. When these teachers were held accountable for things outside their control, they rightly judged the system unfair.

Nevertheless, good teacher evaluation system design is possible, despite inherent tensions between accountability and improvement and the difficulties of assigning accountability fairly. It requires abandoning a “one size fits all” design approach in favor of one that addresses the tension between accountability and improvement and distributes accountability fairly among those responsible for teaching quality. Two practical design suggestions follow.

If the purpose of accountability is to prevent irrecoverable harm to student learning, then it is essential to establish what level of learning lag is tolerable and where learning is lagging beyond that level. While the answer to the latter question will never be precise, it will also never be: All teachers, everywhere.

A Better Design

Yet the typical teacher evaluation system pretends that failure could occur anywhere at any time. It just doesn’t happen that way. Data from the MET project, corroborated by data from elsewhere, suggest that sizeable learning lag occurs in no more than 5 to 6 percent of classrooms.

Moreover, using prior teacher evaluation data, school systems would be able to identify nearly all of these instances of unacceptable learning lag by focusing data collection for accountability on the bottom 15 percent of classrooms.

The other 85 percent of teachers should focus on improvement and not be hampered by accountability concerns. The solution is to place individual teachers in either the accountability system or the improvement system, but not both. These two systems, while using the same high quality measures and rigorous measurement standards, will necessarily use the data for different purposes.
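To make the split concrete, here is a minimal sketch of how a school system might place each teacher in exactly one of the two systems. It assumes a single prior evaluation score per teacher, where higher is better; the function name, the score format, and the tie-breaking behavior are all hypothetical illustrations, not a description of any existing district's method.

```python
def assign_systems(prior_scores, accountability_percent=15):
    """Place each teacher in exactly one system, never both.

    prior_scores: dict mapping a teacher id to a prior evaluation score
    (higher is better). This input format is an assumption for illustration.
    accountability_percent: share of teachers, ranked from the bottom,
    routed to the accountability system (15 in the article's proposal).
    """
    # Rank teachers from lowest to highest prior score.
    ranked = sorted(prior_scores, key=prior_scores.get)
    # Integer ceiling of n * percent / 100, avoiding floating-point rounding.
    cutoff = (len(ranked) * accountability_percent + 99) // 100
    accountability = set(ranked[:cutoff])
    return {
        teacher: "accountability" if teacher in accountability else "improvement"
        for teacher in prior_scores
    }


if __name__ == "__main__":
    # Twenty hypothetical teachers with scores 1 through 20.
    scores = {f"t{i}": i for i in range(1, 21)}
    for teacher, system in assign_systems(scores).items():
        print(teacher, system)
```

Both systems would still draw on the same measures; only the use of the data differs by placement.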

If many factors that impact teaching quality lie outside of teachers’ control, then it makes sense to hold school systems responsible for establishing a sufficiently high level of average teacher performance and hold individual teachers to account only when their individual performance is demonstrably different from average. This fairly assigns responsibility to individuals to perform within a tolerable range of the system’s average.

Imagine that the school system sets a learning lag of two months below the system average as unacceptable and a learning gain two months above the system average as commendable.

In such a case, the teachers whose students produce learning gains within two months of the system average are performing as expected. While these teachers perform as expected, the result is not necessarily effective. Average teaching quality may be too low to help students succeed.

Even so, the system needs to take responsibility for the average level of teaching. Teacher performance will need to keep up with the improving system, but average performing individual teachers should not be blamed for a failing system.
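The two-month band in this example can be written as a simple rule. The sketch below assumes learning gains are expressed in months and that a result exactly two months from the system average counts as crossing the threshold; the article does not specify how the boundary is handled, so that choice, like the function and parameter names, is an assumption.

```python
def classify_gain(teacher_gain_months, system_average_months, band_months=2.0):
    """Label a teacher's result relative to the system average.

    All arguments are in months of learning. The band width of two months
    comes from the article's example; treating a difference of exactly
    two months as outside the band is an assumption made here.
    """
    difference = teacher_gain_months - system_average_months
    if difference <= -band_months:
        return "unacceptable"   # individual accountability applies
    if difference >= band_months:
        return "commendable"
    return "as expected"        # the system answers for the average itself


if __name__ == "__main__":
    # Hypothetical system average of nine months of learning per year.
    for gain in (6.5, 8.5, 9.0, 11.5):
        print(gain, classify_gain(gain, system_average_months=9.0))
```

Note that "as expected" says nothing about whether the average itself is high enough; under this design, that question belongs to the school system rather than to the individual teacher.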

These two remedies, disentangling accountability from improvement and extending accountability for teaching quality to administrators and central office personnel, should help teachers experience evaluation as more helpful and fair. It is time to attend to the human side of teacher evaluation.

Steve Cantrell is former head of evaluation and research at the Bill & Melinda Gates Foundation where he co-directed the Measures of Effective Teaching (MET) project and led the foundation’s internal measurement, learning, and analytics.