
Q&A: Backes on New Ways of Measuring Teachers’ Impact on Student Success

In research and in some school district evaluation systems, teacher effectiveness is measured by teachers’ impact on student test scores. But in the past decade, researchers have started finding that teachers add value on other success factors. In a pair of studies, a team from the National Center for Analysis of Longitudinal Data in Education Research looked at how teachers influence both what are known as non-test outcomes (absences, suspensions, course grades and grade retention) and school climate. FutureEd Associate Director Phyllis Jordan spoke to American Institutes for Research economist and co-author Ben Backes about the results of the studies and their implications for school districts and states.

What did you find about teachers’ impact on the indicators for student success beyond test scores?

We found that some teachers do a better job influencing the non-test factors, and others have a greater impact on test scores. Both types of teachers affect which colleges students attend several years later. But the teachers who contribute to higher student test scores have a greater impact on whether students at the top of the academic scale attend a more selective college. The teachers affecting non-test measures are more likely to impact whether students graduate from high school and enroll in college of any sort.

Why do you think that is?

This is really the black box of every type of teacher measurement that we have. We know that some teachers raise test scores, but we don’t know what they’re doing differently. And that is just as true of non-test outcomes. That’s something we want to explore further—what teachers are doing differently, which types of teachers are going to be better at which types of things, and why their [test and non-test] skills don’t overlap.

You did a second study looking at teachers’ influence on school climate and how it affected student outcomes.

The climate study is based on the VOCAL student surveys that Massachusetts collects. There are three categories: whether students feel engaged at school, whether they feel safe at school, and the instruction, mental-health support, and discipline of the school. Massachusetts computes scores on these, and we have various subscores to compare.

The survey asks about school climate, not teachers. But there are questions we would expect to be under the influence of teachers. For example, they ask about how students feel about relationships between students and teachers. We expect that some teachers might be better than other teachers at making students feel safe and comfortable when they come to school.

Did you find that was true?

We found that students in some classrooms report a better climate than those in other classrooms, what we call “climate value-added.” The effect is larger in elementary classrooms, which are usually self-contained. And it’s larger for certain aspects of school climate, including classroom environment and relationships with school personnel, rather than safety. Climate value-added has the smallest effect on physical safety and bullying.

Does this translate into better outcomes for the students?

Yes. Primarily on test scores and course grades, surprisingly. I would have guessed that would show up on the non-test factors, disciplinary incidents and things like that. I would have thought that was more connected to school climate—your perception of how safe you feel at school, whether you’re connected to the adults on staff at your school. But it turns out that that’s not the case.

Did you find any variations among different groups of students?

We find that Black students assigned to Black teachers report better school climate than Black students assigned to other teachers. In addition, we find that teachers identified by students of color as contributing to positive school climate also reduce disparities in educational outcomes between students of color and White students.

Given what you found, how should schools use this information? Does it make sense to evaluate teachers based on their influence on non-test factors or school climate issues?

That’s one of the big questions that comes out of this: How can we actually use this information? For research purposes, there are a lot of ways that you could apply this. For example, if you had professional development for teachers and wanted to see if they improved, you could look at whether they improved not just test scores but these other outcomes. That’s a broad research application. It’s harder to use this information to say something about individual teachers. There are a lot of problems with using just one year of student outcomes, and we find non-test results are even more variable from year to year.

So you’d want to look at several years of data to get a clear picture?

You need to observe the same teacher over a number of years (research suggests at least three) to have any degree of confidence that you’re actually learning something about how well they raise test scores or other outcomes for their students.

Is there a concern that some of these non-test aspects could be manipulated? For instance, if it’s a good indicator that you have students with high grades or low suspension rates, then that’s something a teacher would control.

Yes, a thousand percent. Even at a school level, if you want to use this for school accountability, and you told schools that they’re going to be graded based on how often they suspend their students, we can probably guess what’s going to happen. It may or may not be a good thing if the students are suspended at a lower rate, but the measure would probably stop being as useful as it is now if schools or teachers were held accountable based on these measures.

So what is the best way to use this information?

It can be important for research purposes, for example if you have an intervention and want to know if it’s working, or you want to know what matters about some characteristic of a teacher: their background, how they were trained, or their route into the profession.

These are questions where we’re using information from a lot of different teachers pooled together to learn something about the teacher workforce, whether it’s how quickly teachers get better over time as they gain experience or whether teachers from alternative routes into the profession are more or less effective. All these are questions that traditionally we have answered using test-based value-added measures; it’s an easy next step to start incorporating the non-test and climate measures.

Where it gets harder is how to make this useful for anybody who’s not a researcher. If you’re a state official and you’re in charge of accrediting teacher-preparation programs and you want to know whether graduates of certain programs were better than others, you could extend that analysis to non-test or other outcomes. Again, these are broad aggregations of teachers. You’re not trying to say this teacher is better than that one.

Could this information be used for student assignment? For instance, if there’s a student who’s on track and is trying to go to a selective school, would it be better to put them with the teacher who’s good at improving test scores? Or to put someone who needs more support with another kind of teacher?

That’s definitely what the research suggests. If you’re a student on the high end of the test-score distribution, your outcomes are likely to be better if you have a teacher who has higher test-based value-added scores. The opposite is true for the non-test measures. If you’re a student who is at risk of not finishing high school or not going to college, you’re better off with a teacher with high scores on non-test value-added measures.

What’s your next step? Are you going to try to figure out why some teachers are more successful than others in improving school climate and non-test indicators of student success?

This is part of a broader project we’re doing with Massachusetts, trying to understand how teacher-training programs shape value-added qualities in novice teachers. We’re also looking at how the match between mentor teachers and novice teachers in the induction process affects these indicators of student success.

Teachers and Students’ Postsecondary Outcomes: Testing the Predictive Power of Test and Nontest Teacher Quality Measures

Teachers and School Climate: Effects on Student Outcomes and Academic Disparities