From the Field

New Report Finds Few Federally-Funded Education Innovations Helped Students

Education journalism is chock-full of stories touting some brand new idea that could fix schools. Artificial intelligence is the current obsession. Philanthropic funders often say they want to see fewer stories about problems and more stories about solutions. But the truth is that lifting student achievement is really hard and the vast majority of innovations don’t end up working.

A February 2024 report about a research-and-development program inside the Department of Education makes this truth crystal clear. The failure rate was 74 percent. Under this program, called Investing in Innovation or i3, the federal government gave out $1.4 billion between 2010 and 2016 to education nonprofits and researchers to develop and test new ideas in the classroom. But only 26 percent of the innovations yielded any positive benefits for students and no harms, according to the program’s final report.

Most of the 172 grants tested ideas about improving instruction or turning around low-performing schools. Almost 150 of them have reported results, with more than 20 still unfinished. Of the completed ones, a quarter of the innovations hadn’t been properly tested. Doing rigorous research isn’t easy; you need to set up a group of comparison students who don’t get the intervention and track everyone’s progress. Of the 112 properly evaluated grants, the most common result was a null finding, meaning that the intervention didn’t make a difference. Only a small handful left students worse off. The results for each program are buried in pages 55 through 64 of a separate appendices document, but I have created a PDF of them for you.

The low success rate for new ideas is “psychologically disappointing,” said Barbara Goodson, lead author of the report and an expert in educational research at the consulting firm Abt Global. “You would hope that all this [innovation] would pan out for students and that we would know better how to make education.”

The original ideas all showed promise and outside reviewers rated applications. But when you try new things and put them to a rigorous test in real classrooms, human behavior and student achievement are influenced by so many things that you cannot control, from struggles at home and poverty to health issues and psychological stress. And it can be difficult to generate downstream results for students on a year-end achievement test when an intervention is targeting something else, such as parent engagement.

Some innovations did work well. Building Assets, Reducing Risks, or BARR, is the poster child for what this grant program had hoped to produce. The idea was an early warning system that detects when children are starting to stumble at school. Teachers, administrators or counselors intervene in this early stage and build relationships with students to get them back on track. BARR received a seed grant to develop the idea and implement it in schools. The results were good enough for BARR to receive a bigger federal grant from this R&D program three years later. Again, it worked with different types of students in different parts of the country, and BARR received a third grant to scale it up across the nation in 2017. Now BARR is in more than 300 schools and Maine is adopting it statewide.

Some ideas that were proven to work in the short term didn’t yield long-term benefits or backfired completely. One example is Reading Recovery, a tutoring program for struggling readers in first grade that costs $10,000 per student and was a recipient of one of these grants. A randomized control trial that began in 2011 produced a giant boost in reading achievement for first graders. However, three years later, Reading Recovery students had fallen behind, and by fourth grade they were far worse readers than similar students who hadn’t had the tutoring, according to a follow-up study. The tutoring seemed to harm them.

It can be hard to understand these contradictions. Henry May, an associate professor at the University of Delaware who conducted both the short-term and long-term Reading Recovery studies, explained that the assessment used in the first grade study was full of simple one-syllable words. The tutoring sessions likely exposed children to these words so many times that the students memorized them. But Reading Recovery hadn’t taught the phonics necessary to read more complex words in later grades, May said. Reading Recovery disputes the long-term study results, pointing out that three-fourths of the study participants had departed so data was collected for only 25 percent of them. A spokesperson for the nonprofit organization also says it does teach phonics in its tutoring program.

I asked Abt’s Goodson to summarize the lessons learned from the federal program:

  • More students. It might seem like common sense to try a new idea on only a small group of students at first, but the Department of Education learned over time that it needed to increase the number of students in order to produce statistically significant results. There are two reasons that a study can end with a null result. One is because the intervention didn’t work, but it can also be a methodological quirk. When the achievement benefits are small, you need a large number of students to be sure that the result wasn’t a fluke. There were too many fluke signals in these evaluation studies. Over the years, sample sizes were increased even for ideas that were in the early development stage.
  • Implementation. Goodson still believes in the importance of randomized control trials to create credible evidence for what works, but she says one of the big lessons is that these trials alone are not enough. Documenting and studying the implementation are just as important as evaluating the results, she said. Understanding the barriers in the classroom can help developers tweak programs and make them more effective. A program might be too expensive or require too many weeks of teacher training. The disappointing results of the i3 program have helped spawn a new “science of implementation” to learn more about these obstacles.
  • National scale up. Too much money was spent on expanding new ideas to more students across the nation, and some of these ideas ended up not panning out in research evaluations. In the successor program to i3, the scale up grants are much smaller. Instead of using the money to directly implement the intervention nationwide, the funds help innovators make practical adjustments so that it can be replicated. For example, instead of using expensive outside coaches, a program might experiment with training existing teachers at a school to run it.
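
The point about sample size in the first lesson can be made concrete with a rough back-of-the-envelope calculation. The sketch below is not taken from the i3 report. It uses a textbook normal-approximation formula for comparing two groups of students, and the function name, the effect sizes, and the 5 percent significance and 80 percent power settings are illustrative assumptions. It also ignores the fact that students are grouped in classrooms and schools, which typically pushes the required numbers higher in real education trials.

```python
from math import ceil
from statistics import NormalDist


def students_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate students needed in EACH group (treatment and comparison).

    effect_size is the expected benefit in standard-deviation units.
    Hypothetical helper: a simple two-group comparison with no clustering
    by classroom or school, using the normal approximation.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = standard_normal.inv_cdf(power)          # chance of detecting a real effect
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Illustrative effect sizes only: the smaller the benefit, the more students needed.
for d in (0.5, 0.2, 0.1):
    print(f"effect size {d}: about {students_per_group(d)} students per group")
```

Because the required number grows with the inverse square of the effect size, halving the expected benefit roughly quadruples the number of students needed, which is one way to see why small pilot studies so often end in null results.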

Though the original i3 program no longer exists, its successor program, Education Innovation and Research (EIR), continues with the same mission of developing and evaluating new ideas. Currently, it is ramping up funding to deal with the post-pandemic crises of learning loss, mental health and teacher attrition.

It’s easy to feel discouraged that the federal government has invested around $3 billion in the last dozen years on educational innovation with so little to show for it. But we are slowly building a good evidence database of some things that do work – ideas that are not just based on gut instincts and whim, but are scientifically proven. And that has come with a relatively small investment compared to what the government spends on research in other areas. By contrast, defense research gets over $90 billion a year. Health research receives nearly $50 billion. I wonder how much further along we might be in helping students become proficient in reading and math if we invested even a little bit more.

Jill Barshay is a senior reporter at The Hechinger Report, where she writes the weekly “Proof Points” column about education research and data. This column was initially published by The Hechinger Report.