Commentary

A Vision for the Next Generation of Testing

When schools were shuttered around the country two months ago, the pandemic did what nearly a decade of activist parents and testing skeptics could not do: put a system-wide pause on state-wide standardized testing. It wasn’t because the tests were too long or poorly aligned to classroom learning, or because benchmark exams and test prep were robbing students of a deep and meaningful curriculum, charges that testing critics had been articulating for years.

Rather, classroom learning was shifting to distance learning, and the federal government, whose mandates drive much of the state-wide testing, offered waivers to the states. And just like that, the $1.7 billion testing system got a hole blown through it as big as Texas.

In this pandemic pause, there’s an opportunity to look at some new ideas and to build a vision for a new generation of assessment. The PARCC and Smarter Balanced consortium tests were a giant step forward for most of the states that adopted them. Even those states that balked were forced to up their game.

But the promise of a “next generation” of assessment remains unfulfilled. How can assessments be better aligned to curriculum? How can tests be used to promote deeper learning? How can we test what is important rather than determine what is important by how easy it is to test it?

The assessment system devised by the International Baccalaureate program is worth scrutinizing for answers to these questions. IB schools were established in 1968 as a way to provide children of peripatetic diplomats and international business people with a consistent, internationally recognized high school curriculum and assessment system that would be acceptable to top universities around the world.

The 5,000 or so IB schools now use an interdisciplinary approach to education for 1 million students around the globe. Four different IB programs enroll students from age 3 to 19. Although the program started in Geneva, the United States has the largest number of IB programs (2,010 out of 5,586), which are offered in both private and public schools, some of which serve middle- and low-income communities.

Although formative assessments are given throughout the school year, the program has an annual summative assessment at the end of the school year, which not all IB students take. Prepping, which is called “Reading Period,” isn’t about test-taking tricks or learning to fill in bubbles but rather about reading deeply into material covered in the structured curriculum. Middle schoolers take a multimedia online test and submit both a project and a portfolio of work around art and design. High schoolers submit classwork and take tests in myriad subjects, which usually consist of essays, multi-step calculations and short answers. (This year, the IB program assessed students on classroom work alone.)

These tests force students to actually show their ability to think and integrate ideas on the spot. And they aren’t easy. A high school student sitting down to take her history exam might be asked to write essays on sample questions like these: “How successful was either Lenin (1917-1924) or Mussolini (1922-1943) in solving the problems he faced?” Or “To what extent do you agree with the view that war accelerates social change?”

A student might culminate his study of chemistry by taking a test with this question: “The effect of some drugs used to treat cancer depends on geometrical isomerism. One successful anti-cancer drug is cisplatin, whose formula is PtCl₂(NH₃)₂. Describe the structure of cisplatin by referring to the following:

… the meaning of the term geometrical isomerism as applied to cisplatin

… diagrams to show the structure of cisplatin and its geometrical isomer

… the types of bonding in cisplatin.”

While these tests are challenging, they succeed in assessing higher-order thinking skills in a wide range of students who hail from different social and geographical contexts, and in holding those students to the same standard in a transparent way. They are graded by teams of teachers and exam monitors who are trained and overseen by chief examiners. Each grader uses a “weak criterion referencing” system, that is, one that sets the standard according to a description of what to look for in candidate performance, with an eye to how top IB students scored in years past.

This kind of assessment costs a lot of money: $119 per student per subject. By comparison, states that use PARCC and Smarter Balanced tests pay about $22 per student per test, but some states pay as little as $9 per student per year for assessments.

Almost every state is girding itself for severe cutbacks in educational spending. But education spending is cyclical. It might be useful to use this crisis as an opportunity to plan for the future.

As they do that, states should ask: How accurately does our current testing system measure the things that count? What better ways of measurement are available to us? And most crucially, how much are we willing to pay for a better test that assesses student learning rather than simple recall, logic and test-taking ability, the cheapest and easiest things to measure?

Peg Tyre is director of strategy at the Edwin Gould Foundation in New York City and a FutureEd senior fellow.

Photo courtesy of Allison Shelley/The Verbatim Agency for American Education: Images of Teachers and Students in Action.
