Commentary

Bursting the Bubbles in American Testing

They were designed to move American education into the next generation of assessment, replacing the “bubble tests” of the 20th Century with a more sophisticated, tech-centric approach that measured deeper thinking.

But the Partnership for Assessment of Readiness for College and Careers (PARCC) and the Smarter Balanced Assessment Consortium have been battered by criticism from educators, parents and political leaders and now face an uncertain future.

The tests, developed with federal support and once embraced by as many as 46 states, have been caught in the backlash to the Common Core State Standards and test-based teacher evaluation systems. As states reject the concept of nationwide standards, they are also jettisoning the assessments designed to provide common testing and cut scores.

But the politics of education can’t change the fact that PARCC and Smarter Balanced represent significant steps forward in the way we assess students.

I was the testing director in Wyoming, and my nonprofit organization has advised many state testing agencies, as well as the PARCC and Smarter Balanced testing consortia. So I know the testing landscape in American education well. I recently shared what we learned about the Common Core in the course of our work in this blog post, and I will follow up with another post on what we need to do next. Today’s piece focuses on PARCC and Smarter Balanced.

These tests, while far from perfect, have improved assessment quality more in the past few years than we saw in the previous 15 years.

Of course, both PARCC and Smarter Balanced had very lofty goals that were likely unreachable by any initiative given the time constraints and the challenges of trying to operate assessment programs across states. But they deliver several key advances, including:

Progress in technology-enhanced and other innovative item types to measure deeper levels of student thinking than can be done with most multiple-choice tests,
Intentionally connecting the achievement levels to meaningful outcomes such as college readiness,
The use of Evidence-Centered Design (ECD) as the basis for assessment design and validation,
Advances in automated scoring, and
Improved accessibility and fairness in large-scale assessments.

The initial invitations to develop these tests stressed the importance of “next generation” approaches that moved beyond the all-too-familiar multiple-choice items. PARCC and Smarter Balanced responded with innovative solutions, such as technology-enhanced items and prose-constructed responses that require students to demonstrate deep thinking. Such items are expensive to develop, and I doubt that we could have seen such rapid gains without the infusion of federal funds and the collective buying power of states.

A Better Design Approach

The tests also use Evidence-Centered Design, a complex theoretical framework developed by Bob Mislevy and colleagues at the Educational Testing Service in New Jersey.

While it is easy to get wonky about ECD, the key takeaway is that it promotes more meaningful and useful interpretation of results than we can get when simply writing items to standards.

ECD had been applied to only a limited number of large-scale assessments—notably as part of the redesign of several of the Advanced Placement high school science tests—but both PARCC and Smarter Balanced relied on this approach as the foundation for design. Importantly, this work provides an example that states can use should they develop their own unique assessments. Such an approach allows test designers to better support claims about what students know and are able to do compared to approaches that superficially sample the standards with a handful of test items.

One of the major challenges of including direct writing and other performance tasks on large-scale assessments—something that most reformers value—is the cost of scoring. Since each paper needs to be read and scored by at least one trained rater, the costs can escalate rapidly.

One of the hopes for breaking that cost curve is using computers to score student papers. There are still challenges with employing artificial intelligence or other types of automated scoring for tasks requiring students to use specific content-based evidence, but there have been important advances in using computers in this way. Both PARCC and Smarter Balanced rely on automated scoring for substantial proportions of open-response items, and their advances have already been applied in other state contexts.
