They were designed to move American education into the next generation of assessment, replacing the “bubble tests” of the 20th century with a more sophisticated, tech-centric approach that measured deeper thinking.
But the Partnership for Assessment of Readiness for College and Careers (PARCC) and the Smarter Balanced Assessment Consortium have been battered by criticism from educators, parents and political leaders, and they now face an uncertain future.
The tests, developed with federal support and once embraced by as many as 46 states, have been caught in the backlash against the Common Core State Standards and test-based teacher evaluation systems. As states reject the concept of nationwide standards, they are also jettisoning the assessments designed to provide common tests and common cut scores.
But the politics of education can’t change the fact that PARCC and Smarter Balanced represent significant steps forward in the way we assess students.
I was the testing director in Wyoming, and my nonprofit organization has advised many state testing agencies, as well as the PARCC and Smarter Balanced testing consortia, so I know the testing landscape in American education well. In a recent blog post, I shared what we have learned about the Common Core in the course of our work, and I will follow up with another post on what we need to do next. Today’s piece focuses on PARCC and Smarter Balanced.
These tests, while far from perfect, have improved assessment quality more in the past few years than we saw in the previous 15 years.
Of course, both PARCC and Smarter Balanced had lofty goals that were likely unreachable by any initiative, given the time constraints and the challenges of operating assessment programs across states. But they have delivered several key advances, including:
- Progress in technology-enhanced and other innovative item types to measure deeper levels of student thinking than can be done with most multiple-choice tests,
- Intentionally connecting the achievement levels to meaningful outcomes such as college readiness,
- The use of Evidence-Centered Design (ECD) as the basis for assessment design and validation,
- Advances in automated scoring, and
- Improved accessibility and fairness in large-scale assessments.
The initial invitations to develop these tests stressed the importance of “next generation” approaches that moved beyond the all-too-familiar multiple-choice items. PARCC and Smarter Balanced responded with innovative solutions, such as technology-enhanced items and prose-constructed responses that require students to demonstrate deep thinking. Such items are expensive to develop, and I doubt that we could have seen such rapid gains without the infusion of federal funds and the collective buying power of states.
A Better Design Approach
The tests also use Evidence-Centered Design, a complex theoretical framework developed by Bob Mislevy and colleagues at the Educational Testing Service in New Jersey.
While it is easy to get wonky about ECD, the key takeaway is that it promotes more meaningful and useful interpretation of results than we can get when simply writing items to standards.
ECD had been applied to only a limited number of large-scale assessments, notably in the redesign of several Advanced Placement high school science tests, but both PARCC and Smarter Balanced relied on it as the foundation for their design. Importantly, this work provides an example that states can follow should they develop their own assessments. ECD allows test designers to better support claims about what students know and are able to do than approaches that superficially sample the standards with a handful of test items.
One of the major challenges of including direct writing and other performance tasks on large-scale assessments—something that most reformers value—is the cost of scoring. Since each paper needs to be read and scored by at least one trained rater, the costs can escalate rapidly.
One of the hopes for breaking that cost curve is using computers to score student papers. There are still challenges with employing artificial intelligence or other types of automated scoring for tasks requiring students to use specific content-based evidence, but there have been important advances in using computers in this way. Both PARCC and Smarter Balanced rely on automated scoring for substantial proportions of open-response items, and their advances have already been applied in other state contexts.
Finally, it is one thing to advance assessment technology, but it is another thing to ensure that all students are able to participate appropriately. In 2014, I had the opportunity to present at the National Council on Measurement in Education conference on the extent to which PARCC and Smarter Balanced were meeting the heightened fairness expectations articulated in the recently revised Standards for Educational and Psychological Testing.
This allowed me to dig into the research that both consortia had done. I found that they had conducted more research and development to ensure fairness and accessibility than any state assessment program had. For example, both consortia had working groups dedicated to examining fairness and accessibility issues, and both committed to applying Universal Design for Learning (UDL) principles in their item and test designs.
UDL, as applied to assessment design, is a framework for ensuring accessibility by design, as opposed to trying to fix barriers on the back end with poorly fitting accommodations. Again, the assessments are not perfect, and the extensive use of computers creates challenges for some students, but both consortia continue to strive for state-of-the-art solutions.
Rather than toss out these innovative assessments, states should embrace them or, at the very least, learn from their advances.
Scott F. Marion is president of the National Center for the Improvement of Educational Assessment.