When assessing complex, multidimensional standards like Common Core Mathematics and the Next Generation Science Standards (NGSS), how can you cut assessment costs, increase the quality of assessment results, and improve convenience for states and districts? My answer rests on the understanding that the assessment industry is experiencing a strategic inflection point: fundamental changes in the business environment that upend the assumptions of assessment company business models. This strategic inflection point has made conventional assessment development practices costly and inconvenient, and it has made assessment results less useful for students, parents, and educators.
Fundamental Business Environment Changes
Why do I think we are experiencing a strategic inflection point? Evidence of business environment changes driving an inflection point is often qualitative rather than quantitative: hunches, suppositions, conjectures. My hunch is that one impetus, though not the only one, driving change in the business environment for assessment companies is the creation of complex educational standards like Common Core Mathematics and the NGSS. Ellen Forte has quipped that the NGSS are the canary in the coal mine, providing an early warning of the changes. Under the NGSS, what US students should know and be able to do is described as complex and three-dimensional, a radical change from the unidimensional ability model driving conventional assessment development and the business models built around it.
Another impetus driving change in the business environment for assessment companies is the nonlinear growth in the cost of developing tasks aligned to these complex, multidimensional standards using conventional assessment development practices. My back-of-the-napkin calculation is that a single NGSS-aligned, phenomenon-driven science task costs $20,000. And for a large-scale assessment, states and districts bear the cost and inconvenience of alignment and standard setting studies on top of the cost per task. This cost and inconvenience borne by the assessment customer makes for an unsustainable business model.
Unsustainable Assessment Development Practices
Why has conventional assessment development become unsustainably costly and inconvenient for states and districts? Assessment development involves a chain of activities called a value chain. For conventional assessment development, the first link in this chain is typically reviewing the domain standards (e.g., NGSS) and the last link is delivering score reports to students, parents, teachers, and administrators. Currently, assessment companies create customer value by harnessing teachers to move through these activities. For example, teachers in item writing workshops provide items eliciting students’ use of targeted knowledge and skills. Teachers in an alignment study provide an evaluation of how well the content represents the standards. Teachers in a standard setting study provide cut scores showing how much learning is needed to reach “proficient.”
But complex, multidimensional standards like Common Core Mathematics and NGSS have changed some of the activities in the conventional assessment development value chain from adding value to eroding value for states and districts. As already noted, a single NGSS-aligned science task costs states or districts $20,000. Most of this cost is due to churn from frustrated teachers struggling to write complex, three-dimensional tasks using testing company practices developed for multiple-choice items. Alignment studies are also more costly and time consuming as teachers struggle with two- and three-dimensional alignment. Standard setting studies are not only costly and time consuming but also often accompanied by unwelcome drama.
Reimagined Assessment Development Activities
A handful of assessment experts, including Dan Lewis, Christy Schneider, and the team at Planful Learning and Assessment, are working to cut assessment costs, increase assessment quality, and improve convenience for states and districts by reimagining the conventional value chain link-by-link. Like the conventional value chain, this new value chain starts with identifying the targeted standards. But that’s where similarity ends and innovation begins.
- Before writing items, the targeted standards are unpacked using evidence from the learning sciences and the wisdom of teachers to better define how knowledge and skills become more sophisticated with learning, to identify the content features that drive item difficulty, and to identify the performance features that serve as evidence of learning.
- Using this information, range achievement level descriptors are developed for the targeted standards to support assessment development and aid eventual interpretation of assessment results.
- Not until this preparation is complete are the items, tasks, simulations, games, or small group activities developed, supported by reusable templates of the content and performance features, and then tried out.
- Next, embedded standard setting is used to estimate cut scores without an expensive, drama-filled standard setting meeting. Results from misfitting items loop back to continuously improve understanding of how knowledge and skills become more sophisticated, how content features drive item difficulty, and how performance features serve as evidence of learning.
- Reports are then delivered to students, parents, teachers, and other educators supported by range achievement level descriptors.
- Finally, results from assessment administration loop back to assessment development, using techniques like item difficulty modeling, to continuously improve assessment quality.
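To make the embedded standard setting step above concrete, here is a minimal, purely illustrative Python sketch. It assumes each item carries an author-assigned achievement level and a calibrated Rasch difficulty (all numbers are invented), and it places each cut score at the midpoint between the mean difficulties of adjacent levels. Operational embedded standard setting procedures are more sophisticated than this; the sketch only shows the core idea that cut scores can be derived from item data rather than from a standard setting meeting.

```python
from statistics import mean

# Hypothetical items: (calibrated Rasch difficulty, author-assigned level).
# Levels are ordinal: 1 = Basic, 2 = Proficient, 3 = Advanced.
items = [
    (-1.2, 1), (-0.8, 1), (-0.5, 1),
    (-0.1, 2), (0.3, 2), (0.6, 2),
    (0.9, 3), (1.4, 3), (1.8, 3),
]

def cut_scores(items):
    """Place each cut at the midpoint between the mean difficulties of
    adjacent achievement levels -- a toy stand-in for the model-based
    cut estimation used in embedded standard setting."""
    levels = sorted({lvl for _, lvl in items})
    means = {lvl: mean(d for d, l in items if l == lvl) for lvl in levels}
    return [(means[a] + means[b]) / 2 for a, b in zip(levels, levels[1:])]

cuts = cut_scores(items)
print(cuts)  # one cut between each pair of adjacent levels
```

Items whose calibrated difficulty lands on the wrong side of a cut, given their author-assigned level, are the "misfitting items" that feed the continuous improvement loop described above.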
Across the new value chain, technology serves as a facilitator, enabling efficiency and quality while lightening the burden on teachers and administrators. Automated web searches using unpacking results harvest large numbers of aligned phenomena and other real-life, engaging examples to use in tasks, simulations, games, or small group activities. User-friendly interfaces and input devices support collecting new kinds of performance that serve as more authentic and insightful evidence of learning. Machine learning algorithms ingest embedded standard setting and assessment administration results to drive continuous improvement loops.
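Item difficulty modeling, one technique feeding these continuous improvement loops, can be sketched as a regression of calibrated item difficulties on coded content features. The example below is a hedged illustration using a single invented feature (the number of NGSS dimensions an item taps) and invented difficulties; operational item difficulty modeling would use many content features and far larger item pools.

```python
# Hypothetical data: for each item, the number of NGSS dimensions the
# item taps (a coded content feature) and its calibrated difficulty.
dims =       [1,    1,    2,   2,   3,   3]
difficulty = [-1.0, -0.8, 0.0, 0.2, 0.9, 1.1]

def fit_line(x, y):
    """Simple least-squares line: difficulty ~ a + b * feature."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

a, b = fit_line(dims, difficulty)

# Predicted difficulty of a prospective three-dimensional item,
# available before any field testing.
predicted = a + b * 3
print(predicted)
```

A model like this lets developers estimate how hard a new task will be before it is ever administered, which is part of what makes reusable templates of content features efficient.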
How does this new value chain cut assessment costs, increase assessment quality, and improve convenience for states and districts? To cut assessment costs, reusable templates and automated web searches create efficiencies only possible because of the concrete, reusable, and sharable content and performance features unpacked before items, tasks, simulations, games, or small group activities are developed. To increase quality, unpacking and range achievement level descriptors create a stronger, inspectable, and improvable chain of evidence between assessment development and the eventual interpretation and use of assessment results. For state and district convenience, concrete content and performance features reduce teacher frustration, and embedded standard setting decouples cut scores from drama-filled meetings.
In my next blog I will explain an important link in this new value chain: unpacking targeted standards using evidence from the learning sciences and the wisdom of teachers to better define how knowledge and skills become more sophisticated with learning, to identify the content features that drive assessment difficulty, and to identify the performance features that serve as evidence of learning. How does that support automated web searches harvesting large numbers of standards-aligned phenomena? How does that lead to reusable templates reducing teacher churn and cutting assessment costs? How does that improve assessment quality by creating a stronger, more convincing validity argument? Check back in two weeks or connect with me to talk about implementing this new value chain.