A Principled Response to Our Inflection Point

Are you wondering how to respond to what seems to be an inflection point for the assessment industry? Nearly 20 years ago, Sam Messick, Bob Mislevy, and colleagues were wondering the same thing. They anticipated an inflection point in the assessment industry: fundamental changes in the business environment that challenged conventional test development. The “principled” in “principled assessment design” refers to first-principles thinking, and they argued for applying it to test development practice as a response to the anticipated inflection point. As Mislevy, Steinberg, and Almond (2003) write:

When unexamined standard operating procedures fall short, it is often worth the effort to return to first principles as we have done here—to formal analyses, explicit structures, and normative models. Initial applications of these ideas may be labor intensive and time consuming. However, practice will advance not just from presentations like this one but from working examples, reusable elements, and pieces of infrastructure from initial applications, all of which can be adapted to new projects.

But the arrival of inflection points is difficult to predict. For example, Reed Hastings, the co-founder of Netflix, waited and waited for the inflection point for the video streaming model. As Hastings (quoted by Rita McGrath) explained, “In 1997, we said that 50% of the business would be from streaming by 2002. It was zero. In 2002, we said that 50% of the business would be from streaming by 2007. It was zero . . . Now streaming has exploded . . . We were waiting for all these years. Then we were in the right place at the right time.” As usual, Bob Mislevy and colleagues were ahead of their time. Their argument may have been premature in 2003, but I believe it is spot-on now. Evidence for an inflection point in the assessment industry surrounds us.

Principled Assessment Design

How does applying first-principles thinking to test development practice respond to an inflection point in the assessment industry? When reasoning from first principles, complex problems are decomposed into simpler elements and then rebuilt from that foundation. Reasoning from first principles allows assessment specialists to step outside of assumptions and customary practice and decide whether conventional test development makes sense within this new environment.

Under principled assessment design, first-principles thinking is applied to decompose the evidentiary reasoning behind a complex problem: supporting probabilistic inferences about complex constructs from incomplete evidence, drawn from data that are difficult to separate and analyze. Assessing the Next Generation Science Standards is a clear example of this problem. Principled assessment design decomposes it into the following five simpler elements:

  • A relatively fine-grained description of the knowledge and skills intended to be assessed and how those knowledge and skills become more sophisticated across learning.  This may be represented as a learning progression or some other model of learning and cognition. 
  • The features of content that have been identified as effectively eliciting the targeted knowledge and skills. These content features may be identified as either characteristic or variable. Characteristic content features are those that all content intended to elicit the targeted knowledge and skills should possess in some form because they are central to eliciting the knowledge and skills. Variable content features are features that can be manipulated to change the sophistication of the knowledge and skills the content is intended to elicit.
  • The features of learner performance or behavior that have been identified as evidence of status on the targeted knowledge and skills. As was the case for content features, features of learner performance may be identified as either characteristic or variable. Characteristic performance features are those kinds of performance that should always be present as evidence the targeted knowledge and skills have been elicited.  Variable performance features are those kinds of performance that change to reflect the sophistication of the knowledge and skills. 
  • Work product specifications and evidence rules describing how to capture the performance. Work product specifications describe the structure and format of the task or items. Writing, drawing, or response selection might be used to capture the performance. Evidence rules describe how to code the work product, for example with a key or rubric.
  • Finally, an evidentiary argument describing the reasoning a) from the performance as captured and coded under the work product specifications and evidence rules, b) backward to the performance features in the context of the content features, and finally c) to an inference about the level of sophistication of the learner’s targeted knowledge and skills.
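
For readers who think in data structures, the five elements above can be sketched roughly as follows. This is only an illustrative sketch, not part of any published framework: the class names, field names, and the science example content are all hypothetical, and the "argument" here is just a sentence that chains a coded score back to a level of sophistication.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FeatureSet:
    """Features split the way the list above splits them (hypothetical names)."""
    characteristic: List[str]  # should always be present
    variable: List[str]        # change with the targeted sophistication


@dataclass
class WorkProductSpec:
    response_format: str  # e.g. "writing", "drawing", "response selection"
    evidence_rule: str    # how the work product is coded, e.g. "key" or "rubric"


@dataclass
class AssessmentDesign:
    # Element 1: knowledge and skills, ordered from least to most sophisticated.
    learning_levels: List[str]
    # Element 2: content features that elicit the knowledge and skills.
    content_features: FeatureSet
    # Element 3: performance features that count as evidence.
    performance_features: FeatureSet
    # Element 4: work product specifications and evidence rules.
    work_product: WorkProductSpec

    def evidentiary_argument(self, coded_score: int) -> str:
        """Element 5: reason from a coded score back to an inferred level."""
        if not 0 <= coded_score < len(self.learning_levels):
            raise ValueError("coded_score does not match any described level")
        level = self.learning_levels[coded_score]
        return (
            f"The {self.work_product.response_format} response, coded with a "
            f"{self.work_product.evidence_rule}, received score {coded_score}. "
            f"Given content featuring {', '.join(self.content_features.characteristic)}, "
            f"and performance showing {', '.join(self.performance_features.characteristic)}, "
            f"we infer: {level}"
        )


# Hypothetical example content, invented purely for illustration.
design = AssessmentDesign(
    learning_levels=[
        "states a claim without evidence",
        "links evidence to a claim",
        "justifies the link with scientific reasoning",
    ],
    content_features=FeatureSet(
        characteristic=["an observable phenomenon"],
        variable=["amount of data provided"],
    ),
    performance_features=FeatureSet(
        characteristic=["a stated claim"],
        variable=["quality of reasoning"],
    ),
    work_product=WorkProductSpec(response_format="writing", evidence_rule="rubric"),
)

print(design.evidentiary_argument(1))
```

The point of the sketch is that each element is explicit and reusable: the same feature sets and evidence rules could be shared across many tasks rather than being reinvented for each one.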

As prescribed under Principled Design for Efficacy, the description of the targeted knowledge and skills, content features, and performance features can often be found in the learning sciences literature associated with the content standard. This can cut costs dramatically while maintaining the quality of Next Generation Science Standards and Common Core math performance tasks. I will explain why I think this cuts costs in an upcoming blog post.

Inflection Point Solution

But why so much work? Even facing an inflection point, why not just keep doing what we have been doing? Rebuilding conventional wisdom can be cumbersome and time consuming, and people may be uncomfortable challenging dogma. As I argue elsewhere, conventional assessment design and development may be described as a kind of technology that routinizes decision making. This routinized approach, often labeled “best practice,” is characterized by the adoption of fixed design solutions and the insistence on using these for every assessment program. In the past business environment, routinized assessment design and development allowed testing companies to save time and effort.

Adapting to change is incredibly hard when it conflicts with the very practices that brought so much success. But today these routinized assessment design and development practices can cost assessment companies dearly. Routine test design and development practices incorporate untested assumptions at a time of fundamental change in the business environment for these companies. These assumptions fail to consider changes in the role of assessment in education, the goals of assessment programs, the definition of constructs, and the needs of customers. The result is more costly, lower-quality assessment of, for example, the Next Generation Science Standards and Common Core Mathematics Standards, along with annoyed state and district customers. Given the fundamental changes we are currently experiencing, we need to revisit applying first-principles thinking to test development practice.

Visit us at Planful Learning and Assessment (https://planfullearning.com) to learn more about how to respond to the inflection point we are living through in the assessment industry.  We can help with principled assessment design and the Principled Design for Efficacy framework that honor the complex definition of the targeted constructs, recognize the systemic nature of assessment programs, and are responsive to the needs of district and state customers.
