Effectively Developing a Validity Argument

In this second blog in the series, we explain how to develop and represent a validity argument. To illustrate, we use graphs and outlines to represent a validity argument for the interpretation of scores from the AP World History test, and we assume teachers are the intended audience. This blog is a bit longer than our target of 1000 words, but please stick with us.

The legal literature (Anderson, Schum, & Twining, 2005; Schum, 1994) describes two forms for representing the chain of interim and penultimate claims that constitute a validity argument: a graph and an outline. Each form is a distinct but complementary approach to recording and organizing claims, the evidence supporting the claims, and the relationships amongst the claims and evidence. Each form has advantages and disadvantages for representing and evaluating a validity argument. Both the graph and the outline lay the foundation for effectively communicating a validity argument, but neither form is a strong approach by itself. They are the framework for the additional communication methods that we’ll be talking about next week.

Graphs

Usually when we think about a graph, we think about a visual display of data. In this case, a graph is a picture of the network of claims beginning with the definition of the intended target of assessment and concluding with the interpretation and use of assessment results for its given purpose. A graph represents the tangled hierarchy of relationships amongst the claims.  Both the macrostructure and the microstructure of the validity argument—which we will discuss below—can be represented in a graph. 

Macrostructure

The graph shown in Figure 1 presents the macrostructure of the validity argument for the interpretation and use of scores from the AP World History Exam. The macrostructure graph focuses on the claims that constitute the validity argument. At the bottom, or foundation, of the graph is the problem statement. For our example, the problem statement is that educators need to evaluate whether schools are preparing high school seniors to be proficient in applying historical thinking skills to historical content. The problem statement is taken as given and is not open to challenge. At the top, or culmination, of the graph is the ultimate claim. For our example, the ultimate claim is that student assessment results classified as proficient represent students’ best performance in proficient-level knowledge and skills in applying historical thinking skills to historical content learned during school instruction.

The ultimate claim is supported by two penultimate claims that form the next link in the chain of reasoning from the problem statement to the ultimate claim. Each penultimate claim can be further decomposed into interim claims. Note that the earliest claims at the foundation of this chain are assumed to be factual.

Figure 1. A graph of the macrostructure for the validity argument supporting the interpretation and use of scores from the AP World History test.

In a graph, claims ideally proceed from the problem statement to the ultimate claim in chronological order of assessment development. As claims proceed from the problem statement to the ultimate claim, the preceding claim serves as the data or premise for the following claim. Because validity arguments are practical arguments, confidence that the preceding claim warrants or justifies the following claim is based on rationales and evidence that are more-or-less convincing for a given audience. 
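The idea that each claim serves as the premise for the next can be sketched as a small directed graph. The sketch below is only an illustration under stated assumptions: the claim labels are abbreviated from this post, the two penultimate claims are borrowed from the outline later in the post, and the edges are a simplified guess that omits interim claims. It is not a reproduction of Figure 1.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Labels are abbreviated from this post. The edges are an assumed,
# simplified macrostructure (interim claims omitted), not Figure 1 itself.
PROBLEM = "Problem statement: are schools preparing seniors in historical thinking?"
MOTIVATION = "Test takers are motivated to perform their best"
RESULTS = "Student-level results reflect status on historical thinking skills"
ULTIMATE = "Ultimate claim: proficient results represent proficient knowledge and skills"

# Each key maps a claim to the set of claims that serve as its premises.
premises = {
    PROBLEM: set(),
    MOTIVATION: {PROBLEM},
    RESULTS: {PROBLEM},
    ULTIMATE: {MOTIVATION, RESULTS},
}


def reasoning_order(premise_map):
    """Return the claims so that every premise precedes the claim it supports."""
    return list(TopologicalSorter(premise_map).static_order())


order = reasoning_order(premises)
for step, claim in enumerate(order, start=1):
    print(step, claim)
```

Ordering the claims this way puts the problem statement first and the ultimate claim last, which mirrors reading the graph from its foundation to its culmination.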

A graph of only the macrostructure of the validity argument has the following advantages compared to an outline:

  • Better captures the macrostructure and microstructure of an argument; and,
  • More strongly supports identifying potentially key claims that can be used to support or challenge a validity argument. The potential of penultimate or interim claims to support or challenge a validity argument depends in part on their relevance for a particular audience (e.g., teachers or psychometricians).

Microstructure

A graph of the microstructure of the validity argument for the interpretation and use of scores from the AP World History test is shown in Figure 2. A graph of the macrostructure of a validity argument can be expanded to include the micro-arguments and represent the microstructure of the validity argument by inserting the following elements: warrant, qualifier, backing, and rebuttals. Please visit the Purdue Online Writing Lab for an explanation of these microstructure components. The rebuttals in Figure 2 are shown in red. In addition, two micro-arguments are circled in red. These two arguments are “magnets” for organizing a validity argument narrative for teachers. We will say more about that next week!

As you can imagine, documenting a validity argument at the microstructure level is arduous and time-consuming work. However, a graph that includes the microstructure of the validity argument has the following advantages:

  • Can be used to identify claims that enjoy strong backing or lack strong backing;
  • Can be used to identify opportunities for strong rebuttals; and,
  • Helps focus an attack on the weakest point in an opponent’s argument or, more positively, build up support for a claim.

And remember, a graph of the microstructure of the validity argument is for internal use by the test development team. In this form, it is not meant for communicating to a stakeholder audience.

Figure 2. Graph of the microstructure of the validity argument supporting the interpretation and use of scores from the AP World History test.
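For teams that keep their micro-arguments in a file rather than only in a drawing, the elements named above (warrant, qualifier, backing, and rebuttals) could be recorded per claim and then searched. The following is a minimal sketch, assuming a hypothetical `MicroArgument` record. The two claims come from the outline in this post, but every warrant, backing, and rebuttal entry is a placeholder invented for illustration, not evidence from the actual AP World History argument.

```python
from dataclasses import dataclass, field


@dataclass
class MicroArgument:
    """One claim plus the elements added at the microstructure level."""
    claim: str
    warrant: str = ""    # why the preceding claim justifies this one
    qualifier: str = ""  # strength of the claim, e.g. "generally"
    backing: list = field(default_factory=list)    # support for the warrant
    rebuttals: list = field(default_factory=list)  # conditions that undercut the claim


def lacking_backing(arguments):
    """Return the claims that have no recorded backing."""
    return [a.claim for a in arguments if not a.backing]


def open_to_rebuttal(arguments):
    """Return the claims that have at least one recorded rebuttal."""
    return [a.claim for a in arguments if a.rebuttals]


# All warrant, backing, and rebuttal text below is a hypothetical placeholder.
arguments = [
    MicroArgument(
        claim="Tasks/items elicit the nine skills that define the historical thinking skills standards.",
        warrant="(placeholder) items were written to the standards",
        qualifier="generally",
        backing=["(placeholder) alignment study"],
    ),
    MicroArgument(
        claim="Test takers are motivated to perform their best on the assessment.",
        rebuttals=["(placeholder) some students may see the test as low stakes"],
    ),
]

print("Lacks backing:", lacking_backing(arguments))
print("Open to rebuttal:", open_to_rebuttal(arguments))
```

A listing like this is one way to spot the claims that lack strong backing or invite rebuttals, which are the advantages of the microstructure graph noted above.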

Outline

An outline is a method of organizing the claims and evidence. As shown in Figure 3, the title of the outline is the ultimate claim, and each of the penultimate claims is given as a heading in the outline. The order in which the penultimate claims are presented is arbitrary because each penultimate claim directly supports the ultimate claim.

Student scores classified as proficient reflect customers’ shared understanding of proficient historical thinking skills for high school seniors.

I. Test takers are motivated to perform their best on the assessment.
II. Student-level results reflect status on historical thinking skills.
III. Cut scores identify student-level results that reflect status described in the PLDs.

Figure 3. An outline for the validity argument supporting the interpretation of assessment performance level scores as proficiency in historical thinking skills for high school seniors.

The outline may be expanded by presenting the interim claims supporting each penultimate claim. The interim claims supporting the penultimate claim addressing student-level results are shown in Figure 4. When presenting interim claims, the earliest or foundational interim claim is presented first, and the list of interim claims progresses to the penultimate claim that directly supports the ultimate claim. Again, the interim claims may be presented in a different order, with no single correct order, because the sequence of claims is rarely linear. Interim claims are, however, often presented in chronological order.

The strength of the outline method is its utility in organizing the evidence and arguments. The hierarchy of claims is explicit and the structure of the hierarchy is easily understood. The penultimate claims associated with the ultimate claim and the interim claims associated with each penultimate claim are arranged together. The disadvantage is the loss of information about the flow of justification from earlier claims to later claims. Confidence that earlier claims warrant or justify following claims is difficult to evaluate.

Student scores classified as proficient reflect customers’ shared understanding of proficient historical thinking skills for high school seniors.

I. Test takers are motivated to perform their best on the assessment.
II. Student-level results reflect status on historical thinking skills.
   a. The measurement model aggregates evaluated responses for a student score in a manner consonant with the standards.
   b. Tasks/items elicit the nine skills that define the historical thinking skills standards.
   c. Rubrics/keys reflect performance differences that indicate status with respect to historical thinking skills.
   d. Historical thinking skills standards represent current best understanding of learning and cognition in applying those skills.
   e. A thorough review was completed of theory and research on learning and cognition in historical thinking skills.
III. Cut scores identify student-level results that reflect status described in the PLDs.

Figure 4. The interim claims supporting the penultimate claim addressing student-level results.
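The outline form maps naturally onto nested data: a title, a list of penultimate claims, and an optional list of interim claims under each. The sketch below regenerates the outline in Figure 4 from such a structure. The function name `render_outline` and the data layout are our own assumptions for illustration; the claim text is copied from the figure.

```python
from string import ascii_lowercase

# Enough numerals for a short outline; zip() stops at ten headings.
ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X"]

TITLE = (
    "Student scores classified as proficient reflect customers’ shared "
    "understanding of proficient historical thinking skills for high school seniors."
)

# Each entry pairs a penultimate claim with its interim claims (possibly none).
SECTIONS = [
    ("Test takers are motivated to perform their best on the assessment.", []),
    ("Student-level results reflect status on historical thinking skills.", [
        "The measurement model aggregates evaluated responses for a student score in a manner consonant with the standards.",
        "Tasks/items elicit the nine skills that define the historical thinking skills standards.",
        "Rubrics/keys reflect performance differences that indicate status with respect to historical thinking skills.",
        "Historical thinking skills standards represent current best understanding of learning and cognition in applying those skills.",
        "A thorough review was completed of theory and research on learning and cognition in historical thinking skills.",
    ]),
    ("Cut scores identify student-level results that reflect status described in the PLDs.", []),
]


def render_outline(title, sections):
    """Render the ultimate claim as a title with lettered interim claims under each heading."""
    lines = [title, ""]
    for numeral, (penultimate, interim_claims) in zip(ROMAN, sections):
        lines.append(f"{numeral}. {penultimate}")
        for letter, claim in zip(ascii_lowercase, interim_claims):
            lines.append(f"   {letter}. {claim}")
    return "\n".join(lines)


outline = render_outline(TITLE, SECTIONS)
print(outline)
```

Note that this structure records only which claims sit under which heading. As discussed above, it does not capture the flow of justification from earlier claims to later ones, which is the information a graph preserves.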

Now you understand how to develop and represent a validity argument. How do you communicate that validity argument to an audience? Please check back on Tuesday, September 28, or sign up here, for the third in the series of six blogs. In the third blog, we present narratives as a tool to help the test developer communicate a validity argument to a nontechnical audience. See you then.—Michelle and Paul
