Challenging the Validity Argument


In past blogs, we gave you tools to develop and communicate a validity argument effectively. Those blogs assumed that you intend to support the ultimate claim about a test score interpretation and use. But we believe that, to be fair, the decision maker (the test user) should also have tools that support a skeptic in challenging that ultimate claim. In this blog, we turn the tools upside down and explain how to develop, represent, and communicate arguments skeptical of the intended score interpretation and use. A skeptic may use the same set of tools presented in the earlier blogs. Continue reading to learn how to use each of these tools (graph, narrative, thesis, and themes) to effectively challenge, for a target audience, a test score interpretation and use.

Graph

The graph is as useful in constructing a narrative, thesis, and themes that challenge a test score interpretation and use as it is in constructing ones that support it. It still helps identify the micro-arguments an audience may find more or less relevant. But when the goal is to challenge, the important micro-argument element is the rebuttal. Rebuttals are shown in red text in Figure 1, the microstructure graph of the validity argument for the interpretation and use of scores from the AP World History test.

Figure 1. Graph of the microstructure of the validity argument highlighting rebuttals in the validity argument for the interpretation and use of scores from the AP World History test.

Narrative

Like a supporting narrative, a narrative skeptical of the intended score interpretation and use aims to tell a compelling story for the intended audience, but one that challenges, rather than supports, that interpretation and use. The challenging narrative can be created from the same graph of the microstructure of the validity argument used in earlier blogs to support the interpretation and use of scores. Because the validity argument is a chain of micro-arguments, only one link in this chain has to be challenged to cast doubt on the entire validity argument. But this challenge must be supported by evidence.

Like a supporting narrative, a narrative skeptical of the intended score interpretation and use is created by organizing the claims and warrants from the microstructure graph into a series of micro-arguments (see the third blog in this series). In each micro-argument, the warrant is taken as true and used as the major premise, the earlier claim is used as the minor premise, and the later claim is used as the conclusion. If both the major and minor premises are accepted as true, then the claim serving as the conclusion must be accepted. For example,

  • Major premise: Teachers’ classroom experience develops deep pedagogical knowledge.
  • Minor premise: Teachers who had taught AP World History for at least five years were recruited as item writers for the AP World History Exam.
  • Conclusion: Therefore, teachers who wrote AP World History items had deep pedagogical knowledge of historical thinking skills at the targeted grade band.

The micro-argument would read as follows: Teachers’ classroom experience develops deep pedagogical knowledge. If item writers are teachers who had taught AP World History for at least five years, then the teachers who wrote AP World History items had deep pedagogical knowledge of historical thinking skills at the targeted grade band. If all the micro-arguments are convincing for an audience, then the audience would adopt the validity argument and accept the ultimate claim.
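For readers who like to see structure spelled out, the micro-argument and the "one weak link" idea can be sketched in a few lines of Python. This is only an illustrative model, not a scoring or validation procedure: the class names, the `supported_by_evidence` flag, and the pairing of this particular rebuttal with this particular warrant are our assumptions for the sake of the example.

```python
from dataclasses import dataclass, field


@dataclass
class Rebuttal:
    statement: str
    # Hypothetical flag: a rebuttal only counts if evidence supports it.
    supported_by_evidence: bool


@dataclass
class MicroArgument:
    warrant: str  # major premise
    datum: str    # minor premise (the earlier claim)
    claim: str    # conclusion (the later claim)
    rebuttals: list[Rebuttal] = field(default_factory=list)

    def is_defeated(self) -> bool:
        # A link is defeated only by a rebuttal that the evidence supports.
        return any(r.supported_by_evidence for r in self.rebuttals)


def ultimate_claim_in_doubt(chain: list[MicroArgument]) -> bool:
    # One defeated link is enough to cast doubt on the whole chain.
    return any(link.is_defeated() for link in chain)


item_writers = MicroArgument(
    warrant="Teachers' classroom experience develops deep pedagogical knowledge.",
    datum="Teachers who had taught AP World History for at least five years "
          "were recruited as item writers.",
    claim="Teachers who wrote items had deep pedagogical knowledge of "
          "historical thinking skills.",
)
chain = [item_writers]
print(ultimate_claim_in_doubt(chain))  # False: no rebuttal attached yet

# Illustrative pairing of a rebuttal with this link (an assumption).
item_writers.rebuttals.append(
    Rebuttal("Many item writers had been out of the classroom for years.",
             supported_by_evidence=True)
)
print(ultimate_claim_in_doubt(chain))  # True: one supported rebuttal suffices
```

Note that a rebuttal with `supported_by_evidence=False` leaves the link standing, which mirrors the requirement that a challenge be backed by evidence.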

In our third blog of the series, we explained how to use a narrative as a tool to help communicate a validity argument supporting the intended interpretation and use of scores from the AP World History test. That supporting narrative was organized around two “magnetic” micro-arguments, each addressing a different penultimate claim: a) Student scores reflect the level of students’ historical thinking skills applied to historical content, and b) The proficient cut score identifies the student score level reflecting the level of historical thinking skills that deserve college credit. The validity argument could be defeated by successfully casting doubt on either or both of these penultimate claims, which support the ultimate claim that students classified as Proficient or above deserve to be awarded college credit.

The narrative component challenging the penultimate claim that student scores reflect the level of students’ historical thinking skills applied to historical content could be constructed by attempting to defeat the warrant “Experienced teachers’ pedagogical knowledge provides insight into student thinking and problem solving.” The rebuttal “Many teachers who wrote AP items had been out of the classroom for years” challenges whether that warrant applies in support of the claim “Teachers who wrote AP items understood how to manipulate content to elicit knowledge and skills at the targeted complexity level.”

The narrative component could then incorporate the rebuttal in the following manner:

Even though teachers with at least five years of past classroom experience were recruited as item writers for the AP World History Exam, many item writers had not set foot in an AP World History class in years. Having been out of the classroom for so long, these teachers lacked the deep pedagogical knowledge of, and insight into, student thinking and problem solving needed to write items that elicit test takers’ historical thinking skills. Students’ scores based on responses to these items might not reflect the level of historical thinking skills applied to historical content.

The narrative component challenging the second penultimate claim, about the intended interpretation of performance levels, might read like this:

The standard setting committee that recommended the score students need to achieve to be awarded college credit consisted of teachers with years of experience teaching AP History.  But many of those teachers with deep pedagogical knowledge of and insight into student thinking and problem solving disagreed with the recommended Proficient level cut score on the AP World History Exam that would qualify students to be awarded college credit.

The overall narrative intended for teachers might read something like the following:

Even though teachers with at least five years of past classroom experience were recruited as item writers for the AP World History Exam, many item writers had not set foot in an AP World History class in years. Having been out of the classroom for so long, these teachers lacked the deep pedagogical knowledge of, and insight into, student thinking and problem solving needed to write items that elicit test takers’ historical thinking skills. Students’ scores based on responses to these items might not reflect the level of historical thinking skills applied to historical content.

The standard setting committee that recommended the score students need to achieve to be awarded college credit consisted of teachers with years of experience teaching AP World History. But many of those teachers, with their deep pedagogical knowledge of and insight into student thinking and problem solving, disagreed with the level of performance on the items making up the AP World History Exam that was recommended to qualify students to be labeled Proficient and awarded college credit.

Students who performed well enough on the AP World History Exam to score at or above the Proficient cut score might not deserve to be awarded college credit.

Thesis

A narrative challenging the intended score interpretation and use may be converted into a thesis statement that does the same. The intent is to challenge the ultimate claim by defeating the penultimate or interim claims the test developer believes are most relevant from the standpoint of the intended audience. One way to begin a challenging thesis statement, taking the audience into account, is to restate the ultimate claim, in its challenged form, as the major premise of the syllogism, followed by the subordinating conjunction “because.” For example, a thesis statement challenging the validity argument for the interpretation and use of results from the AP World History Exam might begin: Students who performed well enough on the AP World History Exam to score at or above the Proficient score might not deserve to be awarded college credit because…

A complete thesis statement intended for teachers and challenging the intended interpretation and use of AP World History Proficient level classifications might read as follows:

Students who achieve a Proficient level of applying historical thinking skills to historical content might not deserve to be awarded college credit because many items that appeared on AP History exams had been written by teachers who had been out of the classroom for years and might not understand the historical thinking skills and historical content currently taught to students. Furthermore, some experienced AP History teachers on the standard setting committee disagreed with the cut scores used to establish the Proficient level.
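The pattern in this thesis statement (the challenged ultimate claim, then “because,” then the reasons drawn from the rebuttals) can be sketched as a small Python template. This is only an illustration of the sentence structure; the function name `challenging_thesis` and its parameters are hypothetical, and the wording of a real thesis should still be tailored to the audience.

```python
def challenging_thesis(challenged_claim: str, reasons: list[str]) -> str:
    """Join the challenged ultimate claim and its reasons into a thesis."""
    if not reasons:
        raise ValueError("a thesis needs at least one reason after 'because'")
    first, *rest = reasons
    thesis = f"{challenged_claim} because {first}."
    # Each additional reason is appended as its own sentence.
    for reason in rest:
        thesis += f" Furthermore, {reason}."
    return thesis


print(challenging_thesis(
    "Students who achieve a Proficient level might not deserve to be "
    "awarded college credit",
    [
        "many items had been written by teachers who had been out of the "
        "classroom for years",
        "some experienced teachers on the standard setting committee "
        "disagreed with the cut scores",
    ],
))
```

Each reason should trace back to a rebuttal in the microstructure graph that the evidence supports; the template only arranges them.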

Themes

Themes serve to emphasize and reinforce certain parts of the validity argument. The test developer has made judgments about the relative plausibility of the narratives and theses that could be developed to reinforce or attack the ultimate conclusion.  With this background, the test developer must decide what themes can be established and used to reinforce the narrative and thesis. 

Reflection

Does this seem cynical? Remember the past harm done using test scores. This harm might have been avoided if a critic had been able to effectively challenge the test score use. The application of the tools presented in this blog to effectively challenge the claims motivating test score use may support balanced decision making and avoid some of the criticism of educational tests.

Please join us next week for the last in our blog series on validity, when we examine how an informed assessment stakeholder might make a judgment about the validity of a particular score interpretation and use.
