The Last Word—Making a Judgement About the Validity Argument


If you have been following along over the last five blog posts, you may be experiencing some discomfort at the prospect of intentionally crafting the communication of a validity argument to support or challenge a particular score interpretation and use. Maybe you want to simply present the validity argument and let the chips fall where they may.

Can a truly neutral validity argument be constructed? This is an argument, after all, and an argument is meant to persuade. Validity arguments are also typically constructed by the test developer, who has some skin in the game, rather than by a disinterested observer. Finally, validity arguments are constructed for some purpose. A good example is the construction of a validity argument by a state, often with help from a testing company, to pass federal peer review.

But what would a neutral argument look like? An argument neutral in both stance (i.e., supporting or challenging the intended score interpretation and use) and intended audience? The neutral argument would have to include all backing, warrants, rebuttals to each warrant, and claims relevant to all audiences. The stakeholder would then have to sit down, review all of that material, identify what is important to them, and then make a judgment. I am skeptical that a complete, neutral validity argument could ever be constructed. Even if such a thing were accomplished, I doubt a stakeholder would have the time and the means to make a well-informed judgment.

An alternative to attempting to construct a neutral validity argument is to offer stakeholders both the supporting and challenging versions of the narratives, theses, and themes crafted for their audience. The versions of these tools supporting the validity argument might be crafted by the test developer. The versions challenging the validity argument might be crafted by a third party with no interest in the outcome, or by a known test critic. So long as the test developers and the test critics have done their jobs well, it is hard to see why we would want an approach that requires the ultimate decision maker, the stakeholder, to replicate their work rather than assess it when judging the strength of the validity argument.

For example, the test user might be presented with both the thesis supporting the intended score interpretation and use and the thesis challenging it. These are shown side by side in Table 1.

Table 1. Side-by-side comparison of the theses supporting and challenging the intended score interpretation and use.

Supporting Thesis

Students who achieve a Proficient level of applying historical thinking skills to historical content deserve to be awarded college credit because the Proficient level has been established using cut scores developed by experienced AP World History teachers and applied to student performance on items written by experienced AP World History teachers. Experienced AP World History teachers understand the historical thinking skills used by students to respond to historical content. Teachers’ insight into student thinking and problem solving is gained from years in the classroom.

Challenging Thesis

Students who achieve a Proficient level of applying historical thinking skills to historical content might not deserve to be awarded college credit because many items that appeared on AP World History exams had been written by teachers who had been out of the classroom for years and might not understand the historical thinking skills and historical content currently taught to students. Furthermore, some experienced AP World History teachers on the standard setting committee disagreed with the student level of performance on these items, the cut scores, used to establish the Proficient level.

Final Thoughts

As you read the blogs in this validity series, you may be uncomfortable with the obvious appeal to values. The tools presented in this blog series might feel like a marketing campaign. The difference between these tools and a marketing campaign is the requirement that all narratives, theses, and themes be based in evidence and that both sides of the argument be presented. The tools borrowed from the legal field help practitioners in educational measurement understand, and make transparent and public, the individual and cultural values underpinning stakeholders’ validity decisions.

You had better pay attention to the values of your audience, because that audience is the arbiter of validity. Stakeholders will evaluate the validity arguments, both those supporting and those challenging the intended score interpretation and use, through the lens of their own values. There are as many decisions about validity as there are stakeholders evaluating the validity argument. There is no such thing as a valid test score, only a degree of support for a score interpretation and use, and there is no such thing as a strong validity argument, only a validity argument found more or less persuasive by a given stakeholder group.

Thanks for following along.  We hope you enjoyed this series of blog posts on validity and maybe view validity differently now—Michelle and Paul
