This is the first in a six-part blog series intended as a tutorial to help practitioners in educational measurement develop, represent, and effectively communicate validity arguments to stakeholder audiences. This first blog explains how to address stakeholder concerns when developing and communicating your validity argument.
Before planning how to develop and effectively communicate a validity argument to a stakeholder audience, you should identify the audience you intend to reach. This might seem obvious, but the educational assessment literature has paid little attention to the different values, conventions, and practices that various stakeholders in the assessment system bring to evaluating validity arguments. Because the validation process is usually led by psychometricians, validity arguments tend to reflect the kinds of evidence psychometricians favor rather than the types of evidence other stakeholders find compelling. Matters are further complicated because not all test uses require the same evidence; the evidence needed is context dependent.
Why do we believe different stakeholders find different types of validity evidence more or less compelling? We draw on two sources of evidence for our belief. First, we review the long tradition of practitioners from different academic fields relying on the conventions, practices, and values of their particular field to evaluate evidence and arguments. Second, we present the results from a study we completed examining the evidence five different stakeholder audiences—policymakers, teachers, lawyers, parents, and psychometricians—found most relevant under three different test uses.
Some of the early research on practitioners from different academic fields was done by Stephen Toulmin. Toulmin concluded that the arguments made by practitioners from different academic fields depended on the socially constructed authority of the field. This authority was expressed as the conventions, practices, and values of that field.
Using discourse analysis, Hyland (2009) described a similar phenomenon across academic fields. Different fields have distinctive ways of asking questions, addressing a literature, criticizing ideas, and presenting arguments. These differences even influence the verbs fields select to describe findings from the literature. “It turns out, in fact, that engineers show, philosophers argue, biologists find and linguists suggest” (Hyland, 2008).
But what about the people who will be interpreting and using assessment results? Do these different stakeholder audiences really bring different perspectives, practices, and values to evaluating validity arguments? Sam Messick thought they did. Messick (1981) argued that the different communities from which test users are drawn bring different, though perhaps overlapping, conventions, practices, and values to the evaluation of validity and can reach different conclusions.
We did our own study examining the relevance of validity evidence across five different stakeholder groups (teachers, parents, psychometricians, policymakers, and lawyers) in each of three test use scenarios that vary in stakes (high-stakes for students, low-stakes, and high-stakes for teachers). Stakeholders rated the relevance of each piece of evidence on a scale from “Not at all” to “Very” relevant. Survey questions were grouped into five types of evidence: content, response process, internal structure, relationship to other variables, and consequences of testing.
We found multiple differences across stakeholder groups in what types of evidence are considered relevant. Most frequently, the differences were between psychometricians, teachers, and/or parents. Figure 1 shows the results for the use of test scores to assess the college readiness of high school students. Keep these findings in mind for our subsequent blogs.
Note: “Not at all relevant” was coded 0, “Very relevant” was coded 4.
Figure 1. Results for the relevance of evidence for the use of test scores for the assessment of college readiness of high school students.
Some of the differences might be expected. Here are some examples:
- Psychometricians rated having item writers who were current or former educators as less relevant across all three scenarios, whereas teachers rated this evidence as highly relevant.
- Lawyers rated talk-aloud studies with students with disabilities and English learners higher than other groups.
- For all three scenarios, parents rated parent attitude surveys as more relevant than the other groups.
We also found interactions between stakeholder group, test use scenario, and what types of evidence are considered relevant. Here are some examples:
- Psychometricians rated having an item writer guide and internal item review as less relevant than teachers did in the context of formative assessment or assessment used for teacher evaluation purposes.
- In two cases—evidence about the relationship to other assessments and evidence about student growth—psychometricians rated the evidence as less relevant than parents did. In a third case, the pattern flipped: psychometricians rated evidence about prediction as more relevant than parents did.
- Psychometricians also rated response time as less relevant than most other stakeholder groups did for the college entrance and formative assessment scenarios.
But wait—at some level, practitioners from different academic fields will agree on what makes a convincing argument, right? Sure, evidence and rationales can be expressed broadly enough to win support from practitioners across most academic fields. But we agree with Thomas Kuhn that, expressed that broadly, the evidence and rationales are powerless to settle difficult or contentious arguments. Expressed more precisely, they diverge into field-dependent conventions, practices, and values. As an example, most if not all stakeholders would agree that assessments should be fair. But the evaluation of whether a particular assessment use is fair would quickly devolve into field-dependent conventions, practices, and values when teachers, parents, students, lawyers, and psychometricians are all in the same room.
We hope we have convinced you that different stakeholders find different types of validity evidence more or less compelling. If we have been successful, then you understand why you should identify the audience you intend to reach before planning how to develop and effectively communicate a validity argument. In next week’s blog, we will explain how to develop and represent a validity argument for use by the test developer and internal test development team members.