Understanding the Human Side of Validity
A language test is never just a set of questions: it is a decision-making
tool that carries real consequences for real people. When we talk about validity,
we often think of the scientific side: statistics, reliability coefficients, or
evidence that a test measures what it claims to measure. These are essential,
and they form the technical backbone of test design (Messick, 1989).
But validity also has an ethical dimension. Every test operates
within a social, cultural, and political ecosystem, not in a sterile
lab. As Cronbach (1984) put it, testing has always been “an impartial way to
perform a political function — that of determining who gets what.” In other
words, tests distribute opportunities. They open some doors and close others.
So, when we
design or use a language test — whether it’s for student placement,
certification, or university admission — we are, consciously or not,
participating in a system of social policy.
Validity Beyond Numbers: The Consequences of Testing
Samuel
Messick (1989) reminded us that validity is not only about whether test scores accurately
represent ability but also about whether their uses and consequences are
justifiable. This means we must ask:
- What happens to test-takers because of this test?
- Are all groups treated fairly, regardless of language background, culture, or socioeconomic status?
- What values are we reinforcing through our assessments?
For
instance, if a teacher certification test emphasizes grammatical precision over
communicative effectiveness, it might favour candidates with formal academic
training but disadvantage highly communicative bilingual teachers who excel in
real-world interaction. That’s not just a design issue — it’s an ethical
issue.
The Four Ethical Dimensions of Valid Test Use
Messick (1980, 1988) and later scholars (Fulcher & Davidson, 2007; O’Sullivan &
Weir, 2011) proposed four areas we must consider to ensure that test validity is
not only scientific but also ethical and contextually appropriate:
1. Construct Validity: Are We Measuring What Matters?
Before
using a test, we must confirm that it truly measures the ability it claims to
measure. For example, if we use an oral interview to assess communicative
competence, are the ratings consistent and truly reflective of interactional
skills? Do tasks allow teachers or students to demonstrate real conversational
ability, or are they constrained by unnatural test formats?
Teachers
can think of this like checking whether the “window” we use to look at language
ability is clean and accurately focused — not distorted by cultural bias or
irrelevant factors.
2. Value Systems: Whose Values Shape the Test?
Every
assessment reflects someone’s values — those of developers, users, or
institutions. A test emphasizing grammatical accuracy reflects one worldview; a
test focusing on negotiation of meaning reflects another.
No test is value-free. When designing or selecting an assessment,
teachers should ask: Do the values behind this test align with what I
believe language learning is about?
For
example, if your students value oral fluency and interaction, a test that
rewards only written accuracy may feel disconnected and demotivating. Aligning
the test’s design with shared classroom values fosters fairness and
transparency.
3. Practical Usefulness: Is the Test Fit for Purpose?
A valid
test must be useful — meaning the results support the decision they’re
meant to inform. Suppose a multiple-choice vocabulary test is used to hire
administrative assistants. Does it really reflect their ability to communicate
effectively, answer phone calls, or assist clients? Probably not.
As Messick
(1989) explained, predictive correlations aren’t enough. We need construct
relevance — evidence that the test measures something truly connected to
the tasks test-takers will perform in real contexts.
4. Consequences: What Happens After the Test?
Every test
has ripple effects — what we call washback (Alderson & Wall, 1993).
A well-designed oral exam can motivate students to practice meaningful
communication. A poorly aligned grammar test, on the other hand, can narrow
instruction to rote drills and memorization.
Teachers
should actively anticipate both positive and negative consequences
before implementing a test. Ask:
- What teaching behaviours might this test encourage?
- What student attitudes might it shape?
- Are there alternative ways to achieve the same goals without testing, such as portfolios, self-assessment, or project-based evaluation?
Balancing Fairness, Culture, and Context
Ethical
considerations are also deeply cultural. What counts as “fair” or “appropriate”
in one society might not hold the same meaning in another (McNamara & Ryan,
2011). In some contexts, collective values like harmony and respect for
authority may outweigh individual rights. In others, individual privacy and
consent are central.
That’s why,
as bilingual educators, we must adapt ethical principles to our specific
context — always balancing the rights of learners, the responsibilities of
institutions, and the values of society.
Taking Responsibility as Test Developers and Users
Spolsky
(1981) once warned that tests should come with a label like “Use with care.” He
wasn’t joking. Just like medicine, a test can heal or harm, depending on how
it’s used.
Our
responsibility, then, is twofold:
- Provide solid evidence that our tests measure the intended abilities fairly and meaningfully.
- Advocate for responsible use, ensuring that decisions based on test results serve learners, not systems alone.
Testing is
not just a technical craft — it’s a moral and professional duty. Every
score represents a story, a student, a future.
🧩 In Practice: Bringing Ethical Validity into the Classroom
For
bilingual teachers, applying this perspective means:
- Designing assessments that mirror authentic communication and classroom realities.
- Involving learners in discussions about why they are being assessed and how results will be used.
- Reviewing test items for potential bias (e.g., unfamiliar cultural references).
- Reflecting on how your tests influence students’ motivation, self-concept, and opportunities.
The goal is
simple but profound: to ensure that our assessments measure ability while
respecting humanity.
🧠 Key Takeaway
Validity is
not only a statistical property — it’s a moral commitment. When
bilingual teachers design, develop, or apply tests, they become not just
evaluators, but guardians of fairness and learning integrity.
As educators, our greatest achievement is not just measuring what
students know but ensuring that how we measure it uplifts, empowers, and
includes them.
References
Alderson, J. C., & Wall, D. (1993). Does washback exist? Applied Linguistics, 14(2), 115–129.
Cronbach, L. J. (1984). Essentials of Psychological Testing (4th ed.). Harper & Row.
Fulcher, G., & Davidson, F. (2007). Language Testing and Assessment: An Advanced Resource Book. Routledge.
McNamara, T., & Ryan, K. (2011). Fairness and Validation in Language Assessment: Selected Papers from the 19th Language Testing Research Colloquium. Peter Lang.
Messick, S. (1980). Test validity and the ethics of assessment. American Psychologist, 35(11), 1012–1027.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational Measurement (3rd ed., pp. 13–103). Macmillan.
O’Sullivan, B., & Weir, C. J. (2011). Language Testing and Validation: An Evidence-Based Approach. Palgrave Macmillan.
Spolsky, B. (1981). What does it mean to know how to use a language? Language Testing, 1(1), 5–21.