Validity as a Unified Concept
For many years, teachers and researchers talked about different types of validity — content validity, criterion-related validity, and construct validity — as if they were separate entities. But the field has changed. Today, validity is understood as a single, unified concept that connects what a test measures, how it is used, and what consequences arise from its use.
Samuel Messick (1980, 1988) was a pioneer in this shift. He argued that it’s not enough to show that a test measures what it claims to measure (construct validity). We must also consider the values, ethics, and social consequences of how test scores are interpreted and used. In other words, validity isn’t just about evidence — it’s also about impact.
Messick’s framework includes two dimensions:
- The source of justification – the type of evidence we draw on (evidential or consequential).
- The function or outcome – whether we are interpreting test results or using them to make decisions.
When these dimensions intersect, they create a matrix in which construct validity appears in every cell — meaning it’s always essential, no matter the purpose. (In Messick’s summary, the evidential basis of test interpretation is construct validity; the evidential basis of test use adds relevance and utility; the consequential basis of interpretation adds value implications; and the consequential basis of use adds social consequences.)
Let’s make this concrete: imagine you give an oral interview test to assess speaking proficiency.
- To interpret a candidate’s score, you need evidence that the test measures oral language ability (construct validity) and that the way you interpret “proficiency” aligns with fair educational values.
- To use that score — say, to decide whether a teacher is employable — you must also justify the utility of using that test for hiring and consider the social consequences of such a decision.
These consequences are never neutral. A decision to hire (or not hire) a teacher based on test scores can have positive effects (e.g., ensuring competent teachers) but also negative ones (e.g., unfairly limiting someone’s career due to cultural or linguistic bias).
Messick’s key message is simple but powerful: validity is not just a technical property of a test — it’s an ethical responsibility (Messick, 1988; Bachman & Palmer, 1996).
Understanding and Detecting Test Bias
When we speak of test bias, we’re dealing with fairness — whether a test gives every test-taker an equal chance to demonstrate their true ability.
Even when a test appears valid for a large group, it might be biased against subgroups that differ in characteristics unrelated to the language ability being tested (Nitko, 1983).
For example, imagine a reading comprehension test where students with literature backgrounds consistently outperform others.
- If the test is meant to measure general reading ability, this might indicate bias — because the content privileges students with specific prior knowledge.
- But if the goal is to measure literary reading skills, then the test is fair — it’s just being used in a context-specific way.
So, bias isn’t always about difference — it’s about unfair difference. Differences in performance don’t automatically prove bias; they must be interpreted in light of the test’s purpose.
Bias can appear in many subtle forms — through:
- Content that reflects sexist, racist, or culturally exclusive assumptions.
- Unequal predictive power (when test scores predict success better for one group than another); a rough numerical check is sketched after this list.
- Unfair testing conditions, such as intimidating settings or culturally unfamiliar tasks (Nitko, 1983).
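Unequal predictive power lends itself to a quick numerical check. The sketch below is purely illustrative: the data, the group labels, and the idea of a later “outcome” rating are invented assumptions rather than anything taken from Nitko’s discussion, but it shows the kind of comparison a differential-prediction analysis makes.

```python
# Hypothetical sketch: does the test predict a later outcome equally well
# for two groups? All names and numbers below are invented for illustration.
import numpy as np

def predictive_strength(scores, outcomes):
    """Correlation between test scores and a later outcome for one group."""
    return np.corrcoef(scores, outcomes)[0, 1]

# Invented example data: test scores and later performance ratings.
group_a_scores  = np.array([62, 70, 75, 80, 88, 91])
group_a_outcome = np.array([3.1, 3.4, 3.6, 3.9, 4.3, 4.5])

group_b_scores  = np.array([60, 68, 74, 79, 86, 90])
group_b_outcome = np.array([3.8, 3.0, 4.1, 3.2, 4.4, 3.5])

r_a = predictive_strength(group_a_scores, group_a_outcome)
r_b = predictive_strength(group_b_scores, group_b_outcome)

# A large gap between the two correlations is one warning sign of unequal
# predictive power (differential prediction), one of the forms of bias
# listed above.
print(f"Predictive correlation, group A: {r_a:.2f}")
print(f"Predictive correlation, group B: {r_b:.2f}")
```

In practice this would be done with much larger samples and formal statistical tests, or with separate regression models per group, but the logic is the same: if scores forecast success well for one group and poorly for another, the test is doing different work for different test-takers.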
The Role of Culture in Testing
Culture is perhaps the most complex source of bias in language testing. As Duran (1989) explained, people from non-dominant language backgrounds often bring different cultural experiences and cognitive styles, which can shape how they interpret test items or respond to tasks.
The challenge for teachers and test developers, then, is not to deny that cultural differences exist, but to understand and design around them. For bilingual educators, this means choosing texts, topics, and testing contexts that feel authentic and inclusive to all learners — not just those from the “mainstream” background (Duran, 1989; Chen & Henning, 1985; Zeidner, 1986).
The Influence of Background Knowledge
Another subtle yet powerful source of bias is background knowledge — what test-takers already know about the topic being tested.
Studies have shown that students familiar with the topic of a reading or listening passage often score higher, not because they know more English, but because they understand the content better (Alderson & Urquhart, 1985; Hale, 1988).
This creates an important distinction:
- If the goal is to measure general language ability, then content familiarity should not affect performance.
- But if the goal is to measure language-for-specific-purposes (LSP) — for example, English for engineers — then background knowledge becomes part of what’s being assessed.
In short, we must always ask: what exactly is this test measuring? The answer determines whether content knowledge is a source of bias or a legitimate part of the construct.
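If general language ability is what you intend to measure, one practical step is to screen items for strong content-familiarity effects. The following is a minimal, hypothetical sketch (the response matrices and the flagging threshold are invented for illustration) of how a teacher might flag items on which topic-familiar students outperform others by far more than the test-wide gap:

```python
# Hypothetical sketch: a rough screen for items that favor topic-familiar
# students far more than the test as a whole does. All data are invented.
import numpy as np

# Rows = students, columns = items (1 = correct, 0 = incorrect).
familiar = np.array([
    [1, 1, 1, 0, 1],
    [1, 0, 1, 1, 1],
    [1, 1, 1, 1, 0],
])
unfamiliar = np.array([
    [1, 0, 0, 0, 1],
    [0, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
])

overall_gap = familiar.mean() - unfamiliar.mean()
item_gaps = familiar.mean(axis=0) - unfamiliar.mean(axis=0)

# Items whose gap is well above the overall gap deserve a closer look:
# they may depend on background knowledge rather than the general language
# ability the test is meant to measure. The 0.25 threshold is arbitrary.
for i, gap in enumerate(item_gaps):
    flag = "check content" if gap > overall_gap + 0.25 else "ok"
    print(f"Item {i + 1}: gap = {gap:+.2f} ({flag})")
```

A flagged item isn’t automatically biased; as the distinction above suggests, it simply deserves a closer look at whether its content belongs to the construct you meant to test.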
Cognitive Characteristics and Fairness
Finally, researchers have found that learners’ cognitive styles — like field independence (how well one can separate details from background information) or ambiguity tolerance (comfort with uncertainty) — can affect language test performance (Brown, 1987).
While evidence is still emerging, these findings remind us that human cognition is not uniform, and our assessments must reflect this diversity. Future research may reveal more about how personality traits, motivation, or emotional factors interact with test performance — offering deeper insights into fairness and validity.
Bringing It All Together for Classroom Practice
For bilingual teachers designing or adapting assessments, these principles translate into practical guidelines:
- Always define what you want to measure and check whether your test truly captures that construct.
- Consider how you interpret and use scores — and reflect on who might be affected by those interpretations.
- Review test content for cultural inclusiveness and linguistic accessibility.
- Remember that validity and fairness are human concerns — they live not in statistics alone but in our choices as educators.
Designing a fair and valid language assessment is not just about psychometrics — it’s about empathy, awareness, and responsibility. Every test is a mirror of our educational values.
📚 References
Alderson, J. C., & Urquhart, A. H. (1985). Reading in a Foreign Language. Longman.
Bachman, L. F., & Palmer, A. S. (1996). Language Testing in Practice: Designing and Developing Useful Language Tests. Oxford University Press.
Brown, H. D. (1987). Principles of Language Learning and Teaching. Prentice Hall.
Chen, Z., & Henning, G. (1985). Item bias in language tests. Language Testing, 2(1), 1–15.
Duran, R. P. (1989). Assessment and cultural bias in testing. Review of Educational Research, 59(4), 573–594.
Messick, S. (1980, 1988). The Meaning of Test Validity. Educational Testing Service.
Nitko, A. J. (1983). Educational Tests and Measurement: An Introduction. Harcourt Brace Jovanovich.
Zeidner, M. (1986). Are English language aptitude tests culturally biased? Language Testing, 3(1), 82–95.