When teachers hear “authentic tests,” they often imagine “real-life” activities, like ordering coffee or making a hotel reservation. And yes, that’s part of it—but the truth is that authenticity goes deeper.
In assessment theory, the “real-life” approach to authenticity focuses mainly on face validity—that is, how believable or “real” the test looks to teachers and learners. This is sometimes confused with content validity, which refers to how well the test content represents the knowledge or skills it’s meant to measure (Mosier, 1947).
For example, if you want to test a learner’s skill in forming addition problems, a test made up of all possible addition combinations would be “valid by definition.”
In language testing, however, things aren’t that simple. A test may look real but still fail to measure what matters. That’s where construct validity comes in—the idea that a test must measure the underlying ability (the “construct”) it claims to assess.
🧠 2. Construct Validity: Asking the Hard Questions
Construct validity forces us to ask tough but important questions:
- Does an “authentic” test measure different skills than an “inauthentic” one?
- Do all test takers use the same mental strategies when responding to test tasks?
- And can a single test truly capture the same ability across different individuals—or even across time for the same person?
Research (Messick, 1988; Alderson, 1983) shows that test takers don’t process tasks in identical ways. Even the same person may approach a test differently from one day to the next. That means test authenticity affects construct validity—because test performance depends on both the task and the individual’s interaction with it.
Douglas and Selinker (1985) introduced the idea of “discourse domains”—personal communication patterns that each learner develops over time. A test is only valid, they argued, when it taps into the discourse domains the learner already uses. In other words, the test must speak the learner’s language world.
No two learners bring the same communicative background to a test—and that makes designing valid tests both challenging and fascinating.
🧩 3. The Role of Purpose: What Are We Trying to Measure?
Not all language tests need to measure everything about communication. Sometimes, we’re only interested in one area—say, grammar, reading academic texts, or professional writing. As Stevenson (1982) wisely noted, there’s no single “correct” goal of testing. What matters is clarity of purpose and alignment between the test and that purpose.
However, we must be cautious: if we create tests that isolate only small parts of communication (like grammar drills), we risk losing the authentic, integrated nature of real language use. Authentic tasks tend to activate multiple aspects of communicative competence—grammar, pragmatics, discourse, and sociolinguistics—all working together.
In short: if your goal is to assess communicative ability, your test must itself be communicative and authentic.
🗣️ 4. Comparing Two Approaches: Real-Life (RL) vs. Interactional Ability (IA)
There are two main approaches to defining authenticity and language ability:
| Approach | Focus | Example Tests | Strength | Limitation |
| --- | --- | --- | --- | --- |
| Real-Life (RL) | Emphasizes tasks that simulate real-world contexts (e.g., interviews, travel conversations). | ILR and ACTFL Oral Proficiency Interviews | Easy to relate to real-life communication | Tends to treat proficiency as one overall ability |
| Interactional Ability (IA) | Emphasizes how individuals interact with the task and the language context. | Bachman & Palmer’s Oral Interview of Communicative Proficiency | Focuses on multiple components of ability | More complex to design and score |
In the RL approach, language proficiency is treated as a single, global skill—a person is “intermediate,” “advanced,” etc., overall. In the IA approach, proficiency is viewed as multi-componential, involving grammatical, pragmatic, and sociolinguistic competences (Bachman & Palmer, 1982a). Each can be measured and reported separately.
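The contrast between the two score reports can be sketched in a few lines of code. This is only an illustration: the component names, the 1–5 scale, and the simple averaging are invented for the example, not taken from either testing model.

```python
def rl_report(component_scores):
    """RL-style view: collapse everything into one global proficiency score."""
    return round(sum(component_scores.values()) / len(component_scores), 1)

def ia_report(component_scores):
    """IA-style view: report each competence separately, with no collapsing."""
    return dict(component_scores)

# Hypothetical learner profile on an assumed 1-5 scale.
scores = {"grammatical": 4.0, "pragmatic": 2.5, "sociolinguistic": 3.5}

print(rl_report(scores))  # a single number hides the weak pragmatic competence
print(ia_report(scores))  # the profile keeps the differences visible
```

Notice how the single RL-style number masks exactly the kind of uneven profile the IA approach is designed to reveal.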
🧩 5. Why These Differences Matter for Teachers
These distinctions may seem abstract, but they have very practical implications for bilingual teachers designing assessments.
- If you view language as one unified skill (RL), your test will focus on global performance and real-world contexts.
- If you view language as multiple interacting abilities (IA), your test will include tasks that target different components—grammar accuracy, pragmatic fluency, sociolinguistic awareness, etc.—and score them separately.
Neither approach is “better.” What matters is the alignment between your teaching goals and your test design. If your goal is to measure students’ full communicative competence, then your test should involve authentic, interactive tasks that mirror genuine communication.
In the end, every test is a balancing act between authenticity, practicality, and purpose.
📘 6. What This Means for Classroom Practice
When bilingual teachers design tests:
- Clarify the construct — What exact skill or ability do you want to measure?
- Decide the degree of authenticity — Should tasks simulate real-life interactions or focus on specific sub-skills?
- Ensure construct validity — Make sure tasks truly engage the intended ability, not something else (like test-taking tricks).
- Use multiple measures — Combine global and analytic scoring when possible.
- Reflect and validate — Regularly review whether your test results match what you observe in learners’ language use.
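The “use multiple measures” step above can be sketched as a weighted blend of a holistic (global) rating and the mean of analytic component scores. The 50/50 default weighting, the component names, and the 1–5 scale are assumptions made for illustration, not a prescribed scoring scheme.

```python
def combined_score(holistic, analytic, weight_holistic=0.5):
    """Blend one global rating with the mean of analytic component scores.

    holistic: a single rater-assigned global score (assumed 1-5 scale).
    analytic: dict of per-component scores on the same scale.
    weight_holistic: share of the final score given to the global rating.
    """
    analytic_mean = sum(analytic.values()) / len(analytic)
    return round(weight_holistic * holistic + (1 - weight_holistic) * analytic_mean, 2)

# Hypothetical ratings for one learner.
analytic = {"accuracy": 3, "fluency": 4, "appropriateness": 5}
print(combined_score(holistic=4, analytic=analytic))
```

Shifting `weight_holistic` toward 1.0 recovers a purely global (RL-style) report, while shifting it toward 0.0 leans on the analytic components—so the weighting itself expresses where a test sits between the two approaches.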
🧭 Final Reflection
Authenticity and validity are not abstract testing terms—they are ethical commitments. They remind us that every assessment should respect how real people use real language in real contexts. When we design authentic assessments, we don’t just test language—we honour communication as a living, human act.
📚 References
Alderson, J. C. (1983). The effect of test method on test performance: Theory and practice. In J. W. Oller (Ed.), Issues in language testing research (pp. 67–92). Newbury House.
Bachman, L. F. (1988). Problems in examining the validity of the ACTFL oral proficiency interview. Studies in Second Language Acquisition, 10(2), 149–164.
Bachman, L. F., & Palmer, A. S. (1982a). The construct validation of tests of communicative competence. Language Testing, 1(1), 1–20.
Douglas, D., & Selinker, L. (1985). Principles for language tests within the “discourse domains” theory of interlanguage. Language Testing, 2(3), 205–226.
Messick, S. (1988). The once and future issues of validity: Assessing the meaning and consequences of measurement. Educational Measurement: Issues and Practice, 7(4), 5–20.
Mosier, C. I. (1947). A critical examination of the concepts of face validity. Educational and Psychological Measurement, 7(2), 191–205.
Stevenson, D. K. (1982). Communicative testing and the foreign language learner. Canadian Modern Language Review, 38(2), 284–292.