Monday, 13 October 2025

๐ŸŒ Authenticity and Validity in Language Assessment

🌍 1. Beyond “Real-Life” Tasks: What Authenticity Means

When teachers hear “authentic tests,” they often imagine “real-life” activities, like ordering coffee or making a hotel reservation. And yes, that’s part of it, but authenticity goes deeper than surface realism.

In assessment theory, the “real-life” approach to authenticity focuses mainly on face validity—that is, how believable or “real” the test looks to teachers and learners. This is sometimes confused with content validity, which refers to how well the test content represents the knowledge or skills it’s meant to measure (Mosier, 1947).

For example, if you want to test a learner’s skill in solving simple addition problems, a test made up of every possible addition combination would be “valid by definition,” because its content exhausts the domain being measured.

In language testing, however, things aren’t that simple. A test may look real but still fail to measure what matters. That’s where construct validity comes in—the idea that a test must measure the underlying ability (the “construct”) it claims to assess.

🧠 2. Construct Validity: Asking the Hard Questions

Construct validity forces us to ask tough but important questions:

  • Does an “authentic” test measure different skills than an “inauthentic” one?
  • Do all test takers use the same mental strategies when responding to test tasks?
  • And can a single test truly capture the same ability across different individuals—or even across time for the same person?

Research (Messick, 1988; Alderson, 1983) shows that test-takers don’t process tasks in identical ways. Even the same person may approach a test differently from one day to the next. That means that test authenticity affects construct validity—because test performance depends on both the task and the individual’s interaction with it.

Douglas and Selinker (1985) introduced the idea of “discourse domains”—personal communication patterns that each learner develops over time. A test is only valid, they argued, when it taps into the discourse domains the learner already uses. In other words, the test must speak to the learner’s language world.

No two learners bring the same communicative background to a test, and that makes designing valid tests both challenging and fascinating.

🧩 3. The Role of Purpose: What Are We Trying to Measure?

Not all language tests need to measure everything about communication. Sometimes, we’re only interested in one area—say, grammar, reading academic texts, or professional writing. As Stevenson (1982) wisely noted, there’s no single “correct” goal of testing. What matters is clarity of purpose and alignment between the test and that purpose.

However, we must be cautious: if we create tests that isolate only small parts of communication (like grammar drills), we risk losing the authentic, integrated nature of real language use. Authentic tasks tend to activate multiple aspects of communicative competence—grammar, pragmatics, discourse, and sociolinguistics—all working together.

In short: if your goal is to assess communicative ability, your test must itself be communicative and authentic.

🗣️ 4. Comparing Two Approaches: Real-Life (RL) vs. Interactional Ability (IA)

There are two main approaches to defining authenticity and language ability:

Real-Life (RL)

  • Focus: Emphasizes tasks that simulate real-world contexts (e.g., interviews, travel conversations).
  • Example tests: ILR and ACTFL Oral Proficiency Interviews
  • Strength: Easy to relate to real-life communication
  • Limitation: Tends to treat proficiency as one overall ability

Interactional Ability (IA)

  • Focus: Emphasizes how individuals interact with the task and the language context.
  • Example tests: Bachman and Palmer’s oral interview of communicative proficiency
  • Strength: Captures multiple components of ability
  • Limitation: More complex to design and score

In the RL approach, language proficiency is treated as a single, global skill—a person is “intermediate,” “advanced,” and so on, overall. In the IA approach, proficiency is viewed as multi-componential, involving grammatical, pragmatic, and sociolinguistic competences (Bachman & Palmer, 1982). Each component can be measured and reported separately.
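To make the contrast concrete, here is a minimal sketch in Python. The component names, the 0–5 scale, and the weights are invented for illustration; they are not Bachman and Palmer’s actual scoring scheme.

```python
# Illustrative sketch only: component names, scales, and weights are
# assumptions for the example, not an official scoring scheme.

# RL approach: one holistic judgement on a single global scale.
rl_rating = "Intermediate"  # e.g., a single ACTFL-style overall rating

# IA approach: separate scores for hypothesised components of ability.
ia_scores = {
    "grammatical": 4.0,      # accuracy of forms (assumed 0-5 scale)
    "pragmatic": 3.5,        # appropriateness of language functions
    "sociolinguistic": 3.0,  # sensitivity to register and context
}

# Components can be reported separately...
for component, score in ia_scores.items():
    print(f"{component}: {score}/5")

# ...or combined into a weighted composite when one figure is required.
weights = {"grammatical": 0.4, "pragmatic": 0.3, "sociolinguistic": 0.3}
composite = sum(ia_scores[c] * weights[c] for c in ia_scores)
print(f"RL-style global rating: {rl_rating}")
print(f"IA weighted composite: {composite:.2f}/5")
```

The design point is that the IA view keeps each component score visible, while a composite (or a single RL-style rating) collapses the profile into one figure.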

🧩 5. Why These Differences Matter for Teachers

These distinctions may seem abstract, but they have very practical implications for bilingual teachers designing assessments.

  • If you view language as one unified skill (RL), your test will focus on global performance and real-world contexts.
  • If you view language as multiple interacting abilities (IA), your test will include tasks that target different components—grammar accuracy, pragmatic fluency, sociolinguistic awareness, etc.—and score them separately.

Neither approach is “better.” What matters is the alignment between your teaching goals and your test design. If your goal is to measure students’ full communicative competence, then your test should involve authentic, interactive tasks that mirror genuine communication.

In the end, every test is a balancing act among authenticity, practicality, and purpose.

📘 6. What This Means for Classroom Practice

When bilingual teachers design tests:

  1. Clarify the construct — What exact skill or ability do you want to measure?
  2. Decide the degree of authenticity — Should tasks simulate real-life interactions or focus on specific sub-skills?
  3. Ensure construct validity — Make sure tasks truly engage the intended ability, not something else (like test-taking tricks).
  4. Use multiple measures — Combine global and analytic scoring when possible (see the sketch after this list).
  5. Reflect and validate — Regularly review whether your test results match what you observe in learners’ language use.
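As a companion to point 4, here is a minimal sketch of one way to blend a holistic (global) rating with analytic subscores. The rubric categories, the 0–5 scale, and the default 50/50 weighting are assumptions to adapt to your own rubric.

```python
# Illustrative sketch for point 4: blending one holistic (global) rating
# with the mean of analytic subscores. The 0-5 scale, the categories, and
# the default 50/50 weighting are assumptions; adapt them to your rubric.

def combined_score(holistic: float, analytic: dict[str, float],
                   holistic_weight: float = 0.5) -> float:
    """Weighted blend of a holistic rating and the mean analytic subscore."""
    analytic_mean = sum(analytic.values()) / len(analytic)
    return holistic_weight * holistic + (1 - holistic_weight) * analytic_mean

score = combined_score(
    holistic=4.0,  # overall impression of the performance
    analytic={"accuracy": 3.5, "fluency": 4.5, "appropriateness": 4.0},
)
print(f"Combined score: {score:.2f}/5")  # 0.5*4.0 + 0.5*4.0 -> 4.00
```

Raising holistic_weight favours the overall impression; lowering it favours the analytic profile.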

🧭 Final Reflection

Authenticity and validity are not abstract testing terms—they are ethical commitments. They remind us that every assessment should respect how real people use real language in real contexts. When we design authentic assessments, we don’t just test language—we honour communication as a living, human act.

📚 References

Alderson, J. C. (1983). The effect of test method on test performance: Theory and practice. In J. W. Oller (Ed.), Issues in language testing research (pp. 67–92). Newbury House.

Bachman, L. F. (1988). Problems in examining the validity of the ACTFL oral proficiency interview. Studies in Second Language Acquisition, 10(2), 149–164.

Bachman, L. F., & Palmer, A. S. (1982). The construct validation of some components of communicative proficiency. TESOL Quarterly, 16(4), 449–465.

Douglas, D., & Selinker, L. (1985). Principles for language tests within the “discourse domains” theory of interlanguage. Language Testing, 2(2), 205–226.

Messick, S. (1988). The once and future issues of validity: Assessing the meaning and consequences of measurement. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 33–45). Lawrence Erlbaum.

Mosier, C. I. (1947). A critical examination of the concepts of face validity. Educational and Psychological Measurement, 7(2), 191–205.

Stevenson, D. K. (1982). Communicative testing and the foreign language learner. Canadian Modern Language Review, 38(2), 284–292.

 
