Monday, 13 October 2025

🧩 Understanding the Difference Between Language Ability and Performance

🌱 1. Why This Distinction Matters

One of the biggest challenges in language testing is distinguishing what a learner can do (their underlying ability) from what they actually do on a given test (their performance).

This issue has been a core dilemma in the field for decades. As Carroll (1968) clearly put it, “we cannot test language competence directly; we can only observe it through performance.” In other words, every time a student speaks, writes, or listens during a test, we are seeing a glimpse of their ability, but never the whole picture.

Spolsky (1973) expanded on this idea, asking a question that still matters today: What does it really mean to know a language — and how can we make someone show that knowledge?

For teachers, this means that tests don’t measure knowledge directly. They measure how knowledge is revealed through language behaviour — and that behaviour can change depending on the context, the topic, the task, and even the student’s mood.

🎯 2. The Risk of Confusing Behaviour with Ability

Many test designers (and sometimes teachers) mistake performance for ability. Upshur (1979) warned that when we interpret test results only as predictions of future behaviour, such as saying “this student will do well in real-life communication,” we risk overlooking what the test actually measures.

The problem is that behaviour is not the same as the underlying ability.

  • Behaviour is what we see (the student’s responses, their fluency, their pronunciation).
  • Ability is what we infer (their knowledge, strategies, control of grammar and vocabulary).

When we confuse the two, we limit our interpretation — and our test becomes less valid and less useful.

Messick (1981a) called this confusion the “operationist approach”: treating what we observe as if it were the construct we want to measure. Cronbach (1988) criticized this view too, arguing that tests should not be equated with the abilities they are meant to represent. Instead, we must look deeper, at the processes behind performance, to design better, fairer assessments.

🔍 3. Why “Direct Tests” Aren’t Always Direct

You might have heard that “direct tests” (like oral interviews or writing tasks) are automatically more valid because they show “real” language use. This belief can be misleading.

Yes, a speaking test looks authentic — but as researchers like Cronbach (1988) remind us, appearance is not evidence. A direct test may show performance in a controlled situation, but it still doesn’t give full access to the person’s inner ability.

So, when we assess a student speaking about familiar topics, we are observing a small slice of their language world — one that depends heavily on test conditions. That’s why we say language tests are always indirect indicators of ability (Bachman & Palmer, 1996). What we see in a test task is a performance — what we need to infer from it is the ability behind it.

⚖️ 4. Why “Face Validity” Isn’t Enough

Many researchers — Carroll (1973), Lado (1975), Bachman (1988a), and others — have criticized the idea that a test is valid simply because it looks right. Stevenson (1985b) called this “the treacherous appearance of validity.” In other words, just because a test seems authentic doesn’t mean it measures what it claims to measure.

For bilingual teachers, this is crucial: a test that “feels communicative” isn’t automatically a good measure of communicative ability. We must go beyond face value and examine content relevance, construct validity, and evidence of reliability.

🌍 5. The Myth of “Real-Life” Authenticity

It’s tempting to think we can design a test that perfectly mirrors “real-life” communication. But language in real life is infinitely variable and context-dependent. As Spolsky (1986) noted, every utterance depends on who’s speaking, to whom, where, why, and under what conditions.

Imagine designing a test for taxi drivers at an international airport (Bachman, 1990). You might think the language is simple: directions, prices, greetings. But those interactions involve bargaining, politeness strategies, cultural expectations, and situational adjustments. There’s no single “correct” sample of this real-life behaviour that can represent all possibilities.

So, even when we aim for authenticity, we must accept that tests can only simulate, not replicate, real communication. The goal is representativeness, not perfect imitation.

🧭 6. What Teachers Can Do

Here’s the empowering takeaway: When designing your own language tests, you can create valid and meaningful assessments if you remember these principles:

  1. Define the construct clearly — what specific ability are you trying to measure?
  2. Design tasks that reflect that ability, not just the surface behaviour.
  3. Interpret results carefully — remember that a test performance is a sample, not a complete portrait.
  4. Support your interpretation with clear reasoning and evidence (e.g., through consistency, relevance, and alignment with your teaching goals).
  5. Avoid overreliance on “real-life appearance” — instead, ensure your tasks are relevant, fair, and connected to your learners’ context.

As Cronbach (1988) wisely summarized, we must look beyond the surface: “For understanding poor performance, for remedial purposes, for improving teaching methods, and for carving out more functional domains, process constructs are needed.”

In other words — test the process, not just the product.

📚 References

Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford University Press.

Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice. Oxford University Press.

Carroll, J. B. (1968). The psychology of language testing. Cambridge University Press.

Cronbach, L. J. (1988). Five perspectives on validity argument. In H. Wainer & H. Braun (Eds.), Test validity (pp. 3–17). Lawrence Erlbaum.

Messick, S. (1981). Evidence and ethics in the evaluation of tests. Educational Researcher, 10(9), 9–20.

Spolsky, B. (1986). Language testing: Art or science? Language Testing, 3(2), 147–153.

Upshur, J. A. (1979). Functional language testing. Canadian Modern Language Review, 35(2), 233–246.

 
