Monday, 13 October 2025

🌍 Understanding How Language Tests Are Classified

1. Why Classifying Tests Matters

The truth is that, as bilingual teachers, we often find ourselves asking:

“What kind of test should I use for this group of students?”

“Would a proficiency test be better than a diagnostic one?”

These are not easy questions. Choosing between a proficiency test and a diagnostic test is a bit like comparing apples and oranges: both have value, but their purpose, structure, and interpretation are different. To make sound assessment choices, we need to understand what makes one test distinct from another.

According to Bachman (1990), language tests can be classified by five key features:

  1. Their intended use or purpose,
  2. The content they measure,
  3. The frame of reference used for interpreting results,
  4. The scoring procedure, and
  5. The testing method or technique employed.

Let’s explore each one in simple, teacher-friendly terms.

2. Intended Use: The “Why” Behind a Test

Every test begins with a purpose — a clear reason it exists. In educational contexts, language tests help us make decisions about learners:

  • Selection, entrance, or readiness tests → used for admission decisions.
  • Placement and diagnostic tests → used to identify levels and learning needs.
  • Progress, achievement, and mastery tests → used to check how well students are learning and meeting objectives.

For example, if you want to know which students are ready to move to the next level, a progress or achievement test is ideal. But if your goal is to find out which areas they still struggle with, a diagnostic test will be more useful.

The truth is that one test can sometimes serve more than one purpose, but whenever it does, we must make sure that each use has been shown to be valid for that specific decision (Fulcher & Davidson, 2007).

3. Content: What the Test Is Based On

The content of a language test tells us what knowledge or skills are being measured. Two main approaches guide this:

  1. Theory-based (Proficiency) Tests
    • Built on a theory of language ability, such as grammatical or communicative competence.
    • Example: TOEFL measures general proficiency regardless of the course a learner has taken.
  2. Syllabus-based (Achievement) Tests
    • Designed around the specific content of a course or curriculum.
    • Example: A final exam that evaluates what was taught in your English 4B syllabus.

The distinction matters because the theory behind the test determines what kind of language ability it captures. For instance, a proficiency test built on grammar knowledge may look like a grammar-based achievement test — but it will differ from one that evaluates functional communication (Bachman, 1990).

There are also language aptitude tests, such as the Modern Language Aptitude Test (Carroll & Sapon, 1959), which measure a person’s potential to learn languages, not their current proficiency.

4. Frame of Reference: How Results Are Interpreted

Once we have test scores, we must decide what those numbers mean. There are two main frames of reference:

A. Norm-Referenced (NR) Tests

  • Compare one learner’s performance to others.
  • Scores are interpreted in relation to a “norm group.”
  • Example: A TOEFL score of 578 is one standard deviation above the mean of 512 — this means the test taker performed better than about 84% of the reference group.

This type of interpretation is useful when you want to rank students or make comparative decisions, such as scholarship selection.
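As a quick illustration of where the "about 84%" comes from, here is a minimal Python sketch (not part of the original example). It assumes the norm group's scores are roughly normally distributed and takes the standard deviation from the figures above (578 - 512 = 66):

```python
from statistics import NormalDist

def percentile_rank(score: float, mean: float, sd: float) -> float:
    """Estimate the percentage of the norm group scoring below `score`,
    assuming scores are approximately normally distributed."""
    z = (score - mean) / sd            # distance from the mean in SD units
    return NormalDist().cdf(z) * 100   # area under the normal curve below z

# Values from the TOEFL example above: mean 512, and 578 sits one SD higher,
# so the standard deviation is taken to be 578 - 512 = 66.
print(percentile_rank(578, mean=512, sd=66))  # about 84.1
```

Real norm tables are published by the test developer, of course; the sketch only shows the reasoning behind the percentile figure.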

B. Criterion-Referenced (CR) Tests

  • Compare a learner’s performance to a predefined standard or level of mastery.
  • Example: A writing test might require 85% accuracy in task achievement to be considered “proficient.”
  • All students who meet that level receive an “A,” regardless of how others perform.

The beauty of CR tests is that they align with learning outcomes — they tell you what the learner can do, not how they compare to others (Glaser, 1963; Nitko, 1984).

In real classrooms, many assessments combine both perspectives. You might use CR criteria for grading but still compare averages across classes for institutional reporting.
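To make that combination concrete, here is a small hypothetical sketch: the names and scores are invented, and the 85% cut score is borrowed from the writing example above. The same set of scores is read once through a criterion-referenced lens and once through a norm-referenced lens:

```python
# Hypothetical class scores (percent of writing criteria met); illustrative only.
scores = {"Ana": 92, "Luis": 85, "Mara": 78, "Tom": 88, "Iris": 60}

CUT_SCORE = 85  # criterion-referenced mastery threshold from the example above

# Criterion-referenced view: each learner is judged against the standard alone.
for name, score in scores.items():
    status = "proficient" if score >= CUT_SCORE else "not yet proficient"
    print(f"{name}: {score}% -> {status}")

# Norm-referenced view: the same scores compared across the group,
# e.g. for institutional reporting or ranking.
class_mean = sum(scores.values()) / len(scores)
ranking = sorted(scores, key=scores.get, reverse=True)
print(f"Class mean: {class_mean:.1f}%, ranking: {ranking}")
```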

5. Scoring Procedure: How We Judge Performance

Scoring can be objective or subjective, and the difference lies in whether the scorer has to exercise judgment at the marking stage.

  • Objective scoring: There’s a single correct answer. No judgment is needed. Examples: multiple-choice, gap-fill, dictation.
  • Subjective scoring: The scorer evaluates performance based on set criteria. Examples: essays, oral interviews, or speaking tasks.

Even so, all tests involve human judgment at some stage — from design to marking. What matters most is transparency and consistency in how criteria are applied (Pilliner, 1968).
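A brief sketch can make the contrast concrete. The answer key, rubric criteria, and scale below are invented for illustration, not taken from any real test:

```python
# Objective scoring: the key decides; no rater judgment is needed at marking time.
def score_multiple_choice(responses: list[str], answer_key: list[str]) -> int:
    return sum(r == k for r, k in zip(responses, answer_key))

# Subjective scoring: a rater assigns levels against published criteria,
# so judgment is involved, but explicit criteria keep it transparent and consistent.
def score_essay(ratings: dict[str, int], max_per_criterion: int = 5) -> float:
    # `ratings` maps each rubric criterion to the rater's 0-5 judgment.
    return sum(ratings.values()) / (len(ratings) * max_per_criterion) * 100

print(score_multiple_choice(["b", "c", "a"], ["b", "d", "a"]))            # 2
print(score_essay({"content": 4, "organization": 3, "language use": 5}))  # 80.0
```

Notice that even the "subjective" function works from named criteria: that explicitness is exactly what keeps rater judgment consistent.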

6. Testing Method: The Format and Task Type

Finally, tests differ in their method: the way tasks are presented and responses are elicited.

Some are performance-based, like oral interviews or writing tasks, which mirror real-life language use. Others use selected-response formats, such as multiple choice, where learners choose the right answer among options.

Each method serves a purpose: performance tasks test authentic communication, while selected-response items allow efficient scoring and wide coverage of skills (Jones, 1985; Wesche, 1985).

As teachers, our goal is to select or design tasks that reflect the real-world language use we want to assess — and ensure that our chosen method truly measures what we intend it to.

7. Bringing It All Together

In the end, choosing the right type of test isn’t about picking the “best” label — it’s about aligning purpose, content, interpretation, scoring, and method so that your assessment truly serves learning.

Good testing is good teaching. When we design assessments with validity, fairness, and clarity, we do more than measure performance: we nurture growth.

📚 References

Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford University Press.

Carroll, J. B., & Sapon, S. M. (1959). Modern Language Aptitude Test (MLAT). The Psychological Corporation.

Fulcher, G., & Davidson, F. (2007). Language testing and assessment: An advanced resource book. Routledge.

Glaser, R. (1963). Instructional technology and the measurement of learning outcomes: Some questions. American Psychologist, 18(8), 519–521.

Nitko, A. J. (1984). Educational assessment of students. Merrill Publishing.

Pilliner, A. E. G. (1968). Test theory: The essential issues. Educational Review, 21(1), 20–27.

Wesche, M. (1985). Performance testing in second language learning. Language Testing, 2(1), 41–57.

