1. Why Classifying Tests Matters
The truth is that, as bilingual teachers, we often find ourselves asking: “What kind of test should I use for this group of students?” “Would a proficiency test be better than a diagnostic one?”
These are not easy questions. Comparing test types is a bit like comparing apples and oranges: both have value, but their purpose, structure, and interpretation are different. To make sound assessment choices, we need to understand what makes one test distinct from another.
According to Bachman (1990), language tests can be classified by five key features:
- Their intended use or purpose,
- The content they measure,
- The frame of reference used for interpreting results,
- The scoring procedure, and
- The testing method or technique employed.
Let’s explore each one in simple, teacher-friendly terms.
2. Intended Use: The “Why” Behind a Test
Every test begins with a purpose, a clear reason it exists. In educational contexts, language tests help us make decisions about learners:
- Selection, entrance, or readiness tests → used for admission decisions.
- Placement and diagnostic tests → used to identify levels and learning needs.
- Progress, achievement, and mastery tests → used to check how well students are learning and meeting objectives.
For example, if you want to know which students are ready to move to the next level, a progress or achievement test is ideal. But if your goal is to find out which areas they still struggle with, a diagnostic test will be more useful.
One test can sometimes serve more than one purpose, but whenever we do this, we must make sure that each use has been shown to be valid for that specific decision (Fulcher & Davidson, 2007).
3. Content: What the Test Is Based On
The content of a language test tells us what knowledge or skills are being measured. Two main approaches guide this:
- Theory-based (Proficiency) Tests
  - Built on a theory of language ability, such as grammatical or communicative competence.
  - Example: TOEFL measures general proficiency regardless of the course a learner has taken.
- Syllabus-based (Achievement) Tests
  - Designed around the specific content of a course or curriculum.
  - Example: A final exam that evaluates what was taught in your English 4B syllabus.
The distinction matters because the theory behind the test determines what kind of language ability it captures. For instance, a proficiency test built on grammar knowledge may look like a grammar-based achievement test, but it will differ from one that evaluates functional communication (Bachman, 1990).
There are also language aptitude tests, such as the Modern Language Aptitude Test (Carroll & Sapon, 1959), which measure a person’s potential to learn languages, not their current proficiency.
4. Frame of Reference: How Results Are Interpreted
Once we have test scores, we must decide what those numbers mean. There are two main frames of reference:
A. Norm-Referenced (NR) Tests
- Compare one learner’s performance to others.
- Scores are interpreted in relation to a “norm group.”
- Example: A TOEFL score of 578 is one standard deviation above the mean of 512, which means the test taker performed better than about 84% of the reference group.
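To make that percentile arithmetic concrete, here is a minimal Python sketch. It treats the 66-point gap above the mean of 512 as one standard deviation and assumes scores in the norm group are roughly normally distributed; the figures are simply the ones from the example above, not official norms.

```python
from statistics import NormalDist

# Reference-group figures taken from the example above (assumed, not official TOEFL norms)
NORM_MEAN = 512   # mean score of the norm group
NORM_SD = 66      # 578 - 512: one standard deviation in this example

def percentile_rank(score: float, mean: float = NORM_MEAN, sd: float = NORM_SD) -> float:
    """Percentage of the norm group scoring below `score`, assuming a normal distribution."""
    z = (score - mean) / sd            # distance from the mean in SD units
    return NormalDist().cdf(z) * 100   # area under the normal curve below z

print(round(percentile_rank(578), 1))  # 84.1 -> better than about 84% of the reference group
```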
This type of interpretation is useful when you want to rank students or make comparative decisions, such as scholarship selection.
B. Criterion-Referenced (CR) Tests
- Compare a learner’s performance to a predefined standard or level of mastery.
- Example: A writing test might require 85% accuracy in task achievement to be considered “proficient.”
- All students who meet that level receive an “A,” regardless of how others perform.
The beauty of CR tests is that they align with learning outcomes: they tell you what the learner can do, not how they compare to others (Glaser, 1963; Nitko, 1984).
In real classrooms, many assessments combine both perspectives. You might use CR criteria for grading but still compare averages across classes for institutional reporting.
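As a rough illustration of that combination, the sketch below grades each learner against the criterion from the writing-test example (an 85% mastery threshold) and then computes class averages for a norm-style comparison across groups. The threshold, grade labels, and scores are placeholders, not prescribed values.

```python
from statistics import mean

MASTERY_THRESHOLD = 85  # criterion from the example above; adjust to your own rubric

def cr_grade(score: float) -> str:
    """Criterion-referenced decision: the grade depends only on the standard,
    not on how classmates performed."""
    return "A (proficient)" if score >= MASTERY_THRESHOLD else "Not yet proficient"

class_4a = [88, 92, 71, 85, 90]   # invented scores for two hypothetical classes
class_4b = [79, 95, 86, 68, 74]

# Criterion-referenced view: every learner is judged against the same standard
for score in class_4a:
    print(score, "->", cr_grade(score))

# Norm-style view for institutional reporting: compare group averages
print("4A average:", mean(class_4a), "| 4B average:", mean(class_4b))
```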
5. Scoring Procedure: How We Judge Performance
Scoring can be objective or subjective, and the difference lies in whether the scorer has to exercise judgment.
- Objective scoring: There’s a single correct answer. No judgment is needed. Examples: multiple-choice, gap-fill, dictation.
- Subjective scoring: The scorer evaluates performance based on set criteria. Examples: essays, oral interviews, or speaking tasks.
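To make the contrast concrete, here is a minimal sketch: an answer key scores the objective items with no judgment at all, while the subjective score simply averages the ratings a scorer has already assigned against set criteria. The items, criteria, and scale are invented for the illustration.

```python
from statistics import mean

# Objective scoring: one correct answer per item, no scorer judgment needed
ANSWER_KEY = {"q1": "b", "q2": "d", "q3": "a"}

def score_objective(responses: dict) -> int:
    """Count responses that exactly match the key."""
    return sum(responses.get(item) == key for item, key in ANSWER_KEY.items())

# Subjective scoring: a rater judges performance against set criteria (0-5 each)
RUBRIC_CRITERIA = ("task achievement", "coherence", "vocabulary", "grammar")

def score_subjective(ratings: dict) -> float:
    """Average the rater's judgments; for transparency, report each criterion too."""
    return mean(ratings[c] for c in RUBRIC_CRITERIA)

print(score_objective({"q1": "b", "q2": "c", "q3": "a"}))   # 2 of 3 items correct
print(score_subjective({"task achievement": 4, "coherence": 3,
                        "vocabulary": 4, "grammar": 3}))    # 3.5 out of 5
```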
Even so, all tests involve human judgment at some stage, from design to marking. What matters most is transparency and consistency in how criteria are applied (Pilliner, 1968).
6. Testing Method: The Format and Task Type
Finally, tests differ in their method: the way tasks are presented and responses are elicited.
Some are performance-based, like oral interviews or writing tasks, which mirror real-life language use. Others use selected-response formats, such as multiple choice, where learners choose the right answer among options.
Each method serves a purpose: performance tasks test authentic communication, while selected-response items allow efficient scoring and wide coverage of skills (Jones, 1985; Wesche, 1985).
As teachers, our goal is to select or design tasks that reflect the real-world language use we want to assess, and to ensure that our chosen method truly measures what we intend it to.
7. Bringing It All Together
In the end, choosing the right type of test isn’t about picking the “best” label; it’s about aligning purpose, content, interpretation, scoring, and method so that your assessment truly serves learning.
The fact is that good testing is good teaching. When we design assessments with validity, fairness, and clarity, we not only measure performance; we nurture growth.
References
Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford University Press.
Carroll, J. B., & Sapon, S. M. (1959). Modern Language Aptitude Test (MLAT). The Psychological Corporation.
Fulcher, G., & Davidson, F. (2007). Language testing and assessment: An advanced resource book. Routledge.
Glaser, R. (1963). Instructional technology and the measurement of learning outcomes. American Psychologist, 18(8), 519–521.
Nitko, A. J. (1984). Educational assessment of students. Merrill Publishing.
Pilliner, A. E. G. (1968). Test theory: The essential issues. Educational Review, 21(1), 20–27.
Wesche, M. (1985). Performance testing in second language learning. Language Testing, 2(1), 41–57.