1. Why Tests Exist: Decisions Drive Assessment
The truth is that every test — no matter how complex or simple — exists to support decisions. In language education, those decisions help us understand who our students are, what they can do, and how well our programs are working.
As Bachman (1990) notes, decisions in assessment fall into two broad categories:
- Micro-evaluation: Decisions about individuals — such as students or teachers.
- Macro-evaluation: Decisions about programs — such as course design, teaching methods, or institutional effectiveness.
Understanding this distinction is essential for bilingual teachers who want to create fair, valid, and actionable assessments. After all, the goal of a good test isn’t just to measure — it’s to inform, guide, and improve both teaching and learning.
2. Decisions About Students (Micro-Evaluation)
a. Selection and Readiness
Sometimes we need to decide who can enter a program or who is ready to start a new stage of learning. These decisions are supported by entrance or readiness tests.
For example, a university may require a language proficiency test such as the TOEFL to determine whether a student has the linguistic ability to succeed in an academic environment. In other cases, a readiness test helps teachers identify whether learners possess the foundational skills necessary for effective instruction.
In simple terms, selection tests answer questions like: “Is this learner ready to benefit from what comes next?”
These decisions must always be fair, transparent, and based on valid indicators — not just on intuition or a single score.
b. Placement
Once learners have been accepted into a program, teachers often face a practical question: “Which level is right for this student?”
That’s where placement tests come in. These tests group learners homogeneously, based on language ability, needs, or goals. They can be designed in two main ways:
- Theory-based tests, which draw on models of language proficiency (e.g., communicative competence).
- Curriculum-based tests, which are aligned with the specific objectives of a course or syllabus.
If your students come from diverse backgrounds, a theory-based placement test might be more practical. If your syllabus is well-structured and sequenced, a curriculum-based (criterion-referenced) test may serve better. In other words, your test design depends on both your learners’ diversity and your program’s structure.
Example: A language school might use a multi-level test aligned with CEFR bands (A1–C2) to ensure students are placed at the right level. Teachers then adjust instruction based on these results — not to label learners, but to empower them with appropriate challenges.
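To make that score-to-band step concrete, here is a minimal sketch in Python, assuming a hypothetical 100-point placement test with invented cut scores (real programs derive cut scores from standard-setting studies, not ad hoc thresholds):

```python
# Illustrative only: map raw placement scores (0-100) to CEFR-style bands.
# The cut scores below are hypothetical.
CUT_SCORES = [
    (90, "C2"), (75, "C1"), (60, "B2"),
    (45, "B1"), (30, "A2"), (0, "A1"),
]

def place(score: int) -> str:
    """Return the highest band whose cut score the learner meets."""
    for cut, band in CUT_SCORES:
        if score >= cut:
            return band
    return "A1"  # scores below every cut default to the lowest band

# Group a class homogeneously by band, as a placement test aims to do.
students = {"Ana": 82, "Luis": 47, "Mei": 91, "Omar": 28}
by_band: dict[str, list[str]] = {}
for name, score in students.items():
    by_band.setdefault(place(score), []).append(name)
print(by_band)  # {'C1': ['Ana'], 'B1': ['Luis'], 'C2': ['Mei'], 'A1': ['Omar']}
```

The grouping at the end is the homogeneous grouping placement aims for: learners with similar scores land in the same band, and teachers take it from there.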
c. Diagnosis
Diagnostic testing is like a doctor’s check-up for language learning. It tells us where students are strong and where they struggle, helping teachers tailor their lessons.
Even placement or readiness tests can have diagnostic value if we analyse response patterns carefully. However, true diagnostic tests are designed specifically to provide fine-grained information about language components — such as grammar, pronunciation, or discourse organization.
Analogy: If readiness tests tell us whether students can start running, diagnostic tests tell us which muscles need training first.
Diagnostic tests can be syllabus-based (focused on course content) or theory-based (aligned with models of proficiency), depending on what kind of feedback you need.
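As a rough illustration of what analysing response patterns can mean in practice, the minimal sketch below tallies one learner’s item responses by language component; the items, component tags, and scores are all invented for the example:

```python
# Illustrative sketch: summarise one learner's diagnostic responses by
# component. A real instrument defines items and tags in its test specs.
items = [
    ("q1", "grammar"), ("q2", "grammar"), ("q3", "pronunciation"),
    ("q4", "discourse"), ("q5", "discourse"), ("q6", "grammar"),
]
responses = {"q1": 1, "q2": 0, "q3": 1, "q4": 0, "q5": 0, "q6": 1}  # 1 = correct

totals: dict[str, list[int]] = {}
for item_id, component in items:
    totals.setdefault(component, []).append(responses[item_id])

for component, marks in totals.items():
    print(f"{component}: {100 * sum(marks) / len(marks):.0f}% correct")
# grammar: 67% / pronunciation: 100% / discourse: 0%
# -> the "muscle" that needs training first is discourse organization.
```

A profile like this turns a single score into teaching decisions: the learner above needs work on discourse long before pronunciation.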
d. Progress and Grading
In every classroom, teachers need to know whether learning is happening — and at what pace. That’s the purpose of progress and achievement tests, which help guide both instruction and feedback.
This type of testing supports formative evaluation — a process of ongoing feedback that helps teachers adjust teaching methods and helps students reflect on their learning (Nitko, 1988). For example:
- If students are moving too slowly, they may lose motivation.
- If content moves too quickly, comprehension suffers.
Short, low-stakes quizzes or “unit tests” can be powerful tools for maintaining this balance. These are often included in textbooks and are known as achievement or mastery tests.
At the end of a course, however, we often conduct summative evaluation, which typically involves a combination of test results, class performance, and teacher judgments. Such assessments should always align with the course objectives, not with abstract theories of language ability.
Key Point: If a test measures learning within a specific course, it’s an achievement test; if it measures general ability across contexts, it’s a proficiency test.
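To put that summative combination in concrete terms, here is a minimal sketch of a weighted-average grade; the evidence sources and the 60/25/15 weights are assumptions for illustration, not a standard:

```python
# Illustrative sketch: combine several kinds of evidence into one
# summative grade. The weights are invented; each program sets its own
# from its course objectives.
WEIGHTS = {"final_test": 0.60, "class_performance": 0.25, "teacher_judgment": 0.15}

def summative_grade(evidence: dict[str, float]) -> float:
    """Weighted average of 0-100 scores; weights must sum to 1."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    return sum(WEIGHTS[k] * evidence[k] for k in WEIGHTS)

grade = summative_grade(
    {"final_test": 78, "class_performance": 85, "teacher_judgment": 90}
)
print(round(grade, 2))  # 81.55: one number, built from several kinds of evidence
```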
3. Decisions About Teachers
Teachers, too, are part of the evaluation process. Hiring, training, and development decisions often depend on proficiency and pedagogical competence.
A language teacher’s communicative ability must go beyond general proficiency. It includes:
- Metalinguistic awareness — talking about language clearly.
- Pedagogical communication skills — explaining, correcting, and modelling effectively.
- Instructional flexibility — adapting to different learner levels and cultural backgrounds.
Therefore, the same test used for students may not be valid for evaluating teachers. Teachers’ assessments must capture both linguistic control and instructional expertise (Bachman, 1990).
4. Decisions About Programs (Macro-Evaluation)
Language tests can also reveal how well a program is working.
When we assess a program in progress, we’re doing formative evaluation — using results from achievement tests to make timely improvements. For example, if students consistently perform poorly on a grammar component, the syllabus or materials might need adjustment (Scriven, 1967; Millman, 1974).
In contrast, summative evaluation looks at the program as a whole — often comparing it to others. Here, achievement tests alone aren’t enough. We also need proficiency measures that assess broader communicative outcomes beyond classroom learning (Bachman, 1989).
Illustration: A course may succeed in teaching students to perform classroom dialogues, but if they struggle in real-world conversations, the program’s communicative goals are not truly met.
5. Illustrative Examples: How Decisions Shape Test Design
Upshur (1973) offered helpful visual models showing how decisions determine the role of testing:
- Program 1: No testing — no feedback, no accountability.
- Program 2: Testing only at the exit point — good for measuring achievement, but weak in diagnosing learning problems.
- Program 3: Testing before and after instruction — helps with both selection and achievement but still lacks flexibility for students who fail.
- Program 4: Multi-level program — combines entrance, placement, progress, and remedial testing for a comprehensive, adaptive approach.
Each model shows how assessment supports decision-making. The more complex and responsive the program, the more carefully designed its tests need to be.
6. Final Reflection
In the end, the number and type of tests you use depend on the number and type of decisions you need to make.
If you design an assessment instrument, always ask yourself:
- What decision will this test inform?
- Who will benefit from the information?
- How can I ensure it is fair, valid, and reliable?
The fact is that, when used thoughtfully, assessment becomes not a judgmental act but a collaborative process of discovery between teacher, student, and program.
📚 References
Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford University Press.
Bachman, L. F. (1989). The development and use of criterion-referenced tests of language ability. Language Testing, 6(1), 1–32.
Millman, J. (1974). Toward the development of criterion-referenced tests. Review of Educational Research, 44(4), 463–473.
Nitko, A. J. (1988). Educational assessment of students. Prentice Hall.
Scriven, M. (1967). The methodology of evaluation. In R. W. Tyler, R. M. Gagné, & M. Scriven (Eds.), Perspectives of curriculum evaluation (pp. 39–83). Rand McNally.
Upshur, J. (1973). The context for language testing. Newbury House.