Monday, 13 October 2025

🧩 Understanding the Types of Decisions in Language Assessment

1. Why Tests Exist: Decisions Drive Assessment

Every test, no matter how simple or complex, exists to support decisions. In language education, those decisions help us understand who our students are, what they can do, and how well our programs are working.

As Bachman (1981) notes, decisions in assessment fall into two broad categories:

  • Micro-evaluation: Decisions about individuals — such as students or teachers.
  • Macro-evaluation: Decisions about programs — such as course design, teaching methods, or institutional effectiveness.

Understanding this distinction is essential for bilingual teachers who want to create fair, valid, and actionable assessments. After all, the goal of a good test isn’t just to measure — it’s to inform, guide, and improve both teaching and learning.

2. Decisions About Students (Micro-Evaluation)

a. Selection and Readiness

Sometimes we need to decide who can enter a program or who is ready to start a new stage of learning. These decisions are supported by entrance or readiness tests.

For example, a university may require a language proficiency test such as the TOEFL to determine whether a student has the linguistic ability to succeed in an academic environment. In other cases, a readiness test helps teachers identify whether learners possess the foundational skills necessary for effective instruction.

In simple terms, selection tests answer questions like: “Is this learner ready to benefit from what comes next?”

These decisions must always be fair, transparent, and based on valid indicators — not just on intuition or a single score.

b. Placement

Once learners have been accepted into a program, teachers often face a practical question: “Which level is right for this student?”

That’s where placement tests come in. These tests group learners homogeneously, based on language ability, needs, or goals. They can be designed in two main ways:

  1. Theory-based tests, which draw on models of language proficiency (e.g., communicative competence).
  2. Curriculum-based tests, which are aligned with the specific objectives of a course or syllabus.

If your students come from diverse backgrounds, a theory-based placement test might be more practical. If your syllabus is well-structured and sequenced, a curriculum-based (criterion-referenced) test may serve better. In other words, your test design depends on both your learners’ diversity and your program’s structure.

Example: A language school might use a multi-level test aligned with CEFR bands (A1–C2) to ensure students are placed at the right level. Teachers then adjust instruction based on these results — not to label learners, but to empower them with appropriate challenges.

c. Diagnosis

Diagnostic testing is like a doctor’s check-up for language learning. It tells us where students are strong and where they struggle, helping teachers tailor their lessons.

Even placement or readiness tests can have diagnostic value if we analyse response patterns carefully. However, true diagnostic tests are designed specifically to provide fine-grained information about language components — such as grammar, pronunciation, or discourse organization.

Analogy: If readiness tests tell us whether students can start running, diagnostic tests tell us which muscles need training first.

Diagnostic tests can be syllabus-based (focused on course content) or theory-based (aligned with models of proficiency), depending on what kind of feedback you need.

d. Progress and Grading

In every classroom, teachers need to know whether learning is happening — and at what pace. That’s the purpose of progress and achievement tests, which help guide both instruction and feedback.

This type of testing supports formative evaluation — a process of ongoing feedback that helps teachers adjust teaching methods and helps students reflect on their learning (Nitko, 1988). For example:

  • If students are moving too slowly, they may lose motivation.
  • If content moves too quickly, comprehension suffers.

Short, low-stakes quizzes or “unit tests” can be powerful tools for maintaining this balance. These are often included in textbooks and are known as achievement or mastery tests.

At the end of a course, however, we often conduct summative evaluation, which typically combines test results, class performance, and teacher judgments. Such assessments should always align with the course objectives, not with abstract theories of language ability.

Key Point: If a test measures learning within a specific course, it’s an achievement test; if it measures general ability across contexts, it’s a proficiency test.

3. Decisions About Teachers

Teachers, too, are part of the evaluation process. Hiring, training, and development decisions often depend on proficiency and pedagogical competence.

A language teacher’s communicative ability must go beyond general proficiency. It includes:

  • Metalinguistic awareness — talking about language clearly.
  • Pedagogical communication skills — explaining, correcting, and modelling effectively.
  • Instructional flexibility — adapting to different learner levels and cultural backgrounds.

Therefore, a test that is valid for students may not be valid for evaluating teachers. Assessments of teachers must capture both linguistic control and instructional expertise (Bachman, 1981).

4. Decisions About Programs (Macro-Evaluation)

Language tests can also reveal how well a program is working.

When we assess a program in progress, we’re doing formative evaluation — using results from achievement tests to make timely improvements. For example, if students consistently perform poorly on a grammar component, the syllabus or materials might need adjustment (Scriven, 1967; Millman, 1974).

In contrast, summative evaluation looks at the program as a whole — often comparing it to others. Here, achievement tests alone aren’t enough. We also need proficiency measures that assess broader communicative outcomes beyond classroom learning (Bachman, 1989).

Illustration: A course may succeed in teaching students to perform classroom dialogues, but if they struggle in real-world conversations, the program’s communicative goals are not truly met.

5. Illustrative Examples: How Decisions Shape Test Design

Upshur (1973) offered helpful visual models showing how decisions determine the role of testing:

  • Program 1: No testing — no feedback, no accountability.
  • Program 2: Testing only at the exit point — good for measuring achievement, but weak in diagnosing learning problems.
  • Program 3: Testing before and after instruction — helps with both selection and achievement but still lacks flexibility for students who fail.
  • Program 4: Multi-level program — combines entrance, placement, progress, and remedial testing for a comprehensive, adaptive approach.

Each step shows how assessment supports decision-making. The more complex and responsive the program, the more carefully designed the tests need to be.

6. Final Reflection

In the end, the number and type of tests you use depend on the number and type of decisions you need to make.

If you design an assessment instrument, always ask yourself:

  • What decision will this test inform?
  • Who will benefit from the information?
  • How can I ensure it is fair, valid, and reliable?

When used thoughtfully, assessment becomes not a judgmental act but a collaborative process of discovery between teacher, student, and program.

📚 References

Bachman, L. F. (1981). Fundamental considerations in language testing. Oxford University Press.

Bachman, L. F. (1989). The development and use of criterion-referenced tests of language ability. Language Testing, 6(1), 1–32.

Millman, J. (1974). Toward the development of criterion-referenced tests. Review of Educational Research, 44(4), 463–473.

Nitko, A. J. (1988). Educational assessment of students. Prentice Hall.

Scriven, M. (1967). The methodology of evaluation. In R. W. Tyler, R. M. Gagné, & M. Scriven (Eds.), Perspectives of curriculum evaluation (pp. 39–83). Rand McNally.

Upshur, J. (1973). The context for language testing. Newbury House.
