I concluded my speaking tests at October’s end. For one week I asked nearly 450 students the same questions.
“What is this number?”
“What is a difficulty you have overcome?”
“What if you had 1,000,000 won?”
Some students provided pure comedy.
Ian: I am in your home. What must I do?
Student 1: You must clean my room and make me food.
Student 2: You must do my English homework.
Student 3: You must bring me chicken.
But the occasional chuckle aside, administering speaking tests is my least favorite duty as a native English teacher.
I understand the rationale. Korean high schools grade students almost exclusively through performance tests. Whether it's a midterm, a final exam, a speaking test, an essay contest, a listening test, or a project presentation, students' grades boil down to their ability to perform in clutch moments.
Contrast this with American high schools. While tests are no doubt important in high school, test-anxious students can still raise their grade through scores on homework, classroom participation, portfolio projects, and essays. In other words, they can supplement their scores with consistent classroom engagement and diligent work outside of class. Korean students, on the other hand, study for hours upon hours to take grade-defining tests that last anywhere from 1-50 minutes.
I recently had an enlightening conversation with my co-teacher about evaluation differences between the Korean and American education systems. In Korea, a bulk of students’ scores come from multiple choice and short answer exam questions. On the other hand, American grades tend to have much more balance between objective scores (like multiple choice) and more subjective grading measures (like essays and projects).
Multiple-choice and short answer questions have their benefits. For one, they are very easy to grade: so easy that most schools do it with scantron machines. Second, assuming that teachers craft questions carefully, multiple choice questions are a fair assessment of a student's knowledge. Ideally, every multiple choice question has one unambiguous correct answer, so students either know the answer or they don't (though random guessing can hand students a 20-25% boost). One reason Korea leans so heavily on such assessment methods, then, is that many perceive them as fairer and more equitable than comparable subjective means of grading.
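That 20-25% guessing boost is just the reciprocal of the number of answer choices. A quick sketch, assuming every question has the same number of equally likely choices (the five- and four-choice formats give the 20% and 25% figures above):

```python
import random

def expected_guess_score(num_choices: int) -> float:
    """Expected percentage score from pure random guessing: each question
    is answered correctly with probability 1/num_choices."""
    return 100 / num_choices

def simulate_guessing(num_choices: int, num_questions: int = 10_000,
                      seed: int = 0) -> float:
    """Monte Carlo check: guess on every question, report percent correct."""
    rng = random.Random(seed)
    correct = sum(rng.randrange(num_choices) == 0 for _ in range(num_questions))
    return 100 * correct / num_questions
```

So a student who knows nothing still walks away with roughly a fifth to a quarter of the points, which is part of why "students either know the answer or they don't" is only approximately true.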
However, multiple choice and short answer questions also have downsides. For one, a solid multiple choice score only reflects a student's ability to recognize correct answers; it tests fact recognition rather than deep understanding. Some students who score very high on multiple-choice tests could still struggle to synthesize the theories and concepts of a subject into a coherent argument or explanation (something an essay prompts them to do). In other words, asking students to write essays forces them to demonstrate a deeper conceptual understanding than asking them to recognize correct pieces of information.
Another downside is the lack of room for creativity. There are countless ways students can demonstrate their knowledge of concepts beyond recognizing correct multiple-choice answers. They can write an essay, prepare a project, give a speech, or create some form of media. Moreover, such projects demand a higher level of understanding: one must take the time to explain a concept rather than simply recognize its correct components.
However, projects, speeches, and essays are not without flaws. The biggest concern with such methods of assessment is subjectivity. In multiple-choice tests, one either gets the right answer or the wrong answer. Assuming well-constructed questions, there is no room for debate. However, when grading an essay or an oral interview, there is much more room for interpretation.
“What components make up the grade?”
“What weight should each component receive?”
“How do we differentiate a student’s performance on each component?”
Most teachers answer these questions in the form of rubrics – point scales that spell out the expectations of a given project or essay. However, despite all efforts toward objectivity, rubrics will always contain some form of judgment.
Take this example, borrowed from an elementary writing rubric:
A score of 4 – Piece was written in an extraordinary style and voice. It is very informative and well-organized.
A score of 3 – Piece was written in an interesting style and voice. It was somewhat informative and organized.
What exactly differentiates “extraordinary style and voice” from “interesting style and voice”? Unfortunately, rubric scores almost always reduce to the teacher’s subjective interpretation. In other words, the teacher will give higher scores to the essay that they “like more”.
For the most part, this is not too problematic. Many teachers can differentiate amazing essays from good essays from average essays from poor essays with acceptable reliability.
But other subjective biases can also creep into the grading process.
One such example is name-grading. If a teacher knows the name on the essay they are grading, they may unconsciously skew their scoring based on their impression of that student.
“This student is always engaged and active in class. They have a terrific attitude. While this explanation could be clearer, I will give them the benefit of the doubt.”
“This student always chats while I am trying to give instructions. They disrupt class more often than other students. Their essay has some problems here. I better dock some points.”
This is one reason college students should attend professors’ office hours when they can. Building rapport and showing initiative may give a student’s paper an edge when it comes to grading.
Other subtle biases can also come into play. Teachers may give preferential treatment to essays with cleaner handwriting, essays with strong introductions and conclusions (but perhaps weaker bodies), or even the essays they read first and last. Anchoring may also lead teachers to cluster essay scores around the score of the first paper they read.
So while more subjective measures of assessment provide students with more creative outlets to demonstrate their learning, teachers are more prone to cognitive biases than scantron machines.
So one reason I dislike speaking tests is that I know I will not be completely objective and fair. It’s far too difficult.
It is impossible to avoid name-grading as I must speak with individual students one-on-one. I see their faces and know their names.
Moreover, a speaking test requires careful listening. I can always read an essay a second time if I am unsure of which score to give. But in an oral interview, I have one opportunity to listen. If my mind drifts for even 10 seconds, I have no choice but to make an educated guess about the student’s performance.
I have tried to address these biases in the past. During my first speaking test, I attempted to grade objectively (and ultimately disheartened my students with a C+ overall average).
The second time around, I chose to acknowledge my biases and perform statistical corrections to ensure relative uniformity in the average score and standard deviation among classes. While this provided the illusion of objectivity and equitable grading, I also knew that not all classes were completely equal. As a result, I likely artificially inflated the grades of under-performing classes while unjustly docking points from high-performing ones.
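The correction I applied was essentially a linear rescaling: shift and stretch each class's scores so every class ends up sharing the same mean and standard deviation. A minimal sketch (the target values below are hypothetical, not the ones I actually used):

```python
import statistics

def normalize_class(scores, target_mean=78.0, target_sd=5.0):
    """Linearly rescale one class's scores to a shared target mean and
    standard deviation (population SD). Target values are hypothetical."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    if sd == 0:  # every student scored the same; just shift to the target mean
        return [float(target_mean) for _ in scores]
    return [target_mean + (s - mean) / sd * target_sd for s in scores]
```

The flaw described above is visible right in the code: the rescaling assumes every class's true ability distribution is identical, so a genuinely stronger class gets compressed to the same targets as a weaker one.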
While discussing grading with my co-teacher, he made an interesting point. In Korea, teachers tend to constrain their scores on subjective evaluation inside a limited range. While this might give undeservedly high scores to low performers, it would also reduce the risk of students complaining over their scores. It is far easier to dispute a difference of ten points over comparable performances than it is to dispute a one-point difference.
So this time around, I adopted a new grading plan. I set my base score at 70 points. Students who sit down and say “my name is Min-ho” and nothing more would earn this score. From there, I judge their pronunciation, grammar, vocabulary, comprehension, and fluency on a scale from 15-20. A score of 15 reflects no grammar, nearly unintelligible pronunciation, and agonizingly delayed fluency. On the other hand, scores of 20 reflect performances nearly matching that of a native speaker. The average score consequently increased from around a 78 to a 90.
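As a sketch, the scheme might look like this in code. One caveat: how the five component scores combine into the final grade is my assumption here (summing them, with 70 as a floor for a name-only attempt); the exact aggregation could reasonably differ.

```python
CATEGORIES = ("pronunciation", "grammar", "vocabulary", "comprehension", "fluency")
BASE_SCORE = 70  # awarded for sitting down and saying only "my name is Min-ho"

def speaking_score(component_scores=None):
    """One reading of the scheme: five components, each scored 15-20,
    summed into a 75-100 total; a name-only attempt earns the 70 floor.
    (The aggregation is an assumption, not spelled out above.)"""
    if not component_scores:
        return BASE_SCORE
    assert set(component_scores) == set(CATEGORIES), "score all five categories"
    assert all(15 <= s <= 20 for s in component_scores.values()), "each category is 15-20"
    return sum(component_scores.values())
```

Under this reading, any student who attempts the interview lands between 75 and 100, which squares with the class average rising from around 78 to 90.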
Obviously, this system is not perfect. In some ways, I am a bit of a coddler. However, I tend to have warm opinions of my students. Most students do their work in class, speak when I ask them to, and behave themselves (within reason). If my speaking test is the only score they receive in my class, I want their semester-long efforts reflected in their score.
Despite my concerns, I can take comfort in the fact that my scores only reflect 5-10% of their English grade. Moreover, I only need to sacrifice two weeks a year for this test. I can spend the rest of the time doing what I love – challenging my students to speak English and exercise their creativity.