Tuesday, May 27, 2014

Algorithms marking essays? That diploma idea deserves a failing grade

This was written by Paula Simons, who is a columnist with the Edmonton Journal. Simons tweets here. This post was originally found here.

by Paula Simons

The word “essay” comes from a Latin root, meaning to put something on trial, to put something to the test. That’s why we ask students sitting Grade 12 diploma exams in Language Arts and Social Studies to write timed essays as part of their test. Composing an essay doesn’t just test your ability to use correct English. It tests your ability to think critically. It tests your ability to make an argument, supported with facts. It tests your ability to critique conventional wisdom and articulate original insights.

Learning to write a cogent essay in an hour isn’t just excellent training for would-be newspaper columnists. Not every graduate will need to do trigonometry or balance a chemical equation in adult life. But we all need to know how to marshal facts to advance a convincing argument, whether we’re fighting a traffic ticket, negotiating a raise, or convincing skeptical friends to try a new restaurant.

Nonetheless, Alberta Education is apparently giving serious consideration to contracting out the marking of diploma essays to an American computer program that uses complex algorithms to predict and assign student grades. A number of American states are already using such programs to assess “high stakes” essay tests.

In January, LightSide, a company founded by graduate students from Carnegie Mellon University in Pittsburgh, presented a research report to Alberta Education. The confidential report, obtained by the Journal’s Andrea Sands, claims its software was 20 per cent more reliable than Alberta’s human markers.

LightSide’s system doesn’t detect grammatical errors. It has no capacity to fact-check, nor to analyze critical thinking.

“We cannot evaluate whether the points made in a series of claims lead naturally from one to another,” reads the study, written by LightSide CEO Elijah Mayfield. As well, says Mayfield, “on-topic responses that fail to address subtle factual nuances, misinterpret a particular relationship between ideas, or other factual errors will likely be scored highly if they are otherwise well-written.”

How does the computer know if something is well-written? It compares the essay it's evaluating to the hundreds of “training samples” in its memory.

“Computers can’t read a student’s essay, but they’re excellent at making lists — compiling, for a given essay, all of the words, phrases, parts of speech, and other features that characterize a student’s work,” says LightSide’s website. “Our software compares the differences in the features of a weak essay and a strong essay — as evaluated by a human reader. It then identifies the small things that might only appear in the strongest essays — vocabulary keywords, structural patterns of sentences, use of transition sentences ... If the writer has all of the little things right, just like the previously high-scoring example essays, they probably should receive the same score.”

In other words, if the essay’s style, vocabulary and syntax pattern match those of sample essays that earned high grades from human markers, the software will award similar grades, even if an essay is ungrammatical, illogical, or full of factual errors.
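To make that concrete, here is a toy sketch of the general idea in Python. It is emphatically not LightSide's software: the feature list (single words and adjacent word pairs), the cosine similarity measure, the nearest-example scoring rule, and the two sample essays are all assumptions invented for illustration. It only shows how a scorer that makes lists and compares them to human-graded samples can hand a top grade to a polished essay with the wrong date in it.

```python
import math
import re
from collections import Counter


def features(text):
    """The list-making step: count each word and each adjacent word pair."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    pairs = [" ".join(p) for p in zip(words, words[1:])]
    return Counter(words + pairs)


def cosine(a, b):
    """Cosine similarity between two feature counts (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def predict_score(essay, training):
    """Return the human-assigned score of the most similar training essay."""
    essay_features = features(essay)
    best = max(training, key=lambda item: cosine(essay_features, features(item[0])))
    return best[1]


# Hypothetical training samples: (essay text, score given by a human marker).
training = [
    ("Furthermore, the evidence clearly demonstrates that Confederation "
     "united the provinces in 1867.", 5),
    ("i think canada is good because it is good and stuff", 1),
]

# Polished, on topic, and factually wrong (the year is off by a century).
wrong_but_polished = (
    "Furthermore, the evidence clearly demonstrates that Confederation "
    "united the provinces in 1967."
)

print(predict_score(wrong_but_polished, training))
```

Because the scorer only compares lists of surface features, nothing in it checks whether the claim is true; the wrong year changes two entries on the list and leaves the rest identical to the high-scoring sample.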

It’s quite fascinating, linguistically speaking. Certainly, we shouldn’t romanticize human markers, who bring their own biases, incompetencies, and idiosyncrasies to the grading process. A tired or frustrated or overwhelmed marker may not give consistent scores. Over thousands and thousands of student essays, machine marking may indeed offer statistical superiority.

But writing for a human marker, however flawed, is different than writing for a soulless algorithm. Writing is an intimate act of communication. When you write an essay, you write for an audience. You write to be understood, to entertain, to provoke, to connect, to share your insights and passions. Young essayists deserve the dignity of writing for sentient readers. Even if we accept the premise that machine marking yields more “accurate” results, it robs students of the relationship between writer and reader, of the right to be heard and understood.

How serious is Alberta Education about this? It depends on whom you ask, and when. Premier Dave Hancock praised the notion in the legislature last month, lambasted it at an Edmonton Journal editorial board Wednesday, then gave it guarded approval in an interview Thursday. Given the current Sturm und Drang in the Education portfolio, it would seem madness for Hancock and beleaguered Education Minister Jeff Johnson to start another fight with Alberta teachers right now. But who knows what a new premier (or new education minister) might decide next fall?

If diploma exams need more consistent scoring, perhaps Alberta Education should invest in better marker recruitment, training and compensation. If we expect Alberta students to take diploma exams seriously, we should all do the same.
