Tuesday, October 16, 2012

Standardized test scores are like a broken clock

Have you ever heard someone use standardized test scores to judge schools?

The Alberta Government recently released an information bulletin that boasted Alberta student performance results continue to rise:
The overall percentage of students who attained a standard of excellence on Grade 3, 6, and 9 provincial achievement tests (PATs) increased to 20.2 per cent from 19.5 per cent in the previous year. The percentage of students who met the acceptable standard also rose slightly to 75.5 per cent from 75.2 per cent. One of the highlights of the results is the percentage of students who achieved the standard of excellence in Science 6 and Science 9.
Many Albertans might take these standardized test score results as prima facie evidence that things are going well. Many Albertans may be satisfied with this information and confidently move on with their regularly scheduled day, thinking that Alberta schools are not only doing well but also improving.

What if we are wrong? What if these scores are giving us false confidence? What if standardized test scores aren't telling us what we think they are telling us?

When some Albertans boasted about these results on Twitter, I responded with:
Assessing an education system via standardized test scores is like assessing a car by kicking the tires.
Some challenged me by asking:
Wouldn't the analogy be, "like assessing a car by comparing its gas mileage relative to motor size and tank capacity?"
My response: 

No. 

This analogy assumes that we know what standardized test scores tell us: we assume these scores are our window into the schools -- therefore we assume we can use these scores to judge the quality of teaching and learning that goes on in a school.

But what if these unquestioned assumptions about standardized testing are wrong?

Seth Godin writes:
The worst kind of clock... is a clock that's wrong. Randomly fast or slow. 
If we know exactly how much it's wrong, then it's not so bad. 
If there's no clock, we go seeking the right time. But a wrong clock? We're going to be tempted to accept what it tells us. 
What are you measuring? Keeping track of the wrong data, or reading it wrong is worse than not keeping track at all.
Standardized test scores are like a broken clock because we assume that these scores tell us what we need to know about our schools -- we assume that these scores reflect teaching and learning and therefore assume that if the numbers are rising that must be a good thing.

But what if this is misguided? What if our reliance on standardized tests to judge the quality of the teaching and learning in schools is like relying on a broken clock for time?

Consider this:
  • Standardized test scores are a remarkable way of assessing the socioeconomic status of students and their families. Study after study has shown that out-of-school factors account for an overwhelming proportion of the variance in scores. That means that standardized tests tend to tell us more about what kids bring to school than what they do at school. Here's a Canadian example and an American example.
  • There is research that suggests there is a statistical association between high scores on standardized tests and relatively shallow thinking.
  • Standardized tests tend to measure what is easily measurable, which turns out to be what matters the least. There is a big difference between measuring what we value and valuing what we measure. When we narrow what matters to what can be measured by a standardized test, we fall victim to the McNamara Fallacy, which basically looks like this: (1) Measure whatever can be easily measured on a standardized test. (2) Disregard whatever can't be easily measured or give it an arbitrary quantitative value. (3) Presume that what can't be measured easily isn't important. (4) Say what can't be easily measured doesn't even exist.
  • There is research that suggests that when teachers are held accountable for their students' standardized test scores, they tend to become so controlling in their teaching style that the quality of students' performance actually declines.
To fully grasp why this is true, there's a lot to know about the arcane underpinnings of standardized tests; however, testing guru Daniel Koretz gives us a single principle that summarizes what we need to know:
Never treat a test score as a synonym for what children have learned or what teachers have taught.
Again, this too can be true for lots of reasons, but Alfie Kohn gives us a single principle that summarizes what we need to know:
A right answer on a test does not necessarily indicate understanding and a wrong answer does not necessarily indicate a lack of understanding.
I would imagine there are times when standardized test scores might reflect the teaching and learning that goes on in a school, but remember, even a broken clock is right twice a day.

Standardized tests look good from afar but are far from good at reflecting what matters most when it comes to teaching and learning. The closer you look at standardized tests, the more you realize that their utility and convenience come at an alarming and unacceptable cost. Ask yourself if what we're learning from standardized tests is worth the price.

I would rather have no information, no data, nothing, than the grossly misleading and misused data that is extracted from standardized testing. As long as the public is fed standardized test scores, we will be tempted to accept what they tell us -- but if the public had no information about their schools, they would be forced to seek it out, which might lead more people to actually set foot in their local schools.

Comments:

  1. Your critique of standardized testing in the Alberta context is not complete. In your comments you neglect to mention that the Provincial Achievement Testing Program is blueprinted from each specific Program of Study it is testing. For instance, the Grade 6 Science PAT consists of 50 multiple choice questions derived from the five units of study throughout the year: Air & Aerodynamics; Flight; Sky Science; Evidence and Investigation; and Trees and Forests. There is a balanced representation from each of these topics on the test, ranging between 18% and 25% for each of the five areas. There is a 40%-60% split in knowledge and skill based questions, respectively. The design and development of all Provincial Achievement Tests is completed primarily by veteran teachers who are, at the time of development, actually teaching the grade and subject in question. The work involved in the creation of these tests includes item development, test validation, and field testing. The development of one test is the cumulative work of hundreds and hundreds of Alberta-Teacher-Hours. Based on this level of diligence, your analogy of a "broken clock" is clearly inappropriate. These are not measurement tools based on arbitrary outcomes prepared by bureaucratic statisticians.

    You quote Daniel Koretz in order to question both the validity and reliability of the results derived from these tests. I am certainly not going to dispute that there are a great number of factors involved in an individual student's achievement on a standardized test, including but not limited to socio-economic factors; cultural biases; literacy levels; and the reality that on any given day there are a multitude of circumstances that could have taken place that would directly influence a student's ability to write a multiple choice exam. However, each specific Provincial Achievement Test is written by over 38,000 students in the Province (From Alberta Education web site, the 2011 PATs ranged from a low of 38,083 Grade 9 students writing Math, to a high of 39,542 Grade 6 students writing the Math test). Given the tens and tens of thousands of students writing these tests, it would be exceedingly difficult to argue that the data garnered from the analysis of the cumulative results cannot be used to generalize the state of "learning" in the Province of Alberta.

    Before I conclude, I do want to make it clear that I am not arguing that the Provincial Achievement Testing program in Alberta is the only way to measure student learning. Even Alberta Education itself has never made such a statement. Advocating such a one-sided view on achievement would overlook almost ten months of rich and ongoing assessment that takes place in every classroom, every day! However, saying that we should "Never treat a test score as a synonym for what children have learned or what teachers have taught" is equally inappropriate on the far other end of the spectrum.

    In closing, I want to commend the author of the analogy you quoted, "Assessing an education system via standardized test scores is like assessing a car by comparing its gas mileage relative to motor size and tank capacity". Seems like a fine metaphor to me ... using one small amount of data to make a very general & overarching assessment, but not enough to make a conclusive decision. I would certainly be interested in a vehicle that can get me over 700 km on a 50 litre tank of gas, but once I had this information I would need to dig deeper to make a final decision. Similarly, if I see a school jurisdiction has consistently achieved above "Provincial Average" I would be interested to see & know what they are doing, but I would not immediately conclude this jurisdiction was the "poster child" for student achievement.

    Call it capitulation disguised as moderation if you like, but taking the view that standardized testing does not measure anything at all goes hand-in-hand with believing standardized testing measures everything.

  2. Ron:

    How does any of the Alberta context that you mention address Campbell's Law, the McNamara Fallacy, shallow thinking, narrowing of curriculum and teaching to the test?

    It feels like you are so determined to show that standardized tests reflect teaching and learning that you haven't addressed these factors.

