From the Principal's Office: Call for Skepticism and Caution When Using Test Scores in Teacher Evaluations

“We need to be careful that the tests we use in a properly designed teacher-appraisal system do, in fact, contribute to a valid (that is accurate) inference about a teacher’s quality.” W. James Popham, Evaluating America’s Teachers: Mission Possible?

North Carolina took the plunge this year and started using test scores as part of teacher and principal evaluations. The state has even invented a "new" kind of test, called a "Measure of Student Learning," in order to make sure there is plenty of test data to go around. What is particularly telling is how "carefully" the state crafted the term "Measures of Student Learning." It's as if not calling it a test somehow makes it not a test. State level educational logic never ceases to amaze me. Of course, the state then started calling these "Measures of Student Learning" something else: "Common Exams." Notice again the careful use of the word "exam" rather than "test." It's almost as if, so long as you don't call it a test, it isn't a test, but apparently state level policymakers haven't heard the old saw about a rose still being a rose even if it has another name.

Besides North Carolina's struggle with what to call its newly implemented tests, there's still the question of what unintended consequences thousands of teachers "teaching to the test" will have for students in our state. Ultimately, being able to brag that your students "have the best scores in the world" is most likely what politicians and state level education officials are after. That's why they see salvation through test scores as the path to the "Educational Promised Land." There's a flawed logic driving this whole accountability and testing movement: the idea that learning can be entirely reduced to bubble sheets filled in at a single sitting, and that teachers can't be trusted to tell whether a student has demonstrated learning.

In my years as an educator, I have been amazed at how trusting and accepting educators in North Carolina are when it comes to the latest policy flowing down from on high. It's as if they accept that those at the state level know more than they do, or somehow have access to magical information they do not have. So, when the state implements something like the use of test scores in evaluations, many educators assume that the powers that be know what they are doing, and they trust them. Given the history of reform ideas and educational policy that travels down from on high, this "trust" is highly misplaced. I like to think that state level education officials mean well, but too often during my career these ideas, when implemented locally, have been a disaster and sometimes downright bad for kids. Instead of being so trusting, I submit that all educators in the schools and districts need to become skeptics and ask tough questions of our state-level and federal-level policymakers. We should never accept the "trust me, this will work" answer.

It is in this spirit of skepticism that I turn to Popham's book, Evaluating America's Teachers: Mission Possible?, and to our state's venture into making high stakes testing even more high stakes. In spite of what our state-level policymakers say, I am not satisfied that North Carolina's tests are adequate measures of educator effectiveness, and a healthy skepticism is still in order. This whole push to add test scores to teacher and principal evaluations has been a rush from the start. Over the last two years, the answer to how the tests were to be implemented has changed multiple times, depending on when you asked. Never mind the fact that not a single teacher in North Carolina even saw the tests before they were implemented. In their rush to have "test data," it's as if our state level policymakers think "any old data will do." They have failed to take the time to establish whether any of these tests really tell us anything about teaching quality.

In light of our state's push into "higher stakes testing," I think Popham reminds us of some key ideas about tests and teacher evaluations that state politicians and policymakers seem to forget.

“Tests are not valid or invalid. Instead, it is a test-based inference whose validity is at issue.” In other words, it isn't the test that’s valid or invalid; it is the inferences drawn from the test that have these qualities. It boils down to whether the test actually supports the inference you want to make. The question is whether North Carolina's tests, which have been implemented haphazardly and in a thrown-together manner, actually tell us anything at all about the quality of teaching in our classrooms. Can I honestly say Teacher A is a good teacher because she added "this much" value to her students' scores on the Measures of Student Learning? It seems to me that this puts a great deal of faith in a single test.
“Tests allow us to make inferences about a test taker. This inference, depending on the appropriateness of the test as a support for the inference being made, may be valid or invalid.” As Popham points out, the inference we make about the learner may be valid or invalid depending on the “appropriateness of the test” in its role of supporting that inference. Validity is the extent to which an inference, or conclusion, is well-founded and corresponds to the real world. For example, should we infer from a student’s test scores that he is not proficient in the subject, we must be satisfied that the test we are using is the “appropriate measure,” and we must also make sure the conclusion we draw considers all real world facts. Ignoring a student’s socio-economic status, or even whether he experienced a death in the family, can make our inference about the student’s proficiency invalid. Then there's the whole issue of making an inference about a teacher's or principal's effectiveness using this same test. Has North Carolina sufficiently established the appropriateness of its Measures of Student Learning, End of Grade Tests, and End of Course Tests as instruments that allow for making inferences about teacher and principal quality? I'm not sure it has.

As North Carolina moves forward with a teacher and principal appraisal instrument that uses test scores to determine effectiveness, all educators need to educate themselves and scrupulously ask questions of policymakers.

As Popham suggests, “If heavy importance is being given to students’ performances on state tests for which there is no evidence supporting such an evaluative usage, then teachers (I would add principals too) might wish to engage in further study of this issue so that, armed with pertinent arguments, they can attempt to persuade educational decision makers that more appropriate evidence should be sought.” In other words, all educators, administrators and teachers alike, need to study how North Carolina or any other state is using test scores to determine educator effectiveness.

Administrators owe it to their teachers, and themselves, to understand that some of these tests were never designed to determine educator effectiveness, so the data they produce needs to be viewed with skepticism. Test scores in North Carolina currently make up only one of the six standards in the teacher evaluation, and effective administrators are going to keep this in mind and not let the allure of numbers numb them to the other five.

Cross-posted at the21stcenturyprincipal.blogspot.com

J. Robinson has decades of experience as a K12 Principal, Teacher, and Technology Advocate. Read more at The 21st Century Principal.