(This is the first of two posts on the new teacher evaluations, focusing on the overall scoring of the evaluations and the role of standardized exams. The second post will take up the question of appeals.)
The 2010 law that established a new framework for the evaluation of New York educators was a complex piece of legislation, and last week’s agreement to clarify and refine that law with additional legislation added another layer to that complexity. The complexity is unavoidable. It is important to have evaluations based on multiple measures of teacher effectiveness, just as it is important to evaluate students based on multiple measures of their learning: more measures and more forms of evidence produce more robust, more accurate and fairer evaluations. Further, multiple measures allowed New York to avoid placing inordinate weight on standardized exams and value-added algorithms, as other states have done, with very negative consequences. And it was essential that the bulk of the evaluations be established locally through collective bargaining, with the law only providing a general framework. These objectives necessarily led to a high level of complexity.
But with that complexity, New York is on the road to teacher evaluations that will engage educators in meaningful professional dialogue, provide them with essential supports, and give them the tools to hone their craft. With evaluations based on multiple measures, evaluations will be more comprehensive, more accurate and fairer, and in sharp contrast to other states such as Florida and Tennessee, the role of standardized testing in the evaluation will be minimized. With collective bargaining playing a key role in the shaping of “on the ground” evaluations, teacher unions have the input that will allow us to protect the educational integrity and fairness of the evaluation process.
Unfortunately, complexity has provided a fertile ground for commentaries on the New York teacher evaluation framework that reach alarmist conclusions, with arguments built on a foundation of misinformation and groundless speculation. A widely circulated piece by Long Island Principal Carol Corbett Burris, published on the Washington Post’s Answer Sheet blog, is in the thrall of this alarmist alchemy. Burris decries the law and last week’s agreement as allowing “test scores… to trump all.” Under its scoring, a teacher could be “effective” in all components of the evaluation and yet still receive an overall rating of “ineffective.” The law, Burris concludes, is creating an evaluation system in which schools and students will “lose great teachers.” At the Bridging Differences blog, Diane Ravitch has now taken up Burris’ argument, repeating her main points as gospel.
To comprehend why Burris’ conclusions are problematic, it is necessary to understand the general framework of the teacher evaluation as laid out in the 2010 law and last week’s agreement. Teacher evaluations are based on a 100 point scale, with 60 points of the evaluation based on measures of teacher performance and 40 points based on measures of student learning. These two general categories are further divided into different components.
The measures of teacher performance must include supervisory observations of lessons that utilize a research-based framework of teaching such as the Danielson framework, and these observations must account for at least 31 of the 60 points. But they can also include a variety of other measures which may account for as much as 29 of the 60 points, including peer observations that use the same research-based framework for teaching and portfolios of artifacts of teacher performance such as lesson plans and student work. Districts such as Rochester have already negotiated the inclusion of peer review in their measures of teacher performance.
The measures of student learning are divided into two components: measures developed from state assessments, worth 20 points, and measures developed from local assessments, worth 20 points. For those teachers who teach English Language Arts and Mathematics in grades 4 through 8, the state assessment measures will be a value-added growth measure derived from the state’s standardized exams in those subjects. These are a minority of all teachers. A different set of measures called student learning outcomes will be used as the state measure for all other teachers, based either on existing standardized exams or on local assessments aligned with the common core standards. The local assessment measures may be based on entirely different assessments of student learning, such as performance assessments, provided that they are “rigorous and comparable across classrooms.” Or, as a result of last week’s agreement, they may be a new measure, different from the state’s value-added growth measure, but still based on the results of the state’s standardized exams.
Here then is a schematic of the general framework of teacher evaluations in New York:
MEASURES OF TEACHER PERFORMANCE (60 of 100 points)

- Supervisory Observations: minimum of 31 points
- Other Measures, such as Peer Observations and Portfolios of Artifacts of Teacher Performance: up to 29 points

MEASURES OF STUDENT LEARNING (40 of 100 points)

- State Assessment Measures (20 points):
  - For teachers of ELA and Math, grades 4 through 8: value-added growth from state standardized exams
  - For all other teachers: growth measures on “Student Learning Outcomes”
- Local Assessment Measures (20 points):
  - For all teachers: growth on local assessments, such as performance assessments
  - For some teachers: different measures of growth from state standardized exams
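The arithmetic of the framework can be sketched in a few lines of code. This is an illustration only: the function name and the example point splits are assumptions, since the actual splits within the statutory limits (observations at least 31 of 60, other performance measures at most 29) are set through local collective bargaining.

```python
def composite_score(observations, other_performance, state_learning, local_learning):
    """Combine the four components of the framework into the 100-point composite."""
    # Measures of teacher performance: 60 points total,
    # with supervisory observations worth at least 31 of them
    assert observations >= 31 and other_performance <= 29
    assert observations + other_performance <= 60
    # Measures of student learning: 20 state points + 20 local points
    assert 0 <= state_learning <= 20 and 0 <= local_learning <= 20
    return observations + other_performance + state_learning + local_learning

# Example: a hypothetical teacher scoring well across all four components
print(composite_score(40, 15, 15, 16))  # 86 of 100
```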
In all of the complexity of these multiple measures, there is one essential point to remember: 80% of the total evaluation – the measures of teacher performance and the measures of student learning based on local assessments – are set through collective bargaining at the district level. This provides teacher union locals with an essential and necessary input into teacher evaluations, allowing us to ensure that they have educational integrity and are fair to teachers.
We are now in a position to see the full dimensions of the misinformation and groundless speculation in Burris’ argument. Three central issues present themselves.
First, Burris incorrectly assumes that the entire 40 points in the measures of student learning will be derived from standardized state exams. But the use of value-added growth measures from state standardized exams need not take up more than 20% of the total teacher evaluation – and then only for a minority of teachers, those teaching English Language Arts and Mathematics, grades 4 through 8. Standardized state exams can only be used as the basis for the local measures of student learning if the union local agrees to their use in collective bargaining. I know of no significant New York district where the local union has agreed to the use of standardized state exams as the basis for the local measures of student learning. In New York City, the UFT has taken the position that under no circumstances would we agree to the use of standardized state exams for the local measures of student learning; at the point that the negotiations for the evaluation system for the 33 Transformation and Restart schools broke down over the appeals system last December, we were developing high school and middle school performance assessments for the local measures.
Indeed, insofar as the state’s Student Learning Outcomes could be operationalized with local performance assessments aligned with the common core standards, standardized exams could well play NO role in the evaluation of the majority of teachers. The reality of teacher evaluations under the New York law is thus significantly better than what is found in other state evaluation systems – and dramatically at odds with Burris’ vision of standardized state exam scores “trumping” everything else.
This reality provides important context for the vexing issue of scoring bands. At the behest of Governor Cuomo, the New York State Education Department set overall scoring bands for the teaching evaluation system which are quite stringent: very low scores in both the state and local components of measures of student learning (0, 1 or 2 out of a possible 20 in both components) will lead to an overall ineffective rating, regardless of how a teacher scored on the measures of teacher performance. If both components were based solely on standardized test scores, using unreliable value-added models with high margins of error, as Burris incorrectly claims, these scoring bands would have the potential of producing unfair ratings among outlier cases. But with at least one of these two components being a local assessment that, as it is collectively bargained, should be an authentic assessment of student learning, this objection does not hold. Teachers and their unions have always said that we wanted to be responsible for student learning – our objection was to the idea that standardized exams provided a true measure of that learning. With the inclusion of authentic assessments of student learning, student achievement must be a vital part of our evaluation.
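The scoring-band rule described above can be made concrete with a short sketch. Only the triggering rule (scores of 0, 1 or 2 out of 20 on both student-learning components forcing an overall ineffective rating) comes from the source; the function names and the composite bands below are hypothetical, for illustration only.

```python
VERY_LOW = 2  # a score of 0, 1 or 2 out of a possible 20

def overall_rating(performance_points, state_points, local_points, bands):
    """bands: list of (minimum composite score, rating) pairs, highest first."""
    # The stringent rule set by the State Education Department: very low scores
    # on BOTH student-learning components mean ineffective overall, regardless
    # of the measures of teacher performance.
    if state_points <= VERY_LOW and local_points <= VERY_LOW:
        return "ineffective"
    composite = performance_points + state_points + local_points
    for minimum, rating in bands:
        if composite >= minimum:
            return rating
    return "ineffective"

# Hypothetical composite bands, for illustration only
BANDS = [(91, "highly effective"), (75, "effective"), (65, "developing")]

print(overall_rating(58, 1, 2, BANDS))   # "ineffective", despite 58/60 on performance
print(overall_rating(50, 14, 15, BANDS)) # composite 79 -> "effective"
```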
A compelling approach to the issue of using value-added scores in teacher evaluations is found in the Hechinger Report blog post of Columbia University sociologist Aaron Pallas. Pallas sensibly suggests that where value-added models of standardized test scores are included in a teacher evaluation, the scoring needs to take into account the margin of error in a teacher’s score.
Second, Burris’ commentary ignores the ways in which the New York teacher evaluation law turns over the scoring of different components of the evaluation to local collective bargaining. On the measures of teacher performance, worth 60 points, the selection both of the research-based teaching framework for observations and of the HEDI (Highly Effective, Effective, Developing and Ineffective) scoring cut points for that framework are the subject of local collective bargaining, as are the selection, weighting and scoring of measures other than supervisory observations. On the measures of student learning, both the selection and the scoring of the local assessment are the subject of collective bargaining. The law thus gives local unions the means to prevent the very sort of scenario Burris plays out in her piece, in which a teacher is effective on all the measures of teacher performance and all the measures of student learning, yet still receives an overall rating of ineffective. But Burris simply ignores the collective bargaining requirements and speculates that a scoring range for the measures of teacher performance will be established that, conveniently, produces the results that make her scenario work. Is it really necessary to note that teacher union leaders with substantial experience in collective bargaining know how to do simple math, and would not agree in collective bargaining to scoring bands for teacher performance that would produce such an incongruous and unfair result?
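The "simple math" point above can be checked with a worked example. Every number here is hypothetical: if the collectively bargained cut points are chosen sensibly, a teacher rated effective on every component cannot fall below an effective composite, and the scenario Burris describes cannot occur.

```python
# Hypothetical lowest scores that still count as "effective" on each component
# (out of 60, 20 and 20 points respectively) -- illustrative assumptions only
EFFECTIVE_FLOOR = {"performance": 50, "state": 13, "local": 13}

# Hypothetical start of the overall "effective" band on the 100-point scale
EFFECTIVE_COMPOSITE = 75

# Worst case: a teacher just barely effective on every single component
worst_case = sum(EFFECTIVE_FLOOR.values())  # 76

# With cut points bargained this way, effective-on-everything always
# clears the overall effective band
print(worst_case >= EFFECTIVE_COMPOSITE)  # True
```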
Third, in her descriptions of different hypothetical teachers who would be harmed by the new teacher evaluation framework, Burris suggests that teachers who teach students with learning challenges, such as English Language Learners and students with special needs, would be harmed by the new teacher evaluation. That would be true if the measures of student learning were based on the extraordinarily naïve notion that all students can meet the same academic standards in the same time frame, regardless of their learning challenges and their prior learning. But there is no evidence that such educational naiveté informs the New York teacher evaluation framework. When the UFT was working on developing performance assessments as the local assessments for the 33 Transformation and Restart schools, one of our agreements with the NYC DoE was the development of a system of weighting that would account for the academic challenges of a teacher’s students. And certainly one of the very first ‘validity’ tests of a value-added model of growth would be its ability to account for those academic challenges.
While a change of the complexity required by the new teacher evaluation system is daunting, it should not lead us to romanticize a failed evaluation status quo. As it now stands, evaluations are based on a single measure, the principal’s subjective rating of the teacher. In arriving at this rating, the principal may employ any framework or standards s/he finds fitting to observe and rate the teacher. A teacher who is fortunate to have an educator with integrity as his/her principal will have little to fear from this evaluation process. Yet even in these cases, the teacher rarely receives the feedback and support that will allow him/her to grow as a professional educator. This is especially the case for novice teachers and teachers experiencing difficulties in their classrooms, as they seldom receive the support they need to develop their craft and become skilled teachers. And in an era when the powers that be no longer deem educational experience and accomplishment to be the requisite qualities of a principal, far too many teachers find themselves with an unqualified principal, and are victimized by a politicized evaluation process that has precious little to do with education. The evaluation status quo is failing New York students and teachers, as the next post in this series – on the appeals process – will make clear. Change is necessary.
For now, the state and local components of the measures of student learning are both worth 20 points. The law envisions that once the State Education Department has developed a valid value-added model for measuring growth in student learning, which it has yet to do, the state component can grow to 25%, while the local component would shrink to 15%.