
Tweed’s “Value Added” Project: Ideology Trumps Education

New York Measuring Teachers by Test Scores: so reads the headline on the front page of this morning’s New York Times, which announces the NYC Department of Education’s secretive pilot project to use “value added” statistical measures of student standardized test scores to examine the performance of teachers. The teachers and their schools will not be informed that they are the subjects of this study.

The DoE’s “value added” project is a fundamentally flawed exercise which cannot possibly deliver what it promises. It is being pursued, with full knowledge of its flaws, because technocratic ideology trumps sound educational practice at Tweed. Moving forward with such a flawed project is extraordinarily irresponsible because “value added” — the idea that one should measure how much academic progress students have made, rather than just their absolute academic standing — holds promise as a useful tool in the repertoire of schools and educators. But the reckless way in which Tweed is pursuing it will discredit the entire enterprise.

The DoE has no contractual or legal authority to use test score data in the evaluation of teachers, and the UFT will oppose it with all the means at our disposal. This is a line in the sand for the UFT.

To understand just how intellectually dishonest this exercise is, consider the following. This pilot project is based on student scores on the annual New York State ELA and Math standardized exams, grades 4 through 8. [The initial year of testing, grade 3, provides a baseline, leaving only grades 4 through 8 — the years in which the exam is given on an annual basis — for the measurement of progress.] This means that the pilot can only be applied to the teachers of grades 4 and 5 in an elementary school, and to ELA and Math teachers of grades 6 through 8 in a middle school: a small fraction of all teachers. More importantly, since the ELA and Math exams are given in January, the students will have had at least two different teachers in the interval between exams — one in the spring term of one school year and the other in the fall term of the next school year. Even assuming that the exams are an accurate and complete measure of student learning — and there is ample evidence that they are not — a student’s progress from one exam to the next is thus dependent upon at least two teachers. In some instances, a student would have another two teachers for Academic Intervention Services during the 37.5-minute tutoring sessions, and a fifth teacher if she attended a summer program. How could one possibly isolate an individual teacher’s contribution to a student’s progress using this method?

When the UFT and the other educators and “value added” experts it consulted confronted Tweed with this problem, Tweed decided to move ahead nonetheless, simply dividing a student’s progress between the two primary teachers. It does not require an advanced degree in statistics, but only simple common sense, to understand that this defeats the purpose of the entire exercise. If a student has a truly phenomenal, accomplished teacher in the spring term of one year, followed by a struggling novice teacher in the fall term of the next year, her test scores are likely to be flat or even to regress, since the struggling teacher is teaching in the period leading up to the exam. Tweed’s method simply divides the total progress between two teachers who are making very different contributions to the student’s academic progress, reverting both to the mean: the progress attributed to the accomplished teacher is lowered, and the progress attributed to the struggling teacher is raised. In short, while Tweed claims that it is measuring the contribution of individual teachers, its method is clearly incapable of distinguishing those contributions.
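The reversion to the mean described above is easy to see in a toy simulation. The sketch below is not Tweed’s actual model — the teacher effect sizes and noise are hypothetical numbers chosen for illustration — but it shows what happens when the observed gain of students taught by a strong spring-term teacher and a weak fall-term teacher is simply split in half.

```python
import random

random.seed(0)

# Toy model: each student's year-over-year gain is the sum of a spring-term
# teacher effect and a fall-term teacher effect, plus random noise.
# These effect sizes are hypothetical, purely for illustration.
STRONG_EFFECT = 8.0   # gain contributed by the accomplished spring teacher
WEAK_EFFECT = 2.0     # gain contributed by the struggling fall novice

def attributed_effects(n_students=1000):
    """Split each student's observed gain evenly between the two teachers."""
    strong_credit, weak_credit = [], []
    for _ in range(n_students):
        gain = STRONG_EFFECT + WEAK_EFFECT + random.gauss(0, 1)
        strong_credit.append(gain / 2)   # even split: half to each teacher
        weak_credit.append(gain / 2)
    return (sum(strong_credit) / n_students,
            sum(weak_credit) / n_students)

strong_est, weak_est = attributed_effects()
# Both estimates land near 5.0, the mean of 8.0 and 2.0: the accomplished
# teacher's measured contribution is pulled down, the novice's pulled up.
print(strong_est, weak_est)
```

However many students one samples, the even split can never recover the 8.0 and 2.0 the two teachers actually contributed; it can only report their average.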

The defense of this procedure offered by Tweed is that it has done a statistical analysis which concluded that averaging out the progress among teachers provides an accurate measurement of individual teacher contributions. The exact nature of this analysis was never explained; we were supposed to take the conclusion on faith, even as it defied elementary logic. No doubt, in aggregate analyses averaging the contribution would not present a difficulty, precisely because the differences among teachers would cancel each other out. But the claim made on behalf of this project is that it will accurately measure the individual teacher’s contribution, and that is clearly not possible when one cannot differentiate between the contributions of two or more teachers to an individual student’s progress.
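The aggregate/individual distinction can be made concrete with another toy sketch, again using hypothetical teacher effects rather than any real data: splitting each class’s gain evenly recovers the district-wide average perfectly, while individual attributions remain badly wrong.

```python
import random

random.seed(2)

# Toy check of the aggregate vs. individual distinction. Each of 200
# teachers has a hypothetical "true effect" on student gains; teachers are
# paired (spring term, fall term) and each pair's combined gain is split
# evenly between them.
true_effects = [random.uniform(1, 9) for _ in range(200)]
attributed = []
for i in range(0, 200, 2):
    gain = true_effects[i] + true_effects[i + 1]   # combined gain (noise omitted)
    attributed += [gain / 2, gain / 2]             # even split to both teachers

# In the aggregate, the errors cancel: the overall average is exact...
aggregate_error = abs(sum(attributed) - sum(true_effects)) / 200
# ...but any individual teacher's attribution can be far from the truth.
worst_individual_error = max(abs(a - t)
                             for a, t in zip(attributed, true_effects))
print(aggregate_error, worst_individual_error)
```

The aggregate error is zero by construction, while the worst individual error is half the gap between the strongest and weakest teacher in a pair: exactly the distinction Tweed’s defense glosses over.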

To those who have followed the development of “value added” models of measuring academic progress, this fundamental flaw comes as no surprise. Experts such as Bill Sanders, the senior research fellow at the University of North Carolina who devised the first value added model for education in Tennessee, point out that in their current state of development, these models are unrefined tools for individual differentiation and specification, at best identifying the outliers — the very best and very worst performers. And that is when there is a one-to-one correspondence between a single teacher and the period of a student’s preparation for the exam. According to Sanders, value added models provide the most accurate data when one is looking at aggregate categories, such as the growth of teacher skill over time. [He finds that, on average, teacher skill increases through the first decade of practice, and then plateaus.] Sanders has said he will not participate in a value added project which is not done in collaboration between the school district and its teachers, and he has had nothing to do with the New York City pilot.

One serious problem with the use of value added models for individuated, disaggregated analysis is that in order to perform such exercises, one must assume that which is clearly not the case — that students are randomly assigned to different classes and teachers. Princeton economist Jesse Rothstein has done a statistical analysis which shows that this assumption is false, using an elegantly simple falsification hypothesis — a fifth grade teacher should not have any effect on a fourth grade test score. The fact that a statistical analysis shows precisely such a relationship is explicable only by the fact of school life that every teacher understands, that students are not randomly assigned to classes and teachers. This is one very important reason why value added models produce meaningful statistics on an aggregate, rather than an individual, scale: if you consider all of the students in a school, you have eliminated the problem of the uneven distribution of students among classes and teachers, and if you consider all of the students in a district, you have eliminated the problem of the uneven distribution of students among schools.
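Rothstein’s falsification logic can also be illustrated with a small simulation. The numbers below are invented for the sketch: students are tracked into next year’s classes by this year’s scores, and the future teacher then appears to “predict” the past score — an effect that is impossible causally and would vanish under random assignment.

```python
import random

random.seed(1)

# Toy illustration of Rothstein's falsification test: a 5th grade teacher
# cannot cause a 4th grade score, so any apparent association reveals
# non-random assignment. Here students are tracked: the top half of 4th
# grade scorers are assigned to teacher A, the bottom half to teacher B.
# (Score scale and spread are hypothetical.)
fourth_grade_scores = [random.gauss(650, 20) for _ in range(1000)]
fourth_grade_scores.sort()
teacher_b = fourth_grade_scores[:500]   # lower-scoring track
teacher_a = fourth_grade_scores[500:]   # higher-scoring track

mean_a = sum(teacher_a) / len(teacher_a)
mean_b = sum(teacher_b) / len(teacher_b)
# Next year's teacher "predicts" last year's score: a spurious gap that
# exists only because assignment was not random.
print(mean_a - mean_b)
```

Under genuinely random assignment the two class means would differ only by sampling noise; the large gap here is produced entirely by the tracking, which is the point of the falsification test.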

We are now in the midst of an era of great misuse of standardized exams. The experts who design such exams — psychometricians and psychologists — are outspoken in their insistence that exams are designed for different purposes, and that it is entirely illegitimate to take a test designed to diagnose problems in a student’s reading comprehension, for example, and use it to reach a judgment on whether the student has mastered the skills she needs in English Language Arts, or to make a high stakes decision about a student’s promotion or graduation. The DoE is now attempting to extend this illegitimate use of standardized exams to high stakes decisions about teachers, a purpose for which they were clearly not designed.

What is remarkable about Tweed’s seizing on such a fundamentally flawed and intellectually dishonest project is what it says about its estimation of the ability of its own administrators, after six years of Klein’s Children First agenda. Millions upon millions of public and private dollars have been spent on its much vaunted Leadership Academy, but now Tweed is looking for ways to circumvent the professional educational judgments that the graduates of that academy make on teaching quality. A technocratic ideology guides all of Tweed’s ventures, one that saw the development of ARIS — its new multi-million dollar computer database — as a way to directly observe and evaluate everyone in the system from a computer terminal at Tweed, without having to rely upon any intervening human judgment. In the hands of Tweed, ARIS is eerily reminiscent of the panopticon [literally, “all-seeing”] of the 19th century English utilitarian Jeremy Bentham: a 21st century ‘virtual’ version of Bentham’s prison architecture, which allows an omniscient power to track each and every subject. Tweed’s technocratic ideologues are so intent upon having that power in the numbers on their desktop terminals that they will pursue it even in the face of the knowledge that those numbers cannot possibly be accurate and complete.

Teacher evaluation is a subject of collective bargaining, and teacher tenure is a matter of state law. The UFT will not open our agreement to consider any role for such a fundamentally flawed project in the evaluation of teachers, and we will defend the tenure law with all our means. Just this last year, the state legislature passed and Governor Spitzer signed a law which laid out three grounds for making decisions on teacher tenure — [1] supervisory evaluation, [2] peer review and [3] the ability of teachers to use data to inform their instruction. That is as it should be. Teaching is a demanding, difficult craft that is not reducible to a technocrat’s numbers.



  • 1 jd2718
    · Jan 21, 2008 at 8:44 pm


    today’s Times says:

    The United Federation of Teachers, the city’s teachers’ union, has known about the experiment for months, but has not been told which schools are involved, because the Education Department has promised those principals confidentiality.

    Can you clarify something? I am assuming that the UFT knew the DoE was collecting data, not that they were plotting to evaluate teachers.


  • 2 “You Gotta Keep the Devil Way Down in the Hole,” Fighting Poverty and Tweed « Ed In The Apple
    · Jan 21, 2008 at 9:54 pm

    […] Chester Finn, originally strong supporters of market forces have had second thoughts and Leo Casey at Edwize skewers the Department plan. If we drive “merit” dollars to teachers who the […]

  • 3 Leo Casey
    · Jan 21, 2008 at 10:31 pm

    The DoE told us that they wanted to do an “academic study” on value added models over the summer. Randi made it clear from the very first time this was raised in a meeting that we would be in total opposition to the use of data from that study to evaluate teachers. When they told us how they wanted to do the study, we raised the sort of objections I discussed above, to no avail. The spectre of using this study for teacher evaluations and tenure was raised by the DoE first last week, in a speech the deputy Chancellor gave in Washington DC.

  • 4 dr_dru
    · Jan 21, 2008 at 10:32 pm

It says in the Times that the UFT knew. What exactly does that mean? The rank and file certainly did not know. So why were we not told? Secrecy is NOT the same as leadership.

  • 5 Social Studies Teacher
    · Jan 22, 2008 at 10:26 am

    Dr. Dru:

    Perhaps you should read the comment immediately above yours before you write.

  • 6 “This is a line in the sand for the UFT.” « PREA Prez
    · Jan 22, 2008 at 12:10 pm

    […] Casey, writing in Edwize, denied that report, stating, The DoE told us that they wanted to do an “academic study” on […]

  • 7 Should Student Test Scores Measure a Teachers Value? - City Room - Metro - New York Times Blog
    · Jan 22, 2008 at 12:40 pm

    […] city’s teachers union expressed outrage at the plan on its blog, saying the plan uses unreliable data and promising to fight if the city moves to make the […]

  • 8 jd2718
    · Jan 22, 2008 at 1:54 pm

    Social Studies Teacher:

    Extend some good faith. Dr. Dru and Leo were writing at the same time (look at the timestamps). The way the Times wrote that passage was misleading. I’m surprised there were only two questions, but Leo did answer pretty quickly.


  • 9 Blogboard: Finger-Pointing in NYC
    · Jan 22, 2008 at 6:44 pm

    […] United Federation of Teachers’ blog this morning Leo Casey of the United Federation of Teachers came down hard on New York’s pilot project that would use standardized test scores to evaluate teacher […]

  • 10 dr_dru
    · Jan 22, 2008 at 7:02 pm


    As Jonathan wrote; when I started to write, Leo’s post was not there. It was there by the time I had posted, as you can see by the time stamp.

    My criticism is still valid though. If Randi made such objections, why not let us know about it before the article came out? What would be the downside of that?


  • 11 Peter Goodman
    · Jan 22, 2008 at 8:21 pm

    Research linking teacher performance to pupil achievement is not new …


    From my experience researchers do NOT support using this research to rate teachers.

    The DOE is also supporting the Roland Freyer “pay the kids for performance” project …

    Who knows why Cerf decided to go public with the project … Randi and Leo responded expeditiously and vigorously … a one day story? or do we do battle? The ball’s in their court.

  • 12 Eduwonk.com: NY Teacher Madness
    · Jan 24, 2008 at 12:38 pm

    […] This NYT story Monday has sparked all sorts of accusations and counter-accusations around the web. Joel Klein is a devil! The union is awful! It’s Tuskegee all over again! Basically, the NYC Department of Education is […]

  • 13 Apologies Or Apologists? More On Tweed's "Value Added" Pilot Project | Edwize
    · Jan 26, 2008 at 8:18 pm

    […] at the Quick and the Ed, Kevin Carey offers to apologize to us for Ed Sector’s caricatures of our arguments on Tweed’s “value added” pilot project. But Carey is unwilling to take us at our […]

  • 14 Peeing into the Wind: Why is Klein Picking Losing Fights With the Teachers Union? « Ed In The Apple
    · Jan 27, 2008 at 6:36 pm

    […] folks over at Edwize, Leo Casey and City Sue, skewered the concept, and, the blogosphere in general was […]

  • 15 This And That | Edwize
    · Feb 9, 2008 at 12:47 pm

    […] take a look at this defense of their value-added pilot project, which we previously discussed here. Of special note is the last paragraph, which alludes to the fact that the period being measured […]

  • 16 Measuring School Performance | Edwize
    · Sep 19, 2008 at 8:53 am

    […] the crux of the matter is this: even if students were randomly assigned to classes and schools [and Jesse Rothstein’s study has demonstrated that they are not], the classes can be quite skewed, especially in small schools […]