Leaps of Logic and Sleights of Hand:
The Misuse of Educational Research In Policy Debates

Did the New York Times sensationalize its account of an analysis of value-added measures of teacher performance it recently featured on its front page, misleading its readers about its policy implications? Have commentators such as the Times’ own Nicholas Kristof and bloggers such as Ed Sector’s Kevin Carey seized upon the Times’ misleading narrative to confirm pre-existing policy biases, rather than do their own careful reading of what is universally acknowledged to be a rather complex study? Was Mayor Bloomberg’s cynical use of the analysis and Kristof’s column in his State of the City address to bash teachers and their union, citing them to justify his mass closure of PLA schools and his refusal to negotiate meaningful appeals of ineffective ratings, not the logical conclusion of this misrepresentation of educational research?

An email exchange I had with one of the co-authors of the study, Raj Chetty of Harvard, provides interesting evidence that the answer to all of these questions is yes.

Chetty sent me an unsolicited email as a result of a post I wrote here on Edwize. In my comments, I had noted that the analysis of Chetty and his co-authors, John Friedman of Harvard and Jonah Rockoff of Columbia, had received a great deal of prominent media attention despite the fact that it had not yet been submitted to a peer-reviewed journal for publication, much less actually published. Instead, shortly before the Times article appeared, the paper had been uploaded on Harvard and National Bureau of Economic Research web sites as a working paper. The sheer complexity of the analysis, combined with the unorthodox way in which it was released, meant that its authority could be used to support policy proposals without fear of challenge, regardless of how poorly founded and unsupported these proposals were by the study itself. This, I concluded, was “the politicization of educational research.” The casualties were the quality of discussion and decision-making around policy choices.

Chetty wrote to me, professing an unawareness of recent political events that shaped the discussion of value-added measures of teacher performance. “Our goal as researchers,” he averred, was “simply to report the most rigorous scientific evidence on a subject in order to help the debate be guided by data.” And “while the headlines in the press may sometimes suggest otherwise,” he concluded, “the main message of the paper is simply that great teachers have great value, and that test score impacts can be one useful input into identifying such teachers.”

I wrote back, thanking Chetty for his email and explaining that it would have been easier to accept his statement of intentions if he and his co-authors had not been quoted in the Times and on the Newshour supporting policy prescriptions – the mass firings of teachers with low value-added scores – that their own analysis did not support. I specifically referenced the fact that their working paper discussed the problems of the high-stakes use of value-added scores, asking whether the potential benefits of using them in teacher evaluations outweighed the possible costs. Chetty and his co-authors had characterized these problems as “important issues” that “must be resolved before one can determine whether VA should be used to evaluate teachers.” How could this caveat be reconciled, I asked, with the following passages in the Times:

“The message is to fire people sooner rather than later,” Professor Friedman said.

Professor Chetty acknowledged, “Of course there are going to be mistakes – teachers who get fired who do not deserve to get fired.” But he said that using value-added scores would lead to fewer mistakes, not more.

And on the Newshour:

RAY SUAREZ: Is — could it also be concluded from your study that it ought to be easier to fire ineffective teachers? And I’m really sorry the union leader isn’t here with us right now when I’m asking this question.

But is that part of your conclusion?


I think — you know, let me make an analogy here. Suppose you are managing a baseball team, say, the Boston Red Sox, and you’re trying to do as well as you can. You have players with different batting averages. One approach you might take is to bring the hitting coach out and try to raise the batting averages of the players you have.

But I think it also makes a lot of sense – and this will make sense to sports fans – that, on occasion, you might decide to let some of the players with lower batting averages go, and try to get somebody else who might do better. And so I think it makes sense to use a combination of those tools.

Here, I think the stakes are even much bigger. We’re talking about the future of our children, rather than winning a baseball game. So I think it does make sense to consider those policies seriously.

Chetty wrote back, responding to my counterposing of the qualifying lines from their paper and the quotes from him and his co-authors in the media:

I think the PBS statements you quote do summarize my current reading of the evidence fairly; the Times quotes are out of context and thus I agree give an incorrect impression.  I agreed to do the PBS interview partially to have the opportunity to state our findings more clearly, recognizing that people might misinterpret the articles in the press.

While I continue to have some clear differences with the way in which Chetty and his co-authors extrapolate their findings and draw policy inferences from their study, it is important that they themselves were dissatisfied with the way their study and policy recommendations were represented in the Times’ article.




  • 1 Stuart Buck
    · Jan 17, 2012 at 1:01 pm

    the unorthodox way in which it was released,

    There was nothing unorthodox about this at all. The overwhelming majority of peer-reviewed economics and education papers these days are released as working papers first.

  • 2 Leo Casey
    · Jan 18, 2012 at 9:32 am

    The issue, Stuart, is not that it was first released as a working paper, but that it was released as a working paper and to the media at the same time. Consequently, there was no opportunity for others working in the field to digest an analysis which is innovative in important parts, very dense, and very complex, an opportunity that was necessary for them to contribute to an informed conversation about the strengths and weaknesses of the analysis, and to discuss what policy proposals could be reasonably based on its foundation. I explained this point at some length in the original post, and did not think it necessary to repeat it in the same detail here.

  • 3 Matthew Ladner
    · Jan 18, 2012 at 9:59 am


    Do you have any peer review research to suggest that student learning would suffer if schools let teachers with low value added scores go?

  • 4 Leo Casey
    · Jan 18, 2012 at 12:32 pm

    My friend Mr. Ladner, a charter member of Jay Greene’s union, the United Cherry Pickers. Nice of you to stop by.

    No, I do not think that when value added systems such as the one employed by the NYC DoE have margins of error greater than 50 points (+ or – 28 points) or when other value added systems have as many as 25% of the teachers go from the top to the bottom and the bottom to the top quartiles in one year, we need to use them to fire teachers, so that we can then do peer review research that demonstrates their negative impact on learning and students. But then I am also inclined to not jump out the window of my office on the fourteenth floor to demonstrate that there would be negative effects from taking that action.

  • 6 Matthew Ladner
    · Jan 19, 2012 at 12:52 am


    So NBER working papers are bad if they reach conclusions that Leo doesn’t like, but Leo’s assertions are to be taken on faith because…they make sense to Leo.

    I’ll come back in a few years to check on you to see if there has been any progress.

  • 7 Leo Casey
    · Jan 19, 2012 at 9:06 am


    To borrow a line from Barney Frank, talking to you is like talking to a kitchen table. You don’t like the facts around various value-added systems — their level of statistical noise and their statistical unreliability — so you just ignore them.


