No sooner had I written about "story time" than the LA Times journalists on the education beat announced "Story time!"
An article published recently on using test scores to rate individual teachers has stirred the education community. It attracted Andrew Gelman's attention and there is a lively discussion on his blog, which is where I picked up the piece. (For discussion on the statistics, please go there and check out the comments.)
In reading such articles, we must look out for the moment(s) when the reporters announce story time. Much of the article is great propaganda for the statistics lobby, describing an attempt to use observational data to address a practical question, sort of a Freakonomics-style application.
We have no problems when they say things like: "There is a substantial gap at year's end between students whose teachers were in the top 10% in effectiveness and the bottom 10%. The fortunate students ranked 17 percentile points higher in English and 25 points higher in math."
Or this: "On average, Smith's students slide under his instruction, losing 14 percentile points in math during the school year relative to their peers districtwide, The Times found. Overall, he ranked among the least effective of the district's elementary school teachers."
Midway through the article (right before the section called "Study in contrasts"), we arrive at these two paragraphs (my italics):
On visits to the classrooms of more than 50 elementary school teachers in Los Angeles, Times reporters found that the most effective instructors differed widely in style and personality. Perhaps not surprisingly, they shared a tendency to be strict, maintain high standards and encourage critical thinking.
But the surest sign of a teacher's effectiveness was the engagement of his or her students — something that often was obvious from the expressions on their faces.
At the very moment they tell readers that engaging students makes teachers more effective, they announce "Story time!" With barely a fuss, they move from an evidence-based analysis of test scores to speculation about cause and effect. Their story is no more credible than anybody else's story, unless they also provide data to support such a causal link. Visits to the classes and making observations do not substitute for factual evidence.
This type of reporting happens a lot. Just open any business section. The articles all start with some fact: oil prices went up, Google stock went down, and so on. Then it's open mike for story time. None of the subsequent stories is supported by data; the opening fact creates an impression that the author is using data, but it has nothing to do with the hypotheses that follow. So be careful!
"Visits to the classes and making observations do not substitute for factual evidence." I disagree with this and so does most everyone over the age of 12 months.
Here is an experiment: Look at your computer. Now close your eyes. By your statement above, your computer does not exist because you have no factual evidence of its existence.
If you went into a classroom and observed someone shooting another person, this observation is factual and would be admissible in a court of law.
Ethnographic research bases its method on observation, sometimes even on participation combined with observation. What is limited is the strength of the conclusions drawn from the observations; the observations themselves are still facts.
Posted by: Chris P | 09/02/2010 at 09:27 AM
Chris: Yes, but you are reading a statistics blog, and statisticians want to see data.
Your example is a caricature of what I'm saying. If you and I enter a room and find a dead man, shot, that is a fact. However, you and I may not agree that a particular teacher has been "engaging" her students. Nor can we agree that a teacher highly rated for "engagement" is necessarily a teacher who will have high value-added performance.
The hypothesis in question could easily be tested with a randomized experiment. Even if such an experiment were deemed too expensive, they could find 10 people to rate teachers on "engagement" and then match the average ratings back to the value-added data, to test the hypothesis that engagement is the "surest sign" of high value-added performance (a rough sketch of such a check appears below).
That said, I do not deny that there are situations in which data cannot be gathered and statistically tested as described above; in those cases, other methods have to be used. I would still caution against making super-confident statements about causality when such limited methods are used.
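For what it's worth, a minimal sketch of that ad-hoc check might look like the following. The engagement ratings and value-added scores here are simulated placeholders, and a Spearman rank correlation is just one reasonable way to match the two; nothing below is taken from the Times' actual analysis.

```python
# Hypothetical check: do average "engagement" ratings from 10 raters
# track teachers' value-added scores? All data below are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_teachers, n_raters = 50, 10

# Placeholder inputs: in practice the ratings would come from classroom
# visits and the value-added scores from the district's model.
engagement_ratings = rng.integers(1, 6, size=(n_teachers, n_raters))  # 1-5 scale
value_added = rng.normal(0, 5, size=n_teachers)                       # percentile-point gains

avg_engagement = engagement_ratings.mean(axis=1)

# A rank correlation is a sensible first look at whether higher average
# engagement ratings go with higher value-added scores.
rho, p_value = stats.spearmanr(avg_engagement, value_added)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```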
Posted by: Kaiser | 09/03/2010 at 01:04 AM
I agree with you in general: unfounded leaps from correlation to causation are a sign of bad reporting, and they can be very misleading. But this actually strikes me as a very reasonable statement. Engagement was apparently observed by multiple reporters as the primary commonality among the highly effective teachers. Risk of confirmation bias? Yes. But if that's what the reporters discovered in their investigation, that's valid information to share. Plus, "sign" is a correlative word: they found that the best way they could subjectively identify a quality teacher was by whether they engaged the classroom; that's correlation. Formal experiments are preferable to ad-hoc ones, but ad hoc is a big jump from "story time".
Posted by: Paul | 10/15/2010 at 01:01 PM
Just realized this post is a month and a half old: Andrew just linked to it. Posting on old content seems... impolite somehow, so apologies. :)
Posted by: Paul | 10/15/2010 at 01:22 PM
I know I've been beating this drum quite a bit, but this is not just a case of rolling out some data and then jumping into a narrative; this is a case of rolling out a simplistic account of highly confounded data and then jumping into a narrative.
How confounded?
"A study designed to test this question used VAM methods to assign effects to teachers after controlling for other factors, but applied the model backwards to see if credible results were obtained. Surprisingly, it found that students’ fifth grade teachers were good predictors of their fourth grade test scores. Inasmuch as a student’s later fifth grade teacher cannot possibly have influenced that student’s fourth grade performance, this curious result can only mean that VAM results are based on factors other than teachers’ actual effectiveness."
(from EPI)
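To make the backwards test concrete, here is a minimal simulation of the logic in that quote. The data, the degree of student sorting, and the model are all made up for illustration; this is not the EPI study's actual analysis.

```python
# Falsification test: regress fourth-grade scores on fifth-grade teacher
# assignments. A future teacher cannot cause a past score, so any sizable
# "effect" signals non-random sorting of students, not teacher effectiveness.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_students, n_teachers = 2000, 40

# Hypothetical sorting: higher-ability students are more likely to be
# assigned to certain fifth-grade teachers.
ability = rng.normal(0, 1, n_students)
grade4_score = 50 + 10 * ability + rng.normal(0, 5, n_students)
teacher_grade5 = np.clip((ability * 4 + 20 + rng.normal(0, 3, n_students)).astype(int),
                         0, n_teachers - 1)

df = pd.DataFrame({"grade4_score": grade4_score, "teacher_grade5": teacher_grade5})

# If assignments were random, the fifth-grade teacher dummies should explain
# essentially nothing about fourth-grade scores.
model = smf.ols("grade4_score ~ C(teacher_grade5)", data=df).fit()
print(f"R-squared from future-teacher dummies: {model.rsquared:.3f}")
```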
Joseph and I have more on this at Observational Epidemiology but be warned, it's not pretty.
Posted by: Mark Palko | 10/15/2010 at 03:41 PM
Paul: Thanks for the comment. I don't have a rule that you can't comment on old posts. I was loath to get into details about the reporters' "observational study"/ethnographic research because it distracts from my larger point. But there are many problems with the study: (1) it is not a blinded study: the reporters went into the classrooms knowing which teachers scored high and which scored low, with the explicit goal of explaining the difference; it would be much more credible if the researchers did not know a priori how the teachers scored; (2) all the study can show is which teachers engage students and which don't; claiming that it could prove a causal link between engagement and test scores is surely too much; (3) the usual statistical concerns about design and sample size.
For me, the theory that engagement leads to higher test scores doesn't even pass the common-sense test; it smells of an educator's ideal rather than cold, hard reality. Compare one teacher who teaches to the test, making students do "mock tests" every day of the year, with one who teaches for understanding of the material and does not feel bound by the test curriculum: whose students will do better on the standardized tests? We might not like the answer, but it is still the answer.
Mark: I will put up a post eventually about the problems of VAM. Thanks for pointing that out. I encourage readers to look at Mark's posts on this topic. My silence does not constitute an endorsement of the methodology.
Posted by: Kaiser | 10/17/2010 at 09:36 AM