


Interesting, thanks for the reminder.

However, I think there is an important point that you have overlooked in this case. What they did here was to split 12 large schools into 47 smaller schools. This means that the underlying 'base' of students (the pool from which they are drawn) has remained the same for each comparison group. BTW, one has to assume that the average graduation percentage was calculated on the basis of individuals (not by averaging the rates of several schools), and thus the population size per school was probably the same.
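To make the distinction between averaging over individuals and averaging over schools concrete, here is a minimal sketch. The enrollment and graduation counts are invented for illustration and are not taken from the study being discussed.

```python
# Hypothetical (graduates, enrolled) counts for three schools; not real data.
schools = [(90, 100), (300, 500), (40, 50)]

def pooled_rate(schools):
    """Rate computed over individuals: total graduates / total enrolled."""
    graduates = sum(g for g, _ in schools)
    enrolled = sum(n for _, n in schools)
    return graduates / enrolled

def mean_of_school_rates(schools):
    """Unweighted average of per-school rates; every school counts equally."""
    return sum(g / n for g, n in schools) / len(schools)

print("pooled over individuals:", round(pooled_rate(schools), 3))
print("average of school rates:", round(mean_of_school_rates(schools), 3))
```

With these made-up numbers the two figures differ because the largest school has the lowest rate: it dominates the pooled figure but counts only once in the average of school rates.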

One thing I am missing in the analysis is the expenditure per student before and after the split, as this might be substantially different. Provided the OPEX per student is similar, then this 'experiment' would actually prove that small schools are much more effective than larger ones at teaching students.


It's a nice demonstration of the principle, but Dr Wainer spoils his point a bit by going on to say:

"the regression line shows a significant positive slope; overall, students at bigger schools do better. This too is not unexpected, since very small high schools cannot provide as broad a curriculum or as many highly specialized teachers as can large schools."

He has not presented evidence that this is the reason for the positive slope, only told a just-so story about why it might be the reason. The small-schools movement has its own just-so stories that "explain" why its ideas are better, like smaller schools allowing teachers to know their students better. That doesn't make them true. Dr Wainer should have just let the graph speak for itself, instead of trying to tell a story about the graph that it wasn't equipped to confirm.

Alternatively, if he wanted to make the graph confirm the story, he ought to have graphed "breadth of curriculum" or "number of highly specialized teachers" against school size and PSSA score. The extra data would have been welcome context, but words alone are not; which I think I will take as a valuable check on my own tendency to lard my graphs about with words "adding context".

John S.

Andrew Gelman has a nice example of this. He shows a map of kidney cancer deaths by U.S. county. Shade the counties with the highest death rates, and sparsely populated Midwestern counties stand out. He asks his students to speculate as to why this might be. Lack of access to health care? Polluted groundwater?

Then he shows another map on which the counties with the lowest death rates are shaded. Once again, sparsely populated Midwestern counties stand out. See Section 3 of this paper:


This example is also included in his book "Bayesian Data Analysis".
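The effect in Gelman's maps can be reproduced with a small simulation. This is only a sketch under stated assumptions: every county is given the same true death rate, and the county counts, population sizes, rate, and seed are all invented for illustration rather than taken from his data.

```python
import random

def simulate_counties(n_small=200, n_large=200, small_pop=200,
                      large_pop=5000, true_rate=0.01, seed=0):
    """Simulate observed death rates when every county shares one true rate."""
    rng = random.Random(seed)
    counties = []
    for size, pop, count in (("small", small_pop, n_small),
                             ("large", large_pop, n_large)):
        for _ in range(count):
            # Each resident independently dies with probability true_rate.
            deaths = sum(rng.random() < true_rate for _ in range(pop))
            counties.append((deaths / pop, size))
    return counties

# Rank counties by observed rate and look at both extremes.
counties = sorted(simulate_counties(), key=lambda c: c[0])
lowest, highest = counties[:10], counties[-10:]
print("lowest 10: ", [size for _, size in lowest])
print("highest 10:", [size for _, size in highest])
```

Because the small counties have far noisier observed rates, they should crowd out the large ones at both ends of the ranking, even though nothing about the underlying risk differs between them.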
