Bill Gates should hire a statistical advisor

My coworker pointed me to a Huffington Post article claiming a Bill Gates byline that contains some highly dubious analysis and a horrific chart. We presume Gates was fed this information by some analysts but even so, one wishes he wouldn't promote innumeracy. But then, he has a history: Howard Wainer demolished analysis by his foundation used to channel lots of dollars to the "small schools" movement a few years ago; I wrote about that before.


First, the offensive chart:


Using double axes earns justified heckles, but using two sets of gridlines is a scandal! A scatter plot is the default for this type of data. (See next section for why this particular set of data is not informative anyway.)

I can't understand the choice of scale for the score axis. The orange line, for instance, seems to have a positive slope. In any case, since these scores are "scaled", and the "standard error" is about 1 (this number is surprisingly hard to find, even on Google), the stretch between 300 and 400 on the score axis spans 100 units of standard error. By convention, a value three standard errors away from the average is considered a rare event. There is no conceivable way that the average score could jump by that much.
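The arithmetic behind that complaint can be sketched in a few lines. This takes the standard error of "about 1" at face value and assumes, purely for illustration, that average scores are roughly normally distributed; neither the variable names nor the normal approximation come from the chart itself.

```python
import math

# Figures quoted in the post; the normal approximation is an assumption.
standard_error = 1.0            # "about 1", as stated above
axis_low, axis_high = 300, 400  # stretch of the score axis under discussion

# Number of standard errors covered by the 300-400 stretch of the axis.
span_in_se = (axis_high - axis_low) / standard_error


def two_sided_tail(z):
    """Probability that a standard normal value lies more than z units from its mean."""
    return math.erfc(z / math.sqrt(2))


print(f"Axis span: {span_in_se:.0f} standard errors")
print(f"Chance of a deviation beyond 3 standard errors: {two_sided_tail(3):.4f}")
```

Under these assumptions, a deviation beyond three standard errors already has a probability of well under one percent, so an axis that leaves room for a 100-standard-error move is showing mostly empty space.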


 The analysis is also flawed. Here's the key paragraph:

Over the last four decades, the per-student cost of running our K-12 schools has more than doubled, while our student achievement has remained flat, and other countries have raced ahead. The same pattern holds for higher education. Spending has climbed, but our percentage of college graduates has dropped compared to other countries... For more than 30 years, spending has risen while performance stayed flat. Now we need to raise performance without spending a lot more.

This argument contains several statistical fallacies:

  • Comparing apples and oranges: a glaring piece of missing information is whether other countries have increased their per-student spending on education, and if so, how fast the growth is compared to that in the U.S. Without this, the analysis makes no sense.
  • Confusing correlation and causation: so spending increased while test scores stagnated.  In order to conclude that there is something wrong with the spending, one must first believe that spending has a causal effect on test scores. Observe that this is not a conclusion from the data; it is an assumption going into the analysis, neither supported nor disputed by the data since the data merely show a (lack of) correlation. This is another instance of "story time": we see data, we see conclusion, we are misled into thinking that data supports conclusion but in fact, the data is an irrelevant distraction. (For other instances of "story time", see this link to my book blog.)
  • Fallacy #1 and fallacy #2 combined: even if you believe that spending affects test scores, it is still a stretch to say that spending in U.S. schools affects the gap in test scores between U.S. students and foreign students. In a world where foreign countries are frozen in time, maybe so; but in a world where foreign countries are investing in education, one can't say anything about the test score gap without first knowing what's going on overseas.
  • Assumption invalidating the analysis: In a short breath, the analyst admits the possibility of (a) spending increase together with flat scores and (b) score increase together with flat spending. One model under which both of those possibilities coexist is one in which test scores are independent of spending. If so, why would one even look at a plot of these two quantities?
  • The dilemma of being together (à la Chapter 3 of Numbers Rule Your World): sorry to say, but spending per pupil is likely to have a highly skewed distribution across school districts. Average test scores are also likely to vary widely across school districts. Thus, using an average for the entire country muddies the water.
  • Needless to say, test scores are a poor measure of the quality of education, especially in light of the frequent discovery of large-scale coordinated cheating by principals and teachers driven by perverse incentives of the high-stakes testing movement.
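To make the point about skewed spending concrete, here is a small sketch of how a national average can misrepresent the typical district. The seven spending figures are invented for illustration and are not taken from the chart or from any real data set.

```python
from statistics import mean, median

# Hypothetical per-pupil spending (dollars) for seven made-up districts.
# These numbers are invented for illustration only.
spending = [7000, 7500, 8000, 8200, 8500, 9000, 25000]

national_average = mean(spending)    # pulled upward by the one big spender
typical_district = median(spending)  # the middle district

# Count how many districts actually spend less than the "national average".
below_average = sum(1 for s in spending if s < national_average)

print(f"mean = {national_average:.0f}, median = {typical_district:.0f}")
print(f"{below_average} of {len(spending)} districts spend below the mean")
```

In this made-up example, one high-spending district drags the mean above what six of the seven districts spend, so a chart of the national average describes almost none of them.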

In the same article, Gates asserts that quality of teaching is the most decisive factor explaining student achievement. Which study proves that, we are not told. How one can measure such an intangible quantity as "excellent teaching", we are not told. How student achievement is defined, well, you guessed it, we are not told.

It's great that the Gates Foundation supports investment in education. Apparently they need some statistical expertise so that they don't waste more money on unproductive projects based on innumerate analyses.





Just checking: are these dollars inflation-adjusted?


This feels like lazy criticism. There is no reason to suspect "coordinated cheating" because the NAEP tests are not "high-stakes" and not tied to school or state funding in any way. In fact, NAEP results are not even released at the school level. And they are actually fairly well designed and are pretty good metrics in my opinion.


Chris: You're picking on one of my seven or eight points and putting words in my mouth. My comment on cheating in standardized testing does not mention NAEP. It is well known in educational circles that "test scores" do not measure "quality of education"; they may measure one aspect of it, but to equate the two is folly.

PP: I don't know whether it's inflation adjusted or not, probably not.


Wasn't the whole point of this graph to show that spending and test scores have varied independently, i.e. are not correlated? Then I guess it is a matter of the power of the test of correlation and how large a change in test scores is considered "large enough" to talk about. Now, we have to take their word for it that there is no correlation between math scores and spending (although it looks like there might be). Am I missing something?

I agree on the points of the weird scale of test scores and the comparisons to other countries though.

Jackie Conrad

I can see two possible talking points here:
1. The cost of K-12 education has increased to keep up with the cost of HEALTH INSURANCE in the US. In Europe, socialism has not driven the cost of salaries+fringes up since there is universal health care.
2. Schools have done well to have a slight upward trend in NAEP especially when you consider the INCREASE of students living in poverty.



I'm pretty sure health care costs have gone up in the socialist countries you mention (like my country, Canada), as part of the costs are related to increases in obesity, cancer, and an aging population. However, I'm sure that our increases have been nowhere near as bad as the total increases in the US.


Yes, the charts are inflation adjusted. They aren't adjusted for land prices, which have risen well above inflation for decades, or for health care costs as indicated by Jackie, or for the rising costs of supplies (i.e. computers in schools).


This blog article provides some nice insights on some of the fundamental statistical problems of the graph and claims/inferences made about it by Gates. provides some further critique of Gates interpretation.

Stephen O. Jambor, Ph.D.

AND (in addition to the statistical methods questions), there is still the more fundamental question regarding "validity" (both construct & criterion).
It has not been demonstrated that these high stakes tests (of achievement) are in fact not behaving as though they were "masked measures of intelligence".
Until/unless it can be shown that these alleged achievement tests are in fact assessing something above the givens (IQ), we are chasing after student-level factors, something that is not entirely within the school's grasp to control!


Gates has made many of these misleading claims before and been debunked before:
