Serving donuts

David Leonhardt's article on the graduation rates of public universities caught my attention for both graphical and statistical reasons.

[Figure: Nyt_gradrate]

David gave a partial review of a new book, "Crossing the Finish Line", focusing on its conclusion that public universities must improve their 4-year graduation rates in order for education in the U.S. to make progress.  This conclusion was arrived at through statistical analysis of detailed longitudinal data (collected since 1999).

This chart is used to illustrate the conclusion.  We will come to the graphical offering later, but first I want to fill in some details omitted from David's article by walking through how a statistician would look at this matter, and what it means to "control for" something.

The question at hand is whether public universities, especially less selective ones, have "caused" students to lag behind in graduation rate.  A first-order analysis would immediately find the overall graduation rate at less selective public universities to be lower, by about 20%, than at more selective public universities.

A doubter appears, and suggests that less selective schools are saddled with lower-ability students, and that this would be the "cause" of lower graduation rates, as opposed to anything the schools actually do to students.  Not so fast: the statistician now disaggregates the data and looks at the graduation rates within subgroups of students with comparable ability (in this instance, the researchers used GPA and SAT scores as indicators of ability).  This is known as "controlling for the ability level".  The data now shows that at every ability level, the same gap of about 20% exists: about 20% fewer students graduate at the less selective colleges than at the more selective ones.  This eliminates the mix of abilities as a viable "cause" of lower graduation rates.
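For readers who like to see the arithmetic, here is a minimal sketch of what "controlling for the ability level" amounts to. Every number in it is made up for illustration and does not come from the book or the Times article; the two ability labels and the function names are likewise my own assumptions. It computes the graduation-rate gap once on the pooled data and once within each ability level.

```python
# All counts below are hypothetical, invented for illustration only.
# Each record: (ability_level, selectivity, students_enrolled, students_graduated)
records = [
    ("lower ability",  "less selective", 1000, 400),
    ("lower ability",  "more selective",  200, 120),
    ("higher ability", "less selective",  300, 210),
    ("higher ability", "more selective", 1000, 900),
]


def graduation_rate(rows):
    """Pooled graduation rate: total graduates divided by total enrolled."""
    enrolled = sum(row[2] for row in rows)
    graduated = sum(row[3] for row in rows)
    return graduated / enrolled


def selectivity_gap(rows):
    """Rate at more selective schools minus rate at less selective schools."""
    less = [row for row in rows if row[1] == "less selective"]
    more = [row for row in rows if row[1] == "more selective"]
    return graduation_rate(more) - graduation_rate(less)


def gap_by_ability(rows):
    """The same gap, computed separately within each ability level."""
    levels = sorted({row[0] for row in rows})
    return {level: selectivity_gap([r for r in rows if r[0] == level])
            for level in levels}


print("raw gap:", round(selectivity_gap(records), 3))
for level, gap in gap_by_ability(records).items():
    print(level, "gap:", round(gap, 3))
```

In this toy data the raw gap is larger than the gap within each ability level, because the more selective schools enroll a stronger mix of students. The part of the gap that survives inside every ability level is the part the doubter's explanation cannot account for.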

The researchers now conclude that conditions of the schools (I think they blame the administrators) "caused" the lower graduation rates.  Note, however, that this does not preclude factors other than mix of abilities and school conditions from being the real "cause" of lower graduation rates.  But as far as this analysis goes, it sounds pretty convincing to me.

That is, if I ignore the fact that graduation rates are really artifacts of how much the administrators want to graduate students.  As the book review article pointed out, administrators at the less selective colleges may want to reduce graduation rates in order to save money, since juniors and seniors are more expensive to support due to smaller class sizes and so on.  On the other hand, the most selective colleges have an incentive to maintain a near-perfect graduation rate, since US News and other organizations typically use this metric in their rankings -- if you were the administrator, what would you do?  (You didn't hear it from here.)

Back to the chart, or shall we say the delivery of 16 donuts?

First, it fails the self-sufficiency principle.  If we remove the graphical bits, nothing much is lost from the chart.  Both are equally impenetrable.

A far better alternative is shown below, using a type of profile chart.


Finally, I must mention that in this particular case, there is no need to draw all four lines.  Since the finding of a 20% gap essentially holds for all subgroups, no information is lost by collapsing the subgroups and reporting the average line instead (with a note explaining that the same effect appeared in every subgroup).
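A rough sketch of that collapsing step, again with made-up numbers: the rates below are hypothetical, the 1-4 ability levels and four selectivity tiers are assumed for illustration, and the average is unweighted because the number of students in each subgroup is not given. The idea is to check first that the gap is about the same in every subgroup, and only then replace the four lines with one.

```python
# Hypothetical graduation rates (as fractions), invented for illustration only.
# Key: ability level (1-4). Value: rates at four selectivity tiers,
# ordered from least selective to most selective.
rates_by_ability = {
    1: [0.35, 0.42, 0.48, 0.55],
    2: [0.45, 0.52, 0.58, 0.65],
    3: [0.55, 0.62, 0.68, 0.75],
    4: [0.65, 0.72, 0.78, 0.85],
}


def gaps(rates):
    """Gap between most and least selective tiers, per ability level."""
    return {level: line[-1] - line[0] for level, line in rates.items()}


def gap_is_uniform(rates, tolerance=0.02):
    """True if the gap differs across ability levels by at most `tolerance`."""
    values = list(gaps(rates).values())
    return max(values) - min(values) <= tolerance


def average_line(rates):
    """Unweighted average across ability levels at each selectivity tier."""
    lines = list(rates.values())
    return [sum(tier) / len(tier) for tier in zip(*lines)]


if gap_is_uniform(rates_by_ability):
    # Safe to collapse: one line tells the same story as four.
    print([round(rate, 3) for rate in average_line(rates_by_ability)])
else:
    print("Gaps differ by subgroup; keep the separate lines.")
```

If the subgroup sizes were known, a weighted average would be the more defensible choice; the unweighted version here is only a stand-in.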

By the way, that is the difference between the statistical grapher - who is always looking to simplify the data - and the information grapher - who is aiming for fidelity. 

Reference: "Colleges are lagging in graduation rates", New York Times, Sept 9, 2009; "Book review: (Not) Crossing the Finish Line", Inside Higher Education, Sept 9, 2009.




Very well-written post. Although I did have a clue about what "controlled for" meant, you have explained it in a very clear way within the context of these statistics. Thanks!

Jon Peltier

Well done. When I saw the original graphic, I was thinking of ways to slice the data differently.

The only thing left out of each analysis is what percentage of each level of student is present at each class of college.


This blog makes frequent claims of self-sufficiency, so for this post I tried a little experiment. Without reading the Times article or graphic, and without reading the blog post, I went straight to the Junk Charts redesigned chart to see if I could decipher it.

The Junk Chart plots "College Selectivity" against "Ability Levels", each on an apparently arbitrary 1-4 scale. I assumed ability levels referred to college students, but could not figure out what the chart was trying to show. I first guessed admission or application rates, but the percentages seemed much too high. Something to do with college enrollment, maybe? Or maybe ability has something to do with physical ability versus disability?

Finally I gave up and looked at the Times graphic. Only then did I understand what the Junk Chart version was trying to show.

Does the Times graphic pass a real-world self-sufficiency test? Yes, it explains a coherent set of data, and the viewer does not need to look elsewhere or read anything else to understand it. Does the Junk Charts version pass the same self-sufficiency test? No, the chart doesn't even have a title, so even after several minutes of staring at it I was unable to understand what it was trying to show.

This is an example of a recurring pattern at Junk Charts: visual displays of information that are only legible if you read the accompanying blog post. The chart claims self-sufficiency but is meaningless without a long written explanation telling readers what it shows and how good it is.

I fail to see how an incomplete and illegible redesign can be seriously considered an improvement over the original graphic. The Junk Charts version is ugly and incompetently produced, with confusing and arbitrary labels, and as an information display it cannot stand alone. If the Junk Chart version actually ran in the paper it would look like a mistake and completely confuse readers. By any measure it is a failure.

You should try making a complete graphic sometime, one that is truly self-sufficient and legible without 600 words of annotation. These hypothetical redesigns are an embarrassment.


All the reworked graph needs is a bit of work to make it apparent that the four lines represent different abilities. Maybe just coloured lines and a clearer legend.

I would try a clustered bar chart. Some people don't like them, but here it would probably work well.

Jon Peltier

All Kaiser's chart requires is changing "Ability Levels" to "Student Ability Levels" and a label indicating that the vertical axis measures Graduation Rates.

Should he have included these labels? Sure. Is their omission cause for panning the entire blog? Come on.

John Munoz

Hi Kaiser,

You're doing great work here, keep it up!

It takes only minutes to put together a bad chart, but it can take hours, even days to put together a great chart. The Times' doughnuts weren't in keeping with their typically excellent visualizations and you rightly pointed that out and offered an alternative.

Your critiques of most charts are spot on, generally helpful, and written in a positive tone, which I find more productive than taking someone to task for a pie chart. It's so easy to bash the bad stuff (and there's no shortage of junkcharts out there), but it takes real fortitude to show people how to turn trash into treasure.

-John Munoz

Michael MacAskill

J, it seems your entire critique could be summed up by "the y axis on this graph is missing a title". That would have been a fair comment. But from that one error, you trash the entire blog?

I second John Munoz' comment above. Keep up the good work. It clearly takes a lot of time and effort to produce each post, but very little to launch an unreasonable (and anonymous) attack.

Michael MacAskill
Regular reader

Dave Nash

This is a great blog full of thoughtful analysis. Good graphics are hard work and these discussions, whether I agree or not, are a tremendous help to my thinking.


I agree that these particular small multiples were not effective, but I am concerned about the use of the continuous line to connect the categories. I don't feel that these types of lines should be used unless we have not only strict continuity but also dependence, meaning the same system is responding as such, and not a collection of independent ones (in this case, thousands of independent students). You might have stated that the lines are curve estimations, but they aren't really; points are just being connected. I feel that the average viewer will miss that point. There is no question that the re-interpretation is effective, but is it true to the data? Ultimately a language needs to be developed to handle these kinds of issues. At the moment we can only do our best.




ManData: Your point is brought up every time I do this sort of plot. There are many people who agree with you in reserving line charts for continuous data. I don't subscribe to this custom... I treat these as "profile plots" and the name gives the intended purpose, which is to compare the profile of different groups, with profile taking a very general meaning. It's certainly not natural but I would love to see more usage. Only through use will readers learn about such charts!

H. Simpson

Donuts: is there anything they can't do?
-Homer Simpson


Regarding showing of each subgroup vs. the whole in the final chart with an explanation, I think that showing each segment (I suppose you could think of them as "ability quartiles", although the number of students in each group is not noted) adds to the strength of the conclusion. This roughly follows the Tufte-esque principle of increasing the detail to clarify.

Markin Ambuh

I read three times but still cannot understand. This donut thing is complicated.
