
A Harvard mess 1

There are some innocent-looking charts that throw up more and more problems the more you look at them.  This example comes from a magazine sent to Harvard alumni.  We have all heard that the university's endowment fund suffered some horrific losses in the last few months, so the magazine editor thought it useful to describe the potential impact on the different departments.

It's a safe bet our readers would not think to present two related data series as a combination of one bar chart and one column chart. 

As the chart stands, the intended message is completely lost.  It takes a bit of fishing to learn that the Radcliffe Institute has a tiny stake in the endowment fund but draws over 80% of its operating funds from the endowment.

Now looking a little deeper, we find that the scales of the two charts are not standardized, so the length of a bar and the length of a column cannot be directly compared.  Nor can the grid-lines, as each section accounts for 10% in the bar chart but 20% in the column chart (to make it worse, the larger section represents the smaller percentage!).
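To make the point about standardized scales concrete, here is a minimal sketch in Python that draws two series as text bars on the same 0 to 100% scale, so that equal percentages always get equal lengths. The school names and numbers are placeholders, not the magazine's figures, and `bar` and `render` are hypothetical helper names invented for this illustration.

```python
def bar(value, max_value=100.0, width=50):
    """Return a text bar whose length is proportional to value on a fixed 0..max_value scale."""
    filled = round(width * value / max_value)
    return "#" * filled


def render(series, max_value=100.0, width=50):
    """Render every item of a series against the same scale, one line per item."""
    lines = []
    for name, value in series.items():
        lines.append(f"{name:<10}|{bar(value, max_value, width):<{width}}| {value:.0f}%")
    return "\n".join(lines)


# Placeholder data (assumption): percentages are illustrative only.
endowment_share = {"School A": 40.0, "School B": 5.0}
operations_funded = {"School A": 40.0, "School B": 80.0}

print("Share of total endowment")
print(render(endowment_share))
print("Share of operations funded by endowment")
print(render(operations_funded))
```

Because both calls use the same `max_value` and `width`, a bar in one panel can be compared directly with a bar in the other, which is exactly what the original pairing of a bar chart and a column chart does not allow.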

Looking further still, we find that "Other" accounts for some 15% of the endowment but apparently consists of entities that do not have operating funds, and thus goes missing in the column chart.  In our versions below, we will ignore the "Other" category completely; this is equivalent to assuming that the share held by "Other" has been allocated to the individual schools in proportion to their own shares.
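The equivalence claimed here can be checked with a little arithmetic. The sketch below, in Python, compares two routes: dropping "Other" and rescaling the rest to 100%, versus handing each school a slice of the "Other" share in proportion to its own share. Only the roughly 15% for "Other" echoes the post; the school names and remaining numbers are made up, and both function names are hypothetical.

```python
def drop_and_renormalize(shares, drop="Other"):
    """Remove one category and rescale the remaining shares to sum to 100%."""
    kept = {name: value for name, value in shares.items() if name != drop}
    total = sum(kept.values())
    return {name: 100.0 * value / total for name, value in kept.items()}


def allocate_proportionally(shares, drop="Other"):
    """Give each remaining category a slice of the dropped share, pro rata to its own share."""
    dropped = shares[drop]
    kept = {name: value for name, value in shares.items() if name != drop}
    total = sum(kept.values())
    return {name: value + dropped * value / total for name, value in kept.items()}


# Placeholder shares of the endowment in percent (assumption); they sum to 100.
shares = {"School A": 40.0, "School B": 30.0, "School C": 15.0, "Other": 15.0}

renormalized = drop_and_renormalize(shares)
allocated = allocate_proportionally(shares)

for name in renormalized:
    print(f"{name}: {renormalized[name]:.2f}% vs {allocated[name]:.2f}%")
```

The two routes agree whenever the original shares sum to 100%, since adding a pro-rata slice of "Other" multiplies every remaining share by the same factor as renormalizing does.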

Not to mention that the schools are arranged in alphabetical order.

Much of this mess became apparent when we put the two charts into a uniform setting, like this:


A scatter plot provides good information, especially if there is clustering, although we can debate whether it is fit for general publication.


More in our next post.

Reference: "The Endowment: Each school's stake", Harvard Magazine, Jan-Feb 2009.

PS.  The initial post switched the axis labels on the two bar charts. Thanks to Jon for pointing this out.




Looks like you've swapped the data from the two original charts in your bar chart reinterpretations.

I don't see why having a common scale is important here, since the two charts show qualitatively different kinds of information: one chart measures % of the whole, while the other has a metric that is independent for each school. That this metric is also expressed as a percentage seems beside the point to me; using a common scale suggests a relationship where there isn't one.

Also, it might read better to sort by % of operations funded by endowment, since there's more interesting variation in that series. Or alternatively, the % of total endowment chart could be expressed in endowment dollars, allowing us to use a log axis to fit the wide range of data.

Jon Peltier

A very good first cut at straightening out the mess. I didn't even notice the bar chart mislabeling, because I went right to the XY chart.


Jon - thanks for noticing the mislabeling. I have fixed the problem now.

Agree with your point about the scale and the metrics. That's why I put up a second post to talk about their choice of metrics.

Sorting is always an issue when two data series are displayed. There can only be one order and you can only make half the people happy.
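As a small illustration of the sorting trade-off, the Python sketch below ranks the same set of schools on each of the two metrics. The data are placeholders rather than Harvard's figures, and `ranked` is a hypothetical helper; the point is only that the two orders generally disagree, so a shared axis has to pick one.

```python
# Placeholder data (assumption): names and percentages are illustrative only.
schools = [
    {"name": "School A", "endowment_share": 40.0, "ops_funded": 50.0},
    {"name": "School B", "endowment_share": 2.0, "ops_funded": 80.0},
    {"name": "School C", "endowment_share": 15.0, "ops_funded": 25.0},
]


def ranked(rows, key):
    """Return school names ordered from largest to smallest on one metric."""
    return [row["name"] for row in sorted(rows, key=lambda row: row[key], reverse=True)]


by_share = ranked(schools, "endowment_share")
by_ops = ranked(schools, "ops_funded")

print("Sorted by share of endowment:      ", by_share)
print("Sorted by % of operations funded:  ", by_ops)
```

With these made-up numbers, the school that comes last on share of endowment comes first on reliance on the endowment, so whichever order is chosen leaves one series looking unsorted.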
