Note: I am in the middle of a holiday and so posting will be limited.
Andrew posted a pretty chart that caught my attention. This is the sort of sophisticated chart that rewards careful reading.
Below is a guide to reading the chart:
From Bernard L., another exemplary effort by the Times. This one really got me excited.
Jon's comment on the previous post pretty much anticipated this post. The prior post concentrated on graphical matters. However, the biggest issue with that chart is the choice of metrics. If the idea is to explore the potential adverse effect of a sharp decline in endowment investment performance, then it is not clear why one should be comparing the proportion of endowment funds and the proportion of operating revenues paid for by endowment funds. A missing element from these two series is the relative size of the budgets of these different departments.
The next chart shows the proportion of each school's operating costs accounted for by endowment funds together with the total size of its operating costs.
We can turn the ratio around and directly compute how much of the total amount of endowment funds distributed for operating costs is accounted for by each school. This is really the simplest metric that gets to the question.
There are really two possible worries. For the School of Arts and Sciences, which pays for just about $1 billion of costs with endowment funds, any significant reduction in distribution will leave a gaping hole. For a department like Radcliffe, which pays for over 80% of its operating budget out of endowment funds, a reduction in distribution can obviously cause problems, but we are talking about a base of $18 million rather than $1 billion.
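For readers who want to redo this arithmetic, here is a minimal sketch in Python. The school names, budgets and proportions are placeholders I made up for illustration; they are not figures from the Harvard reports. The idea is simply that a school's endowment dollars equal its operating budget times the endowment-funded proportion, and its share is that amount divided by the total across schools.

```python
# Hypothetical figures for illustration only; NOT data from the Harvard reports.
# budget: operating costs in $ millions
# endowed: fraction of operating costs paid for by endowment funds
schools = {
    "School A": {"budget": 1000.0, "endowed": 0.50},
    "School B": {"budget": 20.0, "endowed": 0.80},
    "School C": {"budget": 480.0, "endowed": 0.25},
}

def endowment_shares(schools):
    """Return each school's share of the total endowment distribution."""
    # Endowment dollars used by each school = budget * endowed fraction.
    dollars = {name: s["budget"] * s["endowed"] for name, s in schools.items()}
    total = sum(dollars.values())
    return {name: amount / total for name, amount in dollars.items()}

shares = endowment_shares(schools)
# List schools from largest to smallest share of endowment dollars.
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {share:.1%} of endowment dollars distributed")
```

In this made-up example, the school with the highest endowment-funded proportion (School B) accounts for the smallest share of endowment dollars, which is the distinction between the two worries above.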
Reference: Harvard University Financial Report, 2008; Harvard Fact Book 2007
PS. The first source did not contain any data on operating budgets so the first set of graphs (now replaced) did not show what I intended to show. The new ones used the right data and had the right order of magnitude in terms of budgets ranging from millions to 1 billion. The 2008 data are not available as of yet.
Chris said: "This graph breaks a few rules, but it has a clear message".
The shocking, out-of-the-box column certainly grabs attention, and it is probably true that football coaches earn too much money. But the chart really falls down on this one issue:
What's the median salary of these football coaches?
Reference: "Academic salaries", PHD Comics.
Oftentimes, picking the right scale for a chart makes all the difference. The following chart showed up in the New York Times Magazine some time ago. Readers will immediately recognize this as "infotainment" rather than a serious attempt to convey the data.
The data came from a study by the Center on Education Policy which counted the amount of instruction time spent on various subjects at a sample of elementary schools in the U.S.
A simple bar chart would make a nice graphic, as shown on the right. Instead of sorting by decreasing minutes, we pulled out "lunch" and "recess" since they belong to a separate category.
Our main focus, though, is on the scale. The original report - and thus the original graphic - used minutes per week. We contend that minutes per day (or even hours per day) is more user-friendly. This is because any number makes sense only in comparison to other numbers. There is no easy reference for a number such as 500 minutes per week. However, being told it's 100 minutes per day (or 1 hr 40 min per day) means a lot because everyone knows there are 24 hours in a day.
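The rescaling is trivial to script. A minimal sketch in Python, assuming a five-day school week (which is what the 500-to-100 conversion above implies); the function name `per_day` is my own:

```python
SCHOOL_DAYS_PER_WEEK = 5  # assumption: implied by 500 min/week -> 100 min/day

def per_day(minutes_per_week, days=SCHOOL_DAYS_PER_WEEK):
    """Convert weekly minutes to (minutes per day, 'H hr M min' label)."""
    minutes = minutes_per_week / days
    # Round to whole minutes before splitting into hours and minutes.
    hours, rem = divmod(round(minutes), 60)
    return minutes, f"{hours} hr {rem} min"

print(per_day(500))  # (100.0, '1 hr 40 min')
```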
This is a small example of a larger problem with using averages. The media loves to give out statistics like "six people are dying of diabetes every minute". This is typically done by dividing the total number of diabetes-related deaths in a year by the number of minutes in a year. Why divide by the total number of minutes in a year? The fallacy of such a calculation is evident if one applies this logic to natural deaths (since we all have to die some day). As the world population grows, there will just be more and more people dying every minute!
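To see the problem in numbers, here is a rough sketch in Python. The per-capita death rate and the population figures are made up for illustration; only the number of minutes in a (non-leap) year is a fact. The per-minute figure rises with population even though nothing about the underlying risk has changed, while a per-capita reference point stays flat.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year

def deaths_per_minute(annual_deaths):
    """The headline-style statistic: annual deaths spread over every minute."""
    return annual_deaths / MINUTES_PER_YEAR

def deaths_per_100k(annual_deaths, population):
    """A per-capita reference point: annual deaths per 100,000 people."""
    return annual_deaths / population * 100_000

# Hypothetical scenario: identical per-capita risk, growing population.
rate = 0.0005  # assumed annual deaths per person, for illustration only
for population in (6e9, 7e9, 8e9):
    annual = rate * population
    print(
        f"population {population:.0e}: "
        f"{deaths_per_minute(annual):.2f} per minute, "
        f"{deaths_per_100k(annual, population):.1f} per 100,000"
    )
```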
Choosing the appropriate reference point -- just like picking the right scale -- is the beginning of any good analysis.
Reference: New York Times magazine, April 27 2008; Center on Education Policy.
In celebrating the recent trend among "elite" colleges of lowering the cost of education, the Times printed this chart, the top part of which is shown here.
The three colors represent different levels of aid. Blue means "grants replace loans"; red means "free tuition"; yellow means "parents pay nothing". The colleges are grouped by the minimum qualifying income for the blue category.
The whole effect is of a knit. We shall call this the "knit chart".
I believe a simple data table will do the job nicely. If any reader has other ideas, please show us your work!
A few points to note about the original:
Reference: "The (Yes) Low Cost of Higher Ed", New York Times, April 20 2008.
PS. The original point about the "any income level" was incorrect as pointed out by Chris below. I have replaced that with a different issue.
PPS. Matias' version (see comments) is a superb demonstration of the power of data tables, well-applied. It is clean and simple, and addresses both the questions pointed out in the last bullet point. The only thing sacrificed was the visual representation of the relative size of the income requirements, which I agree is the least valuable part of the original. As usual, many thanks to our readers for coming up with great ideas!
First we wanted to process the triangles, dots and squares to make sense of this data. We noted that the data came from a single year (2005) so the chart did not trace the development of the education sector over time. But wait, it used a different route to get at the same idea. The author compared different generations within each country to see if more and more citizens took university degrees. So each vertical "arrow" was kind of a historical record of different generations within a country. Under this criterion, Korea and Japan had come a long way while the US and China stagnated.
The chart is quite impossible to read as designed. There is little reason to sort by the 25-34-year-old proportion when the message concerns improvement over generations. Besides, what about countries that apparently retrogressed (like Russia and Germany)?
For this data, I returned to my favored bumps chart. Here is version one. There are two ways to read this chart. Across countries, we note that most of the European states (blue) had similar profiles showing roughly a constant rate of growth. The Asian duo of Japan and Korea (brown) had the most marked growth. In North America (black), Canada has diverged from the US since the 35-44 generation.
Alternatively, we can focus on the change generation-over-generation. From 55-64 to 45-54, almost all countries in this sample (except Japan) grew at the same rate. Then between 45-54 and 35-44, the two Asian countries clearly set the pace. The change between 35-44 and 25-34 is most interesting: Korea has not slowed; Japan has slowed a little but still grew as fast as Canada. A trio of European countries (Spain, Ireland, France) outpaced their neighbors.
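The generation-over-generation reading amounts to taking differences between adjacent age groups. A minimal sketch in Python, using placeholder countries and percentages that I invented (they are not the values from the chart):

```python
AGE_GROUPS = ["55-64", "45-54", "35-44", "25-34"]  # oldest to youngest

# Placeholder percentages of each age group holding a university degree.
# These are invented for illustration, not read off the original chart.
attainment = {
    "Country A": [10, 18, 32, 50],
    "Country B": [25, 30, 31, 32],
}

def generation_changes(values):
    """Percentage-point change from each age group to the next younger one."""
    return [younger - older for older, younger in zip(values, values[1:])]

for country, values in attainment.items():
    steps = generation_changes(values)
    labels = [
        f"{a} to {b}: {d:+d}"
        for a, b, d in zip(AGE_GROUPS, AGE_GROUPS[1:], steps)
    ]
    print(country, "|", ", ".join(labels))
```

A country whose steps keep growing (like the made-up Country A) is accelerating, while one whose steps shrink toward zero (Country B) is stagnating; the slopes of the lines in a bumps chart encode exactly these differences.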
Reader Nick B. sent in this example calling it "interesting". The chart tells a compelling story once we figure out what it is. Grasping the tree structure is key.
It illustrates the important idea that averaging sometimes masks variations in the data. For example, while the province of Guerrero scored 78% on literacy, the municipalities within Guerrero had scores ranging from 28% to 90%.
It also shows that the gender gap was larger in the less literate Metlatonoc municipality than in the more literate Cuautitian.
In addition, it tells us that while Mexico on average measured very well on literacy, subpopulations within Mexico spanned the world's best and worst (from about Mali's level to Italy's).
While I find this chart adequate, the pieces hanging off each other did not seem ideal, especially the two overlapping municipality pieces which were placed next to each other. However, it is tough to come up with an alternative. Here's one attempt; the changes are mild.
The branches are emphasized (as opposed to the "T" junction) because that's a key part of the story.
The national level, especially the span between Mali and Italy, is de-emphasized; I treat it as gridlines.
Instead of placing the overlapping pieces next to each other, I let the ranges literally overlap, which serves to stress this feature.
Happy New Year
The cosmos of university ranking got more interesting recently with the advent of the "brain map" by Wired magazine. This new league table counts the total number of winners of five prestigious international prizes (Nobel, Fields, Lasker, Turing, Gairdner) in the past 20 years (up to 2007); and the researcher found that almost all winners were affiliated with American institutions.
As discussed before, the map is a difficult graphical object; it acts like a controlling boss. In this brain map, the concentration of institutions in the North American land mass causes over-crowding, forcing the designer to insert guiding lines drawing our attention in myriad directions. These lines scatter the data asunder, interfering with the primary activity of comparing universities.
The chain-of-dots object cannot stand by itself without an implicit structure (e.g. rows of 10). This limitation was apparent in the hits and misses chart as well. Sticking fat fingers on paper to count dots is frustrating. Simple bars allow readers to compare relative strength with less effort.
In the junkart version, we ditched the map construct completely, retaining only the east-west axis. [For lack of space (and time), I omitted the US East Coast and Washington-St. Louis.] With this small multiples presentation, one can better contrast institutions.
To help comprehend the row structure, I inserted thin strikes to indicate zero awards. A limitation of the ranking method is also exposed: UC-SF has a strong medical school and, not surprisingly, it has received a fair share of Nobel (medicine), Lasker and Gairdner prizes; but at other institutions, zero Lasker and Gairdner prizes could be due to a less competitive medical school or no medical school at all!
Reference: "Mapping Who's Winning the Most Prestigious Prizes in Science and Technology", Wired magazine, Nov 2007.