David Leonhardt's article on the graduation rates of public universities caught my attention for both graphical and statistical reasons.
David reviewed parts of a new book, "Crossing the Finish Line", focusing on the authors' conclusion that public universities must improve their four-year graduation rates if education in the U.S. is to make progress. This conclusion was reached through statistical analysis of detailed longitudinal data (collected since 1999).
This chart is used to illustrate the conclusion. We will come to the graphical offering later, but first I want to fill in some details omitted from David's article by walking through how a statistician would look at this matter, and what it means to "control for" something.
The question at hand is whether public universities, especially less selective ones, have "caused" students to lag behind in graduation rate. A first-order analysis would immediately find the overall graduation rate at less selective public universities to be lower, by about 20%, than at more selective public universities.
A doubter appears, and suggests that less selective schools are saddled with lower-ability students, and that this, rather than anything the schools actually do to students, is the "cause" of lower graduation rates. Not so fast: the statistician now disaggregates the data and looks at the graduation rates within subgroups of students of comparable ability (in this instance, the researchers used GPA and SAT scores as indicators of ability). This is known as "controlling for the ability level". The data now shows that the same gap of about 20% exists at every ability level: about 20% fewer students graduate at the less selective colleges than at the more selective ones. This eliminates the mix of abilities as a viable "cause" of lower graduation rates.
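The disaggregation step can be sketched in a few lines of code. All numbers below are invented for illustration; the real study used GPA and SAT bands as its ability tiers:

```python
# Hypothetical graduation rates by ability tier and school selectivity.
# All numbers are invented for illustration; the real study used GPA and
# SAT bands as its ability tiers.
rates = {
    # tier: (less selective, more selective)
    "top":    (0.70, 0.90),
    "middle": (0.55, 0.75),
    "bottom": (0.40, 0.60),
}

def gap_within_tier(tier):
    """Graduation-rate gap after controlling for ability."""
    less, more = rates[tier]
    return more - less

for tier in rates:
    print(tier, round(gap_within_tier(tier), 2))
# The same ~20-point gap appears within every tier, so the mix of
# abilities cannot explain the difference between school types.
```

If the gaps had shrunk to zero within tiers, the doubter would have been right; since they persist, the ability mix is ruled out as the explanation.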
The researchers now conclude that conditions of the schools (I think they blame the administrators) "caused" the lower graduation rates. Note, however, that this does not preclude factors other than mix of abilities and school conditions from being the real "cause" of lower graduation rates. But as far as this analysis goes, it sounds pretty convincing to me.
That is, if I ignore the fact that graduation rates are really artifacts of how much the administrators want to graduate students. As the book review article pointed out, the less selective colleges may want to reduce graduation rates in order to save money, since juniors and seniors are more expensive to support due to smaller class sizes and so on. On the other hand, the most selective colleges have an incentive to maintain near-perfect graduation rates since US News and other organizations typically use this metric in their rankings -- if you were the administrator, what would you do? (You didn't hear it from here.)
Back to the chart, or shall we say the delivery of 16 donuts?
First, it fails the self-sufficiency principle: if we remove the graphical bits, nothing much is lost. The chart is equally impenetrable with or without them.
A far better alternative is shown below, using a type of profile chart.
Finally, I must mention that in this particular case, there is no need to draw all four lines. Since the finding of a 20% gap essentially holds for all subgroups, no information is lost by collapsing the subgroups and reporting the average line instead (with a note explaining that the same gap appeared in every subgroup).
By the way, that is the difference between the statistical grapher - who is always looking to simplify the data - and the information grapher - who is aiming for fidelity.
Reference: "Colleges are lagging in graduation rates", New York Times, Sept 9, 2009; "Book review: (Not) Crossing the Finish Line", Inside Higher Education, Sept 9 2009.
Right on the heels of the disastrous bubble chart comes another, courtesy of the NYT Magazine. Bubble charts are okay for the conceptual ("this is really big, and that is really tiny"). This chart wants readers to compare the sizes of the bubbles, which highlights the worst part of such graphs.
Poor scaling is the biggest issue with bubble charts. They are the prototype of charts that are not "self-sufficient". Without printing all the data, the chart has no scale and is thus useless (see below middle). When all the data is printed (as in the original, below left), it is no better than a data table.
In the above right chart, we simulated the situation of a bar or column chart, i.e. we provide a scale. For this chart, the convenient "tick marks" are at 10, 20, 34, 41. Unfortunately, this scaled version also fails to amuse.
Note further that the data should have been presented in two sections: the party affiliation analysis and the gender analysis. Also, it is customary to place "Independents" between "Republicans" and "Democrats" because they are middle-of-the-road.
A profile chart is an attractive way to show this data. Here, we quickly learn a couple of things obscured in the bubble chart.
On the issue of abortion, Independents are much closer to Democrats than Republicans. Also, there is barely any difference between the genders, the only difference being the strength of support among those who want to legalize.
Reference: "A matter of Choice", New York Times Magazine, Oct 19 2008.
PS. Based on RichmondTom's suggestion, here are the cumulative profile charts.
Jens, a long-time reader, tried to re-make the boring data tables used to report poll data. Here is an example from USA Election Polls (left) and his enhanced version (right).
Like Jens, I find most tabular presentations of poll data underwhelming: too much data hiding the useful information. For example, the pollster and polling-date data provide context for super-serious poll watchers to interpret the numbers; however, they are not presented in a way that actually helps readers. Read on for versions that bring out this data much better.
Meanwhile, Jens' revision uses color and ordering to bring out the current state of affairs. The addition of electoral votes allows us to understand the relative weight of each row, countering a weakness of the tabular format: every row has the same height, which erroneously implies that every row is equally important.
There are a number of good websites where this type of data is presented in attractive ways.
I have been a fan of Political Arithmetik, which made great use of the pollster and polling date data mentioned above. Those data have been averaged to show the overall trend while the individual poll results are plotted as dots in the background. The polling date data is embedded in the horizontal positions of the dots. Even more impressively, the margins of error are presented. Remarkably, this race has been a statistical tie for all these months, the 95% lower limit never quite making it above the zero level.
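The notion of a statistical tie can be made concrete with a small sketch. The poll numbers below are hypothetical; the formula is the standard 95% interval for the lead between two candidates polled in the same sample:

```python
import math

def lead_ci(p1, p2, n, z=1.96):
    """95% interval for the lead (p1 - p2) between two candidates
    polled in the same sample of n respondents."""
    diff = p1 - p2
    # Variance of a difference of two multinomial proportions
    se = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2) + 2 * p1 * p2) / n)
    return diff - z * se, diff + z * se

# Hypothetical poll: 48% vs 44% among 1,000 respondents.
low, high = lead_ci(0.48, 0.44, 1000)
print(round(low, 3), round(high, 3))
# The lower limit dips below zero: the 4-point lead is a statistical tie.
```

This is exactly what the Political Arithmetik chart shows graphically: so long as the lower limit stays below zero, the race cannot be called.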
Another great site is fivethirtyeight.com. Below, they essentially turned Jens' enhanced table into a map. The legend on the right perhaps represents what they call "East Coast bias"? All of Nathan's graphs are very attractively produced; I just wish he'd put more labels on them (such as the differentials corresponding to the shades of red and blue).
Frederic M. sent in this chart, together with his commentary.
Bubbles across rows have vastly different numbers but their circles are of identical size (or vice versa). It borders on the ridiculous that all bubbles of the US row have the same size... The question of whether teenage birth rates and teen sex are correlated cannot be eye-balled with this kind of display. The fact that you cannot compare across rows makes this an instance of "chart junk".
White spaces -- always dangerous. Does lack of bubble imply no data or no abortions/sex?
Sorting -- this is what Howard Wainer called "Arizona first" with a twist (United States)
Loss aversion -- would U.S. readers be resentful if countries like Iceland were excluded? A much reduced version comparing the U.S. to, say, Canada, the U.K., Japan and Germany may yield more information for the reader.
Sufficiency -- if all the data are printed as in a table, why do we need the bubbles?
Reference: "Let's Talk About Sex ", New York Times, Sep 6 2008.
Andrew N., a reader from Australia, wasn't too impressed with the way National Nine News presents the Olympic medal table on its home page. To the extent that we want to venture beyond the typical tabular presentation, this bar chart is in fact quite appropriate. Let me explain.
Let's take a tour around the world: it's the battle of the data tables.
The Boston Globe's is the cleanest of the bunch. I especially like the way they set off the USA count at the top; on the other hand, the use of country codes is inferior to spelling out country names, as done in all of the other examples. The New York Times is the only one to use colors to set apart gold, silver and bronze, which lets readers easily assess the two dominant metrics, total golds and total medals. A small touch but very nice.
The biggest design issue here is the existence of the two different metrics. In any tabular presentation, the countries can be ranked by only one metric so the designer must make a choice. The American papers present ranking by total medals; the French paper by total golds; the two Canadian ones shown here are split. The American papers also choose to carry the ranking implicitly while the others explicitly provide a numerical rank. Le Monde and Globe and Mail provide ranks that are consistent with ordering of countries, both by total golds. The Star, by contrast, wants it both ways: the order reflects total medals while the "POS" column shows total golds. This extra column does help the readers who prefer ranking by golds but the primacy of the other ranking has not been overcome.
So what about National Nine News? I have not been a fan of stacked bar charts but surprisingly, this is a great application. Stacked bars have the disadvantage that the stacked segments don't share the same base and thus it is difficult to compare their lengths. Here, though, our two metrics are total medals and total golds so readers should be drawn to compare the total lengths, and the lengths of the first segments. Those wanting to compare silvers and bronzes must make a stronger effort but they will be in the minority.
What can be improved are the distracting data labels, especially the gold circles. Instead, one should provide a scale, or use symbols such as one circle per medal. (See this old post.) Here is a version with a scale:
One cannot end this post without mentioning the attempt by NYT editors to insert levity into these proceedings with first a cartogram and then a bubble chart.
Avinash has an interesting piece about some examples of visualization of Web data. That's a very rich area since there is so much data. I agree with his observation that there are precious few truly great charts that have thus far appeared. (Note, though, that typically the more data, the more noise. See this post.)
He discussed a tag cloud display of the top cities from which website visitors hail. We like tag clouds too. See here, here and here.
He praised a particular pie chart because "the pie ... is just a stage prop": it worked only because all the data was printed on the chart itself. But that violates our self-sufficiency principle: if all the data is printed on the chart, and the only way to read the chart is to read off the printed data, then the chart serves no purpose. More here.
He liked Amazon's feature of customer ratings distributions. Me too: a powerful example of small graphics making a huge impact. Here is the typical Web rating display: almost everyone uses the statistical average, which hides information about how dispersed (or not) customers' reactions were. The current Amazon display gives us this information: notice that 108 customers actually gave this book the lowest rating even though the average was four stars.
The most intriguing example was Google's comparison of keyword performance to the site average. It's a good idea but the execution is wanting.
Firstly, I believe the percentages are much better presented as index values, with 100 being the site average. Secondly, it is unnerving to have red associated with positive values, green with negative values, or to have negative values to the right of positive values. I think they realize that green, and the right side, should represent "good" (a bounce rate lower than the site average), but this just doesn't work. Thirdly, are the data labels really necessary? They impede our sight lines when comparing bars. And do we need to know the values to two decimal places?
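To illustrate the first suggestion, here is a minimal sketch of the index conversion; the sample percentages are hypothetical:

```python
def to_index(pct_vs_average):
    """Convert '% above/below site average' to an index (100 = average)."""
    return 100 * (1 + pct_vs_average / 100)

# Hypothetical keyword bounce rates relative to the site average:
print(to_index(25.0))   # 25% above average -> index 125
print(to_index(-10.0))  # 10% below average -> index 90
```

With index values, the baseline of 100 does the work of the zero line, and the awkward negative-to-the-right layout disappears.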
PS. Apologies for the inconsistent font. Typepad continues its mischief: I couldn't change the font size after adding a hyperlink. Apparently I have to fix the font size before adding a link. You also might notice the changing font size as I write this paragraph. Don't know why there was a switch; I didn't ask for it.
Sean C. asked a question that has been on my mind for some time: what do we think of treemaps? Are they too busy? Do they add to our understanding of the data?
Generally, this type of chart is pretty good for exploration but not so good for communication. We can stare at the chart for a long time and still not be sure what the designer wanted to convey.
Sean used it to show the components of Australia's CPI and to explain the source of recent inflation.
It does present the hierarchical structure of the data in a compact way. It also conveys both the relative importance of each component and its growth rate, with little fuss.
That said, the differing sizes and orientations of the boxes make it hard to compare their sizes. For example, is "home purchase" on the lower left larger than "financial services" in the middle? How about household services (blue on the top) relative to audio, visual and computing (blue on the lower right)?
Also note that the two data series do not carry equal weight: readers are likely drawn more by the box sizes than by color gradations (which do not convey relative values well); as a result, the composition of the CPI rather than the changes in the components will gain more attention.
If the purpose of the chart is to communicate findings, then a data table enhanced with colors and boxes can do a good job. There are other ways to utilize the tabular format, such as sparklines or other symbols in lieu of numbers.
That said, the treemap is more intriguing and inviting than a table of numbers.
We recently showed an example of when data tables worked well to clarify the data. Last week, there was an example from the Times which did the opposite.
The accompanying article boldly claimed that
the 40-yard dash stands above them all as having the strongest correlation to success in the NFL. The three-cone drill, the shuttle run, the bench press -- none correlate to NFL success. The 40 is king.
Further, it cited Bill Barnwell from FootballOutsiders.com who created an "index" using both 40 time and body weight that is "an even better predictor than 40 time alone". In other words, this formula
does the trick.
The data table, shown above, presumably clinched the case.
We were mystified when we put the data to the test, however. Among the set of 15 running backs, the Index did not predict the Yards Per Carry at all! The Index explained only 8% of the variation in Yards Per Carry between the backs.
The data table obscures this bivariate relationship. Since the table was sorted by the Index, we would expect the Yards Per Carry column to fall roughly in the same order. But it is hard to separate the trend from the noise in a table.
What went wrong? It turned out neither 40 Time nor Body Weight had any relationship with Yards Per Carry.
These variables did not explain the range of Yards Per Carry attained by this set of running backs.
Finally, we found a strong correlation between 40 Time and Body Weight. (The heavier you are, the slower you run!) This means the two variables contain similar information, so any formula combining them is unlikely to perform significantly better than either variable alone.
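The two findings can be reproduced on stylized data. Everything below is invented for illustration: 40 time is constructed as an exact function of body weight, while yards per carry varies independently of both:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# 15 stylized running backs (all numbers invented for illustration).
weights = [195 + 2.5 * i for i in range(15)]
# Heavier backs run slower 40s -- here, by construction, perfectly so.
forty = [4.3 + 0.004 * (w - 195) for w in weights]
# Yards per carry alternates with no relation to either predictor.
ypc = [4.0 if i % 2 == 0 else 5.0 for i in range(15)]

r_predictors = pearson(weights, forty)  # ~1: the two predictors are redundant
r2_outcome = pearson(forty, ypc) ** 2   # ~0: 40 time explains none of YPC
```

When the two predictors are this strongly related, any index built from both can hardly beat either one alone; and when neither predicts the outcome, neither can the index.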
So we are left to turn the table on the table. More pertinent evidence is needed to prove the case.
The entire analysis suffers from survivorship bias, as only the top running backs are examined, and no adjustment is made for wide-ranging tenures. Apparently, there is more data available in a book. There is no indication of how the model shown above was validated.
Reference: "The Race of Truth: 40-Yard Times Can Tell the Future", New York Times, April 27, 2008.
One of the things I picked up from Tufte is the horrible habit of counting the amount of data on a chart. This is part of the information gathering needed to estimate the data-ink ratio (the proportion of a chart's ink that is devoted to depicting data).
Leon B, a reader, left this in my inbox, months ago it turned out. This is the British government's way of informing people how energy-efficient their homes are. As Leon said:
these charts might be a great example of governments going overboard with colours, bars, letters and numbers and lines for something that really only has four data points.
In addition, I find the use of two different scales confusing and unnecessary. If it is decided that scores in a particular range can be grouped as A, B, ..., G, then the original scale should be discarded: 52 is E and 70 is C. (This is especially so since the score ranges are not intuitive: 69-80 = C?!)
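The letter-band lookup implied by the certificate can be sketched as follows. Only 69-80 = C, and the facts that 52 is E and 70 is C, come from the chart; the other cut-offs are assumptions for illustration:

```python
import bisect

# Lower bounds of bands F through A; scores below 21 fall in G.
# Only the 69-80 = C range comes from the chart -- the other cut-offs
# are assumed for illustration.
cutoffs = [21, 39, 55, 69, 81, 92]
bands = ["G", "F", "E", "D", "C", "B", "A"]

def band(score):
    """Map a 0-100 efficiency score to its letter band."""
    return bands[bisect.bisect_right(cutoffs, score)]

print(band(52), band(70))  # E C, matching the chart
```

Once the mapping is fixed, the letter carries all the usable information, which is why showing both scales is redundant.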
Even worse, what is the point of citing the 0-100 scale without explaining what the metric is?
A table presentation does a far better job in a fraction of the space:
PS. This post set off a torrent of emotions (see the comments). Another version that I discarded was the simplest table possible. In my view, there is still way too much distracting "junk" in the original design. No one has yet explained why the 0-100 scale should be emphasized, or what it means!
A couple of you noticed this table of bubbles in the Times, and asked what I think of it. Dustin J suggested that this could be considered a decent application of bubble charts. I agree, with some reservations.
The data set is the best thing about this chart. The riches that lay beneath! Many questions can be addressed, including:
Which Presidential candidates are getting the most face time?
Are candidates seen equally often across the stations?
Are there differences between network and cable stations in terms of total face time? In terms of individual face time?
Are there Democratic/Republican leanings by station? by type of station?
The intrepid can even build a regression out of it.
The bubble chart contains answers to all those questions but nothing jumps out. Okay, it's easy to see which station gives each candidate the most face time. Anything else requires moderate to substantial effort. Here's the junkart version.
The list of things done to the data is long:
Candidates are grouped together by party
Candidates within each party are arranged in order of decreasing maximum face time
Stations are arranged by increasing total face time, this order happens to retain the network vs cable divide
A heat map construct is used instead of bubbles: the legend is missing but there are four shades of each color: darkest = top 10%; medium = 10th to 50th percentile; light = bottom 50%, excepting zeroes; white = no face time. In raw numbers, 90th percentile = 81 minutes, 50th percentile = 19 minutes.
The only data shown are the totals by candidate and totals by station.
On the right margin are little bar charts that show the distribution of network/cable for each candidate.
On the bottom margin are little column charts showing the distribution of party affiliation by station.
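The heat-map bucketing described above can be sketched as a simple function. The 81- and 19-minute cut-offs are the percentiles quoted earlier; the treatment of exact boundary values is my choice:

```python
def shade(minutes, p90=81, p50=19):
    """Assign a heat-map shade to a candidate-station cell based on
    face time in minutes (cut-offs are the quoted percentiles)."""
    if minutes == 0:
        return "white"       # no face time at all
    if minutes >= p90:
        return "darkest"     # top 10%
    if minutes > p50:        # boundary handling is my choice
        return "medium"      # 10th to 50th percentile
    return "light"           # bottom 50%, excepting zeroes

print(shade(120), shade(45), shade(5), shade(0))
```

Collapsing the raw minutes into four shades is what lets the eye scan rows and columns for patterns instead of decoding sixteen dozen bubble sizes.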
A few observations follow:
Cable stations gave much more face time to the candidates in general. Fox, no surprise, gave Republicans 85% of its time, while the other stations were roughly balanced between the parties.
The more mainstream the candidate, the more balanced the split between network and cable time: John McCain (R), Hillary Clinton (D) and John Edwards (D) had the highest proportions of network time.
More face time is not necessarily a good thing: McCain was the clear winner here, yet his campaign is struggling.
Source: "Tracking Face Time", New York Times, August 1, 2007.