
Nice example of histograms

The New York Times (link) uses two histograms to show us the geographical distribution of college graduates today compared to 1970. The histograms clearly and forcefully demonstrate two points: the almost three-fold increase in the concentration of college graduates in metropolitan areas, and the wider spread in geographical preference. In other words, we find that the shape of the distribution (in particular, the width) and the mid-point of the distribution have both shifted in those decades.


Readers must be careful when interpreting the colors, which are keyed to relative scales. Every single orange square on the right chart represents a higher percentage of college graduates than the single orange square on the left. This is because of the massive increase in the number of adults with college degrees over this period.

I'd suggest two small improvements. Arranging the histograms vertically makes a huge difference:

On the maps, I'd get rid of the gray dots. The point of the maps is to show where the graduates are flocking to and which areas they are not favoring. The gray dots, on the other hand, serve mainly as a geography lesson on where the metropolitan areas sit on the U.S. map.

Spring flowers and striking hours

Reader Joe DiNoto sent me to the following National Post (Canada) chart via Twitter, complaining about the circles. (The full chart is found here.)


This chart is supposed to show that the students in Quebec are wrong to go on strike against a roughly 10% increase in tuition fees because the cost of education in Quebec is dwarfed by that in other provinces. This particular message is visible by virtue of the small amount of space occupied by the Quebec "flower" relative to other provinces.

However, conveying that message would require only a chart of the average tuition of the seven provinces. The dataset here contains a lot more information than just the average: it has the tuition by major. But does the general pattern of relative tuitions apply to individual majors? This chart type (a disguised bubble chart) does the reader few favors. (At least the designer managed to keep each "petal" at the same angle; otherwise it would make our lives even harder.)


In order to bring out the tuition by major comparison, the following set of dot plots helps:


The purple dots are Quebec tuitions. The gray dots are the remaining provinces. We find that Quebec is at the bottom of the cost scale for every major. We also learn that the variance of tuition for dentistry, medicine, and law is very high. Surprisingly, the business degree is rather cheap - maybe the demand for it up north is lower?
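The question the dot plots answer (does one province sit at the bottom for every major?) can also be checked directly. The snippet below is only a sketch: the province names, majors, and dollar figures are made-up placeholders, not the National Post data, and `rank_of` and `ranks_by_major` are hypothetical helper names.

```python
from typing import Dict

# Placeholder numbers for illustration only; NOT the National Post data.
tuition: Dict[str, Dict[str, float]] = {
    "Major A": {"Province X": 2000.0, "Province Y": 5000.0, "Province Z": 4500.0},
    "Major B": {"Province X": 2500.0, "Province Y": 9000.0, "Province Z": 3000.0},
}


def rank_of(highlight: str, by_province: Dict[str, float]) -> int:
    """Return 1 if `highlight` is the cheapest province, 2 if second cheapest, etc."""
    ordered = sorted(by_province, key=lambda p: by_province[p])
    return ordered.index(highlight) + 1


def ranks_by_major(data: Dict[str, Dict[str, float]], highlight: str) -> Dict[str, int]:
    """Rank the highlighted province within each major separately."""
    return {major: rank_of(highlight, provinces) for major, provinces in data.items()}


print(ranks_by_major(tuition, "Province X"))
```

A rank of 1 in every major corresponds to the highlighted dot sitting at the bottom of every dot plot.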

No sorting and lack of structure undermine a chart


Reader Daniel L. isn't impressed with this page of charts about gay rights in the U.S., from the Guardian paper (London). (link)


The use of circles to organize data has a long history, stretching back at least to the Nightingale rose, which turns the time dimension into a circle. Andrew doesn't like this concept (e.g. here), and neither do I. Here is something similar by McCandless (link) that has appeared on this blog.


Take the following set of charts showing the legislative differences by region of the country.


Since states within a region are categories with no natural order, there is no easy way to order the states. This is made worse by the categorical nature of the other variable: the legislative posture on marriage, civil union and domestic partnership is very messy data with no order either.

The regions could reasonably be sorted by their "average" permissiveness, but this chart shows no concern for sorting at all.

About the only easy read from this set of charts is the observation that the Northeast states are most permissive while the Southeast states are most restrictive. Anyone who has casual exposure to this social issue knows this without needing a chart.


The key to clarifying this chart is to clarify the underlying structure, particularly the structure of the permissiveness variable. Dissecting the data reveals that there are only five possible postures (Banned all three rights, Banned marriage but allows one of the other rights, Allow civil unions, Allow marriage, and No information).  The following data table conveys the data with minimal fuss:
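For readers who want to rebuild this kind of table themselves, here is a minimal sketch of the idea: give the postures an explicit order, then sort regions by average permissiveness and states by posture within each region. The numeric scores, the treatment of "No information" (left out of the average and sorted last), and the region and state names are all assumptions made for illustration; none of it is the Guardian's data.

```python
from statistics import mean

# The four ordered postures from the text, least to most permissive.
# The 0..3 scores are an assumption used only for sorting.
POSTURES = [
    "Banned all three rights",
    "Banned marriage but allows one of the other rights",
    "Allow civil unions",
    "Allow marriage",
]
SCORE = {posture: i for i, posture in enumerate(POSTURES)}
NO_INFO = "No information"

# Hypothetical placeholder rows: (region, state, posture).
rows = [
    ("Region 1", "State A", "Allow marriage"),
    ("Region 1", "State B", "Allow civil unions"),
    ("Region 2", "State C", "Banned all three rights"),
    ("Region 2", "State D", "No information"),
    ("Region 2", "State E", "Banned marriage but allows one of the other rights"),
]


def region_average(rows, region):
    """Average score of a region, ignoring states with no information."""
    scores = [SCORE[p] for r, _, p in rows if r == region and p != NO_INFO]
    return mean(scores) if scores else float("-inf")


def sorted_table(rows):
    """Most permissive region first; within a region, most permissive state first."""
    def key(row):
        region, state, posture = row
        return (-region_average(rows, region), -SCORE.get(posture, -1), state)
    return sorted(rows, key=key)


for region, state, posture in sorted_table(rows):
    print(f"{region:10} {state:10} {posture}")
```

Once the rows are in this order, a plain table already carries the structure that the circular layout hides.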




Look what I found: two amazing charts

While doing some research for my statistics blog, I came across a beauty by Lane Kenworthy from almost a year ago (link) via this post by John Schmitt (link).

How embarrassing is the cost effectiveness of U.S. health care spending?


When a chart is executed well, no further words are necessary.

I'd only add that the other countries depicted are "wealthy nations".


Even more impressive is this next chart, which plots the evolution of cost effectiveness over time. An important point to note is that the U.S. started out in 1970 similar to the other nations.


Let's appreciate this beauty:

  • Let the data speak for itself. Time goes from bottom left to upper right. As more money is spent, life expectancy goes up. However, the slope of the line is much smaller for the US than the other countries. There is no need to add colors, data labels, interactivity, animation, etc.
  • Recognize what's important, what's not. The US line is in a different color, much thicker and properly made the foreground of the chart.
  • Rather than clutter up the chart, the other 19 lines are anonymized. They all have the same color and thickness, and are all given one aggregate label. This is an example of overcoming loss aversion (see this post for more): it is ok to suppress some of the data.
  • The axis labeling is superb. Tufte preaches this clean style. There is no need to use regularly-spaced axis labels; use data-informed labels instead. Unfortunately, software is way behind on this issue. You can do this in R but that's about it.
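To make the last point concrete, here is one possible reading of "data-informed labels", sketched in plain Python: put the axis labels at the observed minimum and maximum (in the spirit of Tufte's range-frame) plus any values you specifically want to call out, and drop labels that would crowd each other. The function name `data_informed_ticks`, the 5% crowding threshold, and the sample numbers are assumptions for illustration, not values read off Kenworthy's chart.

```python
def data_informed_ticks(values, extra=(), min_gap=0.05):
    """Pick axis label positions from the data instead of a regular grid.

    Always keeps the observed minimum and maximum. Values in `extra` are
    kept only if they fall strictly inside the range and sit at least
    `min_gap` (a fraction of the range) away from their neighbours.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        return [lo]
    gap = min_gap * (hi - lo)
    ticks = [lo]
    for v in sorted(extra):
        if lo < v < hi and v - ticks[-1] >= gap and hi - v >= gap:
            ticks.append(v)
    ticks.append(hi)
    return ticks


# Hypothetical life-expectancy values, for illustration only.
sample = [70.9, 72.3, 78.2]
print(data_informed_ticks(sample, extra=[75.0, 78.1]))
```

The returned positions can then be handed to whatever tick-setting call your plotting tool offers; the label at 78.1 is dropped here because it would collide with the maximum.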