
Nice example of histograms

The New York Times (link) uses two histograms to show us the geographical distribution of college graduates today compared to 1970. The histograms clearly and forcefully demonstrate two points: the almost three-fold increase in the concentration of college graduates in metropolitan areas, and the wider spread in geographical preference. In other words, we find that the shape of the distribution (in particular, the width) and the mid-point of the distribution have both shifted in those decades.


Readers must be careful about interpreting the colors, which are keyed to relative scales. Every orange square on the right chart represents a higher percentage of college graduates than the single orange square on the left. This is because of the massive increase in the number of adults with college degrees over this period.

I'd suggest two small improvements. Arranging the histograms vertically makes a huge difference:

On the maps, I'd get rid of the gray dots. The point of the maps is to show where the graduates are flocking and where they are not. The gray dots, on the other hand, serve mainly as a geography lesson on where the metropolitan areas sit on the U.S. map.




The gray boxes do clutter up the maps, but I would still keep them. They let me see that the metro area nearest to me has declined relative to the other metro areas. Without the boxes, that information is lost, because I can't see it from the histograms. Overall, a very nice graphic.


I also like that you dropped the number of metro areas within 5% of the mean from the two graphs in your version. With the share of college degrees growing from 12% to 32%, I would expect the range in the percentage of people with college degrees to change as well.

Craig Wong

I'm not sure I agree with what the author is trying to imply. The three-fold increase in metro areas, sure, but the wider spread of geographical preference? I'm not buying it.

For the 1970 data, the 5pt spread represents almost a 42% change from the average. In 2010, the same 5pt spread represents only a 15% change from the average. If we were to apply a 42% threshold for orange/black squares on the 2010 data, it would be +/- 13 points, making the distribution reasonably similar to 1970.
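The arithmetic in this comment can be checked with a short sketch. It assumes the averages of 12% (1970) and 32% (2010) quoted in the earlier comment and a band of 5 percentage points on either side of the average; the function and variable names are made up for illustration and are not taken from the chart's data.

```python
# Averages as quoted in the comments (assumed, not read from the chart's data).
MEAN_1970 = 12.0   # percent of adults with a college degree, 1970
MEAN_2010 = 32.0   # percent of adults with a college degree, 2010
BAND_POINTS = 5.0  # half-width of the color band, in percentage points


def relative_band(band_points: float, mean: float) -> float:
    """Express a band measured in percentage points as a fraction of the mean."""
    return band_points / mean


def equivalent_band(relative: float, mean: float) -> float:
    """Convert a relative band back into percentage points around a given mean."""
    return relative * mean


rel_1970 = relative_band(BAND_POINTS, MEAN_1970)
rel_2010 = relative_band(BAND_POINTS, MEAN_2010)
band_2010_matched = equivalent_band(rel_1970, MEAN_2010)

print(f"1970: 5 points is {rel_1970:.1%} of the average")
print(f"2010: 5 points is {rel_2010:.1%} of the average")
print(f"2010 band matching the 1970 ratio: +/- {band_2010_matched:.1f} points")
```

Under these assumptions the 1970 band comes to roughly 42% of the average, the 2010 band to roughly 15%, and a 2010 band with the same relative width as in 1970 would be about 13 points on either side.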
