Convention and function

Over at the Social Science Statistics blog, Deidre showcased a set of maps that show the evolution of income inequality (as measured by the Gini coefficient) by state in the United States over the last four decades.  This presentation reminds us of the CDC obesity maps, the only difference being how the CDC packaged the maps in a nice little animated jpeg movie. We loved the chart then.


The most important goal of a chart designer is to match form and function. What is the best way to present the data so that it explains the message the designer has gleaned from it? Over the years, the profession has developed certain conventions, and a designer who follows them can produce graphics of adequate value. Examples include preferring bar charts to pie charts, starting bar charts at zero, and so on.

On the other hand, conventional thinking can sometimes hinder us. When faced with geographical data, the first thing to come to mind is the geographical graph paper (i.e. the map); but the map is a highly inflexible canvas, and we should always consider non-geographical presentations as well.

In the income inequality example, Deidre's maps are great in displaying the following:

  • An overall increase in Gini levels across the entire country (the yellow to orange transition)
  • The higher inequality in the South throughout this period (by 2007, these states are still darker red than the rest of the country)

But readers will have other questions that these maps cannot easily answer:
  • What's the profile of growth in inequality? (the color scale does not convey this information well as compared to, say, a line chart)
  • Are there groupings of states (outside of proximity/regions) that have experienced similar profiles of growth over this period?
  • How can we identify specific subsets of states, e.g. those that started with the least inequality and ended with the most, or those that experienced the smallest amount of change over this period?

The following panel chart answers the above questions much better, but at the expense of the geographical graph paper.


A few words are in order to explain this chart. Each panel is a line chart of the growth in inequality (Gini) over time for a specific state. States that have a similar profile are grouped together. On this display, the slopes of the lines tell us quickly which states have experienced the greatest growth in inequality (California, New York, Connecticut, DC), and which have experienced the slowest growth (Alaska, the Dakotas, etc.).

I used a k-means clustering algorithm to create six groups of states -- DC, which has the highest inequality by far, is a cluster by itself.  Within each cluster of states, the panels are arranged in alphabetical order.  I am not particularly happy with the cluster analysis result - I tried different algorithms but did not find any better patterns in the time I spent with this dataset.
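For readers who want to try the clustering step, here is a minimal sketch of k-means applied to per-state time series. It is not the code behind the chart above: the state names and Gini values are invented for illustration, the farthest-first initialization is an assumption chosen only to make the toy run repeatable, and the real analysis may have scaled or transformed the series differently.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length series."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(series, k, max_iter=100):
    """Cluster {name: [values...]} into k groups; returns {name: label}."""
    names = sorted(series)
    points = [series[name] for name in names]

    # Assumed initialization: start from the first series, then repeatedly
    # add the series farthest from the centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = []
    for _ in range(max_iter):
        # Assignment step: each series joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return dict(zip(names, labels))


# Hypothetical Gini series at four time points (made-up numbers).
toy = {
    "State A": [0.35, 0.36, 0.37, 0.38],  # low and nearly flat
    "State B": [0.35, 0.37, 0.37, 0.39],
    "State C": [0.36, 0.40, 0.44, 0.48],  # steeper growth
    "State D": [0.37, 0.40, 0.45, 0.47],
    "State E": [0.45, 0.50, 0.55, 0.60],  # much higher throughout
}

labels = kmeans(toy, k=3)
for name in sorted(labels):
    print(name, "-> cluster", labels[name])
```

In this toy data the outlying series ends up in a cluster of its own, much as DC does in the chart. Because k-means works on raw distances, a series that is high throughout can dominate the grouping, which may be one reason the clusters do not line up neatly with the shapes of the growth profiles.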

This type of display is very flexible. One could group the state panels by whatever criterion one desires. If one wants to look at regional differences, for example, the states could be grouped by region.
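As a rough sketch of that flexibility, the panel order can be derived from any state-to-group lookup. The function name `panel_order` is hypothetical, and the example uses only a handful of states with their Census Bureau regions; swapping in cluster labels or any other criterion changes the layout without touching the rest.

```python
from itertools import groupby


def panel_order(group_of):
    """Return [(group, [states...]), ...] with groups and states sorted."""
    # Sort by group first, then alphabetically within each group.
    ordered = sorted(group_of, key=lambda state: (group_of[state], state))
    return [
        (group, list(members))
        for group, members in groupby(ordered, key=lambda state: group_of[state])
    ]


# A small subset of states, keyed to Census Bureau regions.
region = {
    "Alaska": "West",
    "California": "West",
    "Connecticut": "Northeast",
    "New York": "Northeast",
    "North Dakota": "Midwest",
    "South Dakota": "Midwest",
}

layout = panel_order(region)
for group, states in layout:
    print(group, states)
```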

Is the panel chart always superior to the maps? No. It depends. The point is that one should always check out different displays before settling on maps.

Related older posts here.


Jorge Camoes

"Everything is related to everything else, but near things are more related than distant things." (First law of geography, Waldo Tobler)

Choropleth maps often leave many unanswered questions, but any serious analysis of a regional dataset must start by looking for spatial patterns.

You are "not particularly happy with the cluster analysis result". I used to feel that way too, exactly for the same reason: when applying the k-means algorithm to a dataset with a spatial dimension. You should try a spatially constrained cluster analysis algorithm instead.

Unless a bizarre randomness emerges from mapping the data, I don't think you can safely say that a panel chart is superior to a map (or the other way around). Each one tells us a different part of the story.


Excellent work.


Linked micromaps would be the way to go with this. By linking the trends and spatial arrangement you can simultaneously evaluate similarity in trend as well as spatial clustering.


I wonder what a Hans Rosling Gapminder-style movie would look like, with the bubbles (simple spots in this case, unless you add the changing state populations to the time series data) color-coded according to the results of your cluster analysis?


Ok I have redone the figure as a linked micromap plot (almost) with geographically constrained clustering of the mapped groups. Is there someplace I can post the plot?



Nicholas: send it to me and I'll post it. The user name of our gmail account is the name of the blog.
