Convention and function
Feb 06, 2010
Over at the Social Science Statistics blog, Deidre showcased a set of maps that show the evolution of income inequality (as measured by the Gini coefficient) by state in the United States over the last four decades. This presentation reminds us of the CDC obesity maps, the only difference being how the CDC packaged the maps in a nice little animated jpeg movie. We loved the chart then.
The most important goal of a chart designer is to match form and function. What is the best way to present the data that explains the message the designer has gleaned from the data? Certain conventions have developed in this profession over the years, and a designer who follows them can produce graphics of adequate value. Examples include preferring bar charts to pie charts, starting bar charts at zero, and so on.
On the other hand, conventional thinking can sometimes hinder us. When faced with geographical data, the first thing to come to mind is the geographical graph paper (i.e. the map); but the map is a highly inflexible canvas, and we should always consider non-geographical presentations as well.
In the income inequality example, Deidre's maps are great in displaying the following:
- An overall increase in Gini levels across the entire country (the yellow to orange transition)
- The higher inequality in the South throughout this period (by 2007, these states are still darker red than the rest of the country)
They are less effective at answering questions such as:
- What's the profile of growth in inequality? (the color scale does not convey this information well as compared to, say, a line chart)
- Are there groupings of states (outside of proximity/regions) that have experienced similar profiles of growth over this period?
- How to identify specific subsets of states, e.g. those that started with the least inequality and ended with the most, those that experienced the smallest amount of change over this period, etc.?
The following panel chart answers the above questions much better but at the expense of the geographical graph paper.
A few words are in order to explain this chart. Each panel is a line chart of the growth in inequality (Gini) over time for a specific state. States that have a similar profile are grouped together. On this display, the slopes of the lines tell us quickly which states have experienced the greatest growth in inequality (California, New York, Connecticut, DC), and those with the slowest growth (Alaska, the Dakotas, etc.).
I used a k-means clustering algorithm to create six groups of states -- DC, which has the highest inequality by far, is a cluster by itself. Within each cluster of states, the panels are arranged in alphabetical order. I am not particularly happy with the cluster analysis result - I tried different algorithms but did not find any better patterns in the time I spent with this dataset.
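For readers who want to try something similar, here is a minimal sketch of k-means applied to a handful of time series. It is not the code behind the chart: the values are made up, the labels A to E are placeholders rather than real states, and the farthest-point initialization is a choice made here only to keep the toy example deterministic.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(series, k, max_iter=100):
    """Lloyd's k-means on a dict of {label: [value per period, ...]}.

    Returns {label: cluster index}. Initial centroids are picked by a
    deterministic farthest-point rule (an assumption of this sketch).
    """
    names = sorted(series)
    centroids = [list(series[names[0]])]
    while len(centroids) < k:
        # Next centroid: the series farthest from all centroids chosen so far.
        farthest = max(
            names, key=lambda n: min(sq_dist(series[n], c) for c in centroids)
        )
        centroids.append(list(series[farthest]))

    assignment = {}
    for _ in range(max_iter):
        # Assignment step: each series joins its nearest centroid.
        new_assignment = {
            n: min(range(k), key=lambda j: sq_dist(series[n], centroids[j]))
            for n in names
        }
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [series[n] for n in names if assignment[n] == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Made-up Gini-like values for five placeholder series (not real data).
toy = {
    "A": [0.35, 0.37, 0.39, 0.41],
    "B": [0.36, 0.38, 0.40, 0.42],
    "C": [0.40, 0.44, 0.48, 0.52],
    "D": [0.41, 0.45, 0.49, 0.53],
    "E": [0.50, 0.55, 0.60, 0.65],
}
clusters = kmeans(toy, k=3)
print(clusters)
```

In this toy data, A and B land in one cluster, C and D in another, and E, which sits far above the rest, ends up in a cluster by itself.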
This type of display is very flexible. One could group the state panels by whatever criterion one desires. If one wants to look at regional differences, for example, the states could be grouped by region.
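To make the idea concrete, the sketch below arranges panels by an arbitrary grouping function. The helper `group_panels` and the small `REGION` lookup are hypothetical names introduced for illustration; the lookup covers only six states and assumes the U.S. Census Bureau's four-region scheme.

```python
from collections import defaultdict


def group_panels(states, key):
    """Return {group: [state, ...]}, alphabetical within and across groups."""
    groups = defaultdict(list)
    for state in sorted(states):
        groups[key(state)].append(state)
    return dict(sorted(groups.items()))


# Hypothetical lookup for a handful of states (Census regions assumed).
REGION = {
    "Alaska": "West",
    "California": "West",
    "Connecticut": "Northeast",
    "New York": "Northeast",
    "North Dakota": "Midwest",
    "South Dakota": "Midwest",
}

layout = group_panels(REGION, key=REGION.get)
for region, members in layout.items():
    print(region, members)
```

Swapping `REGION.get` for a function that returns a cluster label, or a bucket based on starting Gini level, would regroup the same panels without touching the plotting code.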
Is the panel chart always superior to the maps? No. It depends. The point is that one should always check out different displays before settling on maps.
Related older posts here.
"Everything is related to everything else, but near things are more related than distant things." (First law of geography, Waldo Tobler)
Choropleth maps often leave many unanswered questions, but any serious analysis of a regional dataset must start by looking for spatial patterns.
You are "not particularly happy with the cluster analysis result". I used to feel that way too, exactly for the same reason: when applying the k-means algorithm to a dataset with a spatial dimension. You should try a spatially constrained cluster analysis algorithm instead.
Unless a bizarre randomness emerges from mapping the data, I don't think you can safely say that a panel chart is superior to a map (or the other way around). Each one tells us a different part of the story.
Posted by: Jorge Camoes | Feb 07, 2010 at 02:48 PM
Excellent work.
Posted by: Cory | Feb 07, 2010 at 03:53 PM
Linked micromaps would be the way to go with this. By linking the trends and spatial arrangement you can simultaneously evaluate similarity in trend as well as spatial clustering.
Posted by: Nicholas | Feb 07, 2010 at 04:30 PM
I wonder what a Hans Rosling Gapminder-style movie would look like, with the bubbles (simple spots in this case, unless you add the changing state populations to the time series data) color-coded according to the results of your cluster analysis?
Posted by: derek | Feb 08, 2010 at 12:38 PM
Ok I have redone the figure as a linked micromap plot (almost) with geographically constrained clustering of the mapped groups. Is there someplace I can post the plot?
Nicholas
Posted by: Nicholas | Feb 10, 2010 at 01:00 AM
Nicholas: send it to me and I'll post it. The user name of our gmail account is the name of the blog.
Posted by: Kaiser | Feb 10, 2010 at 01:54 AM