Convention and function
Feb 06, 2010
Over at the Social Science Statistics blog, Deidre showcased a set of maps that show the evolution of income inequality (as measured by the Gini coefficient) by state in the United States over the last four decades. This presentation reminds us of the CDC obesity maps, the only difference being how the CDC packaged the maps in a nice little animated jpeg movie. We loved the chart then.
The most important goal of a chart designer is to match form and function. What is the best way to present the data that explains the message the designer has gleaned from the data? Certain conventions have developed in this profession over the years, and a designer who follows them can produce graphics of adequate value: for example, preferring bar charts to pie charts, starting bar charts at zero, and so on.
On the other hand, conventional thinking can sometimes hinder us. When faced with geographical data, the first thing to come to mind is the geographical graph paper (i.e. the map); but the map is a highly inflexible canvas, and we should always consider non-geographical presentations as well.
In the income inequality example, Deidre's maps are great at displaying the following:
- An overall increase in Gini levels across the entire country (the yellow to orange transition)
- The higher inequality in the South throughout this period (by 2007, these states are still darker red than the rest of the country)
The maps are less effective at answering questions such as these:
- What's the profile of growth in inequality? (the color scale does not convey this information well as compared to, say, a line chart)
- Are there groupings of states (outside of proximity/regions) that have experienced similar profiles of growth over this period?
- How can we identify specific subsets of states, e.g. those that started with the least inequality and ended with the most, those that experienced the smallest amount of change over this period, etc.?
The following panel chart answers the above questions much better, but at the expense of the geographical graph paper.
A few words are in order to explain this chart. Each panel is a line chart of the growth in inequality (Gini) over time for a specific state. States that have a similar profile are grouped together. On this display, the slopes of the lines tell us quickly which states have experienced the greatest growth in inequality (California, New York, Connecticut, DC), and which have experienced the slowest (Alaska, the Dakotas, etc.).
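For readers who prefer a number to an eyeballed slope, here is a minimal sketch of how one might rank states by the least-squares slope of their Gini series. The years, the state labels and every value in `toy_gini` are invented for illustration; none of them come from the dataset behind the chart.

```python
def ols_slope(years, values):
    """Least-squares slope of `values` regressed on `years`."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx


# Hypothetical data: made-up years and Gini values, NOT the real series.
years = [1970, 1980, 1990, 2000]
toy_gini = {
    "State A": [0.35, 0.38, 0.41, 0.44],
    "State B": [0.34, 0.35, 0.35, 0.36],
    "State C": [0.36, 0.40, 0.45, 0.50],
}

# Steepest growth in inequality first.
ranked = sorted(toy_gini, key=lambda s: ols_slope(years, toy_gini[s]), reverse=True)
print(ranked)
```

A single slope throws away the shape of the curve, which is exactly what the line panels preserve, so this is a complement to the chart rather than a replacement.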
I used a k-means clustering algorithm to create six groups of states -- DC, which has the highest inequality by far, is a cluster by itself. Within each cluster of states, the panels are arranged in alphabetical order. I am not particularly happy with the cluster analysis result - I tried different algorithms but did not find any better patterns in the time I spent with this dataset.
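To give a flavor of the clustering step, below is a rough sketch of plain k-means (Lloyd's algorithm) applied to whole time series, treating each state's sequence of Gini values as one point. This is not the code used for the chart: the farthest-first seeding is an assumption made here so that the result is reproducible, and the five "states" and their values are invented for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(series, labels, k):
    """Deterministic seeding (an assumption of this sketch): start from the
    first label, then keep adding the series farthest from those chosen."""
    centroids = [list(series[labels[0]])]
    while len(centroids) < k:
        far = max(labels, key=lambda n: min(sq_dist(series[n], c) for c in centroids))
        centroids.append(list(series[far]))
    return centroids


def kmeans(series, k, max_iter=100):
    """Return a dict mapping each label to a cluster index in 0..k-1."""
    labels = sorted(series)
    centroids = farthest_first(series, labels, k)
    assignment = {}
    for _ in range(max_iter):
        # Assignment step: each series joins its nearest centroid.
        new_assignment = {
            n: min(range(k), key=lambda c: sq_dist(series[n], centroids[c]))
            for n in labels
        }
        if new_assignment == assignment:
            break  # nothing moved, so the algorithm has converged
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [series[n] for n in labels if assignment[n] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical data: made-up Gini values, NOT the real series.
toy_gini = {
    "State A": [0.35, 0.38, 0.41, 0.45],
    "State B": [0.36, 0.39, 0.42, 0.46],
    "State C": [0.34, 0.35, 0.35, 0.36],
    "State D": [0.35, 0.35, 0.36, 0.36],
    "State E": [0.45, 0.50, 0.55, 0.60],
}
groups = kmeans(toy_gini, k=3)
print(groups)
```

In this toy example the outlying series ends up in a cluster of its own, much as DC does above. K-means with other starting points can settle on different groupings, which is one reason the result deserves a skeptical look.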
This type of display is very flexible. One could group the state panels by whatever criterion one desires. If one wants to look at regional differences, for example, the states could be grouped by region.
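As a small sketch of that flexibility, the panel order can be driven by nothing more than a lookup from state to group. The helper names `panel_order` and `to_grid` are made up for this example, and the five states are an arbitrary sample with their Census regions.

```python
def panel_order(group_of):
    """Sort panel labels by group, then alphabetically within each group."""
    return sorted(group_of, key=lambda name: (group_of[name], name))


def to_grid(labels, n_cols):
    """Chunk an ordered list of panel labels into rows of n_cols panels."""
    return [labels[i:i + n_cols] for i in range(0, len(labels), n_cols)]


# An arbitrary handful of states; swap in cluster ids to group by cluster.
region_of = {
    "Ohio": "Midwest",
    "Texas": "South",
    "Iowa": "Midwest",
    "Georgia": "South",
    "Maine": "Northeast",
}

order = panel_order(region_of)
grid = to_grid(order, n_cols=2)
print(grid)
```

Changing the grouping criterion means changing only the lookup table; the plotting code that walks the grid can stay the same.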
Is the panel chart always superior to the maps? No. It depends. The point is that one should always try out different displays before settling on a map.