A matter of compactness
Apr 29, 2012
Andrew Gelman may have nominated himself the graphics advisor for the World Happiness Report (link). That would be a very good thing.
To kick this off, I re-made Figures 2.1-2.2.8 in the report, which summarize the findings of the Gallup World Poll covering annual samples of 1,000 people aged 15 and over from each of 150 countries. (These are effectively the first charts to appear in the report. There is no Figure 1 because Chapter 1 has no charts. The report also inexplicably follows the outdated academic-publishing convention of banishing all diagrams to the end of the report as if they were footnotes.)
In the report, they presented histograms of the 0-10 ratings (10 = happiest) by region of the world, two charts a page running to 5 pages. Here's one such page:
If you're presenting regional data, you're expecting readers to want to compare regions. It's not very nice to make them flip back and forth and task their memory in order to do these comparisons.
This data set is where small multiples show their power. Small multiples are a set of charts all sharing the same execution (type, axes, etc.) but each showing different subsets of the data. This sort of chart is designed for group comparisons, and is one of the key propositions by Edward Tufte in his classic book.
In the following junkart version, I plotted each region's histogram against the global average histogram (shown in gray in the background). The average rating in each region is indicated with the light blue vertical line. The regions are sorted from highest average happiness to lowest.
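For readers who want to reproduce the ordering, here is a minimal sketch of how the average rating (the light blue line) can be computed from a histogram of shares and then used to sort the panels. The region names and numbers are made up for illustration and are not taken from the report.

```python
# Hypothetical share of respondents choosing each rating 0..10.
# These numbers are invented for illustration; each list sums to 1.
shares = {
    "Region C": [0.05, 0.07, 0.10, 0.15, 0.18, 0.20, 0.10, 0.07, 0.04, 0.02, 0.02],
    "Region A": [0.00, 0.01, 0.02, 0.04, 0.08, 0.15, 0.20, 0.25, 0.15, 0.07, 0.03],
    "Region B": [0.02, 0.03, 0.06, 0.10, 0.15, 0.25, 0.15, 0.12, 0.07, 0.03, 0.02],
}


def mean_rating(p):
    """Average rating implied by a histogram of shares over ratings 0..10."""
    return sum(rating * share for rating, share in enumerate(p))


# Order the panels from highest average rating to lowest.
ordered = sorted(shares, key=lambda name: mean_rating(shares[name]), reverse=True)

for name in ordered:
    print(f"{name}: mean rating {mean_rating(shares[name]):.2f}")
```

The sort key is the only design decision here: ordering panels by a meaningful statistic, rather than alphabetically, lets the eye sweep down the page in rank order.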
The same data now occupies only one page of the report. (A topic for a different post: does the higher average rating in N. America/Europe indicate greater happiness or grade inflation?)
***
Alternatively, one can stack up the line charts into a column, as shown on the right. This view is somewhat better for any pairwise comparisons. (Calling JMP developers: how do I rotate the text labels to make them horizontal?)
***
Finally, I made a chart for exploratory purposes, using a scatterplot matrix (see also this post). In this version, every pair of regions is under the microscope. Since there are 10 regions (including the global total), we have (10*9)/2 = 45 pairwise comparisons. Each of these comparisons has its own chart in the matrix, indexed by the labels on the axes.
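The count of 45 is easy to check by enumerating the pairs. This is a small sketch with placeholder labels, not the report's actual region names.

```python
from itertools import combinations

# Placeholder labels: nine regions plus the global total.
series = [f"Region {i}" for i in range(1, 10)] + ["World"]

# Every unordered pair corresponds to one cell in the scatterplot matrix.
pairs = list(combinations(series, 2))

print(len(series), "series give", len(pairs), "pairwise comparisons")
```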
Each individual chart is a scatter plot of the proportions selecting a particular rating. If the histograms in region A and region B are identical, then we see 11 dots all lined up in a diagonal line going from bottom left to top right.
In addition, the pink area of the chart contains 95% of the data. So the more the pink area resembles a diagonal line, the more closely correlated are the histograms of the two regions being compared.
For example, the very top chart compares CIS with East Asia. The thinness of the pink area tells us that the histograms of happiness ratings in those two regions resemble each other. You can easily verify this finding by looking at the first two line charts shown in the column of line charts above.
By contrast, the chart comparing CIS and Europe has an expansive pink area, meaning the happiness ratings follow different distributions. This is also verified by looking at the line charts, which show that Europeans are generally happier than people in CIS. There is an "excess" of people with ratings around 6-8 in Europe compared to CIS. The dots corresponding to these ratings would appear above the diagonal.
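To make the reading of a single cell concrete, here is a rough sketch that pairs up the shares for two regions, lists the ratings whose dots fall above the diagonal, and computes a plain Pearson correlation. The correlation is a stand-in numeric summary of my own choosing, not the pink area that JMP draws, and the two histograms are invented for illustration.

```python
from math import sqrt

# Hypothetical shares for ratings 0..10 (invented numbers, each list sums to 1).
region_x = [0.02, 0.03, 0.06, 0.10, 0.15, 0.25, 0.15, 0.12, 0.07, 0.03, 0.02]
region_y = [0.00, 0.01, 0.02, 0.04, 0.08, 0.15, 0.20, 0.25, 0.15, 0.07, 0.03]


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# One dot per rating: horizontal position is region_x's share, vertical is region_y's.
points = list(zip(region_x, region_y))

# A dot sits above the diagonal when region_y has the larger share for that rating.
above_diagonal = [rating for rating, (x, y) in enumerate(points) if y > x]

print("Ratings above the diagonal:", above_diagonal)
print("Correlation of the two histograms:", round(pearson(region_x, region_y), 2))
print("Correlation of a histogram with itself:", round(pearson(region_x, region_x), 2))
```

Identical histograms put all 11 dots on the diagonal and give a correlation of 1. In this made-up pair, the higher ratings sit above the diagonal because the second region has an excess of them.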
This scatterplot matrix explores all possible comparisons on one page, but it is a lab exercise, not suitable for mass consumption, because it has too much detail.
***
For those curious, the small multiples of line charts were made using R. The column of line charts and the scatterplot matrix were created using JMP.