
A matter of compactness

Andrew Gelman may have nominated himself the graphics advisor for the World Happiness Report (link). That would be a very good thing.

To kick this off, I re-made Figures 2.1-2.2.8 in the report, which summarize the findings of the Gallup World Poll covering annual samples of 1,000 people aged 15 and over from each of 150 countries. (These charts are effectively the first charts to appear in the report. There is no Figure 1 because Chapter 1 has no charts. The report also inexplicably follows the outdated academic-publishing convention of banishing all diagrams to the end of the report as if they were footnotes.)

In the report, they presented histograms of the 0-10 ratings (10 = happiest) by region of the world, two charts a page running to 5 pages. Here's one such page:


If you're presenting regional data, you're expecting readers to want to compare regions. It's not very nice to make them flip back and forth and task their memory in order to do these comparisons.

This data set is where small multiples show their power. Small multiples are a set of charts all sharing the same execution (type, axes, etc.) but each showing different subsets of the data. This sort of chart is designed for group comparisons, and is one of the key propositions by Edward Tufte in his classic book.

In the following junkart version, I plotted each region's histogram against the global average histogram (indicated in gray as background). The average rating in each region is indicated with the light blue vertical line. The regions are sorted from highest average happiness to lowest.



The same data now occupies only one page of the report. (A topic for a different post: does the higher average rating in N. America/Europe indicate greater happiness or grade inflation?)


Alternatively, one can stack up the line charts into a column, as shown on the right. This view is somewhat better for any pairwise comparisons. (Calling JMP developers: how do I rotate the text labels to make them horizontal?)


Finally, I made a chart for exploratory purposes, using a scatterplot matrix (see also this post). In this version, every pair of regions is under the microscope. Since there are 10 regions (including the global total), we have (10*9)/2 = 45 pairwise comparisons. Each of these comparisons has its own chart in the matrix, indexed by the labels on the axes.
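
The count of 45 can be checked with a few lines of Python. This is only a sketch: the region labels below are placeholders invented for illustration, not the names used in the report.

```python
from itertools import combinations

# Placeholder labels (an assumption): nine regions plus the global total.
regions = [f"region_{i}" for i in range(1, 10)] + ["global"]

# Every unordered pair of regions gets one panel in the scatterplot matrix.
pairs = list(combinations(regions, 2))

print(len(regions), "regions give", len(pairs), "pairwise comparisons")
```

Because the pairs are unordered, only one triangle of the matrix is needed to show all of them.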

Each individual chart is a scatter plot of the proportions selecting a particular rating. If the histograms in region A and region B are identical, then we see 11 dots all lined up in a diagonal line going from bottom left to top right.

In addition, the pink area of the chart contains 95% of the data. So the more the pink area resembles a diagonal line, the more correlated are the histograms between the two regions being compared.
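
To make the reading of one panel concrete, here is a minimal Python sketch. The two sets of proportions are made up for illustration (they are not the Gallup figures), and the Pearson correlation is used only as a rough numerical stand-in for "how thin the pink area is"; it is not a reproduction of what JMP computes.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Made-up proportions choosing each rating 0..10 (assumed data; each sums to 1).
region_a = [0.01, 0.02, 0.04, 0.08, 0.12, 0.25, 0.15, 0.14, 0.11, 0.05, 0.03]
region_b = [0.00, 0.01, 0.01, 0.03, 0.05, 0.12, 0.15, 0.22, 0.24, 0.11, 0.06]

# One dot per rating: x is the share in region A, y is the share in region B.
points = list(zip(region_a, region_b))

# A region compared with itself puts every dot exactly on the diagonal.
self_points = list(zip(region_a, region_a))
on_diagonal = all(x == y for x, y in self_points)

print(len(points), "dots in the panel")
print("identical histograms lie on the diagonal:", on_diagonal)
print("correlation A vs A:", round(pearson(region_a, region_a), 3))
print("correlation A vs B:", round(pearson(region_a, region_b), 3))
```

Dots above the diagonal mark ratings that are more common in the region on the vertical axis, and dots below mark ratings that are more common in the region on the horizontal axis.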















For example, the very top chart compares CIS with East Asia. The thinness of the pink area tells us that the histograms of happiness ratings in those two regions resemble each other. You can easily verify this finding by looking at the first two line charts shown in the column of line charts above.

By contrast, the chart comparing CIS and Europe has an expansive pink area, meaning the happiness ratings follow different distributions. This is also verified by looking at the line charts, which show that Europeans are generally happier than people in CIS. There is an "excess" of people with ratings around 6-8 in Europe compared to CIS. The dots corresponding to these ratings would appear above the diagonal.

This scatterplot matrix explores all possible comparisons on one page but it is a lab exercise not suitable for mass consumption because it has too much detail.


For those curious, the small multiples of line charts were made using R. The column of line charts and the scatterplot matrix were created using JMP.



Paul D

Hi there,

I'm an occasional JMP user, and I've tried rotating the y-groups and I can't see anything obvious, short of doing it manually in Inkscape or [insert editing program of choice].

One thing I can do to improve the graph though, is arrange your regions top to bottom by average happiness, as per the small multiples chart:

You can do this by selecting Region Column > Col Info > Col Properties > Value ordering

Finally, I like the scatterplot analysis. Again changing the order of the plots, I can see that NA / Europe / LA are happier and don't correlate with each other. Sub-Sahara is sadder and doesn't correlate with the others. The 5 remaining middle distributions are highly correlated and could be treated as a single region in any further analysis / explanations.


I noticed that in your scatterplot matrix (which, by the way, looks great) you followed the convention of only using the lower triangle. I'm wondering if this is because you think this is the way plots like this ought to be presented. Personally I find it quite an annoying convention. I mean, I get the idea--it's a symmetric matrix, so half of the elements are redundant--but visually speaking, it's just a lot easier to get a sense of the relationships with a particular variable by scanning across the row and/or column associated with that variable. With one triangle removed, not only do you have to change the "direction" of your scan midway through, but the axis on which the variable of interest lies switches sides!

Bob K

How come you prefer JMP over R for some of the graphs?


If you hover over an axis in JMP until you see a hand (Grabber tool), double click to open the axis specification window. There you will find a section called Tick Label Orientation, with a drop-down menu including options such as horizontal, vertical, and angled.


Laura: thanks, I'll check that out.

Bob K: I wrote a post about the Graph Builder function in JMP some time ago. It's like a sandbox that you can use to explore different chart types. R is great if you already know what type of chart to use; otherwise, you throw away a lot of code. R is also great if you want to control tiny details such as using irregularly labeled axes, but if you're just making a standard chart, why not save some time?
