
Hanging tough


Reader Nick B. sent in this example, calling it "interesting". The chart tells a compelling story once we figure out what it is. Grasping the tree structure is key.

It illustrates the important idea that averaging sometimes masks variations in the data. For example, while the province of Guerrero scored 78% on literacy, the municipalities within Guerrero had scores ranging from 28% to 90%.
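To make the averaging point concrete, here is a minimal Python sketch. The municipality labels, populations, and the middle rate are invented for illustration (only the 28% and 90% endpoints echo the chart), so treat it as a toy rather than a reconstruction of the underlying data.

```python
# Hypothetical municipalities: (label, population, literacy rate in %).
# All values are made up for illustration; they are NOT the chart's data.
municipalities = [
    ("A", 50_000, 28.0),
    ("B", 200_000, 75.0),
    ("C", 750_000, 90.0),
]


def weighted_mean(rows):
    """Population-weighted average of the literacy rates."""
    total_pop = sum(pop for _, pop, _ in rows)
    return sum(pop * rate for _, pop, rate in rows) / total_pop


def rate_range(rows):
    """Lowest and highest literacy rate among the rows."""
    rates = [rate for _, _, rate in rows]
    return min(rates), max(rates)


average = weighted_mean(municipalities)
low, high = rate_range(municipalities)
print(f"average: {average:.1f}%  range: {low:.0f}% to {high:.0f}%")
```

A single average in the 80s looks respectable, yet it sits on top of a spread of more than 60 percentage points. That spread is what the tree structure in the chart exposes.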

It also shows that the gender gap was larger in the less literate municipality of Metlatonoc than in the more literate Cuautitlan.

In addition, it tells us that while Mexico on average measured very well on literacy, subpopulations within Mexico spanned the world's best and worst (from about Mali's level to Italy's).

While I find this chart adequate, the pieces hanging off each other did not seem ideal, especially the two overlapping municipality pieces which were placed next to each other.  However, it is tough to come up with an alternative.  Here's one attempt; the changes are mild.

[Image: Redo_literacy_2]

I prefer the horizontal orientation.

The branches are emphasized (as opposed to the "T" junction) because that's a key part of the story.

The national level, especially the span between Mali and Italy, is de-emphasized; I treat it as gridlines.

Instead of placing the overlapping pieces next to each other, I let the ranges literally overlap, which serves to stress this feature.





Oskar Shapley

The vertical graph is more readable. We are talking about a 'level' variable.


There is something I don't like about this representation.

This representation assumes, or at least suggests, that at the finest level there is no group with a higher literacy rate than the males from Cuautitlan, Mexico, and none with a lower rate than the females from Metlatonoc, Guerrero.

What if that weren't the case? What if there were a group with a higher literacy rate than the one shown on the chart, even though its municipality or province had a lower score than Cuautitlan or Mexico? Or conversely, what if there were a group with a score lower than 20, even though its municipality or province did better than Metlatonoc or Guerrero? In that case the graph wouldn't work, or at least it would be misleading. It's only a coincidence that the subsets with extreme values happen to be part of subsets with extreme values at a higher level.


I get the impression that this representation only sets out to illustrate how extreme the disparity is, not to conclusively rule on the absolute extrema. It works for me. Once I got what was going on, I actually preferred the original chart; I found that the shading "fanning out" to connect the subsets helped me group the extrema visually more easily than the junkchart version does. Looking only at the municipality column, Acapulco and Metlatonoc clearly belong in the same grouping. The junkchart version requires me to look up to the origination point of the connecting lines in the Province row to make the same association. My only change to the original would be to eliminate the horizontal lines connecting the successive subsets, since they are redundant with the shaded areas, almost to the point of losing clarity.
