The curse of dimensions

Usually the curse of dimensions concerns data with many dimensions. But today I want to talk about a different kind of curse. This is the curse of dimensions in mapping.

We are only talking about a few dimensions, typically between three and six, so a small number. And yet it's already a curse. Maps are typically drawn in two dimensions. Those two dimensions are usually spoken for: they show the x- and y-coordinates of space. If we want to include a third, fourth or fifth dimension of data on the map, we have to appeal to colors, shapes, and so on. Cartographers have long realized that adding dimensions involves tradeoffs.

***

Andrew featured some colored bubble maps in a recent post. Here is one example:

[Image: Dorlingmap_percenthispanic]

The above map shows the proportion of the population in each U.S. county that is Hispanic. Each county is represented by a bubble pinned to the centroid of the county. The color of the bubble shows the data, divided into demi-deciles, so the designer is using an equal-width binning method. The size of a bubble indicates the population size of a county.
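To make the binning concrete, here is a minimal Python sketch. It assumes the twenty bins are equal-width steps of five percentage points on the 0-to-100% scale; the function name `equal_width_bin` and the sample values are hypothetical and not taken from the map.

```python
def equal_width_bin(proportion, n_bins=20):
    """Return the 0-based index of the equal-width bin holding a proportion in [0, 1].

    Assumption: bins are equal-width on the proportion scale (5 points each
    when n_bins is 20), not equal-count quantiles.
    """
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    # min() keeps a proportion of exactly 1.0 inside the top bin.
    return min(int(proportion * n_bins), n_bins - 1)


# Hypothetical county proportions, for illustration only.
for p in (0.03, 0.42, 1.0):
    print(p, "-> bin", equal_width_bin(p))
```

With equal-width bins, counties are not spread evenly across the colors: if most counties have low proportions, most bubbles land in the first few bins.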

The map is sometimes called a "Dorling map" after its presumptive original designer.

I'm going to use this map to explore the curse of dimensions.

***

It's clear from the design that county-level details are regarded as extremely important. As there are about 3,000 counties in the U.S., I don't see how any visual design can satisfy this requirement without giving up clarity.

More details require more objects, which spread readers' attention. More details contain more stories, but that too dilutes their focus.

Another principle of this map is to not allow bubbles to overlap. Of course, having bubbles overlap or print on top of one another is a visual faux pas. But preventing such behavior in this particular design means sacrificing precise locations. Consider the eastern seaboard where there are densely populated counties: they are not pinned to their centroids. Instead, the counties are pushed out of their normal positions, similar to making a cartogram.

I remarked at the start – erroneously but deliberately – that each bubble is centered at the centroid of each county. I wonder how many of you noticed the inaccuracy of that statement. If that rule were followed, then the bubbles in New England would have overlapped and overprinted. 

This tradeoff affects how we perceive regional patterns, as all the densely populated regions are bent out of shape.

Another aspect of the data that the designer treats as important is county population, or rather relative county population. Relative – because bubble sizes don't portray absolutes, plus the designer didn't bother to provide a legend to decipher bubble sizes.

The tradeoff is location. The varying bubble sizes, coupled with the previous stipulation of no overlapping, push bubbles from their proper centroids. This forced displacement disproportionately affects larger counties.
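A rough way to see why the no-overlap rule forces displacement is the Python sketch below, which repeatedly pushes apart any two overlapping circles along the line joining their centers. It is a simplified relaxation for illustration only, not Dorling's algorithm and not necessarily how the map above was produced; `push_apart` and the coordinates are hypothetical.

```python
import math


def push_apart(circles, iterations=200):
    """Relax overlaps among circles given as (x, y, radius) tuples.

    Each overlapping pair is moved apart by half the overlap each,
    along the line between the two centers. Returns new (x, y, radius) tuples.
    """
    pts = [[x, y, r] for x, y, r in circles]
    for _ in range(iterations):
        moved = False
        for i in range(len(pts)):
            for j in range(i + 1, len(pts)):
                a, b = pts[i], pts[j]
                dx, dy = b[0] - a[0], b[1] - a[1]
                dist = math.hypot(dx, dy)
                overlap = a[2] + b[2] - dist
                if overlap > 1e-9:
                    if dist == 0:
                        # Coincident centers: pick an arbitrary direction.
                        dx, dy, dist = 1.0, 0.0, 1.0
                    ux, uy = dx / dist, dy / dist
                    shift = overlap / 2
                    a[0] -= ux * shift
                    a[1] -= uy * shift
                    b[0] += ux * shift
                    b[1] += uy * shift
                    moved = True
        if not moved:
            break
    return [tuple(p) for p in pts]


# Two crowded bubbles and one isolated bubble (made-up coordinates).
before = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (5.0, 0.0, 1.0)]
after = push_apart(before)
for (x0, y0, _), (x1, y1, _) in zip(before, after):
    print("displacement:", math.hypot(x1 - x0, y1 - y0))
```

In this toy case the two crowded bubbles each move half a unit away from their original centers while the isolated one stays put, which is the same effect seen where counties are packed tightly together.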

***

What if we are willing to sacrifice county-level details?

In this setting, we are not obliged to show every single county. One alternative is to perform spatial smoothing. Intuitively, think about the following steps: plot all these bubbles in their precise locations, turn the colors slightly transparent, let them overlap, blend away the edges, and then we have a nice picture of where the Hispanic people are located.

I have sacrificed the county-level details but the regional pattern becomes much clearer, and we don't need to deviate from the well-understood shape of the standard map.
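The smoothing idea can be sketched in a few lines of Python. This is one possible reading of it, a population-weighted Gaussian kernel average, and not the method behind any map shown here; the function name `smoothed_share`, the `bandwidth` parameter, and the sample counties are all assumptions made for illustration.

```python
import math


def smoothed_share(x, y, counties, bandwidth=1.0):
    """Estimate the Hispanic share at location (x, y).

    counties: list of (cx, cy, population, share) tuples, with each county
    left at its true centroid. Each county contributes with a weight equal
    to its population times a Gaussian kernel of its distance from (x, y).
    """
    num = 0.0
    den = 0.0
    for cx, cy, population, share in counties:
        d2 = (x - cx) ** 2 + (y - cy) ** 2
        weight = population * math.exp(-d2 / (2 * bandwidth ** 2))
        num += weight * share
        den += weight
    # Far from every county the weights can underflow to zero.
    return num / den if den > 0 else float("nan")


# Two made-up counties with equal populations and different shares.
counties = [(0.0, 0.0, 1000, 0.1), (10.0, 0.0, 1000, 0.5)]
for x in (0.0, 5.0, 10.0):
    print(x, smoothed_share(x, 0.0, counties, bandwidth=1.0))
```

The estimate stays close to each county's own value near its centroid and blends the two in between. A larger `bandwidth` blurs more, trading local detail for a cleaner regional pattern.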

This version reminds me of the language maps that Josh Katz made.

[Image: Joshkatz_languagemap]

Here is an old post about these maps.

This map design only reduces but does not eliminate the geographical inaccuracy. It uses the same trick as the Dorling map: the "vertical" density of population has been turned into "horizontal" span. It's a bit better because the centroids are not displaced.

***

Which map is better depends on what tradeoffs one is making. In the above example, I'd have made different choices.

 

One final thing – it's minor but maybe not so minor. Most of the bubbles on the map, especially in the middle, are tiny; as most of them have Hispanic proportions on the left side of the scale, they should be showing light orange. However, all of them appear darker than they ought to be. That's because each bubble has a dark border. For small bubbles, the border accounts for a high proportion of the ink in the entire object.
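The border effect is easy to quantify. The Python sketch below assumes the border is a ring of fixed width drawn inside the bubble's outer radius, so the border's share of the bubble's area is 1 - ((r - w) / r)^2; the function name `border_share` and the radii are hypothetical and not measured from the map.

```python
def border_share(radius, border_width):
    """Fraction of a bubble's area taken up by its border.

    Assumption: the border is a ring of width border_width lying
    inside the outer radius of the bubble.
    """
    if radius <= 0 or border_width < 0:
        raise ValueError("radius must be positive and border_width non-negative")
    inner = max(radius - border_width, 0.0)
    return 1.0 - (inner / radius) ** 2


# The same 1-unit border on a small and a large bubble (made-up sizes).
print(border_share(2, 1))   # small bubble
print(border_share(20, 1))  # large bubble
```

With these made-up sizes, the border covers three quarters of the small bubble but under a tenth of the large one, which is why the tiny bubbles read darker than their fill color.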

Comments

Derek

They're named after Danny Dorling, the Halford Mackinder Professor of Geography at Oxford.

I like to drop obscure words like "vigintile" (twenty parts) and "tercile" (three parts) in meetings :-)
