
Oldie but goodie

Back in 2007, the New York Times graphics team produced a fabulous chart explaining the rise in prices at the pump (link).

Let's start with the tab labeled "Regional Price", which contains a well-executed map of average gas prices by county:


The color scale is wonderful. It's just one color and yet the gradations are easily discerned. The general spatial pattern jumps out at you, with prices being higher on the Pacific coast, and lower from New England all the way down south. The Lakes region also has higher prices, as do New Mexico, Colorado and Hawaii.


The legend is just superb. Take a closer look:


What sets this legend apart is the varying lengths of the segments. In particular, the darkest blue corresponds to a wide range of prices (3.45-3.94). One can also easily figure out the lowest and highest prices in the nation--the designers marked exactly which counties recorded those prices, which is another nice touch.

To determine the breakpoints on the legend, one can use a statistical methodology: a standardized scale anchored on both sides of the national average price (from the other chart, the average price was $3.22). Then, we have each color mapping to the length of one standard deviation of prices in both directions. What this does is put counties into standardized groups: for example, all counties whose prices were within one standard deviation above the average are given one tint, while those that were one to two standard deviations above the average are given a darker blue, and so on. In effect, we would have created a contour map.
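The standardized scale can be sketched in a few lines of Python. This is only an illustration of the idea, not how the Times built its legend: the $3.22 average comes from the chart, but the standard deviation of 0.15 and the names `sd_breakpoints` and `classify` are assumptions of mine, made up for the example.

```python
from bisect import bisect_right

def sd_breakpoints(mean, sd, k=2):
    """Return bin edges at mean - k*sd, ..., mean, ..., mean + k*sd."""
    return [mean + i * sd for i in range(-k, k + 1)]

def classify(price, edges):
    """Return the index of the color bin a price falls in.

    Bin 0 is everything below the lowest edge; the last bin is
    everything at or above the highest edge.
    """
    return bisect_right(edges, price)

mean_price = 3.22  # national average quoted on the NYT chart
sd_price = 0.15    # hypothetical standard deviation, for illustration only

# Edges at roughly 2.92, 3.07, 3.22, 3.37, 3.52, giving six color bins
edges = sd_breakpoints(mean_price, sd_price, k=2)

# A county at $3.30 lands in the first tint above the average
bin_index = classify(3.30, edges)
```

Each bin index would then map to one tint of blue, with counties below the average getting the lighter tints. Unlike the published legend, every interior segment on this scale has the same width, so only the two open-ended bins at the extremes can cover a wide range of prices.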


I see the designers' intention in clearly labeling the areas where they do not have data, with the diagonal stripes on white. My own preference is to put those areas in a mild gray, in effect blending them into the surroundings. In this way, the missing data do not distract the average reader, while the fastidious reader can still figure out where the data holes are.

This is a key learning for most research scientists. We have a tendency to train our eyes on the outliers and the data holes because they are like imperfections in diamonds. This leads us to highlight the least important message up front. And it's a bad habit.


In the following, I put the county and state level views side by side. The NYT graphic allows users to switch between the two views via a tab.


Much like the recent post on the age of buildings in Brooklyn, the state aggregates tell a simpler story but still capture almost all of the spatial pattern. The average prices per state are now printed directly on the chart. The question the designer should ask is what the readers want to learn from such a chart, and which view better meets those needs. It's possible the Times is catering to two types of readers. Perhaps one can strike a middle ground, which is to break out certain states like Texas into contiguous "regions".





I mostly agree with your analysis, except for one thing. Giving the areas without data the same (no) pattern and a "blending in" grey colour would be a mistake.

For one thing, charts are still frequently printed out, or even copied, and can easily lose the colour information in the process. Also, people with some forms of colour blindness will be unable to distinguish a pale blue and a grey.

In either kind of situation, you lose the "No data" information and falsely assign data values to those areas. Keep a separate pattern and you don't risk losing the information.


Interesting how dark Illinois is because of Chicago.


I suspect altitude - and hence, the difference in octane level of regular gas - was not adjusted for.


derek, David: be careful what you wish for. Adjusting for something effectively removes its effect on the statistic, so if you adjust for everything you know about, you might end up with an undifferentiated blue!


Kaiser - that's the point! You *need* to adjust for octane level ... regular is only 85 octane in Colorado, so it should be cheaper than the rest of the nation. Adjust/account for it to verify that.
