The importance of a proper scale
Dec 09, 2013
Business Insider highlighted a map showing childhood food insecurity across the 50 states, with the data coming from a report by Brookings.
This is a nice map. I like the tones of the chosen colors although the colors are not intuitively matched to magnitude. (There is a small labeling issue in the New England section.) The message is very clear.
I wondered about the scale, in particular the use of equal-sized buckets to split it. The designer faces several key decisions here, including the number of buckets and the size of each bucket. The following chart shows the choice made by this designer:
In this chart, all the states are ranked by their food insecurity rates with the lowest on the left and the highest on the right. The three dashed horizontal lines show where the current cutoff values are. They enclose two equal-sized blocks because of the equal spacing chosen by the designer. There are a total of four buckets.
Now if you ignore the dashed lines, and focus on the solid line showing the increasing food insecurity rates, you'd notice that maybe there are only three buckets, not four. The following amended chart shows where I'd put the cutoff values (18% and 23%), resulting in three buckets.
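For readers who want to try this themselves, here is a minimal Python sketch comparing equal-width cutoffs with hand-picked ones. The rates in the list are hypothetical illustration values, not the Brookings data; the function names are made up for this example; and sending a value that sits exactly on a cutoff to the higher bucket is an assumption on my part.

```python
from bisect import bisect_right

def equal_width_cutoffs(values, n_buckets):
    """Return the n_buckets - 1 interior cutoffs that split the
    range [min, max] into buckets of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_buckets
    return [lo + width * i for i in range(1, n_buckets)]

def assign_bucket(value, cutoffs):
    """Return the index of the bucket a value falls into.
    Assumption: a value equal to a cutoff goes to the higher bucket."""
    return bisect_right(cutoffs, value)

# Hypothetical food insecurity rates (percent), sorted low to high.
rates = [10.0, 14.0, 17.0, 18.0, 18.3, 20.0, 22.0, 23.5, 26.0, 30.0]

# The original designer's style: four buckets of equal width.
equal = equal_width_cutoffs(rates, 4)
equal_buckets = [assign_bucket(r, equal) for r in rates]

# The amended version: three buckets with cutoffs at 18% and 23%.
custom = [18.0, 23.0]
custom_buckets = [assign_bucket(r, custom) for r in rates]

print("equal-width cutoffs:", equal)
print("equal-width buckets:", equal_buckets)
print("custom buckets:     ", custom_buckets)
```

The only thing that changes between the two maps is the list of cutoffs; the assignment step stays the same.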
With the new cutoff values, let's look at what the map looks like:
I'm pretty happy with this. It shows an even clearer picture. There are three clusters of states, and most of the south and west suffer more than the north and east. The odd state here and there (e.g. Louisiana) turned out not to be so special.
This version also picks out the "outliers", the group that has better food insecurity rates than the rest of the country (as shown on the left side of the line charts). These particularly well-performing states are North Dakota, Minnesota, New Hampshire, Massachusetts and Virginia.
A small shift in the scaling cleans up the message!
Here is the same map with a progressive color scheme:
Hey Kaiser, I think the legend labels are flipped on the graph you did. North Dakota should be in the 10-18 range.
Posted by: Chris P | Dec 09, 2013 at 08:19 AM
Two comments: First, I've never been diagnosed with any kind of colour blindness, but still I could see no difference at all between the leftmost and rightmost ranges in the original chart. Only when I clicked through and compared carefully was I able to spot the difference.
Second, just for kicks, how about splitting the range according to cumulative population? So the lowest-performing states together exceeding, say, 20% of the total population are one group; the next states that bring it up to 40% are another group, and so on?
That would give you a more "people-centric" view, showing that x% of people in the country live in places where y% have no food security.
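A rough Python sketch of this grouping, under some loud assumptions: the state names, rates and populations below are invented for illustration, `population_buckets` is a hypothetical helper, and "lowest-performing" is read as "highest food insecurity rate first". A state is assigned to the band in which its slice of the cumulative population starts, so the state that pushes the running total past 20% still belongs to the first group.

```python
def population_buckets(states, n_groups):
    """Group states into bands of cumulative population share.

    states: list of (name, rate, population) tuples.
    Returns a dict mapping each name to a group index (0 = worst band).
    """
    # Assumption: worst-performing (highest rate) states come first.
    ordered = sorted(states, key=lambda s: s[1], reverse=True)
    total = sum(pop for _, _, pop in ordered)
    groups = {}
    cumulative = 0
    for name, _rate, pop in ordered:
        # Band is decided by the population share already covered
        # before this state is added.
        band = (cumulative * n_groups) // total
        groups[name] = min(band, n_groups - 1)
        cumulative += pop
    return groups

# Invented data: (name, rate in percent, population in arbitrary units).
states = [("A", 25.0, 10), ("B", 22.0, 30), ("C", 20.0, 20), ("D", 15.0, 40)]
groups = population_buckets(states, 5)
print(groups)
```

With these made-up numbers, state B alone covers 30% of the population, so the second band ends up empty. Large states can swallow whole bands this way.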
Posted by: Janne | Dec 09, 2013 at 08:23 AM
I completely agree with the broad criticisms of the original chart, but I don't really see the point of having arbitrarily-scaled buckets rather than a linear scale. Look at where you've put the split for the lowest group, for example - 18% is right after one of the largest "jumps" in the series, and you're consequently putting Wyoming (18.0%) and South Dakota (18.3%) in separate groups. Why 18% and not 17.5%? If there isn't a rationale behind a non-linear scale (for example, if specific percentages were considered "tipping points" of some kind or another), then you're edging into manipulating the data to convey a specific message rather than letting the data speak for itself.
The whole thing would be better done as a heat map, in any case - as far as I can see, there's no real purpose to any sort of bucketing or categorisation for this data set.
Posted by: Chris Moore | Dec 09, 2013 at 09:51 AM
Oops, just realised that you've not put Wyoming and South Dakota in separate groups - my bad. Looked at the scaling rather than the map.
Posted by: Chris Moore | Dec 09, 2013 at 10:28 AM
Chris P: yes, my worst fear came true... when I added the color scheme, I reversed the labels. Will fix tonight.
Chris M: based on your comment, I'll need to write a future post on bucketing. In the meantime, look up any advanced description of histograms and note that figuring out the right-sized buckets is of paramount importance.
A linear scale is only appropriate if the units increase linearly. Think about the extreme case: if all data is concentrated in the middle with one outlier on either side of the distribution, a linear scale would place almost everything in one color and then have two other colors with only one data point each.
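The extreme case is easy to reproduce. Here is a small Python sketch with made-up numbers (five values clustered around 50, plus one outlier at each end) split into three equal-width buckets; `equal_width_counts` is a hypothetical helper written for this illustration.

```python
def equal_width_counts(values, n_buckets):
    """Count how many values land in each of n_buckets equal-width
    buckets spanning [min(values), max(values)]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_buckets
    counts = [0] * n_buckets
    for v in values:
        # Clamp so the maximum value falls in the last bucket.
        index = min(int((v - lo) / width), n_buckets - 1)
        counts[index] += 1
    return counts

# Made-up data: a tight cluster in the middle and two outliers.
values = [0, 48, 49, 50, 51, 52, 100]
counts = equal_width_counts(values, 3)
print(counts)
```

The middle color absorbs the whole cluster, and the other two colors are spent on one data point each.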
Janne: I'll be writing about other ways to bucket this data. The population one often fails because it's dominated by a few big entries - haven't tried with this data. As I said, I like the chosen colors but not the progression.
Posted by: Kaiser | Dec 09, 2013 at 11:30 AM
Sure, but your reasons for choosing these three different-sized buckets particularly remain unclear - you seem keen to highlight the states that perform well, but that's not the intent of the original map (illustrating widespread high levels of child food insecurity in the US). And I consequently have more confidence in the original map (with all its flaws) than your revision. Without any benchmarks to go by (would WHO consider any of these states to be in crisis, for example?), I would likely find a heat map more revealing.
Nevertheless, I remain an avid reader and shall look forward to being properly enlightened on this topic at a later date.
Posted by: Chris Moore | Dec 09, 2013 at 12:23 PM
When using colours that are not progressive, there is no visual cue to the order of the colours and the data becomes purely nominal. In this case the nominal data is classes in a quantitative variable. How to choose such classes is very complex indeed. It may be based on some benchmark (as Chris M suggests) or the distribution of the data (as Kaiser suggests). Such a classing based on distribution need not be equal-sized though. That is only a requisite of histograms, where buckets must be equal to correctly show the form of the distribution. Colour has no shape.
Posted by: Jörgen Abrahamsson | Dec 11, 2013 at 04:12 AM
Kaiser, I would love to read an article on the science of data bucketing. This is a fine topic that would prove tremendously interesting for non-statisticians.
Posted by: Bernard Lebelle | Dec 12, 2013 at 11:16 AM
There are three things I look for in a chart that uses colors:
1. Bold colors that a partially color blind person like me can easily distinguish
2. Progressive colors that are distinct enough to be matched to a legend
3. Colors that translate well to black and white printing, which often occurs before a graph is consumed.
Posted by: Richard Krablin | Dec 16, 2013 at 11:11 AM