Chris P

Hey Kaiser, I think the legend labels are flipped on the graph you did. North Dakota should be in the 10-18 range.


Two comments: First, I've never been diagnosed with any kind of colour blindness, but still I could see no difference at all between the leftmost and rightmost ranges in the original chart. Only when I clicked through and compared carefully was I able to spot the difference.

Second, just for kicks, how about splitting the range according to cumulative population? So the lowest-performing states that together make up, say, 20% of the total population form one group; the next states that bring it up to 40% form another group, and so on?

That would give you a more "people-centric" view, showing that x% of people in the country live in places where y% have no food security.
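A minimal sketch of that population-based split, under stated assumptions: the rows below are made-up placeholders (not the actual state data), "lowest-performing" is taken to mean the highest food-insecurity rate, and a state is assigned to the 20% band in which the cumulative population share stands before that state is added. The function name `population_buckets` is hypothetical.

```python
def population_buckets(rows, n_groups=5):
    """Assign each (name, rate, population) row to a bucket so that each
    bucket covers roughly 1/n_groups of the total population.

    Rows are ranked by rate, highest first. A row lands in the band that
    contains the cumulative population counted *before* that row.
    """
    total = sum(pop for _, _, pop in rows)
    ranked = sorted(rows, key=lambda r: r[1], reverse=True)
    result = {}
    cumulative = 0
    for name, _rate, pop in ranked:
        # Integer arithmetic avoids floating-point edge cases at band borders.
        bucket = min(int(cumulative * n_groups // total), n_groups - 1)
        result[name] = bucket
        cumulative += pop
    return result


# Hypothetical data: (name, % food insecure, population in arbitrary units)
rows = [
    ("A", 24.0, 10),
    ("B", 22.0, 30),
    ("C", 20.0, 20),
    ("D", 18.0, 25),
    ("E", 12.0, 15),
]
buckets = population_buckets(rows, n_groups=5)
print(buckets)
```

With these placeholder numbers, entry "B" alone holds 30% of the population, so it spans more than one 20% band and bucket 1 ends up empty.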

Chris Moore

I completely agree with the broad criticisms of the original chart, but I don't really see the point of having arbitrarily-scaled buckets rather than a linear scale. Look at where you've put the split for the lowest group, for example - 18% is right after one of the largest "jumps" in the series, and you're consequently putting Wyoming (18.0%) and South Dakota (18.3%) in separate groups. Why 18% and not 17.5%? If there isn't a rationale behind a non-linear scale (for example, if specific percentages were considered "tipping points" of some kind or another), then you're edging into manipulating the data to convey a specific message rather than letting the data speak for itself.

The whole thing would be better done as a heat map, in any case - as far as I can see, there's no real purpose to any sort of bucketing or categorisation for this data set.

Chris Moore

Oops, just realised that you've not put Wyoming and South Dakota in separate groups - my bad. Looked at the scaling rather than the map.


Chris P: yes, my worst fear came true... when I added the color scheme, I reversed the labels. Will fix tonight.

Chris M: based on your comment, I'll need to write a future post on bucketing. In the meantime, look up any advanced description of histograms and note that figuring out the right-sized buckets is of paramount importance.

A linear scale is only appropriate if the units increase linearly. Think about the extreme case: if all the data is concentrated in the middle with one outlier on either side of the distribution, a linear scale would place everything in one color and then have two other colors with only one data point each.
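To make the extreme case above concrete, here is a small sketch comparing equal-width (linear) buckets with equal-count (quantile) buckets. The values are made up for illustration and are not the food insecurity data; the function names are hypothetical, and equal-count bucketing is just one possible alternative, not necessarily the method used for the revised map.

```python
def equal_width_bins(values, n_bins=3):
    """Linear scale: split the range [min, max] into n_bins equal intervals.
    Assumes the values are not all identical (otherwise the width is zero)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def quantile_bins(values, n_bins=3):
    """Equal-count buckets: rank the values, then cut the ranking into
    n_bins groups of (nearly) equal size."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


# Hypothetical data: a tight cluster in the middle, one outlier on each side.
values = [2.0, 14.0, 14.5, 15.0, 15.5, 16.0, 16.5, 17.0, 30.0]

linear = equal_width_bins(values)
by_rank = quantile_bins(values)
print(linear)   # one point in each outer color, everything else in the middle
print(by_rank)  # three points per color
```

The trade-off is that equal-count buckets can separate values that are nearly identical, so the cut points still deserve a look.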

Janne: I'll be writing about other ways to bucket this data. The population one often fails because it's dominated by a few big entries - haven't tried with this data. As I said, I like the chosen colors but not the progression.

Chris Moore

Sure, but your reasons for choosing these three different-sized buckets particularly remain unclear - you seem keen to highlight the states that perform well, but that's not the intent of the original map (illustrating widespread high levels of child food insecurity in the US). And I consequently have more confidence in the original map (with all its flaws) than your revision. Without any benchmarks to go by (would WHO consider any of these states to be in crisis, for example?), I would likely find a heat map more revealing.

Nevertheless, I remain an avid reader and shall look forward to being properly enlightened on this topic at a later date.

Jörgen Abrahamsson

When using colours that are not progressive, there is no visual cue to the order of the colours and the data becomes purely nominal. In this case the nominal data is classes of a quantitative variable. How to choose such classes is very complex indeed. It may be based on some benchmark (as Chris M suggests) or on the distribution of the data (as Kaiser suggests). Such a classing based on distribution need not be equal-sized, though. That is only a requirement of histograms, where buckets must be equal to correctly show the form of the distribution. Colour has no shape.

Bernard Lebelle

Kaiser, I would love to read an article on the science of data bucketing. This is a fine topic that would prove tremendously interesting for non-statisticians.

Richard Krablin

There are three things I look for in a chart that uses colors:

1. Bold colors that a partially color blind person like me can easily distinguish
2. Progressive colors that are distinct enough to be matched to a legend
3. Colors that translate well to black and white printing, which often occurs before a graph is consumed.
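One rough way to check points 2 and 3 together is to convert each palette color to an approximate gray level and see whether the levels step steadily in one direction. This is only a sketch: it uses the common Rec. 601 luma weights as a stand-in for how a printer or copier might render gray, the `min_gap` threshold of 25 is an arbitrary assumption, and the function names and sample palettes are hypothetical.

```python
def gray_level(hex_color):
    """Approximate grayscale value (0-255) of an '#rrggbb' color,
    using the Rec. 601 luma weights."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.299 * r + 0.587 * g + 0.114 * b


def survives_grayscale(palette, min_gap=25):
    """True if the gray levels of the palette, taken in legend order,
    move steadily lighter or steadily darker by at least min_gap per step."""
    levels = [gray_level(c) for c in palette]
    gaps = [b - a for a, b in zip(levels, levels[1:])]
    steadily_lighter = all(g >= min_gap for g in gaps)
    steadily_darker = all(g <= -min_gap for g in gaps)
    return steadily_lighter or steadily_darker


# Hypothetical palettes, listed in legend order.
progressive = ["#000000", "#808080", "#ffffff"]
non_progressive = ["#ff0000", "#00ff00", "#0000ff"]

print(survives_grayscale(progressive))
print(survives_grayscale(non_progressive))
```

A palette that passes this check is not guaranteed to be distinguishable for every form of color blindness; it only tests the black-and-white case.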

The comments to this entry are closed.


Kaiser Fung. Business analytics and data visualization expert. Author and Speaker.