Jens M., a long-time reader, submits a good graphic! This small-multiples chart (via Quartz) compares the consumption of liquor from selected countries around the world, showing both the level of consumption and the change over time.
Ordered the countries by the most recent data point rather than alphabetically
Scale labels are found only on outer edge of the chart area, rather than one set per panel
Only used three labels for the 11 years on the plot
Did not overdo the vertical scale either
The nicest feature was the XL scale applied only to South Korea. This destroys the small-multiples principle but draws attention to the top left corner, where the designer wants our eyes to go. I would have used smaller fonts throughout.
Having done so much work to simplify the data and expose the patterns, it's time to look at whether we can add some complexity without going overboard. I'd suggest using a different color to draw attention to curves that are strangely shaped -- Ukraine comes to mind, so does Brazil.
I'd also consider adding the top liquor in each country... the writeup made a big deal out of the fact that most of the drinking in South Korea is of Soju.
One way to appreciate the greatness of the chart is to look at alternatives.
Here, the Economist tries the lazy approach of using a map: (link)
For one thing, they have to give up the time dimension.
A variation is a cartogram in which the physical size and shape of countries are mapped to the underlying data. Here's one on Worldmapper (link):
One problem with this transformation is what to do with missing data.
Wikipedia has a better map with variations of one color (link):
The Atlantic realizes that populations are not evenly distributed on the map, so instead of coloring countries, they put bubbles on top of the map (link):
Unfortunately, they scaled the bubbles to the total consumption rather than the per-capita consumption. You guessed it: China gets the biggest bubble, much larger than any other, but from a per-capita standpoint China is behind many other countries depicted on the map.
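To see how much the choice of scaling matters, here is a minimal Python sketch. The country names and figures are invented purely for illustration; they are not taken from the Atlantic's map or any real data source.

```python
# Hypothetical figures for illustration only; not real consumption data.
countries = {
    # name: (total litres consumed, population)
    "Country A": (5_000_000_000, 1_300_000_000),
    "Country B": (600_000_000, 50_000_000),
    "Country C": (900_000_000, 140_000_000),
}


def per_capita(total, population):
    """Litres consumed per person."""
    return total / population


# Ranking by total consumption (what the bubbles encode).
by_total = sorted(countries, key=lambda c: countries[c][0], reverse=True)

# Ranking by per-capita consumption (the comparison a reader usually wants).
by_per_capita = sorted(
    countries, key=lambda c: per_capita(*countries[c]), reverse=True
)

print("By total:     ", by_total)
print("By per capita:", by_per_capita)
```

With these made-up numbers, the most populous country tops the first ranking and comes last in the second, which is the same reversal the bubble map hides.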
PS. A note on submissions. I welcome submissions, especially if you have a good chart to offer. Please ping me if I don't reply within a few weeks; I may have just missed your email. Also, realize that submissions take even more time to research, since they are likely to be in an area I have little knowledge about, and mostly because you sent them to me hoping that I'll research them. Sometimes I give up because it's taking too much time. If you ping me again, I'll let you know if I'm working on it.
The above does not apply to emails from people who are building traffic for their infographics.
PPS. Andrew Gelman chimes in with his take on small multiples.
One of the dangers of "Big Data" is the temptation to get lost in the details. You become so absorbed in the peeling of the onion that you don't realize your tear glands have dried up.
Hans Rosling linked to a visualization of tobacco use around the world from Twitter (link to original). The setup is quite nice for exploration. I'd call this a "tool" rather than a visual.
Let's take a look at the concentric circles on the right.
I appreciate the designer's concept -- the typical visualization of this type of data is looking at relative rates, which obscures the fact that China and India have far and away the most smokers even if their rates are middling (24% and 13% respectively).
This circular chart is supposed to show the absolute distribution of smokers across so-called "super-regions" of the world.
Unfortunately, the designer decided to pile on additional details. The concentric circles present a geography lesson, in effect. For example, high-income super-region is composed of high-income North America, Western Europe, high-income Asia Pacific, etc. and then high-income North America is composed of USA, Canada, etc.
Notice something odd? The further out you go, the larger the circular segments but the smaller the number of people they represent! There are more people in the high-income super-region worldwide than in high-income North America and, in turn, there are more people in the high-income North American region than in the USA. But the size of the graphical elements is reversed.
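The geometry behind this reversal can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the radii and the 30-degree slice are hypothetical, not measured from the chart, and the three rings are assumed to have equal thickness.

```python
import math


def annular_sector_area(r_inner, r_outer, angle_deg):
    """Area of a ring segment spanning angle_deg between two radii."""
    theta = math.radians(angle_deg)
    return 0.5 * theta * (r_outer**2 - r_inner**2)


# Hypothetical layout: three rings of equal thickness, same angular slice.
rings = [
    ("super-region", 1.0, 2.0),  # innermost ring
    ("region", 2.0, 3.0),
    ("country", 3.0, 4.0),       # outermost ring
]

areas = {
    name: annular_sector_area(r_in, r_out, 30)
    for name, r_in, r_out in rings
}

for name, area in areas.items():
    print(f"{name:13s} {area:.3f}")
```

For the same angle, the areas grow in the ratio 3 : 5 : 7 from the inside out, so the outer rings get more ink even though they stand for subsets of the inner ones.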
In principle, the "bumps"-like chart used to show the evolution of tobacco prevalence in individual countries makes for a nice visual. In fact, Rosling marvelled that the global rate of consumption has fallen in recent years.
However, I'm often irritated when the designer pays no attention to what not to show. There are probably well above 200 lines densely packed into this chart. It is almost certain that over-plotting will cause some of these lines to literally never see the light of day. Try hovering over these lines and see for yourself.
The same chart with, say, 10 judiciously chosen lines (countries or regions) would give the reader a lot more value.
The discerning reader figures out that the best visual actually does not even show up on the dashboard. Go ahead, and click on the tab called "Data" on top of the page. You now see a presentation of each country's "data" by age group and by gender. This is where you can really come up with stories for what is going on in different countries.
For example, the British have really done extremely well in reducing tobacco use. Look at how steep the declines are across the board for British men (in most parts of the world, the prevalence of smoking is much higher among men than women).
Bulgaria on the other hand shows a rather odd pattern. It is one of the few countries in the bumps chart that showed a climb in smoking rates, at least in the early 2000s. Here the data for men is broken down into age groups.
This chart exposes a weakness of the underlying data. The error bars indicate to us that what is being plotted is not actual data but modeled data. The error bars here are enormous. With the average at about 40% to 50% for many age groups, the confidence interval is also 40% wide. Further, note that there were only three or four observations (purple dots) and curves are being fitted to these three or four dots, plus extrapolation outside the window of observation. The end result is that the apparent uplift in smoking in the early 2000s is probably a figment of the modeler's imagination. You'd want to understand if there are changes in methodologies around that time.
As a responsible designer of data graphics, you should focus less on comprehensiveness and more on highlighting the good data. I'm a firm believer in "no data is better than bad data".
Business Insider links to this blog with a chart depicting the top beer brands by state.
I like the quilt-like appearance brought on by using the packaging of different brands. The nine glowing yellow islands sitting in the Atlantic Ocean I find annoying. This happens a lot because those New England states are smaller in area than most.
The design problem evaporates if you choose a small multiples approach. As shown below, there is the added benefit that the regional pattern of brand preference is clearly visible whereas in the original chart, it is rather hard to figure out.
I won't comment on the data source here. It's highly suspect.
@guitarzan wants us to see this chart from north of the border, and read the comments. Please hold your nose first.
Here's one insightful comment: "I think it's insane to debate the ages 18 or 19. Why not cap it off at the much more rounded and sensible numbers 18.2 or 19.4??"
Reminds me of signs that say this elevator holds 13 people, or this auditorium holds 147 people safely.
I mean, which software package enables this chart?
For the vertical axis, it appears that the major gridlines are specified to 0.4 with minor gridlines at 0.2 apart. The lower limit of the vertical axis was specifically set to 17, which violates the start-at-zero rule for bar charts.
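A quick Python sketch shows why the start-at-zero rule matters here. It compares the ratio of bar lengths for the two ages under debate (18 and 19) when the axis starts at zero and when it starts at 17. The helper name `bar_height_ratio` is made up for this illustration.

```python
def bar_height_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar lengths for values a and b, given the axis baseline."""
    return (b - baseline) / (a - baseline)


# Bars drawn from zero reflect the actual ratio between the two values.
true_ratio = bar_height_ratio(18, 19)

# Bars drawn from a baseline of 17, as in the chart's vertical axis.
drawn_ratio = bar_height_ratio(18, 19, baseline=17)

print(f"ratio with axis starting at 0:  {true_ratio:.3f}")
print(f"ratio with axis starting at 17: {drawn_ratio:.3f}")
```

With the axis starting at 17, the bar for 19 is drawn twice as long as the bar for 18, although the values differ by less than 6 percent.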
The software also allows the x-axis labels to be printed twice, once in super tiny font in the expected locations, and again turned sideways and printed into the bars.
And Canadians, please tell us why the provinces were ordered in this way.
This data calls for a simple map, with two colors.
Here's a chart in the November edition of Bloomberg Markets:
Curiosities include: how they split up the lamb chop, why an onion is chosen to represent "fresh vegetables/melons"?
The chart contains some strange data that make readers feel nervous. For example, the fish image seems to say 88 percent of seafood eaten in the States is imported, and yet the two largest importing countries listed below (China and Vietnam) together account for only 22.5 percent. So the residual 65.5 percent must be split among at least 10 countries each accounting for not more than 6.5 percent of the total.
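The arithmetic can be checked with a short Python sketch. The 88, 22.5 and 6.5 percent figures are the ones quoted above. Treating 6.5 percent as a hard cap on every remaining country is an assumption of this sketch.

```python
import math

imported_share = 88.0  # percent of seafood that is imported, per the fish image
top_two_share = 22.5   # China and Vietnam combined, per the listing

# Share that must come from all other exporting countries.
residual = imported_share - top_two_share

# Assumption: no remaining country supplies more than 6.5 percent.
cap = 6.5
min_countries = math.ceil(residual / cap)

print(f"residual share: {residual} percent")
print(f"minimum number of other countries: {min_countries}")
```

Since 65.5 / 6.5 is just over 10, the cap strictly implies at least 11 other countries, which is consistent with "at least 10".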
Then when you look at vegetables, Mexico and Canada together supply 72 percent. But the onion graphic tells us it's less than 20 percent. The categorization seems to be different between the top and the bottom layers. We have "fruit and nuts" / "fresh vegetables/melons" on the one side, and "fruit" / "vegetables" on the other side.
And why are melons combined with fresh vegetables rather than fruit?
Business Insider calls this stacked bar chart "staggering" (link). Maybe they are referring to its complexity.
Is there a reason to include all the fine details? The details serve little purpose other than to shout at readers that there is a lot of data behind this chart. It is impossible to compare the different drugs on the individual harmful effects based on reading this chart. All we can see is the comparison of total harm. (By the way, I can't explain why anabolic steroids rated 10 would be sandwiched between khat and ecstasy both rated 9.)
It turns out there is an easy way to fix this chart. We turn to the original Lancet paper which contained this chart, by David Nutt (link). The 16 categories of harm are nicely organized into a tree structure:
Instead of taking data from the right side of this tree, we can take data from the aggregated levels. The sacrifice in detail comes with a major benefit in clarity. In the original paper, Nutt produced a chart that aggregated everything to two levels:
It's amazing how much more we learn from this chart even though it has less data than the previous one. (I'd still remove the data labels since they are redundant when one has the axis labels.)
Similarly, we can plot a chart at the level of physical, psychological, social, etc., and it would still be much more readable than the "staggering" one.
PS. Apparently, David Nutt is a controversial character. See Wiki (link).
NYC mayor Michael Bloomberg is getting mixed reviews for his proposal to ban super-sized sugary drinks. Reader John O. wasn't impressed with this graphical effort (link):
The key problem: this picture is not scary at all. The reason it's not horrifying is that there is no context. People who have knowledge about healthy eating habits will get the message but that's preaching to the choir.
If you know that the recommended consumption of daily sugars for adults is roughly 20-36 grams, then you can see that one sugary drink of 12 ounces or higher would take you over the daily limit. A 64-ounce drink would give you more than 7 times what you need in a day. That's a powerful message but you won't know it from this chart. Not from the sugar cubes doubling as shadows, which is a cute, creative concept.
Also, make use of the chart-title real estate! Instead of "Sugar & Calories per Fountain Drink", say something memorable. "Fountain drinks make you fat and sick".
There is something else fishy about this graphic. What are the most prominent data being displayed?
You got it. They're 7, 12, 16, 32, 64. Where have we seen this type of data display?
Yup. This format is lifted from a menu in a Starbucks or a McDonald's (without prices).
Is this a health warning? Or a restaurant menu?
Also slightly confused about the slightly non-linear relationship between calories and drink size. Maybe volume of ice is held constant...
It is in fact a proportional relationship. The confusion arises from the non-linear increase in cup size from 7 to 64 ounces. The math is roughly 11 calories per ounce, and 3g of sugar per ounce. I wonder if it is better to show those two numbers instead of the ten not-very-memorable numbers shown on the chart itself.
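A minimal Python sketch of that proportional relationship follows. It uses the rough rates quoted above (11 calories and 3 g of sugar per ounce) and the five cup sizes from the chart. The outputs are approximations derived from those rates, not the exact figures printed on the graphic.

```python
CALORIES_PER_OUNCE = 11  # rough rate quoted in the text
SUGAR_G_PER_OUNCE = 3    # rough rate quoted in the text

cup_sizes_oz = [7, 12, 16, 32, 64]


def drink_profile(ounces):
    """Approximate (calories, grams of sugar) for a drink of the given size."""
    return ounces * CALORIES_PER_OUNCE, ounces * SUGAR_G_PER_OUNCE


table = {oz: drink_profile(oz) for oz in cup_sizes_oz}

for oz, (calories, sugar_g) in table.items():
    print(f"{oz:2d} oz: ~{calories} calories, ~{sugar_g} g sugar")
```

The cup sizes jump unevenly (7, 12, 16, 32, 64), which is why the totals look non-linear when read across the chart even though the per-ounce rates are constant.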
In case you're wondering, the heights (thus areas) of the cups have no relationship with any of the data, not calories, not sugars, and not the cup size.
PS. John also wrote: "The soda cup graph reminds me of the chart from Pravda that Tufte cites in 'Cognitive Style of Powerpoint'. " If you know what he's talking about, please post a link to the chart. Thanks.
While doing some research for my statistics blog, I came across a beauty by Lane Kenworthy from almost a year ago (link) via this post by John Schmitt (link).
How embarrassing is the cost effectiveness of U.S. health care spending?
When a chart is executed well, no further words are necessary.
I'd only add that the other countries depicted are "wealthy nations".
Even more impressive is this next chart, which plots the evolution of cost effectiveness over time. An important point to note is that the U.S. started out in 1970 similar to the other nations.
Let's appreciate this beauty:
Let the data speak for itself. Time goes from bottom left to upper right. As more money is spent, life expectancy goes up. However, the slope of the line is much smaller for the US than the other countries. There is no need to add colors, data labels, interactivity, animation, etc.
Recognize what's important, what's not. The US line is in a different color, much thicker and properly made the foreground of the chart.
Rather than clutter up the chart, the other 19 lines are anonymized. They all have the same color and thickness, and all are given one aggregate label. This is an example of overcoming loss aversion (see this post for more): it is ok to suppress some of the data.
The axis labeling is superb. Tufte preaches this clean style. There is no need to use regularly-spaced axis labels... use data-informed labels. Unfortunately, software is way behind on this issue. You can do this in R but that's about it.
We look at another idea from the visualization project "Gaps in the US Healthcare System" (link). This was a tip from reader Jordan G. (link). One of the bright points about this project is the conscious attempt to try something different although the end result is not always successful.
A tree-like branching chart was used to represent cancer death rates, broken down by racial group, gender and type of cancer, in that order.
The tree structure loses its logic after the race and gender splits. Why link different types of cancers (the gray squares) together in a sequence? Stranger still is the existence of a third branch coming out of every race node (the four closest to the center). One branch is male, the other branch is female, what's the third leg? It appears to be prostate cancer which is male only--why doesn't it go with the male branch?
It's not easy to find the connection between what's depicted here, and the idea of "gaps" in the US healthcare system. I think the question is ill-posed to begin with. The rate of death reflects both the possible differential quality of healthcare between groups and the differential incidence of cancers between groups so no visualization tricks could be used to find reliable answers to the question being posed.
The chart fails the first corner of the Trifecta checkup. The chart type also does not fit the data.
The following chart plots the same data in a Bumps style.
I separated the male and female data since certain cancers are limited to one gender, and the gender difference is not likely to be the primary interest. The gender difference, incidentally, is clearly observed: the male death rates are generally about twice as high as the female rates of the same type of cancer, except for colorectal.
In terms of the "race gap", we find that black death rates are generally quite a bit higher than white death rates, especially for prostate cancer but except for lung cancer in females.
Asians and American Indians have practically the same death rates but in both cases the sample sizes are small.
The raw data can be found at the CDC website here.