
Efficiency in space usage leads to efficiency in comprehension

Consider the following two charts that illustrate the same data. (I deliberately took out the header text to make a point. The original chart came from the Wall Street Journal.)


To me, the line chart gets to the point more quickly: that Burberry stores are more numerous in those places shown on the left and fewer in those places shown on the right, relative to comparable luxury brands (Prada and Louis Vuitton).

The tiled bar chart is tougher to decipher because it uses space inefficiently. Within each country group, the three places are plotted on two levels: one on the upper level and two on the lower level. Then the two groups of countries are stacked top and bottom. Readers have to first size up each group of three countries, then make a comparison between the two groups.


From a Trifecta checkup perspective, the bigger issue here is the data. The full story seems to be that those two country groups have different currency experiences... Japan and the continental European countries have weakening currencies, which tends to make their goods cheaper for Chinese consumers. This crucial part of the story is not anywhere on the chart.

In addition, the number of stores is not a telling statistic, because stores may have different areas, and certainly the revenues generated by these stores differ, potentially by country. A measure such as change in same-store sales in each country is more informative.

It is also not true that the distribution of stores is purely a matter of business strategy, as Burberry is a British brand, Prada is Italian and Louis Vuitton is French. They each have more stores in their home countries, which seems very logical.

Egregious chart brings back bad memories

My friend Alberto Cairo said it best: if you see bullshit, say "bullshit!"

He was very incensed by this egregious "infographic": (link to his post)


Emily Schuch provided a re-visualization:


The new version provides a much richer story of how Planned Parenthood has shifted priorities over the last few years.

It also exposes how the AUL (Americans United for Life) organization distorted the story.

The designer extracted only two of the lines, so readers do not see that the category of services that has really replaced the loss of cancer screening was STI/STD testing and treatment. This is a bit ironic given the other story that has circulated this week: the big jump in STDs among Americans (link).

Then, the designer placed the two lines on dual axes, which is a dead giveaway that something awful lies beneath.

Further, this designer dumped the data from intervening years, and drew a straight line from the first to the last year. The straight arrow misleads by pretending that there has been a linear trend, and that it would go on forever.
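To make the point concrete, here is a minimal sketch of how much a straight line between two endpoints can hide. The yearly counts below are invented for illustration (they are not the data behind the chart), and `endpoint_line` is a hypothetical helper name.

```python
# Hypothetical yearly counts, made up for illustration only.
years = [2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013]
actual = [100, 104, 97, 110, 108, 92, 95, 80]


def endpoint_line(xs, ys):
    """Values on the straight line joining the first and last points."""
    slope = (ys[-1] - ys[0]) / (xs[-1] - xs[0])
    return [ys[0] + slope * (x - xs[0]) for x in xs]


# What the straight arrow implies for every year.
implied = endpoint_line(years, actual)

# Gap between what happened and what the arrow suggests.
gaps = [a - i for a, i in zip(actual, implied)]

for year, a, i, g in zip(years, actual, implied, gaps):
    print(f"{year}: actual={a:>4}  implied={i:6.1f}  gap={g:+6.1f}")
```

The line agrees with the data only at the two endpoints. Every gap in between is information the reader never gets to see.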

But the masterstroke is in the treatment of the axes. Let's look at the axes, one at a time:

The horizontal axis: Let me recap. The designer dumped all but the starting and ending years, and drew a straight line between the endpoints. While the data are no longer there, the axis labels are retained. So, our attention is drawn to an area of the chart that is void of data.

The vertical axes: Let me recap. The designer has two series of data with the same units (number of people served) and decided to plot each series on a different scale with dual axes. But readers are not supposed to notice the scales, so they do not show up on the chart.

To summarize, where there are no data, we have a set of functionless labels; where labels are needed to differentiate the scales, we have no axes.
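The effect of the hidden dual scales can be sketched with a little arithmetic. The numbers below are invented for illustration and are not the chart's data; `axis_position` is a hypothetical helper that converts a value to its height on the plot, given the range of its axis.

```python
def axis_position(value, axis_min, axis_max):
    """Fraction of the plot height (0 = bottom, 1 = top) at which a value is drawn."""
    return (value - axis_min) / (axis_max - axis_min)


# Hypothetical counts of people served, made up for illustration only.
series_a = {"start": 2_000_000, "end": 1_000_000}  # falling series
series_b = {"start": 290_000, "end": 330_000}      # rising, but far smaller

# Dual axes: each series gets a scale fitted to its own range.
a_dual = axis_position(series_a["end"], 1_000_000, 2_000_000)
b_dual = axis_position(series_b["end"], 290_000, 330_000)

# Shared axis: both series use one scale starting at zero.
a_shared = axis_position(series_a["end"], 0, 2_000_000)
b_shared = axis_position(series_b["end"], 0, 2_000_000)

print(f"dual axes:   A ends at {a_dual:.2f}, B ends at {b_dual:.2f}")
print(f"shared axis: A ends at {a_shared:.2f}, B ends at {b_shared:.2f}")
```

With separate scales, the smaller series finishes at the top of the plot and the larger one at the bottom, so the lines appear to cross. On a shared scale, the smaller series stays well below the larger one throughout.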


This is a tried-and-true tactic employed by propagandists. The egregious chart brings back some bad memories.

Here is a long-ago post on dual axes.

Here is Thomas Friedman's use of the same trick.

Choropleths, cartograms, tile maps and all those

I like this discussion by Richard Brath about the designer choices when it comes to creating maps.

Richard systematically walks through each type of map and points out the strengths and weaknesses. He has some interesting ideas about improving tile maps, such as the following equal-area but different-orientation tile map:


Several of my prior posts are related to this discussion:

I talk about tile maps here, and point out their built-in every-state-is-equal assumption; the format is appropriate only if that assumption holds. Tile map is the newly popular name for equal-area cartograms, a term that Richard uses.

Years ago, I found a case of misuse of tile maps.

Recently, I wrote about the invariance property of some choropleths.



Statistics report raises mixed emotions

It's gratifying to live through the incredible rise of statistics as a discipline. In a recent report by the American Statistical Association (ASA), we learned that enrollment at all levels (bachelor's, master's and doctorate) has exploded in the last 5-10 years, as "Big Data" gathers momentum.

But my sense of pride takes a hit while looking at the charts that appear in the report. These graphs demonstrate again the hegemony of Excel defaults in the world of data visualization.

Here are all five charts organized in a panel:


Chart #5 (bottom right) catches the eye because it is the only chart with two lines instead of three. You then flip to the prior page to find the legend. The legend tells you the red line is Bachelor and the green line is PhD. That seems wrong, unless biostats departments do not give out Master's degrees.

This is confirmed by chart #2, where we find the blue line (Master) hugging zero.

Presumably the designer removed the blue line from chart #5 because the low counts mean that it fluctuates wildly between 0 and 100 percent and so disrupts the visual design. But the designer forgets to tell readers why the blue line is missing.
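A minimal sketch of why low counts produce such swings follows. It assumes the percentage is a share of some subgroup among degree recipients, which is a guess, since the exact measure is not stated here. All counts are invented, and `share_pct` and `spread` are hypothetical helper names.

```python
def share_pct(part, total):
    """Percentage share of `part` in `total`, or None when the total is zero."""
    return None if total == 0 else 100.0 * part / total


def spread(pairs):
    """Range (max minus min) of the yearly percentage shares."""
    shares = [share_pct(part, total) for part, total in pairs if total > 0]
    return max(shares) - min(shares)


# Hypothetical (subgroup count, total degrees) per year, made up for illustration.
small = [(0, 2), (3, 3), (1, 4), (0, 1), (2, 2)]
large = [(210, 400), (225, 420), (230, 450), (250, 470), (260, 500)]

print(f"spread with tiny totals:  {spread(small):.1f} points")
print(f"spread with large totals: {spread(large):.1f} points")
```

When the yearly total is one or two degrees, a single graduate moves the share by tens of points, so the line bounces across the full range. With totals in the hundreds, the same measure barely moves.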


It turns out the article itself contradicts all of the above:

For biostatistics degrees, for which NCES started providing data specifically in 1992, master’s degrees track the overall increase from 2010–2014 at 47%... The number of undergraduate degrees in biostatistics remains below 30.

In other words, the legend is mislabeled. The blue line represents Bachelor while the red line represents Master. (The error was apparently noticed after the print edition went out, since the online version has the correct legend.)


There is another mystery. Charts #2, #3, and #5, all dealing with biostats, have time starting from 1992, while Charts #1 and #4 start from 1987. The charts aren't lined up in a way that would allow comparisons across time.

Similarly, the vertical scale of each chart is different (aside from Charts #3 and #4). This design choice impairs comparison across charts.
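One way to line up a panel of charts is to compute a single set of axis limits from all the series before drawing any of them. The sketch below assumes each panel is a simple mapping from year to count; the panel names and numbers are invented, and `shared_limits` is a hypothetical helper.

```python
# Hypothetical panels: each maps year -> degree count (made-up numbers).
panels = {
    "chart_1": {1987: 400, 2000: 650, 2014: 1700},
    "chart_2": {1992: 5, 2000: 12, 2014: 28},
}


def shared_limits(panels):
    """Return ((min_year, max_year), (0, max_count)) across all panels."""
    years = [year for series in panels.values() for year in series]
    counts = [count for series in panels.values() for count in series.values()]
    return (min(years), max(years)), (0, max(counts))


x_limits, y_limits = shared_limits(panels)
print("x axis for every panel:", x_limits)
print("y axis for every panel:", y_limits)
```

A shared time axis costs nothing: the panels that start later simply show a gap before their first year. A shared vertical scale is more of a judgment call, since it only makes sense for panels measured in the same units and it flattens the series with small counts.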

The article explains that 1992 was when the agency started collecting data about biostatistics degrees. Between 1987 and 1992, were there no biostatistics majors? Were biostatistics majors lumped into the counts of statistics majors? It's hard to tell.


While Excel is a powerful tool that has served our community well, its flexibility is often a source of errors. The remedy is to invest ample time in overriding pretty much every default decision in the system.

For example:


This chart, a reproduction of Chart #1 above, was entirely produced in Excel.