Avoid concentric circles

A Twitter follower sent me this chart by way of Munich:


The logo of the Munich Security Conference (MSC) is quite cute. It looks like an ear. Perhaps that inspired this, em, staggered donut chart.

I like to straighten curves out so the donut chart becomes a bar chart:


The blue and gray bars mimic the lengths of the arcs in the donut chart. The yellow bars show the relative size of the underlying data. You can see that three of the four arcs under-represent the size of the data.

Why is that so? It's due to the staggering. Inner circles have smaller circumferences than outer circles. The designer preserves the angles, so the arcs drawn on the inner circles are artificially shortened.
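To make the arithmetic concrete, here is a minimal Python sketch of why staggering shortens arcs. It assumes each slice keeps its correct angle and only the radius changes; the shares and radii are made-up illustration values, not the Munich data, and the function names are my own.

```python
import math

def arc_length(share, radius):
    """Arc length of a slice: s = r * theta, where theta = share * 2*pi radians."""
    return radius * share * 2 * math.pi

def shrink_factors(shares, radii):
    """Length of each drawn arc relative to the same slice drawn on the
    outermost circle. With the angles held fixed this reduces to r / r_max."""
    r_max = max(radii)
    return [arc_length(s, r) / arc_length(s, r_max) for s, r in zip(shares, radii)]

# Hypothetical staggered donut: four slices, each drawn on a smaller circle.
shares = [0.40, 0.30, 0.20, 0.10]
radii = [1.00, 0.90, 0.80, 0.70]
for share, factor in zip(shares, shrink_factors(shares, radii)):
    print(f"share {share:.0%}: arc drawn at {factor:.0%} of its fair length")
```

Only the outermost slice comes out at full length; every inner slice is shortened in proportion to its radius.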



The donut chart is just a pie chart with a hole punched in the middle. For both pie charts and donut charts, the data are encoded in the angles at the center of the circle. Under normal circumstances, pie charts can also be read by comparing sector areas, and donut charts using arc lengths, as those are proportional to the angles.

The area and arc interpretation fails when the designer alters the radii of the sections. Look at the following pair of pie charts, produced by filling the hole in the above donuts:


The staggered pie chart distorts the data if the reader compares areas, but not if the reader compares angles at the center. The pie chart can be read both ways so long as the designer does not alter the radii.
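The same point in numbers: a sector's area is half of r squared times the angle, so a slice drawn at 70% of the full radius keeps only 49% of its fair area even though its angle is untouched. A minimal Python sketch with hypothetical shares and radii:

```python
import math

def sector_area(share, radius):
    """Area of a pie slice: A = 0.5 * r**2 * theta, with theta = share * 2*pi."""
    theta = share * 2 * math.pi
    return 0.5 * radius ** 2 * theta

def area_shares(shares, radii):
    """Each slice's fraction of the total drawn area, i.e. what a reader
    comparing areas would perceive."""
    areas = [sector_area(s, r) for s, r in zip(shares, radii)]
    total = sum(areas)
    return [a / total for a in areas]

shares = [0.40, 0.30, 0.20, 0.10]  # hypothetical data
print(area_shares(shares, [1.0, 1.0, 1.0, 1.0]))  # common radius: areas match the data
print(area_shares(shares, [1.0, 0.9, 0.8, 0.7]))  # staggered: small slices shrink further
```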


McKinsey thinks the data world needs more dataviz talent

Note about last week: While not blogging, I delivered four lectures on three topics over five days: one on the use of data analytics in marketing for a marketing class at Temple; two on the interplay of analytics and data visualization, at Yeshiva and a JMP Webinar; and one on how to live during the Data Revolution at NYU.

This week, I'm back to blogging.

McKinsey has published a report confirming what most of us already know or experience: the explosion of data jobs just isn't stopping.

On page 5, it says something that is of interest to readers of this blog: "As data grows more complex, distilling it and bringing it to life through visualization is becoming critical to help make the results of data analyses digestible for decision makers. We estimate that demand for visualization grew roughly 50 percent annually from 2010 to 2015." (my bolding)

The report contains a number of unfortunate graphics. Here's one:


I applied my self-sufficiency test by removing the bottom row of data from the chart. Here is what happened to the second circle, representing the fraction of value realized by the U.S. health care industry.


What does the visual say? This is one of the questions in the Trifecta Checkup. We see three categories of things that should add up to 100 percent. With a little more effort, we find the two colored categories are each 10% while the white area is 80%. 

But that's not what the data say, because there is only one thing being measured: how much of the potential has already been realized. The two colors are an attempt to visualize the uncertainty of the estimated proportion, which in this case is described as 10 to 20 percent underneath the chart.

If we have to describe what the two colored sections represent: the dark green section is the lower bound of the estimate while the medium green section is the range of uncertainty. The edge between the two sections is the actual estimated proportion (assuming the uncertainty bound is symmetric around the estimate)!

A first attempt to fix this might be to use line segments instead of colored arcs. 


The middle diagram emphasizes the mid-point estimate while the right diagram, the range of estimates. Observe how differently these two diagrams appear from the original one shown on the left.

This design only works if the reader perceives the chart as a "racetrack" chart. You have to see the invisible vertical line at the top, which is the starting line, and measure how far around the track the symbol has gone. I have previously discussed why I don't like racetracks (for example, here and here).


Here is a sketch of another design:


The center figure will have to be moved and changed to a different shape. This design conveys the sense of a goal (at 100%) and how far one is along the path. The uncertainty is represented by wave-like elements that make the exact location of the pointer arrow appear as wavering.





Plotted performance guaranteed not to predict future performance

On my flight back from Lyon, I picked up a French magazine, and found the following chart:


A quick visit to Bing Translate tells me that this chart illustrates the rates of return of different types of investments. The headline supposedly says "Only the risk pays". In many investment brochures, after presenting some glaringly optimistic projections of future returns, the vendor legally protects itself by proclaiming "Past performance does not guarantee future performance."



Two unusual decisions set this chart apart:

1. The tree ring imagery, which codes the data in the widths of concentric rings around a common core

2. The placement of larger numbers toward the middle, and smaller numbers in the periphery.

When a reader takes in the visual design of this chart, what is s/he drawn to?

The designer evidently hopes the reader will focus on comparing the widths of the rings (A), while ignoring the areas or the circumferences. I think it is more likely that the reader will see one of the following:

(B) the relative areas of the tree rings

(C) the areas of the full circles bounded by the circumferences

(D) the lengths of the outer rings

(E) the lengths of the inner rings

(F) the lengths of the "middle" rings (defined as the average of the outer and inner rings)

Here is a visualization of six ways to "see" what is on the French rates of return chart:


Recall the Trifecta Checkup (link). This is an example where "What does the visual say" and "What does the data say" may be at variance. In case (A), if the reader is seeing the ring widths, then those two aspects are in sync. In every other case, the two aspects are discordant.

The level of distortion is visualized in the following chart:


Here, I normalized everything to the size of the SCPI data. The true data is presented by the ring width column, represented by the vertical stripes on the left. If the comparisons are not distorted, the other symbols should stay close to the vertical stripes. One notices there is always distortion in cases (B)-(F). This is primarily due to the placement of the large numbers near the center and the small numbers near the edge. In other words, the radius is inversely related to the data: the bigger the number, the smaller its radius!

The amount of distortion for most cases ranges from 2 to 6 times.
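The six readings can be reproduced with a short Python sketch. The ring values and the core radius below are hypothetical (these are not the French chart's numbers); values are listed from the center outward, largest first, mirroring the chart's layout, and each ring's width is set equal to its value. The "middle" length follows definition (F), the average of the inner and outer circumferences. Distortion is measured as in the chart above: perceived ratio to a reference ring divided by the true ratio.

```python
import math

def ring_readings(values, core_radius=1.0):
    """Six quantities a reader might compare for concentric rings whose
    widths encode `values` (listed from the center outward)."""
    if core_radius <= 0:
        raise ValueError("core_radius must be positive")
    readings, r_in = [], core_radius
    for v in values:
        r_out = r_in + v  # the ring width is the data value
        readings.append({
            "A ring width": v,
            "B ring area": math.pi * (r_out ** 2 - r_in ** 2),
            "C disc area": math.pi * r_out ** 2,
            "D outer length": 2 * math.pi * r_out,
            "E inner length": 2 * math.pi * r_in,
            "F middle length": math.pi * (r_in + r_out),  # average of D and E
        })
        r_in = r_out
    return readings

def distortion(values, ref, core_radius=1.0):
    """Perceived ratio divided by true ratio, both relative to ring `ref`.
    1.0 means faithful; above 1 exaggerates, below 1 understates."""
    readings = ring_readings(values, core_radius)
    table = []
    for v, row in zip(values, readings):
        true_ratio = v / values[ref]
        table.append({k: (row[k] / readings[ref][k]) / true_ratio for k in row})
    return table

# Hypothetical returns, largest at the center as in the French chart.
for row in distortion([6.0, 4.0, 2.0, 1.0], ref=0):
    print({k: round(x, 2) for k, x in row.items()})
```

Column A is 1.0 everywhere by construction; every other column drifts away from 1.0 because the small values sit on the large outer circles.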

While the "ring area" (B) version is least distorted on average, it is perhaps the worst of the six representations. The level of distortion is not a regular function of the size of the data. The "sicav monetaries" (money-market funds, the smallest data) are the least distorted while the data of medium value are the most distorted.


To improve this chart, take a hint from the headline. Someone recognized that there is a tradeoff between risk and return. The data series shown, an annualized return, paints only the return half of that relationship.




The French take back cinema but can you see it?

I like independent cinema, and here are three French films that come to mind as I write this post: Delicatessen, The Class (Entre les murs), and 8 Women (8 femmes). 

The French people are taking back cinema. Even though they purchased more tickets to U.S. movies than French movies, the gap has been narrowing in the last two decades. How do I know? It's the subject of this infographic:


How do I know? That's not easy to say, given how complicated this infographic is. Here is a zoomed-in view of the top of the chart:



You've got the slice of orange, which doubles as the imagery of a film roll. The chart uses five legend items to explain the two layers of data. The solid donut chart presents the mix of ticket sales by country of origin, comparing U.S. movies, French movies, and "others". Then, there are two thin arcs showing the mix of movies by country of origin. 

The donut chart has an unusual feature. Typically, the data are coded in the angles at the donut's center. Here, the data are coded twice: once at the center, and again in the width of the ring. This is a self-defeating feature because it draws even more attention to the areas of the donut slices, and those areas are highly distorted. If the ratios of the areas are accurate when all three pieces have the same width, then varying those widths shifts the ratios away from the correct ones!
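A minimal Python sketch of this double encoding, assuming every slice shares the same inner radius and that the ring width is made proportional to the data (my reading of the design; the shares below are hypothetical, not the infographic's numbers). With equal widths the area shares match the data; once the widths also follow the data, the big slice swallows even more of the area.

```python
import math

def annular_sector_area(share, inner_r, width):
    """Area of a donut slice: 0.5 * theta * (R**2 - r**2), theta = share * 2*pi."""
    theta = share * 2 * math.pi
    outer_r = inner_r + width
    return 0.5 * theta * (outer_r ** 2 - inner_r ** 2)

def perceived_shares(shares, widths, inner_r=1.0):
    """Fraction of the total donut area taken by each slice."""
    areas = [annular_sector_area(s, inner_r, w) for s, w in zip(shares, widths)]
    total = sum(areas)
    return [a / total for a in areas]

shares = [0.6, 0.3, 0.1]                          # hypothetical mix
print(perceived_shares(shares, [0.5, 0.5, 0.5]))  # equal widths: faithful
print(perceived_shares(shares, shares))           # widths also follow the data
```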

The best thing about this chart is found in the little blue star, which adds context to the statistics. The 61% number is unusually high, which demands an explanation. The designer tells us it's due to the popularity of The Lion King.


This one donut is for the year 1994. The infographic actually shows an entire time series from 1994 to 2014.

The design is most unusual. The years 1994, 1999, 2004, 2009, 2014 receive special attention. The in-between years are split into two pairs, shrunk, and placed alternately to the right and left of the highlighted years. So your eyes are asked to zig-zag down the page in order to understand the trend. 

To see the change of U.S. movie ticket sales over time, you have to estimate the sizes of the red-orange donut slices from one pie chart to another. 

Here is an alternative visual design that brings out the two messages in this data: that French movie-goers are increasingly preferring French movies, and that U.S. movies no longer account for the majority of ticket sales.


A long-term linear trend exists for both U.S. and French ticket sales. The "outlier" values are highlighted and explained by the blockbuster that drove them.



1. You can register for the free seminar in Lyon here. To register for live streaming, go here.
2. Thanks to Carla Paquet at JMP for help translating from French.

Saying no thanks to a box of donuts

As I reported last week, the Department of Education for Delaware is running a survey on dashboard design. The survey link is here.

One of the charts being evaluated is a box of donuts, as shown below:


I have written before about the problem with donut charts (see here). A box of donuts is worse than one donut. Here, each donut references a school year. The composition by race/ethnicity of the student body is depicted. In aggregate, the composition has not changed drastically although there are small changes from year to year.

In the following alternative, I use side-by-side line charts, sometimes called slopegraphs, to illustrate the change by race/ethnicity.


The key decisions are:

  • using slopes to encode the year-to-year changes, as opposed to having readers compute those changes by measuring and dividing
  • using color to show insights (whether the race/ethnicity has expanded, contracted or remained stable across the three years) as opposed to definitions of the data
  • not showing that the percentages within each year sum to 100%, as opposed to explicitly presenting this fact in a circular arrangement
  • placing annual data side by side on the same plot region as opposed to separating them in three charts


There is still a further question of how big a change from year to year is considered material.
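One way to make the materiality question explicit is to pick a threshold and let it drive the color coding described above. The Python sketch below is a hypothetical illustration: the group names, percentages and the one-point threshold are all made up, and it compares only the first and last years.

```python
def classify_changes(series, threshold=1.0):
    """Label each group 'expanded', 'contracted' or 'stable' by its change in
    percentage points from the first to the last year. `threshold` is the
    smallest change treated as material (an assumption, not a standard)."""
    labels = {}
    for group, pcts in series.items():
        change = pcts[-1] - pcts[0]
        if change > threshold:
            labels[group] = "expanded"
        elif change < -threshold:
            labels[group] = "contracted"
        else:
            labels[group] = "stable"
    return labels

# Hypothetical shares (%) for three school years.
data = {
    "Group A": [50.0, 48.5, 47.0],
    "Group B": [15.0, 16.2, 17.1],
    "Group C": [30.0, 30.3, 29.9],
}
print(classify_changes(data))                 # one-point threshold
print(classify_changes(data, threshold=5.0))  # stricter threshold: every group is stable
```

Changing the threshold changes the colors, and hence the story, which is exactly why the question of materiality has to be settled first.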

This is a good example of why there is never "complete data." In theory, the numbers on this chart are "complete," and come from administrative records. Even ignoring the possibility that some of the records are missing or incorrect, you still have the issue that the set of students in the system varies from year to year, so a 1 percent increase in the proportion of Hispanic students may or may not indicate a real demographic trend.



When design goes awry

One can't accuse the following chart of lacking design. Strong is the evidence of departing from convention but the design decisions appear wayward. (The original link on Money here)



The donut chart (right) has nine sections. Eight of the sections (excepting A) have clearly all been bent out of shape. It turns out that section A does not have the right size either. The middle gray circle is not really in the middle, as seen below.


The bar charts (left) suffer from two ills. Firstly, the full width of the chart is at the 50 percent mark, so readers are forced to read the data labels to understand the data. Secondly, only the top two categories are shown, thus the size of the whole is lost. A stacked bar chart would serve better here.

Here is a bardot chart; the "dot" part of it makes it easier to see a Top 2 box analysis.


I explain the bardot chart here.


PS. Here is Jamie's version (from the comment below):




Layered donuts have excess fats and oils

Via Twitter, Nicholas S. sent this chart:


It's a layered donut. There isn't much context here except that the chart comes from USDA. Judging from the design, I surmise that the key message is the change in proportion by food groups between 1970 and 2014. I am assuming that these food groups are exhaustive so that it makes sense to put them in a donut chart, with all pieces adding up to 100%.

The following small-multiples line chart conveys most of the information:


The story is the big jump in "Added fats and oils". In the layered donut, the designer highlighted this with a moiré effect, something to be avoided.

Note the parenthetical 2010 next to the Added fats and oils label. The data for all other food groups come from 2014 but the number for the most important category is four years older. The chart would be more compelling if they used 2010 data for everything.

One piece of information is ostensibly absent in the line chart version: the growth in the size of the pie. The total of the data increased about 20% from 1970 to 2014. In theory, the layered donut can convey this growth by the perimeters of the circles. But it doesn't appear that the designer saw this as an important insight, since the total area of the outer donut is clearly more than 20% larger than the area of the inner donut.
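For reference, here is how much the outer circle's radius would have to grow under each reading, sketched in Python with the roughly 20% growth quoted above. Scaling the radius by 1.2 gets the perimeter right but inflates the area by 44%, while getting the area right needs a radius only about 9.5% larger. This is a geometric illustration, not a measurement of the USDA chart.

```python
import math

def radius_ratio(growth, encoding):
    """Outer-to-inner radius ratio that makes the chosen property of the
    circle grow by the factor `growth` (1.2 means +20%)."""
    if encoding == "perimeter":
        return growth              # circumference 2*pi*r is linear in r
    if encoding == "area":
        return math.sqrt(growth)   # area pi*r**2 is quadratic in r
    raise ValueError(f"unknown encoding: {encoding!r}")

for enc in ("perimeter", "area"):
    k = radius_ratio(1.2, enc)
    print(f"{enc}: radius x{k:.3f}, perimeter x{k:.3f}, area x{k ** 2:.3f}")
```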


An unsuccessful adaptation of a classic

Found this chart in Hemispheres magazine on board a United flight:


A quick self-sufficiency test reveals the biggest shortcoming of this visual presentation.


What would you guess is the difference in areas between the two white-ish sectors (pointing at 9 o'clock and 2 o'clock)? The actual numbers are 18.3% and 12.5%. So roughly, if one takes the 2-o'clock sector (right), halves it and adds it back to itself, one should obtain the area of the 9-o'clock sector (left). Clearly, the piece on the left is much too big.

The following chart shows the index of exaggeration increasing with the value of the data. (For example, the highest value of 18.3% is about 8 times the lowest value of 2.3%, but the ratio of the areas depicted is ~500 times.)


The distortion is larger than usual because the designer encodes the data twice, once in the angle of the sector and again in the radius. Both of those quantities contribute to the area of the sector.
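Assuming, as the chart appears to do, that both the angle and the radius are proportional to the value, the sector area grows with the cube of the data. A minimal Python check using the two extreme values quoted above; the proportionality constants are arbitrary assumptions and cancel in the ratio.

```python
def sector_area(value, angle_per_unit=0.1, radius_per_unit=0.1):
    """Area of a sector whose angle AND radius are both proportional to
    `value`: A = 0.5 * theta * r**2, so A grows with value**3.
    The scale constants are arbitrary; they cancel in any ratio."""
    theta = value * angle_per_unit
    r = value * radius_per_unit
    return 0.5 * theta * r ** 2

lo, hi = 2.3, 18.3                              # smallest and largest shares (%)
data_ratio = hi / lo                            # about 8
area_ratio = sector_area(hi) / sector_area(lo)  # about 500, i.e. data_ratio ** 3
print(f"data ratio {data_ratio:.2f}, area ratio {area_ratio:.0f}")
```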

Readers must look at the data in order to read this chart properly, therefore the visual elements are not self-sufficient. Further, if readers chose to perceive the relative sizes of the sectors, they would have misread the data massively.


The designer was probably inspired by the Nightingale rose diagram (link to Wikipedia):


In the original, Nightingale does not encode data into the angles. The circle is divided evenly into 12 pieces to display the 12 months of the year (she might have accounted for months having 28 to 31 days; it's hard to tell by inspection). The data is encoded once, along the radial axes.
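How big would that difference be? A quick Python sketch comparing equal 30-degree wedges with wedges weighted by days per month, using `calendar.monthrange` for month lengths in an arbitrary non-leap year (the choice of year is an assumption for illustration only). The largest gap is February's, roughly 2.4 degrees, which is consistent with it being hard to tell by eye.

```python
import calendar

def month_angles(year, by_days=False):
    """Angle in degrees of each of the 12 monthly wedges of a rose diagram:
    either equal 30-degree wedges or wedges proportional to days per month."""
    if not by_days:
        return [360 / 12] * 12
    days = [calendar.monthrange(year, m)[1] for m in range(1, 13)]
    return [360 * d / sum(days) for d in days]

equal = month_angles(1855)
by_days = month_angles(1855, by_days=True)  # 1855 is not a leap year
print(max(abs(a - b) for a, b in zip(equal, by_days)))  # largest gap in degrees (February)
```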

Another difference between the two charts is the ordering of the data. In Nightingale's version, the order is logically determined by the passing of time. In the Hemispheres chart, the order is chosen based on taste. A more natural order would be by the proportion of employment but I think the resulting chart would look like a snail's shell, or worse. I must say a more balanced "rose diagram" looks nicer but it forces my eyes to jump around to answer a simple question such as which are the top three employment sectors in San Francisco.

Two charts that fail self-sufficiency

My Twitter followers have been sending in several howlers.

Twitter (link) made a bunch of bold claims about its own influence by using the number of tweets about the Oscars as fodder. They also adopt the euphemism common to the digital marketing universe, the so-called "view", which, to their credit, they define as "how many times tweets are displayed to users". Yes, you read that right: displaying is the same as viewing in this world, and Twitter is just a follower, not a trendsetter, here.

For @dtellom, it is this bubble chart about the Ellen tweet that displeased him:



In the meantime, @wilte found this unfortunate donut chart, created by PWC in the Netherlands.


Both designers basically appropriated a graphical form and deprived it of data. In one, the designer threw the concept of scale to the wind. In the other, the designer dumped the law of total probability. In either case, the fundamental rationale for the particular graphical form is sacrificed.

Both are examples that fail our self-sufficiency test. This test says if a visual display cannot be understood unless the entire data set is printed on the chart, then why create a visual display? In both charts, if you block out the numbers, you are left with nothing!


The PWC chart was submitted by @graphomate, who also submitted the following KPMG chart:


The complaint was the total adding up to 101%. I'm not really bothered by this as it is a rounding issue. That said, I like to "hide" such rounding issues. I have never understood why it is necessary to display the imperfection. Flip a coin and remove the decimals from one of the categories!
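The "flip a coin" fix can be sketched in a few lines of Python: round every category, then let one randomly chosen category absorb whatever gap from 100 remains. The function name and the sample percentages are hypothetical. (A more systematic alternative, not the one suggested here, is the largest-remainder method, which nudges the categories whose fractional parts were closest to the rounding cutoff.)

```python
import random

def round_to_total(values, total=100, rng=None):
    """Round each value to a whole number, then let one randomly chosen
    category absorb any gap so that the rounded values sum to `total`.
    Assumes the unrounded values already sum to `total`."""
    rng = rng or random.Random()
    rounded = [round(v) for v in values]  # note: Python rounds exact halves to even
    gap = total - sum(rounded)
    if gap:
        rounded[rng.randrange(len(rounded))] += gap  # the coin flip
    return rounded

print(round_to_total([24.6, 25.6, 49.8]))  # naive rounding would give 25 + 26 + 50 = 101
```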

The exception to the rule against dual axes

Dual axes are almost always a bad idea. But there is one situation in which I'd use them.


Last week, Alberto Cairo (link) engaged in a Twitter/blogging debate about a chart that first appeared in Reuters concerning the state of women CEOs in Fortune 500 companies. Here is the chart under discussion:


This chart is already cleaner and more useful than the original, which came from a research report from Catalyst (link):


Jonathan Keller re-made the Reuters chart as follows:



Cairo Jorge Camões contributed this version:


The Voila blog (link) has yet another take:


Then Chris Moore, responding to Cairo, created this view and also left some insightful comments:



What's at stake here? There are really three related topics of discussion.

First, there is the matter of the upper limit of the vertical axis. Three solutions were suggested: 100 percent, 50 percent, and 4 percent. (Cairo at one point suggested 25 percent, which can be wrapped into the 50 percent bucket.) In reality, this is an argument over which of two key messages should be emphasized. The first message is that women still comprise a pathetically small proportion of Fortune 500 CEOs. The second message is more hopeful, that the growth in this proportion has been quite rapid since 1995.

All versions of the chart actually display both messages. In the Reuters chart (as well as Moore and Cairo), the message about the absolute proportion of women is given as an annotation, while the Keller and Voila versions extend the vertical axis, thus encoding this message directly into the chart. Conversely, the Keller and Voila versions deemphasize the growth in proportions, so I'd have preferred to see a note about that growth when using their versions.

Voila selects a 50% upper limit because the 50/50 split has an intuitive meaning in the context of gender balance. Because the resulting chart is so visually arresting, and so biased to one of the two key messages, I'd only consider it if the point of the display is to draw attention to the female deficit.


The second disagreement is over using absolute counts versus relative proportions. Moore chose absolute counts. I am in this camp as well, primarily because we are talking about the Fortune 500, and the number 500 is an idée fixe. In Moore's version, I find the data labels distracting since all the numbers are small and insignificant.

Finally, the linkage between the absolute and the relative numbers also produces multiple solutions. Cairo's post pinpoints this issue. His solution is to include an inset pie chart with an arrow to explicitly link the two views. Moore likes the inset idea, but experimented with a donut chart or a partition in place of the pie chart. He also removes the explicit guiding arrow.


It turns out this dataset is perfectly suited to dual axes. The absolute counts and relative proportions are in one-to-one correspondence because it's really only one data series expressed twice. This happy situation leads to one line that can be cross-referenced on two axes, one side showing counts and the other side showing proportions. This is shown in my version below (the orange line).


In addition to having two axes, I have plotted two related data series. The second series (in red) shows the incremental change in the number of women CEOs from the previous year (also shown in both counts and proportions).
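The reason the two axes are safe here is that the proportion is just the count divided by a fixed 500. A minimal Python sketch of that one-to-one mapping and of the year-over-year increments; the counts are made-up placeholders, not the Catalyst data, and the function names are my own.

```python
FORTUNE_500 = 500  # fixed denominator: every count maps to exactly one percentage

def count_to_pct(count):
    """Left-axis value (number of CEOs) -> right-axis value (percent of 500)."""
    return 100 * count / FORTUNE_500

def pct_to_count(pct):
    """Right-axis value -> left-axis value; the exact inverse of count_to_pct."""
    return pct * FORTUNE_500 / 100

def yearly_additions(counts):
    """Change in the count from the previous year (the second series)."""
    return [b - a for a, b in zip(counts, counts[1:])]

counts = [10, 12, 13, 15]  # hypothetical counts
print([count_to_pct(c) for c in counts])
print(yearly_additions(counts), [count_to_pct(d) for d in yearly_additions(counts)])
```

Because the conversion is linear and exactly invertible, the pair can be handed to a plotting library to draw the second axis; for instance, matplotlib's `Axes.secondary_yaxis` accepts a `(forward, inverse)` pair through its `functions` argument.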

The first series (the same one everyone plotted) draws attention to the hopeful message, that the growth rate of women CEOs has been quite strong since 1995. The second series is a bit of a downer on that message, suggesting that from the absolute count perspective, the progress (only one or two additions per year) has been painfully slow, and not that impressive.

Thanks again to Alberto for making me aware of this discussion. This has been fun!


PS. I have left out the other chart and may return to it in a future post.