Every chart, even if the dataset is small, deserves care. Long-time reader zbicyclist submits the following, which illustrates this point well.
The following comments are by zbicyclist:
This is from http://win.niddk.nih.gov/statistics/ -- from the National Institute of Diabetes and Digestive and Kidney Diseases, part of the U.S. National Institutes of Health.
The pie chart is terrible in a pedestrian way – a bar chart could be so much clearer, or even a table. You have to do too much work to match up the colors, numbers and labels on the pie chart.
To the right of the pie is a bar chart, but a bar chart in which the categories are nested – extreme obesity is part of obesity, extreme obesity and obesity are part of overweight or obesity. If we want to do something like this, there should be 3 charts (e.g. space on the x axis indicating a break). The normal expectation for a bar graph is that the categories are mutually exclusive. This problem is repeated in the Race/Ethnicity graph just below these.
Now, some comments by me.
Another issue of the design is inconsistency. The same color scheme is used in both charts but to denote different concepts.
Put yourself at the moment when you have just understood the chart on the left side. You have figured out that obesity is deep green while extreme obesity is light green. Now you shift your attention to the column chart. You expect the light green columns to indicate extreme obesity, and the deep green, obesity. And yet, the light/dark green represents a male-female split.
Here is a stacked column chart showing that females are more likely than males to be either extremely obese or not overweight. In other words, the female distribution has "fatter tails".
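A stacked column only works when the segments are mutually exclusive, so the nested categories in the original bar chart have to be un-nested first. Here is a minimal Python sketch of that subtraction. The percentages are made-up placeholders, not the NIDDK figures, and the function name and band labels are my own.

```python
def unnest(overweight_or_obese, obese, extremely_obese):
    """Turn nested (cumulative) percentages into mutually exclusive bands.

    Assumes extremely_obese <= obese <= overweight_or_obese <= 100.
    """
    if not (0 <= extremely_obese <= obese <= overweight_or_obese <= 100):
        raise ValueError("nested categories must shrink as they get more extreme")
    return {
        "not overweight": 100 - overweight_or_obese,
        "overweight, not obese": overweight_or_obese - obese,
        "obese, not extremely obese": obese - extremely_obese,
        "extremely obese": extremely_obese,
    }

# Hypothetical inputs for illustration only (not the published statistics).
bands = unnest(overweight_or_obese=70, obese=35, extremely_obese=5)
for label, pct in bands.items():
    print(f"{label}: {pct}%")
```

Once the bands are exclusive they sum to 100 by construction, which is what lets them be stacked into a single column per group.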
I learned the most upsetting thing about this chart when re-making it: the listed percentages on the pie chart added up to 106 percent.
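A check like the following would catch this kind of problem before publication. It is only a sketch: the slice labels and values are invented so that they total 106, and the one-point allowance for rounding is my own choice.

```python
def check_pie(shares, tolerance=1.0):
    """Return (total, ok); ok is True when the shares sum to 100 within the tolerance."""
    total = sum(shares.values())
    return total, abs(total - 100) <= tolerance

# Invented slices that happen to total 106; these are not the figures from the chart.
slices = {"A": 34, "B": 33, "C": 28, "D": 11}
total, ok = check_pie(slices)
print(total, ok)
```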
This is a case of the chart telling a different story from the data. Let's look at one of the charts, piece by piece.
The first pie(ce) suggests that methane and carbon dioxide (CO2) add up to some total. That is the only way to read a pie chart. A pie chart shows components of a whole.
What is the whole? It's hard to interpret without some explanation. The title at the bottom says "Radiative Forcing change over the last 30 years" with a footnote disclosing... hold your breath... "Radiative forcings from other gases and human impact are not shown."
In other words, the visual object says that the radiative forcing from CO2 is about 5 times larger than that from Methane. A column chart would have displayed this relative scale more clearly.
But that chart is only one of a pair. Here is the whole picture:
This pair tells a particular story: Methane was a much larger share of something in the past and is predicted to become an almost irrelevant share of something in the future.
But such an interpretation would almost surely be wrong. The designer left a misleading cue here, which is to show two pies of equal size. There is just no conceivable way that the total "radiative forcing change" is identical in the last 30 years to that in the next 30 years.
The second pie chart also has a footnote. Perhaps someone better informed can help me interpret the following sentence:
The radiative forcing that our current emissions have committed us to, 20 years from now, is based on a 300-year initial drawdown time scale for carbon dioxide, and 12 years for methane
I'm sure these words say something to a climate expert but this attempt stinks as a piece of public communication.
Returning to the equal-size pies for a moment. Since all other factors are removed, the chart only shows us the relative impact of Methane versus Carbon dioxide. If the data are to be believed, then the scale of the impact of Methane is expected to become much smaller relative to that of CO2 in the next 30 years. This does not imply that the absolute impact of Methane will be lower in the future than in the past.
There are three possible stories, all consistent with the above chart:
1) the absolute impact of Methane declines while the absolute impact of CO2 increases, and thus the relative impact of Methane decreases drastically
2) the absolute impacts of both decline but the impact of Methane declines a lot more
3) the absolute impacts of both increase but Methane's impact grows a lot more slowly
It is the designer's job to make the story of the data clear to readers.
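To see that all three stories produce the same pair of pies, here is a small Python sketch. The absolute numbers are invented, in arbitrary units, and chosen only so that Methane starts at one-sixth of the pie (CO2 about 5 times larger) and ends at the same much smaller share in every scenario. Nothing here comes from the actual radiative forcing data.

```python
from fractions import Fraction

def methane_share(methane, co2):
    """Share of the two-slice pie that belongs to methane."""
    return Fraction(methane, methane + co2)

# All numbers are invented, in arbitrary units.
past = {"methane": 10, "co2": 50}
futures = {
    "1) methane falls, CO2 rises": {"methane": 3, "co2": 60},
    "2) both fall, methane falls more": {"methane": 2, "co2": 40},
    "3) both rise, methane rises less": {"methane": 11, "co2": 220},
}

print("past share:", methane_share(**past))
for story, values in futures.items():
    print(story, "-> future share:", methane_share(**values))
```

Every scenario yields the same future share, so two equal-size pies cannot tell these stories apart.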
The fact that the entire blog post contains a PDF image and no words is either laziness or arrogance. The title of the piece is "the story of methane, in five pie charts". I don't know what the story of methane is. I doubt that the intention of the author was to tell us that methane is extremely unimportant relative to CO2.
PS. Steven below linked to a response from RealClimate.org. They confirm that the "story of methane" is that it is unimportant relative to CO2. Perhaps they should have called it the "non-story of methane". They see no problem with these pie charts.
Carl Bialik used to be the Numbers Guy at the Wall Street Journal - he's now with FiveThirtyEight. Apparently, he left a huge void. John Eppley sent me to this set of charts via Twitter.
This chart about Citibike is very disappointing.
Using the Trifecta checkup, I first notice that it addresses a stale question and produces a stale answer. The caption below the chart says "the peak times ... seem to be around 9 am and 6 pm." What a shock!
I sense a degree of meekness in using "seem to be". There is not much to inspire confidence in the data: rather than the full statistics which you'd think someone at Citibike has, the chart is based on "a two-day sample last autumn". The number of days is less concerning than the question of whether those two autumn days are representative of the year. Curious readers might want to know what data was collected, how it was collected, and the sample size.
Finally, the graph makes a mess of the data. While the black line appears to be data-rich, it is not. In fact, the blue dots might as well be randomly scattered and connected. As you can see from the annotations below, the scale of the chart makes no sense.
Plus, the execution is sloppy, with a missing data label.
The next chart is not much better.
The biggest howler is the choice of pie charts to illustrate three numbers that are not that different.
But I have to say the chart raises more questions than it answers. I am not an expert in pregnancy but doesn't a pregnant woman's weight include the weight of the baby she's carrying? So the more weight the woman gains, on average, the heavier her baby. What a shock!
The last and maybe the least is this chart about basketball players in the playoffs.
It's the dreaded bubble chart. The players are arranged in a perplexing order. I wonder if there is a natural numbering system for basketball positions (center = #1, etc.), like there is in soccer. Even if there is such a natural numbering system, I still question the decision to confound that system with a complicated ranking of current-year playoff players against all-time players.
Above all, the question being asked is uninteresting, and so the chart is uninformative. A more interesting question to me is whether the best players are playing in this year's playoffs. To answer this question, the designer should be comparing only currently active players, and showing the all-time ranks of those players who are playing in the playoffs versus those who aren't.
Dual axes are almost always a bad idea. But there is one situation in which I'd use them.
Last week, Alberto Cairo (link) engaged in a Twitter/blogging debate about a chart that first appeared in Reuters concerning the state of the woman CEO in the Fortune 500 companies. Here is the chart under discussion:
This chart already is cleaner and more useful than the original original, which came from a research report from Catalyst (link):
Jonathan Keller re-made the Reuters chart as follows:
Then Chris Moore, responding to Cairo, created this view and also left some insightful comments:
What's at stake here? There are really three related topics of discussion.
First, there is the matter of the upper limit of the vertical axis. Three solutions were suggested: 100 percent, 50 percent, and 4 percent. (Cairo at one point suggested 25 percent, which can be wrapped into the 50 percent bucket.) In reality, this is an argument over which of two key messages should be emphasized. The first message is that women still make up a pathetically small proportion of Fortune 500 CEOs. The second message is more hopeful, that the growth in this proportion has been quite rapid since 1995.
All versions of the chart actually display both messages. In the Reuters chart (as well as Moore and Cairo), the message about the absolute proportion of women is given as an annotation while the Keller and Voila versions extend the vertical axis, thus encoding this message directly in the chart. Conversely, the Keller and Voila versions deemphasize the growth in proportions, and so I'd have preferred to see a note about that growth when using their versions.
Voila selects a 50% upper limit because the 50/50 split has an intuitive meaning in the context of gender balance. Because the resulting chart is so visually arresting, and so biased to one of the two key messages, I'd only consider it if the point of the display is to draw attention to the female deficit.
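The trade-off can be put in numbers: the taller the axis, the smaller the fraction of the plot height the line can occupy. A rough Python sketch, assuming a hypothetical peak share of 4 percent (an assumption for illustration, not a figure read off any of the charts):

```python
def height_used(peak_pct, axis_max_pct):
    """Fraction of the vertical axis taken up by the highest data point."""
    return peak_pct / axis_max_pct

peak = 4.0  # hypothetical peak share of women CEOs, in percent
for axis_max in (100, 50, 4):
    used = height_used(peak, axis_max)
    print(f"axis to {axis_max}%: line peaks at {used:.0%} of the height")
```

A tall axis makes the small proportion unmistakable but flattens the growth; a short axis does the reverse.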
The second disagreement is in using absolute counts versus relative proportions. Moore chose absolute counts. I am in this camp as well. This is primarily because we are talking about Fortune 500 and the 500 number is an idee fixe. In Moore's version, I find the data labels distracting since all the numbers are small and insignificant.
Finally, the linkage between the absolute and the relative numbers also produces multiple solutions. Cairo's post pinpoints this issue. His solution is to include an inset pie chart with an arrow to explicitly link the two views. Moore likes the inset idea, but experimented with a donut chart or a partition in place of the pie chart. He also removes the explicit guiding arrow.
It turns out this dataset is perfectly made for the dual axes. The absolute counts and relative proportions are in one to one correspondence because it's really only one data series expressed twice. This happy situation leads to one line that can be cross-referenced on two axes, one side showing counts and the other side showing proportions. This is shown in my version below (the orange line).
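The reason the two axes never disagree is that one is just the other divided by 500. Here is a minimal Python sketch of that mapping; the tick values are illustrative, not read off my chart. In a plotting library such as matplotlib, the same pair of functions could be handed to `Axes.secondary_yaxis` through its `functions` argument, though that step is not shown here.

```python
TOTAL_COMPANIES = 500  # the Fortune 500

def count_to_pct(count):
    """Left axis (number of women CEOs) -> right axis (percent of the Fortune 500)."""
    return 100 * count / TOTAL_COMPANIES

def pct_to_count(pct):
    """Right axis -> left axis; the inverse of count_to_pct."""
    return pct * TOTAL_COMPANIES / 100

# Illustrative tick positions on the count axis.
for count in (0, 5, 10, 15, 20):
    print(f"{count:>2} CEOs  <->  {count_to_pct(count):.0f}%")
```

Because the conversion is a fixed rescaling, every gridline on one side lines up with a gridline on the other, which is exactly what fails when a dual-axis chart carries two unrelated series.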
In addition to having two axes, I have plotted two related data series. The second series (in red) shows the incremental change in the number of women CEOs from the previous year (also shown in both counts and proportions).
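The second series is simply the first difference of the first. A sketch in Python, using an invented run of counts rather than the real data:

```python
def yearly_change(counts):
    """Year-over-year change; the result is one element shorter than the input."""
    return [later - earlier for earlier, later in zip(counts, counts[1:])]

# Invented counts of women CEOs for consecutive years (not the Catalyst data).
counts = [2, 2, 3, 5, 6, 8, 9]
changes = yearly_change(counts)
print(changes)                           # changes in counts
print([100 * c / 500 for c in changes])  # the same changes in percentage points
```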
The first series (the same one everyone plotted) draws attention to the first message, that the growth in the number of women CEOs has been quite strong since 1995. The second series is a bit of a downer on that message, suggesting that from the absolute count perspective, the progress (only one or two additions per year) has been painfully slow, and not that impressive.
Thanks again to Alberto for making me aware of this discussion. This has been fun!
PS. I have left out the other chart and may return to it in a future post.