
Hypothesis: pie charts are correlated with muddled thinking. But which direction is the causal arrow?

RealClimate (link) deserves a pie in the face for printing a set of pie charts. (Thanks to @guitarzan for the tip.)

This is a case of the chart telling a different story from the data. Let's look at one of the charts, piece by piece.

The first pie(ce) suggests that methane and carbon dioxide (CO2) add up to some total. That is the only way to read a pie chart. A pie chart shows components of a whole.

What is the whole? It's hard to interpret without some explanation. The title at the bottom says "Radiative Forcing change over the last 30 years" with a footnote disclosing... hold your breath... "Radiative forcings from other gases and human impact are not shown."

Really, RealClimate?

In other words, the visual object says that Radiative forcing from CO2 is about 5 times larger than that of Methane. A column chart would have displayed this relative scale more clearly.


But that chart is only one of a pair. Here is the whole picture:


This pair tells a particular story: Methane was a much larger share of something in the past and is predicted to become an almost irrelevant share of something in the future.

But such an interpretation would almost surely be wrong. The designer left a misleading cue here, which is to show two pies of equal size. There is just no conceivable way that the total "radiative forcing change" is identical in the last 30 years to that in the next 30 years.

The second pie chart also has a footnote. A better person can help me interpret what the following sentence means:

The radiative forcing that our current emissions have committed us to, 20 years from now, is based on a 300-year initial drawdown time scale for carbon dioxide, and 12 years for methane

I'm sure these words say something to a climate expert but this attempt stinks as a piece of public communication.


Returning to the equal-size pies for a moment. Since all other factors are removed, the chart only shows us the relative impact of Methane versus Carbon dioxide. If the data are to be believed, then the scale of the impact of Methane is expected to become much smaller relative to that of CO2 in the next 30 years. This does not imply that the absolute impact of Methane will be lower in the future than in the past.

There are three possible stories, all consistent with the above chart:

1) the absolute impact of Methane declines while the absolute impact of CO2 increases, and thus the relative impact of Methane decreases drastically

2) the absolute impacts of both decline but the impact of Methane declines a lot more

3) the absolute impacts of both increase but Methane's impact grows a lot more slowly
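To make the three stories concrete, here is a minimal sketch in Python. All the numbers are made up for illustration (they are not RealClimate's data). Each scenario starts from the same hypothetical past values, and each one ends with methane holding a smaller slice of the pie.

```python
# Hypothetical forcing values in arbitrary units; NOT taken from the RealClimate charts.
past = {"methane": 1.0, "co2": 5.0}

# One invented future per story in the list above.
scenarios = {
    "1) methane falls, CO2 rises": {"methane": 0.5, "co2": 8.0},
    "2) both fall, methane falls more": {"methane": 0.2, "co2": 4.0},
    "3) both rise, methane rises more slowly": {"methane": 1.1, "co2": 10.0},
}


def methane_share(values):
    """Methane's slice of a two-slice pie."""
    return values["methane"] / (values["methane"] + values["co2"])


past_share = methane_share(past)
future_shares = {name: methane_share(v) for name, v in scenarios.items()}

print(f"past methane share: {past_share:.0%}")
for name, share in future_shares.items():
    print(f"{name}: future methane share {share:.0%}")
```

A pair of equal-size pies plots only the shares, so it cannot distinguish between these three futures.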

It is the designer's job to make the story of the data clear to readers.


The fact that the entire blog post contains a PDF image and no words is either laziness or arrogance. The title of the piece is "the story of methane, in five pie charts". I don't know what the story of methane is. I doubt that the intention of the author was to tell us that methane is extremely unimportant relative to CO2.


PS. Steven below linked to a response from RealClimate.org. They confirm that the "story of methane" is that it is unimportant relative to CO2. Perhaps they should have called it the "non-story of methane". They see no problem with these pie charts.





A little something while I'm away

Note: This is cross-posted to both my blogs.

I have been on vacation. Regular posts will resume next week. Before then, here is a little something for you.

ASA News recently asked me to describe "a day in the life of" a business statistician. My response is here, together with responses by two others. 

As a statistician, I am worried about describing an "average" day for an "average" statistician. Better think of this as a day selected from a normal distribution that is close to its center.


If you have even more time, Andrew Gelman thinks these six posts are worth re-reading.

Happy weekend!


Relevance, to you or me: a response to Cairo

Alberto Cairo discussed a graphic by the New York Times on the slowing growth of Medicare spending (link).

The chart on the top is the published one, depicting the quite dramatic flattening of the growth in average spending over the last few years--average being the total spend divided by the number of Medicare recipients. The other point of the story is that the decline is unexpected, in the literal sense that the Congressional Budget Office planners did not project its magnitude. (The planners did take the projections down over time so they did project the direction correctly.)

Meanwhile, Cairo asked for a chart of total spend, and Kevin Quealy obliged with the chart shown at the bottom. It shows almost straight line growth.

Cairo's point is that the average does not give the full picture, and we should aim to "show all the relevant data".


I want to follow that line of thinking further.

My first reaction is that Cairo did not say "show all the data", he said "show the relevant data". That is a crucial difference. For complex social problems like Medicare, and in general, for "Big Data", it is not wise to show all the data. Pick out the data of interest, and focus on those.

A second reaction. How can "relevance" be defined? Doesn't it depend on what the question is? Doesn't it depend on the interests and persuasion of the chart designer (or reader)? One of the key messages I wish to impart in my book Numbersense (link) is that reasonable people using uncontroversial statistical methods to analyze the same dataset can come to different, even opposite, conclusions. 

Statistical analysis is concerned with figuring out what is relevant and what isn't. This is no different from Nate Silver's choice of signal versus noise. Noise is not just what is bad but also what is irrelevant.

In practice, you present what is relevant to your story. Someone else will do the same. The particular parts of the data that support each story may be different. The two sides have to engage each other, and debate which story has a greater chance of being close to the truth. If the "truth" can be verified in the future, the debate is more easily settled.

Unfortunately, there is no universal standard of relevance.


Going back to the NYT story. The chart on total Medicare spending is not as useful as it may seem. This is because an aggregate metric like this for a social phenomenon is influenced by a multitude of factors. Clearly, population growth is a notable factor here. When they use the word "real", I don't know if this means actualized (as opposed to projected), or "in real terms" (that is, inflation adjusted). If not the latter, the value of money would be another factor affecting our interpretation of the lines.

Without some reference levels for population and value of money, it is hard to interpret whether the straight-line growth implies higher or lower spending intensity. For the second chart, I suggest plotting the growth in the number of Medicare recipients. I believe one of the goals of the Affordable Care Act is to reduce the ranks of the uninsured so a direct depiction of this result is interesting.

The average spend can be thought of as population-adjusted. It is a more interpretable number -- but as Cairo pointed out, it is also narrow in scope. This is a tradeoff inherent in all of statistics. To grow understanding, we narrow the scope; but as we focus, we lose the big picture. So, we compile a set of focal points to paint a fuller picture.
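The link between the two charts is simple arithmetic: average spend is total spend divided by the number of recipients. The sketch below uses made-up numbers (not the CBO or NYT figures) to show how a total that grows in a straight line can coexist with a flat average when enrollment grows alongside it.

```python
# Hypothetical figures for illustration only; NOT the actual Medicare data.
years = [2010, 2011, 2012, 2013, 2014]
total_spend_bn = [500, 520, 540, 560, 580]      # straight-line growth, $ billions
recipients_mn = [50.0, 52.0, 54.0, 56.0, 58.0]  # enrollment, millions of people

# $ billions divided by millions of people gives $ thousands per person.
avg_spend_k = [t / r for t, r in zip(total_spend_bn, recipients_mn)]

for year, total, people, avg in zip(years, total_spend_bn, recipients_mn, avg_spend_k):
    print(f"{year}: total ${total}bn, recipients {people}m, average ${avg:.1f}k")
```

In this invented series the total rises every year while the average does not move at all, which is why the total by itself says little about spending intensity.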



Try a new way of learning dataviz; course announcement




Fall 2014 (Oct 6 - Nov 24, Mondays 6:30-9:30)

New York University

Instructor: Kaiser Fung

Location: New York City


Learn how to make knock-out data visualization in an innovative, immersive and fun setting, with classmates who are similarly passionate about making the numbers speak visually.


The class is conducted in the style of creative-writing workshops. Each student will focus on one data visualization project during the term, and gain knowledge through drafting and revisions, offering and receiving critique, and above all, learning from others.


You will develop a discriminating eye for good visualizations. For students enrolled in the Certificate in Data Visualization, the course offers an ideal setting to demonstrate mastery of the integrated approach combining the perspectives of statistical graphics, graphical design, and information visualization.


Prerequisite: We welcome students from all backgrounds. A more diverse class makes a better experience for everyone. In order to be a full participant in the course, you should have prior experience making data graphics for an audience (broadly defined), and feel comfortable offering critique of others’ work.


Because of the workshop structure, enrollment is limited to 12 students. Enroll now to reserve your spot.



Playing with orientation and style

I saw this nifty chart in the Wall Street Journal last week. The Post Office is competing with Fedex and UPS on pricing. The nice feature about this small dataset is that the story is very clear. In almost every setting, the old USPS prices were higher than those of Fedex and UPS, but have been reduced to below those levels.


Below are a couple of different looks. I like the vertical scale for prices better. Long-time readers will know I prefer the second version with lines.



Exquisite chart by-of-for academics

This chart published in Harvard Magazine has won my heart.


It is well executed in many ways. The chart illustrates a study of time spent by assistant and associate professors. It focuses specifically on time spent working versus time spent on household chores. One of the obvious questions of the study is whether female professors are disadvantaged when they have family obligations.

The general visual framework is the profile chart. Four segments of professors are arranged left to right from single with no children to married, with children and both parents working or single parent. The chart makes these points clear:

  • Having children adds about 15-30 hours to time spent on household duties, per partner
  • Household duties are not evenly split by gender, with the expected bias. (Of course, this observation must be carefully vetted. The men and women are not married to each other, even on the right side of the chart. But I presume the usual interpretation should hold.)
  • Male professors with kids do spend more time on household chores than those without but not as much as female professors with kids

In the meantime, the amount of time spent working is about the same for all four segments, raising a side question: what other activities got displaced? The juxtaposition of the lines allows us to see that the displaced hours are almost 50 percent of the total time spent working! What did they do less of?

I especially like the explicit depiction and labeling of the "gender gap" (the orange vertical lines). Also, the use of median hours instead of average hours.

My one little complaint is that the designer forgot to tell us the hours are on a weekly basis (I'm guessing here). Just adding "per week" after "median hours" would have fixed this.


One simple chart cannot address all possible questions on such a complicated subject. I like the restraint the designer exercised in not saddling the chart with too many questions.

I will just mention one tricky statistical issue. Getting tenure and making babies are both activities that occur within some time window in a professor's life, if at all. So there is a survivorship bias. The professors who receive tenure drop out of the picture. If you are older, and still in the pool, you probably are less "accomplished" from the perspective of the tenure-granting process. The longer you stay in that pool, the more likely you will have gotten married and/or have children--thus, there is an age bias going from left to right, as well as a survivorship bias. This implies that the characteristics of the professors in the four groups are likely to be different not just on their marital and child-rearing statuses but also on age and probability of tenure.
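A toy simulation can illustrate the age bias. This is only a sketch under invented assumptions: the yearly probabilities of tenure and of a first child, the ten-year window, and the sample size are all hypothetical and have nothing to do with the study behind the chart.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

P_TENURE = 0.15  # assumed yearly chance of getting tenure (and leaving the pool)
P_CHILD = 0.10   # assumed yearly chance of having a first child
N = 20000        # simulated professors


def simulate_one():
    """Return (years_in_pool, has_children), or None if tenured before the survey."""
    years = random.randint(0, 10)  # years on the tenure track at survey time
    has_children = False
    for _ in range(years):
        if random.random() < P_TENURE:
            return None  # received tenure: drops out of the picture
        if not has_children and random.random() < P_CHILD:
            has_children = True
    return years, has_children


pool = [p for p in (simulate_one() for _ in range(N)) if p is not None]
with_kids = [years for years, kids in pool if kids]
without_kids = [years for years, kids in pool if not kids]

mean_with = sum(with_kids) / len(with_kids)
mean_without = sum(without_kids) / len(without_kids)

print(f"still in the pool: {len(pool)} of {N}")
print(f"mean years in pool, with children:    {mean_with:.1f}")
print(f"mean years in pool, without children: {mean_without:.1f}")
```

In this toy model the professors with children have, on average, spent more years in the pool than those without, so the groups differ on more than their family status.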