Hiding the message with bar charts
Sep 15, 2005
Since the purpose of data visualization is to reveal insights, a chart that hides its message is bad; the reader can recognize a bad chart within seconds.
Sometimes, good advice can be taken to extremes. Here is an example from the Wall Street Journal (on the left):
While the caption informs us that sales of Civic cars have declined since 1998, this message is buried in tall, thick bars of similar height, and the downward trend is not at all obvious. Possibly, the designer took to heart the advice, which I'm sure a guru or two has written about, that honest chartists start the vertical axis at 0 so as not to exaggerate small differences. This tip is somewhere in one of the seminal books by Tufte or by Wainer.
However, for this chart, the difference is the message, and starting the axis at 0 serves to hide this message. The junkchart version fixes the vertical scale and instead uses a line chart. The downward slope is now clear, but we also observe that sales have fluctuated over the years. Prof. Gelman, a sometime visitor here, has also suggested that line charts are often more effective than bar charts.
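To put a rough number on how much the axis baseline changes what the eye sees, here is a minimal Python sketch. It is only an illustration: the two sales figures (324,000 and 300,000) are the ones quoted in the discussion of this chart, not verified data, and the truncated baseline of 290,000 and the helper name `visual_ratio` are assumptions made up for this example.

```python
def visual_ratio(high, low, baseline=0.0):
    """Ratio of the shorter drawn bar to the taller one when the axis starts at `baseline`."""
    if baseline >= low:
        raise ValueError("baseline must be below the smaller value")
    return (low - baseline) / (high - baseline)


# Figures quoted in the discussion (illustrative, not verified sales data).
high, low = 324_000, 300_000

# Actual relative decline in the data.
relative_drop = (high - low) / high

# Bar-height ratio with the axis starting at 0.
ratio_zero = visual_ratio(high, low, baseline=0)

# Bar-height ratio with a hypothetical truncated axis starting at 290,000.
ratio_trunc = visual_ratio(high, low, baseline=290_000)

print(f"relative drop in the data: {relative_drop:.1%}")
print(f"shorter/taller bar, axis at 0: {ratio_zero:.2f}")
print(f"shorter/taller bar, axis at 290,000: {ratio_trunc:.2f}")
```

With the axis at 0, the two bars are nearly the same height, which is why the decline is hard to see. With a truncated axis, the same decline occupies most of the plot. Which of the two is the more honest picture depends on whether 0 is a meaningful comparison for the story being told.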
Reference: "Honda, in a Funk, Tries to Revive the Civic's Virtues", Wall Street Journal, Sept 9, 2005.
I would keep the zero line. But I think the key is to add information. I'd take the x-axis back to 1975 (or whenever it was that the Civic was introduced) and I'd put in a few more lines for other cars (maybe other Hondas, maybe other small cars like Subarus; I dunno, I'm not the automotive reporter).
Making a trend out of only 5 years seems to me to be short-term thinking; I'd expect the Wall Street Journal to know better.
P.S. "How to Lie With Statistics" (by Huff, from the 1950s) has the start-at-zero rule. Tufte or Cleveland or somebody else actually made a point of saying you don't have to start at 0. But in this particular example I actually would.
Posted by: Andrew Gelman | Sep 15, 2005 at 08:58 PM
The writer seems to be trying to make two points:
1. That sales have declined since 1998.
2. That there was a peak in 1998.
The graph only addresses point 1. I'd go with the line chart with the zero line but with data from several years before 1998 added in.
Posted by: Tom Fanshawe | Sep 16, 2005 at 06:49 AM
Both Andrew and Tom suggest adding data before 1998 to help set the context. Both Andrew and Tom think that the zero line should be shown.
As it happens, I have a graphic that looks at both these ideas, inspired by a global-warming skeptic's observation that temperatures have been decreasing since 1998 (the same year as in the Honda graphic). The supporting evidence for this (true) claim could be something like the upper left panel of the graphic here.
The graphic shows global annual mean temperature in four ways: the two left-hand panels show the data from 1998-2004 while the right-hand panels add data from 1856 onward. The top two panels show the range of the data; the bottom two show the zero line. In each panel I have highlighted the 1998 data point in red.
I would have said that of these four presentations the best "context" to evaluate the claim would be the upper right panel, which adds data but omits the zero line.
Posted by: Robert | Sep 17, 2005 at 12:47 AM
Robert, you saved me the effort to do the plots! The start-at-0, in my view, should be applied with discretion. In this case, I'd prefer to highlight the annual changes, which would be squeezed if the axis starts at 0.
The other point about identifying 1998 as the peak is well taken.
Also, thanks Prof. Gelman for correcting the references. Huff's book is here
Posted by: Kaiser | Sep 17, 2005 at 01:14 AM
Start-at-0 is relevant when 0 is a meaningful comparison. Zero Kelvin is so far from the earth's temperature that it is a terrible idea to include it on the plot. In contrast, zero sales are relevant, especially if you're looking at car sales, which could very well vary by a factor of two or more over decades. For that matter, if you start the graph with the introduction of the Civic in 1973, zero is a very reasonable comparison since they didn't sell any in 1972! The point is, if you look at car sales over any reasonable length of time, you'll see a lot of variation. Including zero gives an immediate visual sense of how large this variation is compared to the total sales. I'm no expert on car sales, but my guess is that a drop from 324,000 to 300,000 really isn't so much from a historical perspective. In contrast, yeah, one degree of global temperature is a lot. Comparing to zero Kelvin just isn't so relevant.
The point of start-at-0 is really that if there is a reasonable comparison point it should be shown. When plotting %Republican share of vote over time, I don't bring the plot down to 0 or up to 1, but I put a dotted line at 50%.
Posted by: Andrew Gelman | Sep 17, 2005 at 09:45 PM
Of course, zero Kelvin is a silly starting point. I could have started at zero Celsius and it still would have been silly.
I tell my students that they have to figure out what the story they want to tell is, and that helps to determine the context. If they're telling a story about the level of sales compared to other cars, they very well may (though sometimes not) include the zero. If they're telling a story about the decrease in sales, which this article was supposed to be about, they ought to include the context to tell if the sales were decreasing or if it was just bouncing around or if 1998 was an unusually high year. In that sense, the top of the sales distribution is arguably more relevant than its bottom.
Posted by: Robert | Sep 18, 2005 at 06:12 AM
Yes, zero Celsius is silly too. Although, if you were making a graph of freezing points of various liquids, I would like to see zero Celsius there as a reference line. But for car sales, even with just one car model, I'd be inclined to include 0. It's not a rigid rule, and I'm sure there are cases where it would be better not to include 0 in similar situations, but for the Honda example, I think zero is very relevant to see the total range of the data.
I think that in this case we only differ in that I'd include 0 as a default for the car sales, and you would include 0 as an option.
I agree totally about "telling the story" and, in fact, explain to students that the most important audience for their story is . . . themselves. Many times I've made the effort to produce a nice-looking graph, and as a result I've learned something important.
Sort of like how, if you have to write your ideas up formally for your boss (or for a scientific journal), the effort required to make things clear for others often pays off in making things clear for oneself.
Posted by: Andrew Gelman | Sep 18, 2005 at 11:40 AM
"I agree totally about 'telling the story' and, in fact, explain to students that the most important audience for their story is . . . themselves."
It's kinda interesting how much our teaching philosophies seem to overlap--though I tell students that having oneself as the audience is necessary but insufficient. Working on a chart has often helped me to understand something but sometimes that's not enough. Usually junkcharts get produced because of a lack of understanding; too often they get produced in spite of it. Understanding and communicating that understanding are different things, much to my regret: that's why I don't have tenure.
Posted by: Robert | Sep 19, 2005 at 03:35 AM
I just came across Tufte's argument concerning the zero point: "By the way, real scientists don't show the zero point; they show the data. In general, the zero point should only be shown if it occurs reasonably near the range of the actual data. Instead of empty space vertically reaching down to a number which never occurs empirically, the way to show context is more data horizontally. Note that The Visual Display of Quantitative Information never recommends showing zero-points. See pp. 74-75 for a sequence of displays that provide increasing context by showing more data horizontally rather than reaching down to a zero point." (http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0000Jr&topic_id=1&topic=Ask%20E%2eT%2e)
Posted by: 失踪 | Aug 15, 2006 at 06:03 PM