When not to use bars
Nov 29, 2005
Hope everyone had a great Thanksgiving. This weekend, I came across two examples of poorly executed bar charts, both from the Economist. (More on bar charts here, here, here, here.)
In both cases, an additional symbol (a range, a dot) was superimposed on the bar chart, which is both obfuscating and ugly. The superimposed symbols make it painfully clear that each bar contains only one piece of data, completely indicated by its top edge; in other words, one can replace any bar with just its top edge, which is what I have done in each case.
In the first example, the baseline estimates of people living with HIV show up more clearly. (I'm not sure why upper and lower estimates are included for past years, for which official counts should exist.)
In the second example, the focus on the gap between official and actual retirement ages is restored and emphasized.
It would not be proper to sign off without revisiting the start-at-zero rule (start here or here). In both the above charts, I have chosen not to start at zero. I assume that the point of these charts is to illustrate recent changes in the depicted variables. (Andrew will want to see longer time series, I'm sure.) If I start these charts at zero, I run into difficulty choosing the spacing of the tick labels: the differences are squeezed into a small range (due to the narrow date range), so to capture them I'd have to use a lot of ticks, most of which would sit outside the range of the data and be useless!
Reference: "Spin Doctors" and "Must Try Harder", Economist, Nov 26, 2005.
I agree that your version of the HIV chart looks better, but I'm not sure about the chart for retirement age. You've certainly maximized the data-to-ink ratio, but now the points just kind of hang out there, unmoored. Probably the most significant point the data make is that Japan's official retirement age does not correspond with reality (nor with the official retirement ages of other industrialized countries). However, the only visual clue to this fact is the presence of a faint gray dot.
These data are similar to the situation where there is a formula that predicts something (e.g. monthly energy use) and we want to compare the predicted and actual values. One solution I have tried -- and I'm not sure it's the right one -- is to connect the actual and predicted values with a line. The line is red if actual is less than predicted, black otherwise.
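A minimal sketch of that coloring rule, in Python. The labels and numbers are made-up placeholders (not real energy data), and the function name connector is hypothetical; the output could be handed to whatever plotting tool is in use.

```python
# Hypothetical (predicted, actual) pairs; placeholder values, not real data.
pairs = {"Jan": (100.0, 92.0), "Feb": (90.0, 95.0), "Mar": (80.0, 80.0)}

def connector(predicted, actual):
    """Return the endpoints and color of the line joining predicted to actual."""
    # Red if actual is less than predicted, black otherwise (ties included).
    color = "red" if actual < predicted else "black"
    return (predicted, actual), color

segments = {label: connector(p, a) for label, (p, a) in pairs.items()}

for label, (ends, color) in segments.items():
    print(label, ends, color)
```

Note that a tie (actual equal to predicted) falls under "otherwise" and is drawn in black.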
Posted by: John S. | Nov 29, 2005 at 11:43 AM
I think that one of the reasons the bar chart is so popular is that it paints broad strokes of ink (particularly striking when color is used), giving the figure a kind of visual punch. The original figures above can be seen from halfway across a room, whereas the redrawn versions nearly disappear (admittedly I'm not wearing my glasses, but I think the point holds). However, the redrawn HIV figure could achieve a similar effect using vertical colored boxes with light horizontal lines indicating the baseline estimates.
Another reason for the popularity of bar charts may be that they connect the label with the estimate, thereby avoiding the points-hanging-out-there effect that John S noted in the redrawn retirement age figure.
Regarding the retirement age figure, John S suggests connecting pairs of values with lines. An arrow might work, since its direction could be readily interpreted. But maybe the data shown in the figure are incomplete. The official retirement age in a country is a single number, but I'm guessing the "effective retirement age" is a mean. It might be more informative to see the mean ± 1 or 2 standard deviations (or perhaps the 25th, 50th, and 75th percentiles).
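To make the two summaries concrete, here is a rough sketch using Python's standard statistics module. The list of ages is an invented placeholder for one country's individual retirement ages; the Economist figure does not supply such data.

```python
import statistics

# Hypothetical individual retirement ages for one country (placeholder values).
ages = [58, 60, 61, 62, 63, 65, 66, 68]

mean = statistics.mean(ages)
sd = statistics.stdev(ages)  # sample standard deviation

# Interval covering the mean plus or minus one standard deviation.
band = (mean - sd, mean + sd)

# quantiles(n=4) returns three cut points: the 25th, 50th, and 75th percentiles.
q25, q50, q75 = statistics.quantiles(ages, n=4)

print("mean and 1-SD band:", mean, band)
print("quartiles:", q25, q50, q75)
```

Either the band or the three quartiles could then be drawn as a short vertical range next to the single official retirement age.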
Posted by: Nick Barrowman | Nov 29, 2005 at 11:48 PM
Further to my comment above, I have put an example of the alternative display I suggested on my blog. I'd be very interested in feedback!
Posted by: Nick Barrowman | Nov 30, 2005 at 07:14 PM
Hi Nick, thanks for your thoughtful comments. The reason why I prefer lines rather than bars is that the addition of a second dimension (bar width) is somewhat distorting, drawing our attention to the area rather than height of the bars. I'll explain this point in a future post.
Responding to you and John both, I think two additional features will help the retirement age plot: (1) order the countries based on the age gap, rather than on actual retirement age -- this will do a lot to help focus the attention on the gap; (2) connect the gray dots -- this is similar to lines in so-called interaction plots for regression.
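The reordering by age gap can be sketched in a few lines of Python. The country names and ages below are placeholders, not the figures from the Economist chart, and defining the gap as effective minus official is one assumed choice of sign.

```python
# Hypothetical (official, effective) retirement ages; placeholder values only.
ages = {"Country A": (65, 61), "Country B": (60, 69), "Country C": (65, 64)}

def gap(item):
    """Gap for one (country, (official, effective)) entry: effective minus official."""
    official, effective = item[1]
    return effective - official

# Largest positive gap first, so the ordering itself carries the message.
ordered = sorted(ages.items(), key=gap, reverse=True)

for country, (official, effective) in ordered:
    print(f"{country}: gap {effective - official:+d}")
```

Plotting the countries in this order puts the largest departures from the official age at one end of the axis.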
Posted by: Kaiser | Nov 30, 2005 at 10:54 PM