Redundancy
Feb 03, 2008
Nick B., who occasionally writes about statistical graphics, found some classic chart junk in a Canadian report on the Afghan army. Here's one example, together with the junkchart version.
Redundancy is an enemy of good graphics, and incongruous redundancy is worse. Here, troop level is variously described as "total force size", "strength" and "army growth"; the chart on the right uses a single concept, army size. The data labels ("47000 Strength"), the axis labels ("50000 Total Force Size"), and the gridlines all germinate from the five grand data points underlying the entire chart!
Another distorting feature is the use of unequal time intervals, which we space out in proportion to elapsed time on the right chart.
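A minimal sketch of proportional spacing, assuming each observation is dated to the month: the x coordinate is simply the number of months since the first observation, so a one-year gap is drawn four times wider than a three-month gap. The dates and the helper names `months_since` and `x_positions` are hypothetical and are not taken from the report.

```python
from datetime import date

def months_since(start: date, d: date) -> int:
    """Whole months elapsed from start to d (day of month ignored)."""
    return (d.year - start.year) * 12 + (d.month - start.month)

def x_positions(dates):
    """Map each date to an x coordinate proportional to elapsed time,
    so unequal intervals get unequal horizontal gaps."""
    start = dates[0]
    return [months_since(start, d) for d in dates]

# Hypothetical observation dates with unequal gaps (not the report's dates).
dates = [date(2007, 12, 1), date(2008, 3, 1), date(2009, 3, 1), date(2011, 12, 1)]
print(x_positions(dates))  # [0, 3, 15, 48]
```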
Ultimately, the key message should be the growth in army size, not the absolute number of troops. The slopes of the line segments encode this information. Alternatively, a data table can be rather powerful for simple data like this:
By what the report calls the "end state", there would be 70% more troops than in December 2007.
Since relative change is what matters here, the graph would be better with the origin at zero.
Posted by: Ken | Feb 04, 2008 at 02:15 AM
Interesting post and blog! But the graph on the right does tend to suggest a several-fold increase in troop size, although it is 'only' 70%. As Ken suggested, maybe setting the origin to zero would solve the issue.
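Why a truncated axis suggests a multiple increase can be sketched with a little arithmetic: what the eye compares is the height of each point above the axis baseline, not the value itself. The baseline of 800 and the counts below are hypothetical, chosen only to illustrate the effect; they are not read off either chart.

```python
def apparent_ratio(start: float, end: float, baseline: float) -> float:
    """Ratio of plotted heights above the axis baseline.
    With baseline = 0 this equals the true ratio end/start;
    a higher baseline inflates it."""
    return (end - baseline) / (start - baseline)

start, end = 1000, 1700  # hypothetical values, 70% growth
print(apparent_ratio(start, end, 0))    # 1.7
print(apparent_ratio(start, end, 800))  # 4.5
```

With a zero origin the heights preserve the true ratio of 1.7; raising the baseline to 800 makes the same 70% growth look like a 4.5-fold jump.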
Keep up the awesome work on your blog.
Posted by: n | Feb 04, 2008 at 04:31 AM
Ratios need an origin, but differences don't. So as an alternative to setting the origin to zero, you could transform the scale to a logarithmic one. On a log scale, ratios translate to differences, which don't need an origin.
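The identity behind this is log(b) - log(a) = log(b/a): on a log axis the vertical gap between two values depends only on their ratio. A small check, assuming natural logs (any base gives the same conclusion); the numbers are arbitrary.

```python
import math

def log_gap(a: float, b: float) -> float:
    """Vertical distance between a and b on a natural-log axis."""
    return math.log(b) - math.log(a)

# Equal ratios give equal gaps wherever they sit on the axis,
# so no zero origin is needed to read them.
print(math.isclose(log_gap(10, 17), log_gap(1000, 1700)))  # True
print(math.isclose(log_gap(1000, 1700), math.log(1.7)))     # True
```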
I'm not suggesting it seriously in this particular case, though. Because the ratio is 1.7, a linear scale is quite comfortable. Showing a ratio of 1.007 would have been a different matter. Then, a log scale would be unavoidable.
Posted by: derek | Feb 04, 2008 at 06:42 AM
The table shows the change, but not the rate of change (the slope), and not the variation of this rate. The only improvement over the "junkchart" version of the graphic might be the log scale suggested by Derek. However, with these numbers, there isn't really much difference between the log- and linear-scaled plots.
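One way to check how little the scale choice matters at this ratio is to compare where each value lands on a linear axis and on a log axis when both span the same range. This is a rough sketch under that assumption (both axes run from 1 to the ratio, scanned on a uniform grid); `max_position_gap` is a hypothetical helper, not part of any charting library.

```python
import math

def max_position_gap(ratio: float, steps: int = 10_000) -> float:
    """Largest difference, as a fraction of axis height, between where a
    value lands on a linear axis and on a log axis when both axes span
    [1, ratio]. Values are scanned on a uniform grid."""
    worst = 0.0
    for i in range(steps + 1):
        x = 1 + (ratio - 1) * i / steps
        linear = (x - 1) / (ratio - 1)
        logarithmic = math.log(x) / math.log(ratio)
        worst = max(worst, abs(logarithmic - linear))
    return worst

print(round(max_position_gap(1.7), 3))  # 0.066
print(round(max_position_gap(10), 3))
```

For a ratio of 1.7 no point moves by more than about 7% of the axis height, while over a tenfold range the gap is several times larger.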
Posted by: Jon Peltier | Feb 04, 2008 at 08:03 AM
Note: there's a labeling error in your data table; the 3rd element should be 3/09, not 3/08.
Posted by: timz | Feb 04, 2008 at 12:43 PM
Not at all related to the post is this presentation featuring an interesting way to represent data: http://www.insna.org/pdfs/Power.pdf
Posted by: Rettaw | Feb 09, 2008 at 10:56 AM