Redundancy

Feb 03, 2008

Nick B., who occasionally writes about statistical graphics, found some classic chart junk from a Canadian report on the Afghan army.  Here's one example, together with the junkchart version.

Redundancy is an enemy of good graphics, and incongruous redundancy is worse.  Here, troop level is variously described as "total force size", "strength" and "army growth"; the chart on the right uses only the army concept.  The data labels ("47000 Strength"), the axis labels ("50000 Total Force Size"), and the gridlines all germinate from the five grand data points underlying the entire chart!

Another distorting feature is the use of different-sized time intervals, which we space out appropriately on the right chart.
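As a rough sketch of what "spacing out appropriately" means: place each observation at an x position proportional to the time elapsed since the first one, rather than at equal steps. The dates below are hypothetical placeholders, not the report's actual reporting dates, and `time_positions` is a name made up for this illustration.

```python
from datetime import date

def time_positions(dates):
    """Map each date to the number of days elapsed since the first date.

    Using these values as x coordinates keeps unequal intervals unequal
    on the chart, instead of spacing the points evenly.
    """
    origin = dates[0]
    return [(d - origin).days for d in dates]

# Hypothetical observation dates, for illustration only.
dates = [date(2007, 12, 1), date(2008, 3, 1), date(2009, 3, 1)]
positions = time_positions(dates)
print(positions)
```

With positions like these, a three-month gap and a twelve-month gap no longer look the same width, so the slopes of the line segments become comparable.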

Ultimately, the key message should be growth in the army size, not the absolute number of troops.  The slopes of the line segments encode this information.  Alternatively, a data table can be rather powerful for simple data like this:

By the so-called "end state", there would be 70% more troops than in December 2007.

As relative change is the interest, the graph would be better with the origin at zero.

Interesting post and blog! But the graph on the right does tend to suggest a multiple increase in troop size, although it is 'only' 70%. As Ken suggested, maybe setting the origin to zero would solve the issue.

Keep up the awesome work on your blog.

Ratios need an origin, but differences don't. So as an alternative to setting the origin to zero, you could transform the scale to a logarithmic one. On a log scale, ratios translate to differences, which don't need an origin.

I'm not suggesting it seriously in this particular case, though. Because the ratio is 1.7, a linear scale is quite comfortable. Showing a ratio of 1.007 would have been a different matter. Then, a log scale would be unavoidable.
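A small sketch of the point about ratios and differences, using made-up numbers rather than the troop figures: the same ratio of 1.7 produces very different gaps on a linear scale depending on the starting level, but the same gap on a log scale. The helper names `linear_gap` and `log_gap` are assumptions for this illustration.

```python
import math

def linear_gap(a, b):
    """Distance between a and b on a linear axis."""
    return b - a

def log_gap(a, b):
    """Distance between a and b on a base-10 log axis.

    Equals log10(b / a), so it depends only on the ratio.
    """
    return math.log10(b) - math.log10(a)

# Hypothetical starting levels, for illustration only.
ratio = 1.7
for start in (1_000, 100_000):
    end = start * ratio
    print(start, linear_gap(start, end), log_gap(start, end))
```

The linear gaps differ by a factor of 100 between the two starting levels, while the log gaps agree, which is why a log axis can show relative change without an origin at zero.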

The table shows the change, but not the rate of change (the slope), and not the variation of this rate. The only improvement over the "junkchart" version of the graphic might be the log scale suggested by Derek. However, with these numbers, there isn't really much difference between the log- and linear-scaled plots.

Note: There's a labeling error in your data table -- the 3rd element should be 3/09 not 3/08.

Not at all related to the post is this presentation featuring an interesting way to represent data: http://www.insna.org/pdfs/Power.pdf
