Happy to see a fantastic graphic in the NYT this past Saturday. The chart is a variant of Tukey's box plot, which summarizes the distribution of a data series, with particular emphasis on its dispersion.

Typically, a box plot displays a five-number summary: the minimum, lower quartile, median, upper quartile and maximum. The version used here has three numbers: the max, the min and the most current cash incentive.
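To make the difference concrete, here is a minimal sketch of the two summaries in Python. The monthly figures are made up for illustration (they are not the NYT data), the function names are my own, and the quartiles use the "inclusive" convention of the standard library, which is only one of several in use and need not match Tukey's hinges exactly.

```python
import statistics

def five_number_summary(values):
    """Min, lower quartile, median, upper quartile, max of a data series."""
    # quantiles(n=4) returns the three cut points that split the data into quarters.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

def three_number_summary(values):
    """The reduced version: min, max and the most recent value (last in the list)."""
    return min(values), max(values), values[-1]

# Hypothetical monthly incentives in dollars for one brand, oldest month first.
incentives = [1500, 1200, 900, 1100, 1300, 1000, 800, 1400, 1600, 1700, 1900, 2000]

print(five_number_summary(incentives))
print(three_number_summary(incentives))
```

The three-number version drops the quartiles and median, so it says nothing about where the bulk of the months fall between the two extremes.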

Having the ten box plots side by side is a powerful way to compare different groups. Even better is the care taken to sort the car brands from largest current incentive to smallest. The chart is really powerful as the reader can glean many insights at a glance, for instance:

- Lincoln and Cadillac generally have the best incentives while Lexus and Acura offer much less
- Mercedes, Saab and BMW have varied their incentives the most, as measured by the range between the minimum and the maximum
- February was a good month to buy Saab, Mercedes, Volvo, Infiniti or even Lincoln as the incentive levels for these brands are close to the 12-month maxima
- It is a particularly great time to get a Mercedes because the current incentive is the highest in the past 12 months, at the top of a huge range
- On the other hand, it may not be wise to buy Cadillac or Lexus

Some minor improvements can be made to the chart. The lines linking the left edges of the boxes to the vertical axes are redundant.

More seriously, the "average incentive" row at the bottom tends to confuse rather than enlighten. The minimum "average incentive" represents the average incentive across the 9 brands in some specific month. Say that month is August. Then the minimum = [X1(8) + ... + X9(8)]/9, where (8) means August and X1, ..., X9 are the incentives of the 9 brands. The reader is asked to compare this number to the minimum of each of the other boxes, but this is apples to oranges. For example, if Lexus offered its minimum incentive in January, then the left end of the Lexus box = X9(1), where (1) means January and X9 indicates the Lexus incentive. (Notice that X9(8), not X9(1), was used in the minimum "average" incentive calculation.)
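A tiny made-up example shows the mismatch. The sketch below assumes the bottom row's minimum is the lowest of the monthly cross-brand averages, which is my reading of the chart; the three brands and their numbers are hypothetical.

```python
# Hypothetical incentives (dollars) for three brands over four months.
incentives = {
    "BrandA": [1000, 3000, 2000, 2500],
    "BrandB": [2000, 1000, 3000, 1500],
    "BrandC": [3000, 2000, 1000, 2000],
}
n_brands = len(incentives)
n_months = 4

# Average across brands within each month, then take the lowest month.
monthly_average = [
    sum(series[m] for series in incentives.values()) / n_brands
    for m in range(n_months)
]
min_of_averages = min(monthly_average)

# Each brand's own minimum, which may fall in a different month per brand.
brand_minima = {brand: min(series) for brand, series in incentives.items()}
average_of_minima = sum(brand_minima.values()) / n_brands

print(min_of_averages, brand_minima, average_of_minima)
```

Here every brand bottoms out at 1000, each in a different month, yet the left end of the "average" box sits at 2000. The minimum of the averages can never fall below the average of the brand minima, so lining up the left ends of the boxes compares two different quantities.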

Therefore, the only useful number in the last row is the current month's average incentive across all 9 brands. This average can easily be eyeballed by looking over the first 9 rows. The last row should be removed.

A further variant of this chart would be a dot plot. Instead of showing just the max and min, plot all 12 data points, perhaps using smaller dots for everything other than the current month. Such a treatment would, for instance, allow us to judge whether Mercedes had many months of low incentives or just one month of low incentives (causing the box to become so wide).
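As a rough sketch of the idea, the following Python renders one row of such a dot plot as text, with "." for earlier months and "O" for the current month. Both series are invented and share the same min, max and current value, so their boxes would look identical; `dot_row` and its parameters are my own names, not anything from the original graphic.

```python
def dot_row(values, lo, hi, width=41):
    """Render one brand as a text dot plot on a scale from lo to hi."""
    row = [" "] * width

    def column(v):
        # Map a value onto a character position between 0 and width - 1.
        return round((v - lo) * (width - 1) / (hi - lo))

    for v in values[:-1]:
        row[column(v)] = "."          # earlier months: small dots
    row[column(values[-1])] = "O"     # current month, drawn last so it stays visible
    return "".join(row)

# Two hypothetical brands with the same min (0), max and current value (4000).
many_low = [0, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 4000]
one_low = [0, 3000, 3100, 3200, 3300, 3400, 3500, 3600, 3700, 3800, 3900, 4000]

print("many low months |" + dot_row(many_low, 0, 4000) + "|")
print("one low month   |" + dot_row(one_low, 0, 4000) + "|")
```

The first row clusters its dots at the low end while the second has a single stray dot there, a difference the min-max box cannot show. One caveat of this crude text version is that months with nearly equal values land on the same character, so a real chart would need to jitter or stack the dots.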

In summary, this graphic is much more informative and occupies much less space than most newspaper charts, and is totally worthy of this newspaper.

Reference: New York Times, March 18 2006.

PS. Can't let this post appear without a rant... when will Excel include Tukey's box plot as one of the key chart types?
