
Stacks and groups

This stacked, grouped bar chart is a mess!  There isn't much right about it: the colors are blinding, the group labels are taxing, the grouping is obscure, the scale should have been in millions, and neither axis is labeled.

Stacked bars, sometimes used in place of pie charts, are not much of an improvement.  For example, it is difficult to read from this chart the operating expenses for the medical school because it is not at the bottom of the stack.

By simplifying, the junkart version manages to gain clarity.  The colors are not necessary but I include them to provide reference to the original chart.  Apparently, the author saw fit to cluster the departments into three groups, namely the 4 largest faculties (blue), all other academic departments (green), and non-academic departments (orange).  On this chart, one can easily see that the Medical School spent close to $500 million in 2004.

Reference: Harvard Magazine, May-June 2006, p. 75.

Light entertainment V

One thing we trust about Northern Trust is the entertainment value of its current ad campaign.  (Also see our old posts, like here.)   Here is a close-up of the chart:

Apparently one should pick the "wrong" financial partner in January and February, and pick the "right" partner the rest of the year, except in March, when one should throw the die to decide!  Confused?  Call Rick!

Seeing the "wide divide"

The report on booming executive pay in the NYT included an informative and entertaining graphic (read our previous commentary on whether good charts can be entertaining).  To see this super-sized graphic, click here.

This chart has numerous virtues, beginning with its ability to summarize and visualize a wealth of data.  Each point in the chart is a ratio of average executive pay to average worker pay.  Each pay level is itself an average of much data on individual pay.  The graph tracks history all the way back to 1940 and thus allows us to see a long-term trend.

The key message is clear: the "wide divide" persists no matter which statistic one focuses on:

  • median (black line)
  • middle 50% (what statisticians call "interquartile range")
  • 90th percentile

Indeed, the inclusion of multiple statistics enhances the chart.  It puts into context, for example, the outlier data point (90th percentile) of 2000.

Although the text does not point this out, an important feature is the fast-growing, ever-worsening dispersion among the companies.  The median disparity has risen significantly, but within a smaller set of corporations the disparity has exploded.
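For readers who want to reproduce the three statistics listed above, here is a minimal sketch in Python.  The `summarize` function and the sample ratios are hypothetical, made up for illustration only; they are not the NYT data.  The "inclusive" quantile method is also just one of several reasonable conventions.

```python
from statistics import median, quantiles

def summarize(ratios):
    """Median, middle 50% (interquartile range) and 90th percentile of pay ratios."""
    # Quartile cut points: 25th, 50th, 75th percentiles
    q1, _, q3 = quantiles(ratios, n=4, method="inclusive")
    # Decile cut points: the ninth one (index 8) is the 90th percentile
    deciles = quantiles(ratios, n=10, method="inclusive")
    return {
        "median": median(ratios),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "p90": deciles[8],
    }

# Hypothetical executive-to-worker pay ratios for one year (illustrative only)
sample = [20, 25, 30, 40, 45, 60, 80, 120, 300]
stats = summarize(sample)
print(stats)
```

With a skewed sample like this one, the 90th percentile sits far above the upper quartile, which is the kind of widening dispersion the chart shows at the top end.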


Reference: Executive Compensation: A Special Report, New York Times.

Does it make sense?

There is nothing wrong with the way this chart is constructed.  I'd probably have the labels all set to the right in a column but otherwise, no real issues.

However, this chart bugs me in another way.  I just cannot make sense of it.  The year 1999 appears to be some kind of watershed year for these 4 automakers: they roughly split the market that year.  It is very strange that all the lines would meet at a point and then spread out again.  It seems very unnatural.  It makes me wonder if we are looking at bad data.  Does anyone have access to the underlying data, or know what kind of watershed 1999 was?

Reference: "An Ambitious Lexus Takes On the Europeans", New York Times, April 15 2006.

Glass Half Full

There are many reasons to commend this data graphic, from the Sacramento Bee, including:

  • Putting the no opinion group in the middle so that readers can compare the disapproves and the approves directly, utilizing both sides of the vertical axis
  • Sensible omission of both axes, since the relevant data have already been printed on the chart
  • Effective use of color
  • Good spacing in the legend to coincide with the bars
  • Inclusion of other data useful to interpreting the information, such as sample size, margin of error and sampling population

To the right is the junkart version, which improves upon the original in a few ways:

  • The bars are ordered in the only sensible way, by increasing proportions of those approving of the President.  Notice that if one were to focus on disapproval ratings, then Bush 2's is nothing unusual
  • The data labels inside the bars are aligned with the verticals, vastly enhancing readability
  • The data for "no opinion" are omitted.  This reduces clutter without sacrificing information

Boxplot as gridlines

Fund management companies provide a great source of materials for this blog.  Compare Evergreen's pitch with Pioneer's. (The entire ad is here.)
Perhaps this ad is targeted at highly sophisticated readers; it appeared in the Institutional Investor magazine.  Otherwise, they cannot assume a casual reader understands "universe comparison" and "alpha", and how these two concepts relate to each other (or not).

While boxplots have been tailored in many effective ways, this particular variation on the boxplot would surely have Tukey turning in his grave.  This is the boxplot used as gridlines. The six boxes in the plot are identical, and each includes the 10th, 25th, 50th, 75th and 90th percentiles.  This information duplicates what has already been given in the vertical axis.

More seriously, the vertical scale should never have been percentiles.  It should be the net return rate.  In this way, the reader can compare the level of returns at different time scales.  The astute reader will notice that the return was only 0.7% better than S&P during the last five years, even though in terms of separation of the dots, this performance appears as strong as during the last year, when the performance difference was 3.2%.

And finally, a question for those in the fund industry: it troubles me that the chosen benchmark (S&P) performed at the 90th percentile in the last year.  There appear to be two different benchmarks at play in this graph, one being the S&P, the other being the universe of funds included in the computation of percentiles.  If S&P were the right benchmark for this universe, one would expect its performance to be roughly at the median of the universe.  This is clearly not the case, which I interpret to mean that S&P is not the right benchmark.  Am I missing something here?
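To make the median argument concrete, here is a rough sketch of how one might place a benchmark within a universe of fund returns.  Everything in it is an assumption for illustration: the `percentile_rank` helper, the return figures, and the convention that a higher rank means a higher return (the ad may well use the opposite convention).

```python
def percentile_rank(universe, value):
    """Percent of funds in the universe whose return is at or below `value`."""
    if not universe:
        raise ValueError("universe must not be empty")
    at_or_below = sum(1 for r in universe if r <= value)
    return 100.0 * at_or_below / len(universe)

# Hypothetical one-year returns (in percent) for a fund universe; not data from the ad
universe_returns = [4.0, 5.5, 6.1, 7.0, 7.4, 8.2, 9.0, 9.6, 10.3, 12.5]
benchmark_return = 10.3  # hypothetical benchmark return

rank = percentile_rank(universe_returns, benchmark_return)
print(f"benchmark sits at the {rank:.0f}th percentile of the universe")
```

A benchmark that is representative of the universe should land somewhere near 50 on this measure; one that lands near 90 is beating most of the funds it is supposed to stand for.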

Bubbles, troubles

On March 23, NYT served up a double dose of bubble trouble in the business pages.  For the record:

Both these displays contain very little data and perhaps the only way to read their intention is to see them as decorated data tables, in other words, as objets d'art rather than data displays.  The cutoffs and overlaps warn us against gleaning anything from the size of these bubbles.

Reference: "Who Will Work the Farms?" and "G.M. and Auto Union Reach Deal to Cut Work Force", New York Times, March 23 2006.

Aesthetics and function

Here's a pretty effective graphic that shows the rising trend in gas/petrol prices since January, and in particular, the far bigger jump in California against the national average.

It's interesting to examine the array of aesthetic "enhancements" that was deployed:

  • Typically, modest coloring aids comprehension but less is always more.  For this data set, two colors suffice, one for California (and its metro areas) and one for the national average
  • It is always preferable to place data labels next to the data, rather than deploying a maze of lines leading to other parts of the chart, as is the case here.  The legend would have been clearer if placed uniformly on the right hand side of the chart, next to the end of the lines
  • The axis labels can be vastly improved.  On the price (vertical) axis, the only relevant numbers are the average prices for each region.  So I'd remove the vertical axis completely, put the region labels on the right side as suggested above, and put the regional average prices next to the labels
  • On the time (horizontal) axis, the choice of 7-day intervals is somewhat baffling.  If weekly prices are of concern then it'd have been better to label them as "week 1", "week 2", etc., rather than 6, 13, 20, 27...  In this case, nothing is lost by marking only the beginning and middle of each month
  • Grid-lines should be removed unless this is a chart intended to be used as graph paper, as in a science lab
  • The practice of shading the plot area should be abolished: the labelling of month on the time axis is sufficient, anything else is redundant
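The suggestion above to mark only the beginning and middle of each month on the time axis can be sketched as follows.  The `month_ticks` helper is hypothetical, and the date range (January 1 to March 28, 2006) is assumed from the article's date rather than read off the chart.

```python
from datetime import date

def month_ticks(start, end, days=(1, 15)):
    """Dates that fall on the given days of each month, from start to end inclusive."""
    ticks = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        for d in days:
            tick = date(year, month, d)
            if start <= tick <= end:
                ticks.append(tick)
        # Advance to the next month, rolling over the year if needed
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return ticks

# Assumed date range for the gas price chart
ticks = month_ticks(date(2006, 1, 1), date(2006, 3, 28))
for t in ticks:
    print(t.isoformat())
```

This yields six tick marks for the quarter instead of a dozen weekly ones, and each one carries an obvious meaning.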

Finally, if the intent of the chart was to judge the difference between California and national gas prices, then a much longer time horizon is required.  Our curiosity is piqued by the convergence of the lines at the start of 2006; what had been the trend the year before?

Reference: "Gas Prices Rising Out of Sight Again", San Francisco Chronicle, Mar 28, 2006.