## Comparing Federer and Agassi

##### Sep 12, 2005

Comparability, again, is at the heart of this chart.  (Thanks to Annette for bringing it to my attention.)

What was attempted here was an impossible magic act: cramming several disparate data series onto one chart.  It would appear that the relative age of Agassi and Federer is considered the primary control variable.  Alas, comparability is lost because the two sections are aligned by calendar year rather than by age.  For me, it would make more sense to compare Federer at 24 to Agassi at 24; that would involve comparing 2005 to 1994, for example.
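The realignment from calendar year to age is simple arithmetic.  Here is a minimal sketch, assuming only the two players' birth years (1970 for Agassi, 1981 for Federer); the helper name `season_at_age` is made up for illustration, and birthdays falling mid-season are ignored.

```python
# Birth years of the two players; everything else is derived from them.
BIRTH_YEAR = {"Agassi": 1970, "Federer": 1981}


def season_at_age(player, age):
    """Return the calendar year in which the player turned the given age.

    Hypothetical helper for this sketch; it ignores where in the season
    the birthday falls.
    """
    return BIRTH_YEAR[player] + age


# Line up the two careers by age instead of by calendar year.
for age in range(20, 25):
    print(age, season_at_age("Agassi", age), season_at_age("Federer", age))
```

Reading across a row of the output gives the pair of seasons that belong side by side on an age-aligned chart.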

In the junkchart version (below), I use what are, at first glance, pie charts.  Ha, you muse, ain't I eating humble pie, given my repeated health warnings about pie charts?   I plead not.

Here, I pick the pies because of their circular shape, which has a neat analogy with the annual, four-step series of Grand Slams.  The reader does not have to judge the size of the slices or pies, only the shading and the location of each quarter-pie.  By leveraging the pie analogy, the presentation is more compact.  Compactness is a key virtue when the primary purpose of a chart is comparison; one would like to place the items for comparison as close together as possible.

As explained above, I use Age rather than Year as the variable.  Quickly one observes that Agassi skipped many Grand Slams in his early career while Federer has played almost a full slate.  They both won their first Slam at Wimbledon, both at 22 years old.  The chart signs off with an intriguing question of whether Federer's future path would mirror Agassi's.

The other half of the chart is rather easier to manage.  One would use Age as the variable and put both lines onto the same chart for compactness.

Reference: "Agassi Defies His Age; Federer Keeps on Rolling", New York Times, Sept 11, 2005.

## Rushing baseball rookies to debut?

##### Sep 10, 2005

Alan Schwarz looked at this table and concluded that rookies are "getting consistently older, not younger" when they make their first appearances in an MLB game.

If we just consider the last row (TOTAL), the average age by decade regardless of playing positions, his conclusion makes sense, at least from the 1960s to the 2000s.

Alan went a step further, providing data to help interpret the average age based on the positions of the rookies.  The nine additional rows contain 54 more numbers, with values all within a narrow range (22-26).  How should the reader interpret these numbers?  The following chart lends some help.  The black line depicts the average debut age over all positions and is identical in each graph, establishing a standard for comparison. (The vertical axis plots average age at debut.)

Here are some key insights:

• The black line shows that rookies overall are getting older.  The green lines show that this conclusion does not apply to every position: for example, the age of rookie shortstops (SS) appears to have fallen.
• Rookie shortstops (SS) and center-fielders (CF) have always debuted at younger ages than average.  The same held for right-fielders (RF) until the 2000s.
• On the other hand, catchers (C) and, to a lesser extent, left-fielders (LF) and pitchers (P) have debuted at older ages than average.
• Specific deviations from the norm are revealed: the shortstop (SS) curve is clearly at odds with everything else; rookie first basemen (1B) have debuted at older ages in the 2000s than before.

All these points can be obtained from the table of numbers as well; it just takes much more time and much more effort than looking at a chart.
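For those who would rather work from the table, the comparison against the black line amounts to subtracting the all-positions average from each position's average, decade by decade.  A rough sketch follows; the ages in it are placeholders I made up, not the figures from the Times table, and the function name is hypothetical.

```python
# Placeholder average debut ages, NOT the numbers from the Times table.
overall = {"1960s": 23.0, "2000s": 24.5}
by_position = {
    "SS": {"1960s": 22.75, "2000s": 22.5},
    "C": {"1960s": 23.5, "2000s": 25.0},
}


def gap_from_overall(position_ages, overall_ages):
    """Return each decade's debut age minus the all-positions average."""
    return {
        decade: round(age - overall_ages[decade], 2)
        for decade, age in position_ages.items()
    }


gaps = {pos: gap_from_overall(ages, overall) for pos, ages in by_position.items()}

# Negative gaps mean younger-than-average debuts, positive gaps older.
for pos, decade_gaps in gaps.items():
    print(pos, decade_gaps)
```

A position whose gap changes sign or widens over the decades is one that departs from the overall trend.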

Reference: "For Baseball Rookies, the Only Rush Is to Judgment", New York Times, Sept 4 2005.

##### Sep 08, 2005

Data graphics are frequently used for comparisons, whether comparing the number of wi-fi networks across states, ad spending from year to year, efficiency of baseball teams from coast to coast, or labor costs across countries.

Comparability is not always attainable but such situations can often be rescued through data transformation or standardization.  For example, unit labor costs were compared against the EU average.  Avoid the pitfall of comparing apples with oranges, as in the following set of charts from a McKinsey report:

The accompanying (abridged) report has this to say about the graphs:

Germany, Japan, and the United States have traditional hump-shaped life cycle savings patterns. In these countries, aging populations will cause a dramatic slowdown in household savings and wealth. In contrast, Italy has a flatter savings curve, resulting in part from historical borrowing constraints that forced households led by people in their 20s and 30s to save more. Thus an increase in the share of elderly households will have less impact on the country's financial wealth.

A first glance at the charts can leave one mystified as to how Italy's curve is "flatter" and not "hump-shaped" like those of Germany, Japan and the U.S.  A closer inspection reveals that each chart has its own scale because the savings rate is expressed in each country's currency.  The junkchart version below establishes comparability by expressing all amounts in U.S. dollars (using the Jan 3, 2000 exchange rate); once that is done, all five curves can be plotted on the same graph, facilitating visual comparison.

Numerous insights become clear:

• Italy's savings rate is much higher than those of the other four countries.  Its savings curve has a clear hump which occurs at around 55-60 years, later than several other countries.
• Germany and the U.K., not Italy, are the countries with flat savings curves.
• Japan, the U.S. and the U.K. have curves that dip below 0, an important feature that was not mentioned in the summary report.  (It is unclear why in the original version, the scales for Germany and Italy started at -1 and -2 even though all values are positive.)
• Japan, the U.S. and, to a lesser extent, Germany have similarly shaped curves, with plateaux at around 30-50 years.  The Italy and U.K. curves peak after 55 years.  A key observation made in the report is that the notion of the Japanese saving much more than Americans is a myth; it has not been true in recent years, apparently due to the rapidly aging population.
• The U.K. has the only rising curve (but note that data for 70-80 years is missing).
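The standardization behind the junkchart version is a single multiplication per data point.  Below is a minimal sketch, assuming a table of savings by age band in local currency and one exchange rate per country.  Every number, the age bands, and the function name are placeholders for illustration; they are not the McKinsey data or the actual Jan 3, 2000 rates.

```python
# Placeholder exchange rates (U.S. dollars per unit of local currency).
# These are NOT the actual Jan 3, 2000 rates.
usd_per_unit = {"Japan": 0.01, "U.K.": 1.5}

# Placeholder savings by age band, in each country's own currency.
local_savings = {
    "Japan": {"30-34": 200.0, "50-54": 400.0},
    "U.K.": {"30-34": 2.0, "50-54": 4.0},
}


def to_usd(amounts_by_age, rate):
    """Convert one country's savings curve into U.S. dollars."""
    return {age: round(amount * rate, 2) for age, amount in amounts_by_age.items()}


usd_savings = {
    country: to_usd(curve, usd_per_unit[country])
    for country, curve in local_savings.items()
}

# Once every curve is in the same unit, they can share one vertical axis.
for country, curve in usd_savings.items():
    print(country, curve)
```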

Reference: "The Demographic Deficit: How aging will reduce global wealth", McKinsey Quarterly, March 2005.

## A variant of the "half-baked" disease

##### Sep 07, 2005

The disease of the "half-baked" (see here) is in fact widespread.  The Wall Street Journal and other media outlets have a liking for this particular feature in their bar charts.

In the chart on the right, the half-baked bar rears its head in 2005.  In this variant of the disease, the 2004 bar is split into two halves so that we can compare first half 2004 and first half 2005.

Two numbers do not a trend make and so the extra data point only gives us false comfort.  While 2005 first half compares poorly to 2004 first half, we don't know how 2005 first half compares to other such periods in the bank's history.  In reading such charts, avoid the tendency to generalize.

Why not split every bar so that we can accurately judge the size of the half-baked 2005 bar?  Alternatively, remove the gray bars and draw in a projected full-year 2005 number.
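One way to draw a projected full-year bar is to scale the 2005 first half by the ratio of full year to first half observed in 2004.  The sketch below assumes 2005 follows the same seasonal split as 2004, which is only an assumption; the figures and the function name are invented for illustration and are not the numbers from the Journal chart.

```python
def project_full_year(h1_current, h1_prior, full_prior):
    """Scale the current first half by last year's full-year to first-half ratio.

    Hypothetical helper: it assumes the current year splits between its two
    halves in the same proportion as the prior year did.
    """
    return h1_current * full_prior / h1_prior


# Invented figures, not taken from the chart.
h1_2004, full_2004, h1_2005 = 40.0, 100.0, 30.0

projected_2005 = project_full_year(h1_2005, h1_2004, full_2004)
print(projected_2005)
```

A projection like this should be drawn so that it is visibly distinct from the actual bars, since it is an estimate rather than data.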

Reference: "Omnicom Lands Bank of America", Wall Street Journal, Sept 1 2005.

## Simplicity and clarity

##### Sep 06, 2005

Here's a quick post to tide you over while I recover from the weekend's festivities.  There was surprisingly little traffic on the roads this weekend, and one suspects that gas prices had something to do with it.

Gas prices form the data underlying this simple, clear and effective chart:

No bells and whistles.  Just a straightforward presentation of the data.

The key messages scream out at the reader: that the price of gas in the U.S., as high as it is right now, is still much lower than prices in many European countries; and that government taxes explain most of the price differential.

We've seen many examples of simple charts that fail to do justice to the data.  Even simple data series should be handled with care.  I'm glad to say this chart is well-done, and I appreciate the effort.

Reference: "Europe, With Other Woes, Takes Gas Prices in Stride", New York Times, Sept 1 2005.

## Congratulations!

##### Sep 06, 2005

I'd like to congratulate my friends Beth and Art who tied the knot this Sunday; thanks for a beautiful wedding weekend in Vermont.

Best wishes to Hendrik and Alice who also exchanged vows recently.

## One for the cutting room floor

##### Sep 02, 2005

This chart comparing U.S. and China garment markets really calls for mixed metaphors: it should've been left on the cutting room floor.

The two intended messages are simple: the U.S. market is much larger but the China market is growing much faster.  But the chart manages to confuse us all the same.

First, the China market is tracked for 14.5 years versus 3.5 for the U.S. market, without explanation.  By stacking these bars together, the chart creates a false impression of exponential growth.

Then, the data from 2002-5 are enclosed in gray boxes of arbitrary heights, interfering with our ability to read the trends.  While I don't like gridlines on bar charts in general, their appearance in these redundant gray boxes really beggars explanation.

Even the vertical scale needs re-editing.  Why can't they halve the line segments and place the numbers next to the lines?  The omission of the zero-dollar line may mislead some into thinking that the \$10 billion line represents zero.

Last, but not least, the black, half-baked bars (representing the first half of 2005) impair our comprehension.  Their presence adds nothing to the graph at all.  Indeed, if the reader is to pick up the fast growth of the Chinese market, seeing the plunging and darkly accented last bar surely doesn't help.  Here, the chart designer has two choices: draw the projected full-year 2005 data or omit 2005 altogether.

I suspect that the bar chart format was selected partly in order to accommodate these half-baked bars.  Otherwise, a line chart would work nicely in this context.  As an alternative, the following conceptual graph (since I don't have data) brings out the two messages much more clearly.

The gap between the two sets of points illustrates the relative sizes of the two markets, while the steeper line for China shows its much faster growth.

Reference: "Chinese Apparel Makers Seek the Creative Work", New York Times, Sept 1 2005.