

"Tropical Storm Debby strengthened as it moved northwest yesterday.  Of the 15 previous storms with positions similar [...], 13 became hurricanes, but only two reached the United States".

This data-laden statement accompanied the following weather map.

Now, imagine if:

  • no colors were used, and
  • the two storms that landed had tracks bolded, and
  • the 13 hurricanes had solid tracks, and
  • all other tracks were drawn gray, and
  • the then location of Debby was annotated, and
  • only the two making landfall (Gloria 1985 and Storm 4 1938) were labelled, and
  • the locations of landfall were crossed, and
  • the large box was removed, revealing the land mass,

Then only the key data needed to support the accompanying statement would be present.

Reference: "Highlight: Tropical Storm Debby", New York Times (weather report),  Aug 24 2006

The dots don't connect

The New York Times published a bar chart reminiscent of the one discussed here last week.  They added the 50% line and did not cluster the countries into groups of five.

I like this chart for clarity and simplicity.  (Removing the decimal from the data would improve it.)  The U.S. and her special partner stand out as countries with the highest outside ownership of corporate shares. 

So far, so good.

Until I scanned the article itself, which startled and started with:

It turns out that most American investors are not xenophobic... Shareholders in the United States have been criticized as harboring "home bias" -- allocating far less to foreign stocks than they would if they did not let familiarity, patriotism and national loyalties stand in the way.

The dots don't connect, the academic references notwithstanding.  The chart shows what share of U.S. stocks is owned by outsiders (a group that includes some foreigners but also many U.S. investors).  What has this to do with how much money U.S. investors spend on foreign stocks?

Even a good chart can't save a poor story.

Reference: "Investors without Borders", New York Times, Aug 27, 2006

Unscientific poll?

This decent chart adequately brought out a point that some will find shocking: the U.S. ranks next to dead last in our unscientific attitude towards evolution.

I have commented on 3-category bar charts before: putting the "not sure" category in the middle allows the reader to compare "Yes"/"No" responses easily.  I prefer lightly-tinted boxes for "not sure" to help gauge its size.

It's a good idea to provide the 50% label at the top.  It is mischievous to use a guiding line, akin to a tick mark, to indicate the "not sure" legend.  This line segment, while entirely redundant, creates confusion as the reader, exhausted by the height of the chart, would be desperately seeking the 50% mark at the bottom.  Without such, it is taxing to figure out what % of Americans actually answered "yes".

The biggest distortion in this chart is the absence of scale, in particular, population scale.  Half of the U.S. population represents many times the number of people as half of Cyprus, for example.  The choice of countries in the survey is also heavily biased toward small European countries.  In fact, Japan appears to be the only non-European country depicted, aside from the U.S., while, curiously, the "special" partner of the U.S. is missing.

Reference: NYT, approx. Aug 16, 2006.
Here's a previous post on Science, with a link to Darwin's classics.

Bumps charts and NYT

I just cannot resist another post on Bumps charts since the NYT finally started using them.  Here are two recent examples:

This first chart illustrates the change in property taxes in different municipalities since 1998, as compared to the national average.

A wealth of information is revealed:

  • All these places charge more than the national average today
  • New York City used to charge less than average but that ended in 2003
  • The tax rates are clustered into three groups, about 6%, about 5% and below 4%.  The variance between different places has decreased during these years
  • A sharp rise was recorded in all these places in 2001-3 although New York City lagged slightly.  The sharp rise was not observed nationwide

Reference: "Gain in Income is Offset by Rise in Property Taxes", New York Times, Aug 8 2006.

The second example is much cleaner as it involves only one period.  Bolding the "no one" line is particularly effective, bringing out the author's point well.

However, I'd have put the "no one" label on the right, just like the other labels, but bolded.

One could also argue that the real story is the simultaneous decline of "friend", "co-worker" and "neighbor" and rise of "no one" and "spouse".

Finally, it'd be interesting to see the multi-period version as the smooth linear trends are rather hard to believe.

Reference: New York Times Magazine, July 16 2006.

Transparent circles

Jens from Library House sent us this chart featuring house price to earnings ratios.  In his own words:

"the key thing that I just love is that they have included the data points, but not as points, but as little transparent circles. This allows you to understand by how much two data points are spaced apart from each other, visualising growth and making this chart look very dynamic. I have never seen this in this form before: very nice. Beyond this, the axes are clearly labelled, all in all a very simple chart, beautifully executed."

Illusion or junk? 3

Daniel, in his comments to my previous post, pointed out that my version of the bond market data focused on the relative sizes of different types of bonds, rather than their absolute values.  This is a keen observation, and it was a deliberate design choice.

When the chart designer tries to juggle too many balls at once, he tends to drop them all.

As it happens, I started out by creating a graph of absolute values, but I did not find much of interest in it.  Here it is.  One would think the growth of ABS was extraordinary but not according to this chart.

The relative values were a lot more informative, in my opinion.

Illusion or junk? 2

Previously we saw that the appearance of stacked area charts changes with how variables are ordered.  This is a serious deficiency.

Let's return to the bond market chart.

What is the relative prevalence of each type of debt over time?  In the original chart, this information is buried and can be extracted only painstakingly.

The junkart version brings out this insight without much fuss.  As a percentage of total bond market debt, the share of US Treasuries has dropped by half over the last two decades, with much of the drop happening in the late 1990s.  Meanwhile, the share of mortgage debt more than doubled during the same period, with much of the rise occurring during 1985-1993.  The current distribution is also more balanced than at any time in the last 20 years, as can be seen from the narrow spread.
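For readers who want to try the relative view on their own data, here is a minimal sketch of the conversion from absolute amounts to shares of the total. The numbers are made up for illustration (they are not the Bondmarkets.com figures), and the names `debt` and `shares` are my own.

```python
# Hypothetical outstanding debt by type, in billions of dollars.
# These are illustrative numbers only, not the Bondmarkets.com data.
debt = {
    1985: {"treasuries": 1400, "mortgages": 400, "corporate": 800},
    2005: {"treasuries": 4000, "mortgages": 6000, "corporate": 5000},
}

def shares(by_type):
    """Convert absolute amounts into percentages of the yearly total."""
    total = sum(by_type.values())
    return {name: round(100 * amount / total, 1) for name, amount in by_type.items()}

share_by_year = {year: shares(by_type) for year, by_type in debt.items()}

for year, pct in share_by_year.items():
    print(year, pct)
```

In this toy example every debt type grows in absolute terms, yet the Treasuries share roughly halves. That kind of shift is what a chart of shares makes visible and a chart of absolute values tends to hide.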

A few design features are worth noting.  The vertical axis is given on both sides of the chart.  Limited colors are introduced to help readers distinguish the various lines.  Light vertical gridlines are provided to allow analysis during each 5-year period.  Non-essential tick labels are removed from vertical axes.

In fact, this is again a variant of the Bumps chart.  For more, see here, here and here.

Reference: Data from Bondmarkets.com (via Mahalanobis)

Illusion or junk?

Michael, over at Mahalanobis, sent over this chart (via this blog).  The person who created this chart later described it as a kind of "optical illusion", pointing out that the set of upward sloping lines interferes with our ability to read the 2005 data.  For example, was the MBS market (light green area) over or under $5,000?

A lively discussion at the blog concerns whether this is an illusion or just a junk chart.  It really is both.  Normally, by putting the vertical axis on one side (usually the left side), we require our eyes to trace an invisible horizontal line to read off the data.  The slanting lines, multiple colors and horizontal distance all conspire against us in this case.  However, this illusion can be summarily corrected by providing the same axis on the right side of the chart so it's really no big deal.

This type of stacked area chart has much bigger problems.  To wit,
Both these charts and the original chart contain exactly the same data.  Our eyes fool us into thinking that they are different charts.  The topmost line in each chart is indeed the same.  The next lower line, however, is different on each: on the left, that line represents the bond market excluding US Treasuries; on the right, excluding mortgages; and so on.  Every line except the topmost (and bottommost) line is information that cannot be easily digested.
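The ordering effect is easy to reproduce with made-up numbers. The sketch below uses hypothetical values for three debt types (again, not the actual bond market data) and computes the boundary lines that a stacked area chart would draw under two different stacking orders. The function name `stacked_boundaries` is my own.

```python
from itertools import accumulate

# Hypothetical values for three debt types over three years
# (illustrative only, not the actual bond market data).
series = {
    "treasuries": [30, 32, 35],
    "mortgages": [10, 20, 40],
    "corporate": [15, 18, 20],
}

def stacked_boundaries(order):
    """Return the boundary lines of a stacked area chart, bottom to top."""
    n_years = len(next(iter(series.values())))
    columns = []
    for year in range(n_years):
        # Running total of the layers for this year, in stacking order.
        columns.append(list(accumulate(series[name][year] for name in order)))
    # Transpose so that each row is one boundary line across the years.
    return [list(line) for line in zip(*columns)]

a = stacked_boundaries(["treasuries", "mortgages", "corporate"])
b = stacked_boundaries(["mortgages", "corporate", "treasuries"])

print("top line, order A:   ", a[-1])
print("top line, order B:   ", b[-1])
print("middle line, order A:", a[1])
print("middle line, order B:", b[1])
```

The top line (the total) is the same under either order, but the middle line traces a different partial sum each time, so its shape depends on an arbitrary choice rather than on any single debt type.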

What is the key message of this sort of chart?  Is it the growth in the total market?  growth in individual debt types?  distribution of debt among different debt types? changes in the distribution of debt?

For all these different objectives, there are better charts.  In the next post, I'll consider some alternatives.

Reference: Mighty Illusions Blog

Statistical literacy

I finally got around to reading "When Genius Failed", Roger Lowenstein's account of the spectacular collapse of LTCM, the hedge fund fronted by Scholes and Merton, Nobel laureates both.

It is a sobering read for anyone in the business of statistical prediction and modeling for sure.

What also caught my eye, and caused dismay, is how Lowenstein got basic statistical principles wrong in the book.   He used the bully pulpit to sound the usual alarm against the normality assumption and for fat tails.  He began by confusing the law of large numbers (LLN) with the central limit theorem (CLT):

Statisticians have long been aware of the "law of large numbers".  Roughly speaking, if you have enough samples of a random event, they will tend to distribute in the familiar bell curve ...

In the same breath, he then equated two different probability distributions:

This is called the normal distribution, or in mathematical terms, the lognormal distribution.

Doesn't this say something about the state of statistical literacy?
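For anyone who wants to see the distinctions in action, here is a small simulation sketch using only Python's standard library. The law of large numbers says a sample mean settles near the true mean as the sample grows; the central limit theorem says the means of many samples are spread approximately normally; and a lognormal variable is one whose logarithm is normal, which is not the same thing as a normal variable. The choice of an exponential distribution, the sample sizes and the seed are all my own assumptions for illustration.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from a skewed distribution (exponential, true mean 1)."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))

# Law of large numbers: one sample mean settles near the true mean as n grows.
lln_mean = sample_mean(100_000)

# Central limit theorem: the means of many moderate-sized samples are spread
# approximately normally around the true mean, with sd close to 1/sqrt(n).
n = 50
means = [sample_mean(n) for _ in range(5_000)]
clt_sd = statistics.stdev(means)

# Normal vs lognormal: X is lognormal when log(X) is normal. A lognormal
# variable is always positive and right-skewed, so its mean exceeds its median.
normal_draws = [rng.gauss(0.0, 1.0) for _ in range(50_000)]
lognormal_draws = [math.exp(z) for z in normal_draws]

print("LLN: mean of 100,000 draws =", round(lln_mean, 3))
print("CLT: sd of sample means =", round(clt_sd, 3),
      "vs 1/sqrt(n) =", round(1 / math.sqrt(n), 3))
print("normal mean, median:",
      round(statistics.fmean(normal_draws), 3),
      round(statistics.median(normal_draws), 3))
print("lognormal mean, median:",
      round(statistics.fmean(lognormal_draws), 3),
      round(statistics.median(lognormal_draws), 3))
```

The underlying draws here are far from bell-shaped, so the bell curve in the quoted passage can only describe the sample means, which is the CLT's territory and not the LLN's.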

PS. Here is a link to Dunbar's "Inventing Money" (thanks Marc).  It apparently came out before Lowenstein but didn't get as much press.