I ran across this hugely successful chart on Dean Foster's home page (and noted that he and his Wharton colleagues have a nice blog picking apart statistical errors committed in public).

This is a histogram plotting the historical year-on-year returns of the S&P 500 index, binned into 10%-levels. It succeeds on two levels: the innovation of printing the years inside little blocks provides extra information without distracting from the overall picture; and the key message of this plot, that the 2008 return is an extreme negative outlier in the history of returns, is extremely clear.

This, in my mind, is a presentation superior to the usual time-series line chart that we see in every economics publication. For some purposes, it is better to unshackle ourselves from the linear time dimension, and this is a good example.

One question/comment: within each 10% level, the years are arranged in reverse chronological order from top to bottom. This facilitates searching for a particular year. The obvious alternative is to order by the actual level of return, so that the result is akin to a stem-and-leaf plot.

While I like the graphical aspect of the chart, I feel it has limited function. This graph appears useful to anyone who has a one-year investment horizon. If I want to predict what next year's S&P 500 return will be, I might take a random sample from this distribution. However, as a lazy investor, I never look at a one-year horizon, so this creates two problems: if I am looking five years out, I can't take five independent samples from this distribution, because there is surely serial correlation in this data; and even if I could take those five samples, it is difficult to compute the five-year return in my head.

So what I did was to take the data and replicate this histogram for 2-year, 3-year, 5-year, 10-year, etc. returns. The results are as follows. I decided to simplify further and use Tukey's boxplot instead of the histogram. The data are real compounded total returns for the S&P 500 from 1910 to 2008.
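For readers who want to try this themselves, here is a minimal sketch of how multi-year compounded returns could be derived from a series of annual returns. It assumes the annual returns are given as fractions (0.10 for 10%) and that the windows overlap, which the post does not actually specify. The function name `rolling_compound_returns` and the sample numbers are made up for illustration; they are not the Global Financial Data series.

```python
from math import prod


def rolling_compound_returns(annual_returns, n):
    """Cumulative compounded return over every overlapping n-year window.

    annual_returns: list of yearly returns as fractions (0.10 means +10%).
    n: window length in years.
    """
    if n < 1 or n > len(annual_returns):
        raise ValueError("n must be between 1 and the number of years")
    return [
        # Compound the n yearly growth factors, then subtract the principal.
        prod(1 + r for r in annual_returns[i:i + n]) - 1
        for i in range(len(annual_returns) - n + 1)
    ]


# Hypothetical annual returns, for illustration only.
sample = [0.10, -0.20, 0.30, 0.05]
two_year = rolling_compound_returns(sample, 2)
print([round(r, 3) for r in two_year])
```

Because adjacent windows share years, the resulting values are not independent of one another, which is worth keeping in mind when reading the spread of the boxplots.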

The boxplot on the top right shows that there is about a 25% chance that an investment in the S&P 500 will have a negative return in real terms in any three-year period (below the green line). At the other end, there is a 25% chance of earning more than 50% on the principal during those three years.
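The two 25% figures are read off the edges of the box, which sit at the lower and upper quartiles. As a rough sketch of the numbers behind a Tukey boxplot, the snippet below computes the quartiles and flags outliers using the conventional fences at 1.5 times the interquartile range beyond the box. The helper name `tukey_summary` and the sample values are hypothetical, and charting packages differ in how they interpolate quartiles, so the figures from any particular tool may differ slightly.

```python
import statistics


def tukey_summary(values, k=1.5):
    """Quartiles, fences, and outliers as used in a Tukey-style boxplot."""
    # "inclusive" treats the data as the whole population; other quantile
    # definitions exist and give slightly different box edges.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }


# Hypothetical multi-year returns as fractions, for illustration only.
sample = [-0.10, 0.00, 0.10, 0.20, 0.30, 2.00]
summary = tukey_summary(sample)
print(summary["q1"], summary["q3"], summary["outliers"])
```

Under this rule, a horizon shows no outlier dots simply when no window's return falls outside the two fences.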

The next set of boxplots compares 5-year returns to 10-year returns and 10-year returns to 20-year returns. If we have a 10-year horizon, there is still a positive chance of reaching the end of the decade and finding the investment under water! The median 10-year return is approximately doubling the principal (about 8% per annum compounded).

In a twenty-year period, there is hardly any chance of not making money on the S&P. There were two positive outliers of over 1000% (about 13% per annum compounded over 20 years).
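The per-annum figures quoted alongside the cumulative returns come from the standard compounding identity: a cumulative return R over n years corresponds to an annual rate of (1 + R)^(1/n) - 1. A small sketch of the conversion follows; the function name `annualized_return` is my own, and returns are assumed to be expressed as fractions (10.0 for 1000%).

```python
def annualized_return(cumulative_return, years):
    """Equivalent compounded annual rate for a cumulative return.

    cumulative_return: total return as a fraction (1.0 means +100%).
    years: length of the holding period in years.
    """
    if years <= 0:
        raise ValueError("years must be positive")
    return (1 + cumulative_return) ** (1 / years) - 1


# A 1000% return over 20 years.
print(round(annualized_return(10.0, 20), 4))  # 0.1274

# Exactly doubling the principal over 10 years.
print(round(annualized_return(1.0, 10), 4))  # 0.0718
```

So 1000% over twenty years is a little under 13% a year, and an exact doubling over ten years works out to about 7.2% a year; a rate of 8% a year compounds to roughly 116% over a decade, slightly more than doubling.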

Reference: Data from Global Financial Data

Nice post and some interesting thoughts. However, if you make investment decisions for investment horizons of 5+ years, you should also have a look at inflation-corrected data. While that doesn't help when you try to compare various investments, it sheds some interesting light on the absolute point of "increasing value" versus "increasing numbers".

Posted by: Christian | Jan 16, 2009 at 10:08 AM

Nice boxplot chart. If you plotted "expected average annual return" for the period instead of "expected return" for the period, the boxes could be graphed against the same axis allowing easy comparison.

Posted by: Kent | Jan 16, 2009 at 10:49 AM

Good post. FYI, the Leuthold Group has been producing histograms like the one you link to for probably twenty-five years, including using the year (or other appropriate time period) within the building blocks of the histogram. I don't know whether they came up with the approach or not, but they use it effectively to visualize the history of a number of investment variables.

Posted by: tom brakke | Jan 16, 2009 at 04:39 PM

Christian: I think the data is inflation-adjusted; the source said it is real total returns.

Kent: I thought about putting the 5-year and 10-year returns as equivalent annual returns rather than cumulative. For my purpose, I'd like to know if I'm doubling my money, etc. so this makes more sense to me.

Tom: Would love to attribute those histograms to whoever first used them. Thanks for the pointer.

Posted by: junkcharts | Jan 19, 2009 at 07:47 PM

Did you mean to have 2 graphs labeled 10 years?

Why are there outliers at 5 and 20, but not the two in between?

Posted by: Evelyn | Jan 19, 2009 at 09:04 PM

The two 10-year plots show the same data but with different axis scales, don't they? In each case the axis scale is the same as the adjacent plot.

Presumably there were no data points in the 10-year sample that met the criterion for being outliers (e.g. were further from the mean than a chosen multiple of the standard deviation).

Posted by: Tom | Jan 20, 2009 at 07:48 AM