
Popping the bubble, so to speak

Matt H., who authored the previous post, and Charles G. both pointed to a great example of how people like the readers here can make a difference. A bad chart got made over!

The Financial Times published a chart from a JP Morgan report, using ... you guessed it ... bubbles to illustrate the deep plunge in market capitalization of many banks.



Some readers at the blog were none too happy with the choice of bubble charts.   Among other things, the designer made the common mistake of plotting proportional diameters rather than proportional areas.  This is clear from looking at JP Morgan's bubble.
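To see how much the diameter mistake distorts things, here is a minimal Python sketch. It assumes, for illustration only, one bank worth four times another; the function names are hypothetical. Scaling the area correctly means the radius grows with the square root of the value. Scaling the diameter instead makes the drawn area grow with the square of the value ratio.

```python
import math


def radius_by_area(value, ref_value, ref_radius=1.0):
    """Correct: circle AREA proportional to the value, so radius ~ sqrt(value)."""
    return ref_radius * math.sqrt(value / ref_value)


def radius_by_diameter(value, ref_value, ref_radius=1.0):
    """Common mistake: DIAMETER (and hence radius) proportional to the value."""
    return ref_radius * (value / ref_value)


def drawn_area_ratio(r1, r2):
    """Ratio of the two circle areas the reader actually sees."""
    return (r1 / r2) ** 2


# Hypothetical: bank X is worth 4 times bank Y.
x, y = 4.0, 1.0
print(drawn_area_ratio(radius_by_area(x, y), radius_by_area(y, y)))          # 4.0
print(drawn_area_ratio(radius_by_diameter(x, y), radius_by_diameter(y, y)))  # 16.0
```

With area scaling, the 4:1 ratio survives intact; with diameter scaling, the larger bubble looks 16 times as big.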

This chart exposes the weakness of bubble charts well.  Look at the top row of bubbles.  Most of them look so similar it is impossible to know, without spending much time studying the circles, which bank was hurt more.

Eventually, a bar chart was produced. Felix Salmon linked to it. I am not sure how the banks were ordered in this chart. They are not sorted by either of the two obvious dimensions, nor are they in alphabetical order.


In fact, neither the bubble nor the bar chart works well for this case. What we need is a Case-Shiller-style asset-bubble life cycle chart. In order to interpret these changes in market caps, we need to know how big the bubble was, and then how steep the subsequent decline was. Take a look at our discussion of real estate bubble charts here.
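The point that a decline means little without the preceding run-up can be made concrete with two numbers per bank. The following is a minimal sketch, not the Case-Shiller methodology: `bubble_profile` is a hypothetical helper, and the market caps are placeholders rather than real data. Both placeholder banks fall 60% from their peaks, yet bank A ends above where it started while bank B ends far below.

```python
def bubble_profile(series):
    """Summarize a chronological series (e.g. market caps) as two numbers:
    run_up  = rise from the first value to the peak,
    decline = fall from the peak to the last value,
    both expressed as fractions (0.5 means +50%)."""
    if not series:
        raise ValueError("series must be non-empty")
    peak = max(series)
    run_up = peak / series[0] - 1
    decline = series[-1] / peak - 1
    return run_up, decline


# Placeholder market caps (not real data) at start, peak, and end dates.
bank_a = [100, 300, 120]  # big bubble, then a 60% fall: still above the start
bank_b = [100, 110, 44]   # small bubble, then a 60% fall: far below the start
for name, caps in [("A", bank_a), ("B", bank_b)]:
    run_up, decline = bubble_profile(caps)
    print(name, round(run_up, 2), round(decline, 2))
```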

Reference: "Bank capitalization chart of the day", Felix Salmon at Portfolio.com and "Bank picture du jour", FT Alphaville, Jan 21 2009.

Bond yield makeover

Message to readers: I have a large backlog of reader suggestions.  Please be patient as I slowly get through them.  The frequency of posts will remain lower for the time being as I am busy finalizing a draft of a book.  More on that in the near future.

Matt H, a reader, sent in the following entry (with minor edits).

I saw a couple of bad charts on money.cnn.com and thought I'd submit them to you. 

They're both part of the same feature on investment bargains caused by the recession. 

Here are the links:  

Chart A: Municipal Bonds

Chart B: Corporate Bonds

It seems to me that both charts would have made their points more eloquently using a much simpler, more common form, such as a bar chart.

In Chart A, cubes are used to display the difference between Treasury bond yields and AAA municipal bond yields at the two-year and ten-year horizons. The volume of each cube corresponds to the yield for the given type of bond in the given period (I think), which spreads the one dimension being compared (yields) across three dimensions, making the differences look smaller than they really are.

[...] At the two-year horizon, the two yields being compared are 1.16% for Treasury bonds and 3.01% for AAA municipal bonds.  The yield for AAA municipal bonds in this case is more than 2.5 times larger than the yield for Treasury bonds, but the difference doesn't look nearly that big in the chart provided.  [...] 

Time out. Let me point out an inadvertent optical illusion concerning foreground and background! The "outline only" cube on the left should have approximately the same volume as the "solid red" cube on the right (3.01% versus 3.30%), and yet the red cube appears quite a bit larger because our eyes react more to solid color than to thin outlines.
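Assuming, as Matt guesses, that cube volume encodes the yield, the side lengths the eye compares grow only with the cube root of the yield ratio. This short Python sketch applies that to the yields quoted above; `cube_side_ratio` is a hypothetical helper name.

```python
def cube_side_ratio(v1, v2):
    """If cube VOLUME is proportional to the value, the visible side length
    grows only with the cube root of the value ratio."""
    return (v1 / v2) ** (1 / 3)


# Two-year yields quoted above: 3.01% (AAA municipal) vs 1.16% (Treasury).
print(round(3.01 / 1.16, 2))                   # 2.59: what a bar chart would show
print(round(cube_side_ratio(3.01, 1.16), 2))   # 1.37: ratio of the cube sides
# 3.01% vs 3.30%: the side lengths differ by only about 3%.
print(round(cube_side_ratio(3.30, 3.01), 3))   # 1.031
```

A 2.6-fold difference in yield shrinks to barely a 1.4-fold difference in side length, which is why the cubes understate the gap.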

Matt continued:

In Chart B, [...] Again, the metric in question is bond yields:  ten-year Treasury bond yields compared to investment-grade corporate bond yields.  The 2008 figure for each is shown alongside the five-year average.  This chart uses the area of a circle to express these yields, spreading the one-dimensional value across two dimensions.  As in Chart A, the result is a chart in which the difference between values does not appear as large as it actually is.  
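The same shrinking effect applies to any shape whose area or volume, rather than length, tracks the data. This sketch assumes, hypothetically, that one yield is exactly twice the other and compares the linear sizes a reader would see for a bar, a circle, and a cube.

```python
def visible_length_ratio(value_ratio, dims):
    """Ratio of linear sizes (bar length, circle diameter, cube side) when the
    shape's length (dims=1), area (dims=2) or volume (dims=3) tracks the value."""
    return value_ratio ** (1 / dims)


# Hypothetical: one yield is exactly twice the other.
for dims, shape in [(1, "bar"), (2, "circle"), (3, "cube")]:
    print(shape, round(visible_length_ratio(2.0, dims), 2))
# bar 2.0, circle 1.41, cube 1.26
```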

I will also send a simple bar chart version of each chart -- the bar charts should illustrate the differences in yields more effectively than the charts actually used in this article.

These are his revised charts:


We can do even better by converting the chart on the right into a time-series line chart. Instead of the five-year average, it is better to display the gap between Treasury and corporate bonds for each of the five years plus 2008. This should make for a more eye-catching graphic.
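Computing the year-by-year gap is simple. The sketch below uses placeholder yields, not the figures behind the CNN chart, and the function name is hypothetical. It shows how a single average can hide a widening spread.

```python
import statistics


def yield_spreads(treasury, corporate):
    """Spread (corporate minus Treasury, in percentage points) for every year
    that appears in both {year: yield} mappings, in chronological order."""
    return {year: corporate[year] - treasury[year]
            for year in sorted(treasury) if year in corporate}


# Placeholder yields in percent, not actual market data.
treasury = {2004: 4.0, 2005: 4.0, 2006: 4.0}
corporate = {2004: 5.0, 2005: 6.0, 2006: 7.0}
spreads = yield_spreads(treasury, corporate)
print(spreads)                            # the gap widens every year...
print(statistics.mean(spreads.values()))  # ...which a single average hides
```

Plotting `spreads` against the year gives the line chart suggested above; the average collapses the same information into one number.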

Reference: "Investment in the bargain bin", CNN Money.

The shackle of time 2

In the last post, we removed the time dimension in order to clarify certain aspects of the S&P 500 returns.  We found that with an investment horizon of five years, there was historically about a 25% chance of losing money and a 25% chance of more than doubling.

Even though we looked at cumulative returns, the data were still serially correlated; in other words, the eventual return may not have been independent of the starting year of the five-year period. To gauge this, we must return to the time dimension that was previously removed.

The chart on the right plots the five-year returns for all five-year periods starting from 1910. What it shows is that even with longer time frames, timing or luck still plays a key role.

For example, no investment in the S&P 500 made between the late 1950s and 1980 doubled in five years, no matter which year the investment was made. Then again, no one who invested in the 1940s or 1950s lost money over a five-year period, and the same was true in the 1980s.

So the fact that we saw a 25% chance of doubling (or losing money) over history says much less about what might happen in the next 5 years than the simple number suggests.

In response to a reader's comment: the data series was described as "real total return", so these returns are inflation-adjusted.
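For readers who want to check what "real" means, this is the standard conversion from a nominal return to an inflation-adjusted one. I am assuming the data provider uses this exact relation; its precise construction is not stated here. The numbers are illustrative only.

```python
def real_return(nominal, inflation):
    """Inflation-adjusted return via the exact relation
    (1 + real) = (1 + nominal) / (1 + inflation); all values as fractions."""
    return (1 + nominal) / (1 + inflation) - 1


# Illustrative numbers only: 10% nominal return with 3% inflation.
print(round(real_return(0.10, 0.03), 4))  # 0.068, slightly less than 10% - 3%
```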

The shackle of time 1

I ran across this hugely successful chart on Dean Foster's home page (and noted that he and his Wharton colleagues have a nice blog picking apart statistical errors committed in public).

This is a histogram plotting the historical year-on-year returns of the S&P 500 index, binned in 10% intervals. It succeeds on two levels: the innovation of printing the years inside little blocks provides extra information without distracting from the overall picture, and the key message of this plot, that the negative return of 2008 is a negative outlier in the history of returns, is extremely clear.

This, in my mind, is a superior presentation to the usual time-series line chart that we see in every economics publication. For some purposes, it is better to unshackle ourselves from the linear time dimension, and this is a good example.

One question/comment: within each 10% interval, the years are arranged in reverse chronological order from top to bottom. This facilitates searching for a particular year. The obvious alternative is to order by the actual level of return, so that the result is akin to a stem-and-leaf plot.
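Both orderings are easy to produce once the years are binned. This is a minimal sketch with a hypothetical `bin_years` helper and placeholder returns, not actual S&P 500 data. Returns are kept in percent so that bin edges such as 30% divide exactly.

```python
import math
from collections import defaultdict


def bin_years(returns_pct, width=10, order="reverse_chronological"):
    """Group years into return bins [lower, lower + width), in percent.

    order="reverse_chronological" lists years newest first within a bin;
    order="by_return" lists them from lowest to highest return, which makes
    the display resemble a stem-and-leaf plot."""
    bins = defaultdict(list)
    for year, r in returns_pct.items():
        bins[math.floor(r / width) * width].append(year)
    for years in bins.values():
        if order == "reverse_chronological":
            years.sort(reverse=True)
        elif order == "by_return":
            years.sort(key=lambda y: returns_pct[y])
        else:
            raise ValueError(f"unknown order: {order}")
    return dict(sorted(bins.items()))


# Placeholder returns in percent, not actual S&P 500 data.
sample = {2001: 3.0, 2002: 7.5, 2003: 1.2, 2004: -5.0}
print(bin_years(sample))                     # {-10: [2004], 0: [2003, 2002, 2001]}
print(bin_years(sample, order="by_return"))  # {-10: [2004], 0: [2003, 2001, 2002]}
```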

While I like the graphical aspect of the chart, I feel it has limited use. This graph appears useful to anyone who has a one-year investment horizon: if I want to predict next year's S&P 500 return, I might take a random sample from this distribution. However, as a lazy investor, I never look at a one-year horizon, and this creates two problems. If I am looking five years out, I can't simply take five samples from this distribution, because there is certainly serial correlation in this data; and even if I could take those five samples, it is difficult to compute the five-year return in my head.
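The arithmetic that is hard to do in one's head is compounding: annual returns multiply rather than add. This small sketch (hypothetical function name, illustrative numbers, Python 3.8+ for `math.prod`) only compounds a given sequence; it does not address the serial-correlation problem.

```python
import math


def compound(returns):
    """Cumulative return from a chronological sequence of periodic returns
    (fractions): (1 + r1) * (1 + r2) * ... * (1 + rn) - 1."""
    return math.prod(1 + r for r in returns) - 1


# Illustrative: +10% then -10% is not break-even.
print(round(compound([0.10, -0.10]), 4))  # -0.01
```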

So what I did was to take the data and replicate this histogram for 2-year, 3-year, 5-year, 10-year, etc. returns. The results are as follows. I decided to simplify further and use Tukey's boxplot instead of the histogram. The data are real compounded total returns for the S&P 500 from 1910 to 2008.
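The procedure just described can be sketched in two steps: compound every run of n consecutive annual returns, then summarize the resulting list with the numbers a boxplot is built around. The helper names are hypothetical, and the actual Global Financial Data series is not reproduced here. Quartile conventions differ between packages, so this will not necessarily match Tukey's hinges exactly.

```python
import math
import statistics


def rolling_cumulative_returns(annual, horizon):
    """Compounded return for every run of `horizon` consecutive years in a
    chronological list of annual returns (fractions). Consecutive windows
    overlap, which is one source of the serial correlation mentioned above."""
    if horizon < 1 or horizon > len(annual):
        raise ValueError("horizon must be between 1 and len(annual)")
    return [math.prod(1 + r for r in annual[i:i + horizon]) - 1
            for i in range(len(annual) - horizon + 1)]


def five_number_summary(values):
    """(min, Q1, median, Q3, max): the numbers a boxplot is built around.
    Uses statistics.quantiles' default 'exclusive' method (needs at least
    two values); Tukey's whisker/outlier rule is not applied here."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)
```

Feeding the real annual returns into `rolling_cumulative_returns` for horizons of 3, 5, 10 and 20 years and summarizing each list would give the inputs for boxplots like the ones discussed here.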

The boxplot on the top right shows that there is about a 25% chance that an investment in the S&P 500 will have a negative real return over any given three-year period (below the green line). At the other end, there is a 25% chance of earning more than 50% on the principal during those three years.

The next set of boxplots compares 5-year returns to 10-year returns and 10-year returns to 20-year returns. Even with a 10-year horizon, there is still a positive chance of reaching the end of the decade and finding the investment under water! The median 10-year return roughly doubles the principal (about 8% per annum compounded).

In a twenty-year period, there is hardly any chance of not making money on the S&P.  There were two positive outliers of over 1000% (about 13% per annum compounded over 20 years).
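The per-annum figures in parentheses come from converting between cumulative and annualized returns. This sketch (hypothetical function names) checks the two conversions used above.

```python
def annualized(cumulative, years):
    """Per-annum compounded rate implied by a cumulative return over `years`."""
    return (1 + cumulative) ** (1 / years) - 1


def cumulative(rate, years):
    """Cumulative return from compounding `rate` per annum for `years`."""
    return (1 + rate) ** years - 1


print(round(cumulative(0.08, 10), 2))   # 1.16: 8% a year roughly doubles the principal
print(round(annualized(10.0, 20), 3))   # 0.127: a 1000% gain over 20 years, about 13% a year
```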

Reference: Data from Global Financial Data

A Harvard mess 2

Jon's comment on the previous post pretty much anticipated this post. The prior post concentrated on graphical matters. However, the biggest issue with that chart is the choice of metrics. If the idea is to explore the potential adverse effect of a sharp decline in endowment investment performance, then it is not clear why one should compare each school's share of the endowment with the proportion of its operating revenues paid for by endowment funds. What is missing from these two series is the relative size of the different departments' budgets.

The next chart shows the proportion of each school's operating costs accounted for by endowment funds together with the total size of its operating costs.

We can turn the ratio around and directly compute how much of the total amount of endowment funds distributed for operating costs is accounted for by each school.  This is really the simplest metric that gets to the question.
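Here is a minimal sketch of that computation. It assumes that a school's endowment distribution equals its operating costs times the fraction paid by the endowment. The school names and figures are placeholders, not Harvard's actual numbers.

```python
def endowment_distribution(schools):
    """schools maps a (hypothetical) school name to a pair
    (operating_costs, fraction_paid_by_endowment).
    Returns {name: (endowment_dollars, share_of_total_distribution)}."""
    dollars = {name: cost * frac for name, (cost, frac) in schools.items()}
    total = sum(dollars.values())
    return {name: (d, d / total) for name, d in dollars.items()}


# Placeholder figures in $ millions, not Harvard's actual numbers.
example = {"Big School": (2000, 0.5), "Small Institute": (20, 0.8)}
for name, (d, share) in endowment_distribution(example).items():
    print(f"{name}: ${d:.0f}M from the endowment, {share:.1%} of the total")
```

A school that relies heavily on the endowment can still account for a tiny share of the total distribution, which is exactly the contrast the two original series fail to show.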


There are really two possible worries. For the School of Arts and Sciences, which pays for about $1 billion of costs with endowment funds, any significant reduction in distribution will leave a gaping hole. For a department like Radcliffe, which pays for over 80% of its operating budget out of endowment funds, a reduction in distribution can obviously cause problems, but we are talking about a base of $18 million rather than $1 billion.

Reference: Harvard University Financial Report, 2008; Harvard Fact Book 2007

PS. The first source did not contain any data on operating budgets, so the first set of graphs (now replaced) did not show what I intended to show. The new ones use the right data and have budgets of the right order of magnitude, ranging from millions to $1 billion. The 2008 data are not yet available.

A Harvard mess 1

There are some innocent-looking charts that throw up more and more problems the more you look at them. This example comes from a magazine sent to Harvard alumni. We have all heard that the university's endowment fund suffered some horrific losses in the last few months, so the magazine's editor thought it useful to describe the potential impact on the different departments.

It's a safe bet our readers would not think to present two related data series as a combination of one bar chart and one column chart. 

As the chart stands, the intended message is completely lost. It takes a bit of fishing to learn that the Radcliffe Institute has a tiny stake in the endowment fund but draws over 80% of its operating funds from the endowment.

Looking a little deeper, we find that the scales of the two charts are not standardized, so the length of a bar and the length of a column cannot be directly compared. Nor can the gridlines, as each section accounts for 10% in the bar chart but 20% in the column chart (to make it worse, the larger section represents the smaller percentage!).

Looking further still, we find that "Other" accounts for some 15% of the endowment but apparently consists of entities that do not have operating funds, and thus goes missing in the column chart. In our versions below, we will ignore the "Other" category completely; this is equivalent to allocating the "Other" share to the individual schools in proportion to their own shares.
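The equivalence claimed here is easy to verify. Dropping one category and rescaling the rest gives the same result as handing its share back to the other categories in proportion to their own shares, provided the original shares sum to 1. The helper name and the shares below are placeholders, not the magazine's figures.

```python
def drop_and_renormalize(shares, drop="Other"):
    """Remove one category and rescale the remaining shares to sum to 1.
    If the input shares sum to 1, this equals handing the dropped share back
    to the remaining categories in proportion to their existing shares."""
    kept = {k: v for k, v in shares.items() if k != drop}
    total = sum(kept.values())
    return {k: v / total for k, v in kept.items()}


# Placeholder shares, not the magazine's actual figures.
shares = {"School A": 0.51, "School B": 0.34, "Other": 0.15}
renormalized = drop_and_renormalize(shares)
# Check the equivalence: A's share plus A's proportional slice of "Other".
a_direct = 0.51 + 0.15 * 0.51 / (0.51 + 0.34)
print(round(renormalized["School A"], 4), round(a_direct, 4))  # 0.6 0.6
```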

Not to mention arranging the schools in alphabetical order.

Much of this mess became apparent when we put the two charts into a uniform setting, like this:


A scatter plot provides good information, especially if there is clustering, although we can debate whether it is fit for general publication.


More in our next post.

Reference: "The Endowment: Each school's stake", Harvard Magazine, Jan-Feb 2009.

PS.  The initial post switched the axis labels on the two bar charts. Thanks to Jon for pointing this out.

R power, math stats power

Amusingly, the New York Times finally got wind of the R software.   See the article here.

We in the statistics community owe these folks a lot of gratitude for developing such flexible, powerful software. It is unfortunate that the Times didn't mention graphing as one of the great strengths of the software.

Equally amusingly, the Wall Street Journal told us what we already know, that we have the best jobs in the world.  Their discovery here.



I'm fascinated by little software engines that generate blog spam automatically. They often do a pretty good job of saying something generic and not totally out of line with the rest of the content. For example, during the holidays, a spam comment appeared here with the message "Thanks, super site!". Humor me; I like to think this is a super site. Such statements are not entirely irrelevant, so they are not entirely spam. It's like junk mail: it is not junk if you are going to use the $15 discount card Macy's sends you, but it is junk if you are not interested.

Well, an even more confusing case is this other message, which said:

I said the same thing about the tree ring chart; I guess I'm paying attention. ;)

The comment links to a blog that looks authentic but there does not appear to be a post that deals with the tree ring chart.  So I'm all confused.   Is this the work of an ultra-smart auto-spammer or a real person?