May 17, 2009

Supplemental reading

What are other graphics blogs talking about recently?

Subway_sparklines2

Information Aesthetics highlighted the so-called New York City Subway sparklines.  (original site)  (Andrew also mentioned it.)

IA said: "The general idea is that the history of subway ridership tells a story about the history of a neighborhood that is much richer than the overall trend."

Okay but what about these sparklines would clarify that history?  From what I can tell, this is a case of making the chart and then making sense of it.

The chart designer did make a memorable comment in his blog entry: "Hammer in hand, I of course saw this spreadsheet as a bucket of nails."  The hammer is a piece of software he created; the nails, the data of trips taken.



Wsj_stresstest

Nathan at FlowingData gave a reluctant passing grade to this Wall Street Journal bubble chart illustrating the recent U.S. bank "stress" test.

One should fight grade inflation with an iron fist.  (Hat tip to Dean Malkiel at Princeton.)  A simple profile chart would work nicely since the focus is primarily on ranks.  The bubbles, as usual, add nothing to the chart, especially where one can create any kind of dramatic effect by scaling them differently.


Envy_map

Nathan also pointed to the maps of the seven sins, which garnered some national attention.  This set of maps is a great illustration of the weakness of maps for studying the spatial distribution of anything that is highly correlated with population distribution.  Do cows have envy too?  See related discussion at the Gelman blog.




May 03, 2009

Animal racetrack

We introduced the racetrack chart before.  Via Zero Hedge, we find a version of it, perhaps a race for animals.  In a race for humans, they run in concentric circles; animals are not so tame, they may stray off the track, or just refuse to continue.

Zerohedgeintervention_arc

The designers certainly tried very hard to make the numbers palatable.  Indeed, given how much of our taxpayer funds are being thrown to the fire these days, any informed citizen ought to know how the money was being spent.  Their hard work, unfortunately, was not rewarded as the various constructs failed to improve our understanding of the data.

The three annotations on the right tell us that the arc width at the left indicates the allocated funds while the arc width at the right indicates the actual amounts spent as of the end of April.  In addition, the breakpoint on each arc in relation to the fan of lines indicates the date on which the funds were allocated.

In reality, things are a bit more complicated.  When all allocated funds have been spent, as is apparently the case for the Fed funds for AIG, the arc has no breakpoint and thus the date of the allocation is missing.  Also, when the same use soaks up funds from multiple sources, the width on the right gets confusing: take for example FDIC funds for unlocking credits; it's unclear how the two arcs add up to 1.8 trillion.

Perhaps a flow chart might work well for this sort of data.


Reference: "Visual Representation of the Government Intervention Programs", Zero Hedge blog, April 8 2009.

Apr 30, 2009

Recovery inside a recovery

While reconstructing the Dow price chart (here), I noticed that there were some dubious statistics going on behind the scenes.  The chart made the point that the 1929 bear market took over 20 years to recover to its peak value.  The mystery wrapped in the enigma is the existence of the time series for a 1937 bear market and a 1939 bear market.  This could not happen unless there were bears within bears and recoveries within recoveries.

The uncomplicated time-series view brings this situation out more clearly:

Moredow

This is a sobering picture in the face of all the talk about "green shoots" and "bear market rallies".

From a statistical perspective, the 1937 and 1939 bear markets cannot be interpreted without noting that they happened inside of a larger bear market.

Apr 19, 2009

Don't mess with the scale

My friend Patrick pointed out the single biggest issue with the chart below -- that the designer chose a scale that precisely undermines the message of the chart.  Undermine may be too mild a word to use here; annihilate may be more apt.

Dow2


The lines in this chart are anchored at the zero point on the time line (horizontal axis) used to indicate the bottoms of various bear markets in the Dow from 1929 to 2007.  From that anchor, time runs to the left showing the amount of time for the Dow to go from peak to bottom (the decline); time runs to the right showing the amount of time for the Dow to climb back to the prior peak (the recovery).  As the caption said, the point of the chart is "if the decline was fast, the recovery took a considerable time".

Funny thing then that the distances from the zero point are roughly comparable on the left as on the right.

This illusion resulted from some very convoluted and perplexing messing around with the horizontal scale.  First, the left-of-center scale is in months while the right-of-center is in years.  Second, the left-of-center scale has normal spacing while the right-of-center seemingly was suffering from spasms.  Take a closer look:

Dow2_right

The first five years (0-5) took up about half the scale while the next five (5-10) took maybe one-eighth.  The first year (0-1) took about as much space as the next two years (1-3).

I am not quite sure what the logic behind this is, but since the message of the chart has everything to do with time duration, it is most unfortunate to introduce such distortions.

There is yet another "innovation" in this chart.  Notice that on the right side, the axis labels are irregular (more spasms): 0, 1, 3, 4, 5, 10, 15, 20, 25...  This is as if the designer is posing one of those IQ questions requiring readers to figure out the next number in the sequence.  The specific time intervals selected may have meaning: note that all the lines are straightened out in between these tick marks.  Given that each line represents a different historical sequence, it is difficult to comprehend the regularity of these intervals across history.  Perhaps this will prove to be the key to unlocking the secret of this chart.  Please comment below if you are able to unravel this mystery.

Besides, the same type of "innovation" was not applied to the left side of the chart.  Here, the designer opted to throw out all the data between the peak and the bottom and straightened out all the intermediate fluctuations.

Below are two different versions of this chart, basically restoring the time scale to the normal, equally spaced, symmetric appearance.  The top one used monthly Dow returns where the volatility obstructed our understanding of the trends, requiring the use of color to differentiate the lines.  In the next version, I used R to generate the loess estimates (a type of smoothing) and the trends became clearer.  (There was a prior discussion of loess on Junk Charts here.)
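For readers curious about what a loess-type smoother does, here is a rough sketch in Python rather than R.  It is a simplified stand-in (a locally weighted straight-line fit with tricube weights), not a replica of R's loess function, and the function names and the span value are assumptions made for illustration.

```python
def tricube(u):
    """Tricube kernel: weight falls from 1 at u = 0 to 0 at |u| >= 1."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def local_linear_smooth(xs, ys, span=0.5):
    """Smooth ys against xs with locally weighted straight-line fits.

    span is the fraction of the points that define each local window.
    """
    n = len(xs)
    k = max(2, int(round(span * n)))  # neighbours that set the window
    smoothed = []
    for x0 in xs:
        # The distance to the k-th nearest point sets the window width.
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12
        w = [tricube((x - x0) / h) for x in xs]
        # Weighted least squares for y = a + b * (x - x0).
        sw = sum(w)
        swx = sum(wi * (x - x0) for wi, x in zip(w, xs))
        swy = sum(wi * y for wi, y in zip(w, ys))
        swxx = sum(wi * (x - x0) ** 2 for wi, x in zip(w, xs))
        swxy = sum(wi * (x - x0) * y for wi, x, y in zip(w, xs, ys))
        denom = sw * swxx - swx ** 2
        if abs(denom) < 1e-12:
            smoothed.append(swy / sw)  # fall back to a weighted mean
        else:
            b = (sw * swxy - swx * swy) / denom
            a = (swy - b * swx) / sw
            smoothed.append(a)  # the intercept is the fitted value at x0
    return smoothed
```

A larger span averages over more months and gives a flatter trend; a smaller span follows the monthly wiggles more closely.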


Redo_dow2
 

Now, these pictures are very different from the original graph!

I'd be very cautious about reading into these charts anyway.  This question is not one suitable for statistical analysis.  The sample size of six is far too small.  Each recession is different in terms of causes, remedies and context.  The fact that we call them recessions does not make them comparable.  Further, it is also impossible to know at this stage if the 2007 decline has reached bottom.  The chart designer essentially assumed this to be the case but who knows?


PS. Nick Rapp, one of the designers of the chart, responds in the comments.  He has started a blog to feature the work of his graphics team at AP.  His colleague has created an interactive version.  More than anything, this post highlights an aspect of the chart that Nick and his team clearly spent a lot of time doodling over.  The concept of the chart itself is wonderful actually, if I didn't say so already; it is essentially the same chart as the oft-printed chart where the anchor point is the start of each recession, only here the anchor is the bottom of each recession.

Mar 28, 2009

Knowing what one is doing

Jess B. sent in some entertainment.


Billshrink

In case the font is too small:

Billshrink2

There is a lot more here, including the author's note in the comments section.

Feb 13, 2009

Eternal optimist

Chris P pointed us to the "Financial Comeback" calculator, surely a well-meaning joke from the folks at the Times.  Here is how one gets to make a 40% loss back in just six years!

Nyt_comeback1
Surely, someone has to tell them about simulation.  They have to assume a probability distribution on the annual returns, and show us some sample paths.  Using the average annualized historical return in essence wipes out all variability and no wonder it's smooth sailing upwards.  Eternal optimist.  
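A minimal sketch of what such a simulation might look like, in Python.  The normal distribution for annual returns, the 9% mean, the 18% standard deviation and the function names are all illustrative assumptions, not figures or methods taken from the Times calculator.

```python
import random


def simulate_path(start, years, mean, sd, rng):
    """One sample path: compound a randomly drawn annual return each year."""
    value = start
    path = [value]
    for _ in range(years):
        annual_return = rng.gauss(mean, sd)
        # A portfolio cannot lose more than everything.
        value *= max(0.0, 1.0 + annual_return)
        path.append(value)
    return path


def share_recovered(n_paths, start, target, years, mean, sd, seed=0):
    """Fraction of simulated paths at or above target after `years` years."""
    rng = random.Random(seed)
    recovered = 0
    for _ in range(n_paths):
        path = simulate_path(start, years, mean, sd, rng)
        if path[-1] >= target:
            recovered += 1
    return recovered / n_paths


# 100 falls to 60 after a 40% loss; how often is it back to 100 in six years?
print(share_recovered(10_000, 60.0, 100.0, 6, mean=0.09, sd=0.18))
```

Setting sd to zero reproduces the smooth upward line of the calculator; any positive sd spreads the paths out, and only some of them make it back in six years.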

Here is Chris' comment:

The bad news is that the range of values it offers does not include the return on the market from last year (-31% to -36%).  I guess they are optimistic.


The interactive features of this chart, however, impressed me.  The smooth adjustment to the chart as one slides the control, including the automatic choice of appropriate axis labels, is very nice indeed.

Okay, they want to trademark the name of the calculator so perhaps this is serious.


Reference: "Calculate your financial comeback", Jan 6 2009.

Jan 28, 2009

Popping the bubble, so to speak

Matt H., who authored the previous post, and Charles G. both pointed to a great example of how people like readers here can make a difference.   A bad chart got made over!

The Financial Times published a chart from a JP Morgan report, using ... you guessed it ... bubbles to illustrate the deep plunge in market capitalization of many banks.

Jpmorgan

Some readers at the blog were none too happy with the choice of bubble charts.   Among other things, the designer made the common mistake of plotting proportional diameters rather than proportional areas.  This is clear from looking at JP Morgan's bubble.
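To see how much the diameter mistake distorts, here is a small Python sketch comparing the two scalings.  The function names and the example values are hypothetical, chosen only to illustrate the arithmetic.

```python
import math


def radius_for_area(value, max_value, max_radius):
    """Radius such that the circle's AREA is proportional to value."""
    return max_radius * math.sqrt(value / max_value)


def radius_for_diameter(value, max_value, max_radius):
    """The mistaken scaling: DIAMETER (and radius) proportional to value."""
    return max_radius * value / max_value


def area(radius):
    return math.pi * radius ** 2


# A hypothetical bank whose market cap fell to a quarter of its former value.
before, after = 100.0, 25.0
correct = area(radius_for_area(after, before, 10.0)) / area(10.0)
mistaken = area(radius_for_diameter(after, before, 10.0)) / area(10.0)
print(correct, mistaken)
```

With diameters proportional to the data, a drop to one quarter is drawn as a bubble with one sixteenth of the area, so the decline looks far worse than it was.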

This chart exposes the weakness of bubble charts well.  Look at the top row of bubbles.  Most of them look so similar it is impossible to know, without spending much time studying the circles, which bank was hurt more.

Eventually, a bar chart was produced.  Felix Salmon linked to it.  I am not sure how the banks were ordered in this chart.  It isn't one of the two obvious dimensions, nor is it in alphabetical order.


Bankchart2


In fact, neither the bubble nor the bar chart works well for this case.  What we need is the Case-Shiller style asset-bubble life cycle chart.  In order to interpret these changes in market caps, we need to know how big the bubble was, and then how steep the consequent decline was.  Take a look at our discussion of real estate bubble charts here.

Reference: "Bank capitalization chart of the day", Felix Salmon at Portfolio.com and "Bank picture du jour", FT Alphaville, Jan 21 2008.

Jan 25, 2009

Bond yield makeover

Message to readers: I have a large backlog of reader suggestions.  Please be patient as I slowly get through them.  The frequency of posts will remain lower for the time being as I am busy finalizing a draft of a book.  More on that in the near future.


Matt H, a reader, sent in the following entry (with minor edits).

I saw a couple of bad charts on money.cnn.com and thought I'd submit them to you. 

They're both part of the same feature on investment bargains caused by the recession. 

Here are the links:  

Cnn_municipal_bonds Chart A: Municipal Bonds

Cnn_corporate_bonds Chart B: Corporate Bonds

It seems to me like both charts would have made their points more eloquently by using a much simpler, more common form, like a bar chart.

In Chart A, cubes are used to display the difference between treasury bond yields and AAA municipal bond yields at the two-year horizon and the ten-year horizon.  The volume of each cube corresponds to the yield for the given type of bond in the given period (I think), which spreads the one dimension being compared (yields) across three dimensions, making the differences look smaller than they really are.

[...] At the two-year horizon, the two yields being compared are 1.16% for Treasury bonds and 3.01% for AAA municipal bonds.  The yield for AAA municipal bonds in this case is more than 2.5 times larger than the yield for Treasury bonds, but the difference doesn't look nearly that big in the chart provided.  [...] 

Time out.  Let me add the inadvertent reference to an optical illusion concerning foreground and background!  The "outline only" cube on the left should have approximately the same volume as the "solid red" cube on the right (3.01% versus 3.30%) and yet the red cube appears quite a bit larger because our eyes react to the solid color more than to thin outlines.
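Matt's point about the cubes can be checked with a little arithmetic.  The Python sketch below assumes, as Matt guessed, that each cube's volume is proportional to the yield; the function name is made up for this illustration.

```python
def side_ratio(value_ratio, dims):
    """Ratio of linear sizes when a value is encoded in `dims` dimensions."""
    return value_ratio ** (1.0 / dims)


# Two-year yields quoted above: 3.01% for AAA municipals, 1.16% for Treasuries.
ratio = 3.01 / 1.16
print(round(ratio, 2))                 # the yields differ by a factor of about 2.59
print(round(side_ratio(ratio, 3), 2))  # the cube edges differ by only about 1.37
```

The same function with dims=2 covers a value encoded as the area of a circle or square, and dims=1 is the bar chart, where the visible length carries the full ratio.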


Matt continued:

In Chart B, [...] Again, the metric in question is bond yields:  ten-year Treasury bond yields compared to investment-grade corporate bond yields.  The 2008 figure for each is shown alongside the five-year average.  This chart uses the area of a circle to express these yields, spreading the one-dimensional value across two dimensions.  As in Chart A, the result is a chart in which the difference between values does not appear as large as it actually is.  

I will also send a simple bar chart version of each chart -- the bar charts should illustrate the differences in yields more effectively than the charts actually used in this article.

These are his revised charts:

Redo_bondyield

We can do even better by converting the chart on the right to a time-series line chart.  Instead of the five-year average, it is better to display the gap between treasury and corporate bonds for each of the five years plus 2008.  This should make for a more eye-catching graphic.


Reference: "Investment in the bargain bin", CNN Money.




Jan 19, 2009

The shackle of time 2

In the last post, we removed the time dimension in order to clarify certain aspects of the S&P 500 returns.  We found that with an investment horizon of five years, there was historically about a 25% chance of losing money and a 25% chance of more than doubling.

Even though we looked at cumulative returns, it was still the case that the data was serially correlated; in other words, it could be that the eventual return was not independent of the starting year of the five-year period.  To gauge this, we must return to the time dimension that was previously removed.

Redo_sandp5

The chart on the right plots the five-year returns for all five-year periods starting from 1910.  What it shows is that even with longer time frames, timing or luck still plays a key role.

For example, any such investment in the S&P 500 between the late 1950s and 1980 did not double in five years no matter which year the investment was made.  Then again, if the investment was made in the 40s and 50s, no one lost money in a five-year period; similarly in the 1980s.

So the fact that we saw a 25% chance of doubling (or losing money) over history says much less about what might happen in the next 5 years than the simple number suggests.


In response to a reader's comment - the data series was described as "real total return" so these are inflation-adjusted.



Jan 16, 2009

The shackle of time 1

SP_from_1825

I ran across this hugely successful chart on Dean Foster's home page (and noted that he and his Wharton colleagues have a nice blog picking apart statistical errors committed in public).

This is a histogram plotting the historical year-on-year returns of the S&P 500 index, binned into 10%-levels.  It succeeds on two levels: the innovation of printing the years inside little blocks provides extra information without distracting from the overall picture; the key message of this plot, that the negative return of 2008 is a negative outlier in the history of returns, is extremely clear.

This, in my mind, is a presentation superior to the usual time-series line chart that we see in every economics publication.  For some purposes, it is better to unshackle ourselves from the linear time dimension, and this is a good example.

One question/comment: within each 10% level, the years are arranged in reverse chronological order from top to bottom.  This facilitates searching for a particular year.  The obvious alternative is to order by the actual level of return, so that the result is akin to a stem-and-leaf plot.
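The two orderings are easy to compare in code.  Below is a small Python sketch; the year labels and return figures are made up for illustration and are not actual S&P 500 returns, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def bin_years(returns_pct, width=10):
    """Group years by return bin; the bin labelled b covers [b, b + width) percent."""
    bins = defaultdict(list)
    for year, ret in returns_pct.items():
        lower = math.floor(ret / width) * width
        bins[lower].append(year)
    return bins


def order_bin(years, returns_pct, by="year"):
    """Stack the years of one bin from top to bottom."""
    if by == "year":
        return sorted(years, reverse=True)  # most recent year on top
    # Otherwise order by the return itself, highest on top (stem-and-leaf style).
    return sorted(years, key=lambda y: returns_pct[y], reverse=True)


# Made-up data: year label -> annual return in percent.
returns_pct = {1: 12.0, 2: 18.5, 3: -3.2, 4: 15.1, 5: -38.0}
bins = bin_years(returns_pct)
print(order_bin(bins[10], returns_pct, by="year"))    # [4, 2, 1]
print(order_bin(bins[10], returns_pct, by="return"))  # [2, 4, 1]
```

Ordering by year helps the reader who is hunting for a specific year; ordering by return preserves more of the data within each bin.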

While I like the graphical aspect of the chart, I feel like it has limited function.  This graph appears useful to anyone who has a one-year investment horizon.  If I want to predict what next year's S&P 500 return is, I might take a random sample from this distribution.  However, as a lazy investor, I never look at a one-year horizon so this creates two problems: if I am looking five years out, I can't take five samples from this distribution because there is serial correlation in this data for sure; even if I could take those five samples, it is difficult to compute the five-year return in my head.

So what I did was to take the data and replicate this histogram for 2-year, 3-year, 5-year, 10-year, etc. returns.  The results are as follows.  I decided to simplify further and use Tukey's boxplot instead of the histogram.  The data are real compounded total returns from S&P 500 from 1910-2008.

Redo_sandp123

The boxplot on the top right shows that there is about a 25% chance that an investment in the S&P 500 will return negative in real terms in any three-year period (below the green line).  At the other end, there is a 25% chance of earning more than 50% on the principal during those three years.
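For anyone who wants to reproduce this kind of summary, here is a hedged Python sketch of the two steps: compounding annual returns over every run of N consecutive years, then reading off the quartiles that a boxplot displays.  The function names are hypothetical and the short input list is a toy example, not the Global Financial Data series.

```python
from statistics import quantiles


def rolling_returns(annual_returns, horizon):
    """Compounded return over every run of `horizon` consecutive years."""
    out = []
    for start in range(len(annual_returns) - horizon + 1):
        growth = 1.0
        for r in annual_returns[start:start + horizon]:
            growth *= 1.0 + r
        out.append(growth - 1.0)
    return out


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile, maximum."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Toy annual returns as fractions (0.10 means +10%).
toy = [0.10, -0.05, 0.20, 0.03, -0.12, 0.08]
print(five_number_summary(rolling_returns(toy, 3)))
```

Note that neighbouring windows share years, so the rolling returns are not independent observations; the quartiles describe history rather than odds from independent draws.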

The next set of boxplots compares 5-year returns to 10-year returns and 10-year returns to 20-year returns.  If we have a 10-year horizon, there is still a positive chance of reaching the end of the decade and finding the investment under water!  The median 10-year return is approximately doubling the principal (about 8% per annum compounded).

In a twenty-year period, there is hardly any chance of not making money on the S&P.  There were two positive outliers of over 1000% (about 13% per annum compounded over 20 years).
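The conversion from a cumulative return to a per-annum compounded rate is a one-liner.  The Python sketch below assumes a "1000% return" means a gain of ten times the principal, so the investment ends at eleven times its starting value; the function name is made up.

```python
def annualized_rate(total_return, years):
    """Constant yearly rate that compounds to `total_return` over `years`."""
    return (1.0 + total_return) ** (1.0 / years) - 1.0


# A 1000% gain over 20 years.
print(round(annualized_rate(10.0, 20), 3))  # 0.127, i.e. about 13% a year
```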






Reference: Data from Global Financial Data



