
Recovery inside a recovery

While reconstructing the Dow price chart (here), I noticed some dubious statistics going on behind the scenes. The chart made the point that it took over 20 years for the Dow to recover from the 1929 bear market to its prior peak. The mystery wrapped in the enigma is that the chart also contains time series for a 1937 bear market and a 1939 bear market. This could not happen unless there were bears within bears and recoveries within recoveries.

The uncomplicated time-series view brings this situation out more clearly:


This is a sobering picture in the face of all the talk about "green shoots" and "bear market rallies".

From a statistical perspective, the 1937 and 1939 bear markets cannot be interpreted without noting that they happened inside a larger bear market.
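One way to make "bears within bears" concrete is the Python sketch below. It rests on stated assumptions: the index levels are made-up numbers, not Dow data. A "bear market" is taken to be a fall of at least 20 percent from a local high that ends once prices rebound 20 percent off the trough. That is a common rule of thumb, not necessarily the definition behind the original chart. A decline whose starting peak is still below an earlier high is flagged as nested inside a larger bear market.

```python
def local_declines(levels, threshold=0.20):
    """Return (peak_index, trough_index) pairs for every fall of at least
    `threshold` from a local high; a decline is closed once the series
    rebounds `threshold` off its trough (a zigzag-style rule, assumed here)."""
    declines = []
    high = low = 0          # indices of the current local high / low
    falling = False
    for i, x in enumerate(levels):
        if not falling:
            if x > levels[high]:
                high = i                                  # new local high
            elif x <= levels[high] * (1 - threshold):
                falling, low = True, i                    # a decline has started
        elif x < levels[low]:
            low = i                                       # the decline deepens
        elif x >= levels[low] * (1 + threshold):
            declines.append((high, low))                  # rebound closes the decline
            falling, high = False, i
    if falling:
        declines.append((high, low))                      # still falling at the end
    return declines


def mark_nested(levels, declines):
    """Flag declines that start below an earlier high, i.e. bears inside a bear."""
    return [(hi, lo, levels[hi] < max(levels[: hi + 1])) for hi, lo in declines]


# Hypothetical index levels: a crash, two smaller slides, then a full recovery.
levels = [100, 60, 30, 55, 40, 50, 35, 70, 105]
print(mark_nested(levels, local_declines(levels)))
# [(0, 2, False), (3, 4, True), (5, 6, True)]
```

The two later slides would each count as a bear market on their own, yet both begin below the original peak of 100, which is exactly why they cannot be read in isolation.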

Inspired by Tetris

What should we call this one?  A Tetris chart, perhaps.


In particular, pay attention to the rightmost three pieces: while the shapes look completely different, the actual proportions ranged only from 6 to 8 percent.

The Tetris chart fails our self-sufficiency test. The only way to read it is to read the data labels.

Since the proportions add up to 100 percent, this multiple-choice question appears to allow only one answer, even though, as the text said, there were two acceptable answers!  It would be useful to label those two choices separately.  We'd also want to see how the question was phrased.

Seen differently, the Tetris chart is a 4x25 matrix with each cell representing one hundredth of the respondents.
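The 4x25 reading can be sketched in Python: one cell per percentage point, filled row by row. The labels and shares below are made up for illustration (they are not the survey's numbers), and row-major filling is an assumption about how such a grid would be laid out. The sketch also shows where Tetris-like outlines come from: a block's shape depends on where it happens to wrap onto the next row, not on its size.

```python
def waffle_grid(shares, rows=4, cols=25):
    """Lay out integer percentage shares on a rows x cols grid, one cell per point."""
    if sum(shares.values()) != rows * cols:
        raise ValueError("shares must fill the grid exactly")
    # Expand each label into n cells, in the order given, then cut into rows.
    cells = [label for label, n in shares.items() for _ in range(n)]
    return [cells[r * cols:(r + 1) * cols] for r in range(rows)]


# Hypothetical shares that add up to 100.
grid = waffle_grid({"A": 70, "B": 16, "C": 8, "D": 6})
for row in grid:
    print("".join(row))
```

Here B, at 16 cells, is split 5 + 11 across two rows, while C and D sit as single strips, so the pieces look different for reasons unrelated to their values.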

Reference: "Name, Please?  High School Seniors Mostly Don't Know", New York Times, April 19 2009.

Don't mess with the scale

My friend Patrick pointed out the single biggest issue with the chart below: the designer chose a scale that precisely undermines the message of the chart. "Undermine" may be too mild a word here; "annihilate" may be more apt.


The lines in this chart are anchored at the zero point of the horizontal (time) axis, which marks the bottom of each bear market in the Dow from 1929 to 2007. From that anchor, time runs to the left to show how long the Dow took to fall from peak to bottom (the decline), and to the right to show how long it took to climb back to the prior peak (the recovery). As the caption said, the point of the chart is that "if the decline was fast, the recovery took a considerable time".

Funny, then, that the distances from the zero point are roughly comparable on the left and on the right.

This illusion resulted from some very convoluted and perplexing messing around with the horizontal scale. First, the scale left of center is in months while the scale right of center is in years. Second, the left side has normal spacing while the right side seemingly suffers from spasms. Take a closer look:

The first five years (0-5) took up about half the scale while the next five (5-10) took maybe one-eighth. The first year (0-1) took about as much space as the next two years (1-3).

I am not quite sure what the logic behind this is, but since the message of the chart has everything to do with time durations, it is most unfortunate to introduce such distortions.
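For comparison, here is the arithmetic of an ordinary linear axis as a small Python sketch. The 0 to 25 year extent is an assumption based on the visible tick labels; the ratios between intervals do not depend on it.

```python
def axis_share(start, end, axis_min=0.0, axis_max=25.0):
    """Fraction of a linear axis occupied by the interval [start, end]."""
    return (end - start) / (axis_max - axis_min)


first_five = axis_share(0, 5)    # years 0-5
next_five = axis_share(5, 10)    # years 5-10
first_year = axis_share(0, 1)
next_two = axis_share(1, 3)
print(first_five, next_five)     # equal widths on a linear scale
print(first_year / next_two)     # one year is half the width of two
```

On a linear scale every five-year span gets the same width and a single year is half as wide as the following two; the chart's right side instead gives years 0-5 about half the axis, years 5-10 about one-eighth, and year 0-1 as much room as years 1-3.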

There is yet another "innovation" in this chart. Notice that on the right side, the axis labels are irregular (more spasms): 0, 1, 3, 4, 5, 10, 15, 20, 25... It is as if the designer were posing one of those IQ questions that ask readers to figure out the next number in the sequence. The specific time intervals selected may have meaning: note that all the lines are straightened out between these tick marks. Given that each line represents a different historical episode, it is hard to see why the same intervals would apply across history. Perhaps this will prove to be the key to unlocking the secret of this chart. Please comment below if you are able to unravel this mystery.

Nor was the same type of "innovation" applied to the left side of the chart. There, the designer opted instead to throw out all the data between the peak and the bottom, straightening out all the intermediate fluctuations.

Below are two different versions of this chart, which restore the time scale to a normal, equally spaced, symmetric appearance. The top one uses monthly Dow returns, whose volatility obstructs our view of the trends and requires color to differentiate the lines. In the second version, I used R to generate loess estimates (a type of smoothing), and the trends became clearer. (There was a prior discussion of loess on Junk Charts here.)


Now, these pictures are very different from the original graph!
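The smoothing in the second version was done with R's loess. As a rough illustration of the idea only, here is a simplified local regression in plain Python: for each point it fits a weighted straight line to its nearest neighbours, with tricube weights that fade to zero at the edge of the window. It is a sketch under assumptions, not R's implementation: it uses degree-1 local fits (R's loess defaults to degree 2 with span 0.75), has no robustness reweighting, and the `frac` parameter name is hypothetical. For real work in Python, `statsmodels.nonparametric.smoothers_lowess.lowess` is an established LOWESS implementation.

```python
def lowess(xs, ys, frac=0.5):
    """Simplified LOWESS: degree-1 local fits with tricube weights.

    frac (hypothetical name) is the share of points in each local window,
    playing the role of loess's span; it should lie in (0, 1].
    """
    n = len(xs)
    k = min(n, max(2, round(frac * n)))      # points per local window
    fitted = []
    for x0 in xs:
        # Bandwidth: distance from x0 to its k-th nearest point (x0 included).
        h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]  # tricube
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))   # local line evaluated at x0
    return fitted


# Hypothetical noisy series: an upward trend plus alternating +/-1 noise.
xs = list(range(20))
ys = [x + (1 if x % 2 else -1) for x in xs]
smooth = lowess(xs, ys)
print([round(v, 2) for v in smooth])
```

A larger `frac` averages over more neighbours and gives a smoother curve; a smaller one tracks the wiggles more closely.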

I'd be very cautious about reading into these charts anyway. This question is not one suitable for statistical analysis. The sample size of six is far too small. Each recession is different in terms of causes, remedies and context. The fact that we call them recessions does not make them comparable. Further, it is impossible to know at this stage whether the 2007 decline has reached bottom. The chart designer essentially assumed this to be the case, but who knows?

PS. Nick Rapp, one of the designers of the chart, responds in the comments. He has started a blog to feature the work of his graphics team at AP. His colleague has created an interactive version. More than anything, this post highlights an aspect of the chart that Nick and his team clearly spent a lot of time doodling over. The concept of the chart itself is actually wonderful, in case I didn't say so already; it is essentially the same as the oft-printed chart where the anchor point is the start of each recession, only here the anchor is the bottom of each recession.
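The re-anchoring idea can be sketched in Python under a few assumptions: a made-up monthly series that begins at or before the prior peak; the bottom is the series minimum; the peak is the highest point before the bottom; and recovery is the first month back at or above that peak. Measuring both sides in months keeps the left and right distances comparable.

```python
def decline_and_recovery(levels):
    """Months from peak to bottom and from bottom back to the peak level.

    Returns (decline, recovery); recovery is None if the peak is never regained.
    """
    bottom = min(range(len(levels)), key=lambda i: levels[i])
    peak = max(range(bottom + 1), key=lambda i: levels[i])
    recovered = next(
        (i for i in range(bottom, len(levels)) if levels[i] >= levels[peak]), None
    )
    decline = bottom - peak
    recovery = None if recovered is None else recovered - bottom
    return decline, recovery


def anchor_at_bottom(levels):
    """Re-index a series so month 0 is the bottom: negative = decline, positive = recovery."""
    bottom = min(range(len(levels)), key=lambda i: levels[i])
    return [(i - bottom, x) for i, x in enumerate(levels)]


# Hypothetical monthly levels: a fast fall followed by a slow climb back.
levels = [100, 80, 55, 50, 60, 70, 75, 85, 90, 95, 101]
print(decline_and_recovery(levels))   # (3, 7)
print(anchor_at_bottom(levels)[:4])   # [(-3, 100), (-2, 80), (-1, 55), (0, 50)]
```

Anchoring at the start of the decline instead, as in the oft-printed version, would only mean subtracting the peak's index rather than the bottom's in `anchor_at_bottom`.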

A book

We bring attention to a book on graphics written by Bernard Lebelle, a frequent contributor to this blog. The book came out in France earlier this year. The title is "Convaincre avec des graphiques efficaces sous Excel, PowerPoint ..." (roughly, "Convincing with effective charts in Excel, PowerPoint ..."), published by Eyrolles. Thankfully, because much of the book is visual, I don't need to know French to understand most of it. Here, I discuss a few interesting things:

On page 13, he discussed flow diagrams using the energy flow example that led to a long discussion on this blog. He proposed using a Marimekko chart instead.


On page 89, he showed a concentric circle chart (see below).  This is a relatively simple train schedule showing the frequency of trains at each hour on each day of the week.  It looks interesting because of the allusion to the clock, except that typical clocks have twelve hours rather than 24.  I'd create a set of two charts, one for the first twelve hours, one for the second twelve.


This sort of chart is of very limited utility, but it works well here because the data is entirely categorical (one or two trains per hour, hour of day, day of week) and, in addition, the relationships are very simple. In fact, the reader does not need to read any trends or general patterns, or estimate the size or shape of anything. The user is performing a simple search operation; that's it.

(The innermost circle is unlabelled so it is unclear what that signifies.)

Lebelle provided an alternative on page 90, which is essentially a data table, with time on the vertical dimension, calendar date on the horizontal, and the frequency inside the cells. This is more straightforward but less interesting.
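A table like this alternative is easy to generate. In this Python sketch the schedule is a hypothetical mapping from (day, hour) to trains per hour, with missing entries meaning no train. Rows are hours and columns are days, so reading it is exactly the simple lookup described above.

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def schedule_table(trains, hours=range(24)):
    """Render trains per hour as text: one row per hour, one column per day."""
    lines = ["hour " + " ".join(f"{d:>3}" for d in DAYS)]
    for h in hours:
        # Missing (day, hour) keys are shown as 0 trains.
        row = " ".join(f"{trains.get((d, h), 0):>3}" for d in DAYS)
        lines.append(f"{h:>4} {row}")
    return "\n".join(lines)


# Hypothetical schedule: two trains at 7am on weekdays, one on weekends.
trains = {(d, 7): 2 for d in DAYS[:5]}
trains.update({(d, 7): 1 for d in DAYS[5:]})
print(schedule_table(trains, hours=range(6, 9)))
```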

On page 151, he mentioned the self-sufficiency test that we have often discussed here. A graph should do more than just print all the data in the data set.

Lebelle is currently a Senior Manager at Deloitte, the management consulting company, and he focuses on graphical construction in Excel. This is both a limitation and an advantage. Excel, of course, has many imperfections (don't get me started on the new and horrid Excel). However, Excel is still the most widely used graphing application, by far.

The book takes a perspective on charting that fits our philosophy very well.  Here is a rough summary of the contents of the book (any mistakes are mine):

chapter 1: a summary of the key features of good charts... issues such as clarity and efficiency of the message are addressed

chapter 2: historical perspective, with examples from Playfair, Minard, Nightingale, etc. Page 38 has an interesting table comparing the contributions of Bertin, Tukey, Tufte, Ware and Cleveland.

chapter 3: constructs of a chart such as axes, legends, etc. Page 43 explains the difference between "information design", "infographics", "charts" and "information visualization". Introduces chartjunk and the data-ink ratio.

chapter 4: "decoding" of a chart. Discusses optical illusions, which I also consider fundamental to understanding the effect of charts on the audience. Talks about how different ways of displaying the same data are perceived differently. Interesting section (starting p. 101) considering some quantitative theories of perception, citing Ernst Weber and Stanley Smith Stevens.

chapter 5: process of making a chart.  The nitty-gritty things like transforming the data, picking a scale, etc.

chapter 6: examples. Also introduces a classification system for charts. It has one of those flowcharts that are supposed to let someone pick a type of chart based on whether the data is numeric or categorical, etc. I know these are very popular in engineering and scientific textbooks, but I have never found any use for such flowcharts. There are 30 to 40 pages of charts here, a great resource for ideas.

chapter 7: exercises

chapter 8: resources
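Chapter 4's nod to Stevens can be made concrete. Stevens' power law models perceived magnitude as k times the stimulus raised to an exponent, so in a ratio the constant k cancels. Exponents near 1.0 are commonly cited for visual length and about 0.7 for visual area, which is why differences in area tend to be underestimated. The short Python sketch below computes the perceived ratio; the exponent values are the commonly cited ones, not figures taken from Lebelle's book.

```python
def perceived_ratio(stimulus_ratio, exponent):
    """Stevens' power law, perceived = k * S**exponent; k cancels in a ratio."""
    return stimulus_ratio ** exponent


LENGTH_EXPONENT = 1.0   # commonly cited value for visual length
AREA_EXPONENT = 0.7     # commonly cited value for visual area

# A bar twice as long reads as twice as long; a shape with twice the area
# reads as only about 1.6 times as big under these exponents.
print(perceived_ratio(2, LENGTH_EXPONENT))           # 2.0
print(round(perceived_ratio(2, AREA_EXPONENT), 2))   # 1.62
```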

Pure delight

My favorite Bumps chart in the New York Times ...

For the purist, this is the original rank-based version.

With judicious use of color and background/foreground, this makes for a good story.

The color scheme here, however, is a bit bland: green for improvement, blue for decline and orange for the USA.

Note, for example, that New Zealand and England both suffered drastic drops similar to that of the US.

It would be better to (for example) split out the large improvements and large declines, or to split out the developed world versus the developing world.

The chart was probably made this way because the accompanying article mentions it only in passing, so the designer had no clear message to build the chart around.

Interestingly, there were no ties in 1960 but quite a few ties in 2004. I wonder why. I'd place tied dots at the mid-point between the tied ranks rather than move them all up to the higher rank.
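The mid-point idea is the usual "average rank" treatment of ties. Here is a minimal Python sketch, assuming lower values rank better (as with infant mortality rates) and using made-up rates; `method="min"` mimics moving tied countries up to the better rank, which is what the chart appears to do.

```python
def rank_with_ties(values, method="average"):
    """Rank values in ascending order (1 = lowest).

    Tied values share the mid-point of the positions they occupy
    ("average"), or all take the best of those positions ("min").
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        if method == "average":
            rank = (start + end) / 2 + 1     # mid-point of positions start..end
        elif method == "min":
            rank = start + 1                 # everyone gets the best position
        else:
            raise ValueError("method must be 'average' or 'min'")
        for pos in range(start, end + 1):
            ranks[order[pos]] = rank
        start = end + 1
    return ranks


# Hypothetical infant mortality rates for four countries.
rates = [5.0, 4.5, 5.0, 6.1]
print(rank_with_ties(rates))              # [2.5, 1.0, 2.5, 4.0]
print(rank_with_ties(rates, "min"))       # [2, 1, 2, 4]
```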

All in all, this is a much more engaging way to present this data than the reams of tables found in, say, the World Bank's World Development Report.

Reference: "Vital Statistics: U.S. Still Struggling With Infant Mortality", New York Times, April 6 2009.

An art class?

Robert F. pointed us to these charts via the Digital Design Blog. A larger version is found here. These look like scraps from an art class, exploring perspective and 3D.

These types of charts are quite prevalent in the web analytics area. We have a long way to go in producing good visualizations of such data.

For even more light entertainment, click here. (Warning: not for the easily offended or for language purists, and mildly not safe for work.) (Via Pete S.)