Jun 29, 2009

Round up

Here is some interesting reading from other places:


Blog_foodtag Tag clouds have caught on since we approved of them a while ago. One interesting use was at the Life Vicarious blog, which uses them to compare the inclinations of three New York-based restaurant reviewers. What they should have done is remove irrelevant words like "one", "also", "many", "make"/"made", etc. In statistics, this is called removing "noise", which helps bring out the "signal".
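For readers who build their own tag clouds, here is a minimal sketch of the noise-removal step: count words, but drop a stop-word list first. The list below is hypothetical (the words named above plus a few common function words); a real one would need tuning for restaurant reviews, and the sample review text is made up.

```python
import re
from collections import Counter

# Hypothetical stop-word list: the "noise" words named in the post plus a few
# common function words. Tune this for the actual corpus.
STOP_WORDS = {"one", "also", "many", "make", "made",
              "the", "a", "and", "of", "to", "is"}

def tag_counts(text, stop_words=STOP_WORDS, top_n=10):
    """Return the top_n (word, count) pairs after dropping stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(top_n)

# Made-up review text, for illustration only.
review = "The duck was made well, and one also notes the duck sauce. Many make duck."
print(tag_counts(review, top_n=3))
```

Without the filter, words like "one" and "made" would compete for font size with the words that actually distinguish one reviewer from another.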

Nyt_babyimbalance Andrew Gelman discussed the NYT article reporting an unexpected male bias among the children of Asian American families. He can be counted on to make useful comments on any accompanying graphics. He rightly pointed out that this is one example where the axis should not start at zero: the relevant baseline is 100, since the metric is essentially the excess of males relative to females. I also agree that a line chart with a longer time series, plotting percentages rather than the excess, would work better.
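Switching from the excess to percentages is simple arithmetic. A minimal sketch, assuming the metric is the usual sex ratio (males per 100 females); the ratios below are hypothetical, not the figures from the NYT article.

```python
def percent_male(males_per_100_females):
    """Convert a sex ratio (males per 100 females) to the percentage of children who are male."""
    return 100.0 * males_per_100_females / (100.0 + males_per_100_females)

def excess_over_parity(males_per_100_females):
    """Excess of males relative to females, measured against the baseline of 100."""
    return males_per_100_females - 100.0

# Hypothetical ratios, for illustration only.
for ratio in (100, 105, 117):
    print(ratio, round(percent_male(ratio), 1), excess_over_parity(ratio))
```

A ratio of 100 maps to 50% male, which is the natural reference line on a percentage chart.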

Fd_calorie The racetrack chart made an appearance at Flowing Data. This one is even busier and just as impossible to decipher.


Jun 03, 2009

Spinning the climate

Mike L. pointed us to this pair of "climate change model pie charts", with the brief comment "Yuck".

Sd_climate
They use the spinning-wheel analogy to present probabilities (odds). This is not a good use of pies either. A histogram does the job with minimal fuss:

Redo_climate 


I merged the 2-2.5 and 2.5-3 degree sectors, since every other sector spans a one-degree interval. We see immediately that the effect of the policy is to shift the probability distribution toward smaller temperature changes.
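Merging two bins just means adding their probabilities so that every bar covers the same width. Here is a minimal sketch; the bin labels and probabilities are hypothetical placeholders, not the values in the Science Daily charts, and the text bars are only a stand-in for a proper histogram.

```python
def merge_bins(bins, first, second, label):
    """Merge two adjacent (label, probability) bins, keeping the others in order."""
    probs = dict(bins)
    merged = []
    for name, p in bins:
        if name == first:
            merged.append((label, probs[first] + probs[second]))
        elif name != second:
            merged.append((name, p))
    return merged

# Hypothetical warming distribution (degrees), for illustration only.
bins = [("<2", 0.10), ("2-2.5", 0.15), ("2.5-3", 0.20),
        ("3-4", 0.30), ("4-5", 0.15), (">5", 0.10)]
merged = merge_bins(bins, "2-2.5", "2.5-3", "2-3")

# Crude text histogram: one '#' per percentage point.
for name, p in merged:
    print(f"{name:>5} {'#' * round(p * 100)}")
```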


Reference: "Climate change odds much worse than thought", Science Daily, May 20 2009.

May 26, 2009

The trouble with two lines

It's never a good idea to put two scales on one chart, and this is another example of what not to do:

Ir_newspaper

The ugliest part of this chart is the duelling gridlines. Because neither axis starts at zero, it is difficult to assess whether the number of newspapers was declining at a faster pace than circulation was. Also, line charts would do a better job of tracing the evolution over time; the interspersed blue and red columns interfere with each other. Note that the designer lessened our pain by plotting every other year, thus halving the number of columns.

The junkart version also puts two data series on the same chart, but on the same scale. Instead of plotting the raw data, we plot indices, with 2008 as 100. This reveals a pattern that was not apparent in the original chart. There appear to have been four periods of evolution: up to 1980, both the number of newspapers and total circulation were at a plateau; from 1980 to 1990, circulation stayed stable while the number of papers dropped drastically, indicating perhaps consolidation; from 1990 to 2003, both series declined at roughly equal rates; and finally, from 2003 onward, the bottom fell out of circulation.


Redo_newspaper
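Indexing is what lets two series with different units share one axis: divide each value by the base-year value and multiply by 100. A minimal sketch; the numbers below are made up for illustration and are not the Newspaper Association of America data.

```python
def to_index(series, base_year):
    """Rescale a {year: value} series so that the base year equals 100."""
    base = series[base_year]
    return {year: 100.0 * value / base for year, value in sorted(series.items())}

# Hypothetical figures, for illustration only.
papers = {2000: 1500, 2004: 1450, 2008: 1400}          # number of newspapers
circulation = {2000: 60.0, 2004: 57.0, 2008: 48.0}     # millions of copies

papers_idx = to_index(papers, 2008)
circ_idx = to_index(circulation, 2008)
for year in sorted(papers):
    print(year, round(papers_idx[year], 1), round(circ_idx[year], 1))
```

Both indexed series end at 100 in 2008, so a steeper line going back in time means a larger cumulative decline, which is exactly the comparison the dual-axis chart obscured.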


See our previous discussion of dual axes here, here, and here.


Reference: "Channel Shift: Online Circulars: the first step by retailers toward web-to-store harmony?", Internet Retailer (print), May 2009. Data from the Newspaper Association of America.


Bonus 1: Thinking about the column interspersing trick some more, I realize that it is, em, possible, em, that one series was plotted for odd years and the other series plotted for even years!

Bonus 2: Here is the requested scatter plot (still indexed):

Redo_newspaper2



Apr 06, 2009

An art class?

Robert F. pointed us to these charts, via the Digital Design Blog.  A larger version is found here.  These look like scraps from an art class, exploring perspective and 3D.

Mtcc These types of charts are quite prevalent in web analytics. We have a long way to go in producing good visualizations of such data.


For even more light entertainment, click here. (Warning: not for the easily offended or language purists, and mildly not safe for work.) (This is via Pete S.)

Feb 13, 2009

Eternal optimist

Chris P pointed us to the "Financial Comeback" calculator, surely a well-meaning joke from the folks at the Times. Here is how one makes back a 40% loss in just six years!

Nyt_comeback1
Surely someone has to tell them about simulation. They need to assume a probability distribution for the annual returns and show us some sample paths. Using the average annualized historical return in essence wipes out all variability, so no wonder it's smooth sailing upwards. Eternal optimist.
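Here is a minimal Monte Carlo sketch of what that could look like. The normal distribution and its parameters (9% mean, 20% standard deviation) are assumptions chosen for illustration, not the calculator's inputs. A 40% loss leaves 60 cents on the dollar, so the portfolio must grow by a factor of 1/0.6, about 1.67, to recover.

```python
import random

def recovery_probability(loss=0.40, years=6, mean=0.09, sd=0.20,
                         n_paths=10_000, seed=2009):
    """Share of simulated return paths that make back `loss` within `years`.

    Annual returns are drawn independently from a normal distribution; this
    is an illustrative assumption, not the calculator's model.
    """
    rng = random.Random(seed)
    target = 1.0 / (1.0 - loss)  # growth factor needed to get back to even
    recovered = 0
    for _ in range(n_paths):
        wealth = 1.0
        for _ in range(years):
            # Floor at zero: a return worse than -100% is not possible.
            wealth *= max(0.0, 1.0 + rng.gauss(mean, sd))
            if wealth >= target:
                recovered += 1
                break
    return recovered / n_paths

print(recovery_probability(sd=0.0))   # the calculator's smooth path: always recovers
print(recovery_probability(sd=0.20))  # with year-to-year variability
```

Setting the standard deviation to zero reproduces the calculator's certainty; any realistic variability turns "you will recover in six years" into "some fraction of paths recover in six years".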

Here is Chris' comment:

The bad news is that the range of values it offers does not include the return on the market from last year (-31% to -36%).  I guess they are optimistic.


The interactive features of this chart, however, impressed me.  The smooth adjustment to the chart as one slides the control, including the automatic choice of appropriate axis labels, is very nice indeed.

Okay, they want to trademark the name of the calculator so perhaps this is serious.


Reference: "Calculate your financial comeback", Jan 6 2009.

Jan 28, 2009

Popping the bubble, so to speak

Matt H., who authored the previous post, and Charles G. both pointed us to a great example of how readers like ours can make a difference. A bad chart got made over!

The Financial Times published a chart from a JP Morgan report, using ... you guessed it ... bubbles to illustrate the deep plunge in market capitalization of many banks.

4156

Jpmorgan

Some readers at the blog were none too happy with the choice of bubble charts.   Among other things, the designer made the common mistake of plotting proportional diameters rather than proportional areas.  This is clear from looking at JP Morgan's bubble.
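The fix for the diameter mistake is to scale the radius by the square root of the value, so that area, not diameter, is proportional to the data. A small sketch comparing the two encodings; the two market caps are hypothetical, chosen so one is four times the other.

```python
import math

def radius_by_area(value, max_value, max_radius=1.0):
    """Radius such that circle AREA is proportional to value (the correct encoding)."""
    return max_radius * math.sqrt(value / max_value)

def radius_by_diameter(value, max_value, max_radius=1.0):
    """Radius proportional to value, so area grows with its square (the common mistake)."""
    return max_radius * value / max_value

big, small = 100.0, 25.0  # hypothetical market caps
for encode in (radius_by_area, radius_by_diameter):
    r_big, r_small = encode(big, big), encode(small, big)
    print(encode.__name__, "area ratio:", round((r_big / r_small) ** 2, 2))
```

With the area encoding, a bank four times as large gets four times the ink; with the diameter encoding it gets sixteen times, which exaggerates every difference.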

This chart exposes the weakness of bubble charts well.  Look at the top row of bubbles.  Most of them look so similar it is impossible to know, without spending much time studying the circles, which bank was hurt more.

Eventually, a bar chart was produced.  Felix Salmon linked to it.  I am not sure how the banks were ordered in this chart.  It isn't one of the two obvious dimensions, nor is it in alphabetical order.


Bankchart2


In fact, neither the bubble nor the bar chart works well for this case. What we need is a Case-Shiller style asset-bubble life cycle chart. To interpret these changes in market caps, we need to know how big the bubble was, and then how steep the subsequent decline was. Take a look at our discussion of real estate bubble charts here.
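The two quantities such a life cycle chart makes visible can be sketched as the run-up from the starting value to the peak and the decline from the peak to the latest value. The series below are made up; they show how two identical 50% declines can follow very different bubbles.

```python
def bubble_lifecycle(values):
    """Return (run_up, decline) for a market-cap or price series.

    run_up:  growth from the first value to the peak, as a fraction
    decline: fall from the peak to the last value, as a fraction of the peak
    """
    peak = max(values)
    run_up = peak / values[0] - 1.0
    decline = 1.0 - values[-1] / peak
    return run_up, decline

# Hypothetical market caps over time, for illustration only.
bank_a = [50, 80, 120, 60]   # big bubble, then a 50% fall
bank_b = [50, 55, 60, 30]    # small bubble, same 50% fall
for name, series in (("A", bank_a), ("B", bank_b)):
    run_up, decline = bubble_lifecycle(series)
    print(name, f"run-up {run_up:.0%}, decline {decline:.0%}")
```

Bank A ends above where it started while Bank B ends well below, even though a chart of the decline alone would show them as identical.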

Reference: "Bank capitalization chart of the day", Felix Salmon at Portfolio.com, and "Bank picture du jour", FT Alphaville, Jan 21 2009.

Jan 25, 2009

Bond yield makeover

Message to readers: I have a large backlog of reader suggestions.  Please be patient as I slowly get through them.  The frequency of posts will remain lower for the time being as I am busy finalizing a draft of a book.  More on that in the near future.


Matt H, a reader, sent in the following entry (with minor edits).

I saw a couple of bad charts on money.cnn.com and thought I'd submit them to you. 

They're both part of the same feature on investment bargains caused by the recession. 

Here are the links:  

Cnn_municipal_bonds Chart A: Municipal Bonds

Cnn_corporate_bonds Chart B: Corporate Bonds

It seems to me that both charts would have made their points more eloquently using a much simpler, more common form, like a bar chart.

In Chart A, cubes are used to display the difference between treasury bond yields and AAA municipal bond yields at the two-year horizon and the ten-year horizon.  The volume of each cube corresponds to the yield for the given type of bond in the given period (I think), which spreads the one dimension being compared (yields) across three dimensions, making the differences look smaller than they really are.

[...] At the two-year horizon, the two yields being compared are 1.16% for Treasury bonds and 3.01% for AAA municipal bonds.  The yield for AAA municipal bonds in this case is more than 2.5 times larger than the yield for Treasury bonds, but the difference doesn't look nearly that big in the chart provided.  [...] 
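The shrinking effect is easy to quantify. Assuming, as Matt guesses, that cube volume encodes yield, the visible edge length scales with the cube root of the value; with circle area (Chart B's approach) it scales with the square root. The yields below are the two-year figures quoted above.

```python
treasury, muni = 1.16, 3.01  # two-year yields (%) quoted in Matt's note

ratio = muni / treasury                # about 2.6x: the comparison readers should see
cube_edge_ratio = ratio ** (1 / 3)     # if cube VOLUME encodes yield, edges differ far less
circle_radius_ratio = ratio ** 0.5     # same data encoded by circle AREA

print(f"value ratio {ratio:.2f}, cube edge ratio {cube_edge_ratio:.2f}, "
      f"circle radius ratio {circle_radius_ratio:.2f}")
```

A 2.6x difference in yield shows up as edges only about 1.37x longer, which is why the cubes look so similar. Bar lengths would preserve the full 2.6x.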

Time out. Let me point out the inadvertent optical illusion concerning foreground and background! The "outline only" cube on the left should have approximately the same volume as the "solid red" cube on the right (3.01% versus 3.30%), and yet the red cube appears quite a bit larger because our eyes react more to solid color than to thin outlines.


Matt continued:

In Chart B, [...] Again, the metric in question is bond yields:  ten-year Treasury bond yields compared to investment-grade corporate bond yields.  The 2008 figure for each is shown alongside the five-year average.  This chart uses the area of a circle to express these yields, spreading the one-dimensional value across two dimensions.  As in Chart A, the result is a chart in which the difference between values does not appear as large as it actually is.  

I will also send a simple bar chart version of each chart -- the bar charts should illustrate the differences in yields more effectively than the charts actually used in this article.

These are his revised charts:

Redo_bondyield

We can do even better by converting the chart on the right into a time-series line chart. Instead of the five-year average, it is better to display the gap between Treasury and corporate bonds for each of the five years plus 2008. This should make for a more eye-catching graphic.


Reference: "Investment in the bargain bin", CNN Money.

Jan 11, 2009

A Harvard mess 2

Jon's comment on the previous post pretty much anticipated this post. The prior post concentrated on graphical matters. However, the biggest issue with that chart is the choice of metrics. If the idea is to explore the potential adverse effect of a sharp decline in endowment investment performance, then it is not clear why one should compare each school's share of the endowment with the proportion of its operating revenues paid for by endowment funds. What is missing from these two series is the relative size of the different departments' budgets.

The next chart shows the proportion of each school's operating costs accounted for by endowment funds together with the total size of its operating costs.

Redo_hu3
We can turn the ratio around and directly compute how much of the total amount of endowment funds distributed for operating costs is accounted for by each school.  This is really the simplest metric that gets to the question.

Redo_hu4  
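Turning the ratio around takes two steps: multiply each school's operating budget by the fraction paid from the endowment to get endowment dollars, then divide by the total. A minimal sketch; the school names and figures are hypothetical, not the Harvard Fact Book data.

```python
def endowment_dollars(budgets, endowment_pct):
    """Dollars of each school's operating budget paid for by the endowment."""
    return {s: budgets[s] * endowment_pct[s] / 100.0 for s in budgets}

def share_of_distribution(budgets, endowment_pct):
    """Each school's share (%) of all endowment money distributed for operations."""
    dollars = endowment_dollars(budgets, endowment_pct)
    total = sum(dollars.values())
    return {s: 100.0 * d / total for s, d in dollars.items()}

# Hypothetical budgets ($ millions) and endowment-funded percentages.
budgets = {"School A": 2000.0, "Institute B": 20.0, "School C": 480.0}
endowment_pct = {"School A": 50.0, "Institute B": 80.0, "School C": 25.0}

for school, share in sorted(share_of_distribution(budgets, endowment_pct).items(),
                            key=lambda kv: kv[1], reverse=True):
    print(f"{school:12s} {share:5.1f}%")
```

Note how Institute B, though 80% dependent on the endowment, accounts for only a sliver of the total distribution.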

There are really two possible worries. For the School of Arts and Sciences, which pays for about $1 billion of costs with endowment funds, any significant reduction in distributions will leave a gaping hole. For a department like Radcliffe, which pays for over 80% of its operating budget out of endowment funds, a reduction in distributions can obviously cause problems, but we are talking about a base of $18 million rather than $1 billion.

Reference: Harvard University Financial Report, 2008; Harvard Fact Book 2007


PS. The first source did not contain any data on operating budgets, so the first set of graphs (now replaced) did not show what I intended to show. The new ones use the right data and have the right order of magnitude, with budgets ranging from millions to $1 billion. The 2008 data are not yet available.

Jan 08, 2009

A Harvard mess 1

Some innocent-looking charts throw up more and more problems the more you look at them. This example comes from a magazine sent to Harvard alumni. We have all heard that the endowment fund suffered some horrific losses in the last few months, and so the magazine editor thought it useful to describe the potential impact on the different departments.

Hu_endowment
It's a safe bet our readers would not think to present two related data series as a combination of one bar chart and one column chart. 

As the chart stands, the intended message is completely lost. It takes a bit of fishing to learn that the Radcliffe Institute has a tiny stake in the endowment fund but draws over 80% of its operating funds from the endowment.

Looking a little deeper, we find that the scales of the two charts are not standardized, so the length of a bar and the length of a column cannot be directly compared. Nor can the gridlines, as each section accounts for 10% in the bar chart but 20% in the column chart (to make it worse, the larger section represents the smaller percentage!).

Looking further still, we find that "Other" accounts for some 15% of the endowment but apparently consists of entities that do not have operating funds, and thus goes missing from the column chart. In our versions below, we ignore the "Other" category completely; this is equivalent to allocating "Other"'s share to the individual schools in proportion to their own shares.
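Dropping a category and rescaling the rest to sum to 100% is the same as handing its share back to the remaining entries in proportion to their sizes. A quick sketch with hypothetical shares (not the Harvard Magazine figures):

```python
def drop_and_renormalize(shares, drop="Other"):
    """Remove one category and rescale the rest so they sum to 100.

    Equivalent to redistributing the dropped share to the remaining
    entries in proportion to their own shares.
    """
    kept = {k: v for k, v in shares.items() if k != drop}
    total = sum(kept.values())
    return {k: 100.0 * v / total for k, v in kept.items()}

# Hypothetical endowment shares (%), for illustration only.
shares = {"School A": 51.0, "School B": 25.5, "School C": 8.5, "Other": 15.0}
print(drop_and_renormalize(shares))
```

Each remaining school's share grows by the same factor (here 100/85), so their relative sizes are unchanged.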

Not to mention arranging the schools in alphabetical order.

Much of this mess became apparent when we put the two charts into a uniform setting, like this:

Redo_hu1b
 

A scatter plot provides good information, especially if there is clustering, although we can debate whether it is fit for general publication.

Redo_hu2

More in our next post.


Reference: "The Endowment: Each school's stake", Harvard Magazine, Jan-Feb 2009.


PS.  The initial post switched the axis labels on the two bar charts. Thanks to Jon for pointing this out.

Dec 25, 2008

Seen at Starbucks

Merry Christmas!

If you happened to be in a Starbucks recently, you might have picked up some charts. That is what happened to one of our regular readers and commenters, ZBicyclist, who then tried his hand at chart critique here. He is worried they may reach millions soon. So look out!

Treering This graph on the right -- which should rightfully be called a "tree ring graph" -- seems to me a fantastic concept, although it is hard to think of data that would deserve this treatment. Certainly not the retail sales series plotted here!

This is an object lesson in why bubble charts often fail.

One issue is scale: note the awkward way in which the innermost ring is used to designate the oldest sales figure, $375 billion, presumably for 1996, and think about how you would decide where to place the 2007 ring. (It's arbitrary.)

Another problem is labeling: when the growth is slow, the rings are close together, and labels have to be jittered (look at 2001 and 2002).  In this case, a relatively simple solution is to have the entire series of years run diagonally.

Yet another challenge is relative radii versus relative areas. Inevitably, some readers will respond to the areas while others will respond to the radii. ZBicyclist, for example, belongs to the first group, while in this case I find myself siding with the latter. When the bubbles or rings overlap, it is difficult to assess areas.

Of course, a simple line chart would do the job with minimal fuss.  The following chart issued by the National Retail Federation actually plots the growth rates, rather than the annual sales.

Nrf_sales (Please lose the grid-lines, or else add a vertical axis and lose the data labels.  A column or dot chart is also slightly preferred.)
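Going from annual sales to growth rates, as the NRF chart does, is a one-line computation per year. A minimal sketch; only the $375 billion starting figure comes from the tree ring chart, and the later values are hypothetical.

```python
def growth_rates(sales):
    """Year-over-year growth (%) from a {year: annual sales} series."""
    years = sorted(sales)
    return {year: 100.0 * (sales[year] / sales[prev] - 1.0)
            for prev, year in zip(years, years[1:])}

# $ billions; 1996 figure from the chart, later years hypothetical.
sales = {1996: 375.0, 1997: 390.0, 1998: 405.6}
for year, g in growth_rates(sales).items():
    print(year, f"{g:.1f}%")
```

Plotted as a line or dot chart, these rates show slowdowns directly, instead of forcing readers to judge the spacing between rings.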

Now go read ZBicyclist's point of view.  In the meantime, let us know if you think of ways to use the tree ring graph.