Jun 29, 2009

Round up

Here is some interesting reading from other places:


Tag clouds have caught on since we approved them a while ago.  One interesting use was at the Life Vicarious blog, which used them to compare the inclinations of three New York-based restaurant reviewers.  What they should have done is remove irrelevant words like "one", "also", "many", "make"/"made", etc.  In statistics, this is called removing "noise", which helps bring out the "signal".
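Here is a minimal sketch of that noise-removal step in Python. It assumes the cloud is built from raw word counts. The stopword list is a small illustrative one, made of the words named above plus a few common function words. It is not the list Life Vicarious used, and the function name is hypothetical.

```python
import re
from collections import Counter

# Illustrative stopword list: the words named above plus a few function words.
STOPWORDS = {"one", "also", "many", "make", "made",
             "the", "a", "an", "and", "of", "to", "is", "was", "it", "in"}

def tag_cloud_counts(text, stopwords=STOPWORDS, top_n=20):
    """Return the top_n (word, count) pairs after dropping stopwords."""
    words = re.findall(r"[a-z']+", text.lower())  # crude tokenizer: letters and apostrophes
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(top_n)

review = "The pasta was great. Also, one great pasta made the evening."
print(tag_cloud_counts(review))
```

Only the surviving words are sized in the cloud, so the reviewers' real preoccupations are no longer drowned out by filler.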

Andrew Gelman discussed the NYT article that reported an unexpected male bias among the children of Asian American families.  He can be counted on to make useful comments on any accompanying graphics.  He rightly pointed out that this is a case where the axis should not start at zero: the relevant baseline is 100, since the metric is essentially the surplus of males relative to females.  I also agree that a line chart with a longer time series, plotting percentages rather than the surplus, would work better.
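To see why 100 is the natural baseline, here is a small Python sketch. It assumes the chart plotted boys per 100 girls. The function names and the counts in the usage lines are placeholders, not the NYT data. A ratio of 100 means as many boys as girls, and the equivalent reference line on a percentage scale is 50%.

```python
def boys_per_100_girls(boys, girls):
    """Sex ratio at birth; 100 means parity, so it is the natural baseline."""
    return 100.0 * boys / girls

def percent_boys(boys, girls):
    """Share of births that are boys; parity corresponds to 50%."""
    return 100.0 * boys / (boys + girls)

# Placeholder counts for illustration only
print(boys_per_100_girls(1170, 1000))
print(percent_boys(1170, 1000))
```

Whichever metric is plotted, the reference line belongs at parity (100 or 50%), not at zero.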

The racetrack chart made an appearance at Flowing Data.  This one is even busier and just as impossible to decipher.


May 26, 2009

The trouble with two lines

It's never a good idea to put two scales on one chart, and this is another example of what not to do:

Ir_newspaper

The ugliest part of this chart is the duelling gridlines. Because neither axis starts at zero, it is difficult to assess whether the number of newspapers was declining at a faster pace than circulation was. Also, line charts would do a better job of tracing the evolution over time; the interspersed blue and red columns interfere with each other. Note that the designer lessened our pain by plotting every other year, thus halving the number of columns.

The junkart version also puts two data series on the same chart, but on the same scale. Instead of plotting the raw data, we plot indices, with 2008 as 100. This reveals a pattern that was not apparent in the original chart. There appear to have been four periods of evolution: up to 1980, both the number of newspapers and total circulation were on a plateau; from 1980 to 1990, circulation stayed stable while the number of papers dropped drastically, perhaps indicating consolidation; from 1990 to 2003, both series declined at roughly equal rates; and finally, from 2003, the bottom dropped out of circulation.


Redo_newspaper
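The indexing step is simple arithmetic: divide each year's value by the base-year value and multiply by 100. Here is a minimal Python sketch. The function name and the sample numbers in the usage lines are placeholders, not the Newspaper Association of America data.

```python
def to_index(series, base_key):
    """Rescale a {year: value} series so that the base year equals 100."""
    base = series[base_key]
    return {year: 100.0 * value / base for year, value in series.items()}

# Placeholder values for illustration only
papers = {1980: 50.0, 1990: 45.0, 2008: 40.0}
print(to_index(papers, base_key=2008))
```

Because both series pass through 100 in the base year, they can share a single axis, and their slopes can be compared directly.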


See our previous discussion of dual axes here, here, and here.


Reference: "Channel Shift: Online Circulars: the first step by retailers toward web-to-store harmony?", Internet Retailer (print), May 2009. Data from the Newspaper Association of America.


Bonus 1: Thinking about the column interspersing trick some more, I realize that it is, em, possible, em, that one series was plotted for odd years and the other series plotted for even years!

Bonus 2: Here is the requested scatter plot (still indexed):

Redo_newspaper2



Apr 19, 2009

Don't mess with the scale

My friend Patrick pointed out the single biggest issue with the chart below -- that the designer chose a scale that precisely undermines the message of the chart.  Undermine may be too mild a word to use here; annihilate may be more apt.

Dow2


The lines in this chart are anchored at the zero point on the time line (horizontal axis) used to indicate the bottoms of various bear markets in the Dow from 1929 to 2007.  From that anchor, time runs to the left showing the amount of time for the Dow to go from peak to bottom (the decline); time runs to the right showing the amount of time for the Dow to climb back to the prior peak (the recovery).  As the caption said, the point of the chart is "if the decline was fast, the recovery took a considerable time".

Funny, then, that the distances from the zero point are roughly comparable on the left and on the right.

This illusion resulted from some very convoluted and perplexing messing around with the horizontal scale.  First, the left-of-center scale is in months while the right-of-center is in years.  Second, the left-of-center scale has normal spacing while the right-of-center seemingly was suffering from spasms.  Take a closer look:

The first five years (0-5) took up about half the scale while the next five (5-10) took maybe one-eighth.  The first year (0-1) took about as much space as the next two years (1-3).
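We can put a number on the distortion using the proportions read off the chart above. These are eyeball estimates, not measurements. On an honest linear axis, every year gets the same width. This Python sketch, built around a hypothetical helper, computes the axis share per year for each segment.

```python
def width_per_year(segments):
    """Axis share per year for (start_year, end_year, share_of_axis) segments.

    On an honest linear scale every segment would give the same number.
    """
    return [share / (end - start) for start, end, share in segments]

# Eyeball estimates from the chart: years 0-5 take about 1/2 of the axis,
# years 5-10 about 1/8.
w = width_per_year([(0, 5, 0.5), (5, 10, 0.125)])
print(w[0] / w[1])  # each early year is drawn roughly 4 times as wide
```

By the same logic, a first year drawn as wide as the next two combined is stretched to twice the width of each of those years.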

I am not quite sure what the logic behind this is, but since the message of the chart has everything to do with time duration, it is most unfortunate to introduce such distortions.

There is yet another "innovation" in this chart.  Notice that on the right side, the axis labels are irregular (more spasms): 0, 1, 3, 4, 5, 10, 15, 20, 25...  It is as if the designer were posing one of those IQ questions that ask readers to figure out the next number in the sequence.  The specific time intervals selected may have meaning: note that all the lines are straightened out between these tick marks.  Given that each line represents a different historical sequence, it is difficult to see why these intervals would be regular across history.  Perhaps this will prove to be the key to unlocking the secret of this chart.  Please comment below if you are able to unravel this mystery.

Moreover, the same type of "innovation" was not applied to the left side of the chart.  Here, the designer opted to throw out all the data between the peak and the bottom, straightening out all the intermediate fluctuations.

Below are two different versions of this chart, basically restoring the time scale to a normal, equally spaced, symmetric appearance.  The top one uses monthly Dow returns, whose volatility obscures the trends and requires the use of color to differentiate the lines.  In the next version, I used R to generate loess estimates (a type of smoothing), and the trends became clearer.  (There was a prior discussion of loess on Junk Charts here.)


Redo_dow2
 

Now, these pictures are very different from the original graph!
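For readers curious about the smoothing, here is a simplified Python sketch of the loess idea. At each point, it fits a straight line by weighted least squares to the nearest fraction of the data. Tricube weights fade to zero at the edge of each window. This is a stand-in under stated assumptions, not the R code behind the chart: it fits local straight lines, whereas R's loess() defaults to local quadratics (degree = 2, span = 0.75). The function names and the toy series are hypothetical.

```python
import math

def tricube(u):
    """Tricube weight (1 - |u|^3)^3 inside the window, 0 outside."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def loess_smooth(xs, ys, frac=0.5):
    """Single-pass local linear smoother in the spirit of loess.

    At each x0, fit a weighted straight line to the nearest `frac` share of
    the points (tricube weights) and evaluate it at x0.
    """
    n = len(xs)
    k = min(n, max(2, math.ceil(frac * n)))  # points in each local window
    fitted = []
    for x0 in xs:
        # The k-th smallest distance sets the window half-width h
        h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
        w = [tricube((x - x0) / h) for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        # Weighted least-squares slope; fall back to a flat line if degenerate
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

# Toy series: a gentle upward trend with alternating wiggles
xs = list(range(24))
ys = [0.5 * x + (1.0 if x % 2 else -1.0) for x in xs]
trend = loess_smooth(xs, ys, frac=0.5)
```

A larger `frac` gives a smoother curve, and a smaller one follows the wiggles more closely. This is the same trade-off as choosing the span in R.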

I'd be very cautious about reading into these charts anyway.  This question is not suitable for statistical analysis.  The sample size of six is far too small.  Each recession is different in terms of causes, remedies and context.  The fact that we call them recessions does not make them comparable.  Further, it is also impossible to know at this stage if the 2007 decline has reached bottom.  The chart designer essentially assumed this to be the case but who knows?


PS. Nick Rapp, one of the designers of the chart, responds in the comments.  He has started a blog to feature the work of his graphics team at AP.  His colleague has created an interactive version.  More than anything, this post highlights an aspect of the chart that Nick and his team clearly spent a lot of time doodling over.  The concept of the chart itself is wonderful actually, if I didn't say so already; it is essentially the same chart as the oft-printed chart where the anchor point is the start of each recession, only here the anchor is the bottom of each recession.

Feb 22, 2009

Scale restoration

The original graph threw us off our sense of scale.  It seemed to be saying all these oil companies are roughly the same size but one grew much faster than the others.  The red color and the setting off of the data above the title of the chart seemed to announce some important find.

The junkart version on the right restores our normal sense of scale.  It is a version of the bumps chart, one of my favorites.

Redo_total

So we find that Total is the smallest of these oil companies, about half the size of ExxonMobil -- you wouldn't know that from those abysmal bubbles!   Adding to the problem is that the growth data is used to sort the companies while the actual production data is hidden in the data labels.

Total is indeed growing faster but BP is not far behind.  The fall of ExxonMobil and Royal Dutch Shell is equally intriguing.


Reference: "Total, the French Oil Company, Places Its Bets Globally", New York Times, Feb 21, 2009.

Feb 13, 2009

Eternal optimist

Chris P pointed us to the "Financial Comeback" calculator, surely a well-meaning joke from the folks at the Times.  Here is how one gets to make a 40% loss back in just six years!

Nyt_comeback1
Surely, someone has to tell them about simulation.  They have to assume a probability distribution on the annual returns, and show us some sample paths.  Using the average annualized historical return in essence wipes out all variability and no wonder it's smooth sailing upwards.  Eternal optimist.  
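Here is a rough Python sketch of the difference. The first function mimics the calculator's approach of compounding one fixed return. The second draws each year's return at random and reports how often the 40% loss is recouped within the horizon. Several pieces are assumptions for illustration, not figures from the Times or from market history: the normal distribution, the 9% mean and the 20% standard deviation in the usage lines, and the function names.

```python
import random

def years_to_recover(loss, annual_return):
    """Years of a constant annual return needed to undo a fractional loss."""
    if annual_return <= 0:
        raise ValueError("a constant non-positive return never recovers the loss")
    value, years = 1.0 - loss, 0
    while value < 1.0:
        value *= 1.0 + annual_return
        years += 1
    return years

def share_recovered(loss, mean, sd, horizon, n_paths=10_000, seed=1):
    """Fraction of simulated paths back to break-even within `horizon` years.

    Assumption for illustration: annual returns are independent draws from a
    normal distribution, floored at -100% so a portfolio cannot go below zero.
    """
    rng = random.Random(seed)
    recovered = 0
    for _ in range(n_paths):
        value = 1.0 - loss
        for _ in range(horizon):
            value *= 1.0 + max(-1.0, rng.gauss(mean, sd))
            if value >= 1.0:
                recovered += 1
                break
    return recovered / n_paths

print(years_to_recover(0.40, 0.09))          # 6: the smooth, calculator-style path
print(share_recovered(0.40, 0.09, 0.20, 6))  # share of simulated paths recovered in 6 years
```

The deterministic path always arrives on schedule. The simulated paths show how often the same average return fails to get there in time.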

Here is Chris' comment:

The bad news is that the range of values it offers does not include the return on the market from last year (-31% to -36%).  I guess they are optimistic.


The interactive features of this chart, however, impressed me.  The smooth adjustment to the chart as one slides the control, including the automatic choice of appropriate axis labels, is very nice indeed.

Okay, they want to trademark the name of the calculator so perhaps this is serious.


Reference: "Calculate your financial comeback", Jan 6 2009.

Jan 28, 2009

Popping the bubble, so to speak

Matt H., who authored the previous post, and Charles G. both pointed to a great example of how people like the readers here can make a difference.   A bad chart got a makeover!

The Financial Times published a chart from a JP Morgan report, using ... you guessed it ... bubbles to illustrate the deep plunge in market capitalization of many banks.

4156

Jpmorgan

Some readers at the blog were none too happy with the choice of bubble charts.   Among other things, the designer made the common mistake of plotting proportional diameters rather than proportional areas.  This is clear from looking at JP Morgan's bubble.
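Here is the arithmetic behind the diameter mistake, as a short Python sketch (the function names are ours). For circle areas to track the data, radii must grow with the square root of the value. Scaling the diameter directly squares every ratio the reader sees.

```python
import math

def radius_for_value(value, ref_value, ref_radius):
    """Radius giving a circle whose AREA is proportional to value."""
    return ref_radius * math.sqrt(value / ref_value)

def perceived_area_ratio_if_diameters_scaled(value_ratio):
    """Area ratio readers see when diameters, not areas, track the data."""
    return value_ratio ** 2

# A bank worth 4 times another should get twice the radius, not four times.
print(radius_for_value(4.0, 1.0, 1.0))                # 2.0
print(perceived_area_ratio_if_diameters_scaled(4.0))  # 16.0: the gap looks 16-fold
```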

This chart exposes the weakness of bubble charts well.  Look at the top row of bubbles.  Most of them look so similar it is impossible to know, without spending much time studying the circles, which bank was hurt more.

Eventually, a bar chart was produced.  Felix Salmon linked to it.  I am not sure how the banks were ordered in this chart.  It isn't one of the two obvious dimensions, nor is it in alphabetical order.


Bankchart2


In fact, neither the bubble chart nor the bar chart works well in this case.  What we need is a Case-Shiller-style asset-bubble life cycle chart.  To interpret these changes in market caps, we need to know how big the bubble was, and then how steep the subsequent decline was.  Take a look at our discussion of real estate bubble charts here.
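Such a chart would need two quantities for each bank. Here is a sketch of a hypothetical Python helper that takes a market-cap or price series and returns both: the size of the run-up to the peak, and the fraction lost from the peak to the subsequent low. The series in the usage line is a placeholder, not JP Morgan's data.

```python
def run_up_and_decline(values):
    """Return (run-up, decline) for a series of prices or market caps.

    run-up  = peak / first value
    decline = share of the peak value lost by the lowest point after the peak
    """
    peak_i = max(range(len(values)), key=lambda i: values[i])
    peak = values[peak_i]
    trough = min(values[peak_i:])
    return peak / values[0], 1.0 - trough / peak

# Placeholder series, e.g. a bank's market cap in billions
print(run_up_and_decline([100, 150, 200, 120, 80]))
```

A bank whose value doubled and then lost 60% tells a different story from one that never inflated but lost the same share. A single before-and-after bubble cannot separate the two.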

Reference: "Bank capitalization chart of the day", Felix Salmon at Portfolio.com and "Bank picture du jour", FT Alphaville, Jan 21 2009.

Jan 25, 2009

Bond yield makeover

Message to readers: I have a large backlog of reader suggestions.  Please be patient as I slowly get through them.  The frequency of posts will remain lower for the time being as I am busy finalizing a draft of a book.  More on that in the near future.


Matt H, a reader, sent in the following entry (with minor edits).

I saw a couple of bad charts on money.cnn.com and thought I'd submit them to you. 

They're both part of the same feature on investment bargains caused by the recession. 

Here are the links:  

Chart A: Municipal Bonds

Chart B: Corporate Bonds

It seems to me like both charts would have made their points more eloquently by using a much simpler, more common form, like a bar chart.

In Chart A, cubes are used to display the difference between treasury bond yields and AAA municipal bond yields at the two-year horizon and the ten-year horizon.  The volume of each cube corresponds to the yield for the given type of bond in the given period (I think), which spreads the one dimension being compared (yields) across three dimensions, making the differences look smaller than they really are.

[...] At the two-year horizon, the two yields being compared are 1.16% for Treasury bonds and 3.01% for AAA municipal bonds.  The yield for AAA municipal bonds in this case is more than 2.5 times larger than the yield for Treasury bonds, but the difference doesn't look nearly that big in the chart provided.  [...] 

Time out.  Let me also point out an inadvertent optical illusion concerning foreground and background!  The "outline only" cube on the left should have approximately the same volume as the "solid red" cube on the right (3.01% versus 3.30%), and yet the red cube appears quite a bit larger because our eyes react more to solid color than to thin outlines.
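To put numbers on Matt's point about Chart A, here is a Python sketch of how a ratio of values shrinks once it is spread over more dimensions. Like Matt, it assumes the cube volumes are proportional to the yields. The yields are the two-year figures he quotes, and the function name is ours.

```python
def side_ratio(value_ratio, dims):
    """Ratio of side lengths when a ratio of values is encoded in `dims` dimensions.

    dims=1: bar length; dims=2: circle or square area; dims=3: cube volume.
    """
    return value_ratio ** (1.0 / dims)

ratio = 3.01 / 1.16  # two-year AAA municipal vs Treasury yields quoted above
for label, dims in [("bar", 1), ("circle/square", 2), ("cube", 3)]:
    print(f"{label}: {side_ratio(ratio, dims):.2f}")
```

The value ratio is roughly 2.6. Yet circle or square sides would differ by only about 1.6 times, and cube sides by about 1.4 times. Readers who compare edges rather than volumes will badly underestimate the gap.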


Matt continued:

In Chart B, [...] Again, the metric in question is bond yields:  ten-year Treasury bond yields compared to investment-grade corporate bond yields.  The 2008 figure for each is shown alongside the five-year average.  This chart uses the area of a circle to express these yields, spreading the one-dimensional value across two dimensions.  As in Chart A, the result is a chart in which the difference between values does not appear as large as it actually is.  

I will also send a simple bar chart version of each chart -- the bar charts should illustrate the differences in yields more effectively than the charts actually used in this article.

These are his revised charts:

Redo_bondyield

We can do even better by converting the chart on the right to a time-series line chart.  Instead of the five-year average, it is better to display the gap between Treasury and corporate bond yields for each of the five years plus 2008.  This should make for a more eye-catching graphic.


Reference: "Investment in the bargain bin", CNN Money.

Dec 25, 2008

Seen at Starbucks

Merry Christmas!

If you happened to be in a Starbucks recently, you might have picked up some charts, which is what happened to one of our regular readers and commenters, ZBicyclist, who then tried his hand at chart critique here.  He worries that these charts may soon reach millions of people.  So look out!

This graph on the right -- which should rightfully be called a "tree ring graph" -- seems to me a fantastic concept although it is hard to think of data that would deserve this treatment.  Certainly not the retail sales series plotted here! 

This is an object lesson in why bubble charts often fail.

One issue is scale: note the awkward way in which the innermost ring is used to designate the oldest sales figure, $375 billion (presumably for 1996), and think about how you would decide where to place the 2007 ring.  (It's arbitrary.)

Another problem is labeling: when the growth is slow, the rings are close together, and labels have to be jittered (look at 2001 and 2002).  In this case, a relatively simple solution is to have the entire series of years run diagonally.

Yet another challenge is relative radii versus relative areas.  Inevitably, some readers will respond to the areas while others will respond to the radii.  ZBicyclist, for example, belongs to the first group, while in this case I find myself siding with the latter.  When the bubbles/rings overlap, it is difficult to assess areas.

Of course, a simple line chart would do the job with minimal fuss.  The following chart issued by the National Retail Federation actually plots the growth rates, rather than the annual sales.

(Please lose the grid-lines, or else add a vertical axis and lose the data labels.  A column or dot chart is also slightly preferred.)

Now go read ZBicyclist's point of view.  In the meantime, let us know if you think of ways to use the tree ring graph.

Dec 11, 2008

Food art

Adam, who is the designer behind the Wired graphics special on "The Future of Food", asked about the rest of the series.  We previously made some comments on a set of mini donut charts.

The first thought that came to mind after browsing through all the charts was: what a great job they have done to generate interest in food data, which has no right to be entertaining.  Specifically, this is a list of things I appreciated:

  • An obvious effort was undertaken to extract the most thought-provoking data out of a massive amount of statistics collected by various international agencies.  None of the charts is overstuffed, which is a common problem.
  • It would be somewhat inappropriate to use our standard tools to critique these charts.  Clearly, the purpose of the designer was to draw readers into statistics that they might otherwise not care for.   Moreover, the Wired culture has long traded off efficiency for aesthetics, and this shows in a graph such as this, which is basically a line chart with two lines and a lot of mysterious, meaningless ornaments:
  • Wired_feedtheworld
  • A nice use of a dual line chart, though.  It works because both data series share the same scale and only one vertical axis is necessary, which is very subtly annotated here.
  • The maintenance of the same motifs across several charts is well done.  (See the pages on corn, beef, catfish)


Further suggestions:

  • It would be nice if Wired were brave enough to adopt the self-sufficiency principle, i.e. graphs should not contain a copy of the entire data set being depicted.  Otherwise, a data table would suffice.  The graphical construct should be self-sufficient.  This rule is not often followed because of "loss aversion": the fear that a graph without all the data is like an orphan separated from its parents.  Since, as I noted, these graphs are mostly made to inspire awe, there is really no need to print all the underlying data.  For instance, these "column"-type charts can stand on their own without the data (adding a scale would help).
  • Not sure if sorting the categories alphabetically in the column chart is preferred to sorting by size of the category.  The side effect of sorting alphabetically is that it spreads out the long and the short chunks, which simplifies labelling and thus reading.
  • Not a fan of area charts (see below).  Although this one is labelled properly, it is easy at first glance to focus on the orange line rather than the orange area.  That would be a grave mistake: the orange line actually plots the total of the two types of fish rearing, not the aquaculture component.  The chart is somewhat misleading because it is difficult to assess the growth rate of aquaculture.  Much better to plot the size of both markets as two lines (either indexed or not).
  • Wired_aquaculture 


Reference: "The Future of Food", Wired, Oct 20 2008.

Dec 03, 2008

Mini donuts

As a reader noted, this chart is essentially unreadable.  It contains data for the composition of diets in four countries during two time periods.

What might we want to learn from this data?

Are there major differences in diet between countries?

Within each country, are there changes in diet composition over the thirty years?

If there were changes in diet inside a country over time, did those reflect a worldwide trend or a trend specific to that country?

Unfortunately, the use of donut charts, albeit in small multiples, does not help the cause.  The added dimension of the size of the pies, used to display the total calories per person per day, serves little purpose.  Seriously, who out there is comparing the pie sizes rather than reading off the numbers in the donut holes if she wants to compare total calories?

This data set has much potential, and allows me to show, yet again, why I love "bumps charts".

Here is one take on it.  (Note that the closest data I found covered six countries, namely China, Egypt, Mexico, South Africa, the Philippines and India, and different periods.)

Redo_diet1

The set of small multiples recognizes that the comparison between 1970 and 2000 is paramount to the exercise.  There is a wealth of trends that can be pulled out of these charts.  For example, the Chinese and Egyptians eat far more vegetables than people in the other countries; in particular, the Chinese increased their consumption of vegetables drastically over those 30 years. (top row, second from left)

Or perhaps: consumption of sugars and sweeteners has increased everywhere except in South Africa.  In addition, the Chinese eat a lot less sugar than the other peoples. (top row, right)

Egg consumption also shows an interesting pattern.  In 1970, the countries had similar levels, but by 2000, Mexicans and the Chinese had outpaced the other countries. (bottom row, right)
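Trends like these can be pulled out systematically before deciding on the key messages. For each food group, compute each country's change between the two periods and sort. Here is a minimal Python sketch. The data structure is an assumption, the function name is hypothetical, and the values in the usage lines are placeholders, not FAO figures.

```python
def changes_by_country(group):
    """Countries sorted by change between two periods, largest increase first.

    `group` maps country -> (earlier_value, later_value) for one food group.
    """
    changes = {country: later - earlier for country, (earlier, later) in group.items()}
    return sorted(changes.items(), key=lambda item: item[1], reverse=True)

# Placeholder numbers for one food group (e.g. kcal per person per day)
sugars = {"Country A": (150, 210), "Country B": (300, 280), "Country C": (90, 95)}
for country, delta in changes_by_country(sugars):
    print(country, "up" if delta > 0 else "down", abs(delta))
```

Sorting by change surfaces the exceptions, such as the one country where consumption fell. Those are natural candidates for highlighting with color.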

These charts are very versatile.  The example shown above is not yet ready for publication.  The designer must now decide what the key messages are, and can then use color judiciously to draw the reader's attention to the relevant parts.

Also, some may not like the default scaling of the vertical axes. That can be easily fixed.


Finally, here is another take which focuses on countries rather than food groups.  We note that too many categories of foods make it hard to separate them.

Redo_diet2


References: "Who's Eating What?", Wired, Oct 2008; "The Double burden of malnutrition", FAO, 2006.
