David Leonhardt's article on the graduation rates of public universities caught my attention for both graphical and statistical reasons.
Posted on Sep 14, 2009 at 11:45 PM | Permalink | Comments (14) | TrackBack (0)
Here is some interesting reading from other places:
Tag clouds have caught on since we approved of them a while ago. One interesting use was at the Life Vicarious blog. They use it to compare the inclinations of three New York-based restaurant reviewers. What they should have done is remove irrelevant words like "one", "also", "many", "make"/"made", etc. In statistics, this is called removing "noise", which helps bring out the "signal".
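To make the noise-removal point concrete, here is a minimal sketch of counting words for a tag cloud while skipping noise words. The word list, the function name `tag_counts` and the sample sentence are all made up for illustration; this is not how the Life Vicarious blog built its clouds.

```python
import re
from collections import Counter

# Hypothetical noise-word list, seeded with the examples named above
# plus a few common function words.
NOISE_WORDS = {"one", "also", "many", "make", "made",
               "the", "a", "and", "is", "was"}

def tag_counts(text, noise=NOISE_WORDS):
    """Count the words in `text`, skipping anything in the noise list."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in noise)

# Made-up review sentence, for illustration only.
review = ("The pasta was made fresh and the sauce was also fresh; "
          "one bite and many smiles.")
counts = tag_counts(review)
```

Whatever survives the filter (here "fresh", "pasta", "sauce" and so on) is the "signal" that should drive the font sizes in the cloud.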
Andrew Gelman discussed the NYT article that reported the finding of unexpected male bias in the children of Asian American families. He can be counted on to make useful comments on any accompanying graphics. He rightly pointed out that this is one example of not starting at zero: the relevant baseline is 100 since the metric is essentially the over-age of males relative to females. I also agree that a line chart with a longer time series plotting percentages rather than over-age would work better.
The racetrack chart made an appearance at Flowing Data. This one is even busier and just as impossible to decipher.
Posted on Jun 29, 2009 at 08:58 AM | Permalink | Comments (2) | TrackBack (0)
Here are two more versions of the greenhouse gas chart.
The first one is a Marimekko which many would consider to be appropriate for this type of data. It is essentially a stacked bar chart where the width of the bar is scaled to the proportion of the type of gas. Here's what one would be looking at:
Marimekkos (also called Mosaic charts) share many of the problems of pie charts. Note the need to use multiple colors, the difficulty in comparing the areas of the pieces (even worse than looking at sectors), and the difficulty in comparing across categories since the pieces float in irregular space (take for example the three pink pieces). My rule is: avoid at all costs. (Well, like the pie chart, when the data is sufficiently simple, with very few pieces and with some outliers, these could be acceptable.)
Secondly, here is a recycled junkart chart, with all white space removed from the interior. (Thanks to Derek for the suggestion.)
Depending on what the purpose of the chart is, one can decide what is the base for the proportions. My version preserves equity between the two dimensions. Anything else will require the designer to make a choice. If, for example, the base is 100% for each type of gas emitted, then the reader could not derive from the same chart the proportion of each source of emission (across all types of gases).
Posted on May 12, 2009 at 11:10 PM | Permalink | Comments (4) | TrackBack (0)
A particular genre of graphics is designed to induce awe: certain bits are allowed to stick out like a sore thumb. Via reader Andre L., and an archive of US Army medical photos and illustrations:
Posted on May 06, 2009 at 09:14 PM | Permalink | Comments (5) | TrackBack (0)
Message to readers: I have a large backlog of reader suggestions. Please be patient as I slowly get through them. The frequency of posts will remain lower for the time being as I am busy finalizing a draft of a book. More on that in the near future.
Matt H, a reader, sent in the following entry (with minor edits).
Time out. Let me point out the inadvertent reference to an optical illusion concerning foreground and background! The "outline only" cube on the left should have approximately the same volume as the "solid red" cube on the right (3.01% versus 3.30%), and yet the red cube appears quite a bit larger because our eyes react to the solid color more than to thin outlines.
Matt continued:
These are his revised charts:
We can do even better by converting the chart on the right to a time-series line chart. Instead of the five-year average, it is better to display the gap between treasury and corporate bonds for each of the five years plus 2008. This should make for a more eye-catching graphic.
Reference: "Investment in the bargain bin", CNN Money.
Posted on Jan 25, 2009 at 05:51 PM | Permalink | Comments (3) | TrackBack (0)
I ran across this hugely successful chart on Dean Foster's home page (and noted that he and his Wharton colleagues have a nice blog picking apart statistical errors committed in public).
This is a histogram plotting the historical year-on-year returns of the S&P 500 index, binned into 10%-levels. It succeeds on two levels: the innovation of printing the years inside little blocks provides extra information without distracting from the overall picture; and the key message of this plot, that the negative return of 2008 is a negative outlier in the history of returns, is extremely clear.
This, in my mind, is a superior presentation to the usual time-series line chart that we see in every economics publication. For some purposes, it is better to unshackle ourselves from the linear time dimension, and this is a good example.
One question/comment: within each 10% level, the years are arranged in reverse chronological order from top to bottom. This facilitates searching for a particular year. The obvious alternative is to order by the actual level of return, so that the result is akin to a stem-and-leaf plot.
While I like the graphical aspect of the chart, I feel like it has limited function. This graph appears useful to anyone who has a one-year investment horizon. If I want to predict what next year's S&P 500 return is, I might take a random sample from this distribution. However, as a lazy investor, I never look at a one-year horizon so this creates two problems: if I am looking five years out, I can't take five samples from this distribution because there is serial correlation in this data for sure; even if I could take those five samples, it is difficult to compute the five-year return in my head.
So what I did was to take the data and replicate this histogram for 2-year, 3-year, 5-year, 10-year, etc. returns. The results are as follows. I decided to simplify further and use Tukey's boxplot instead of the histogram. The data are real compounded total returns from S&P 500 from 1910-2008.
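For readers who want to try this at home, here is a rough sketch of the computation: compound the annual returns over every consecutive window of a given length, then take the five numbers a boxplot is drawn from. I am assuming overlapping windows and using made-up annual returns; the function names are hypothetical and this is not the exact procedure or data behind the charts.

```python
from statistics import quantiles

def rolling_compound_returns(annual_returns, horizon):
    """Compounded return over every consecutive window of `horizon` years."""
    out = []
    for start in range(len(annual_returns) - horizon + 1):
        growth = 1.0
        for r in annual_returns[start:start + horizon]:
            growth *= 1.0 + r          # compound one year at a time
        out.append(growth - 1.0)       # back to a return, e.g. 0.25 = 25%
    return out

def boxplot_summary(values):
    """Five-number summary; the 'inclusive' quartile method is a choice."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values)}

# Made-up annual returns, for illustration only (not S&P 500 data).
annual = [0.10, -0.05, 0.20, 0.00, -0.10, 0.15]
three_year = rolling_compound_returns(annual, 3)
summary = boxplot_summary(three_year)
```

Note that overlapping windows share years with their neighbors, so the multi-year returns are themselves serially correlated; the boxplots describe history rather than a set of independent draws.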
The boxplot on the top right shows that there is about a 25% chance that an investment in the S&P 500 will return negative in real terms in any three-year period (below the green line). At the other end, there is a 25% chance of earning more than 50% on the principal during those three years.
The next set of boxplots compares 5-year returns to 10-year returns and 10-year returns to 20-year returns. If we have a 10-year horizon, there is still a positive chance of reaching the end of the decade and finding the investment under water! The median 10-year return is approximately doubling the principal (about 8% per annum compounded).
In a twenty-year period, there is hardly any chance of not making money on the S&P. There were two positive outliers of over 1000% (about 13% per annum compounded over 20 years).
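The per-annum figures quoted above come from the usual compounding identity: a total return R over n years implies an annual rate of (1 + R)^(1/n) - 1. A small sketch (the function name is my own):

```python
def annualized_rate(total_return, years):
    """Per-annum compounded rate implied by a total return over `years`."""
    return (1.0 + total_return) ** (1.0 / years) - 1.0

# An exact doubling of the principal over ten years.
doubling_10y = annualized_rate(1.00, 10)

# A 1000% return (eleven times the principal) over twenty years.
tenfold_20y = annualized_rate(10.00, 20)

# Going the other way: growth multiple from 8% a year for ten years.
multiple_at_8pct = 1.08 ** 10
```

An exact doubling over a decade works out to a little over 7% a year, while 8% a year compounds to roughly 2.16 times the principal, so "approximately doubling" and "about 8%" are in the same neighborhood. The 1000% outlier comes to just under 13% a year.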
Reference: Data from Global Financial Data
Posted on Jan 16, 2009 at 08:16 AM | Permalink | Comments (6) | TrackBack (0)
Merry Christmas!
If you happened to be in a Starbucks recently, you might have picked up some charts, which was what happened to one of our regular readers and commentators, ZBicyclist, who then tried his hand at chart critique here. He is worried they may reach millions soon. So look out!
This graph on the right -- which should rightfully be called a "tree ring graph" -- seems to me a fantastic concept although it is hard to think of data that would deserve this treatment. Certainly not the retail sales series plotted here!
This is an object lesson in why bubble charts often fail.
One issue is scale: note the awkward way in which the innermost ring is used to designate the oldest sales data of $375 billion presumably in 1996, and think about how you would decide where to place the 2007 ring. (It's arbitrary.)
Another problem is labeling: when the growth is slow, the rings are close together, and labels have to be jittered (look at 2001 and 2002). In this case, a relatively simple solution is to have the entire series of years run diagonally.
Yet another challenge is relative radii versus relative areas. Inevitably, some readers will respond to the areas while others will respond to the radii. ZBicyclist, for example, belongs to the first group while in this case, I find myself siding with the latter. When the bubbles/rings overlap, it is difficult to assess areas.
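The radii-versus-areas problem is easy to see with a little arithmetic. The sketch below computes the radius of a ring under each encoding; the function and its parameters are hypothetical, and the doubled sales figure is invented for illustration.

```python
import math

def radius_for_value(value, base_value, base_radius, encode="radius"):
    """Radius of a ring for `value`, given a reference ring.

    encode="radius": the radius is proportional to the value.
    encode="area":   the enclosed area is proportional to the value.
    """
    ratio = value / base_value
    if encode == "radius":
        return base_radius * ratio
    if encode == "area":
        return base_radius * math.sqrt(ratio)
    raise ValueError("encode must be 'radius' or 'area'")

# A value twice the size of the $375 billion innermost ring (invented).
by_radius = radius_for_value(750.0, 375.0, 1.0, encode="radius")
by_area = radius_for_value(750.0, 375.0, 1.0, encode="area")
```

With radius encoding, a doubling of sales doubles the radius and quadruples the enclosed area; with area encoding, the radius grows by only about 41%. A reader who guesses the wrong encoding will misjudge the growth considerably.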
Of course, a simple line chart would do the job with minimal fuss. The following chart issued by the National Retail Federation actually plots the growth rates, rather than the annual sales.
(Please lose the grid-lines, or else add a vertical axis and lose the data labels. A column or dot chart is also slightly preferred.)
Now go read ZBicyclist's point of view. In the meantime, let us know if you think of ways to use the tree ring graph.
Posted on Dec 25, 2008 at 07:54 PM | Permalink | Comments (5) | TrackBack (0)
Adam, who is the designer behind the Wired graphics special on "The Future of Food", asked about the rest of the series. We previously made some comments on a set of mini donut charts.
The first thought that came to mind after browsing through all the charts was: what a great job they have done to generate interest in food data, which has no right to be entertaining. Specifically, this is a list of things I appreciated:
Further suggestions:
Reference: "The Future of Food", Wired, Oct 20 2008.
Posted on Dec 11, 2008 at 01:18 AM | Permalink | Comments (2) | TrackBack (0)
When comparing two time series, one typically wants to discuss the size of the gap as it changes over time. This Business Week chart, for example, depicted for readers the expanding gap between intra-day high and low prices of the S&P 500 for 2008.
This chart construct is effective at pointing out large changes but lacks precision in conveying smaller differences, or trends. It is always a good idea to plot the gap directly, as we will show below.
More importantly, a better choice of scale can help a lot. By focusing exclusively on variability (extreme values), this chart hides the relevant information of the closing prices of the S&P. A point spread of 100 points means more when the index is at 800 than at 1200. In order to capture this, we can divide the point spread by the opening price of that day so we say the gap is one-eighth or one-twelfth of the opening price.
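The rescaling is a one-line calculation. Here is a minimal sketch; the function name and the high/low figures are made up so that the spread is 100 points in both cases.

```python
def relative_spread(high, low, open_price):
    """Intra-day high-low spread as a fraction of the opening price."""
    return (high - low) / open_price

# The same 100-point spread at two different index levels (invented prices).
at_800 = relative_spread(850.0, 750.0, 800.0)      # one-eighth of the open
at_1200 = relative_spread(1250.0, 1150.0, 1200.0)  # one-twelfth of the open
```

The identical point spread is 12.5% of the open at 800 but only about 8.3% at 1200, which is why the raw point spread overstates volatility when the index is high.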
The junkart version makes both changes. The top chart fixes the scale, plotting the point spread as a percentage of daily opening prices. Relative to the original chart, the variability in the front part of 2008 was muted because the index was at higher levels back then.
The bottom chart plots the gap sizes (lengths of the high-low lines). It is without doubt that directly plotting the gaps showcases the key message. The current level of volatility is more than double what occurred at the beginning of the year.
If one wants to illuminate the trend as opposed to daily fluctuations, a further improvement will be using moving averages.
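A simple trailing moving average would do for this purpose. The sketch below is one way to compute it; the window length and the daily gap values are invented for illustration.

```python
def moving_average(values, window):
    """Trailing moving average; returns len(values) - window + 1 points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value, drop the oldest.
        running += values[i] - values[i - window]
        out.append(running / window)
    return out

# Invented daily gaps (as a percentage of the opening price).
daily_gap_pct = [1.0, 2.0, 3.0, 4.0, 5.0]
smoothed = moving_average(daily_gap_pct, 3)
```

A longer window gives a smoother line but reacts more slowly to a sudden jump in volatility, so the choice depends on whether the trend or the turning point matters more.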
For those interested, shown below is a scatter plot that compares the original point spread and the derived point spread, which shows that the change is not trivial.
Reference: "The Market: A Daily Roller Coaster", Business Week, Oct 27 2008.
Posted on Oct 21, 2008 at 12:21 AM | Permalink | Comments (2) | TrackBack (0)
Frederic M. sent in this chart, together with his commentary.
Bubbles across rows have vastly different numbers but their circles are of identical size (or vice versa). It borders on the ridiculous that all bubbles of the US row have the same size... The question of whether teenage birth rates and teen sex are correlated cannot be eye-balled with this kind of display. The fact that you cannot compare across rows makes this an instance of “chart junk”.
I add:
White spaces -- always dangerous. Does lack of bubble imply no data or no abortions/sex?
Sorting -- this is what Howard Wainer called "Arizona first" with a twist (United States)
Loss aversion -- would U.S. readers be resentful if countries like Iceland were excluded? A much reduced version comparing the U.S. to, say, Canada, the U.K., Japan and Germany may yield more information for the reader.
Sufficiency -- if all the data are printed as in a table, why do we need the bubbles?
Reference: "Let's Talk About Sex", New York Times, Sep 6 2008.
Posted on Sep 20, 2008 at 09:24 PM | Permalink | Comments (11) | TrackBack (0)

