A particular genre of graphics is designed to induce awe: certain bits are allowed to stick out like a sore thumb. Via reader Andre L., and an archive of US Army medical photos and illustrations:
This is a small multiples graph designed to display the somewhat seasonal pattern of deaths due to influenza over the years. Basically, we see a U shape in almost every year; however, the height and the timing of the peak show quite a lot of variation. Further, some years exhibit more of an L shape than a U shape.
But the attention grabber here is the massive peak that occurred between 1918 and 1919. It was unusual in many ways: it was the second big peak of 1918, and it occurred late in the year and elided with the next year's peak. The designer allowed these two components to bleed into the other charts.
From the perspective of scale, readability and cleanliness, this bit sticks out like a sore thumb! But one has to admit it is effective.
A log scale is often used to deal with data containing such outliers. But while this makes neater charts, the impact of the orders-of-magnitude difference is lost on the reader, except in her imagination.
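To see why the impact is lost on a log scale, here is a minimal sketch with hypothetical numbers (not read from the Army chart): a peak twenty times higher than a typical one ends up only about 1.3 units above it on a log10 axis.

```python
import math

# Hypothetical weekly death counts, NOT the chart's data:
# a typical seasonal peak versus an outlier peak 20 times larger.
typical_peak = 500
outlier_peak = 10_000

# How much bigger the outlier really is.
linear_ratio = outlier_peak / typical_peak
# How far apart the two peaks sit vertically on a log10 axis.
log_gap = math.log10(outlier_peak) - math.log10(typical_peak)

print(f"linear ratio: {linear_ratio:.0f}x")
print(f"gap on log10 axis: {log_gap:.2f} decades")
```

On a linear axis the outlier towers twenty times higher; on the log axis it gains only 1.3 decades, which is why the reader has to imagine the difference.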
We introduced the racetrack chart before. Via Zero Hedge, we find a version of it, perhaps a race for animals. In a race for humans, they run in concentric circles; animals are not so tame, they may stray off the track, or just refuse to continue.
The designers certainly tried very hard to make the numbers palatable. Indeed, given how much taxpayer money is being thrown into the fire these days, any informed citizen ought to know how it is being spent. Unfortunately, their hard work was not rewarded, as the various constructs failed to improve our understanding of the data.
The three annotations on the right tell us that the arc width at the left indicates the allocated funds, while the arc width at the right indicates the actual amounts spent as of the end of April. In addition, the breakpoint on each arc, relative to the fan of lines, indicates the date on which the funds were allocated.
In reality, things are a bit more complicated. When all allocated funds have been spent, as is apparently the case with Fed funds for AIG, the arc has no breakpoint, and thus the date of the allocation is missing. Also, when the same use soaks up funds from multiple sources, the width on the right gets confusing: take for example FDIC funds for unlocking credits; it's unclear how the two arcs add up to 1.8 trillion.
Perhaps a flow chart might work well for this sort of data.
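A flow chart makes each use's total explicit as the sum of the flows feeding into it, which is exactly what the arcs leave ambiguous. A minimal sketch of that bookkeeping, with placeholder sources, uses and amounts (not the chart's data):

```python
from collections import defaultdict

# Hypothetical (source, use, amount in $ billions) flows; names and
# amounts are placeholders, NOT the figures in the Zero Hedge chart.
flows = [
    ("Source A", "Use X", 300),
    ("Source B", "Use X", 200),
    ("Source A", "Use Y", 150),
]

def totals_by_use(edges):
    """Sum all flows entering each use: the width of that use's node
    in a flow (Sankey-style) diagram."""
    totals = defaultdict(float)
    for _source, use, amount in edges:
        totals[use] += amount
    return dict(totals)

print(totals_by_use(flows))  # {'Use X': 500.0, 'Use Y': 150.0}
```

Because every flow is a separate edge, a reader can always trace how two sources combine into one total.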
Reference: "Visual Representation of the Government Intervention Programs", Zero Hedge blog, April 8 2009.
Message to readers: I have a large backlog of reader suggestions. Please be patient as I slowly get through them. The frequency of posts will remain lower for the time being as I am busy finalizing a draft of a book. More on that in the near future.
Matt H, a reader, sent in the following entry (with minor edits).
I saw a couple of bad charts on money.cnn.com and thought I'd submit them to you.
They're both part of the same feature on investment bargains caused by the recession. It seems to me that both charts would have made their points more eloquently by using a much simpler, more common form, like a bar chart.
In Chart A, cubes are used to display the difference between Treasury bond yields and AAA municipal bond yields at the two-year horizon and the ten-year horizon. The volume of each cube corresponds to the yield for the given type of bond in the given period (I think), which spreads the one dimension being compared (yields) across three dimensions, making the differences look smaller than they really are. [...] At the two-year horizon, the two yields being compared are 1.16% for Treasury bonds and 3.01% for AAA municipal bonds. The yield for AAA municipal bonds in this case is more than 2.5 times larger than the yield for Treasury bonds, but the difference doesn't look nearly that big in the chart provided. [...]
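The shrinking effect is easy to quantify: if a value is mapped to a cube's volume, the visible side length grows only with the cube root of the data ratio (and with the square root when a circle's area is used). A minimal sketch using the two-year yields quoted in the letter; the function name is ours, for illustration:

```python
def apparent_ratio(value_ratio: float, dims: int) -> float:
    """Ratio of linear sizes (cube side, circle radius) when the data
    ratio is encoded in a dims-dimensional quantity (volume=3, area=2)."""
    return value_ratio ** (1 / dims)

treasury_2yr = 1.16  # % yield, from the letter
muni_2yr = 3.01      # % yield, from the letter

ratio = muni_2yr / treasury_2yr
print(f"data ratio:          {ratio:.2f}")
print(f"cube side ratio:     {apparent_ratio(ratio, 3):.2f}")
print(f"circle radius ratio: {apparent_ratio(ratio, 2):.2f}")
```

A yield roughly 2.59 times larger is drawn with a cube side only about 1.37 times larger, which is why the gap "doesn't look nearly that big".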
Time out. Let me add a note about an inadvertent optical illusion concerning foreground and background! The "outline only" cube on the left should have approximately the same volume as the "solid red" cube on the right (3.01% versus 3.30%), and yet the red cube appears quite a bit larger because our eyes react to the solid color more than to thin outlines.
In Chart B, [...] Again, the metric in question is bond yields: ten-year Treasury bond yields compared to investment-grade corporate bond yields. The 2008 figure for each is shown alongside the five-year average. This chart uses the area of a circle to express these yields, spreading the one-dimensional value across two dimensions. As in Chart A, the result is a chart in which the difference between values does not appear as large as it really is.
I will also send a simple bar chart version of each chart -- the bar charts should illustrate the differences in yields more effectively than the charts actually used in this article.
These are his revised charts:
We can do even better by converting the chart on the right into a time-series line chart. Instead of the five-year average, it is better to display the gap between Treasury and corporate bonds for each of the five years plus 2008. This should make for a more eye-catching graphic.
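The series such a line chart would plot is simply the yearly difference between the two yields. A minimal sketch with made-up yields (the CNN article only gives 2008 and the five-year average, so every number here is hypothetical):

```python
# Hypothetical yields in percent, NOT from the CNN Money article.
years = [2003, 2004, 2005, 2006, 2007, 2008]
treasury_10yr = [4.0, 4.3, 4.3, 4.8, 4.6, 3.7]
corporate_ig = [5.5, 5.6, 5.3, 5.9, 6.0, 7.5]

# The gap (spread) for each year is what the line chart would plot;
# rounding removes floating-point noise from the subtraction.
spreads = [round(c - t, 2) for t, c in zip(treasury_10yr, corporate_ig)]

for year, gap in zip(years, spreads):
    print(year, f"{gap:.2f} percentage points")
```

Plotted over time, a jump like the hypothetical 2008 value stands out far more than it would against a single five-year average.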
Reference: "Investment in the bargain bin", CNN Money.
If you happened to be in a Starbucks recently, you might have picked up some charts. That is what happened to one of our regular readers and commenters, ZBicyclist, who then tried his hand at chart critique here. He is worried these charts may soon reach millions of people. So look out!
This graph on the right -- which should rightfully be called a "tree ring graph" -- seems to me a fantastic concept although it is hard to think of data that would deserve this treatment. Certainly not the retail sales series plotted here!
One issue is scale: note the awkward way in which the innermost ring designates the oldest sales figure, $375 billion, presumably for 1996, and think about how you would decide where to place the 2007 ring. (It's arbitrary.)
Another problem is labeling: when the growth is slow, the rings are close together, and labels have to be jittered (look at 2001 and 2002). In this case, a relatively simple solution is to have the entire series of years run diagonally.
Yet another challenge is relative radii versus relative areas. Inevitably, some readers will respond to the areas while others will respond to the radii. ZBicyclist, for example, belongs to the first group, while in this case I find myself siding with the latter. When the bubbles or rings overlap, it is difficult to assess areas.
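The two readings diverge quickly. A minimal sketch, assuming the innermost ring stands for the $375 billion figure and a hypothetical later year with double the sales: if the area enclosed by a ring encodes sales, its radius grows only by the square root; if the radius encodes sales, it doubles.

```python
import math

def ring_radius(sales, base_sales, base_radius=1.0, encode="area"):
    """Radius of a ring for `sales`, relative to the innermost ring.

    encode="area":   enclosed area proportional to sales, so radius ~ sqrt(sales)
    encode="radius": radius itself proportional to sales
    """
    ratio = sales / base_sales
    if encode == "area":
        return base_radius * math.sqrt(ratio)
    if encode == "radius":
        return base_radius * ratio
    raise ValueError("encode must be 'area' or 'radius'")

base = 375   # $ billions, innermost ring (presumably 1996)
later = 750  # hypothetical later year, double the base
print(ring_radius(later, base, encode="area"))    # about 1.414
print(ring_radius(later, base, encode="radius"))  # 2.0
```

The same doubling of sales produces a ring about 41% wider under one reading and 100% wider under the other, so readers from the two camps will draw different conclusions from the same picture.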
Of course, a simple line chart would do the job with minimal fuss. The following chart issued by the National Retail Federation actually plots the growth rates, rather than the annual sales.
We hope this is an indication that the British newspaper the Guardian (which has one of the best websites out there) is joining the fun. It appears that they have quietly debuted an interactive graphics feature. The first edition addressed the oil price crisis.
The use of inflation-adjusted figures seems obvious, but we don't see many of them in the press. Highlighting the peaks and providing annotation (on mouse-over) is an excellent touch. The gridlines and axis labels (especially the year axis) are thankfully restrained. We don't see the need for the unadjusted series (blue line), however. The fact that the gap between the two lines grows larger the further back we go tells us little; it invites readers to read more into it than what it truly is: the time value of money.
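The adjustment itself is a one-line formula: a nominal price from year t, expressed in base-year dollars, equals the price times the ratio of price indices, I(base) / I(t). A minimal sketch with made-up prices and index values (not the Guardian's data):

```python
# Hypothetical nominal prices and price-index values, NOT the Guardian's data.
nominal = {1980: 37.0, 2008: 100.0}  # $ per barrel
cpi = {1980: 80.0, 2008: 200.0}      # consumer price index, same base period

def to_real(price, year, base_year, index):
    """Express a nominal price from `year` in `base_year` dollars:
    price * I(base_year) / I(year)."""
    return price * index[base_year] / index[year]

real_1980 = to_real(nominal[1980], 1980, 2008, cpi)
print(f"1980 price in 2008 dollars: {real_1980:.1f}")
```

The further back the year, the larger the index ratio, which is exactly why the gap between the unadjusted and adjusted lines widens as we go back in time.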
Later on, they used an oil barrel object to illustrate the components of the retail oil price. The height of the cylinder is indeed proportional to the data plotted. If only they had colored the end of the cylinder gray instead of green! As it stands, the green portion has about the same area as the red.
Perhaps harkening to the close race between Obama and Clinton, the designer chose to illustrate this with what we have called the "racetrack" graph. We have previously discussed the problems here and here.
In this rendition, a pie chart was divided into three race tracks with "cities" getting the inside track and "rural/small cities" getting the outside track. (As the Clinton supporters might say, elitism was in the air.) There were two great choices: the courage to not print the data and let the chart speak for itself, and the wisdom to white out the votes for "others".
Nevertheless, as we discussed before, the data is coded into the angles rather than the lengths of the strips, which presents a real problem in comparing vote shares. For example, try figuring out if there were more Obama supporters in rural Tennessee than there were Clinton supporters in cities in Tennessee (bottom right).
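The problem can be made concrete: on concentric tracks, the same angle yields a longer strip on an outer track, because arc length is radius times angle. A minimal sketch with hypothetical shares and radii (not the Times data):

```python
import math

def arc_length(share, radius):
    """Length of a strip covering `share` of a full circle on a track of the given radius."""
    return 2 * math.pi * radius * share

# Hypothetical shares and radii: the outer track has twice the inner radius.
inner = arc_length(share=0.30, radius=1.0)
outer = arc_length(share=0.20, radius=2.0)
print(inner < outer)  # True: the smaller share draws the longer strip
```

A reader who compares strip lengths, as the eye naturally does, will rank these two groups the wrong way round.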
Also note where the white "others" spaces were, and the impossibility of comparing them.
The arrangement for Wisconsin, meanwhile, posed a challenge for anyone who wanted to estimate how many rural Wisconsin voters went for "others".
In the junkart version, we go with the two-sided bar chart, typically found in population pyramids. The information presented jumps out at you.
This chart is essentially the same as the racetrack; one just needs to straighten out the strips from the original chart, and pull the Clinton ones clockwise, and Obama ones anti-clockwise.
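The straightening step can be sketched in text form: each track becomes a row, with the Clinton bar growing left and the Obama bar growing right from a shared centre line. The shares below are hypothetical, and the helper name is ours for illustration; bars are rounded down to 5-point steps by integer division.

```python
# Hypothetical vote shares in percent, NOT the Times data.
rows = [("Cities", 40, 55), ("Rural/small cities", 60, 35)]

def two_sided_bar(label, clinton, obama, scale=5, width=12):
    """One text row of a population-pyramid style chart: the Clinton bar
    grows leftward and the Obama bar rightward from a shared centre line."""
    left = "#" * (clinton // scale)
    right = "#" * (obama // scale)
    return f"{label:>20} {left:>{width}}|{right:<{width}}"

for row in rows:
    print(two_sided_bar(*row))
```

Because every bar starts at the same centre line, lengths are directly comparable across rows, which is what the racetrack denies us.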
Reference: some recent issue of New York Times magazine.
Avinash has an interesting piece about some examples of visualization of Web data. That's a very rich area since there is so much data. I agree with his observation that precious few truly great charts have appeared thus far. (Note, though, that typically the more data, the more noise. See this post.)
He discussed a tag cloud display of the top cities from which website visitors hail. We like tag clouds too. See here, here and here.
He praised a particular pie chart because "the pie ... is just a stage prop". It worked because all the data was printed on the chart itself. This violates our self-sufficiency principle: if all the data is printed on the chart, and the only way to read it is to look at the data, then the chart serves no purpose. More here.
He liked Amazon's display of customer rating distributions. Me too. It is a powerful example of small graphics that make a huge impact. Here is the typical Web rating display, which, like almost every other, uses the statistical average. This hides information about how dispersed (or not) customers' reactions were. The current Amazon display gives us this information. Notice that 108 customers actually gave this book the lowest rating even though the average was four stars.
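A minimal sketch of why the distribution says more than the average: only the 108 one-star ratings come from the post; the other counts are hypothetical, chosen so the average lands near four stars.

```python
# Star-rating counts for one book. Only the 108 one-star ratings come
# from the post; all other counts are hypothetical.
counts = {5: 600, 4: 250, 3: 80, 2: 40, 1: 108}

total = sum(counts.values())
mean = sum(stars * n for stars, n in counts.items()) / total
print(f"average: {mean:.2f} stars from {total} ratings")

# Text histogram, one row per star level, bar length scaled to 40 characters.
for stars in sorted(counts, reverse=True):
    share = counts[stars] / total
    print(f"{stars} stars: {'#' * round(share * 40):<40} {counts[stars]}")
```

The single average of about 4.1 stars is identical whether the 108 unhappy readers exist or the ratings are tightly clustered; only the histogram tells them apart.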
The most intriguing example was Google's comparison of keyword performance to the site average. It's a good idea but the execution is wanting.
Firstly, I believe the percentages are much better presented as index values, with 100 being the site average. Secondly, it is unnerving to have red associated with positive values and green with negative values, or to have negative values on the right of positive values. I think they realize that green and the right side should represent "good" (a bounce rate lower than average), but this just doesn't work. Thirdly, are the data labels really necessary? They impede our sight lines when comparing bars. And do we need them to two decimal places?
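Converting to an index is a simple shift: a value 12.5% below the site average becomes 87.5, the average itself becomes 100. A minimal sketch with hypothetical keyword figures (not Google's data); it also rounds to whole numbers, since two decimal places add nothing here.

```python
def to_index(pct_vs_average):
    """Convert a percentage difference from the site average (e.g. -12.5
    for 12.5% below average) into an index where the average is 100."""
    return 100 + pct_vs_average

# Hypothetical bounce rates relative to the site average, NOT Google's data.
keywords = {"keyword A": -12.5, "keyword B": 30.25, "keyword C": 0.0}
for kw, pct in keywords.items():
    print(f"{kw}: {to_index(pct):.0f}")
```

With an index, every bar starts from the same reference of 100, and readers no longer need to track signs and colors to tell above-average from below-average keywords.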
PS. Apologies for the inconsistent font. Typepad continues its mischief: I couldn't change the font size after adding a hyperlink. Apparently I have to fix the font size before adding a link. You also might notice the changing font size as I write this paragraph. Don't know why there was a switch; I didn't ask for it.
Oftentimes, picking the right scale for a chart makes all the difference. The following chart showed up in the New York Times Magazine some time ago. Readers will immediately recognize this as "infotainment" rather than a serious attempt to convey the data.
The data came from a study by the Center on Education Policy which counted the amount of instruction time spent on various subjects at a sample of elementary schools in the U.S.
A simple bar chart would make a nice graphic, as shown on the right. Instead of sorting by decreasing minutes, we pulled out "lunch" and "recess" since they belong to a separate category.
Our main focus, though, is on the scale. The original report, and thus the original graphic, used minutes per week. We contend that minutes per day (or even hours per day) are more user-friendly. This is because any number makes sense only in comparison to other numbers. There is no easy reference for a number such as 500 minutes per week. However, being told it's 100 minutes per day (or 1 hr 40 min per day) means a lot because everyone knows there are 24 hours in a day.
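The conversion assumes a five-day school week, which is what turns 500 minutes per week into 100 minutes per day. A minimal sketch; the function name and the default of five days are our assumptions, not taken from the report:

```python
def per_school_day(minutes_per_week, days_per_week=5):
    """Convert minutes per week into minutes per day and an 'h hr m min' string.
    Assumes a five-day school week unless told otherwise."""
    per_day = minutes_per_week / days_per_week
    hours, minutes = divmod(round(per_day), 60)
    return per_day, f"{hours} hr {minutes} min"

print(per_school_day(500))  # (100.0, '1 hr 40 min')
```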
This is a small example of a larger problem with using averages. The media love to give out statistics such as "six people die of diabetes every minute" (e.g. here). This is typically done by dividing the total number of diabetes-related deaths in a year by the number of minutes in a year. Why divide by the total number of minutes in a year? The fallacy of such a calculation is evident if one applies this logic to natural deaths (since we all have to die some day). As the world population grows, there will just be more and more people dying every minute!
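The per-minute statistic is an annual total divided by the 525,600 minutes in a non-leap year, so six deaths a minute corresponds to roughly 3.15 million deaths a year. A minimal sketch with hypothetical totals and populations, showing the per-minute figure rising while the per-person rate stays flat:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year

def per_minute(deaths_per_year):
    """The media-style statistic: annual deaths spread evenly over every minute."""
    return deaths_per_year / MINUTES_PER_YEAR

def per_100k(deaths_per_year, population):
    """A rate with a meaningful reference point: deaths per 100,000 people."""
    return deaths_per_year / population * 100_000

# Hypothetical scenario: population grows 10%, individual risk unchanged.
for deaths, population in [(3_000_000, 6_000_000_000), (3_300_000, 6_600_000_000)]:
    print(f"{per_minute(deaths):.1f} per minute, {per_100k(deaths, population):.1f} per 100k")
```

The per-minute number climbs from about 5.7 to 6.3 even though nobody's risk has changed; the population-based rate is the better reference point.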
Choosing the appropriate reference point -- just like picking the right scale -- is the beginning of any good analysis.
Reference: New York Times magazine, April 27 2008; Center on Education Policy.
Joran E. pointed to this "icky" chart he found on Clive Crook's blog at the Atlantic.
He ordered a "junkchart treatment", so here it comes.
First we wanted to process the triangles, dots and squares to make sense of this data. We noted that the data came from a single year (2005), so the chart did not trace the development of the education sector over time. But wait, it used a different route to get at the same idea. The author compared different generations within each country to see whether more and more citizens took university degrees. So each vertical "arrow" was a kind of historical record of different generations within a country. By this criterion, Korea and Japan had come a long way while the US and China stagnated.
The chart is quite impossible to read as designed. There is little reason to sort by the proportion of 25-34-year-olds when the message concerns improvement over generations. Besides, what about countries that apparently regressed (like Russia and Germany)?
For this data, I returned to my favored bumps chart. Here is version one. There are two ways to read this chart. Across countries, we note that most of the European states (blue) had similar profiles, showing a roughly constant rate of growth. The Asian duo of Japan and Korea (brown) had the most marked growth. Of the North American countries (black), Canada diverged from the US starting with the 35-44 generation.
Alternatively, we can focus on the change generation-over-generation. From 55-64 to 45-54, almost all countries in this sample (except Japan) grew at the same rate. Then between 45-54 and 35-44, the two Asian countries clearly set the pace. The step from 35-44 to 25-34 is the most interesting: Korea has not slowed, and Japan has slowed a little but still grew as fast as Canada. A trio of European countries (Spain, Ireland, France) outpaced their neighbors.
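Reading the chart generation-over-generation amounts to looking at the slope of each line segment, i.e. the difference between adjacent age groups. A minimal sketch with hypothetical countries and shares (not the 2005 figures behind the chart):

```python
# Hypothetical shares (%) of each age group holding a university degree;
# NOT the 2005 figures behind the chart.
generations = ["55-64", "45-54", "35-44", "25-34"]
shares = {
    "Country A": [10, 15, 25, 40],
    "Country B": [25, 28, 30, 31],
    "Country C": [20, 24, 29, 35],
}

def changes(series):
    """Change between adjacent generations, oldest to youngest:
    the slope of each segment in the bumps chart."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

steps = [f"{a} -> {b}" for a, b in zip(generations, generations[1:])]
for country, series in shares.items():
    print(country, dict(zip(steps, changes(series))))
```

A country whose changes keep growing (like the hypothetical Country A) is accelerating; one whose changes shrink (Country B) is slowing down, the pattern one looks for segment by segment in the chart.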
Below I show version two. This one combines bumps chart with small multiples. North America, Europe and Asia/Australia are now in separate charts. This removes clutter.
Reader Nick B. sent in this example calling it "interesting". The chart tells a compelling story once we figure out what it is. Grasping the tree structure is key.
It illustrates the important idea that averaging sometimes masks variations in the data. For example, while the province of Guerrero scored 78% on literacy, the municipalities within Guerrero had scores ranging from 28% to 90%.
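A minimal sketch of how a healthy-looking average can hide a very low subgroup: a population-weighted average over hypothetical municipalities. Only the 28% and 90% extremes echo the post; the populations and the middle rate are made up, so the resulting average differs from Guerrero's actual 78%.

```python
# Hypothetical municipalities: (population, literacy rate). Only the 28%
# and 90% extremes echo the post; everything else is made up.
municipalities = [(20_000, 0.28), (150_000, 0.80), (330_000, 0.90)]

pop = sum(p for p, _ in municipalities)
average = sum(p * r for p, r in municipalities) / pop  # population-weighted
low = min(r for _, r in municipalities)
high = max(r for _, r in municipalities)
print(f"state average {average:.0%}, municipal range {low:.0%} to {high:.0%}")
```

Because the low-literacy municipality is small, it barely moves the weighted average, which is precisely how averaging masks it.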
It also shows that the gender gap was larger in the less literate Metlatonoc municipality than in the more literate Cuautitian.
In addition, it tells us that while Mexico on average measured very well on literacy, subpopulations within Mexico spanned the world's best and worst (from about Mali's level to Italy's).
While I find this chart adequate, the pieces hanging off each other do not seem ideal, especially the two overlapping municipality pieces, which were placed side by side. However, it is tough to come up with an alternative. Here's one attempt; the changes are mild.
I prefer the horizontal orientation.
The branches are emphasized (as opposed to the "T" junction) because that's a key part of the story.
The national level, especially the span between Mali and Italy, is de-emphasized; I treat it as gridlines.
Instead of placing the overlapping pieces next to each other, I let the ranges literally overlap, which serves to stress this feature.