This WSJ graphic gives me a reason to talk about the self-sufficiency test. Go ahead and block out the data labels on the chart: you are left with concentric circles but no way to learn anything from the chart, neither the absolute dollar values nor the relative dollar values. In other words, the only way to read this chart is to look at the data labels.
The online article does not include the graphic. It's an article talking about Neil Armstrong's death. Here's the same data using bar charts:
The chart would be much improved if a longer time series were included, giving us values for each year. It's pretty clear that this data is subject to sudden jumps (e.g. Armstrong's death), so picking arbitrary years will likely cause us to miss important events.
Circles are also subject to various types of optical illusion. Before you use bubble plots, give the following a look:
Can we judge the size of circles in relation to other circles? (credit)
Can we judge the relative distance between circles? (credit)
Can we judge the relative sizes of circles within circles? (credit)
An email lay in my inbox with the tantalizing subject line: "How to Create Good Infographics Quickly and Cheaply?" It's a half-spam from one of the marketing sites that I signed up for a long time ago. I clicked on the link, which led me to a landing page which required yet another click to get to the real thing (link). (Now, you wonder why marketers keep putting things in your inbox!)
The article was surprisingly sane. The author, Carrie Hill, suggests that the first thing to do is to ask "who cares?" This is the top corner of my Trifecta Checkup, asking what's the point of the chart. Some of us not so secretly hope that the answer to "who cares?" is no one.
Carrie then lists a number of resources for creating infographics "quickly and cheaply".
Easel.ly caught my eye. This website offers templates for creating infographics. You want time-series data depicted as a long, hard road ahead, you have this on the right.
You want several sections of multi-colored bubble charts, you have this theme:
In total, they have 15 ready-made templates that you can use to make infographics. I assume paid customers will have more.
infogr.am is another site with similar capabilities, and apparently for those with some data in hand.
Based on this evidence, the avalanche of infographics is not about to pass. In fact, we are going to see the same styles repeated over and over. It's like looking at someone's Powerpoint presentation and realizing that they are using the "Advantage" theme (one of the less ugly themes loaded by default). In the same way, we will have a long, winding road of civil rights, and a long, winding road of Argentina's economy, and a long, winding road of Moore's Law, etc.
But I have long been an advocate of drag-and-drop style interfaces for producing statistical charts. So I hope the vendors out there learn from these websites and make your products ten times better so that it is as "quick and cheap" to make nice statistical charts as it is to make infographics.
Reader Sushil B. offers this chart from Business Week on hedge fund returns. (link)
Unmoored bubbles, slanted text, positive and negative returns undifferentiated, bubble within bubble, paired data scattered apart, and it's not even that attractive.
Here is a Bumps-chart style version of this data:
The author never explained how the five funds were chosen, so it's hard to know what the point of the chart is. It appears that Harbinger Capital Partners had an experience similar to Paulson's. In addition, given the potentially huge gyrations from year to year, it's very odd that we are not shown the annual returns between 2007 and 2011... we can't tell whether any of the three other funds suffered a particularly bad year in between the end points shown here.
Ryan McCarthy linked to a post by Ruchir Sharma running on Ezra Klein's blog analyzing global billionaires.
It has an accompanying chart, which fails our self-sufficiency test. That test involves erasing raw data from a chart, and figuring out how much information the graphical elements themselves convey.
The primary metric used by Sharma is the billionaires' total net worth as a percentage of the country's GDP. This metric is embedded in double concentric circles. Unfortunately, without mental gymnastics, readers can't tell what the proportion is. This means we must look at the raw data, which is supplied as a column on the right of the graphic. If readers are taking the information from the column of raw data, then why draw a chart?
The actual data is revealed on the left. Don't tell anyone you read it here but pie charts would work well with this dataset. You might complain that there is a conceptual problem - that if we sum up the net worth of everyone in a country, it would not equal GDP. I think the sum doesn't work - economists can chime in about this. Sharma seems to imply that the total would sum to 1. Anyone's net worth is accumulated over a number of years in which the GDP is fluctuating, while the total GDP is given for a specific end of quarter of some year, so does it make sense to divide one by the other?
Also, the fact that some people may have negative net worth creates problems with the pie-chart format and it's not much better in a concentric-circle format either.
***
A maddening decision puts the United States, which is the biggest circle, at the bottom of the chart. Notice that the countries are sorted from larger billionaires' share to smaller. The U.S. belongs to the top 5 nations with the worst inequality by this metric and yet a cheeky little bookmark sends us to the bottom of the list together with the more-equal nations.
Not only is the location of the U.S. privileged; the location of the text, the number of decimal places given in the net worth amount, and the presence of the GDP value all set the U.S. apart from the other countries plotted.
The most interesting piece of information is waiting to be reconstructed. In Malaysia, nine citizens own as much as 18.3% of the country's GDP. In Mexico, 11 people own 10.9% of the country's GDP.
To make the number even more telling, we have to incorporate the population size. For Malaysia it is 28 million. This means that the top 0.000032% of the population owns 18.3%. In the case of perfect equality, this proportion would own 0.000032%. We can say the inequality index is 570,000. In Mexico, the index is 1.1 million. So in fact, the concentration of wealth at the time is worse in Mexico than in Malaysia. For reference, the U.S. comes in at 78,000.
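The index arithmetic above can be sketched in a few lines of Python. This is only an illustration of the calculation as described: the function name `inequality_index` is made up for this sketch, and only the Malaysia figures quoted above are used (the post does not give Mexico's population, so that case is left out).

```python
def inequality_index(num_people, population, share_of_gdp_pct):
    """Ratio of a group's share of GDP to its share of the population.

    Under perfect equality the two shares are equal and the index is 1.
    """
    population_share_pct = 100.0 * num_people / population
    return share_of_gdp_pct / population_share_pct


# Figures quoted in the post: 9 billionaires, 28 million people, 18.3% of GDP.
malaysia = inequality_index(9, 28_000_000, 18.3)
print(f"Malaysia inequality index: {malaysia:,.0f}")
```

Without rounding the population share to 0.000032% first, the result lands slightly below the 570,000 quoted above, but it rounds to the same figure.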
Of course, the use of billionaires as a filtering device to determine who to count is completely arbitrary. In measuring inequality, one should look at what proportion of the population controls 50% of the wealth, for example.
There is no explanation for the choice of countries. The U.S. is the only developed nation in the entire chart.
NYC mayor Michael Bloomberg is getting mixed reviews for his proposal to ban super-sized sugary drinks. Reader John O. wasn't impressed with this graphical effort (link):
The key problem: this picture is not scary at all. The reason it's not horrifying is that there is no context. People who have knowledge about healthy eating habits will get the message but that's preaching to the choir.
If you know that the recommended consumption of daily sugars for adults is roughly 20-36 grams, then you can see that one sugary drink of 12 ounces or higher would take you over the daily limit. A 64-ounce drink would give you more than 7 times what you need in a day. That's a powerful message but you won't know it from this chart. Not from the sugar cubes doubling as shadows, which is a cute, creative concept.
Also, make use of the chart-title real estate! Instead of "Sugar & Calories per Fountain Drink", say something memorable. "Fountain drinks make you fat and sick".
There is something else fishy about this graphic. What are the most prominent data being displayed?
You got it. They're 7, 12, 16, 32, 64. Where have we seen this type of data display?
Yup. This format is lifted from a menu in a Starbucks or a McDonald's (without prices).
Is this a health warning? Or a restaurant menu?
Also slightly confused about the slightly non-linear relationship between calories and drink size. Maybe volume of ice is held constant...
It is in fact a proportional relationship. The confusion arises from the non-linear increase in cup size from 7 to 64 ounces. The math is roughly 11 calories per ounce, and 3g of sugar per ounce. I wonder if it is better to show those two numbers instead of the ten not-very-memorable numbers shown on the chart itself.
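To see the proportionality, here is a small Python sketch that rebuilds approximate totals from the two per-ounce rates. The rates are the rough figures quoted above, so the output approximates the chart's printed numbers rather than reproducing them; the names `drink_totals` and `CUP_SIZES_OUNCES` are invented for this sketch.

```python
CALORIES_PER_OUNCE = 11      # rough rate quoted in the post
SUGAR_GRAMS_PER_OUNCE = 3    # rough rate quoted in the post
CUP_SIZES_OUNCES = [7, 12, 16, 32, 64]


def drink_totals(ounces):
    """Return (calories, grams of sugar) for a drink of the given size."""
    return ounces * CALORIES_PER_OUNCE, ounces * SUGAR_GRAMS_PER_OUNCE


for ounces in CUP_SIZES_OUNCES:
    calories, sugar = drink_totals(ounces)
    print(f"{ounces:>2} oz: about {calories} calories, {sugar} g sugar")
```

The per-ounce ratio is constant by construction; only the uneven spacing of the cup sizes makes the totals look non-linear when they are lined up side by side.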
In case you're wondering, the heights (thus areas) of the cups have no relationship with any of the data, not calories, not sugars, and not the cup size.
PS. John also wrote: "The soda cup graph reminds me of the chart from Pravda that Tufte cites in 'Cognitive Style of Powerpoint'. " If you know what he's talking about, please post a link to the chart. Thanks.
Reader Joe DiNoto sent me to the following National Post (Canada) chart via Twitter, complaining about the circles. (The full chart is found here.)
This chart is supposed to show that the students in Quebec are wrong to go on strike against a roughly 10% increase in tuition fees because the cost of education in Quebec is dwarfed by those in other provinces. This particular message is visible by virtue of the small amount of space occupied by the Quebec "flower" relative to other provinces.
However, to convey that message would require only a chart of the average tuition of the seven provinces. The dataset here contains a lot more information than just the average: it has the tuition by major. But, does the general pattern of relative tuitions apply to individual majors? This chart type (a disguised bubble chart) does the reader few favors. (At least, the designer managed to keep each "petal" at the same angles; otherwise it would make our lives even harder.)
In order to bring out the tuition by major comparison, the following set of dot plots helps:
The purple dots are Quebec tuitions. The gray dots are the remaining provinces. We find that Quebec is at the bottom of the cost scale for every major. We also learn that the variance of tuition for dentistry, medicine, and law is very high. Surprisingly, the business degree is rather cheap - maybe the demand for it up north is lower?
While doing some research for my statistics blog, I came across a beauty by Lane Kenworthy from almost a year ago (link) via this post by John Schmitt (link).
How embarrassing is the cost effectiveness of U.S. health care spending?
When a chart is executed well, no further words are necessary.
I'd only add that the other countries depicted are "wealthy nations".
Even more impressive is this next chart, which plots the evolution of cost effectiveness over time. An important point to note is that the U.S. started out in 1970 similar to the other nations.
Let's appreciate this beauty:
Let the data speak for itself. Time goes from bottom left to upper right. As more money is spent, life expectancy goes up. However, the slope of the line is much smaller for the US than the other countries. There is no need to add colors, data labels, interactivity, animation, etc.
Recognize what's important, what's not. The US line is in a different color, much thicker and properly made the foreground of the chart.
Rather than clutter up the chart, the other 19 lines are anonymized. They all have the same color and thickness, and are all given one aggregate label. This is an example of overcoming loss aversion (see this post for more): it is ok to suppress some of the data.
The axis labeling is superb. Tufte preaches this clean style. There is no need to use regularly-spaced axis labels... use data-informed labels. Unfortunately, software is way behind on this issue. You can do this in R but that's about it.
@TheChadd submitted the following chart via Twitter.
I don't know if "fun fairs" mean the same thing to me as to you but that's where I got introduced to spinning wheel games. You stand 10 feet away from a multi-colored pie chart, you are supposed to throw darts (or other objects) at the circle, you win gigantic teddy bears if you hit the narrow wedge and maybe a sweet if you hit the big wedge.
To add to the fun, the pie chart is made to spin around slowly.
Well, we are at the fun fair and here is the spinning pie chart:
My friend Augustine sends me to this press release by Kantar Research, via PaidContent (link).
This article expresses alarm that advertisers have cut their spending on online advertising in Q4 of 2011, especially on search and display advertising. An important person is quoted as saying that a shift to mobile ads explains this phenomenon.
Throughout this piece, it's hard to keep track of whether the growth rate is full year 2010 v. full year 2011, or Q4 2010 v. Q4 2011, or Q3 2011 v. Q4 2011. Based on the data table attached to the end, I think they use the first two metrics although the sentence "paid search fell in the 4th quarter by 1 percent" is often interpreted as falling 1 percent from Q3 to Q4.
The labeling on the following chart doesn't help:
Comparing Q4 2010 to Q4 2011 (and Q4 2009 to Q4 2010) is one way to do a crude seasonal adjustment, and I'm assuming that's what they did. If so, then each rate can be considered an annual growth rate for a particular quarter and the following chart would bring out the dramatic decline in a much clearer manner:
Instead of starting another debate about line charts versus bar charts, I show them both, but continue to recommend the line chart.
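For anyone who wants to reproduce the same-quarter comparison, here is a minimal Python sketch of the crude seasonal adjustment described above. The spending figures are hypothetical, made up purely to show the mechanics (the press release's data table is not reproduced here), and the function name `year_over_year_growth` is mine.

```python
def year_over_year_growth(series, periods_per_year=4):
    """Percent change of each value against the same period one year earlier."""
    return [
        100.0 * (series[i] - series[i - periods_per_year]) / series[i - periods_per_year]
        for i in range(periods_per_year, len(series))
    ]


# HYPOTHETICAL quarterly ad spending, Q1 2009 through Q4 2011 (not Kantar's data).
spend = [100, 104, 102, 120, 108, 113, 111, 131, 115, 119, 114, 130]

yoy = year_over_year_growth(spend)
labels = [f"Q{q} {year}" for year in (2010, 2011) for q in (1, 2, 3, 4)]
for label, rate in zip(labels, yoy):
    print(f"{label}: {rate:+.1f}% vs. same quarter a year earlier")
```

Each rate compares a quarter with the same quarter of the previous year, so the seasonal bump in Q4 cancels out and the series can be plotted as a single line of annual growth rates.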
In the original chart, either the data labels or the scaffolding (the vertical axis and gridlines) should be removed. If the data set is entirely printed on the chart, the designer expresses no confidence in the graphical elements.
The curiosity in this press release is the absence of mobile ad data. Apparently the key message of the article is not supported by the data set, which makes this a case of "story time". (I write about story time in the sister blog.)