Phil, over at the Gelman blog, nominates this jaw-dropping graphic as the worst of the year. I have to agree:
Should we complain about the "pie chart"/4-quadrant representation with no reference to the underlying data? Or the "pie within a pie within a pie" invention, again defiantly not to scale? Or the creative license to exaggerate the smallest numbers in the chart ($2 billion, $0.3 billion), making them disproportionate to the other pieces? Or the complete disregard for proportions (e.g. the $0.2 billion green strip on the top right quadrant compared to the $0.3 billion tiny blue arc on the top left quadrant)?
Or the random sprinkling of labels and numbers around the circle, even though, if one takes the time to look, one notices that the entire chart contains only 8 numbers, as follows:
Instead, we can display the data with a small multiples layout showing readers how the data is structured along two dimensions.
Or a profile chart may also work:
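For readers who want to experiment, here is a minimal matplotlib sketch of the small-multiples idea; the eight values and the category/series names are hypothetical stand-ins for the chart's actual numbers, and a profile chart would simply connect each series' values with lines instead of bars:

```python
# A minimal small-multiples sketch. The eight values below are
# hypothetical placeholders, NOT the chart's actual data, arranged
# along two dimensions (four categories x two series).
import matplotlib.pyplot as plt

categories = ["Q1", "Q2", "Q3", "Q4"]          # hypothetical dimension 1
series = {"Series A": [2.0, 1.5, 0.3, 0.2],    # hypothetical dimension 2,
          "Series B": [1.8, 1.2, 0.5, 0.4]}    # values in $ billions

fig, axes = plt.subplots(1, len(series), sharey=True, figsize=(8, 3))
for ax, (name, values) in zip(axes, series.items()):
    ax.bar(categories, values)                 # one small panel per series
    ax.set_title(name)
axes[0].set_ylabel("$ billions")
plt.tight_layout()
plt.show()
```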
My coworker pointed me to a Huffington Post article carrying a Bill Gates byline that contains some highly dubious analysis and a horrific chart. We presume Gates was fed this information by some analysts, but even so, one wishes he wouldn't promote innumeracy. But then, he has a history: Howard Wainer demolished the analysis his foundation used to channel lots of dollars into the "small schools" movement a few years ago; I wrote about that before.
First, the offensive chart:
Using double axes earns justified heckles, but using two sets of gridlines is a scandal! A scatter plot is the default for this type of data. (See the next section for why this particular set of data is not informative anyway.)
I can't understand the choice of scale for the score axis. The orange line, for instance, seems to have a positive slope. In any case, since these scores are "scaled", and the "standard error" is about 1 (this number is surprisingly hard to find, even on Google), it would appear that between 300 and 400 on the score axis lie 100 units of standard error. By convention, an observation three standard errors away from the average is already considered a rare event. There is no conceivable way that the average score could jump by that much.
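To put a number on how implausible that is, here is a quick back-of-the-envelope check, taking the standard error of about 1 (as stated above) as an assumption:

```python
# Back-of-the-envelope check of the scale argument, assuming the
# scaled scores really have a standard error of about 1.
from scipy.stats import norm

se = 1.0
jump = 400 - 300          # the axis range in question
z = jump / se             # 100 standard errors

print(norm.sf(3))         # ~0.00135: already a rare event by convention
print(norm.sf(z))         # underflows to 0.0: effectively impossible
```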
The analysis is also flawed. Here's the key paragraph:
Over the last four decades, the per-student cost of running our K-12 schools has more than doubled, while our student achievement has remained flat, and other countries have raced ahead. The same pattern holds for higher education. Spending has climbed, but our percentage of college graduates has dropped compared to other countries... For more than 30 years, spending has risen while performance stayed flat. Now we need to raise performance without spending a lot more.
This argument contains several statistical fallacies:
In the same article, Gates asserts that quality of teaching is the most decisive factor explaining student achievement. Which study proves this, we are not told. How one can measure such an intangible quantity as "excellent teaching", we are not told. How student achievement is defined, well, you guessed it, we are not told.
It's great that the Gates Foundation supports investment in education. Apparently they need some statistical expertise so that they don't waste more money on unproductive projects based on innumerate analyses.
Imagine that one knows nothing about tectonic theory and geography. Such a map would be quite illuminating.
The comments section contains an exchange between two Martins about the hazard of non-experts plotting and then commenting on data they know little about. In the past, this conversation would take place behind closed doors as the statistician works with the geologist, each learning from the other. In our modern age, it is common for both sides to take up arms in public with colorful language.
The lesson for us spectators is the importance of understanding how data is collected, how data is defined, how data is processed, etc.
The plots Martin made do give me a chance to talk about a few interesting statistical issues relating to plotting data.
The chart doesn't have a vertical scale, so it is hard to judge whether the growth is meaningful or not. The US Geological Survey doesn't seem to think this is a remarkable event, as the other Martin pointed out. One explanation is the increasing sensitivity of measuring equipment, which leads to more small-magnitude earthquakes being recorded over time.
Whether or not this hypothesis is true in the context of earthquakes, this is a very important phenomenon that occurs often. One of the mysteries in the epidemiology of autism is the recent rise in the number of cases, but this is complicated by the much higher probability of diagnosis in recent times. Similarly, as the stigma against reporting rapes, harassment and other crimes dissipates, more such criminal reports will be filed. How does one distinguish between higher reporting and higher incidence, both of which lead to higher counts?
This same phenomenon happened during the Toyota brake scare last year. It appeared as if the problem was getting worse while the scandal brewed. But as the awareness of the potential risk increases, so too will the probability of someone reporting an issue.
One other feature of the bar chart worth noting is the faux plunge in the number of earthquakes in 2011. My recommendation is either to omit the incomplete year, or to forecast the end-of-year count (labelling it clearly as a forecast).
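As a sketch of the second option, one naive way to forecast the end-of-year count is to scale the year-to-date count by the fraction of the year elapsed; the count and cut-off date below are hypothetical:

```python
# Naive end-of-year forecast: scale the year-to-date count by the
# fraction of the year elapsed. The count and date are hypothetical.
from datetime import date

ytd_count = 40                      # hypothetical quakes recorded so far in 2011
as_of = date(2011, 3, 31)
days_elapsed = (as_of - date(2011, 1, 1)).days + 1
forecast = ytd_count * 365 / days_elapsed
print(round(forecast))              # label this clearly as a forecast when charted
```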
Finally, for data relating to rare events like earthquakes, one should try to take as long a view as possible. Twenty years is good but a hundred years would be better.
Next, Martin created this histogram that plots the number of earthquakes of different magnitudes over the last two decades or so.
As Martin noted, the Richter scale is a log scale, meaning a 1-unit increase along the horizontal axis is really a tenfold increase in measured amplitude.
Martin's point is to show how strong the earthquake in Japan was compared to historical earthquakes. This is a useful and interesting question to ask, and his choice of graphics cannot be faulted.
What the other Martin was complaining about is the third corner of our Trifecta checkup: whether the right data has been used. Martin used what he was able to find. The histogram in fact looked quite nice, was rather symmetric, and suggested a tractable mathematical model. But if we proceed along that path, we would have chosen the wrong model.
The trap is a missing data problem that will not be evident to non-experts. As discussed above, small-magnitude quakes are not being tracked. Thus, the left side of the histogram severely under-represents the true frequency of such small tremors.
There is a Gutenberg-Richter law relating earthquake magnitude and frequency. It's a power law. Roughly speaking, for each 1-unit increase in Richter magnitude, earthquakes of that magnitude occur about one-tenth as frequently. This law fits the actual data really well when Richter magnitude is high, but at lower magnitudes, geologists believe that our records severely under-count the quakes.
In the chart shown on the right, which comes from this paper (PDF link) about predicting earthquakes, the dots sit on the line at higher magnitudes, but the observed frequencies fall markedly below the line (the projection) for magnitudes lower than 2. This suggests that the bars on the left side of the histogram above should have been much taller than what we are seeing. For a proper power law, the bars on the left should be the tallest of all!
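To make the implied scaling concrete, here is a small sketch of the relation log10 N(M) = a - b*M, with b taken to be about 1; the intercept a is a hypothetical constant that only sets the overall level:

```python
# A rough sketch of the Gutenberg-Richter relation. With b close
# to 1, each one-unit drop in magnitude implies roughly ten times
# as many quakes. The constant a is hypothetical.
a, b = 8.0, 1.0
for M in range(2, 9):
    expected = 10 ** (a - b * M)
    print(f"magnitude {M}: ~{expected:,.0f} quakes expected")
# Observed counts track these numbers well at high magnitudes but
# fall far short at low magnitudes, where many quakes go unrecorded.
```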
Guess what the designer at Nielsen wanted to tell you with this chart:
Maybe those are the messages; if so, there is no need to present a bivariate plot (the so-called "mosaic" plot, or in consulting circles, the Marimekko). Having two charts carrying one message each would accomplish the job cleanly.
The two columns, counting from the right, contain rectangles that appear to be of different sizes, and yet the data labels claim each piece represents 1%, and in some cases "< 1%". The simultaneous manipulation of both the height and the width plays mind tricks.
Also, while one would ordinarily applaud the dropping of decimals from a chart like this, doing so actually creates the ugly problem that the five pieces of 1% (on the left column shown here) have the same width but clearly varying heights!
What about this section of the plot shown on the left? Does the smaller green box look like it's less than 1/3 the size of the longer green box? This chart is clearly not self-sufficient, and as such one might prefer a simple data table.
The downfall of the mosaic plot is that it gives the illusion of having two dimensions but only an illusion: in fact, the chart is dominated by one dimension, as all proportions are relative to the grand total.
For instance, the chart says that 6% of all smartphone users are between the ages of 18 and 24 AND use an Android phone. It also tells us that 2% of all smartphone users are between 35 and 44 AND use a Palm phone. Those are not two numbers anyone would desire to compare. There are hardly any practical questions that require comparing them.
Sometimes, the best way to handle two dimensions is not to use two dimensions.
The original article notes that "Of the three most popular smartphone operating systems, Android seems to attract more young consumers." In the chart shown below, we assume that the business question is the relative popularity of phone operating systems across age groups.
The right metric for comparison is the market share of each OS within an age group.
For example, tracing the black line labeled "Android", this chart tells us that Android has 37% of the 18-24 market while it has about 20% of the 65 and up market.
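For readers who want to compute this metric themselves, here is a minimal pandas sketch with hypothetical counts; the key step is dividing each age group's row by its own total rather than by the grand total:

```python
# Computing within-age-group market shares. The counts and the
# age groups below are hypothetical.
import pandas as pd

counts = pd.DataFrame(
    {"Android": [370, 250, 200], "iPhone": [300, 290, 350]},
    index=["18-24", "25-34", "65+"],
)
# Divide each row by its total: shares now sum to 100% per age group,
# so the denominator is the age group rather than the whole market.
shares = counts.div(counts.sum(axis=1), axis=0) * 100
print(shares.round(1))
```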
Android has an overall market share of about 30%, but that average obscures a youth bias: its share declines roughly linearly with age.
On the other hand, the iPhone (green line) also has an average market share of about 30%, but its profile is pretty flat across age groups except 65 and up, where it has considerable strength.
Further, the gap between Android and the iPhone actually opens up at 55 years and up. In the 55-64 age group, the iPhone holds a market share similar to its overall average while Android performs quite a bit worse than its average. We note that Palm OS has some strength in the older age groups as well, while the BlackBerry significantly underperforms among those 65 and over.
Why aren't all these insights visible in the mosaic chart? It is all because the chosen denominator, the entire market as opposed to each age group, makes a lot of segments very small, and the differences between small segments become invisible when placed beside much larger segments.
Now, the reconstituted chart gives no information about the relative sizes of the age groups. The market size for the older groups is quite a bit smaller than the younger groups. This information should be provided in a separate chart, or as a little histogram tucked under the age-group axis.
This chart highlighted by the Economix blog at the New York Times caught a bit of attention.
Catherine Rampell wrote "awesome chart" about this graphic from Branko Milanovic's book, which first published it, while conceding that it is a chart that can "take a few minutes" to understand, "but trust me, it's worth it".
The question for me is: is the reward worth the effort?
The answer is no. This chart does not address an interesting question, and it tempts readers to infer things that the chart doesn't say.
The message of this data is that there are rich people (by world standard) in poor countries. For me, this isn't very interesting but I can understand if others find it shocking, edifying or even satisfying.
I'd point you to a different visualization, done by the now-famous Hans Rosling years ago (I discussed his team's work here). If he had used lines instead of areas for the distributions, the chart would have been even better.
I much prefer this chart.
Comparing the two also surfaces another difference. The four countries chosen in the Milanovic chart are highly selective. (I snicker at the title, which announces "Inequality in the world".) It compares the U.S. against three developing nations with high income inequality. What about also showing a few lines for nations with lower inequality, like the Scandinavian countries?
Rampell's conclusions, in particular, are not well supported by her beloved chart. First, she said:
All people born in rich countries thus receive a location premium or a location rent; all those born in poor countries get a location penalty.
I don't disagree that it's better to be born with money. But my takeaway from the chart is the opposite of hers: that you can't generalize about entire countries; that you can live in a poor country and be extremely rich. As Rampell pointed out earlier in the article, "[Brazil] this one country covers a very broad span of income groups". So, if anything, the chart undermines the point that "all people" in any one country receive a location premium/rent.
She also said the following:
How can there be so many people in the world who make less than America's poorest, many of whom make nothing each year? Remember that we're looking at the entire bottom chunk of Americans, some of whom make as much as $6700; that may be extremely poor by American standards, but that amounts to a relatively good standard of living in India, where about a quarter of the population lives on $1 a day.
Given that the data has been adjusted for PPP, or in Rampell's words, for different costs of living around the world, which is really to say for different standards of living, it makes little sense to explain a difference in the adjusted amounts by appealing to the "standard of living". In fact, my understanding (unless something changed recently) is that the PPP adjustment uses the US living standard as the reference level.
The $6,700 that she describes as the maximum income of the bottom chunk of Americans would, if earned by an Indian, put him or her in the very top bucket of Indians, according to the Milanovic chart. I'd call that a super high standard of living in India, not merely "relatively good".
A few comments on the statistics.
The last quotation above shows a confusion between averages and extreme values. The $6,700 is the maximum income of the bottom chunk of Americans; it cannot be compared to the $1 a day, which, by the way, should be written as $365 a year, but which in any case is the average income of the bottom 25% of Indians. One can't compare an average to a maximum, nor an annual number to a daily number.
A number of readers conclude from the chart that the income inequality problem in the U.S. is overblown. But you just can't see that problem on this chart, because the chart literally hides the information. As we know, the top 20% of the U.S. population holds 84% of the wealth, and the concentration gets worse at the top 1%, top 0.1%, and so on. Yet the precision of the horizontal axis is the "ventile", that is, 5% buckets.
Also, notice that this type of chart is used to compare one distribution against another distribution. The notion of currencies has been entirely removed. It's similar to converting data from absolute units to rankings: you lose the sense of scale. (This is the reason why it appears as if no one in India makes more than anyone in the States. If a finer scale were used at the upper end of the Indian income distribution, I'm sure you would find otherwise.)
Here is a close-up of California:
Anytime someone expands the possibilities of a chart type, like the word cloud, it's a commendable project. So I'm quite enthusiastic about what they tried to do here. Not every new feature is successful, though.
These are the things I like:
These are things I don't like:
So, I think they did a reasonable job in rethinking the possibilities of word clouds. It's well intentioned and there is room for improvement.
Lastly, they might get some ideas from the Baby Names navigator.
Here is one of my several suggestions for word-cloud design: encode the data in the area enclosed by each word, not the length of each word.
Every word cloud out there contains a distortion because readers are sizing up the areas, not the lengths of words. The extent of the distortion is illustrated in this example:
The word "promise" is about 3.5 times the size of "McCain" but the ratio of frequency of occurrence is only 1.6 times.
Reader Brian R. could not believe the Atlantic magazine would print a pile of chartjunk like this, and neither can we.
Pretty much every chart deserves its own entry, and they all fail our self-sufficiency test: when the actual data is removed from each chart, the failure is exposed, as one realizes that the graphical constructs do not add to the readers’ experience, and frequently subtract from it.
We'll focus on three examples where they tried to innovate, badly. The data has been stripped from each chart.
The chart shown on the right compares the amount of time spent reading by 15 to 19 year olds in 2007 and in 2009. We definitely see the severe drop in time spent but how many times higher were the average minutes in 2007?
(Amusingly, these books have 13 lines per page, not 12 lines, not 10 lines, not 15 lines.)
The next chart is similar, but comparing the minutes spent playing games. It’s a pie chart! Did our kids spend 100% of their weekend days in 2009 playing games?
No, it can’t be a pie chart. The caption said “average minutes”, not a proportion of a total; it’s a clock. Is it a 60-minute clock? But it’s a weekend day, so maybe it’s a 24-hour clock. That can’t be either, since the kids wouldn’t be spending every hour of each weekend day playing games, would they? They do need to sleep, don’t they?
So we cheat and look at the data. Average minutes in 2007: 46.8 minutes; in 2009, 61.2 minutes. Oh, it’s a malfunctioning clock. In the 2007 version, it’s about a minute too fast, and in the 2009 version, it’s a minute too slow. But who can blame the 2009 clock? You can’t show 61.2 minutes in a 60-minute clock.
With just two pieces of data, it's often the case that graphics are superfluous. Even if "entertainment" is desired, one ought to keep that from obscuring the data. Perhaps like this:
OK, just one more. Not surprisingly, US book sales are shown using stacks of books except that the data was not encoded in the height of the stacks, the thickness of the books, the number of books, or other usual suspects. The data is embedded into the width of a page plus the thickness of a book, assuming every book is identical in design.
Since the data is given, we can use a little bit of algebra to figure out how many units are represented by the long side (L) and the short side (S) respectively:
What this means is that the difference shown in the picture, one long line, is vastly exaggerated; the same difference in units would have been equivalent to only one-third of a short line.
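Here is a sketch of how that algebra might run, assuming each stack's total equals (number of long sides) times L plus (number of short sides) times S; all counts and totals below are hypothetical stand-ins, since the chart's actual figures are not reproduced here:

```python
# Hypothetical setup: two stacks, each contributing one equation
#   (count of long sides) * L + (count of short sides) * S = total
import numpy as np

A = np.array([[3.0, 2.0],          # stack 1: 3 long sides, 2 short sides
              [4.0, 1.0]])         # stack 2: 4 long sides, 1 short side
totals = np.array([100.0, 90.0])   # hypothetical sales totals

L, S = np.linalg.solve(A, totals)
print(L, S)                        # with these made-up numbers, L = 16, S = 26
```

With these made-up numbers, a short side would carry more units than a long side, consistent with the distortion just described.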
Other problems noticed by Brian:
Use of what looks like a Gaussian distribution instead of a bar.
The piggy bank graphic that distorts the saving rate.
The redundancy of pie charts next to simple percentages.
Also, the presentation of statistics without any apparent relationship to the theme being presented. For example, what does the increase in 3-D movies being produced have to do with the recession?
My guess is that the increase in 3-D movies is due more to advances in technology and its implementation than to recession economics.
Information graphics is one of many terms used to describe charts showing data -- and a very ambitious one at that. It promises the delivery of "information". Too often, readers are disappointed, sometimes because the "information" cannot be found on the chart, and sometimes because the "information" is resolutely hidden behind thickets.
Statistical techniques are useful to expose the hidden information. They work by getting rid of the extraneous or misleading bits of data, and by accentuating the most informative parts. A statistical graphic distinguishes itself by not showing all the raw data.
Here is the Guardian's take on the OECD PISA scores that were released recently. (Perhaps some of you are playing around with this data, which I featured in the Open Call... alas, no takers so far.) I only excerpted the top part of the chart.
This graphic is not bad, could have been much worse, and I'm sure there are much worse out there.
But think about this for a moment: what question did the designer hope to address with this chart? The headline says comparing UK against other OECD countries, which is a simple objective that does not justify such a complex chart.
The most noticeable feature is the set of line segments showing the correlation of ranks among the three subject areas within each country. So, South Korea is ranked first in reading and math, and third in science. Equally prominent is the ranking of countries shown on the left-hand side of the chart (which, on inspection, is the ranking of reading scores); this ranking also determines the colors used, another eye-catching part of this chart. (The thick black UK line is, of course, important too.)
In my opinion, those are not the three or four most interesting questions about this data set. In such a rich data set, there could be dozens of interesting questions. I'm not arguing that we have to agree on which ones are the most prominent. I'm saying the designer should be clear in his or her own mind what questions are being answered -- prior to digging around the data.
With that in mind, I decided that a popular question concerns the comparison of scores between any pair of countries. From there, I worked on how to simplify the data to bring out the "information". Specifically, I used a little statistics to classify countries into 7 groups; countries within each group are judged to have performed equally well in the test and any difference could be considered statistical noise. (I will discuss how I put countries into these groups in a future post, just focusing on the chart here.)
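Since the grouping method is deferred to a future post, here is merely one plausible approach, not necessarily the one used here: walk down the ranked scores and start a new group whenever a country falls more than two standard errors below the current group's leader. The scores and the standard error below are hypothetical:

```python
# One plausible grouping rule (NOT necessarily the author's method):
# countries within two standard errors of their group's leader are
# treated as statistically indistinguishable.
SE = 5.0                                      # hypothetical standard error
scores = {"Finland": 536, "South Korea": 539, "Canada": 524,
          "UK": 494, "Sweden": 497}           # hypothetical scores

ranked = sorted(scores.items(), key=lambda kv: -kv[1])
groups, current = [], [ranked[0]]
for country, score in ranked[1:]:
    if current[0][1] - score > 2 * SE:        # beyond statistical noise
        groups.append(current)
        current = [(country, score)]
    else:
        current.append((country, score))
groups.append(current)
print(groups)
```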
Here is the result: (PS. Just realized the axis should be labelled "PISA Reading Score Differentials from the Reference Country Group" as they show pairwise differences, not scores.)
Each row uses one of the country groups as the reference level. For example, the first row shows that Finland and South Korea, the two best performing countries, did significantly better than all other country groups, except those in A2. The relative distance of each set of countries from the reference level is meaningful, and gives information about how much worse they did.
(The standard error seems to be about 3-6, based on some table I found on the web, which may or may not be correct. This value leads to very high standardized score differentials, indicating that the spread between countries is very wide.
I have done this for the reading test only. Standardizing the test scores is not necessary if we are only concerned with the reading test, but since I was also looking at correlations between the three subjects, I chose to standardize the scores, which is another way of saying they are put on an identical scale.)
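For concreteness, here is a minimal sketch of what standardizing means in practice, converting each subject's scores to z-scores; the values are hypothetical:

```python
# Standardize each subject's scores to z-scores so the three
# subjects sit on an identical scale. Values are hypothetical.
import pandas as pd

scores = pd.DataFrame({
    "reading": [539, 536, 524],
    "math":    [546, 541, 527],
    "science": [538, 554, 529],
})  # hypothetical scores for three countries

standardized = (scores - scores.mean()) / scores.std()
print(standardized)
```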
Before settling on the above chart, I produced this version:
This post is getting too long, so I'll be brief on this next point. You may wonder whether having all 7 rows is redundant. The reason they are all there is that the pairwise differences lack "transitivity": e.g., the difference between Finland and the UK is not simply the difference between Finland and Sweden plus the difference between Sweden and the UK. The right way to read the chart is to fix on the reference country group, and only look at the differences between the reference group and each of the other groups. The differences between two country groups, neither of which is the reference group, should be ignored in this chart; instead, look up the two rows for which those countries are the reference group.
Before that, I tried a more typical network graph. It looks "sophisticated" and is much more compact but it contains less information than the previous chart, and gets murkier as the number of entities increases. Readers have to work hard to dig out the interesting bits.