
Worst statistical graphic nominated

Phil, over at the Gelman blog, nominates this jaw-dropping graphic as the worst of the year. I have to agree:

GR_GraficFIN-web

Should we complain about the "pie chart"/4 quadrants representation with no reference to the underlying data? Or the "pie within a pie within a pie" invention, again defiantly not to scale? Or the creative license to exaggerate the smallest numbers in the chart ($2 billion, $0.3 billion), making them disproportionate to the other pieces? Or the complete usurping of proportions (e.g. the $0.2 billion green strip on the top right quadrant compared to the $0.3 billion tiny blue arc on the top left quadrant)?

Or the random sprinkling of labels and numbers around the circle even though if one takes the time, one notices that the entire chart contains only 8 numbers, as follows:

Energysub_data

***

Instead, we can display the data with a small multiples layout showing readers how the data is structured along two dimensions.

Redo_energysub1

Or a profile chart may also work:

Redo_energysub2

 

 


Light entertainment: a self-referencing chart

Reader Jeannie C. succeeded in coaxing me to put up an infographics poster. It's predictable that such a thing would eventually arise: an infographics about infographics...

Infographics_squared

If you click on Ivan's link, you will find the real "data" behind this infographics.

Let's play spot the "errors".

I'll go for the easy one. 22+24+24+32 = 102. Typically, one could call this rounding error. But it has always mystified me why the designer does not "round" down the individual numbers such that the total adds to 100%. In either case, we have to accept a small amount of imprecision. (For example, 23+23+23+31 = 100).
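For the curious, here is a minimal sketch of one way to force rounded percentages to add to 100. It uses the largest-remainder method, which is my substitution rather than anything the poster's designer did, and the unrounded shares are made up for illustration.

```python
import math

def round_to_total(values, total=100):
    """Round values to integers that sum exactly to `total`
    using the largest-remainder method."""
    scale = total / sum(values)
    scaled = [v * scale for v in values]
    floors = [math.floor(s) for s in scaled]
    shortfall = total - sum(floors)
    # Give the leftover units to the entries with the largest fractional parts.
    order = sorted(range(len(values)),
                   key=lambda i: scaled[i] - floors[i],
                   reverse=True)
    for i in order[:shortfall]:
        floors[i] += 1
    return floors

# Hypothetical unrounded shares (not taken from the poster).
shares = [21.7, 23.6, 23.8, 30.9]

conventional = [round(s) for s in shares]
forced = round_to_total(shares)

print(conventional, sum(conventional))  # ordinary rounding overshoots 100
print(forced, sum(forced))              # adjusted rounding sums to 100
```

Either way, some individual numbers are off by a fraction of a point; the only choice is whether the imprecision shows up in the parts or in the total.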

Your turn next...

 

PS. I'm reading this chart as a parody on the infographics genre.

Ivan pointed out that my contribution above is illegitimate since there wasn't any rounding. So, my turn again: the yellow bars are not to scale; China's bar is not 4 times the length of the U.S. bar.


Gelman joins in the fun

The great Andrew Gelman did a Junk Charts style post today, and very well indeed.

The offending Economist plot is the donut chart, which is a favorite of that magazine.  I commented on this type of chart before.

Econ_timespent

Andrew created two alternatives. One is a line chart (profile chart), which is often a better option (despite the data being categorical); the other is more creative, and the better of the two.

Redo_timespent1

 

Redo_timespent2

Some of Gelman's readers complained that he arbitrarily "standardized" the data by indexing against the average of the countries depicted; one can further grumble that a 50% "excess" may sound impressive but it would be equivalent to less than an hour, perhaps not as startling. These types of complaints are fair but do realize that blog posts like these are primarily concerned with how data is best visualized. If one prefers a different indexing method, or a different set of countries, or a different color for the lines, etc., one can easily revise the chart to reflect those preferences.
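To make the indexing complaint concrete, here is a minimal sketch of indexing against the average of the countries shown, reporting both the relative "excess" and the absolute gap. The country labels and minutes are invented for illustration; they are not the Economist's data, and I am assuming a simple unweighted mean.

```python
# Hypothetical minutes per day spent on one activity in six countries
# (made-up values, not the Economist's data).
minutes = {"A": 40, "B": 50, "C": 60, "D": 90, "E": 60, "F": 60}

def index_against_mean(values):
    """Return (relative excess, absolute gap) of each value
    versus the unweighted mean of the group."""
    mean = sum(values.values()) / len(values)
    relative = {k: v / mean - 1 for k, v in values.items()}
    absolute = {k: v - mean for k, v in values.items()}
    return relative, absolute

relative, absolute = index_against_mean(minutes)

for country in minutes:
    print(f"{country}: {relative[country]:+.0%} ({absolute[country]:+.0f} min)")
```

In this toy example, country D shows a 50% excess, which amounts to 30 minutes a day. Swapping in a different reference group only requires changing the dictionary.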

The easiest way to see why the third chart is better than the first is to compare their messages. The strongest message coming off the first chart is that there are no material differences between these six countries in terms of time usage; in the third chart, the designer (here, it's Gelman) is asserting that there are interesting differences.


Another view of the Indian states

The previous post has elicited protests of "it's not that bad" from some corners. Well, it's bad. Let's look at it from another angle.

***

We start with the Economist chart, and ask what is the message.

Economist_indianratio

The chart is saying that in 2006-8, there are 10 Indian states that have female-to-male-babies ratios below the world average (the so-called "natural ratio"). For those who know their Indian geography, the chart gives the names of these anomalous states. The chart also tells us among these 10 states, some have been gaining and others have been losing ground when compared to the 2001-3 period. There is no obvious pattern as to which states are gaining, and which losing.

That's pretty much everything that one can discern from this chart.

The problem is, the average Economist reader already knows that in India, as in China and many Asian countries, the ratio of male to female babies is higher than in other parts of the world. If he or she doesn't know this fundamental statistic, the chart does not help because it says nothing about the other 24 states that make up the Indian average.

Worse, the chart raises the suspicion of voodoo statistics. It suggests that the other 24 states have a gender ratio that is at least equal to, if not above, the natural ratio. One would then have to believe that either the overall Indian average is higher than the natural ratio, or the negative deviations from the world average (as shown on the chart) are quite a bit larger than the positive deviations (not shown), or that the states with positive deviations (not shown) are generally less populous than the ones shown.

Either of the last two conclusions, if true, would be interesting because it implies that the cultural norms, typically claimed to explain this anomaly, are entrenched only within certain geographies. Then, it is inappropriate to speak of India's sex ratio, given this variability between states.
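The arithmetic behind this argument is a weighted average. Here is a minimal sketch with made-up numbers: I lump the states shown on the chart into one group and the unshown states into another, and vary how much weight the shown group carries. The ratios and weights are hypothetical; only the 952 reference value comes from the previous post.

```python
NATURAL_RATIO = 952  # reference value quoted in the previous post

def weighted_ratio(groups):
    """Weighted average of per-group sex ratios.
    groups: list of (ratio, weight) pairs."""
    total_weight = sum(w for _, w in groups)
    return sum(r * w for r, w in groups) / total_weight

# Hypothetical group ratios: shown states below the line, unshown just above.
SHOWN_RATIO, UNSHOWN_RATIO = 880, 960

results = {}
for shown_share in (60, 30, 5):  # hypothetical percent weight of the shown states
    national = weighted_ratio([(SHOWN_RATIO, shown_share),
                               (UNSHOWN_RATIO, 100 - shown_share)])
    results[shown_share] = national
    side = "below" if national < NATURAL_RATIO else "at or above"
    print(f"shown states weigh {shown_share}%: national {national:.0f} ({side} {NATURAL_RATIO})")
```

With these invented inputs, the national figure lands below the reference line unless the shown states carry very little weight, which is the tension described above.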

***

As I pointed out in the prior post, with two data series (two observation dates of the same statistic) at their disposal, the Economist chart focuses on the more recent data. This self-imposed restriction obscures meaningful differences between states over time.

Redo_indianratio

The junkcharts version shows that the current 11 "worst" states could be clustered into two groups: the first group (black lines) has gained ground over the last decade, while the second group (gray) has stagnated, and in some cases, lost ground.

What's more, we learn that every one of the states in the gray group is ranked higher than those in the black group at the start of the decade.

Further, while the distance between the black and gray groups has narrowed over the decade, the gray group, despite the slight decline, is still ranked above the black group except for Kerala, which has seen dramatic improvement.

Those who know their Indian geography might have further insights as to why the states cluster in this way.

In my view, these findings are much more interesting than the things one can learn from the original chart.

 

 

 


A skewed view of ten Indian states

Economist_indiasexratio

The Economist published this chart to illustrate the problem of the "missing girls" in Indian society.

The girls-to-boys ratio (ages 0-6) should be about 952 but in India, it is 914. That's an average number for 35 territories, and the most skewed ratio was 830 in Punjab.

Curiously, the Economist chose to focus on only 11 states instead of showing all 35. The first 10 of these had sex ratios below the natural number of 952 while the last one was over the average. Nowhere on the chart or in the article is it explained whether the unmentioned 24 states all had above-average sex ratios: unlikely, unless certain states have much higher youth populations than others.

In fact, the reference line of 952 is misplaced. Readers will find that there are two metrics depending on which survey one is looking at, either sex ratio at birth or sex ratio for children aged 0-6. The natural ratio of 952 is for the 0-6 measure but the data by territory are all for the at-birth measure. Instead, the dotted red line needs to be at 904, which is the national average sex ratio at birth for India for the 2006-8 period.

***


The lethal error in this chart is not starting the horizontal axis at zero.

Redo_indiangirls1

By cutting off the same amount from each bar, this chart messes up the ratio of lengths, and presents a misleading picture of the relative sex ratio between territories. We may think Punjab's sex ratio is half that of Gujarat (in the original chart) but as the chart on the right shows, that is far from the truth!
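A quick sketch of why a truncated axis distorts the comparison. The two values and the axis starting point below are hypothetical, chosen only to reproduce the "half as long" impression; they are not read off the Economist chart.

```python
def apparent_ratio(a, b, axis_start=0):
    """Ratio of bar lengths for values a and b
    when the axis begins at axis_start."""
    return (a - axis_start) / (b - axis_start)

# Hypothetical sex ratios for two territories and a hypothetical axis start.
low, high = 830, 900
truncated = apparent_ratio(low, high, axis_start=760)
honest = apparent_ratio(low, high, axis_start=0)

print(f"bar-length ratio with truncated axis: {truncated:.2f}")
print(f"bar-length ratio with axis at zero:   {honest:.2f}")
```

On the truncated axis the shorter bar is half the length of the longer one, while the underlying values differ by less than 10 percent.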

***

The other unfortunate practice, typical of the Economist, is to stick a second set of data on the right of the chart as an afterthought. In fact, that data representing the change in the sex ratio over time is more interesting than what the exact sex ratio was in each territory in 2006-8.

A much better way to present the data, without favoring one series or another, is the Bumps chart, as shown below. We can clearly see that the improvement in sex ratio is concentrated in those states that started out the decade in worse shape.

Redo_indiangirls2

 


Bill Gates should hire a statistical advisor

My coworker pointed me to a Huffington Post article claiming a Bill Gates byline that contains some highly dubious analysis and a horrific chart. We presume Gates was fed this information by some analysts but even so, one wishes he wouldn't promote innumeracy. But then, he has a history: Howard Wainer demolished analysis by his foundation used to channel lots of dollars to the "small schools" movement a few years ago; I wrote about that before.

***

First, the offensive chart:

Gates_studentspend

Using double axes earns justified heckles but using two sets of gridlines is a scandal! A scatter plot is the default for this type of data. (See next section for why this particular set of data is not informative anyway.)

I can't understand the choice of scale for the score axis. The orange line, for instance, seems to have a positive slope. In any case, since these scores are "scaled", and the "standard error" is about 1 (this number is surprisingly hard to find, even on Google), it would appear that between 300 and 400 on the score axis, there are 100 units of standard error. By convention, three units of standard error away from the average is considered rare (events). There is no conceivable way that the average score could jump by that much.

***

 The analysis is also flawed. Here's the key paragraph:

Over the last four decades, the per-student cost of running our K-12 schools has more than doubled, while our student achievement has remained flat, and other countries have raced ahead. The same pattern holds for higher education. Spending has climbed, but our percentage of college graduates has dropped compared to other countries... For more than 30 years, spending has risen while performance stayed flat. Now we need to raise performance without spending a lot more.

This argument contains several statistical fallacies:

  • Comparing apples and oranges: a glaring piece of missing information is whether other countries have increased their per-student spending on education, and if so, how fast the growth is compared to that in the U.S. Without this, the analysis makes no sense.
  • Confusing correlation and causation: so spending increased while test scores stagnated.  In order to conclude that there is something wrong with the spending, one must first believe that spending has a causal effect on test scores. Observe that this is not a conclusion from the data; it is an assumption going into the analysis, neither supported nor disputed by the data since the data merely show a (lack of) correlation. This is another instance of "story time": we see data, we see conclusion, we are misled into thinking that data supports conclusion but in fact, the data is an irrelevant distraction. (For other instances of "story time", see this link to my book blog.)
  • Fallacy #1 and fallacy #2 combined: even if you believe that spending affects test scores, it is still a stretch to say that spending in U.S. schools affects the gap in test scores between U.S. students and foreign students. In a world where foreign countries are frozen in time, maybe so; but where foreign countries are investing in education, one can't say anything about the test score gap without first knowing what's going on overseas.
  • Assumption invalidating the analysis: In a short breath, the analyst admits the possibility of (a) spending increase together with flat scores and (b) score increase together with flat spending. One model under which both of those possibilities coexist is one in which test scores are independent of spending. If so, why would one even look at a plot of these two quantities?
  • The dilemma of being together (a la Chapter 3 of Numbers Rule Your World): sorry to say but the spending on pupils is likely to have a highly skewed distribution depending on school district. Also, average test scores are likely to have high variability across school districts. Thus, using an average for the entire country muddies the water.
  • Needless to say, test scores are a poor measure of the quality of education, especially in light of the frequent discovery of large-scale coordinated cheating by principals and teachers driven by perverse incentives of the high-stakes testing movement.

In the same article, Gates asserts that quality of teaching is the greatest decisive factor explaining student achievement. Which study proves that, we are not told. How one can measure such an intangible quantity as "excellent teaching", we are not told. How student achievement is defined, well, you guessed it, we are not told.

It's great that the Gates Foundation supports investment in education. Apparently they need some statistical expertise so that they don't waste more money on unproductive projects based on innumerate analyses.

 


The hazard of casual analysis of hazards

Martin's quick analysis of the Japanese earthquake relative to other earthquakes produced a number of intriguing charts, including the following map which illustrates the "ring of fire" (Wiki link):

Martin-Map-earthquakes

Imagine that one knows nothing about tectonic theory and geography. Such a map would be quite illuminating.

***

The comments section contains an exchange between two Martins about the hazard of non-experts plotting and then commenting on data they know little about.  In the past, this conversation would take place behind closed doors as the statistician works with the geologist, each learning from the other. In our modern age, it is common for both sides to take up arms in public with colorful language.

The lesson for us spectators is the importance of understanding how data is collected, how data is defined, how data is processed, etc.

***

The plots Martin made do give me a chance to talk about a few interesting statistical issues relating to plotting data.

First, this bar chart of the number of earthquakes over time seems to indicate that earthquake activity has been increasing over time:

Martin-bar-earthquake

The chart doesn't have a vertical scale so it is hard to judge whether the growth is meaningful or not. The US Geological Survey doesn't seem to think this is a remarkable event, as the other Martin pointed out. One explanation is the increasing sensitivity of measuring equipment, which leads to more small-magnitude earthquakes being recorded over time.

Whether or not this hypothesis is true in the context of earthquakes, this is a very important phenomenon that occurs often. One of the mysteries in the epidemiology of autism is the recent rise in the number of cases but this is made complicated by the much higher probability of diagnosis in recent times. Similarly, as the stigma against reporting rapes, harassment and other crimes dissipates, more such criminal reports will be filed, and how does one distinguish between higher reporting and higher incidence, both of which lead to higher counts?

This same phenomenon happened during the Toyota brake scare last year. It appeared as if the problem was getting worse while the scandal brewed. But as the awareness of the potential risk increases, so too will the probability of someone reporting an issue.
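The identification problem can be shown with a toy calculation. In this sketch, the recorded count is simply the true number of incidents times the probability that an incident gets reported; both the model and every number in it are my assumptions for illustration.

```python
def recorded_count(true_incidents, p_report):
    """Expected number of recorded cases under a simple
    'incidents times reporting probability' model."""
    return round(true_incidents * p_report)

# Two hypothetical stories, each listed as (before, after).
incidence_doubles = [(1000, 0.2), (2000, 0.2)]  # more incidents, same reporting
reporting_doubles = [(1000, 0.2), (1000, 0.4)]  # same incidents, more reporting

counts_a = [recorded_count(n, p) for n, p in incidence_doubles]
counts_b = [recorded_count(n, p) for n, p in reporting_doubles]

print(counts_a)  # recorded counts when incidence doubles
print(counts_b)  # recorded counts when reporting doubles
```

Both stories produce the same recorded counts, so the counts alone cannot tell them apart; some outside information about reporting behavior is needed.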

One other feature of the bar chart worth noting is the faux plunge in the number of earthquakes in 2011. My recommendation is either to omit the incomplete year, or to forecast the end-of-year count (labelling it clearly as a forecast).
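If one goes the forecasting route, the crudest option is a pro-rata projection. This sketch assumes earthquakes arrive at a constant rate through the year, which is a strong assumption, and the count and cut-off date are hypothetical.

```python
from datetime import date

def forecast_full_year(count_to_date, as_of):
    """Naive pro-rata forecast of the full-year count,
    assuming a constant rate of events through the year."""
    year_start = date(as_of.year, 1, 1)
    next_year_start = date(as_of.year + 1, 1, 1)
    days_elapsed = (as_of - year_start).days + 1  # include the as_of day
    days_in_year = (next_year_start - year_start).days
    return count_to_date * days_in_year / days_elapsed

# Hypothetical: 450 recorded quakes through March 31, 2011.
projected = forecast_full_year(450, date(2011, 3, 31))
print(f"projected 2011 total: {projected:.0f} (forecast, not observed)")
```

Whatever method is used, the projected bar should be styled differently from the observed ones so readers do not mistake it for data.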

Finally, for data relating to rare events like earthquakes, one should try to take as long a view as possible. Twenty years is good but a hundred years would be better.

***

Next, Martin created this histogram that plots the number of earthquakes of different magnitudes over the last two decades or so.

Martin-hist-earthquake

As Martin noted, the Richter scale is a log scale, meaning a 1-unit increase in the horizontal direction is really a 10-fold increase in measured amplitude.

Martin's point is to show how strong the earthquake in Japan was compared to historical earthquakes. This is a useful and interesting question to ask, and his choice of graphics cannot be faulted.

What the other Martin was complaining about is the third corner of our Trifecta checkup: whether the right data has been used. Martin used what he was able to find. The histogram in fact looked quite nice, was rather symmetric, and suggested a tractable mathematical model. But if we proceed along that path, we would have chosen the wrong model.

The trap is a missing data problem that will not be evident to non-experts. As discussed above, small-magnitude quakes are not being tracked. Thus, the left side of the histogram severely under-represents the true frequency of such small tremors.

There is a Gutenberg-Richter law relating earthquake magnitude and frequency. It's a power law. Roughly speaking, for each 1-unit increase in Richter magnitude, earthquakes of that magnitude occur about one-tenth as frequently. This law fits the actual data really well when Richter magnitude is high but at lower magnitudes, geologists believe that our records are severely under-counted.

Earthquake_fit

In the chart shown on the right, which comes from this paper (PDF link) about predicting earthquakes, the dots sit on the line at higher magnitudes but the observed frequencies fall markedly below the line (projection) for magnitude lower than 2. It suggests that the bars on the left side of the histogram above should have been much taller than we are seeing. For a proper power law, the bars on the left should be the tallest of all!
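For readers who want to see the shape the law implies, here is a minimal sketch. The Gutenberg-Richter relation is usually written as log10(N) = a - b*M, where N is the number of quakes at or above magnitude M. I set b = 1 to match the rough tenfold rule, and the value of a is an arbitrary placeholder, not a fitted constant.

```python
def gr_count(magnitude, a=8.0, b=1.0):
    """Expected number of quakes with magnitude >= `magnitude`
    under log10(N) = a - b * M. The default `a` is a placeholder."""
    return 10 ** (a - b * magnitude)

# Expected counts get larger as magnitude gets smaller.
for m in range(2, 9):
    print(f"M >= {m}: {gr_count(m):>12,.0f}")

# With b = 1, adjacent whole magnitudes differ by a factor of ten.
ratio = gr_count(5) / gr_count(6)
print(f"ratio of counts, M>=5 versus M>=6: {ratio:.1f}")
```

Under this model the smallest magnitudes should have by far the largest counts, so a histogram that peaks in the middle points to missing records rather than to a symmetric distribution.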

 

 

 

 


Too much art, not enough science in infographics

Note: I contributed the following post to Statistics Forum, which is a new blog sponsored by the American Statistical Association (ASA), curated by Andrew Gelman. 

A reader from Twitter @meprieb suggested that I discuss a particular set of "infographics", one of which is shown below:

Psfk_Infographic3

This chart is unquestionably easy on the eyes, and engages our brain cells. The use of a real-world object to simulate a pie chart is cute and even ingenious. According to the description at PSFK, the data tell how "Danish people feel about publicly wearing religious symbols".

The key to reading this chart is to read it as an illustration, an art piece. The fact that this is described as "infographics" reveals a wide divide between the artists and the scientists who work in visualizing data. 

This chart fails completely as data graphics. The size of the pie quadrants has no relationship with the data at all, and the four percentages on the chart add up to much more than 100%, so they are obviously not proportions of a whole. The same problem plagues every one of these charts in the set.

Further reading: Andrew Gelman has recently made comments about the divide between the statistical graphics and infographics communities here