

Sean Devine

Great post. I sat through a presentation from Booz Allen last week on the globalization of the service sector of the world economy that had a number of charts that I'd love to see you comment on. Should I send you the charts?

Zuil Serip

Could the scale be "# of standard deviations away from the mean of the international reference"? I agree that whatever it is, it should be indicated explicitly.

While I also agree that a boxplot (or one of its many variants) can convey the same information more economically and precisely, you really have to take the intended audience into account.

I would guess that a substantial number of intelligent and otherwise well-educated readers are not familiar with the conventions of a boxplot. And even if they were, people don't have well-formed intuitions about the relative magnitude of standard deviations unless they work with such concepts day to day. I believe non-statisticians would understand and compare the distributions far more effectively by looking at the actual curves. (I do agree that bell curves are clichés, but they can be very useful in helping non-specialists understand distributions.)

John S.

A couple of weeks ago I was making a presentation and wanted to show the effect of a certain rule change on the price of a commodity. The before and after data were both normal, and I made a box plot, including the whiskers and the outliers. It was a beautiful plot, and clearly showed how this rule change had not only lowered prices, but reduced the variance as well. Nevertheless, I was told to take it out of the presentation because "our clients would not understand it".

Patrick O'Shei

While you make some good comments about the misuse and abuse of bell curves, you are wrong when you say the heights do not matter.

Bell curves represent probability through the area under the curve. To compare areas you need height.
The markings on the x-axis are std. dev. units for the INTL REF.

If you want to compare the probability of an Indian child being underweight to the international reference, you would compare the area under the blue (INTL) line to the comparable area under the red (INDIA) line. For SEVERE underweight, you can see that the area under the blue line in this region (left of -3.0) is minuscule, while the area under the red line is about 40x as large. An Indian child is about 40x more likely to be severely underweight than the INTL reference.

I would not expect the average person to be able to properly interpret the curves and that is a problem.
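
Patrick's area comparison can be checked numerically. The following is a minimal sketch, assuming both curves are normal and the x-axis is in standard deviation units of the international reference, so the reference is a standard normal. The mean and SD used for the India curve are hypothetical, picked only so the ratio lands near the 40x quoted above; they are not read off the chart.

```python
from statistics import NormalDist

# International reference in z-score units: mean 0, SD 1.
reference = NormalDist(mu=0.0, sigma=1.0)

# Hypothetical India curve. These parameters are illustrative assumptions,
# not values taken from the chart under discussion.
india = NormalDist(mu=-1.4, sigma=1.0)

cutoff = -3.0  # severe underweight: more than 3 reference SDs below the mean


def tail_ratio(dist, ref, threshold):
    """How many times more probability dist puts below threshold than ref does."""
    return dist.cdf(threshold) / ref.cdf(threshold)


p_ref = reference.cdf(cutoff)    # area under the reference curve left of -3.0
p_india = india.cdf(cutoff)      # area under the hypothetical India curve
ratio = tail_ratio(india, reference, cutoff)

print(f"reference tail area:          {p_ref:.5f}")
print(f"hypothetical India tail area: {p_india:.5f}")
print(f"ratio:                        {ratio:.1f}x")
```

The reference tail area is roughly 0.00135, about one child in 740, which is why the blue tail is nearly invisible on the chart while a ratio of this size is hard to judge by eye.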


Patrick, that is precisely my point: it is impossible for even trained people to visually compare areas under two curves with different widths and heights.

In addition to my original comments, another reason why curves do not solve the problem is that these curves stretch to infinity on both tails.

The reasons why a boxplot suffices are that (1) we are assuming normal distributions thereby fixing the areas under the curve as a function of standard deviation from the mean; and (2) we are assuming a statistically literate audience.

This also explains why John had trouble using the boxplot with a lay audience. I'd try to annotate the chart by pointing out that the middle 50% of the data is contained in the box (assuming the sides are the 25th and 75th percentiles).
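
The annotation suggested above can be backed by numbers. The following is a sketch, assuming normally distributed data and the common Tukey convention of whiskers capped at 1.5 times the interquartile range; the thread does not say which whisker rule John's plot used.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: positions are in SD units from the mean

# Box edges: 25th and 75th percentiles.
q1 = z.inv_cdf(0.25)
q3 = z.inv_cdf(0.75)
iqr = q3 - q1

# Whisker limits under the assumed 1.5 * IQR rule.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Share of the data inside the box and inside the whisker limits.
inside_box = z.cdf(q3) - z.cdf(q1)
inside_fences = z.cdf(upper_fence) - z.cdf(lower_fence)

print(f"box edges:      {q1:+.3f} SD to {q3:+.3f} SD")
print(f"whisker limits: {lower_fence:+.3f} SD to {upper_fence:+.3f} SD")
print(f"inside box:     {inside_box:.1%}")
print(f"inside fences:  {inside_fences:.1%}")
```

Under these assumptions the box spans about ±0.67 SD and the whisker limits sit near ±2.7 SD, so a one-line caption such as "half the data falls inside the box; points beyond the whiskers are rare" is enough for a lay reader.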

Why on earth would you assume a statistically literate audience? Presumably the authors of this report on malnutrition wanted to reach a broad audience, not the 0.2% of the population who can read a boxplot.


Anon - the glib response is: most college-educated people, whether their degree was in engineering, economics, or psychology, would have taken at least one statistics course, which qualifies them as statistically literate.

A more serious response is: as I outlined, for the uninitiated, a boxplot with some text explaining how to read it is sufficient.

I also do not believe overlapping probability curves can be properly interpreted by statistical "illiterates" either.

