Seth on bar charts
Jul 18, 2008
Seth followed up his post about graphics with a specific post about pie charts versus bar charts. He prefers pie charts. We happen to share his unhappiness with grouped bar charts. Unfortunately, he compared a univariate pie chart (depicting point-in-time data) with a multivariate bar chart (illustrating time-series data).
Here we present a different example, derived from a NYT article on diabetes in America. The original chart is a series of pie charts, one for each age group, and one for the aggregate data.
The junkart version uses a bar chart. Readers can get a more precise comparison of the prevalence rates across age groups because it is easier to judge lengths than areas. This has been scientifically proven by the likes of Cleveland.
Dirty trick, you might say, because the original chart actually prints the data in each pie.
So now there is no mistaking the data. This raises a philosophical question: why bother graphing the data if the reader needs to read the data in order to understand the chart? We call this the self-sufficiency test. The graphical elements of a pie chart can't stand on their own.
Reference: "Diabetes - underrated, insidious, and deadly", New York Times, July 18, 2008.
I tend to dislike a bar chart that has data by category (e.g., age groupings) and a final bar for All Groups. It makes your bar chart look like there's a peak age group for the prevalence of diabetes, and after that peak, prevalence decreases.
There are two ways to get around this illusion:
1. Color the All Groups bar differently.
2. Use a line that crosses all categories to show the All Groups value.
Posted by: Jon Peltier | Jul 18, 2008 at 08:26 AM
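Jon Peltier's second suggestion above (a line that crosses all categories) can be sketched in a few lines of plain Python. This is only an illustration: the age groups, the rates, and the name `render` are hypothetical placeholders, not the figures from the NYT chart. The All Groups value is drawn as a vertical reference mark through every bar instead of as one more bar, so it cannot be read as the next age group.

```python
# Hypothetical prevalence rates in percent; placeholders, NOT the NYT figures.
rates = {"20-39": 3.0, "40-59": 10.0, "60+": 22.0}
overall = 9.0  # hypothetical "All Groups" value


def render(rates, overall, width=40):
    """Return text lines: one bar per category, with the overall value
    marked by a '|' in the same column on every row."""
    top = max(max(rates.values()), overall)
    scale = width / top
    ref = round(overall * scale)  # column of the reference line
    label_w = max(len(label) for label in rates)
    lines = []
    for label, value in rates.items():
        n = round(value * scale)
        cells = ["#"] * n + [" "] * (width + 1 - n)
        cells[ref] = "|"  # reference line crosses every bar
        bar = "".join(cells).rstrip()
        lines.append(f"{label:<{label_w}} {bar} {value:g}%")
    # Caption row pointing at the reference line.
    lines.append(f"{'':<{label_w}} {' ' * ref}^ all groups: {overall:g}%")
    return lines


for line in render(rates, overall):
    print(line)
```

The same idea carries over to any charting tool: keep the aggregate out of the category axis and show it as a reference line or a differently colored, separated bar.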
My concern is that a bar chart like this could confuse viewers into thinking that the percent within each category is actually the percent of the total.
It would be nice to show the category rates per 1,000, along with total category population to get a sense of the total distribution.
Posted by: Robert | Jul 18, 2008 at 01:22 PM
This case is so simple the data should be in a table. What are you trying to show with the diagram? That 25 is greater than 8? Simple text is best for that.
Seth's complaints reiterate what we all agree on: show the real story that the data supports, explicitly and directly. Unfortunately this is difficult to do, especially with data that is not very interesting. The bar chart is probably the easiest and most abused way of presenting data (see sales presentations).
Good luck teaching the world about charting. We all need some help.
Posted by: prices | Jul 18, 2008 at 01:50 PM
About Seth's second post: he could have "crammed in" the timeline of data by showing a series of 3 fat stacked bar charts, 1 for each year. Trolls would still jump out as the obvious category to go for, but the interested reader could also spot the advance of the billy goats in the last year.
Posted by: Jan Schultink | Jul 22, 2008 at 01:32 PM
So your solution is to replace a long, thin, rectangular chart with a square chart that takes up three times as much space as the original? I fail to see how that's a design solution, or even an improvement.
Still, it is a good example of a recurring problem on Junk Charts: the redesigns never take into account the size or dimensions of the original chart. If you're going to bother redesigning a chart, why not redesign it so that it could actually replace the original?
Bar charts are generally better than pie charts? Yes, I would tend to agree. The Excel-style bar chart shown above is better than these pie charts? No, that's laughable. The bar chart shown here is just plain ugly.
Graphical excellence is more than just the type of chart you choose. This bar chart may be "scientifically proven," but that doesn't make it any good.
Posted by: J | Jul 24, 2008 at 10:09 PM
This particular instance of a series of pie charts is indeed a very poor use of the chart type.
However, the orientation of the bar chart is poorly chosen: the percentage of prevalence would go better on the y axis, with the age groups along the x axis. And the total prevalence should either be excluded or physically separated from the incremental data.
Posted by: Michael D. Houst | Aug 01, 2008 at 08:46 AM