
Various ways of showing distributions

The other day, a chart about the age distribution of Olympic athletes caught my attention. I found the chart on Google but didn't bookmark it, and now I can't find it again. As best I can recall, the chart looks like this:

Age_olympics_stackedbars

This chart has the form of a stacked bar chart, but it really isn't one. The data embedded in each bar segment aren't proportions; rather, they are counts of athletes along a standardized age scale. For example, the very long bar segment on the right side of the bar for alpine skiing does not indicate a large proportion of athletes in the 30-50 age group. It's the opposite: that part of the distribution is sparse, with an outlier at age 50.

The easiest way to understand this chart is to transform it into histograms.

Redo_age_olympics_histo2

In a histogram, the counts for different age groups are encoded in the heights of the columns. Now encode the counts in a color scale instead, so that taller columns map to darker shades of blue. Then collapse the columns to the same height. Each stacked bar in the original chart is really a collapsed histogram.
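The collapsing step can be sketched in a few lines of Python. This is only an illustration under assumptions: the ages are made up (not the Olympic data), the 5-year bins and the 15-55 range are arbitrary choices, and text characters stand in for shades of blue.

```python
def collapsed_histogram(ages, bin_width=5, lo=15, hi=55, shades=" .:*#"):
    """Bin the ages, then encode each bin's count as a shade, not a height.

    `shades` runs from lightest to darkest; the characters are a
    stand-in for a color scale.
    """
    n_bins = (hi - lo) // bin_width
    counts = [0] * n_bins
    for age in ages:
        if lo <= age < hi:
            counts[(age - lo) // bin_width] += 1
    peak = max(counts) or 1  # avoid dividing by zero when no age is in range
    # Larger count -> higher index -> darker shade.
    levels = [round(c / peak * (len(shades) - 1)) for c in counts]
    strip = "".join(shades[k] for k in levels)
    return counts, strip


# Made-up ages for illustration only.
ages = [19, 21, 22, 23, 23, 24, 24, 25, 26, 27, 28, 31, 50]
counts, strip = collapsed_histogram(ages)
print(counts)         # histogram: one count per 5-year bin
print(f"[{strip}]")   # collapsed histogram: every bin has the same height
```

The long, light stretch on the right of the strip corresponds to the sparse tail with its lone 50-year-old, which is how the alpine skiing bar should be read.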

***

The stacked bar chart reminds me of the boxplot, a chart form much loved by statisticians.

Redo_age_olympics_boxplot2b

In a boxplot, the box contains the middle 50% of the athletes in each sport (this maps directly to the dark blue bar segments in the chart above). Outlier values are plotted individually, which gives a bit more information about the sparsity of certain bar segments, such as the right side of alpine skiing.
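For readers who want to see what goes into the box, here is a rough sketch of the numbers behind a boxplot. It rests on assumptions: the ages are made up, the quartiles use the "inclusive" method of Python's `statistics.quantiles`, and outliers are flagged by the common 1.5 × IQR convention, which may not be the rule used for the chart above.

```python
import statistics


def boxplot_summary(ages, whisker=1.5):
    """Return the quartiles and the points a boxplot would draw individually."""
    q1, median, q3 = statistics.quantiles(ages, n=4, method="inclusive")
    iqr = q3 - q1  # width of the box, which holds the middle 50%
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    # Anything beyond the fences is treated as an outlier (assumed convention).
    outliers = [a for a in ages if a < low_fence or a > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


# Made-up ages for illustration only.
ages = [19, 21, 22, 23, 23, 24, 24, 25, 26, 27, 28, 31, 50]
summary = boxplot_summary(ages)
print(summary)
```

With these sample ages, the box spans 23 to 27 and the single 50-year-old is reported as an outlier, much like the alpine skiing case.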

The stacked bar chart can be considered a nicer-looking version of the boxplot.

 

 

Comments


Evan

This might be the original:
http://www.washingtonpost.com/wp-srv/special/sports/profiles-in-speed/age/sports-by-age.html

Chris Pudney

And then there's the violin plot: https://en.wikipedia.org/wiki/Violin_plot

Doug Dame

Personally, I like the boxplots best. I think they're the easiest to understand, the fastest to ingest at a glance, and the most informative for deeper review.

For a technical audience, this is a very familiar presentation that needs little explanation.

For a non-technical audience, a single explanatory set of labels would be appropriate, as was done on the top chart that spawned the discussion. A short discussion of how to interpret boxplots, as was done above, would also be good practice.

Alexander Mou

Great post! Each chart has its own merits and limitations.
Maybe it's a good idea to use all of them together if space permits.

I made a chart that combines the two by overlaying a boxplot on a histogram:
http://vizdiff.blogspot.com/2015/11/overlaying-histogram-with-box-and.html
