## Various ways of showing distributions

##### Aug 04, 2016

The other day, a chart about the age distribution of Olympic athletes caught my attention. I found the chart on Google but didn't bookmark it, and now I can't find it again. In my mind's eye, the chart looks like this:

This chart has the form of a stacked bar chart but it really isn't. The data embedded in each bar segment aren't proportions; rather, they are counts of athletes along a standardized age scale. For example, the very long bar segment on the right side of the bar for alpine skiing does not indicate a large proportion of athletes in that 30-50 age group; it's the opposite: that part of the distribution is sparse, with an outlier at age 50.

The easiest way to understand this chart is to transform it into histograms.

In a histogram, the counts for different age groups are encoded in the heights of the columns. Instead, encode the counts in a color scale so that taller columns map to darker shades of blue. Then, collapse the columns to the same height. Each stacked bar is really a collapsed histogram.
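The collapsing step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ages are made up, the bins are equal-width five-year groups, and the helper names (`bin_counts`, `counts_to_shades`, `collapsed_bar`) are hypothetical, not taken from the original chart. Text characters stand in for the shades of blue.

```python
def bin_counts(ages, edges):
    """Count ages falling in each half-open bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for age in ages:
        for i in range(len(edges) - 1):
            if edges[i] <= age < edges[i + 1]:
                counts[i] += 1
                break
    return counts


def counts_to_shades(counts, n_shades=5):
    """Map each count to a shade index: 0 is lightest, n_shades - 1 is darkest."""
    peak = max(counts)
    if peak == 0:
        return [0] * len(counts)
    return [round(c / peak * (n_shades - 1)) for c in counts]


# Characters standing in for a light-to-dark color scale (an assumption).
SHADE_CHARS = " ░▒▓█"


def collapsed_bar(ages, edges):
    """Render one sport as a single row: equal-height cells, count shown as shade."""
    shades = counts_to_shades(bin_counts(ages, edges), len(SHADE_CHARS))
    return "".join(SHADE_CHARS[s] for s in shades)


# Made-up ages for one sport, for illustration only.
ages = [18, 21, 22, 23, 23, 24, 24, 24, 25, 25, 26, 27, 29, 31, 50]
edges = list(range(15, 60, 5))  # bins: 15-19, 20-24, ..., 50-54

print(bin_counts(ages, edges))
print(collapsed_bar(ages, edges))
```

The heights of the histogram columns are thrown away on purpose; all that survives is the shade of each cell, which is why the sparse stretch up to the lone 50-year-old shows as a long pale run.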

***

The stacked bar chart reminds me of the boxplot, a chart form much loved by statisticians.

In a boxplot, the box contains the middle 50% of the athletes in each sport (this directly maps to the dark blue bar segments from the chart above). Outlier values are plotted individually, which gives a bit more information about the sparsity of certain bar segments, such as the right side of alpine skiing.
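For readers who want to see the numbers behind a boxplot, here is a minimal sketch using Python's standard `statistics` module. The ages are made up, the function name `boxplot_summary` is hypothetical, and the 1.5 × IQR cutoff for outliers is the common Tukey convention, which is an assumption here since the post does not say which rule the plots use.

```python
import statistics


def boxplot_summary(ages, whisker=1.5):
    """Return the box edges, the median, and the points drawn as outliers."""
    # The box spans the first to third quartile: the middle 50% of the data.
    q1, median, q3 = statistics.quantiles(ages, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: points beyond whisker * IQR from the box are outliers.
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    outliers = sorted(a for a in ages if a < low_fence or a > high_fence)
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


# Made-up ages for one sport, for illustration only.
ages = [18, 21, 22, 23, 23, 24, 24, 24, 25, 25, 26, 27, 29, 31, 50]

summary = boxplot_summary(ages)
print(summary)
```

With these made-up ages, the single 50-year-old falls outside the upper fence and would be plotted as an individual point, much like the outlier in alpine skiing.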

The stacked bar chart can be considered a nicer-looking version of the boxplot.


This might be the original:
http://www.washingtonpost.com/wp-srv/special/sports/profiles-in-speed/age/sports-by-age.html

And then there's the violin plot: https://en.wikipedia.org/wiki/Violin_plot

Personally, I like the boxplots best. I think they're the easiest to understand, the fastest to take in at a glance, and the most informative for deeper review.

For a technical audience, this is a very familiar presentation that needs little explanation.

For a non-technical audience, a single explanatory set of labels would be appropriate, as was done on the top chart that spawned the discussion. A short explanation of how to interpret boxplots, like the one above, would also be good practice.

Great post! Each chart has its own merits and limitations.
Maybe it's a good idea to use all of them together if space permits.

I tried this combination myself, overlaying a boxplot on a histogram:
http://vizdiff.blogspot.com/2015/11/overlaying-histogram-with-box-and.html
