The other day, a chart about the age distribution of Olympic athletes caught my attention. I found the chart on Google but didn't bookmark it and now I couldn't retrieve it. From my mind's eye, the chart looks like this:
This chart has the form of a stacked bar chart but it really isn't. The data embedded in each bar segment aren't proportions; rather, they are counts of athletes along a standardized age scale. For example, the very long bar segment on the right side of the bar for alpine skiing does not indicate a large proportion of athletes in that 30-50 age group; it's the opposite: that part of the distribution is sparse, with an outlier at age 50.
The easiest way to understand this chart is to transform it to histograms.
In a histogram, the counts for different age groups are encoded in the heights of the columns. Instead, encode the counts in a color scale so that taller columns map to darker shades of blue. Then, collapse the columns to the same heights. Each stacked bar chart is really a collapsed histogram.
The stacked bar chart reminds me of boxplots that are loved by statisticians.
In a boxplot, the box contains the middle 50% of the athletes in each sport (this directly maps to the dark blue bar segments from the chart above). Outlier values are plotted individually, which gives a bit more information about the sparsity of certain bar segments, such as the right side of alpine skiing.
The stacked bar chart can be considered a nicer-looking version of the boxplot.