Jeff, a reader of the blog, asks for comment on this blog post of his (link).
The highlight of the post is this chart, which shows an uneven distribution.
The message of the chart is that a large share of donations (about 25%) came from the top 3 percent of donors. This is a long-tailed distribution, quite typical of data on financial matters, so the question of how to chart it is a general one that many of us face.
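For readers who want to check a headline figure like "the top 3 percent gave about 25%" against their own data, here is a minimal sketch. The function name `top_share` and the gift amounts are made up for illustration; they are not Jeff's data, only numbers chosen to give a similarly long-tailed shape.

```python
def top_share(amounts, top_fraction):
    """Share of the total contributed by the top `top_fraction` of donors."""
    if not amounts:
        raise ValueError("amounts must be non-empty")
    ranked = sorted(amounts, reverse=True)
    # Number of donors in the top group, rounded to the nearest whole donor,
    # and never fewer than one.
    k = max(1, round(top_fraction * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical gifts: 97 small donors and 3 large ones.
gifts = [10] * 97 + [100, 100, 123]
share = top_share(gifts, 0.03)
print(f"Top 3% of donors gave {share:.1%} of the total")
```

With these invented numbers the top three donors account for 323 of 1,293, or roughly a quarter of the total.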
One of the insights from Jeff's post is that with some tricks, one can generate a chart that looks like the above using Excel. This is pretty impressive, and he credits Peltier for the pointer.
Now, let's see if there are other ways to present this data. One issue I have with the chart is that the most important statistics are found in the text labels. These are of the form: "X% of customers contribute Y% of revenues". So, in effect, there are two relevant data series: the share of people and the share of revenues.
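To make the two series concrete, the sketch below computes, for each segment, the share of people and the share of revenues. The cutoffs, the amounts, and the function name `segment_shares` are all assumptions for illustration, not the segmentation used in Jeff's chart.

```python
from bisect import bisect_left


def segment_shares(amounts, cutoffs):
    """Return one (share of people, share of revenue) pair per segment.

    `cutoffs` are ascending, inclusive upper limits; amounts above the
    last cutoff fall into a final, open-ended segment.
    """
    counts = [0] * (len(cutoffs) + 1)
    totals = [0.0] * (len(cutoffs) + 1)
    for a in amounts:
        # An amount equal to a cutoff lands in the lower segment.
        i = bisect_left(cutoffs, a)
        counts[i] += 1
        totals[i] += a
    n, total = len(amounts), sum(amounts)
    return [(c / n, t / total) for c, t in zip(counts, totals)]


# Hypothetical data: 60 small, 30 medium, and 10 large contributors.
amounts = [5] * 60 + [20] * 30 + [100] * 10
shares = segment_shares(amounts, cutoffs=[10, 50])
for people, revenue in shares:
    print(f"{people:.0%} of customers contribute {revenue:.0%} of revenues")
```

Each output line has the same form as the text labels on the original chart, which is the point: the chart's message lives in these two columns of numbers.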
The following is a stacked column chart:
Here, the information is primarily encoded in the dotted guide lines between the two columns. It has the advantage of showing both the absolute share of people as well as of revenues, plus showing the uneven distribution between the two data series.
But it is also less fun to look at. The advantage of the original chart is that one can imagine that all the donors are being lined up along the horizontal axis from those who gave the least to those who gave the most. That's a pretty powerful mental picture. The weakness of the original is that few of us can mentally tally up the strangely shaped areas to learn the share of revenues.
The next version is a kind of profile chart:
I like this one because it places the two data series on equal footing, and allows for efficient comparison of the two sets of proportions. It also has the feature of showing all the shares, just like the stacked columns.
PS. Jeff has taken some of his readers' comments into account, and has evolved his original design to this one:
I can see these changes:
- Customers are now ordered with the most important on the left and the least important on the right. To me, this is a neutral change.
- The vertical axis is labelled "subscription value" instead of "How much do we get for each subscription". This is a slight improvement, using fewer words to convey the same point.
- The breakpoints have been set differently to split the revenues into five so that each segment now accounts for exactly 20% of the revenues. I actually prefer the original segmentation -- that one visually picks out the breakpoints in the data, thus it is empirical rather than canonical. Look at the split between the gray and the yellow segments in the new chart. Does it make sense to split customers with the same subscription value into two groups?
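The last point can be illustrated with a small sketch. It ranks customers from highest to lowest value, cuts the running revenue total into five equal parts, and then reports any subscription value that ends up in more than one group. The values and the function names `equal_revenue_groups` and `split_values` are hypothetical; this is one plausible way to form equal-revenue segments, not necessarily how the revised chart was built.

```python
def equal_revenue_groups(values, n_groups=5):
    """Assign customers, ranked high to low, to groups of equal revenue."""
    ranked = sorted(values, reverse=True)
    total = sum(ranked)
    groups = [[] for _ in range(n_groups)]
    running = 0
    for v in ranked:
        # The group is decided by the revenue accumulated before this customer.
        g = min(int(running * n_groups // total), n_groups - 1)
        groups[g].append(v)
        running += v
    return groups


def split_values(groups):
    """Subscription values that appear in more than one group."""
    seen = {}
    for i, group in enumerate(groups):
        for v in set(group):
            seen.setdefault(v, set()).add(i)
    return sorted(v for v, idx in seen.items() if len(idx) > 1)


# Hypothetical subscription values, total revenue 500.
values = [100] + [50] * 6 + [25] * 4
groups = equal_revenue_groups(values)
print([sum(g) for g in groups])   # revenue per group
print(split_values(groups))       # values straddling a breakpoint
```

In this made-up example every group holds exactly one fifth of the revenue, but the six customers paying 50 are spread over three different groups, which is the kind of split the question above is about.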