Visualizing the 80/20 rule, with the bar-density plot
Mar 25, 2019
Through Twitter, Danny H. submitted the following chart, which shows that a tiny 0.3 percent of YouTube creators generate almost 40 percent of all viewing on the platform. He asks for ideas about how to present lopsided data that follow the "80/20" rule.
In the classic 80/20 rule, 20 percent of the units account for 80 percent of the data. The exact percentages vary, as long as the first number is small relative to the second. In the YouTube example, 0.3 percent of creators account for almost 40 percent of views. The underlying reason for such lopsidedness is the differential importance of the units: the top units are much more important than the bottom units, as measured by their contribution to the data.
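To make the comparison concrete, here is a minimal Python sketch (not from the original post) that computes the share of all views produced by the top fraction of creators. The function name `top_share` and the ten-creator dataset are hypothetical, chosen so that the numbers land exactly on 80/20.

```python
def top_share(views, top_fraction):
    """Share of total views produced by the top `top_fraction` of creators.

    `views` is a list of per-creator view counts (hypothetical input);
    `top_fraction` is between 0 and 1, e.g. 0.2 for the classic 80/20 check.
    """
    ranked = sorted(views, reverse=True)           # most-viewed creators first
    k = max(1, round(top_fraction * len(ranked)))  # count of "top" creators, at least one
    return sum(ranked[:k]) / sum(ranked)

# Hypothetical data: 10 creators, two of whom dominate.
views = [500, 300, 40, 30, 30, 30, 20, 20, 20, 10]
print(top_share(views, 0.2))  # top 20% of creators produce 0.8 of all views
```

The same function answers the YouTube-style question: pass a tiny `top_fraction` such as 0.003 and read off the share of views.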
I sense a bit of "loss aversion" in this chart (explained here). The designer color-coded the views data into blue, brown and gray but didn't have it in him/her to throw out the sub-categories, which slow down cognition and add little to our understanding.
I like the chart title, which explains what the chart is about.
Turning to the D (Data) corner of the Trifecta Checkup for a moment, I suspect that this chart only counts videos that have at least one play. (Zero-play videos do not show up in a play log.) For a site like YouTube, a large proportion of uploaded videos have no views, and thus many creators also have no views.
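If those zero-view creators were counted, the creator percentages would shift while the view percentages would not. A minimal Python sketch with hypothetical group sizes and view shares (not the chart's data) illustrates this; the function name `shares` and the `zero_view_creators` parameter are assumptions for illustration.

```python
def shares(group_sizes, group_views, zero_view_creators=0):
    """Percent of creators and percent of views in each group.

    `zero_view_creators` (hypothetical) join the creator total only; they add
    no views, so only the creator percentages move.
    """
    total_creators = sum(group_sizes) + zero_view_creators
    total_views = sum(group_views)
    creator_pct = [100 * n / total_creators for n in group_sizes]
    view_pct = [100 * v / total_views for v in group_views]
    return creator_pct, view_pct

# Hypothetical groups (top, middle, bottom): 1,000 creators with at least one view.
sizes, views = [3, 27, 970], [40, 30, 30]
print(shares(sizes, views))                           # top group: 0.3% of creators
print(shares(sizes, views, zero_view_creators=1000))  # top group: 0.15% of creators
```

In this toy example, doubling the creator count with zero-view creators halves the top group's share of creators, while its share of views stays at 40 percent.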
***
My initial reaction on Twitter was to use a mirrored bar chart, like this:
I ended up spending quite a bit of time exploring other concepts. In particular, I wanted to find an integrated way to present this information. Most charts, such as the mirrored bar chart, the Bumps chart (slopegraph), and the Lorenz curve, keep the two series of percentages separate.
Also, the biggest bar (the gray bar showing 97% of all creators) highlights the least important YouTubers, while the top creators ("super-creators") are cramped inside a sliver of a bar, which is invisible in the original chart.
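For reference, a Lorenz curve pairs the cumulative share of creators (horizontal axis) with the cumulative share of views (vertical axis), so each series gets its own axis. The minimal Python sketch below computes those coordinates; the function name and the four-creator input are hypothetical.

```python
def lorenz_points(views):
    """Lorenz curve coordinates: (cumulative share of creators, cumulative share of views).

    Creators are sorted from fewest to most views, the usual Lorenz convention,
    so a concentrated distribution sags below the 45-degree line.
    """
    ranked = sorted(views)
    total, n = sum(ranked), len(ranked)
    points, running = [(0.0, 0.0)], 0
    for i, v in enumerate(ranked, start=1):
        running += v
        points.append((i / n, running / total))
    return points

# Hypothetical: four creators, one of whom dominates.
print(lorenz_points([5, 5, 10, 80]))  # the last 25% of creators account for 80% of views
```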
What I came up with is a bar-density plot, where I use density to encode the importance of creators, and bar lengths to encode the distribution of views.
Each bar is divided into pieces, with the number of pieces proportional to the number of creators in each segment. This has the happy result that the super-creators are represented by large (red) pieces, while the least important creators are represented by little (gray) pieces.
The embedded tessellation shows the structure of the data: the bottom third of the views is generated by a huge number of creators, each producing a few views, resulting in a high density. The top 38% of the views correspond to a small number of super-creators, appropriately shown by a bar of low density.
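The encoding can be summarized numerically. In this minimal Python sketch (an illustration, not the R code mentioned below), each segment is a pair of hypothetical shares (creators, views): bar length follows the view share, the piece count follows the creator share, and density is pieces per unit of bar length. The function name and the `total_pieces` budget, which controls how finely the bars are cut, are assumptions.

```python
def bar_density(segments, total_pieces=1000):
    """Return (bar length, piece count, density) for each segment.

    `segments` holds (creator_share, view_share) pairs with shares in (0, 1].
    Bar length encodes the share of views; the number of pieces is proportional
    to the share of creators (at least one piece); density = pieces / length.
    """
    layout = []
    for creator_share, view_share in segments:
        pieces = max(1, round(creator_share * total_pieces))
        layout.append((view_share, pieces, pieces / view_share))
    return layout

# Hypothetical lopsided data: a tiny top group produces 38% of views.
skewed = [(0.003, 0.38), (0.027, 0.29), (0.970, 0.33)]
# Hypothetical uniform data: each third of creators produces a third of views.
uniform = [(1 / 3, 1 / 3)] * 3

for length, pieces, density in bar_density(skewed):
    print(f"length={length:.2f} pieces={pieces} density={density:.0f}")
```

With the `uniform` input, every bar gets the same number of pieces and the same density, which is the situation shown in the uniform chart further down.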
For those interested in technicalities, I embed a Voronoi diagram inside each bar, with randomly placed points. (There will be a companion post later this week with some more details, and R code.)
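As a rough stand-in for that step, here is a minimal pixel-based sketch in Python, not the author's R code: it scatters random sites in a bar and assigns each pixel to its nearest site, which yields a discrete Voronoi tessellation. The function name, grid size, and seed are hypothetical. For exact polygon boundaries, a computational-geometry routine such as `scipy.spatial.Voronoi` would be the usual choice.

```python
import random

def voronoi_bar(width, height, n_sites, seed=0):
    """Discrete Voronoi tessellation of a width x height bar of pixels.

    Places `n_sites` uniformly random sites in the bar, then labels each pixel
    centre with the index of its nearest site (squared Euclidean distance).
    Returns (sites, labels), where labels[row][col] is a site index.
    """
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    sites = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n_sites)]
    labels = []
    for row in range(height):
        cy = row + 0.5  # pixel-centre y coordinate
        labels.append([
            min(range(n_sites),
                key=lambda i: (sites[i][0] - (col + 0.5)) ** 2 + (sites[i][1] - cy) ** 2)
            for col in range(width)
        ])
    return sites, labels

# More sites means smaller cells, i.e. a denser-looking bar.
sites, labels = voronoi_bar(width=40, height=6, n_sites=5)
for row in labels:
    print("".join(chr(ord("A") + i) for i in row))
```

Passing a piece count from the bar-density encoding as `n_sites` would tie the cell size to the number of creators in each segment.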
Here is what the bar-density plot looks like when the distribution is essentially uniform:
The density inside each bar is roughly the same, indicating that the creators are roughly equally important.
P.S.
1) The next post on the bar-density plot, with some experimental R code, will be available here.
2) Check out my first LinkedIn "article" on this topic.
It seems like first dividing the dataset so that each set of creators produces the exact same amount of content would help. So it might be 0.27%, 4% and ~96% of creators, each producing 1/3 of content.
I could even see using a 1/3 1/3 1/3 pie chart. Normally I wouldn't, but in this case you're showing that the numbers are the same and add up to the whole, so the usual issues with pie charts might not apply. And people love pie charts despite their inefficacy for some purposes.
You could stick with the tessellation, and actually say in the figure how many creators each polygon represents. If you had that tessellation reflect the actual data, it might grade from larger polygons representing the largest contributors of each category to smaller polygons. That grading would be easiest with a bar not a pie...
I feel like I'd like to know how many minor users upload the same number of videos as a single super-creator. Conceptually something like 1/3:1/3:1/3 = 1:37:462
Posted by: Bretwood Higman | Mar 25, 2019 at 03:53 PM
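Bretwood's equal-thirds idea can be sketched in a few lines of Python. This is an illustrative sketch with hypothetical data, not code from either post: creators are ranked by views and assigned to a group according to the cumulative views of the creators ranked above them, so each group produces roughly a third of all views and the group sizes reveal the lopsidedness. Dividing each count by the smallest one gives a ratio like the one Bretwood proposes.

```python
def equal_view_groups(views, n_groups=3):
    """Count creators in groups that each produce about 1/n_groups of all views.

    Creators are ranked from most to fewest views; each goes into the group
    determined by the cumulative views of the creators ranked above it.
    Expects a non-empty list of integer view counts with a positive total.
    """
    ranked = sorted(views, reverse=True)
    total = sum(ranked)
    counts = [0] * n_groups
    running = 0
    for v in ranked:
        group = min(n_groups * running // total, n_groups - 1)
        counts[group] += 1
        running += v
    return counts

# Hypothetical data: 9 creators, 300 views in total (100 per third).
counts = equal_view_groups([100, 60, 40, 30, 30, 20, 10, 5, 5])
print(counts)                             # [1, 2, 6]
print([c / sum(counts) for c in counts])  # share of creators per third
print([c / counts[0] for c in counts])    # ratio 1 : 2 : 6
```

With real data the thirds are only approximate, because a single creator cannot be split across two groups.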
BH: There will be more examples in my next post, including the "pie-density" plot, which gets at some of the technical issues (and open questions).
Posted by: Kaiser | Mar 25, 2019 at 04:35 PM