

Peter H

Creative solution, though I think many readers would have trouble interpreting the graph at first glance.

For example, the second pie chart ("if the distribution were more even...") really helps clarify how to read the first. That raises the question: if the second pie chart is necessary, is the first really clear by itself?

I think I like the mirrored bar charts better; they are more easily understood. Still, neither solution helps me quickly get a "feel" for the data. I think this is just a classically hard relationship to convey graphically.

Peter H

Perhaps a cumulative density plot is the best way to show this?

Something like: http://support.sas.com/documentation/cdl/en/procstat/66703/HTML/default/images/ex35out.png


PH: That's a "Lorenz" curve, which I address in the LinkedIn article as one of the more popular options. It's another option in which the large group of least important people gets the most attention, so I'm not too happy with it either. I liked it until I realized how hard it is to explain to non-technical people.
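
For readers who have not met the Lorenz curve, here is a minimal sketch in Python of how its points are computed. The view counts are made up for illustration, and the function name `lorenz_points` is an assumption, not something from the original post.

```python
def lorenz_points(values):
    """Return (cumulative share of units, cumulative share of total) pairs."""
    ordered = sorted(values)  # smallest contributors first
    total = sum(ordered)
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0
    for i, v in enumerate(ordered, start=1):
        running += v
        points.append((i / n, running / total))
    return points


# Hypothetical view counts for ten creators (illustrative, not real data)
views = [1, 1, 2, 2, 3, 4, 5, 7, 25, 50]
curve = lorenz_points(views)
```

In this toy example the smallest 80% of creators account for 25% of the views, so most of the curve's horizontal extent is spent on the least important group.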

Jamie Briggs

I think this is very interesting, conceptually.

Looking at the first bar, I find it very difficult to think anything other than "one person made a *lot* of content, and five other people made quite a bit too".

I am really struggling to imagine how this can move from academic exercise to useful real-world visualization in a way that's more effective than the simpler methods explored.

Jon Peltier

I had the same impression as Jamie. How about if you cheat with the Voronoi graphic and space the points more uniformly before drawing the cell boundaries?


JB & JP: Thanks for the valuable comments.

Let me provide some more behind-the-scenes thinking that didn't make it to the blog post.

1) The fact that you are drawn to the large creators tells me that the chart is succeeding in its key objective. The problem with all of the common ways of visualizing 80/20 data is that they fail to bring this message out. Instead, those charts say "look, here is a very small group of creators" and "look, in aggregate, this group makes a lot of content". In all the real-world cases in which I or my analysts are presenting 80/20 data, we want to draw attention to the big key accounts.

2) The common visualizations are not "simpler." They appear simpler to those of us who have learned how to read the Lorenz curve or the stacked bar charts. On the occasions when I had to explain those to someone who does not have the background, it was clear that they are not easy to understand. Part of the reason, I suspect, is that those charts do not visually tell the full story - the reader has to infer in his/her head that there are a few really big creators because (a) there are a small number of them and (b) in aggregate, they make up a big chunk of views.

3) It initially bothered me that a reader may be tempted to read the size of each piece of the tessellation literally, even though encoding size into density is a deliberate decision to visualize with less precision, much like encoding anything into color gradation. But I believe it is better than the alternatives.

JP suggested dividing each bar/sector into equal areas. This change does not solve the problem if the reader insists on interpreting the size of each piece of the tessellation. In fact, in real datasets, it is quite likely that the sizes are unevenly distributed with a heavy skew, especially when the number of pieces is small. So the real critique is that the relative sizes of the pieces do not reflect the true relative sizes of the individual creators.
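
To make the comparison concrete, here is a rough Python sketch that approximates Voronoi cell areas on a grid, once for randomly placed seeds and once for evenly spaced ones. Everything in it is an assumption for illustration: the grid approximation, the six seeds, the unit-square "bar", and the function name `cell_area_shares`. It is not how the original chart was drawn.

```python
import random


def cell_area_shares(seeds, res=100):
    """Approximate each seed's Voronoi cell area as a share of the unit square.

    Each grid point is assigned to its nearest seed (a rasterized Voronoi diagram).
    """
    counts = [0] * len(seeds)
    for i in range(res):
        for j in range(res):
            x, y = (i + 0.5) / res, (j + 0.5) / res
            nearest = min(
                range(len(seeds)),
                key=lambda k: (seeds[k][0] - x) ** 2 + (seeds[k][1] - y) ** 2,
            )
            counts[nearest] += 1
    return [c / (res * res) for c in counts]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
random_seeds = [(rng.random(), rng.random()) for _ in range(6)]
# Six seeds on a regular 2-by-3 grid (the "cheat" of spacing points uniformly)
even_seeds = [(x, y) for x in (0.25, 0.75) for y in (1 / 6, 1 / 2, 5 / 6)]

random_shares = cell_area_shares(random_seeds)
even_shares = cell_area_shares(even_seeds)
```

Evenly spaced seeds give cells of nearly equal area, while random seeds give visibly unequal ones. Neither set of areas is tied to the actual sizes of individual creators, which is the critique that remains.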

4) This last question was mentioned briefly in the original post: the designer can choose to make the tessellation pieces reflect the exact relative sizes of individual creators. I believe the extra work is not merited. The appearance of the graphic will not change substantially. Further, when the total number of units is large, as in the case of YouTube, the added precision is destructive.

When the number of units is large, smoothing is a common solution. (This is the extra work.) You'd produce a smoothed version of the views distribution within each segment. Then divide each bar/sector into a specific number of pieces. Next, you have to solve the problem of creating a tessellation that fits the pieces into the bar/sector.
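
The first two steps could be sketched in Python as follows. This uses simple binning as a crude stand-in for smoothing, which is only one of many possible choices; the function name `smoothed_piece_shares` and the view counts are hypothetical, and the sketch leaves the tessellation step aside entirely.

```python
def smoothed_piece_shares(views, n_pieces):
    """Bin a segment's creators into n_pieces groups of near-equal count.

    Returns each group's share of the segment's total views, largest first.
    Assumes n_pieces is no larger than the number of creators.
    """
    ordered = sorted(views, reverse=True)
    total = sum(ordered)
    n = len(ordered)
    shares = []
    for k in range(n_pieces):
        start = k * n // n_pieces
        end = (k + 1) * n // n_pieces
        shares.append(sum(ordered[start:end]) / total)
    return shares


# Hypothetical view counts for the creators in one segment (not real data)
segment_views = [40, 20, 10, 10, 8, 6, 3, 3]
piece_shares = smoothed_piece_shares(segment_views, 4)
```

Each share is the fraction of the bar/sector area that the corresponding piece would need to occupy. Fitting pieces of those areas into the shape is the harder problem mentioned above.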

5) One reason why I ended up liking the Voronoi approach is that its lack of precision forces the reader to think conceptually about the existence of some very important creators. The irregular shapes and sizes make it impossible to dwell on the specific individual comparisons.

If the goal of the data visualization is to highlight the specific individual contributions of the top creators, then a simple bar chart showing Top 10 Creators is clearly superior. If the goal of the data visualization is to provide specific, precise data on the proportions of creators and views, then a data table is clearly superior.

My goal here is to come up with a visual design that directly tells the story of YouTube stars, the super-rich, superstars, etc.; that is more intuitive to a non-technical audience; and that is engaging.


Kaiser Fung. Business analytics and data visualization expert. Author and Speaker.