Further exploration of tessellation density
Jun 16, 2021
Last year, I explored using bar-density (and pie-density) charts to illustrate 80/20-type distributions, which are very common in real life (link).
The key advantage of this design is that the most important units (i.e. the biggest stars/creators) are represented by larger pieces while the long tail is shown by little pieces. The skewness is encoded in the density of the tessellation.
So when the following chart showed up on my Twitter feed, I returned to the idea of using tessellation density as a visual cue.
This WBUR chart is a good statistical chart: efficient at communicating the data, but "boring". The only things I'd change are to remove the vertical axis, the gridlines, and the decimals.
In concept, the underlying data are similar to the YouTube data. Fewer than 0.5 percent of YouTubers produced 38% of the views on the platform. The richest 1% of the population took 15% of Harvard's spots; the richest 20% took 70%.
As I explore this further, the analogy falls apart. In the YouTube scenario, the stars should naturally occupy bigger spaces. In the Harvard scenario, letting the children of the top 1% take up more space on the chart doesn't really make sense, since each incoming Harvard student has equal status.
Instead of going down that potential dead end, I investigated how tessellation density can be used for visualization. For one thing, tessellations are pretty and appealing.
Here is something I created:
The chart is read vertically by comparing Harvard's selection of students with the hypothetical "ideal" of equal selection. (I don't agree that this type of equality is the right thing but let me focus on the visualization here.) Thus, selectivity is encoded in the density. Selectivity is defined here as the over/under representation. Harvard is more "selective" in lower-income groups.
In the first and second columns, Harvard's densities are lower than those expected from the general population, indicating that the poorest 20% and the middle 20% of the population are under-represented in Harvard's student body. In the third column, the comparison flips: the density in the top box is about 3-4 times as high as in the bottom box. You may have to expand the graphic to see the 1% sliver, which also shows a much higher density in the top box.
I was surprised by how well I was able to eyeball the relative densities. You can try it and let me know how you fare.
(There is even a trick to do this. From the diagram with larger pieces, pick a representative piece. Then, roughly estimate how many smaller pieces from the other tessellation can fit into that representative piece. Using this guideline, I estimate the ratios of the densities to be 1:6, 1:2, 3:1, 10:1. The actual ratios are 1:6.7, 1:2.5, 3:1, 15:1. I find that my intuition gets me most of the way there even if I don't use this trick.)
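For reference, the "actual" ratio for a group is just its share of Harvard spots divided by its share of the population. Here is a minimal Python sketch using only the two rounded figures quoted earlier in this post (15% of spots for the richest 1%, 70% for the richest 20%). The function and variable names are made up for illustration, and because the inputs are rounded, the results may differ a little from the chart's own numbers.

```python
def representation_ratio(share_of_spots, share_of_population):
    """Over/under-representation: above 1 means over-represented."""
    return share_of_spots / share_of_population

# (share of population in %, share of Harvard spots in %)
# Rounded figures quoted in the post; not the chart's exact data.
groups = {
    "richest 1%": (1, 15),
    "richest 20%": (20, 70),
}

ratios = {
    name: representation_ratio(spots, population)
    for name, (population, spots) in groups.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}:1")
```

In the chart, this ratio is what the eye reads as the density of the top box relative to the bottom box in each column.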
Density encoding is under-used as a visual cue. I think our ability to compare densities is surprisingly good (when the units are not overlapping). Of course, you wouldn't use density if you need to be precise, just as you wouldn't use color, or circular areas. Nevertheless, there are many occasions where you can afford to be less precise, and you'd like to spice up your charts.
Why do you design it to look like a Voronoi map, with polygons, instead of, say, a Marimekko, with rectangles? Is it to suppress the impulse to measure the areas of the rectangles?
Posted by: derek | Jun 17, 2021 at 09:01 AM
Derek: Great question. Your intuition is right. I want to convey density as opposed to granular data. The points within each bar are randomly dispersed. In most Pareto-type analyses, the data have already been aggregated into a small number of groups that convey the skewness.
As a practical matter, take the YouTube scenario. I'm pretty sure that if we look inside the top 0.5% of super-creators, we'd find another skewed distribution, something like Zipf's distribution. You'd likely end up with one piece dominating the entire bar, which doesn't help convey density.
I'm researching the distribution of cell areas inside a Voronoi tessellation. Intuitively, it seems like when there is a sufficient number of cells, there is a relatively narrow range of cell areas around the average value. I wonder how close this average is to the "standard" of dividing the bar into n equal cells. I thought about doing just a grid of n equal cells but the tessellation appearance is so much nicer!
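One part of this can be settled without any research: the cells partition the bar, so their average area is exactly the bar's area divided by n, the same as the equal-cell grid. The open question is the spread around that average. Below is a rough Python sketch for probing it, not a proper analysis. It assumes uniformly random seed points in a unit box and approximates each Voronoi cell by assigning raster pixels to their nearest seed; the function name, grid size and seed count are arbitrary choices for illustration.

```python
import random
import statistics

def voronoi_cell_areas(n_cells, width=1.0, height=1.0, grid=80, seed=0):
    """Approximate Voronoi cell areas for random seed points in a box.

    Each pixel of a grid x grid raster is assigned to its nearest seed;
    a cell's area is its pixel count times the area of one pixel.
    """
    rng = random.Random(seed)
    seeds = [(rng.uniform(0, width), rng.uniform(0, height))
             for _ in range(n_cells)]
    counts = [0] * n_cells
    dx, dy = width / grid, height / grid
    for i in range(grid):
        x = (i + 0.5) * dx  # pixel centre
        for j in range(grid):
            y = (j + 0.5) * dy
            nearest = min(
                range(n_cells),
                key=lambda k: (seeds[k][0] - x) ** 2 + (seeds[k][1] - y) ** 2,
            )
            counts[nearest] += 1
    pixel_area = dx * dy
    return [c * pixel_area for c in counts]

areas = voronoi_cell_areas(n_cells=40)
mean_area = statistics.mean(areas)
# Coefficient of variation: spread of cell areas relative to the mean.
cv = statistics.pstdev(areas) / mean_area

print(f"mean cell area: {mean_area:.4f} (equal-cell grid: {1 / 40:.4f})")
print(f"coefficient of variation: {cv:.2f}")
```

Rerunning with different values of n_cells and seed gives a feel for how wide the range of cell areas really is. A finer grid tightens the pixel approximation at the cost of run time.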
Posted by: Kaiser | Jun 17, 2021 at 11:51 AM