## Further exploration of tessellation density

##### Jun 16, 2021

Last year, I explored using bar-density (and pie-density) charts to illustrate 80/20-type distributions, which are very common in real life (link).

The key advantage of this design is that the most important units (i.e. the biggest stars/creators) are represented by larger pieces while the long tail is shown by little pieces. The skewness is encoded in the density of the tessellation.

So when the following chart showed up on my Twitter feed, I returned to the idea of using tessellation density as a visual cue.

This WBUR chart is a good statistical chart - efficient at communicating the data, but "boring". The only changes I'd make are to remove the vertical axis, the gridlines, and the decimals.

In concept, the underlying data is similar to the Youtube data. Less than 0.5 percent of Youtubers produced 38% of the views on the platform. The richest 1% of the population took 15% of Harvard's spots; the richest 20% took 70%.

As I explored this further, the analogy fell apart. In the Youtube scenario, the stars should naturally occupy bigger spaces. In the Harvard scenario, letting the children of the top 1% take up more space on the chart doesn't really make sense since each incoming Harvard student has equal status.

Instead of going down that potential dead end, I investigated how tessellation density can be used for visualization. For one thing, tessellations are pretty and appealing.

Here is something I created:

The chart is read vertically by comparing Harvard's selection of students with the hypothetical "ideal" of equal selection. (I don't agree that this type of equality is the right thing but let me focus on the visualization here.) Thus, selectivity is encoded in the density. Selectivity is defined here as the over/under representation. Harvard is more "selective" in lower-income groups.

In the first and second columns, we see that Harvard's densities are lower than the densities expected in the general population, indicating that the poorest 20% and the middle 20% of the population are under-represented in Harvard's student body. Then in the third column, the comparison flips. The density in the top box is about 3-4 times as high as in the bottom box. You may have to expand the graphic to see the 1% sliver, which also shows a much higher density in the top box.

I was surprised by how well I was able to eyeball the relative densities. You can try it and let me know how you fare.

(There is even a trick to do this. From the diagram with larger pieces, pick a representative piece. Then, roughly estimate how many smaller pieces from the other tessellation can fit into that representative piece. Using this guideline, I estimate the ratios of the densities to be 1:6, 1:2, 3:1, 10:1. The actual ratios are 1:6.7, 1:2.5, 3:1, 15:1. I find that my intuition gets me most of the way there even if I don't use this trick.)
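As a small worked example, the over-representation ratio can be computed directly from the shares quoted earlier in this post (the richest 1% took 15% of the spots; the richest 20% took 70%). The variable names are mine, and these two groups are not necessarily the same as the columns in the chart; the sketch only illustrates the arithmetic.

```r
# Shares quoted in the post, in percent
pop_share     <- c(top_1pct = 1,  top_20pct = 20)  # share of the population
harvard_share <- c(top_1pct = 15, top_20pct = 70)  # share of Harvard's spots

# Over-representation ratio: above 1 means over-represented,
# below 1 means under-represented
representation <- harvard_share / pop_share
representation
```

The top 1% comes out at 15, which is the 15:1 ratio cited above.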

Density encoding is under-used as a visual cue. I think our ability to compare densities is surprisingly good (when the units are not overlapping). Of course, you wouldn't use density if you need to be precise, just as you wouldn't use color, or circular areas. Nevertheless, there are many occasions where you can afford to be less precise, and you'd like to spice up your charts.

## The unreasonable effect of chart labels

##### Apr 01, 2019

In discussing the bar-density and pie-density charts with a buddy (thanks LB!), it became obvious that the labeling is a challenge. And he's right.

Here is the pie-density chart for the Youtube views with the labels as originally conceived.

These labels are trying too hard to provide precise data to the reader.

Here are some simplified labels that get at the message rather than the data:

Here is a slightly different version:

## Bar-density and pie-density plots for showing relative proportions

##### Mar 26, 2019

In my last post, I described a bar-density chart to show paired data of proportions with an 80/20-type rule. The following example illustrates that a small proportion of Youtubers generate a large proportion of views.

Other examples of this type of data include:

• the top 10% of families own 75% of U.S. household wealth (link)
• the top 1% of artists earn 77% of recorded music income (link)
• Five percent of AT&T customers consume 46% of the bandwidth (link)

In all these examples, the message of the data is the importance of a small number of people (top earners, superstars, bandwidth hogs). A good visual should call out this message.

The bar-density plot consists of two components:

• the bar chart which shows the distribution of the data (views, wealth, income, bandwidth) among segments of people;
• the embedded Voronoi diagram within each bar, which encodes the relative importance of each people segment, as measured by the (inverse) density of the population among these segments. A people segment is more important if each individual accounts for more of the data or, in other words, if the density of people within the group is lower.

The bar chart can adopt a more conventional horizontal layout.

Voronoi tessellation

To understand the Voronoi diagram, think of a fixed number (say, 100) of randomly placed points inside a bar. Then, for any point inside the bar area, it has a nearest neighbor among those 100 fixed points. Assign every point on the surface to its nearest neighbor. From this, one can draw a boundary around each of the 100 points to include all its nearest neighbors. The resulting tessellation is the Voronoi diagram. (The following illustration comes from this AMS column.)
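The nearest-neighbor rule can be tried out in a few lines of base R. This is only a rough raster sketch of the definition, not the method used for the charts in this post: the bar size, grid resolution, seed, and the helper name `nearest_seed` are all arbitrary choices of mine.

```r
set.seed(1)  # arbitrary seed so the sketch is repeatable

# 100 randomly placed points inside a bar (illustrative dimensions)
n_seeds <- 100
bar_w <- 4
bar_h <- 1
seeds <- cbind(x = runif(n_seeds, 0, bar_w), y = runif(n_seeds, 0, bar_h))

# A fine grid of locations covering the bar
gx <- seq(0, bar_w, length.out = 200)
gy <- seq(0, bar_h, length.out = 50)
grid <- expand.grid(x = gx, y = gy)

# For one location, return the index of the nearest of the fixed points
nearest_seed <- function(px, py, seeds) {
  which.min((seeds[, "x"] - px)^2 + (seeds[, "y"] - py)^2)
}
grid$cell <- mapply(nearest_seed, grid$x, grid$y,
                    MoreArgs = list(seeds = seeds))

# Locations sharing the same nearest point form one Voronoi piece;
# draw the pieces with randomly assigned gray shades
cell_matrix <- matrix(grid$cell, nrow = length(gx), ncol = length(gy))
image(gx, gy, cell_matrix, col = sample(gray.colors(n_seeds)),
      asp = 1, axes = FALSE, xlab = "", ylab = "")
```

A grid like this only approximates the boundaries; a proper Voronoi routine computes the exact polygons.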

The density of points in the respective bars encodes the relative proportions of people within those groups. For my example, I placed 6 points in the red bar, 66 points in the yellow bar, and ~2000 points in the gray bar, which represent the relative proportions of creators in the three segments.

Density is represented statistically

Notice that the density is represented statistically, not empirically. According to the annotation on the original chart, the red bar represents 14,000 super-creators. Correspondingly, there are 4.5 million creators in the gray bar. Any attempt to plot those as individual pieces will result in a much less impactful graphic. If the representation is interpreted statistically, as relative densities within each people segment, the message of relative importance of the units within each group is appropriately conveyed.

A more sophisticated way of deciding how many points to place in the red bar is to be developed. Here, I just used the convenient number of 6.
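One simple convention, offered only as a sketch and not as the more sophisticated method alluded to above, is to fix the number of points in the smallest segment and scale the other segments in proportion. The creator shares below (3, 33 and 965 per thousand) are the ones used in the R code in this post; the helper name `scale_points` and its `min_points` argument are hypothetical.

```r
# Creators in each segment, per 1,000 creators
creator_share <- c(red = 3, yellow = 33, gray = 965)

# Hypothetical helper: give the smallest segment `min_points` points
# and scale the other segments proportionally
scale_points <- function(share, min_points = 6) {
  round(share / min(share) * min_points)
}

scale_points(creator_share)
# red = 6, yellow = 66, gray = 1930
```

Changing `min_points` trades off legibility of the sparsest bar against the number of pieces that must be drawn in the densest one.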

The color shades are randomly applied to the tessellation pieces, and are used to facilitate the reading of densities.

***

In this section, I provide R code for those who want to explore this some more. This is code used for prototyping, and you're welcome to improve it. The general strategy is as follows:

• Set the rectangular area (bar) in which the Voronoi diagram is to be embedded. The length of the bar is set to the proportion of views, appropriately scaled. The code uses the owin function within the spatstat package to represent this rectangle as a window.
• Set the number of points (n) to be embedded in the bar, determined by the relative proportion of creators, appropriately scaled. Generate a data frame containing the x-y coordinates of n randomly placed points, within the rectangle defined above.
• Use the as.ppp function to turn the points into a point pattern, then the dirichlet function to generate the Voronoi data
• Set up a colormap for plotting the Voronoi diagram
• Plot the Voronoi diagram; assign shades at random to the pieces (in production code, these random numbers should be set as marks in the ppp but it's easier to play around with the shades if placed here)

The code generates separate charts for each bar segment. A post-processing step is currently required to align the bars to attain equal height. I haven't figured out whether the multiplot option helps here.

```r
library(spatstat)

# enter the scaled proportions of creators and views
# the Youtube example has three creator segments

# number of randomly generated points should be proportional to
# proportion of creators. Multiply nc by a scaling factor if desired
nc <- c(3, 33, 965) * 2

# bar widths should be proportional to proportion of views
# total width should be set based on the width of your page
wide <- c(378, 276, 346) / 2

# set bar height, to attain a particular aspect ratio
bar_h <- 50

# define function to generate points
# defines rectangular window
makepoints <- function(n, wide, height) {
  df <- data.frame(x = runif(n, 0, wide), y = runif(n, 0, height))
  W <- owin(c(0, wide), c(0, height))  # rectangular window
  pp1 <- as.ppp(df, W)
  y <- dirichlet(pp1)
  # y$marks <- sample(0:wide, n, replace = TRUE)  # marks are for colors
  return(y)
}

y_red <- makepoints(nc[1], wide[1], bar_h)  # height of each bar fixed
y_yel <- makepoints(nc[2], wide[2], bar_h)
y_gry <- makepoints(nc[3], wide[3], bar_h)

# setting colors (4 shades per bar, one color per bar)
cr_red <- colourmap(c("lightsalmon", "lightsalmon2", "lightsalmon4", "brown"),
                    breaks = round(seq(0, wide[1], length.out = 5)))
cr_yel <- colourmap(c("burlywood1", "burlywood2", "burlywood3", "burlywood4"),
                    breaks = round(seq(0, wide[2], length.out = 5)))
cr_gry <- colourmap(c("gray80", "gray60", "gray40", "gray20"),
                    breaks = round(seq(0, wide[3], length.out = 5)))

# plotting
par(mar = c(0, 0, 0, 0))

# add png to save image to png
# remove values= if colors set in ppp

plot.tess(y_red, main = "", border = "pink3", do.col = TRUE,
          values = sample(0:wide[1], nc[1], replace = TRUE), col = cr_red,
          xlim = c(0, wide[1]), ylim = c(0, bar_h), ribbon = FALSE)

plot.tess(y_yel, main = "", border = "darkgoldenrod4", do.col = TRUE,
          values = sample(0:wide[2], nc[2], replace = TRUE), col = cr_yel,
          xlim = c(0, wide[2]), ylim = c(0, bar_h), ribbon = FALSE)

plot.tess(y_gry, main = "", border = "darkgray", do.col = TRUE,
          values = sample(0:wide[3], nc[3], replace = TRUE), col = cr_gry,
          xlim = c(0, wide[3]), ylim = c(0, bar_h), ribbon = FALSE)

# because of random points, the tessellation looks different each time
# post-processing: make each bar the same height when aligned side by side
```
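One possible way to avoid the post-processing step is to draw all three bars in a single plot, giving each bar its own window shifted to the right by the widths of the bars before it. The sketch below is an assumption on my part rather than the method used for the charts in this post: the helper `make_bar` and the offset vector `x0` are hypothetical, and it relies on the `add = TRUE` argument of the plot method for tessellations in spatstat.

```r
library(spatstat)

nc    <- c(3, 33, 965) * 2      # points per bar, as in the post
wide  <- c(378, 276, 346) / 2   # bar widths, as in the post
bar_h <- 50                     # shared bar height
x0    <- c(0, cumsum(wide)[-length(wide)])  # left edge of each bar

# Hypothetical helper: like makepoints(), but the window starts at x0
make_bar <- function(n, x0, width, height) {
  W  <- owin(c(x0, x0 + width), c(0, height))
  pp <- ppp(runif(n, x0, x0 + width), runif(n, 0, height), window = W)
  dirichlet(pp)
}

bars <- Map(make_bar, nc, x0, wide, MoreArgs = list(height = bar_h))

shades <- list(
  c("lightsalmon", "lightsalmon2", "lightsalmon4", "brown"),
  c("burlywood1", "burlywood2", "burlywood3", "burlywood4"),
  c("gray80", "gray60", "gray40", "gray20")
)
borders <- c("pink3", "darkgoldenrod4", "darkgray")

# Open one empty frame spanning all bars, then add each tessellation to it
par(mar = c(0, 0, 0, 0))
plot(owin(c(0, sum(wide)), c(0, bar_h)), main = "", type = "n")
for (i in seq_along(bars)) {
  cmap <- colourmap(shades[[i]],
                    breaks = round(seq(0, wide[i], length.out = 5)))
  plot(bars[[i]], add = TRUE, border = borders[i], do.col = TRUE,
       values = sample(0:wide[i], nc[i], replace = TRUE),
       col = cmap, ribbon = FALSE)
}
```

Because every bar shares the same vertical range and plotting frame, the bars should come out at equal height without manual alignment.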

***

A cousin of the bar-density plot is the pie-density plot. Since I'm using only three creator segments, which each account for about 30-40% of the total views, it is natural to use a pie chart. In this case, we embed the Voronoi diagrams into the pie sectors.

If the distribution were more even, that is to say, if the creators were more or less equally important, the pie-density plot would look like this:

***

Something that is more like 80/20

The original chart shows the top 0.3 percent generating almost 40 percent of the views. A more typical insight is that the top X percent generates 80 percent of the data. For the YouTube data, X is 11 percent. What does the pie-density chart look like if top 11 percent <-> 80 percent, middle 33 percent <-> 11 percent, bottom 56 percent <-> 8 percent?

Roughly speaking, the second segment includes 3 times as many people as the largest sector (the top 11 percent), and the third has 5 times as many.

P.S.

1) Check out my first LinkedIn "article" on this topic.

2) The first post on bar-density charts is here.