Putting vaccine trials in boxes

Bloomberg Businessweek has a special edition about vaccines, and I found this chart in the print edition:


The chart's got a lot of white space. Its structure is a series of simple "treemaps," one for each type of vaccine. Though simple, such a chart burns a few brain cells.

Here, I've extracted the largest block, which corresponds to vaccines that work with the virus's RNA/DNA. I applied a self-sufficiency test, removing the data from the boxes. 


What proportion of these projects have moved from pre-clinical to Phase I? To answer this question, we have to understand the relative areas of the boxes, since that's how the data are encoded. How many yellow boxes can fit into the gray box?

It's not intuitive. We'd need a ruler to do this task properly.

Then, we learn that the gray box is exactly 8 times the size of the yellow box (72 projects are pre-clinical while 9 are in Phase I). We can cram eight yellows into the gray box. Imagine doing that, and it's pretty clear the visual elements fail to convey the meaning of the data.
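
The arithmetic behind that comparison can be sketched in a few lines of Python. This is only an illustration: the two counts come from the chart as quoted above, the function names are hypothetical, and the side-length calculation assumes both boxes are drawn as squares, which the printed boxes are not exactly.

```python
import math

# Project counts quoted above for the RNA/DNA block.
PRECLINICAL = 72  # gray box
PHASE_I = 9       # yellow box


def area_ratio(larger: float, smaller: float) -> float:
    """How many copies of the smaller box fit into the larger one, by area."""
    return larger / smaller


def side_ratio(larger: float, smaller: float) -> float:
    """Ratio of side lengths, assuming (hypothetically) both boxes are squares."""
    return math.sqrt(larger / smaller)


print(area_ratio(PRECLINICAL, PHASE_I))            # 8.0
print(round(side_ratio(PRECLINICAL, PHASE_I), 2))  # 2.83
```

Under that square-box assumption, the gray box would be less than three times as wide as the yellow one even though it holds eight times the data, which may be part of why the 8-to-1 ratio is hard to read off by eye.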

Self-sufficiency is the idea that a data graphic should not rely on printed data to convey its meaning; the visual elements of a data graphic should bear much of the burden. Otherwise, use a data table. To test for self-sufficiency, cover up the printed data and see if the chart still works.


A key decision for the designer is the relative importance of (a) the number of projects reaching Phase III, versus (b) the number of projects utilizing specific vaccine strategies.

This next chart emphasizes the clinical phases:



Contrast this with the version shown in the online edition of Bloomberg (link), which emphasizes the vaccine strategies.


If any reader can figure out the logic of the ordering of the vaccine strategies, please leave a comment below.




The vaccine strategies are ordered by how similar they are to what people usually think of as a vaccine (that is, giving someone a weaker form of the virus). The top one is using the virus itself, the middle one is using a piece of the virus, and the bottom one is just using a small number of genes. You could also think of it as being ordered by "size of piece", from the full virus (tiny but still the largest thing on the list) to genes (much smaller).

Note that the article itself has them in the same order from top to bottom, so the graph parallels the text.

Elliot Bentley

My uninformed guess is that the vaccine strategies are ordered by the size of the approach’s target. Whole pathogen > subunits > nucleic acid?


The other commenters argue that the strategies are an ordinal category per Stevens, and thus non-reorderable per Bertin.

But even if they were nominal and thus re-orderable, they are still ordered from fewest trials to most, with Unknown being a catchall category placed at the end, outside the order of the others, as such categories often are.


Thanks for the comments. I think we have two possible ordering schemes here. Ultimately, it's the designer's call. Each has its own weakness: ordering from most to fewest trials makes more sense if one assumes that the number of trials is an indicator of importance, while ordering by size of target is a milder version of alphabetical. Reading this chart, I hope they would add some subjective assessment of how likely these trials are to succeed. I'm not sure that the number of trials conveys the likelihood.
