A beautiful curve and its deadly misinterpretation

When the preliminary analyses of their Phase 3 trials came out, vaccine developers pleased their audience of scientists with the following data graphic:

Pfizerfda_cumcases

The above was lifted out of the FDA briefing document for the Pfizer/BioNTech vaccine.

Some commentators have homed in on the blue line for the vaccinated arm of the Pfizer trial.

Junkcharts_pfizerfda_redo_vaccinecases

Since the vertical axis shows the cumulative number of cases, these commentators noted that the vaccine appeared to reach peak efficacy about 14 days after the first dose. The second dose was administered around Day 21, by which point the vaccine curve appeared almost flat. Thus, they argued, we should make a big bet on the first dose.

***

The chart is indeed very beautiful. It's rare to see such a huge gap between the test group and the control group. Notice that I just described the gap between test and control. That's what a statistician is looking at in that chart - not the blue line, but the gap between the red and blue lines.

Imagine: if the curve for the placebo group looked the same as that for the vaccinated group, then the chart would lose all its luster. Screams of victory would be replaced by tears of sadness.

Here I bring back both lines, and you should focus on the gaps between the lines:

Junkcharts_pfizerfda_redo_twocumcases

Does the action stop around day 14? The answer is a resounding No! In fact, the red line keeps rising, so the vaccine's efficacy improves over time (since VE is computed from the ratio of cases in the two groups).

The following shows the vaccine efficacy curve:

Junkcharts_pfizerfda_redo_ve

Right before the second dose, VE is just below 50%. VE keeps rising and reaches 70% by day 50, which is about a month after the second dose.

If the FDA briefing document had shown the VE curve instead of the cumulative-cases curve, few would argue that you don't need the second dose!

***

What went wrong here? How can such a beautiful chart turn out to be lethal? (See this post on my book blog for reasons why I think forgoing or delaying the second dose will exacerbate the pandemic.)

It's a bit of bait and switch. The original chart plots cumulative case counts, separately for each treatment group. Cumulative case counts are inputs to computing vaccine efficacy. It is true that as the blue line for the vaccine flattens, VE would likely rise. But the case count for the vaccine group is an imperfect proxy for VE. As I showed above, the VE continues to gain strength long after the vaccine case count has levelled.
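To make the proxy problem concrete, here is a minimal Python sketch of how a VE curve can be derived from two cumulative-case curves. It assumes equal-sized arms with equal follow-up, so that VE at each time point is simply 1 minus the ratio of cumulative cases (a real analysis would work with person-time at risk). The counts are made-up numbers chosen only to mimic the shape of the chart, not the trial data, and the function name `cumulative_ve` is my own.

```python
def cumulative_ve(vaccine_cum, placebo_cum):
    """VE at each time point as 1 - (vaccine cases / placebo cases).

    Assumes equal-sized arms with equal follow-up, which is a simplification.
    Returns None where the placebo arm has no cases yet.
    """
    ve = []
    for v, p in zip(vaccine_cum, placebo_cum):
        ve.append(None if p == 0 else 1 - v / p)
    return ve


# Hypothetical cumulative case counts (NOT the Pfizer trial data).
# The vaccine arm is nearly flat after day 14; the placebo arm keeps climbing.
days = [7, 14, 21, 50, 100]
vaccine_cum = [20, 36, 39, 42, 48]
placebo_cum = [22, 50, 75, 140, 260]

ve_curve = cumulative_ve(vaccine_cum, placebo_cum)
for day, ve in zip(days, ve_curve):
    print(f"day {day:>3}: VE = {ve:.0%}")
```

In this toy series the vaccine arm adds only 12 cases after day 14, yet the computed VE keeps climbing at every later time point, because the placebo arm keeps accumulating cases.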

The important lesson for data visualization designers is: plot the metric that matters to decision-makers; avoid imperfect proxies.

 

P.S. [1/19/2021] For those who want to get behind the math of all this, the following posts on my book blog will help.

One-dose Pfizer is not happening, and here's why

The case for one-dose vaccines is lacking key details

One-dose vaccine strategy elevates PR over science

 


Handling partial data on graphics

Last week, I posted on the book blog a piece about excess deaths and accelerated deaths (link). That whole piece is about how certain types of analysis have to be executed at certain moments in time. The same analysis done at the wrong time yields the wrong conclusions.

Here is a good example of what I'm talking about. This is a graph of U.S. monthly deaths from Covid-19 during the entire pandemic. The chart is from the COVID Tracking Project, although I pulled it down from my Twitter feed.

Covidtracking_monthlydeaths

There is nothing majorly wrong with this column chart (I'd remove the axis labels). But there is a big problem. Are we seeing a boomerang of deaths from November to December to January?

Junkcharts_covidtrackingproject_monthlydeaths_1

Not really. This trend is there only because the chart is generated on January 12. The last column contains 12 days while the prior two columns contain 30-31 days.

Junkcharts_covidtrackingproject_monthlydeaths_2

The Trifecta Checkup picks up this problem. What the visual is showing isn't what the data are saying. I'd call this a Type D chart.

***

How to fix this?

One solution is to present partial data for all the other columns, so that the readers can compare the January column to the others.

Junkcharts_covidtrackingmonthydeaths_first12days

One critique of this is potential seasonality. The first 39% (12 out of 31 days) of a month may not be comparable across months. A further seasonal adjustment makes this better, if we decide the benefits outweigh the complexity.
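The like-for-like comparison can be sketched in a few lines of Python. This is only an illustration: the daily counts below are placeholders rather than COVID Tracking Project data, and the helper name `first_n_days_total` is my own.

```python
from datetime import date


def first_n_days_total(daily_counts, year, month, n_days):
    """Sum the daily counts for days 1..n_days of the given month."""
    return sum(
        count
        for day, count in daily_counts.items()
        if day.year == year and day.month == month and day.day <= n_days
    )


# Placeholder daily counts (hypothetical, for illustration only):
# a full December and the first 12 days of January.
daily_counts = {}
for d in range(1, 32):
    daily_counts[date(2020, 12, d)] = 100
for d in range(1, 13):
    daily_counts[date(2021, 1, d)] = 120

december_full = first_n_days_total(daily_counts, 2020, 12, 31)
december_12 = first_n_days_total(daily_counts, 2020, 12, 12)
january_12 = first_n_days_total(daily_counts, 2021, 1, 12)

print(december_full, december_12, january_12)
```

With these placeholder numbers, the full December column towers over the partial January column, but comparing the first 12 days of each month shows January running ahead.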

Another solution is to project the full-month tally.

Junkcharts_covidtrackingmonthydeaths_projected

The critique here is the accuracy of the projection.
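The simplest possible projection scales the partial tally by the fraction of the month observed. The sketch below assumes the daily rate seen so far holds for the rest of the month, which is exactly the kind of assumption that is open to criticism. It is not necessarily the method behind the chart above, and `project_full_month` is a hypothetical helper.

```python
import calendar


def project_full_month(partial_total, year, month, days_observed):
    """Scale a partial-month tally up to a full month.

    Assumes the average daily rate observed so far continues unchanged.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    return partial_total * days_in_month / days_observed


# Hypothetical tally: 1,440 deaths recorded in the first 12 days of January 2021.
projected = project_full_month(1440, 2021, 1, 12)
print(projected)
```

A more careful projection would account for reporting lags and the trend within the month, at the cost of added complexity.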

But the point is that not making the adjustment would be worse.

 

 


Dreamy Hawaii

I really enjoyed this visual story by ProPublica and the Honolulu Star-Advertiser about the plight of beaches in Hawaii (link).

The story begins with a beautiful invitation:

Propublica_hawaiibeachesfrontimage

This design reminds me of Vimeo's old home page. (It no longer looks like this today but this screenshot came from when I was the data guy there.) In both cases, the images are not static but moving.

Vimeo-homepage

The tour de force of this visual story is an annotated walk along Lanikai Beach. Here is a snapshot at one of the stops:

Propublica_hawaiibeaches_1368MokuluaDr_small

This shows a particular homeowner who, according to documents, was permitted to rebuild a destroyed seawall even though officials were supposed to disallow reconstruction in order to protect beaches from eroding. The property is marked on the map above. The image inside the box is a gif showing waves smashing the seawall.

As the reader scrolls down, the image window runs through a carousel of gifs of houses along the beach. The images are synchronized to the reader's progress along the shore. The narrative makes stops at specific houses at which point a text box pops up to provide color commentary.

***

The erosion crisis is shown in this pair of maps.

Propublica_hawaiibeaches_oldnewshoreline-sm

There's some fancy work behind the scenes to patch together images and estimate the boundaries of the beaches.

***

The following map is notable for its simplicity. There are no unnecessary details and labels. We don't need to know the name of every street or a specific restaurant. Removing excess details makes readers focus on the informative parts. 

Propublica_hawaiibeaches_simplemap-sm

Clicking on the dots brings up more details.

***

Enjoy the entire story here.


These are the top posts of 2020

It's always very interesting as a writer to look back at a year's worth of posts and find out which ones were most popular with my readers.

Here are the top posts on Junk Charts from 2020:

How to read this chart about coronavirus risk

This post about a New York Times scatter plot dates from February, a time when many Americans were debating whether Covid-19 was just the flu.

Proportions and rates: we are no dupes

This post about an Ars Technica chart on the effects of Covid-19 by age is an example of designing the visual to reflect the structure of the data.

When the pie chart is more complex than the data

This post shows a 3D pie chart which is worse than a 2D pie chart.

Twitter people upset with that Covid symptoms diagram

This post discusses some complicated graphics designed to illustrate complicated datasets on Covid-19 symptoms.

Cornell must remove the logs before it reopens in the fall

This post is another warning to think twice before you use log scales.

What is the price of objectivity?

This post turns an "objective" data visualization into a piece of visual story-telling.

The snake pit chart is the best election graphic ever

This post introduces my favorite U.S. presidential election graphic, designed by the FiveThirtyEight team.

***

Here is a list of posts that deserve more attention:

Locating the political center

An example of bringing readers as close to the insights as possible

Visualizing change over time

An example of designing data visualization to reflect the structure of multivariate data

Bloomberg made me digest these graphics slowly

An example of simple and thoughtful graphics

The hidden bad assumption behind most dual-axis time-series charts

Read this before you make a dual-axis chart

Pie chart conventions

Read this before you make a pie chart

***
Looking forward to bringing you more content in 2021!

Happy new year.