Reading an infographic about our climate crisis

Let's explore an infographic by SCMP, which draws attention to the alarming temperature recorded at Verkhoyansk in Russia on June 20, 2020. The original work was on the back page of the printed newspaper, referred to in this tweet.

This view of the globe brings out the two key pieces of evidence presented in the infographic: the rise in temperature in unexpected places, and the shrinkage of the Arctic ice.

Scmp_russianheat_1a

A notable design decision is to omit the color scale. On inspection, the scale is present - it was sewn into the graphic.

Scmp_russianheat_colorscale

I applaud this decision as it does not take the reader's eyes away from the graphic. Some information is lost as the scale isn't presented in full details but I doubt many readers need those details.

A key takeaway is that the temperature in Verkhoyansk, which is on the edge of the Arctic Circle, was the same as in New Delhi in India on that day. We can see how the red was encroaching upon the Arctic Circle.

***Scmp_russianheat_2a

Next, the rapid shrinkage of the Arctic ice is presented in two ways. First, a series of maps.

The annotations are pared to the minimum. The presentation is simple enough such that we can visually judge that the amount of ice cover has roughly halved from 1980 to 2009.

A numerical measure of the drop is provided on the side.

Then, a line chart reinforces this message.

The line chart emphasizes change over time while the series of maps reveals change over space.

Scmp_russianheat_3a

This chart suggests that the year 2020 may break the record for the smallest ice cover since 1980. The maps of Australia and India provide context to interpret the size of the Arctic ice cover.

I'd suggest reversing the pink and black colors so as to refer back to the blue and pink lines in the globe above.

***

The final chart shows the average temperature worldwide and in the Arctic, relative to a reference period (1981-2000).

Scmp_russianheat_4

This one is tough. It looks like an area chart but it should be read as a line chart. The darker line is the anomaly of Arctic average temperature while the lighter line is the anomaly of the global average temperature. The two series are synced except for a brief period around 1940. Since 2000, the temperatures have been dramatically rising above that of the reference period.

If this is a stacked area chart, then we'd interpret the two data series as summable, with the sum of the data series signifying something interesting. For example, the market shares of different web browsers sum to the total size of the market.

But the chart above should not be read as a stacked area chart because the outside envelope isn't the sum of the two anomalies. The problem is revealed if we try to articulate what the color shades mean.

Scmp_russianheat_4_inset

On the far right, it seems like the dark shade is paired with the lighter line and represents global positive anomalies while the lighter shade shows Arctic's anomalies in excess of global. This interpretation only works if the Arctic line always sits above the global line. This pattern is broken in the late 1990s.

Around 1999, the Arctic's anomaly is negative while the global anomaly is positive. Here, the global anomaly gets the lighter shade while the Arctic one is blue.

One possible fix is to encode the size of the anomaly into the color of the line. The further away from zero, the darker the red/blue color.

 

 


A beautiful curve and its deadly misinterpretation

When the preliminary analyses of their Phase 3 trials came out , vaccine developers pleased their audience of scientists with the following data graphic:

Pfizerfda_cumcases

The above was lifted out of the FDA briefing document for the Pfizer / Biontech vaccine.

Some commentators have honed in on the blue line for the vaccinated arm of the Pfizer trial.

Junkcharts_pfizerfda_redo_vaccinecases

Since the vertical axis shows cumulative number of cases, it is noted that the vaccine reached peak efficacy after 14 days following the first dose. The second dose was administered around Day 21. At this point, the vaccine curve appeared almost flat. Thus, these commentators argued, we should make a big bet on the first dose.

***

The chart is indeed very beautiful. It's rare to see such a huge gap between the test group and the control group. Notice that I just described the gap between test and control. That's what a statistician is looking at in that chart - not the blue line, but the gap between the red and blue lines.

Imagine: if the curve for the placebo group looked the same as that for the vaccinated group, then the chart would lose all its luster. Screams of victory would be replaced by tears of sadness.

Here I bring back both lines, and you should focus on the gaps between the lines:

Junkcharts_pfizerfda_redo_twocumcases

Does the action stop around day 14? The answer is a resounding No! In fact, the red line keeps rising so over time, the vaccine's efficacy improves (since VE is a ratio between the two groups).

The following shows the vaccine efficacy curve:

Junkcharts_pfizerfda_redo_ve

Right before the second dose, VE is just below 50%. VE keeps rising and reaches 70% by day 50, which is about a month after the second dose.

If the FDA briefing document has shown the VE curve, instead of the cumulative-cases curve, few would argue that you don't need the second dose!

***

What went wrong here? How come the beautiful chart may turn out to be lethal? (See this post on my book blog for reasons why I think foregoing or delaying the second dose will exacerbate the pandemic.)

It's a bit of bait and switch. The original chart plots cumulative case counts, separately for each treatment group. Cumulative case counts are inputs to computing vaccine efficacy. It is true that as the blue line for the vaccine flattens, VE would likely rise. But the case count for the vaccine group is an imperfect proxy for VE. As I showed above, the VE continues to gain strength long after the vaccine case count has levelled.

The important lesson for data visualization designers is: plot the metric that matters to decision-makers; avoid imperfect proxies.

 

P.S. [1/19/2021: For those who wants to get behind the math of all this, the following several posts on my book blog will help.

One-dose Pfizer is not happening, and here's why

The case for one-dose vaccines is lacking key details

One-dose vaccine strategy elevates PR over science

]

[1/21/2021: The Guardian chimes in with "Single Covid vaccine dose in Israel 'less effective than we thought'" (link). "In remarks reported by Army Radio, Nachman Ash said a single dose appeared “less effective than we had thought”, and also lower than Pfizer had suggested." To their credit, Pfizer has never publicly recommended a one-dose treatment.]

[1/21/2021: For people in marketing or business, I wrote up a new post that expresses the one-dose vs two-dose problem in terms of optimizing an email drip campaign. It boils down to: do you accept that argument that you should get rid of your latter touches because the first email did all the work? Or do you want to run an experiment with just one email before you decide? You can read this on the book blog here.]


Handling partial data on graphics

Last week, I posted on the book blog a piece about excess deaths and accelerated deaths (link). That whole piece is about how certain types of analysis have to be executed at certain moments of time.  The same analysis done at the wrong time yields the wrong conclusions.

Here is a good example of what I'm talking about. This is a graph of U.S. monthly deaths from Covid-19 during the entire pandemic. The chart is from the COVID Tracking Project, although I pulled it down from my Twitter feed.

Covidtracking_monthlydeaths

There is nothing majorly wrong with this column chart (I'd remove the axis labels). But there is a big problem. Are we seeing a boomerang of deaths from November to December to January?

Junkcharts_covidtrackingproject_monthlydeaths_1

Not really. This trend is there only because the chart is generated on January 12. The last column contains 12 days while the prior two columns contain 30-31 days.

Junkcharts_covidtrackingproject_monthlydeaths_2

The Trifecta Checkup picks up this problem. What the visual is showing isn't what the data are saying. I'd call this a Type D chart.

***

What to fix this?

One solution is to present partial data for all the other columns, so that the readers can compare the January column to the others.

Junkcharts_covidtrackingmonthydeaths_first12days

One critique of this is the potential seasonality. The first 38% (12 out of 31) of a month may not be comparable across months. A further seasonal adjustment makes this better - if we decide the benefits outweight the complexity.

Another solution is to project the full-month tally.

Junkcharts_covidtrackingmonthydeaths_projected

The critique here is the accuracy of the projection.

But the point is that not making the adjustment would be worse.

 

 


Dreamy Hawaii

I really enjoyed this visual story by ProPublica and Honolulu Star-Advertiser about the plight of beaches in Hawaii (link).

The story begins with a beautiful invitation:

Propublica_hawaiibeachesfrontimage

This design reminds me of Vimeo's old home page. (It no longer looks like this today but this screenshot came from when I was the data guy there.) In both cases, the images are not static but moving.

Vimeo-homepage

The tour de force of this visual story is an annotated walk along the Lanikai Beach. Here is a snapshot at one of the stops:

Propublica_hawaiibeaches_1368MokuluaDr_small

This shows a particular homeowner who, according to documents, was permitted to rebuild a destroyed seawall even though officials were supposed to disallow reconstruction in order to protect beaches from eroding. The property is marked on the map above. The image inside the box is a gif showing waves smashing the seawall.

As the reader scrolls down, the image window runs through a carousel of gifs of houses along the beach. The images are synchronized to the reader's progress along the shore. The narrative makes stops at specific houses at which point a text box pops up to provide color commentary.

***

The erosion crisis is shown in this pair of maps.

Propublica_hawaiibeaches_oldnewshoreline-sm

There's some fancy work behind the scenes to patch together images, and estimate the boundaries of th beaches.

***

The following map is notable for its simplicity. There are no unnecessary details and labels. We don't need to know the name of every street or a specific restaurant. Removing excess details makes readers focus on the informative parts. 

Propublica_hawaiibeaches_simplemap-sm

Clicking on the dots brings up more details.

***

Enjoy the entire story here.