Speaking to the choir

A friend found the following chart about the "carbon cycle", and sent me an exasperated note, having given up on figuring it out. The chart came from a report, and was reprinted in Ars Technica (link).


The problem with the chart is that the designer is speaking to the choir. One must know a lot about the carbon cycle already to make sense of everything that's going on.

We see big and small arrows pointing up or down. Each arrow has a number attached to it, plus a range inside brackets. These numbers have no units, and it's not obvious what they are measuring.

The arrows come in a variety of colors. The colors are explained by labels but the labels describe apparently unrelated concepts (e.g. fossil CO2 and land-use change).

Interspersed with the arrows is a singular dot. The dot also has a number attached to it. The number wears a plus sign, which signals it's being treated differently than the quantities with up arrows.

The singular dot is an outcast, ostracized from the community of dots in the bottom part of the chart. These dots have labels but no numbers. They come in different sizes but no scale is provided.

The background is divided into three parts, showing the atmosphere, the land mass, and the ocean. The placement of the arrows and dots suggests each measured quantity concerns one of these three parts. Well... except the dot labeled "surface sediments" that sits on the boundary of the land mass and the ocean.

The three-way classification is only one layer of the chart. A different classification is embedded in the color scheme. The gray, light green, and aquamarine arrows in the sky find their counterparts in the dots of the land mass, and the ocean.

What's more, the boundaries between land and sky, and between land and ocean are also painted with those colors. These boundary segments have been given different colors so that the lengths of these segments seem to contain data but we aren't sure what.

At this point, I noticed thin arrows which appear to depict back and forth flows. There may be two types of such exchanges, one indicated by a cycle, the other by two straight arrows in opposite directions. The cycles have no numbers while each pair of straight thin arrows gets two numbers, always identical.

At the bottom of the chart is an annotation in red: "Budget imbalance = -1.0". Presumably some formula ties the numbers shown above to this -1.0 result. We still don't know the units, and it's unclear if -1.0 is a bad number. A negative number shown in red typically indicates a bad number but how bad is it?

Finally, on the top right corner, I found a legend. It's not obvious at first because the legend symbols (arrows and dots) are shown in gray, a color not used elsewhere on the chart. It appears as if it represents another color category. The legend labels do little for me. What is an "anthropogenic flux"? What does the unit of "GtCO2" stand for? Other jargon includes "carbon cycling" and "stocks". The entire diagram is titled "carbon cycle" while the "carbon cycling" thin arrows are only a small part of the diagram.

The bottom line is I have no idea what this chart is saying to me, other than that the earth is a complex system, and that the designer has tried valiantly to impregnate the diagram with lots of information. If I were well read in environmental science, my experience would likely be different.






A little stitch here, a great graphic is knitted

The Wall Street Journal used the following graphic to compare hurricanes Ida and Katrina (link to paywalled article).


This graphic illustrates the power of visual communications. Readers can learn a lot from it.

The paths of the storms can be compared. The geographical locations of the landfalls are shown. The strengthening of wind speeds as the hurricanes moved toward Louisiana is also displayed. Ida is clearly a lesser storm than Katrina: its wind speed never reached Category 5, and was generally lower at comparable time points.

The greatest feature of the WSJ graphic is how the designer stitches the two plots into one graphic. The anchors are two time points: when each storm attained enough wind speed to be classified as a hurricane (indicated by open dots), and when each storm made landfall in Louisiana. It is this little-noticed feature that makes it so easy to place each plot in context of the other.


Simple charts are the hardest to do right

The CDC website has a variety of data graphics about many topics, one of which is U.S. vaccinations. I was looking for information about Covid-19 data broken down by age groups, and that's when I landed on these charts (link).


The left panel shows people with at least one dose, and the right panel shows those who are "fully vaccinated." This simple chart takes an unreasonable amount of time to comprehend.


The analyst introduces three metrics, all of which are described as "percentages". Upon reflection, they are proportions of the people in specific age ranges.

Readers are thus invited to compare these proportions. It's not clear, however, which comparisons are intended. The first item listed in the legend states "Percent among Persons who completed all recommended doses in last 14 days". For most readers, including me, this introduces an unexpected concept. The 14 days here do not refer to the (in)famous 14-day case-counting window but literally the most recent two weeks relative to when the chart was produced.

It would have been clearer if the concept of Proportions were introduced in the chart title or axis title, while the color legend explains the concept of the base population. From the lighter shade to the darker shade (of red and blue) to the gray color, the base population shifts from "Among Those Who Completed/Initiated Vaccinations Within Last 14 Days" to "Among Those Who Completed/Initiated Vaccinations Any Time" to "Among the U.S. Population (regardless of vaccination status)".

Also, a reverse order helps our comprehension. Each subsequent category is a subset of the one above. First, the whole population, then those who are fully vaccinated, and finally those who recently completed vaccinations.

The next hurdle concerns the Q corner of our Trifecta Checkup. The design leaves few hints as to what question(s) its creator intended to address. The age distribution of the U.S. population is useless unless it is compared to something.

One apparently informative comparison is the age distribution of those fully vaccinated versus the age distribution of all Americans. This is revealed by comparing the lengths of the dark blue bar and the gray bar. But is this comparison informative? It's telling me that people aged 50 to 64 account for ~25% of those who are fully vaccinated, and ~20% of all Americans. Because proportions necessarily add to 100%, this implies that other age groups have been less vaccinated. Duh! Isn't that the result of an age-based vaccination prioritization? During the first week of the vaccination campaign, one might expect close to 100% of all vaccinations to be in the highest age group while it was 0% for the other age groups.

This is a chart in search of a question. The 25% vs 20% comparison does not assist readers in making a judgement. Does this mean the vaccination campaign is working as expected, worse than expected or better than expected? The problem is the wrong baseline. The designer of this chart implies that the expected proportions should conform to the overall age distribution - but that clearly conflicts with CDC's initial prioritization of higher-risk age groups.


In my version of the chart, I illustrate the proportion of people in each age group who have been fully vaccinated.
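The two views are linked by a simple identity, sketched below: a group's vaccination rate equals the overall rate multiplied by the ratio of its share among the vaccinated to its share of the population. The 25% and 20% figures are the approximate values quoted above for ages 50 to 64; the overall rate of 50% and the function name are hypothetical, chosen only for illustration.

```python
def within_group_rate(share_of_vaccinated, share_of_population, overall_rate):
    """Proportion of an age group that is fully vaccinated.

    share_of_vaccinated: the group's share among all fully vaccinated people.
    share_of_population: the group's share of the whole population.
    overall_rate: fully vaccinated share of the whole population.
    """
    return overall_rate * share_of_vaccinated / share_of_population


# Approximate shares quoted in the text for ages 50 to 64.
share_among_vaccinated = 0.25
share_of_population = 0.20

# Hypothetical overall vaccination rate, NOT taken from the CDC chart.
overall_rate = 0.50

relative_to_overall = share_among_vaccinated / share_of_population
group_rate = within_group_rate(share_among_vaccinated, share_of_population, overall_rate)

print(f"vaccinated at {relative_to_overall:.2f} times the overall rate")
print(f"implied rate within the age group: {group_rate:.1%}")
```

The ratio of the two bars only says how a group compares with the overall rate; it takes the overall rate itself to recover the proportion shown in my version.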


Among those fully vaccinated, some did it within the most recent two weeks:



Elsewhere on the CDC site, one learns that on these charts, "fully vaccinated" means one shot of J&J or 2 shots of Pfizer or Moderna, without dealing with the 14-day window or other complications. Why do we think different definitions are used in different analyses? Story-first thinking, as I have explained here. When it comes to telling the story about vaccinations, the story is about the number of shots in arms. They want as big a number as possible, and abandon any criterion that decreases the count. When it comes to reporting on vaccine effectiveness, they want as small a number of cases as possible.






Check your presumptions while you're reading this chart about Israel's vaccination campaign

On July 30, Israel began administering third doses of mRNA vaccines to targeted groups of people. This decision was controversial since there is no science to support it. The policymakers do have educated guesses by experts based on best-available information. By science, I mean actual evidence. Since no one has previously been given three shots, there can be no data on which anyone can root such a decision. Nevertheless, the pandemic does not always give us time to collect relevant data, and so speculative analysis has found its calling.

Dvir Aran, at Technion, has been diligently tracking the situation in Israel on his Twitter. Ten days after July 30, he posted the following chart, which immediately led many commentators to bounce out of their seats crowning the third shot as a magic bullet. Notably, Dvir himself did not endorse such a claim. (See here to learn how other hasty conclusions by experts have fared.)

When you look at Dvir's chart, what do you see?


Possibly one of the following two things, depending on what concern you have in your head.

1) The red line sits far above the other two lines, showing that unvaccinated people are much more likely to get infected.

2) The blue line diverges from the green line almost immediately after the 3rd shots started getting into arms, showing that the 3rd shot is super effective.

If you take another moment to look, you might start asking questions, as many in Twitter world did. Dvir was startlingly efficient at answering these queries.

A) Does the green line represent people with 2 or 3 doses, or is it strictly 2 doses? Aron asked this question and got the answer (the former):


It's time to check our presumptions. When you read that chart, did you presume it's exactly 2 doses or did you presume it's 2 or 3 doses? Or did you immediately spot the ambiguity? As I said in this article, graphs attain efficiency at communication because the designer leverages unspoken rules - the chart conveys certain information without explicitly placing it on the chart. But this can backfire. In this case, I presumed the three lines to display three non-overlapping groups of people, and thus the green line indicates those with 2 doses but not 3. That presumption led me to misinterpret what's on the chart.

B) What is the denominator of the case rates? Is it literal - by that I mean, all unvaccinated people for the red line, and all people with 3 doses for the blue line? Or is the denominator the population of Israel, the same number for all three lines? Lukas asked this question, and got the answer (the former).


C) Since third shots are recommended for 60 year olds and over who were vaccinated at least 5 months ago, and most unvaccinated Israelis are below 60, this answer opens the possibility that the lines compare apples and oranges. Joe S. asked about this, and received an answer (all lines display only 60 year olds and over).


Jason P. asked, and learned that the 5-month-out criterion is immaterial since 90% of the vaccinated have already reached that time point.


D) We have even more presumptions. Like me, did you presume that the red line represents the "unvaccinated," meaning people who have not had any vaccine shots? If so, we may both be wrong about this. It has become the norm among vaccine researchers to lump "partially vaccinated" people with "unvaccinated", and call this combined group "unvaccinated". Here is an excerpt from a recent report from Public Health Ontario (link to PDF), which clearly states this unintuitive counting rule:


Notice that in this definition, someone who got infected within 14 days of the first shot is classified as an "unvaccinated" case and not a "partially vaccinated case".

In the following tweet, Dvir gave a hint of what he plotted:


In a previous analysis, he averaged the rates of people with 0 doses and 1 dose, which is equivalent to combining them and calling them unvaccinated. It's unclear to me what he did to the 1-dose subgroup in our featured chart - did it just vanish from the chart? (How people and cases are classified into these groups is a major factor in all vaccine effectiveness calculations - a topic I covered here. Unfortunately, most published reports do a poor job explaining what the analysts did).

E) Did you presume that all three lines are equally important? That's far from true. Since Israel is the world champion in vaccination, the bulk of the 60+ population form the green line. I asked Dvir and he responded that only 7.5%, or roughly 100K are unvaccinated.


That means 1.2 million people are part of the green line, 12 times higher. There are roughly 50 cases per day among unvaccinated, and 370 daily cases among those with 2 or 3 doses. In other words, vaccinated people account for almost 90% of all cases.

Yes, this is inevitable when over 90% of the age group have been vaccinated (but it was predictable on the first day someone blasted everywhere that real-world VE is proved by the fact that almost all new cases were in the unvaccinated.)

If your job is to minimize infections, you should be spending most of your time thinking about the 370 cases among vaccinated rather than the 50 cases among unvaccinated. If you halve the case rate, that would be a difference of 185 cases vs 25. In Israel, the vaccination campaign has already succeeded; it's time to look forward, which is exactly why they are re-focusing on the already vaccinated.
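The arithmetic in this section can be checked with a few lines of Python. This is a minimal sketch: the inputs are the rough figures quoted above (about 100K unvaccinated making up 7.5% of the 60+ group, and roughly 50 and 370 daily cases), and the variable names are mine.

```python
# Rough figures quoted in the text for Israelis aged 60+; all are approximations.
unvaccinated_people = 100_000
unvaccinated_share = 0.075      # 7.5% of the age group
daily_cases_unvaccinated = 50
daily_cases_vaccinated = 370    # people with 2 or 3 doses

total_people = unvaccinated_people / unvaccinated_share
vaccinated_people = total_people - unvaccinated_people

all_cases = daily_cases_vaccinated + daily_cases_unvaccinated
vaccinated_case_share = daily_cases_vaccinated / all_cases

# Halving a group's case rate removes half of that group's daily cases.
cases_avoided_vaccinated = daily_cases_vaccinated / 2
cases_avoided_unvaccinated = daily_cases_unvaccinated / 2

print(f"vaccinated people: about {vaccinated_people:,.0f}")
print(f"vaccinated share of cases: {vaccinated_case_share:.0%}")
print(f"cases avoided by halving rates: "
      f"{cases_avoided_vaccinated:.0f} vs {cases_avoided_unvaccinated:.0f}")
```

The group sizes drive the result: with roughly twelve times as many people in the vaccinated group, even a lower case rate produces far more cases in absolute terms.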


If what you worry about most is the effectiveness of the original two-dose regimen, Dvir's chart raises a puzzle. Ignore the blue line, and remember that the green line already includes everybody represented by the blue line.

In the following chart, I removed the blue line, and added reference lines in dashed purple that correspond to 25%, 50% and 75% vaccine effectiveness. The data plotted on this chart are unadjusted case rates. A 75% effective vaccine cuts case rate by three quarters.
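For readers who want to see how such reference lines are placed, here is a small sketch of the unadjusted calculation. It is the same simplistic arithmetic that this post cautions against, shown only to make the dashed lines concrete. The function names and the example rates (50 and 30 cases per 100,000) are hypothetical and are not read off Dvir's chart.

```python
def reference_rate(unvaccinated_rate, ve):
    """Case rate a vaccinated group would show if the vaccine were `ve` effective."""
    return unvaccinated_rate * (1 - ve)


def implied_ve(vaccinated_rate, unvaccinated_rate):
    """Unadjusted VE implied by two case rates measured on the same scale."""
    return 1 - vaccinated_rate / unvaccinated_rate


# Hypothetical unvaccinated case rate, in cases per 100,000 per day.
unvaccinated_rate = 50.0

for ve in (0.25, 0.50, 0.75):
    line = reference_rate(unvaccinated_rate, ve)
    print(f"{ve:.0%} effective -> reference line at {line:.1f} per 100,000")

# Hypothetical vaccinated case rate on the same scale.
print(f"implied unadjusted VE: {implied_ve(30.0, unvaccinated_rate):.0%}")
```

Because each reference line is a fixed fraction of the unvaccinated line, it rises and falls with the red line rather than sitting at a constant height.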


This chart shows the 2-dose mRNA vaccine was nowhere near 90% effective. (As regular readers know, I don't endorse this simplistic calculation and have outlined the problems here, but this style of calculation keeps getting published and passed around. Those who use it to claim real-world studies confirm prior clinical trial outcomes can either (a) insist on using it and retract their earlier conclusions, or (b) admit that such a calculation was, and is, a bad take.)

Also observe how the vaccinated (green) line is moving away from the unvaccinated (red) line. The vaccine apparently is becoming more effective, which runs counter to the trend used by the Israeli government to justify third doses. This improvement also precedes the start of the third-shot campaign. When the analytical method is bad, it generates all sorts of spurious findings.


As Dvir said, it is premature to comment on the third doses based on 10 days of data. For one thing, the vaccine developers insist that their vaccines must be given 14 days to work. In a typical calculation, all of the cases in the blue line fall outside the case-counting window. The effective number of cases that would be attributed to the 3-dose group right now is zero, and the vaccine effectiveness using the standard methodology is 100%, even better than shown in the chart.

There is an alternative interpretation of this graph. Statisticians call this the selection effect. On July 30, the blue line split out of the green: some people were selected to receive the 3rd dose - this includes an official selection (the government makes certain subgroups eligible) as well as a self-selection (within the eligible subgroup, certain people decide to get the 3rd shot earlier.) If those who are less exposed to the virus, or more risk averse, get the shots first, then all that is happening may be that we have split off a high VE subgroup from the green line. Even if the third shot were useless, the selection effect itself could explain the gap.
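The selection effect can be illustrated with a toy calculation. In this sketch, every number is hypothetical: ten people whose underlying daily risk rises evenly, a shot that does nothing at all, and the three lowest-risk people getting it first. For simplicity it compares the people who got the shot with everyone else, whereas the green line on the chart also includes the blue-line people.

```python
def apparent_ve(risks, n_boosted, shot_effect=0.0):
    """Apparent VE of an extra shot when the lowest-risk people get it first.

    risks: each person's daily infection risk without the extra shot.
    n_boosted: number of people (lowest risk first) who receive the shot.
    shot_effect: true fractional reduction in risk from the shot (0 = useless).
    """
    ordered = sorted(risks)
    boosted = [r * (1 - shot_effect) for r in ordered[:n_boosted]]
    rest = ordered[n_boosted:]
    rate_boosted = sum(boosted) / len(boosted)
    rate_rest = sum(rest) / len(rest)
    return 1 - rate_boosted / rate_rest


# Hypothetical population: risk rises evenly from 1 to 10 per 10,000 per day.
risks = [i / 10_000 for i in range(1, 11)]

# The shot is useless (shot_effect=0), yet the comparison shows a large "effect".
print(f"apparent VE with selection: {apparent_ve(risks, n_boosted=3):.0%}")

# If everyone has the same risk, the useless shot shows no effect.
same_risk = [5 / 10_000] * 10
print(f"apparent VE without selection: {apparent_ve(same_risk, n_boosted=3):.0%}")
```

In the toy population, the whole gap comes from who was selected, not from what the shot did. Real data would mix some of this with any genuine effect.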

Statistics is about grays. It's not either-or. It's usually some of each. If you feel like Groundhog Day, you're getting the picture. When they rolled out two doses, we lived through an optimistic period in which most experts rejoiced about 90-100% real-world effectiveness, and then as more people got vaccinated, the effect washed away. The selection effect gradually disappears when vaccination becomes widespread. Are we starting a new cycle of hope and despair? We'll find out soon enough.

Ranking data provide context but can also confuse

This dataviz from the Economist had me spending a lot of time clicking around - which means it is a success.


The graphic presents four measures of wellbeing in society - life expectancy, infant mortality rate, murder rate and prison population. The primary goal is to compare nations across those metrics. The focus is on comparing how certain nations (or subgroups) rank against each other, as indicated by the relative vertical position.

The Economist staff has a particular story to tell about racial division in the US. The dotted bars represent the U.S. average. The colored bars are the averages for Hispanic, white and black Americans. The wider the gap between the colored bars, the more varied are the experiences across American racial groups.

The chart shows that the racial gap of life expectancy is the widest. For prison population, the U.S. and its racial subgroups occupy many of the lowest (i.e. least desirable) ranks, with the smallest gap in ranking.


The primary element of interactivity is hovering on a bar, which then highlights the four bars corresponding to the particular nation selected. Here is the picture for Thailand:


According to this view of the world, Thailand is a close cousin of the U.S. On each metric, the Thai value clings pretty near the U.S. average and sits within the range by racial groups. I'm surprised to learn that the prison population in Thailand is among the highest in the world.

Unfortunately, this chart form doesn't facilitate comparing Thailand to a country other than the U.S., as one can highlight only one country at a time.


While the main focus of the chart is on relative comparison through ranking, the reader can extract absolute difference by reading the lengths of the bars.

This is a close-up of the bottom of the prison population metric:

The length of each bar displays the numeric data. The red line is an outlier in this dataset. Black Americans suffer an incarceration rate that is almost three times the national average. Even white Americans (blue line) are imprisoned at a rate higher than most countries around the world.

As noted above, the prison population metric exhibits the smallest gap between racial subgroups. This chart is a great example of why ranking data frequently hide important information. The small gap in ranking masks the extraordinary absolute difference in incarceration rates between white and black America.

The difference between rank #1 and rank #2 is enormous.

The opposite situation appears for life expectancy. The life expectancy values are bunched up especially at the top of the scale. The absolute difference between Hispanic and black America is 82 - 75 = 7 years, which looks small because the axis starts at zero. On a ranking scale, Hispanic is roughly in the top 15% while black America is just above the median. The relative difference is huge.

For life expectancy, ranking conveys the view that even a 7-year difference is a big deal because the countries are tightly bunched together. For prison population, ranking shows the view that a multiple fold difference is "unimportant" because a 20-0 blowout and a 10-0 blowout are both heavy defeats.


Whenever you transform numeric data to ranks, remember that you are artificially treating the gap between each value and the next value as a constant, even when the underlying numeric gaps show wide variance.
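A short sketch makes the point concrete. The rates below are hypothetical numbers invented for illustration, not values from the Economist chart, and the helper names are mine.

```python
def to_ranks(values):
    """Rank values from smallest (rank 1) to largest. Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def gaps(sorted_values):
    """Differences between each value and the next one in a sorted list."""
    return [b - a for a, b in zip(sorted_values, sorted_values[1:])]


# Hypothetical rates per 100,000 people; the last two are far from the pack.
rates = [80, 95, 110, 400, 1200]
ranks = to_ranks(rates)

print("gaps between values:", gaps(sorted(rates)))
print("gaps between ranks: ", gaps(sorted(ranks)))
```

The gaps between adjacent values range from small to very large, while the gaps between adjacent ranks are all exactly one. That is the information a ranking throws away.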






Stumped by the ATM

The neighborhood bank recently installed brand new ATMs, with tablet monitors and all that jazz. Then, I found myself staring at this screen:


I wanted to withdraw $100. I ordinarily love this banknote picker because I can get the $5, $10, $20 notes, instead of $50 and $100 that come out the slot when I don't specify my preference.

Something changed this time. I found myself wondering which row represents which note. For my non-U.S. readers, you may not know that all our notes are the same size and color. The screen resolution wasn't great and I had to squint really hard to see the numbers on those banknote images.

I suppose if I grew up here, I might be able to tell the note values from the figureheads. This is an example of a visualization that makes my life harder!

I imagine that the software developer might be a foreigner. I imagine the developer might live in Europe. In this case, the developer might have this image in his/her head:


Euro banknotes are heavily differentiated - by color, by image, by height and by width. The numeric value also occupies a larger proportion of the area. This makes a lot of sense.

I like designs to be adaptable. Switching data from one country to another should not alter the design. Switching data at different time scales should not affect the design. This banknote picker UI is not adaptable across countries.


Once I figured out the note values, I learned another reason why I couldn't tell which row is which note. It's because one note is absent.


Where is the $10 note? That and the twenty are probably the most frequently used. I am also surprised people want $1 notes from an ATM. But I assume the bank knows something I don't.

Same data + same chart form = same story. Maybe.

We love charts that tell stories.

Some people believe that if they situate the data in the right chart form, the stories reveal themselves.

Some people believe for a given dataset, there exists a best chart form that brings out the story.

An implication of these beliefs is that the story is immutable, given the dataset and the chart form.

If you use the Trifecta Checkup, you already know I don't subscribe to those ideas. That's why the Trifecta has three legs, the third of which is the question - which is related to the message or the story.


I came across the following chart by Statista, illustrating the growth in Covid-19 cases from the start of the pandemic to this month. The underlying data are collected by WHO and cover the entire globe. The data are grouped by regions.


The story of this chart appears to be that the world moves in lock step, with each region behaving more or less the same.

If you visit the WHO site, they show a similar chart:


On this chart, the regions at the bottom of the graph (esp. Southeast Asia in purple) clearly do not follow the same time patterns as Americas (orange) or Europe (green).

What we're witnessing is: same data, same chart form, different stories.

This is a feature, not a bug, of the stacked area chart. The story is driven largely by the order in which the pieces are stacked. In the Statista chart, the largest pieces are placed at the bottom while for WHO, the order is exactly reversed.

(There are minor differences which do not affect my argument. The WHO chart omits the "Other" category which accounts for very little. Also, the Statista chart shows the smoothed data using 7-day averaging.)

In this example, the order chosen by WHO preserves the story while the order chosen by Statista wipes it out.


What might be the underlying question of someone who makes this graph? Perhaps it is to identify the relative prevalence of Covid-19 in different regions at different stages of the pandemic.

Emphasis on the word "relative". Instead of plotting absolute number of cases, I consider plotting relative number of cases, that is to say, the proportion of cases in each region at given times.

This leads to a stacked area percentage chart.


In this side-by-side view, you see that this form is not affected by flipping the order of the regions. Both charts say the same thing: that there were two waves in Europe and the Americas that dwarfed all other regions.
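The calculation behind a percentage chart is simple, as the sketch below shows. The regional counts are hypothetical and the function names are mine. Each region's count is divided by the total at that time point, so the stack always tops out at 100%, and a band's thickness is the same whichever way the regions are ordered.

```python
def shares(counts_by_region):
    """Convert absolute counts to each region's share of the total at each time."""
    regions = list(counts_by_region)
    n_times = len(counts_by_region[regions[0]])
    totals = [sum(counts_by_region[r][t] for r in regions) for t in range(n_times)]
    return {
        r: [counts_by_region[r][t] / totals[t] for t in range(n_times)]
        for r in regions
    }


def stack(share_by_region, order):
    """Lower and upper edge of each band when regions are stacked in `order`."""
    n_times = len(share_by_region[order[0]])
    running = [0.0] * n_times
    bands = {}
    for region in order:
        top = [running[t] + share_by_region[region][t] for t in range(n_times)]
        bands[region] = (running, top)
        running = top
    return bands


# Hypothetical case counts for three regions at three time points.
counts = {
    "Americas": [10, 40, 20],
    "Europe": [5, 30, 60],
    "Southeast Asia": [5, 10, 20],
}

pct = shares(counts)
order = ["Americas", "Europe", "Southeast Asia"]
bands = stack(pct, order)
flipped = stack(pct, order[::-1])

for region in order:
    print(region, [f"{s:.1%}" for s in pct[region]])
```

Flipping the order moves a band up or down the stack, but the outer edges of the chart stay flat at 0% and 100%, so the regions at either end are always read against a straight baseline.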



Dreamy Hawaii

I really enjoyed this visual story by ProPublica and Honolulu Star-Advertiser about the plight of beaches in Hawaii (link).

The story begins with a beautiful invitation:


This design reminds me of Vimeo's old home page. (It no longer looks like this today but this screenshot came from when I was the data guy there.) In both cases, the images are not static but moving.


The tour de force of this visual story is an annotated walk along the Lanikai Beach. Here is a snapshot at one of the stops:


This shows a particular homeowner who, according to documents, was permitted to rebuild a destroyed seawall even though officials were supposed to disallow reconstruction in order to protect beaches from eroding. The property is marked on the map above. The image inside the box is a gif showing waves smashing the seawall.

As the reader scrolls down, the image window runs through a carousel of gifs of houses along the beach. The images are synchronized to the reader's progress along the shore. The narrative makes stops at specific houses at which point a text box pops up to provide color commentary.


The erosion crisis is shown in this pair of maps.


There's some fancy work behind the scenes to patch together images, and estimate the boundaries of the beaches.


The following map is notable for its simplicity. There are no unnecessary details and labels. We don't need to know the name of every street or a specific restaurant. Removing excess details makes readers focus on the informative parts. 


Clicking on the dots brings up more details.


Enjoy the entire story here.

Convincing charts showing containment measures work

The disorganized nature of the U.S. response to the coronavirus pandemic has created a sort of natural experiment that allows data journalists to explore important scientific questions, such as the impact of containment measures on cases and hospitalizations. This New York Times article represents the best of such work.

The key finding of the analysis is beautifully captured by this set of scatter plots:


Each dot is a state. The cases (left plot) and hospitalizations (right plot) are plotted against the severity of containment measures for November. The negative correlation is unmistakable: the more containment measures taken, the lower the counts.

There are a few features worth noting.

The severity index came from a group at Oxford, and is a number between 0 and 100. The journalists decided to leave out the numerical labels, instead simply showing More and Fewer. This significantly reduces processing time. Readers won't be able to understand the index values anyway without reading the manual.

The index values are doubly encoded. They are first encoded by the location on the horizontal axis and redundantly encoded on the blue-red scale. Ordinarily, I do not like redundant encoding because the reader might assume a third dimension exists. In this case, I had no trouble with it.

The easiest way to see the effect is to ignore the muddy middle and focus on the two ends of the severity index. Those states with the fewest measures - South Dakota, North Dakota, Iowa - are the worst in cases and hospitalizations while those states with the most measures - New York, Hawaii - are among the best. This comparison is similar to what is frequently done in scientific studies, e.g. when they say coffee is good for you, they typically compare heavy drinkers (4 or more cups a day) with non-drinkers, ignoring the moderate and light drinkers.

Notably, there is quite a bit of variability for any level of containment measures - roughly 50 cases per 100,000, and 25 hospitalizations per 100,000. This indicates that containment measures are not sufficient to explain the counts. For example, the hospitalization statistic is affected by the stock of hospital beds, which I assume differs by state.

Whenever we use a scatter plot, we run the risk of xyopia. This chart form invites readers to explain an outcome (y-axis values) using one explanatory variable (on x-axis). There is an assumption that all other variables are unimportant, which is usually false.


Because of the variability, the horizontal scale has meaningless precision. The next chart cures this by grouping the states into three categories: low, medium and high level of measures.


This set of charts extends the time window back to March 1. For the designer, this creates a tricky problem - because states adapt their policies over time. As indicated in the subtitle, the grouping is based on the average severity index since March, rather than just November, as in the scatter plots above.


The interplay between policy and health indicators is captured by connected scatter plots, of which the Times article included a few examples. Here is what happened in New York:


Up until April, the policies were catching up with the cases. The policies tightened even after the cases per capita started falling. Then, policies eased a little, and cases started to spike again.

The Note tells us that the containment severity index is time shifted to reflect a two-week lag in effect. So, the case count on May 1 is not paired with the containment severity index of May 1 but of April 17.
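The pairing rule can be sketched as follows. This assumes the lag is exactly 14 days (the article's note may define it slightly differently) and assumes the year is 2020; the function names and the sample index values are hypothetical.

```python
from datetime import date, timedelta

# Assumption: "two-week lag" is read as exactly 14 days.
LAG = timedelta(days=14)


def paired_index_date(case_date, lag=LAG):
    """Date of the severity index that is plotted against a given case date."""
    return case_date - lag


def pair_with_lag(cases_by_date, index_by_date, lag=LAG):
    """Pair each case count with the severity index from `lag` earlier.

    Case dates with no index value available at the shifted date are skipped.
    Returns a list of (case_date, index_value, case_count) tuples.
    """
    pairs = []
    for case_date in sorted(cases_by_date):
        shifted = case_date - lag
        if shifted in index_by_date:
            pairs.append((case_date, index_by_date[shifted], cases_by_date[case_date]))
    return pairs


# Hypothetical values, for illustration only.
cases = {date(2020, 5, 1): 120, date(2020, 5, 2): 115}
index = {date(2020, 4, 17): 80.0, date(2020, 5, 1): 75.0}

print(paired_index_date(date(2020, 5, 1)))
print(pair_with_lag(cases, index))
```

Only the May 1 case count gets a partner here, because the sample index has no value for April 18. A real dataset with daily index values would pair every date after the first two weeks.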


You can find the full article here.




Why you should expunge the defaults from Excel or (insert your favorite graphing program)

Yesterday, I posted the following chart in the post about Cornell's Covid-19 case rate after re-opening for in-person instruction.


This is an edited version of the chart used in Peter Frazier's presentation.


The original chart carries with it the burden of Excel defaults.

What did I change and why?

I switched away from the default color scheme, which ignores the relationships between the lines. In particular, the key comparison on this chart should be the actual case rate versus the nominal case rate. In addition, the three lines at the top are related as they all come from the same underlying mathematical model. I used the same color but different shades.

Also, instead of placing the legend as far away from the data labels as possible, I moved the line labels next to the data labels.

Instead of daily date labels, I moved to weekly labels, and set the month names on a separate level than the day names.

The dots were removed from the top three lines but I'd have retained them, perhaps with some level of transparency, if I spent more time making the edits. I'd definitely keep the last dot to make it clear that the blue lines contain one extra dot.


Every graphing program has defaults, typically computed by some algorithm tuned to the average chart. Don't settle for the average chart. Get rid of any default setting that slows down understanding.