To explain or to eliminate, that is the question

Today, I take a look at another project from Ray Vella's class at NYU.

Rich Get Richer Assigment 2 top

(The above image is a honeypot for "smart" algorithms that don't know how to handle image dimensions which don't fit their shadow "requirement". Human beings should proceed to the full image below.)

As explained in this post, the students visualized data about regional average incomes in a selection of countries. It turns out that remarkable differences persist in regional income disparity between countries, almost all of which are more advanced economies.

Rich Get Richer Assigment 2 Danielle Curran_1

The graphic is by Danielle Curran.

I noticed two smart decisions.

First, she came up with a different main metric for gauging regional disparity, landing on a metric that is simple to grasp.

Based on hints given on the chart, I surmised that Danielle computed the change in per-capita income in the richest and poorest regions separately for each country between 2000 and 2015. These regional income growth values are expressed in currency, not indiced. Then, she computed the ratio of these growth rates, for each country. The end result is a simple metric for each country that describes how fast income has been growing in the richest region relative to the poorest region.

One of the challenges of this dataset is the complex indexing scheme (discussed here). Carlos' solution keeps the indices but uses design to facilitate comparisons. Danielle avoids the indices altogether.

The reader is relieved of the need to make comparisons, and so can focus on differences in magnitude. We see clearly that regional disparity is by far the highest in the U.K.

***

The second smart decision Danielle made is organizing the countries into clusters. She took advantage of the horizontal axis which does not encode any data. The branching structure places different clusters of countries along the axis, making it simple to navigate. The locations of these clusters are cleverly aligned to the map below.

***

Danielle's effort is stronger on communications while Carlos' effort provides more information. The key is to understand who your readers are. What proportion of your readers would want to know the values for each country, each region and each year?

***

A couple of suggestions

a) The reference line should be set at 1, not 0, for a ratio scale. The value of 1 happens when the richest region and the poorest region have identical per-capita incomes.

b) The vertical scale should be fixed.


Displaying convoluted indices

I reviewed another batch of projects from Ray Vella's class at NYU. The following piece by Carlos Lasso made an impression on me. There are no pyrotechnics but he made one decision that added a lot of clarity to the graphic.

The Rich get Richer - Carlos Lasso

The underlying dataset gauges the income disparity of regions within nine countries. The richest and the poorest regions are selected for each country. Two time points are shown. Altogether, there are 9x2x2 = 36 numbers.

***

Let's take a deeper look at these numbers. Notice they are not in dollars, or any kind of currency, despite being about incomes. The numbers are index values, relative to 100. What does the reference level of 100 represent?

The value of 100 crosses every bar of the chart so that 100 has meaning in each country and each year. In fact, there are 18 definitions of 100 in this chart with 36 numbers, one for each country-year pair. The average national income is set to 100 for each country in each year. This is a highly convoluted indexing strategy.

The following chart is a re-visualization of the bottom part of Carlos' infographic.

Junkcharts_richricher2021_2columns

I shifted the scale of the horizontal axis. The value of zero does not hold special meaning in Carlos' chart. I subtracted 100 from the relative regional income indices, thus all regions with income above the average have positive values while those below the national average have negative values. (There are other challenges with the ratio scale, which I'll skip over in this post. The minimum value is -100 while the maximum value can be very large.)

The rescaling is not really the point of this post. To see what Carlos did, we have to look at the example shown in class. The graphic which the students were asked to improve has the following structure:

Junkcharts_richricher2021_1column

This one-column structure places four bars beside each country, grouped by year. Carlos pulled the year dimension out, and showed the same dataset in two columns.

This small change makes a great difference in ease of comprehension. Carlos' version unpacks the two key types of comparisons one might want to make: trend within a given country (horizontal comparison) and contrast between countries in a given year (vertical comparison).

***

I always try to avoid convoluted indexing. The cost of using such indices is the big how-to-read-this box.


Asymmetry and orientation

An author in Significance claims that a single season of Premier League football without live spectators is enough to prove that the so-called home field advantage is really a live-spectator advantage.

The following chart depicts the data going back many seasons:

Significance_premierleaguehomeadvantage_chart_2

I find this bar chart challenging.

It plots the ratio of home wins to away wins using an odds scale, which is not intuitive. The odds scale (probability of success divided by probability of failure) runs from 0 to positive infinity, with 1 being a special value indicating equal odds. But all the values for which away wins exceed home wins are squeezed into the interval between 0 and 1 while the values for which home wins exceed away wins are laid out between 1 and infinity. So it's an inherently asymmetric graphic for a symmetric formula.

The section labeled "more away wins than home wins" are filled with red bars for all those seasons with positive home field advantage while the most recent season, the outlier, has a shorter bar in that section than the rest.

Here's an alternative view:

Redo_significance_premierleaguehomeawaywins_2

I have incorporated dual axes here - but both axes are different only by scaling. There are 380 games in a Premier League season so the percentage scale is just a re-expression of the counts.

 

 


Tongue in cheek but a master stroke

Andrew jumped on the Benford bandwagon to do a tongue-in-cheek analysis of numbers in Hollywood movies (link). The key graphic is this:

Gelman_hollywood_benford_2-1024x683

Benford's Law is frequently invoked to prove (or disprove) fraud with numbers by examining the distribution of first digits. Andrew extracted movies that contain numbers in their names - mostly but not always sequences of movies with sequels. The above histogram (gray columns) are the number of movies with specific first digits. The red line is the expected number if Benford's Law holds. As typical of such analysis, the histogram is closely aligned with the red line, and therefore, he did not find any fraud. 

I'll blog about my reservations about Benford-style analysis on the book blog later - one quick point is: as with any statistical analysis, we should say there is no statistical evidence of fraud (more precisely, of the kind of fraud that can be discovered using Benford's Law), which is different from saying there is no fraud.

***

Andrew also showed a small-multiples chart that breaks up the above chart by movie groups. I excerpted the top left section of the chart below:

Gelman_smallmultiples_benford

The genius in this graphic is easily missed.

Notice that the red lines (which are the expected values if Benford Law holds) appear identical on every single plot. And then notice that the lines don't represent the same values.

It's great to have the red lines look the same everywhere because they represent the immutable Benford reference. Because the number of movies is so small, he's plotting counts instead of proportions. If you let the software decide on the best y-axis range for each plot, the red lines will look different on different charts!

You can find the trick in the R code from Gelman's blog.

First, the maximum value of each plot is set to the total number of observations. Then, the expected Benford proportions are converted into expected Benford counts. The first Benford count is then shown against an axis topping out at the total count, and thus, relatively, what we are seeing are the Benford proportions. Thus, every red line looks the same despite holding different values.

This is a master stroke.

 

 

 


Working hard at clarity

As I am preparing another blog post about the pandemic, I came across the following data graphic, recently produced by the CDC for a vaccine advisory board meeting:

CDC_positivevaccineintent

This is not an example of effective visual communications.

***

For one thing, readers are directed to scour the footnotes to figure out what's going on. If we ignore those for the moment, we see clusters of bubbles that have remained pretty stable from December 2020 to August 2021. The data concern some measure of Americans' intent to take the COVID-19 vaccine. That much we know.

There may have been a bit of an upward trend between January and May, although if you were shown the clusters for December, February and April, you'd think the trend's been pretty flat. 

***

But those colors? What could they represent? You'd surely have to fish this one out of the footnotes. Specifically, this obtuse sentence: "Surveys with multiple time points are shown with the same color bubble for each time point." I had to read it several times. I think it simply means "Color represents the pollster." 

Then it adds: "Surveys with only one time point are shown in gray." which simply means "All pollsters who have only one entry in the dataset are grouped together and shown in gray."

Another problem with this chart is over-plotting. Look at the July cluster. It's impossible to tell how many polls were conducted in July because the circles pile on top of one another. 

***

The appearance of the flat trend is a result of two unfortunate decisions made by the designer. If I retained the chart form, I'd have produced something that looks like this:

Junkcharts_redo_cdcvaccineintent_sameform

The first design choice is to expand the vertical axis to range from 0% to 100%. This effectively squeezes all the bubbles into a small range.

Junkcharts_redo_cdcvaccineintent_startatzero

The second design choice is to enlarge the bubbles causing copious amount of overlapping. 

Junkcharts_redo_cdcvaccineintent_startatzero_bigdots

In particular, this decision blows up the Pew poll (big pink bubble) that contained 10 times the sample size of most of the other polls. The Pew outcome actually came in at 70% but the top of the pink bubble extends to over 80%. Because of this, the outlier poll of December 2020 - which surprisingly printed the highest number of all polls in the entire time window - no longer looks special. 

***

Now, let's see what else we can do to enhance this chart. 

I don't like how bubble size is used to encode the sample size. It creates a weird sensation for anyone who's familiar with sampling errors, and confidence regions. The Pew poll with 10 times the sample size is the most reliable poll of them all. Reliability means the error bars around the Pew poll outcome is the smallest of them all. I tend to think of the area around a point estimate as showing the sampling error so the Pew poll would be a dot, showing the high precision of that estimate. 

But that won't work because larger bubbles catch more of the reader's attention. So, in the following version, all dots have the same size. I encode reliability in the opacity of the color. The darker dots are polls that are more reliable, that have larger sample sizes.

Junkcharts_redo_cdcvaccineintent_opacity

Two of the pollsters have more frequent polling than others. In this next version, I highlighted those two, which reveals the trend better.

Junkcharts_redo_cdcvaccineintent_opacitywithlines

 

 

 


Check your presumptions while you're reading this chart about Israel's vaccination campaign

On July 30, Israel began administering third doses of mRNA vaccines to targeted groups of people. This decision was controversial since there is no science to support it. The policymakers do have educated guesses by experts based on best-available information. By science, I mean actual evidence. Since no one has previously been given three shots, there can be no data on which anyone can root such a decision. Nevertheless, the pandemic does not always give us time to collect relevant data, and so speculative analysis has found its calling.

Dvir Aran, at Technion, has been diligently tracking the situation in Israel on his Twitter. Ten days after July 30, he posted the following chart, which immediately led many commentators to bounce out of their seats crowning the third shot as a magic bullet. Notably, Dvir himself did not endorse such a claim. (See here to learn how other hasty conclusions by experts have fared.)

When you look at Dvir's chart, what do we see?

Dvir_aran_chart

Possibly one of the following two things, depending on what concern you have in your head.

1) The red line sits far above the other two lines, showing that unvaccinated people are much more likely to get infected.

2) The blue line diverges from the green line almost immediately after the 3rd shots started getting into arms, showing that the 3rd shot is super effective.

If you take another moment to look, you might start asking questions, as many in Twitter world did. Dvir was startlingly efficient at answering these queries.

A) Does the green line represent people with 2 or 3 doses, or is it strictly 2 doses? Aron asked this question and got the answer (the former):

AronBrand_israelcases_twoorthreedoses

It's time to check our presumptions. When you read that chart, did you presume it's exactly 2 doses or did you presume it's 2 or 3 doses? Or did you immediately spot the ambiguity? As I said in this article, graphs attain efficiency at communication because the designer leverages unspoken rules - the chart conveys certain information without explicitly placing it on the chart. But this can backfire. In this case, I presumed the three lines to display three non-overlapping groups of people, and thus the green line indicates those with 2 doses but not 3. That presumption led me to misinterpret what's on the chart.

B) What is the denominator of the case rates? Is it literal - by that I mean, all unvaccinated people for the red line, and all people with 3 doses for the blue line? Or is the denominator the population of Israel, the same number for all three lines? Lukas asked this question, and got the answer (the former).

Lukas_denominator

C) Since third shots are recommended for 60 year olds and over who were vaccinated at least 5 months ago, and most unvaccinated Israelis are below 60, this answer opens the possibility that the lines compare apples and oranges. Joe. S. asked about this, and received an answer (all lines display only 60 year olds and over.)

Joescholar_basepopulationquestion

Jason P. asked, and learned that the 5-month-out criterion is immaterial since 90% of the vaccinated have already reached that time point.

JasonPogue_5monthsout

D) We have even more presumptions. Like me, did you presume that the red line represents the "unvaccinated," meaning people who have not had any vaccine shots? If so, we may both be wrong about this. It has become the norm by vaccine researchers to lump "partially vaccinated" people with "unvaccinated", and call this combined group "unvaccinated". Here is an excerpt from a recent report from Public Health Ontario (link to PDF), which clearly states this unintuitive counting rule:

Ontario_case_definition

Notice that in this definition, someone who got infected within 14 days of the first shot is classified as an "unvaccinated" case and not a "partially vaccinated case".

In the following tweet, Dvir gave a hint of what he plotted:

Dvir_group_definition

In a previous analysis, he averaged the rates of people with 0 doses and 1 dose, which is equivalent to combining them and calling them unvaccinated. It's unclear to me what he did to the 1-dose subgroup in our featured chart - did it just vanish from the chart? (How people and cases are classified into these groups is a major factor in all vaccine effectiveness calculations - a topic I covered here. Unfortunately, most published reports do a poor job explaining what the analysts did).

E) Did you presume that all three lines are equally important? That's far from true. Since Israel is the world champion in vaccination, the bulk of the 60+ population form the green line. I asked Dvir and he responded that only 7.5%, or roughly 100K are unvaccinated.

DvirAran_proportionofunvaccinated

That means 1.2 million people are part of the green line, 12 times higher. There are roughly 50 cases per day among unvaccinated, and 370 daily cases among those with 2 or 3 doses. In other words, vaccinated people account for almost 90% of all cases.

Yes, this is inevitable when over 90% of the age group have been vaccinated (but it is predictable on the first day someone blasted everywhere that real-world VE is proved by the fact that almost all new cases were in the unvaccinated.)

If your job is to minimize infections, you should be spending most of your time thinking about the 370 cases among vaccinated than the 50 cases among unvaccinated. If you halve the case rate, that would be a difference of 185 cases vs 25. In Israel, the vaccination campaign has already succeeded; it's time to look forward, which is exactly why they are re-focusing on the already vaccinated.

***

If what you worry about most is the effectiveness of the original two-dose regimen, Dvir's chart raises a puzzle. Ignore the blue line, and remember that the green line already includes everybody represented by the blue line.

In the following chart, I removed the blue line, and added reference lines in dashed purple that correspond to 25%, 50% and 75% vaccine effectiveness. The data plotted on this chart are unadjusted case rates. A 75% effective vaccine cuts case rate by three quarters.

Junkcharts_dviraran_israel_threeshotschart

This chart shows the 2-dose mRNA vaccine was nowhere near 90% effective. (As regular readers know, I don't endorse this simplistic calculation and have outlined the problems here, but this style of calculation keeps getting published and passed around. Those who use it to claim real-world studies confirm prior clinical trial outcomes can either (a) insist on using it and retract their earlier conclusions, or (b) admit that such a calculation was, and is, a bad take.)

Also observe how the vaccinated (green) line is moving away from the unvaccinated (red) line. The vaccine apparently is becoming more effective, which runs counter to the trend used by the Israeli government to justify third doses. This improvement also precedes the start of the third-shot campaign. When the analytical method is bad, it generates all sorts of spurious findings.

***

As Dvir said, it is premature to comment on the third doses based on 10 days of data. For one thing, the vaccine developers insist that their vaccines must be given 14 days to work. In a typical calculation, all of the cases in the blue line fall outside the case-counting window. The effective number of cases that would be attributed to the 3-dose group right now is zero, and the vaccine effectiveness using the standard methodology is 100%, even better than shown in the chart.

There is an alternative interpretation of this graph. Statisticians call this the selection effect. On July 30, the blue line split out of the green: some people were selected to receive the 3rd dose - this includes an official selection (the government makes certain subgroups eligible) as well as a self-selection (within the eligible subgroup, certain people decide to get the 3rd shot earlier.) If those who are less exposed to the virus, or more risk averse, get the shots first, then all that is happening may be that we have split off a high VE subgroup from the green line. Even if the third shot were useless, the selection effect itself could explain the gap.

Statistics is about grays. It's not either-or. It's usually some of each. If you feel like Groundhog Day, you're getting the picture. When they rolled out two doses, we lived through an optimistic period in which most experts rejoiced about 90-100% real-world effectiveness, and then as more people get vaccinated, the effect washed away. The selection effect gradually disappears when vaccination becomes widespread. Are we starting a new cycle of hope and despair? We'll find out soon enough.


Hanging things on your charts

The Financial Times published the following chart that shows the rollout of vaccines in the U.K.

Ft_astrazeneca_uk_rollout

(I can't find the online link to the article. The article is titled "AstraZeneca and Oxford face setbacks and success as battle enters next phase", May 29/30 2021.)

This chart form is known as a "streamgraph", and it is a stacked area chart in disguise. 

The same trick can be applied to a column chart. See the "hanging" column chart below:

Junkcharts_hangingcolumns

The two charts show exactly the same data. The left one roots the columns at the bottom. The right one aligns the middle of the columns. 

I have rarely found these hanging charts useful. The realignment makes it harder to compare the sizes of the different column segments. On the normal stacked column chart, the yellow segments are the easiest to compare because they share the same base level. Even this is taken away from the reader on the right side.

Note also that the hanging version does not admit a vertical axis

The same comments apply to the streamgraph.

***

Nevertheless, I was surprised that the FT chart shown above actually works. The main message I learned was that initially U.K. primarily rolled out AstraZeneca and, to a lesser extent, Pfizer, shots while later, they introduced other vaccines, including Johnson & Johnson, Novavax, CureVac, Moderna, and "Other". 

I can also see that the supply of AstraZeneca has not changed much through the entire time window. Pfizer has grown to roughly the same scale as AstraZeneca. Moderna remains a small fraction of total shots. 

I can even roughly see that the total number of vaccinations has grown about six times from start to finish. 

That's quite a lot for one chart, so job well done!

There is one problem with the FT chart. It should have labelled end of May as "today". Half the chart is history, and the other half is the future.

***

For those following Covid-19 news, the FT chart is informative in a different way.

There is a misleading statement going around blaming the U.K.'s recent surge in cases on the Astrazeneca vaccine, claiming that the U.K. mostly uses AZ. This chart shows that from the start, about a third of the shots administered in the U.K. are Pfizer, and Pfizer's share has been growing over time. 

U.K. compared to some countries mostly using mRNA vaccines

Ourworldindata_cases

U.K. is almost back to the winter peak. That's because the U.K. is serious about counting cases. Look at the state of testing in these countries:

Ourworldindata_tests

What's clear about the U.S. case count is that it is kept low by cutting the number of tests by two-thirds, thus, our data now is once again severely biased towards severe cases. 

We can do a back-of-the-envelope calculation. The drop in testing may directly lead to a proportional drop in reported cases, thus removing 500 (asymptomatic, or mild) cases per million from the case count. The case count goes below 250 per million so the additional 200 or so reduction is due to other reasons such as vaccinations.


Start at zero improves this chart but only slightly

The following chart was forwarded to me recently:

Average_female_height

It's a good illustration of why the "start at zero" rule exists for column charts. The poor Indian lady looks extremely short in this women's club. Is the average Indian woman really half as tall as the average South African woman? (Surely not!)

Junkcharts_redo_womenheight_columnThe problem is only superficially fixed by starting the vertical axis at zero. Doing so highlights the fact that the difference in average heights is but a fraction of the average heights themselves. The intra-country differences are squashed in such a representation - which works against the primary goal of the data visualization itself.

Recall the Trifecta Checkup. At the top of the trifecta is the Question. The designer obviously wants to focus our attention on the difference of the averages. A column chart showing average heights fails the job!

This "proper" column chart sends the message that the difference in average heights is noise, unworthy of our attention. But this is a bad take of the underlying data. The range of average heights across countries isn't that wide, by virtue of large population sizes.

According to Wikipedia, they range from 4 feet 10.5 to 5 feet 6 (I'm ignoring several entries in the table based on non representative small samples.) How do we know that the difference of 2 inches between averages of South Africa and India is actually a sizable difference? The Wikipedia table has the average heights for most of the world's countries. There are perhaps 200 values. These values are sprinkled inside the range of about 8 inches top to bottom. If we divide the full range into 10 equal bins, that's roughly 0.8 inches per bin. So if we have two numbers that are 2 inches apart, they almost span 2 bins. If the data were evenly distributed, that's a huge shift.

(In reality, the data should be normally distributed, bell-shaped, with much more at the center than on the edges. That makes a difference of 2 inches even more significant if these are normal values near the center but less significant if these are extreme values on the tails. Stats students should be able to articulate why we are sure the data are normally distributed without having to plot the data.)

***

The original chart has further problems.

Another source of distortion comes from the scaling of the stick figures. The aspect ratio is being preserved, which means the area is being scaled. Given that the heights are scaled as per the data, the data are encoded twice, the second time in the widths. This means that the sizes of these figures grow at the rate of the square of the heights. (Contrast this with the scaling discussed in my earlier post this week which preserves the relative areas.)

At the end of that last post, I discuss why adding colors to a chart when the colors do not encode any data is a distraction to the reader. And this average height chart is an example.

From the Data corner of the Trifecta Checkup, I'm intrigued by the choice of countries. Why is Scotland highlighted instead of the U.K.? Why Latvia? According to Wikipedia, the Latvia estimate is based on a 1% sample of only 19 year olds.

Some of the data appear to be incorrect (or the designer used a different data source). Wikipedia lists the average height of Latvian women as 5 ft 6.5 while the chart shows 5 ft 5 in. Peru's average height of females is listed as 4 ft 11.5 and of males as 5 ft 4.5. The chart shows 5 ft 4 in.

***

Lest we think only amateurs make this type of chart, here is an example of a similar chart in a scientific research journal:

Fnhum-14-00338-g007

(link to original)

I have seen many versions of the above column charts with error bars, and the vertical axes not starting at zero. In every case, the heights (and areas) of these columns do not scale with the underlying data.

***

I tried a variant of the stem-and-leaf plot:

Junkcharts_redo_womenheight_stemleaf

The scale is chosen to reflect the full range of average heights given in Wikipedia. The chart works better with more countries to fill out the distribution. It shows India is on the short end of the scale but not quite the lowest. (As mentioned above, Peru actually should be placed close to the lower edge.)

 


Reading this chart won't take as long as withdrawing troops from Afghanistan

Art sent me the following Economist chart, noting how hard it is to understand. I took a look, and agreed. It's an example of a visual representation that takes more time to comprehend than the underlying data.

Econ_theendisnear

The chart presents responses to 3 questions on a survey. For each question, the choices are Approve, Disapprove, and "Neither" (just picking a word since I haven't seen the actual survey question). The overall approval/disapproval rates are presented, and then broken into two subgroups (Democrats and Republicans).

The first hurdle is reading the scale. Because the section from 75% to 100% has been removed, we are left with labels 0, 25, 50, 75, which do not say percentages unless we've consumed the title and subtitle. The Economist style guide places the units of data in the subtitle instead of on
the axis itself.

Our attention is drawn to the thick lines, which represent the differences between approval and disapproval rates. These differences are signed: it matters whether the proportion approving is higher or lower than the proportion disapproving. This means the data are encoded in the order of the dots plus the length of the line segment between them.

The two bottom rows of the Afghanistan question demonstrates this mental challenge. Our brains have to process the following visual cues:

1) the two lines are about the same lengths

2) the Republican dots are shifted to the right by a little

3) the colors of the dots are flipped

What do they all mean?

Econ_theendofforever_subset

A chart runs in trouble when you need a paragraph to explain how to read it.

It's sometimes alright to make complicated data visualization that illustrates complicated concepts. What justifies it is the payoff. I wrote about the concept of return on effort in data visualization here.

The payoff for this chart escaped me. Take the Democratic response to troop withdrawal. About 3/4 of Democrats approve while 15% disapprove. The thick line says 60% more Democrats approve than disapprove.

***

Here, I show the full axis, and add a 50% reference line

Junkcharts_redo_econ_theendofforever_1

Small edits but they help visualize "half of", "three quarters of".

***

Next, I switch to the more conventional stacked bars.

Junkcharts_redo_econ_theendofforever_stackedbars

This format reveals some of the hidden data on the chart - the proportion answering neither approve/disapprove, and neither yes/no.

On the stacked bars visual, the proportions are counted from both ends while in the dot plot above, the proportions are measured from the left end only.

***

Read all my posts about Economist charts here

 


The time has arrived for cumulative charts

Long-time reader Scott S. asked me about this Washington Post chart that shows the disappearance of pediatric flu deaths in the U.S. this season:

Washingtonpost_pediatricfludeaths

The dataset behind this chart is highly favorable to the designer, because the signal in the data is so strong. This is a good chart. The key point is shown clearly right at the top, with an informative title. Gridlines are very restrained. I'd draw attention to the horizontal axis. The master stroke here is omitting the week labels, which are likely confusing to all but the people familiar with this dataset.

Scott suggested using a line chart. I agree. And especially if we plot cumulative counts, rather than weekly deaths. Here's a quick sketch of such a chart:

Junkcharts_redo_wppedflu_panel

(On second thought, I'd remove the week numbers from the horizontal axis, and just go with the month labels. The Washington Post designer is right in realizing that those week numbers are meaningless to most readers.)

The vaccine trials have brought this cumulative count chart form to the mainstream. For anyone who have seen the vaccine efficacy charts, the interpretation of the panel of line charts should come naturally.

Instead of four plots, I prefer one plot with four superimposed lines. Like this:

Junkcharts_redo_wppeddeaths_superpose2