The blue mist

The New York Times printed several charts about Twitter "blue checks," and they are not among the paper's best efforts (link).

Blue checks used to be credentials given to legitimate accounts, typically associated with media outlets, celebrities, brands, professors, etc. They were free but had to be approved by Twitter. After Elon Musk acquired Twitter, he turned blue checks into a revenue generator - yet another subscription service (but you're buying "freedom"!). Anyone can now get a blue check for US$8 per month.

[The charts shown here are scanned from the printed edition.]

Nyt_twitterblue_chart1

The first chart is a scatter plot showing the day of joining Twitter and the total number of followers the account has as of early November, 2022. Those are very strange things to pair up on a scatter plot but I get it: the designer could only work with the data that can be pulled down from Twitter's API.

What's wrong with the data? The interesting question would seem to be whether blue checks are associated with the number of followers, but the chart shows only Twitter Blue users, so there is nothing to compare against. The day of joining Twitter is almost surely not the day of becoming "Twitter Blue" for any user (and the latter is not a standard data element released by Twitter anyway). The chart also has a built-in time bias: the longer an account has existed, the higher its follower count should be, all else equal. Some kind of follower rate (e.g. number of followers per year of existence) might be more informative.
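To make this concrete, here's how such a follower rate could be computed - a quick Python sketch in which the account handles, join dates and follower counts are all invented, not pulled from Twitter:

```python
from datetime import date

# Hypothetical blue-check accounts: (handle, join date, follower count as of Nov 2022)
accounts = [
    ("crypto_influencer", date(2021, 3, 1), 150_000),
    ("hbo_fan",           date(2022, 6, 15), 4_200),
    ("old_timer",         date(2009, 5, 20), 30_000),
]

snapshot = date(2022, 11, 1)

for handle, joined, followers in accounts:
    years = (snapshot - joined).days / 365.25
    rate = followers / years  # followers gained per year of existence
    print(f"{handle}: {rate:,.0f} followers/year")
```

Under this metric, a young account that grew fast ranks above an old account with a larger absolute count, which is exactly the time bias the raw scatter plot ignores.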

Still, it's hard to know what the chart is saying. That most Blue accounts have fewer than 5,000 followers? I also suspect that the top of the chart (outliers) was chopped off without mention. Surely some of the celebrity accounts have way over 150,000 followers. Another sign that the top was removed is the absence of an expected funnel effect: since the follower count accumulates from the day of registration, accounts created in the last few months should have markedly lower counts than those created years ago. (This is even more true if there is survivorship bias - less successful accounts are more likely to be deleted over time.)

The designer arbitrarily labelled six specific accounts ("Crypto influencer", "HBO fan", etc.) but this feature risks sending readers the wrong message. There might be one HBO fan account that quickly grew to 150,000 followers in just a few months, but does the data label suggest to readers that HBO fan accounts as a group tend to attain a high number of followers quickly?

***

The second chart, which is an inset of the first, attempts to quantify the effect of the Musk acquisition on the number of "registrations and subscriptions". In the first chart, the story was described as "Elon Musk buys Twitter sparking waves of new users who later sign up for Twitter Blue".

Nyt_twitterblue_chart2

The second chart confuses me. I was trying to figure out what is counted on the vertical axis. This was before I noticed the inset in the first chart, easy to miss as it is tucked into the lower right corner. I had presumed that the axis would be the same as in the first chart since there weren't any specific labels. In that case, I am looking at accounts with 0 to 500 followers - pretty inconsequential accounts. Then, the chart title uses the words "registrations and subscriptions." If the blue dots on this chart also refer to blue-check accounts as in the first chart, then I fail to see how this chart conveys any information about registrations (which presumably would include free accounts). As before, new accounts that aren't blue checks won't appear.

Further, to the extent that this chart shows a surge in subscriptions, we are restricted to accounts with fewer than 500 followers, and it's really unclear what proportion of total subscribers is depicted. Nor is it possible to estimate the magnitude of this surge.

Besides, I'm seeing similar densities of the dots across the entire time window between October 2021 and October 2022. Perhaps the entire surge is hidden behind the black lines indicating the specific days when Musk announced and completed the acquisition, respectively. If the surge is hiding behind those vertical lines, then this design manages to block the precise spots readers are supposed to notice.

Here is where we can use the self-sufficiency test. Imagine the same chart without the text. What story would you have learned from the graphical elements themselves? Not much, in my view.

***

The third chart isn't more insightful. It purportedly shows suspended accounts, but only among blue-check accounts.

Nyt_twitterblue_chart3

From what I could gather (and what I know about Twitter's API), the chart shows any Twitter Blue account that got suspended at any time. For example, all the black open circles occurring prior to October 27, 2022 represent suspensions by the previous management, and presumably have nothing to do with Elon Musk, or his decision to turn blue checks into a subscription product.

There appears to be a cluster of suspensions since Musk took over. I am not sure what that means. Certainly, it says he's not about "total freedom". Most of these suspended accounts have fewer than 50 followers and have only been around for a few weeks. And as before, I'm not sure why the analyst decided to focus on accounts with fewer than 500 followers.

What could have been? Given that the number of suspended accounts is relatively small, an interesting analysis would be to cluster the suspended accounts, and report on how the types of accounts being suspended changed before and after the change of management.

***

The online article (link) is longer, filling in some details missing from the printed edition.

There is one view that shows the larger accounts:

Nyt_twitterblue_largestaccounts

While more complete, this view isn't very helpful because the biggest accounts are located in the sparsest area of the chart. The data labels again pick out strange accounts, like those of adult film stars and an Arabic news site. It's not clear whether the designer is trying to tell us that most Twitter Blue accounts belong to those categories.

***
See here for commentary on other New York Times graphics.


Finding the right context to interpret household energy data

Bloomberg_energybill

Bloomberg's recent article on surging UK household energy costs, projected over this winter, contains data about a question that has long intrigued me: how much energy do different household items consume?

A Twitter follower alerted me to this chart, and she found it informative.

***
If the goal is to pick out the appliances and estimate the cost of running them, the chart serves its purpose. Because the entire set of data is printed, a data table would have done equally well.

I learned that the mobile phone costs almost nothing to charge: 1 pence for six hours of charging, which is deemed a "single use" - and which seems double what a full charge requires. The games console costs 14 pence for a "single use" of two hours, which might understate how much time gamers spend gaming each day.

***

Understanding the design of the chart needs a bit more effort. Each appliance is measured by two metrics: the number of hours considered to be "single use", and a currency value.

It took me a while to figure out how to interpret these currency values. Each cost is associated with a single use, and the duration of a single use increases as we move down the list of appliances. Since the designer assumes a fixed cost of electricity (shown in the footnote as 34p per kWh), at first it seems like the costs should simply increase from top to bottom. That's not the case, though.

Something else is driving these numbers behind the scenes, namely the intensity of energy use by appliance. The wifi router listed at the bottom is turned on 24 hours a day, and the daily cost of running it is just 6p. Meanwhile, running the fridge and freezer the whole day costs 41p. Thus, the fridge and freezer consume electricity at a rate almost 7 times that of the router.
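The arithmetic can be spelled out in a couple of lines, using the figures printed on the chart:

```python
# Back out per-hour running rates from the daily figures on the chart
# (costs in pence for 24 hours of running)
router_daily_pence = 6
fridge_freezer_daily_pence = 41
hours = 24

router_rate = router_daily_pence / hours          # pence per hour
fridge_rate = fridge_freezer_daily_pence / hours  # pence per hour

print(f"router: {router_rate:.2f}p/hour, fridge/freezer: {fridge_rate:.2f}p/hour")
print(f"fridge/freezer runs at {fridge_rate / router_rate:.1f}x the router's rate")
```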

The chart uses a split axis, which artificially reduces the gap between 8 hours and 24 hours. Here is another look at the bottom of the chart:

Bloomberg_energycost_bottom

***

Let's examine the choice of "single use" as a common basis for comparing appliances. Consider this:

  • Continuous appliances (wifi router, refrigerator, etc.) are denoted as 24 hours, so a daily time window is implied
  • Repeated-use appliances (e.g. coffee maker, kettle) may be run multiple times a day
  • Infrequent-use appliances may be used less than once a day

I prefer standardizing to a "per day" metric. If I use the microwave three times a day, the daily cost is 3 × 3p = 9p, which is more than I'd spend on the wifi router, run 24 hours. On the other hand, I use the washing machine once a week, so the frequency is 1/7, and the effective daily cost is 1/7 × 36p = 5p, notably lower than the microwave's.
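Here is that per-day standardization as a little Python sketch; the usage frequencies are my own assumptions, not Bloomberg's data:

```python
# Convert "cost per single use" into cost per day, using assumed usage
# frequencies (uses per day) -- the frequencies below are my own guesses.
appliances = {
    #              (cost per single use in pence, uses per day)
    "microwave":       (3,  3),
    "washing machine": (36, 1 / 7),   # once a week
    "wifi router":     (6,  1),       # one "use" = 24 hours
}

for name, (cost, freq) in appliances.items():
    print(f"{name}: {cost * freq:.1f}p per day")
```

Under these assumptions the microwave (9p/day) edges out the router (6p/day), and the washing machine falls to about 5p/day.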

The choice of metric has key implications for the appearance of the chart. The bubble size encodes the relative energy costs. The biggest bubbles are in the heating category, which is no surprise. The next largest bubbles are tumble dryer, dishwasher, and electric oven. These are generally not used every day, so the "per day" calculation would push them down the rankings.

***

Another noteworthy feature of the Bloomberg chart is the split legend. The colors divide appliances into five groups based on usage category (e.g. cleaning, food, utility). Instead of the usual color legend printed on a corner or side of the chart, the designer spreads the category labels around the chart. Each label is shown the first time a specific usage category appears on the chart. There is a presumption that the reader scans from top to bottom, which is probably true on average.

I like this arrangement as it delivers information to the reader when it's needed.


People flooded this chart presented without comment with lots of comments

The recent election in Italy has resulted in some dubious visual analytics. A reader sent me this Excel chart:

Italy_elections_RDC-M5S

In brief, an Italian politician (trained as a PhD economist) used the graph above to make the point that support for the populist Five Star party (M5S) is highly correlated with poverty, as measured by the number of people on RDC (basic income). "Senza commento" - no comment needed.

Except a lot of people noticed the idiocy of the chart, and ridiculed it.

The chart appeals to those readers who don't spend time understanding what's being plotted. They notice two lines that show similar "trends" which is a signal for high correlation.

It turns out the signal in the chart isn't found in the peaks and valleys of the "trends". It is tempting to observe that when the blue line peaks (Campania, Sicilia, Lazio, Piemonte, Lombardia), the orange line also pops.

But look at the vertical axis. He's plotting the number of people rather than the proportion of people. Population varies widely between Italian provinces. The five mentioned above all have over 4 million residents, while smaller ones such as Umbria, Molise, and Basilicata have under 1 million. Thus, so long as the number of people, not the proportion, is plotted, no matter what demographic metric is highlighted, we will see peaks in the most populous provinces.
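A tiny simulation makes the point: take hypothetical provinces with wildly different populations, attach two per-capita rates that are only weakly related, and the raw counts still move together (all numbers below are invented):

```python
# Hypothetical provinces with widely varying populations (in thousands).
# The two per-capita rates are invented and only weakly related to each
# other -- yet the raw counts track each other closely, because both
# are dominated by population size.
populations = [5000, 4500, 4000, 1000, 800, 500, 300]
rate_a = [0.20, 0.18, 0.22, 0.25, 0.15, 0.21, 0.19]  # e.g. share voting M5S
rate_b = [0.05, 0.04, 0.06, 0.05, 0.04, 0.07, 0.05]  # e.g. share on RDC

counts_a = [p * r for p, r in zip(populations, rate_a)]
counts_b = [p * r for p, r in zip(populations, rate_b)]

def corr(x, y):
    """Pearson correlation, computed from scratch."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

print(f"correlation of rates:  {corr(rate_a, rate_b):.2f}")
print(f"correlation of counts: {corr(counts_a, counts_b):.2f}")
```

The counts correlate far more strongly than the underlying rates, which is why plotting raw counts all but guarantees matching peaks in the most populous provinces.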

***

The other issue with this line chart is that the "peaks" are completely contrived. That's because the items on the horizontal axis do not admit a natural order. This is NOT a time-series chart, for which there is a canonical order. The horizontal axis contains a set of provinces, which can be ordered in whatever way the designer wants.

The following shows how the appearance of the lines changes as I select different metrics by which to sort the provinces:

Redo_italianelections_m5srdc_1

This is the reason why many chart purists frown on using connected lines with categorical data. I don't like hard rules, as my readers know, but in this case I have to agree that the line chart is not appropriate.

***

So, where is the signal on the line chart? It's in the ratio of the heights of the two values for each province.

Redo_italianelections_m5srdc_2

Here, we find something counterintuitive. I've highlighted two of the peaks. In Sicilia, about the same number of people voted for Five Star as receive basic income. In Lombardia, more than twice as many people voted for Five Star as receive basic income.

Now, Lombardy is where Milan is, essentially the richest province in Italy while Sicily is one of the poorest. Could it be that Five Star actually outperformed their demographics in the richer provinces?

***

Let's approach the politician's question systematically. He's trying to say that the Five Star movement appeals especially to poorer people. He's chosen basic income as a proxy for poverty (similar to being on welfare in the U.S.). Thus, he's divided the population into two groups: those on welfare, and those not.

What he needs is the relative proportions of votes for Five Star among these two subgroups. Say Five Star garnered 30% of the votes among people on welfare, and 15% among people not on welfare; then we have a piece of evidence that Five Star differentially appeals to people on welfare. If the vote share is the same in these two subgroups, then Five Star's appeal does not vary with welfare status.
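A small sketch with invented numbers shows why the subgroup shares are the crux, and why the observed totals alone cannot settle the question:

```python
# What the claim actually needs: Five Star's vote share *within* each
# subgroup. All numbers below are invented for illustration.
welfare_pop     = 1_000_000   # people on RDC
non_welfare_pop = 4_000_000   # everyone else

# Scenario 1: differential appeal (higher share among welfare recipients)
m5s_votes = welfare_pop * 0.30 + non_welfare_pop * 0.15

# Scenario 2: no differential appeal (same share in both subgroups)
alt_votes = welfare_pop * 0.18 + non_welfare_pop * 0.18

# The politician observes only the totals -- and the totals are identical,
# so his data cannot distinguish the two scenarios.
assert abs(m5s_votes - alt_votes) < 1e-6
print(f"observed M5S total in both scenarios: {m5s_votes:,.0f}")
```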

The following diagram shows the analytical framework:

Redo_italianelections_m5srdc_3

What's the problem? He doesn't have the data needed to establish his thesis. He has the total number of Five Star voters (which is the sum of the two yellow boxes) and he has the total number of people on RDC (which is the dark orange box).

Redo_italianelections_m5srdc_4

As shown above, another intervening factor is the proportion of people who voted. It is conceivable that the propensity to vote also depends on one's wealth.

So, in this case, fixing the visual will not fix the problem. Finding better data is key.


Another reminder that aggregate trends hide information

The last time I looked at the U.S. employment situation, it was during the pandemic. The data revealed the deep flaws of the so-called "not in labor force" classification. This classification is used to dehumanize unemployed people who are declared "not in labor force," in which case they are neither employed nor unemployed -- just not counted at all in the official unemployment (or employment) statistics.

The reason given for such a designation is that some people have no interest in working, or even looking for a job. These are not merely "discouraged" workers - there is a separate category for those. In theory, these people haven't been looking for a job for so long that they are no longer visible to the bean counters at the Bureau of Labor Statistics.

What happened when the pandemic precipitated shutdowns in many major cities across America? The number of "not in labor force" shot up instantly, literally within a few weeks. That makes a mockery of the rationale for such a designation. See this post for more.

***

The data we saw last time was up to April, 2020. That's more than two years old.

So I have updated the charts to show what has happened in the last couple of years.

Here is the overall picture.

Junkcharts_unemployment_notinLFparttime_all_2

In this new version, I centered the chart at the 1990 data. The chart features two key drivers of the headline unemployment rate: the proportion of people designated "invisible", and the proportion of part-time workers among those counted as "employed".

The last two recessions caused structural changes in the labor market. From 1990 to the late 2000s, a period that included the dot-com bust, these two metrics circulated within a small area of the chart. The Great Recession of the late 2000s led to a huge jump in the proportion deemed "invisible". It also pushed the proportion of part-timers to all-time highs. The proportion of part-timers has since fallen, although that is hard to interpret from this chart alone: if the newly invisible were previously part-time employed, then the same cause could be responsible for both trends.

_numbersense_bookcover

Readers of Numbersense (link) might be reminded of a trick used by school deans to pump up their US News rankings. Some schools accept lots of transfer students. This subpopulation is invisible to the US News statisticians since transfer students do not factor into the rankings. The recent scandal at Columbia University also involved reclassifying students (see this post).

Zooming in on the last two years, it appears that the pandemic-related unemployment situation has reversed.

***

Let's split the data by gender.

American men have been stuck in a negative spiral since the 1990s. With each recession, a higher proportion of men are designated BLS invisibles.

Junkcharts_unemployment_notinLFparttime_men_2

In the grid system set up in this scatter plot, the top right corner is the worst of all worlds: the work force has shrunk, and there are more part-timers among those counted as employed. American men are not exiting this quadrant any time soon.

***
What about the women?

Junkcharts_unemployment_notinLFparttime_women_2

If we compare 1990 with 2022, the story is not bad. The female work force has gradually returned to its 1990 scale, while the proportion of part-time workers has declined.

However, celebrating the above would ignore the tremendous gains American women made in the 1990s and 2000s. In 1990, only 58% of women were considered part of the work force - the other 42% were not working, but were not counted as unemployed either. By 2000, the female work force had expanded to about 60%, with a similar proportion counted as part-time employed as in 1990. That was great news.

The Great Recession of the late 2000s changed that picture. Just like men, many women became invisible to the BLS. The invisible proportion reached 44% in 2015 and has not returned to anywhere near the 2000 level. Fewer women are counted as part-time employed; as I said above, it's hard to tell whether this is because the women exiting the work force previously worked part-time.

***

The color of the dots in all charts is determined by the headline unemployment number; blue represents low unemployment. During the 1990-2022 period, there were three moments at which unemployment was reported as 4 percent or lower. These charts are intended to show that an aggregate statistic hides a lot of information. The three times the unemployment rate reached historic lows represent three very different situations, once one considers the size of the work force and the number of part-time workers.

P.S. [8-15-2022] Some more background about the visualization can be found in prior posts on the blog: here is the introduction, and here's one that breaks it down by race. Chapter 6 of Numbersense (link) gets into the details of how unemployment rate is computed, and the implications of the choices BLS made.

P.S. [8-16-2022] Corrected the axis title on the charts (see comment below). Also, added source of data label.


Visualizing the impossible

Note [July 6, 2022]: Typepad's image loader is broken yet again. There is no way for me to fix the images right now. They are not showing despite being loaded properly yesterday. I also cannot load new images. Apologies!

Note 2: Manually worked around the automated image loader.

Note 3: Thanks Glenn for letting me know about the image loading problem. It turns out the comment approval function is also broken, so I am not able to approve the comment.

***

A twitter user sent me this chart:

twitter_greatreplacement

It's, hmm, mystifying. It performs magic, as I explain below.

What's the purpose of the gridlines and axis labels? Even if there is a rationale for printing those numbers, they make it harder, not easier, for readers to understand the chart!

I think the following chart shows the main message of this poll result. Democrats are much more likely to think of immigration as a positive compared to Republicans, with Independents situated in between.

Redo_greatreplacement

***

The axis title gives a hint as to what the chart designer was aiming for with the unconventional axis. It reads "Overall Percentage for All Participants". It appears that the total length of each stacked bar is the weighted aggregate response rate. Roughly 17% of Americans thought this development to be "very positive", which breaks down as 8% of Republicans, 27% of Democrats and 12% of Independents. Since the three segments are not equal in size, 17% is a weighted average of the three proportions.

Within each of the three political affiliations, the data labels add to 100%. These numbers therefore are unweighted response rates for each segment. (If weighted, they should add up to the proportion of each segment.)

This sets up an impossible math problem. The three segments within each bar represent three proportions, each unweighted within its own segment. Adding these unweighted proportions does not yield the desired weighted average response rate; to get that, we would need to sum the weighted segment response rates instead.

This impossible math problem somehow got resolved visually. We can see that each bar segment faithfully represents the unweighted response rate shown in its data label, yet summing them should not yield the aggregate response rates implied by the axis title. The difference is not a simple multiplicative constant, because each segment must be weighted by a different multiplier. So your guess is as good as mine: what is the magic that makes the impossible possible?

[P.S. Another way to see this inconsistency. The sum of all the data labels is 300% because the proportions of each segment add up to 100%. At the same time, the axis title implies that the sum of the lengths of all five bars should be 100%. So, the chart asserts that 300% = 100%.]
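To see the arithmetic, here's a sketch in Python. The segment response rates come from the data labels; the segment weights are my own assumptions, chosen so the weighted average lands at the 17% value the axis implies:

```python
# "Very positive" response rates within each segment (from the data labels)
rates = {"Republicans": 0.08, "Democrats": 0.27, "Independents": 0.12}

# Segment sizes are NOT given on the chart; these weights are assumed
# for illustration only, and must sum to 1.
weights = {"Republicans": 0.25, "Democrats": 0.40, "Independents": 0.35}

weighted_avg = sum(rates[g] * weights[g] for g in rates)
unweighted_sum = sum(rates.values())

print(f"weighted average:            {weighted_avg:.1%}")   # matches the axis
print(f"sum of unweighted segments:  {unweighted_sum:.1%}") # does not
```

No single rescaling turns the 47% sum into the 17% average, because each segment needs its own multiplier.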

***

This poll question is perfect classroom fodder for discussing how the wording of poll questions affects responses (known as "response bias"). Look at the following variants of the same question. Are we likely to get answers consistent with the above question?

As you know, the demographic makeup of America is changing and becoming more diverse, while the U.S. Census estimates that white people will still be the largest race in approximately 25 years. Generally speaking, do you find these changes to be very positive, somewhat positive, somewhat negative or very negative?

***

As you know, the demographic makeup of America is changing and becoming more diverse, with the U.S. Census estimating that black people will still be a minority in approximately 25 years. Generally speaking, do you find these changes to be very positive, somewhat positive, somewhat negative or very negative?

***

As you know, the demographic makeup of America is changing and becoming more diverse, with the U.S. Census estimating that Hispanic, black, Asian and other non-white people together will be a majority in approximately 25 years. Generally speaking, do you find these changes to be very positive, somewhat positive, somewhat negative or very negative?

What is also amusing is that, in the world described by the pollster, in 25 years every race will qualify as a "minority". There will no longer be a majority, since no race will constitute at least 50% of the U.S. population. So at that time, the word "minority" will have lost its meaning.


Selecting the right analysis plan is the first step to good dataviz

It's a new term, and my friend Ray Vella shared some student projects from his NYU class on infographics. There's always something to learn from these projects.

The starting point is a chart published in the Economist a few years ago.

Economist_richgetricher

This is a challenging chart to read. To save you the time, the following key points are pertinent:

a) income inequality is measured by the disparity between regional averages

b) the incomes are given as a double index, a relative measure. For each country-year combination, the average national GDP is set to 100. For example, a value of 150 for Spain's richest region in 2015 means that region's average income is 50% higher than Spain's national average in that year.

The original chart - as well as most of the student work - is based on a specific analysis plan. The difference in index values between the richest and poorest regions is used to measure the degree of income inequality, and the change in that difference over time measures the change in the degree of income inequality. That's as much of a mouthful as it sounds.

This analysis plan can be summarized as:

1) all incomes -> relative indices, at each region-year combination
2) inequality = rich-poor region gap, at each country-year combination
3) inequality over time = inequality in 2015 - inequality in 2000, for each country
4) country difference = inequality in country A - inequality in country B, for each year

***

One student, J. Harrington, looks at the data through an alternative lens that brings clarity to the underlying data. Harrington starts with the change in income within the richest regions (then the poorest regions), so that worsening income inequality should show up as the richest region growing incomes at a faster clip than the poorest region.

This alternative analysis plan can be summarized as:
1) change in income over time for richest regions for each country
2) change in income over time for poorest regions for each country
3) inequality = change in income over time: rich - poor, for each country
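Both plans can be written out in a few lines of Python (the index values below are invented, not the Economist's data). Algebraically, the two plans must land on the same final number; what differs is which intermediate quantities the reader is shown along the way:

```python
# Both analysis plans on hypothetical double-index values
# (national average = 100 in each country-year; numbers invented).
data = {
    # country: {year: (richest-region index, poorest-region index)}
    "Spain":   {2000: (152, 74), 2015: (150, 70)},
    "Germany": {2000: (180, 65), 2015: (190, 60)},
}

for country, years in data.items():
    (r00, p00), (r15, p15) = years[2000], years[2015]

    # Original plan: gap within each year, then change in the gap
    plan1 = (r15 - p15) - (r00 - p00)

    # Alternative plan: change within each region group, then the difference
    plan2 = (r15 - r00) - (p15 - p00)

    assert plan1 == plan2  # identical end point, different path
    print(f"{country}: rich {r15 - r00:+d}, poor {p15 - p00:+d}, "
          f"change in inequality {plan1:+d}")
```

The alternative plan surfaces "rich regions changed by X, poor regions changed by Y" as explicit stepping stones, which is what makes the resulting chart easier to absorb.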

The restructuring of the analysis plan makes a big difference!

Here is one way to show this alternative analysis:

Junkcharts_kfung_sixeurocountries_gdppercapita

The underlying data have not changed but the reader's experience is transformed.


The what of visualization, beyond the how

A long-time reader sent me the following chart from a Nature article, pointing out that it is rather worthless.

Nautre_scihub

The simple bar chart plots the number of downloads, organized by country, from the website called Sci-Hub, which I've just learned is where one can download scientific articles for free - working around the exorbitant paywalls of scientific journals.

The bar chart is a good example of a Type D chart (Trifecta Checkup). There is nothing wrong with the purpose or visual design of the chart. Nevertheless, the chart paints a misleading picture. The Nature article addresses several shortcomings of the data.

The first - and perhaps most significant - problem is that many Sci-Hub users are thought to access the site via VPN servers that hide their true countries of origin. If the proportion of VPN users is high, the entire dataset is called into doubt. The data would contain both false positives (in countries hosting VPN servers) and false negatives (in countries with many VPN users).

The second problem is seasonality. The dataset covered only one month. Many users are expected to be academics, and in the southern hemisphere, schools are on summer vacation in January and February. Thus, the data from those regions may convey the wrong picture.

Another problem, according to the Nature article, is that Sci-Hub has many competitors. "The figures include only downloads from original Sci-Hub websites, not any replica or ‘mirror’ site, which can have high traffic in places where the original domain is banned."

This mirror-site problem may be worse than it appears. Yes, downloads from Sci-Hub underestimate the entire market for "free" scientific articles. But mirror sites may also inflate Sci-Hub's statistics: presumably they obtain their inventory from Sci-Hub by setting up accounts, thereby contributing lots of downloads themselves.

***

Even if VPN and seasonality problems are resolved, the total number of downloads should be adjusted for population. The most appropriate adjustment factor is the population of scientists, but that statistic may be difficult to obtain. A useful proxy might be the number of STEM degrees by country - obtained from a UNESCO survey (link).

A metric of the type "number of Sci-Hub downloads per STEM degree" sounds odd, even useless, but I'd argue it's better than the unadjusted total number of downloads. Just don't focus on the absolute values; focus on the relative comparisons between countries. Even better, we can convert the absolute values into an index to focus attention on comparisons.
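The adjustment and indexing could look like this - a sketch in which the country names and all figures are invented:

```python
# Adjusting download counts by a proxy for the scientist population.
# All figures below are invented for illustration.
downloads    = {"CountryA": 2_000_000, "CountryB": 500_000, "CountryC": 450_000}
stem_degrees = {"CountryA": 4_000_000, "CountryB": 250_000, "CountryC": 900_000}

per_degree = {c: downloads[c] / stem_degrees[c] for c in downloads}

# Index against the median country to emphasize relative comparisons
baseline = sorted(per_degree.values())[len(per_degree) // 2]
index = {c: 100 * v / baseline for c, v in per_degree.items()}

for c in downloads:
    print(f"{c}: {per_degree[c]:.2f} downloads/degree, index {index[c]:.0f}")
```

Here CountryB, despite the smallest raw total, tops the adjusted ranking - exactly the reversal a population adjustment is meant to surface.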


Best chart I have seen this year

Marvelling at this chart:

 

***

The credit ultimately goes to a Reddit user (account deleted). I first saw it in this nice piece of data journalism by my friends at System 2 (link), who linked to Visual Capitalist (link).

There are so many things on this one chart that make me smile.

The animation. The message of the story is the aging population: the average age is moving up. This uptrend is clear from the chart, as the bulge of the population pyramid migrates upward.

The trend happens to be slow, and that gives the movement a mesmerizing, soothing effect.

Other items on the chart are synced to the time evolution: not just the year label at the top, but also the year labels on the right side of the chart, plus the total population counts at the bottom.

OMG, it even gives me average age, and life expectancy, and how those statistics are moving up as well.

Even better, the designer adds useful context to the data: look at the names of the generations paired with the birth years.

This chart is also an example of dual axes that work. Age, birth year and current year are connected: given two of the three, the third is fixed. So even though there are two vertical axes, there is only one scale.

The only thing I'm not entirely convinced about is placing the scroll bar at the very top. It's a redundant piece that belongs in a less prominent part of the chart.


Getting to first before going to second

Happy holidays to all my readers! A special shout-out to those who've been around for over 15 years.

***

The following enhanced data table appeared in Significance magazine (August 2021) under an article titled "Winning an election, not a popularity contest" (link, paywalled).

Sig_electoralcollege-sm

It's surprisingly hard to read, and there are many reasons for this.

First is the antiquated style guide of academic journals, which turns legends into text and inserts that text into the caption. This is one of the worst publishing practices that continues to be followed.

The table shows 50 states plus District of Columbia. The authors are interested in the extreme case in which a hypothetical U.S. presidential candidate wins the electoral college with the lowest possible popular vote margin. If you've been following U.S. presidential politics, you'd know that the electoral college effectively deflates the value of big-city votes so that the electoral vote margin can be a lot larger than the popular vote margin.
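To give a flavor of the computation behind such a scenario, here is a greedy sketch on a tiny invented map of five "states". (The authors' actual optimization is more careful - greedy selection isn't guaranteed to be optimal for this knapsack-style problem - and these numbers are not real state data.)

```python
# The idea behind a "minimum popular vote" scenario: win just over half
# the votes in a set of states worth >= half the electoral votes, and
# zero votes everywhere else. Tiny invented example, not real state data.
states = {
    # name: (electoral votes, total voters)
    "A": (3, 200_000),
    "B": (5, 900_000),
    "C": (7, 1_500_000),
    "D": (10, 4_000_000),
    "E": (12, 6_000_000),
}

total_ev = sum(ev for ev, _ in states.values())
needed_ev = total_ev // 2 + 1

# Greedy: prefer states that deliver electoral votes with the fewest voters
by_cost = sorted(states, key=lambda s: states[s][1] / states[s][0])

won, ev, popular = [], 0, 0
for s in by_cost:
    if ev >= needed_ev:
        break
    won.append(s)
    ev += states[s][0]
    popular += states[s][1] // 2 + 1  # barest possible majority in the state

total_voters = sum(v for _, v in states.values())
print(f"win {won} with {popular:,} votes = {popular / total_voters:.1%} of all votes")
```

Even in this toy setup, the winning configuration needs well under half the popular vote, which is the phenomenon the article quantifies with real state data.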

The two sub-tables show two different scenarios. Scenario A is a configuration computed by NPR in one of their reports; Scenario B is a configuration created by the authors (Leinwand et al.).

The table cells are given one of four colors:

  • green = state needed in the winning configuration
  • white = state not needed
  • yellow = state needed in Scenario B but not in Scenario A
  • grey = state needed in Scenario A but not in Scenario B

***

The second problem is that the above description of the color legend is not quite right. Green, it turns out, is correctly explained only for Scenario A. For Scenario B, green encodes the states needed to win the electoral college in Scenario B minus those needed in Scenario B but not in Scenario A (shown in yellow). There is a similar problem with interpreting the white color for Scenario B.

To fix this problem, start with the Q corner of the Trifecta Checkup.

_trifectacheckup_image

The designer wants to convey an interlocking pair of insights: the winning configuration of states for each of the two scenarios; and the difference between those two configurations.

The problem with the current design is that it elevates the second insight over the first. But the second insight is derived from the first, so readers cannot grasp the second without first working out the first.

The following revision addresses this problem:

Redo_sig_electoralcollege_corrected

[12/30/2021: Replaced chart and corrected the blue arrow for NJ.]



Check your presumptions while you're reading this chart about Israel's vaccination campaign

On July 30, Israel began administering third doses of mRNA vaccines to targeted groups of people. This decision was controversial since there is no science to support it. The policymakers do have educated guesses by experts based on the best available information. By science, I mean actual evidence. Since no one has previously been given three shots, there can be no data on which to base such a decision. Nevertheless, the pandemic does not always give us time to collect relevant data, and so speculative analysis has found its calling.

Dvir Aran, at Technion, has been diligently tracking the situation in Israel on his Twitter. Ten days after July 30, he posted the following chart, which immediately led many commentators to jump out of their seats, crowning the third shot as a magic bullet. Notably, Dvir himself did not endorse such a claim. (See here to learn how other hasty conclusions by experts have fared.)

When you look at Dvir's chart, what do you see?

Dvir_aran_chart

Possibly one of the following two things, depending on what concern you have in your head.

1) The red line sits far above the other two lines, showing that unvaccinated people are much more likely to get infected.

2) The blue line diverges from the green line almost immediately after the 3rd shots started getting into arms, showing that the 3rd shot is super effective.

If you take another moment to look, you might start asking questions, as many in Twitter world did. Dvir was startlingly efficient at answering these queries.

A) Does the green line represent people with 2 or 3 doses, or is it strictly 2 doses? Aron asked this question and got the answer (the former):

AronBrand_israelcases_twoorthreedoses

It's time to check our presumptions. When you read the chart, did you presume it shows exactly 2 doses, or 2 or 3 doses? Or did you immediately spot the ambiguity? As I said in this article, graphs communicate efficiently because the designer leverages unspoken rules - the chart conveys certain information without explicitly displaying it. But this can backfire. In this case, I presumed the three lines to display three non-overlapping groups of people, and thus that the green line indicates those with 2 doses but not 3. That presumption led me to misinterpret what's on the chart.

B) What is the denominator of the case rates? Is it literal - by that I mean, all unvaccinated people for the red line, and all people with 3 doses for the blue line? Or is the denominator the population of Israel, the same number for all three lines? Lukas asked this question, and got the answer (the former).

Lukas_denominator

C) Since third shots are recommended for 60 year olds and over who were vaccinated at least 5 months ago, and most unvaccinated Israelis are below 60, this answer opens the possibility that the lines compare apples and oranges. Joe. S. asked about this, and received an answer (all lines display only 60 year olds and over.)

Joescholar_basepopulationquestion

Jason P. asked, and learned that the 5-month-out criterion is immaterial since 90% of the vaccinated have already reached that time point.

JasonPogue_5monthsout

D) We have even more presumptions. Like me, did you presume that the red line represents the "unvaccinated," meaning people who have not had any vaccine shots? If so, we may both be wrong. It has become the norm among vaccine researchers to lump "partially vaccinated" people in with the "unvaccinated," and to call the combined group "unvaccinated." Here is an excerpt from a recent report from Public Health Ontario (link to PDF), which clearly states this unintuitive counting rule:

Ontario_case_definition

Notice that under this definition, someone who got infected within 14 days of the first shot is classified as an "unvaccinated" case, not a "partially vaccinated" case.

In the following tweet, Dvir gave a hint of what he plotted:

Dvir_group_definition

In a previous analysis, he averaged the rates of people with 0 doses and 1 dose, which is equivalent to combining them and calling them unvaccinated. It's unclear to me what he did with the 1-dose subgroup in our featured chart - did it just vanish? (How people and cases are classified into these groups is a major factor in all vaccine effectiveness calculations - a topic I covered here. Unfortunately, most published reports do a poor job explaining what the analysts did.)

E) Did you presume that all three lines are equally important? That's far from true. Since Israel is the world champion in vaccination, the bulk of the 60+ population forms the green line. I asked Dvir, and he responded that only 7.5%, or roughly 100K, are unvaccinated.

DvirAran_proportionofunvaccinated

That means roughly 1.2 million people are part of the green line, 12 times as many. There are roughly 50 cases per day among the unvaccinated, and 370 daily cases among those with 2 or 3 doses. In other words, vaccinated people account for almost 90% of all cases.
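A quick back-of-envelope check, using the approximate figures quoted above (these are the rough numbers from the thread, not official statistics), shows both the case share and the unadjusted effectiveness they imply:

```python
# Rough 60+ figures as cited in the post (approximate, for illustration)
unvax_pop = 100_000      # ~7.5% of the 60+ age group is unvaccinated
vax_pop = 1_200_000      # the green line: people with 2 or 3 doses
unvax_cases = 50         # daily cases among the unvaccinated
vax_cases = 370          # daily cases among the 2- or 3-dose group

# Vaccinated people's share of all cases
share_vax = vax_cases / (vax_cases + unvax_cases)
print(f"share of cases among vaccinated: {share_vax:.0%}")   # ~88%

# Naive (unadjusted) vaccine effectiveness: 1 minus the rate ratio
rate_unvax = unvax_cases / unvax_pop
rate_vax = vax_cases / vax_pop
ve = 1 - rate_vax / rate_unvax
print(f"naive VE: {ve:.0%}")   # ~38%
```

The second figure is the same unadjusted-rate comparison that the purple reference lines in the later chart make visible.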

Yes, this is inevitable when over 90% of the age group have been vaccinated. (And it was predictable from day one, when people were blasting everywhere that real-world VE was proven by the fact that almost all new cases were among the unvaccinated.)

If your job is to minimize infections, you should be spending most of your time thinking about the 370 cases among the vaccinated rather than the 50 cases among the unvaccinated. Halving the vaccinated case rate saves 185 cases a day; halving the unvaccinated rate saves 25. In Israel, the vaccination campaign has already succeeded; it's time to look forward, which is exactly why they are re-focusing on the already vaccinated.

***

If what you worry about most is the effectiveness of the original two-dose regimen, Dvir's chart raises a puzzle. Ignore the blue line, and remember that the green line already includes everybody represented by the blue line.

In the following chart, I removed the blue line, and added reference lines in dashed purple that correspond to 25%, 50% and 75% vaccine effectiveness. The data plotted on this chart are unadjusted case rates. A 75% effective vaccine cuts case rate by three quarters.

Junkcharts_dviraran_israel_threeshotschart
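The reference lines follow directly from the definition of unadjusted effectiveness: a vaccine that is v effective should pull the vaccinated line down to (1 - v) times the unvaccinated line. A minimal sketch (the rate level is hypothetical):

```python
# Constructing the dashed VE reference lines: with unadjusted case rates,
# a vaccine that is v effective places the vaccinated line at (1 - v)
# times the unvaccinated line
unvax_rate = 50.0  # unvaccinated cases per 100K (a hypothetical level)

for ve in (0.25, 0.50, 0.75):
    ref_line = (1 - ve) * unvax_rate
    print(f"VE {ve:.0%}: vaccinated line at {ref_line:g} per 100K")
```

Reading the chart is then a matter of seeing which reference line the green line hugs.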

This chart shows the 2-dose mRNA vaccine was nowhere near 90% effective. (As regular readers know, I don't endorse this simplistic calculation and have outlined the problems here, but this style of calculation keeps getting published and passed around. Those who use it to claim real-world studies confirm prior clinical trial outcomes can either (a) insist on using it and retract their earlier conclusions, or (b) admit that such a calculation was, and is, a bad take.)

Also observe how the vaccinated (green) line is moving away from the unvaccinated (red) line. The vaccine apparently is becoming more effective, which runs counter to the trend used by the Israeli government to justify third doses. This improvement also precedes the start of the third-shot campaign. When the analytical method is bad, it generates all sorts of spurious findings.

***

As Dvir said, it is premature to comment on the third doses based on 10 days of data. For one thing, the vaccine developers insist that their vaccines must be given 14 days to work. In a typical calculation, all of the cases in the blue line fall outside the case-counting window. The effective number of cases that would be attributed to the 3-dose group right now is zero, and the vaccine effectiveness using the standard methodology is 100%, even better than shown in the chart.
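To make the counting-window point concrete, here is a toy version of the standard calculation (the population sizes and comparison rate are invented):

```python
# The 14-day case-counting window: any case occurring within 14 days of
# the shot is not attributed to the 3-dose group. Ten days in, every case
# in the blue line falls inside that window, so the attributed count is 0.
attributed_cases = 0           # all blue-line cases excluded by the window
pop_3dose = 400_000            # hypothetical size of the 3-dose group
rate_3dose = attributed_cases / pop_3dose

rate_2dose = 30 / 100_000      # hypothetical comparison rate

ve = 1 - rate_3dose / rate_2dose
print(f"standard-methodology VE for the 3rd dose: {ve:.0%}")   # 100%
```

Zero attributed cases forces the rate ratio to zero, so the methodology reports perfect effectiveness regardless of the comparison rate.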

There is an alternative interpretation of this graph. Statisticians call it the selection effect. On July 30, the blue line split off from the green: some people were selected to receive the 3rd dose. This includes an official selection (the government made certain subgroups eligible) as well as self-selection (within the eligible subgroup, certain people decided to get the 3rd shot earlier). If those who are less exposed to the virus, or more risk-averse, get the shots first, then all that is happening may be that we have split off a lower-risk subgroup from the green line, which then looks like a high-VE group. Even if the third shot were useless, the selection effect alone could explain the gap.
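A toy calculation (all numbers invented) shows how the gap can appear even when the third shot does nothing:

```python
# Selection effect sketch: a useless third shot still produces a gap if
# the people who take it first are less exposed. Rates are invented.
low_risk_rate = 10.0    # weekly cases per 100K in the low-exposure half
high_risk_rate = 40.0   # weekly cases per 100K in the high-exposure half

# Before July 30: one green line, pooling the two halves equally
green_before = (low_risk_rate + high_risk_rate) / 2   # 25 per 100K

# After: the low-exposure half self-selects into the 3rd-dose (blue)
# line; the shot confers zero extra protection in this simulation
blue_after = low_risk_rate                            # 10 per 100K
# Green still contains everyone with 2+ doses, so it stays pooled
green_after = green_before                            # 25 per 100K

gap = green_after - blue_after
print(f"blue sits {gap:g} per 100K below green with a useless shot")
```

The blue line drops well below the green purely because of who was selected, not because of anything the shot did.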

Statistics is about grays. It's not either-or; it's usually some of each. If you feel like it's Groundhog Day, you're getting the picture. When two doses were rolled out, we lived through an optimistic period in which most experts rejoiced about 90-100% real-world effectiveness, and then, as more people got vaccinated, the effect washed away. The selection effect gradually disappears as vaccination becomes widespread. Are we starting a new cycle of hope and despair? We'll find out soon enough.