I previously discussed the evidence that Pfizer submitted to the FDA to get approval for booster shots (here and here). The studies were small, and the evidence circumstantial. It appears that the experts were partially or completely swayed by "real world" data coming out of Israel, so I looked at what is in that influential Israeli study (link to report).

The story of vaccinations in Israel is one of high hopes unfulfilled. Here is their full case curve for 2021.

Between April and July, it appeared that the fast vaccination campaign yielded dividends; public health officials were satisfied that the vaccines beat Covid-19. Like other countries, Israel moved to remove most of the mitigation measures. Not so fast: in July, cases started to tick up again. The rate of growth was so alarming that Israel became the first country to approve booster shots. Other countries, including the U.S., were curious what the outcomes would be - since the vaccine developers were not going to run proper randomized experiments to prove the value of boosters. Even after boosters were administered, the case curve didn't start bending till late September. To this day, despite the general acceptance that vaccines are effective, no one can reliably predict what proportion of the population needs to get injections before infections start to decline, or how much time it would take to start seeing cases come down.

***

This Israel study (link) was published in October, and only concerned one month of data (August 2021). Approximately 1.1 million Israelis were fully vaccinated by July 30 and formed the study population. However, we do not know how many of those subsequently took the booster shot. This is one of those key omissions that have pervaded all the vaccine studies - basic check-off-the-box data that are withheld from the scientific reports.

The data came from the Ministry of Health. Data cutoff for cases was Aug 30, 2021, and these researchers invented a new case-counting window that starts at 12 days after the booster shot. Since the first booster shots analyzed by this study happened on July 30, that means no cases are counted until 12 days later (Aug 11, 2021). For those who took the booster on July 30, the study has a maximum follow-up window of 20 days (Aug 11 through Aug 30, counting both end dates). The graph above shows a linear growth in administered boosters during the study period. For someone who took the booster on Aug 15, the case-counting window opens on Aug 27 and closes on Aug 30, thus the follow-up period was only 4 days by the same count. Anyone who took the booster after Aug 18 is "immortal": if they get sick after 12 days, those cases won't feature in the study data because the window opens after the data cutoff.
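The date arithmetic is easy to check. Here is a minimal sketch, using only the dates quoted in this post; the function name `follow_up_days` is my own, and I count both the opening day and the cutoff day, which is the convention that reproduces the 20-day maximum (counting the gap between the two dates instead gives one day less).

```python
from datetime import date, timedelta

DATA_CUTOFF = date(2021, 8, 30)   # last day on which cases are counted
LAG_DAYS = 12                     # case counting starts 12 days after the booster


def follow_up_days(booster_date: date) -> int:
    """Days of case counting for one booster recipient, counting both end dates."""
    window_opens = booster_date + timedelta(days=LAG_DAYS)
    if window_opens > DATA_CUTOFF:
        return 0  # the window never opens before the data cutoff
    return (DATA_CUTOFF - window_opens).days + 1


for d in (date(2021, 7, 30), date(2021, 8, 18), date(2021, 8, 19)):
    print(d.isoformat(), follow_up_days(d))
```

A booster on July 30 gets the full 20 days, one on Aug 18 gets a single day, and one on Aug 19 or later gets none.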

This little calculation reveals that the follow-up period for the booster group is between 1 and 20 days, and that the study period is effectively about three weeks, 12 days shorter than advertised (people who received boosters in the final 12 days have not reached the beginning of the case-counting window). That means, if we believe everything else in this study, the researchers have shown that the booster shot is effective for an average of 10 days.
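To see where an average of roughly 10 days comes from, here is a rough sketch. It assumes the same number of boosters was given on every day from July 30 to Aug 30, which is my reading of "linear growth" and not something taken from the study's data; the variable names are mine.

```python
STUDY_DAYS = 32   # July 30 through Aug 30, counting both end dates
LAG_DAYS = 12     # case counting starts 12 days after the booster

# Follow-up days for someone boosted on day 0 (July 30), day 1 (July 31), ...
follow_up = [max(0, STUDY_DAYS - LAG_DAYS - day) for day in range(STUDY_DAYS)]

# Only booster days that leave at least one day of case counting contribute
contributing = [f for f in follow_up if f > 0]
mean_follow_up = sum(contributing) / len(contributing)

print(len(contributing), min(contributing), max(contributing), mean_follow_up)
```

Under that assumption, 20 of the 32 booster days contribute any follow-up, and the mean follow-up among them is 10.5 days.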

The headline result of the study is that those who took the boosters are 11 times less likely to get infected than those who didn't take the boosters (note the unspoken: effective for 10 days, after ignoring anyone who gets sick during the first 12 days). For severe illness, the factor was 20 times.

As explained in my previous post, the use of these multiplicative factors (x times) is a fad. They could have expressed the same data in the usual way: the booster effectiveness was 90% for cases, and 95% for severe cases. In other words, they are claiming the same effectiveness as the original two shots.
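The conversion from an "x times" factor to an effectiveness percentage is one line of arithmetic: if the boosted are x times less likely to be infected, their case rate is 1/x of the comparison rate, so effectiveness is 1 - 1/x. A small sketch (the function name is mine):

```python
def fold_to_effectiveness(fold_reduction: float) -> float:
    """Convert an 'x times less likely' factor into effectiveness, 1 - 1/x."""
    return 1.0 - 1.0 / fold_reduction


for fold in (11, 20):
    print(fold, round(100 * fold_to_effectiveness(fold), 1))
```

A factor of 11 corresponds to about 91%, and a factor of 20 to 95%.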

The above calculations were "adjusted" by including main effects for age group, gender, calendar day, timing of 2nd dose, and "sector" (Arab/Jewish/Ultra-Orthodox).

***

A vaccine effectiveness (VE) metric is built from the ratio of the (severe) case rate of the treated group to that of the untreated group: VE is one minus that ratio. In the original vaccine trials, the treated group took two shots while the untreated group took the placebo shots. In the real-world booster study, the treated group took the booster shot while the untreated group didn't; both groups previously received two doses of an mRNA vaccine.

The VE metric is merely a correlation between cases and vaccination status. In order to move from correlation to causation, statisticians want to establish that the booster group and the no-booster group are comparable; that the observed difference in case rates isn't driven by factors other than vaccination.

To understand the nature of this challenge, let's pretend that the FDA required vaccine developers to conduct a randomized controlled trial (RCT) to determine the causal effect of the booster shot on cases. (The FDA didn't require it.)

We would have recruited participants from the group that had previously taken two vaccine shots. Then, they are randomly divided into two groups: booster and placebo. All participants are then followed for a period of time, and at around the two-month time point, an interim analysis is conducted. This analysis is simple because, by virtue of random assignment, the two groups have co-variate balance (same mix of age groups, genders, occupations, date of second doses, etc.), and thus any variation in case rates is due to the booster.

Instead, we have "real world data". On the surface, we have two groups: booster and no-booster, which sounds similar to booster and placebo. But labels deceive. The lack of a random assignment device means the two groups are not apples to apples. Note that public health policy prioritizes high-risk groups, and some people have reasons to want early boosters (e.g. work on the front lines, want to stop wearing masks).
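A toy simulation shows the difference between a random assignment device and self-selection. Everything in it is hypothetical: the population of 10,000, the age range, and the rule that older people are more likely to take the booster are assumptions for illustration, not figures from the Israeli data.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: ages of 10,000 people who already had two doses
ages = [random.randint(16, 90) for _ in range(10_000)]


def mean(values):
    return sum(values) / len(values)


# Random assignment: a coin flip decides booster vs placebo
rct_booster, rct_placebo = [], []
for age in ages:
    (rct_booster if random.random() < 0.5 else rct_placebo).append(age)

# Self-selection (assumed rule): the chance of taking the booster rises with age
obs_booster, obs_no_booster = [], []
for age in ages:
    p_booster = (age - 16) / (90 - 16)
    (obs_booster if random.random() < p_booster else obs_no_booster).append(age)

rct_gap = mean(rct_booster) - mean(rct_placebo)
obs_gap = mean(obs_booster) - mean(obs_no_booster)
print(f"mean age gap, randomized:    {rct_gap:.1f} years")
print(f"mean age gap, self-selected: {obs_gap:.1f} years")
```

With the coin flip, the two groups end up with nearly the same average age. With self-selection, the booster group is much older on average, so any difference in case rates is mixed up with age.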

I find it rewarding to think about the study population in terms of three subgroups (bottom row of the left chart).

Start with anyone in the study population on July 30, the start of the study period. This person is in the no-booster group - until s/he decides to get the booster shot, at which point the same person migrates to the booster group. There are some who never took a booster during the month of August so they remain in the no-booster group throughout the study.

Let's consider the following three subgroups:

(NB1) People who never took a booster during August

(NB2) People who received a booster during August - they belonged to the no-booster group until the day of booster

(B) People who received a booster during August - they migrated to the booster group on the day of the booster.

(NB2) and (B) are "twins". They are the same people, counted twice, the first time as a member of the no-booster group, and the second time as a member of the booster group.

Thus the no-booster group (NB) in the study consists of (NB1) and (NB2) while the booster group is just (B). Now, let's address whether (NB) can be compared to (B).
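The bookkeeping can be sketched as person-days. This is my own simplified accounting, not the researchers' code: I assume the day of the booster and the 12 days before the case-counting window opens are credited to neither group, and the function name `person_days` is hypothetical.

```python
from datetime import date, timedelta

STUDY_START = date(2021, 7, 30)
DATA_CUTOFF = date(2021, 8, 30)
LAG_DAYS = 12  # case counting starts 12 days after the booster


def person_days(booster_date):
    """Return (no_booster_days, booster_days) contributed by one person.

    booster_date is None for someone who took no booster in August (NB1).
    """
    total_days = (DATA_CUTOFF - STUDY_START).days + 1
    if booster_date is None:
        return total_days, 0                       # (NB1): never leaves the no-booster group
    nb2_days = (booster_date - STUDY_START).days   # (NB2): days before the booster
    window_opens = booster_date + timedelta(days=LAG_DAYS)
    b_days = max(0, (DATA_CUTOFF - window_opens).days + 1)  # (B): counted days after the lag
    return nb2_days, b_days


print(person_days(None))               # an (NB1) person
print(person_days(date(2021, 8, 10)))  # the same person as (NB2), then as (B)
print(person_days(date(2021, 8, 25)))  # boosted too late to be counted in (B)
```

The second person contributes 11 days to the no-booster group and 9 days to the booster group; the third contributes 26 days to the no-booster group and nothing to the booster group.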

***

Consider first (NB2) vs (B). These are twins: they are the same people counted twice. Thus, by definition, we have 100% co-variate balance, if we consider the typical demographic co-variates, such as age, gender, and occupation. Even the date of the second dose is perfectly balanced since that's a variable about historical behavior prior to the study.

Nevertheless, (NB2) differs from (B) in pivotal ways. The exposure time windows are disjoint. The (NB2) twin always has earlier exposure than the (B) twin. Besides, the lengths of exposure are negatively correlated: the longer the follow-up time for the (NB2) twin, the shorter for the (B) twin. Thus, the booster status variable is confounded with exposure time and duration - and this is a hopeless confounding, 100% confounded.
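The trade-off between the twins' exposure can be made concrete. In this sketch, `offset` is the number of days after July 30 on which the booster is taken, and, as a simplifying assumption of mine, the 12 lag days are credited to neither twin.

```python
STUDY_DAYS = 32   # July 30 through Aug 30, counting both end dates
LAG_DAYS = 12     # case counting starts 12 days after the booster


def twin_exposure(offset: int):
    """Return (nb2_days, b_days) for a booster taken `offset` days after July 30."""
    nb2_days = offset                                   # exposure as the (NB2) twin
    b_days = max(0, STUDY_DAYS - LAG_DAYS - offset)     # exposure as the (B) twin
    return nb2_days, b_days


for offset in (0, 5, 10, 15, 19):
    nb2_days, b_days = twin_exposure(offset)
    print(offset, nb2_days, b_days, nb2_days + b_days)
```

For anyone boosted by Aug 18, the two exposures always add up to 20 days: every extra day spent as the (NB2) twin is a day taken away from the (B) twin.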

Next, compare (NB1) with (NB2). Are the two partitions of the no-booster group similar? The answer is a resounding no. (NB2) consists of those who got the booster in August, while (NB1) consists of those who did not. Unless one believes that the timing of the third dose is chosen completely at random, one must conclude that those two subsets are dissimilar due to selection biases.

In addition, (NB1) invariably has longer follow-up time than (NB2), and by extension, (B). The start of the follow-up window is fixed for everyone in (NB1) and (NB2) but variable for those in (B).

To summarize, (B) is identical to (NB2) on many co-variates due to twinning but differs from (NB2) pivotally on exposure metrics. (NB2) is different from (NB1) due to selection bias, thus these two subsets do not have co-variate balance; they also have different exposure durations.

When we compare (B) with (NB), we may be tricked into thinking we have good co-variate balance because of the (NB2) component. This statistical deception intensifies as more people get boosters. The exposure-related complete confounding is, to my knowledge, impossible to cure. Regression adjustments via main effects are insufficient to solve these complex statistical problems.

The differences between (B) and (NB) - in addition to booster status - include the exposure-related confounding with (NB2), the selection biases with (NB1), and the exposure-related bias with (NB1). All three are affected by the proportion of people who have taken booster shots.

Any serious causal analysis of real-world booster data must offer adjustments that deal with the above statistical issues.

P.S. [11/9/2021] In the following post, I looked at a couple of observations from this study that caught my eye.
