One of the most frustrating aspects of Covid-19 related publications is the conclusions. A summary of results is supposed to give readers the minimum information needed to understand the full contents of the paper - but what we encounter is "story time": almost always, these conclusions are stated in the most general terms, ignoring all the caveats, the exclusions, and the imperfections of the methodologies and analyses. They are "stories" that go far beyond the limited evidence presented.
I've been reading the Clalit study that was a major force behind the approval of the first booster shots back at the end of 2021/start of 2022. The paper states the following one-line conclusion:
Participants who received a booster at least 5 months after a second dose of BNT162b2 [i.e. Pfizer] had 90% lower mortality due to Covid-19 than participants who did not receive a booster.
Is it possible to misread this sentence? Yes - in fact, you misread it precisely by reading it literally.
***
If we know something about vaccine efficacy/effectiveness, we immediately recognize that these authors are selling the same story - that the vaccine has a VE of 90% - the only difference being that the outcome here is deaths attributed to Covid-19 rather than reported cases. As such, this VE is driven by the ratio of the death rate in the booster group to the death rate in the no-booster group. (They didn't use the term VE, presumably because it typically connotes cases rather than deaths.)
A death rate is a fraction with two parts: the denominator is the number of people counted as part of a comparison group, and the numerator is the number of people within that group who died of Covid-19 during the study period. In this post, I'll pay attention to both parts.
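To make the arithmetic concrete, here is a minimal sketch with made-up round numbers (not the paper's actual counts), assuming the reported "VE" against death is one minus the ratio of the two death rates:

```python
# Hypothetical round numbers, chosen only to illustrate the arithmetic.
deaths_booster, people_booster = 10, 100_000
deaths_no_booster, people_no_booster = 100, 100_000

rate_booster = deaths_booster / people_booster            # 0.0001
rate_no_booster = deaths_no_booster / people_no_booster   # 0.001

ve_against_death = 1 - rate_booster / rate_no_booster
print(f"'VE' against Covid-19 death: {ve_against_death:.0%}")   # 90%
```

The headline number therefore depends entirely on who gets counted in each denominator and which deaths get counted in each numerator - which is the subject of the rest of this post.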
From my last post, we already know one issue with the above statement. If we take any participant at the start of the study and ask whether this person is counted as part of the booster group or the no-booster group during the study, the answer, for 90% of the participants, is both.
Since 100 percent of the participants had not received the booster at the start of the study, and 90 percent of them had received it by the end of the study period (a mere 54 days), the researchers could not have divided the study population into two distinct groups of people. Each person - except the 10% who never got the booster during the study period - first appeared in the no-booster group and then transitioned to the booster group.
But the concluding statement reads as if the study enrolled two distinct groups of participants. This misreading is even more likely for knowledgeable readers, because ordinary studies, such as well-designed clinical trials, use distinct groups.
Note that I'm not criticizing the methodology up to this point; I'm describing what it is, and criticizing how the language of the summary statement sends readers the wrong message.
***
The above detail may sound trivial but it has major implications. Think about the regression adjustments. As with all observational studies, the researchers claim that by adding standard covariates to their regression models (age, gender, ethnicity, prior health conditions, etc.), they have substantively corrected for all biases. For example, since participants choose whether to get the booster rather than being assigned a treatment at random as in a randomized clinical trial, older participants may be more likely than younger ones to want the booster, in which case the age distribution in the booster group would not mirror that in the no-booster group, i.e. an age bias exists.
Look at the setup here: ninety percent of the participants appear once in the treated group and once in the untreated group. By the design of this study, there is little variation across the comparison groups in the distribution of anything, including age. If we check "covariate balance", it's balanced because the two groups are by and large identical sets of people. So this type of study underestimates the variability of everything, which leads to p-values that are too small and regression adjustments that are meaningless.
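Here is a small simulation of that point - my own illustration with made-up numbers, not the paper's data. Even when booster uptake depends strongly on age, the overlap between the two "groups" makes the age gap look tiny:

```python
# Made-up data: older people are assumed to be more likely to boost.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
age = rng.normal(60, 15, n)                          # hypothetical ages

p_boost = np.clip(0.9 + 0.004 * (age - 60), 0, 1)    # uptake rises with age
boosted = rng.random(n) < p_boost                     # roughly 90% boost

# Distinct groups (as in a trial): ever-boosted vs never-boosted.
gap_distinct = age[boosted].mean() - age[~boosted].mean()

# Overlapping groups (as in this design): everyone contributes to
# "no booster", and the boosted 90% also contribute to "booster".
gap_overlap = age[boosted].mean() - age.mean()

print(f"age gap, distinct groups:    {gap_distinct:.1f} years")
print(f"age gap, overlapping groups: {gap_overlap:.1f} years")
```

The selection effect hasn't gone anywhere; it's merely hidden, because the "no booster" side consists mostly of the same people as the "booster" side.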
***
In the paper's conclusion, the phrase "participants who did not receive a booster" is incorrect in more ways than one. We already know that almost all of them ended up getting boosters before the end of the study period, which was less than two months long. I said earlier that a participant transitions into the booster group when s/he takes the booster shot.
This is in fact not true. That's because this study also imposes a "case counting window," under the claim that the booster shot takes 7 days to become effective. In other words, if someone takes a booster shot, and falls ill with Covid-19 within the first 7 days, this case is not counted as a case in the booster group.
Negating such a case is one thing - but this study (and many others) takes a further step: the case is not negated; it is counted as a case in the no-booster group!
How do they pull this off? The study design says not to transition the participants to the booster group until the 8th day after the booster shot.
So, let's review. Each participant starts off in the no-booster group; then the person takes the booster shot; this person is kept in the "no-booster" group for an additional 7 days before being moved into the "booster" group. During those 7 days, if the person gets infected with or dies from Covid-19, the case or death is counted as "no booster".
Remember that for this type of study we can only talk about exposure time, since almost everyone is eventually exposed to both conditions. The exposure time of the "booster" group therefore excludes the first seven days after the booster shot, while the exposure time of the "no booster" group includes the first seven days after the booster shot.
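To make the accounting concrete, here is a minimal sketch of how one participant's follow-up time gets split under the rule just described (the numbers and function are my own, for illustration only):

```python
STUDY_DAYS = 54   # length of the study period
LAG = 7           # days after the shot before "booster" status begins

def split_exposure(boost_day, study_days=STUDY_DAYS, lag=LAG):
    """Return (no_booster_days, booster_days) for one participant,
    where boost_day is the study day of the booster shot (None = never boosted)."""
    if boost_day is None:
        return study_days, 0
    switch = min(boost_day + lag, study_days)
    return switch, study_days - switch

# Someone who boosts on day 20 contributes 27 days of "no booster" exposure
# (including the 7 days right after the shot) and 27 days of "booster" exposure.
print(split_exposure(20))    # (27, 27)
print(split_exposure(None))  # (54, 0)
```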
This is an instance of the additional bias that Lataster described in his comment on our Three Biases paper.
***
As it turns out, even the above description is inaccurate. We haven't yet dealt with the following puzzling disclosure:
participants who received the booster and had a confirmed case of Covid-19 within 3 days before the effective-booster date (defined as 7 days after the booster was administered) were excluded
The analysts further subdivided the seven days after the booster shot is administered into two parts: Days 1-4 and Days 5-7. Let's follow a participant who did not have the booster at the start of the study. During the study period, this person takes the booster shot. If the person gets sick during Days 1-4, the person is counted as a member of the no-booster group, with the infection as the outcome. However, if the person gets sick during Days 5-7, this person is excluded from the study, i.e. not counted as part of the no-booster group and not counted as a case. Finally, if the person does not get sick by Day 7, the person transitions to the booster group.
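Putting the rules together, here is a minimal sketch of the assignment logic as I read it (the function and labels are mine, not the paper's):

```python
def assign(days_since_booster_when_sick):
    """How a boosted participant is handled, given the day (counted from the
    booster shot) on which they test positive; None means never tested positive."""
    d = days_since_booster_when_sick
    if d is None:
        return "no case; transitions to the booster group on day 8"
    if d <= 4:
        return "counted as a case in the NO-booster group"
    if d <= 7:
        return "excluded from the study entirely"
    return "counted as a case in the booster group"

for d in (2, 6, 15, None):
    print(d, "->", assign(d))
```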
There is a subtle but important difference. The Days 1-4 treatment is applied to every participant, regardless of outcome: anyone who gets the booster continues to be labeled "no booster" for seven additional days. The Days 5-7 treatment is an exclusion based on the outcome: if the person gets sick, the person is removed; if the person does not get sick, the person is retained. For me, it is never acceptable to re-define treatment groups after observing the outcomes. And yet this paper (and many others) engages in this practice.
To understand the impact of these data processing procedures, you can use the same device we used in the Three Biases paper: imagine a booster that is completely ineffective, apply the analysis rules, and ask whether the result is no effect, or something else.
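Here is what that device might look like in code - a deliberately crude sketch of my own (the paper itself used regression models, not these crude rates). Everyone gets the same constant daily infection risk, so the booster is useless by construction, and we tally person-days and cases under two bookkeeping schemes: with and without the 7-day lag and the Day 5-7 exclusion.

```python
# The booster does nothing in this simulated world; any gap between the two
# printed rate ratios is manufactured by the bookkeeping alone.
import numpy as np

rng = np.random.default_rng(7)
n, study_days, lag = 200_000, 54, 7
hazard = 0.002                            # same daily infection risk for everyone

boost_day = rng.integers(1, 41, n)        # study day of each person's booster
inf_day = rng.geometric(hazard, n)        # day of first infection (may exceed 54)

def rate_ratio(use_study_rules):
    cases = {"no_booster": 0, "booster": 0}
    days = {"no_booster": 0, "booster": 0}
    for b, t in zip(boost_day, inf_day):
        switch = min(b + lag if use_study_rules else b, study_days)
        end = min(t, study_days)          # follow-up stops at infection or day 54
        infected = t <= study_days
        if use_study_rules and infected and b + 4 < t <= b + lag:
            continue                      # Days 5-7 case: drop the person entirely
        days["no_booster"] += min(end, switch)
        days["booster"] += max(0, end - switch)
        if infected:
            cases["booster" if t > switch else "no_booster"] += 1
    return (cases["booster"] / days["booster"]) / (cases["no_booster"] / days["no_booster"])

print(f"rate ratio, switch at the shot, no exclusions: {rate_ratio(False):.2f}")
print(f"rate ratio, 7-day lag plus Day 5-7 exclusions: {rate_ratio(True):.2f}")
```

An honest tally of a useless booster should return a ratio near 1; how far the second line drifts from the first, and in which direction, depends on assumptions you can vary, such as when people take the booster and whether the background infection rate is rising or falling.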
This is a great example of why researchers must disclose the code used to process the data. It's hard to believe these special adjustments escaped the attention of reviewers, but I can think of no other way to interpret those words. (In fact, the waterfall chart in the paper confirms the exclusions.)
***
Now that you have heard how they measure outcomes and define "no booster", it's time to revisit the conclusion:
Participants who received a booster at least 5 months after a second dose of BNT162b2 [i.e. Pfizer] had 90% lower mortality due to Covid-19 than participants who did not receive a booster.
Does this sentence correctly summarize the study?