Observational studies have some common characteristics that annoy the analyst. In the previous post about Pfizer's booster studies, I described a few.
- The sample contains few examples of the people to whom you're generalizing your result (e.g. people 65 or older, high-risk individuals of any age).
- The measured outcome is not the obvious, direct outcome (e.g. clinical outcome of infection) but an indirect indicator (e.g. antibody levels).
These problems create the need to extrapolate results. Such extrapolation is usually supported by unverified assumptions (aka expert opinion). Of note, the FDA allowed Pfizer to impose a causal assumption: that if the booster achieves antibody levels similar to those after the original 2nd dose, then the booster also produces clinical outcomes similar to those of the original two doses. See the previous post for why changing an indirect outcome does not invariably change the direct outcome. (This is yet another real-world example of the gap between correlation and causation.)
As I have explained often on this blog, making assumptions is not a sin. All statisticians make assumptions - those who claim they make no assumptions make assumptions! Let me give a quick example. Suppose a college runs a survey asking recent graduates about their post-graduation salaries. Only 50% of the graduates responded with a valid number. Should the college impute the salaries of the non-respondents? If I were the analyst, I would impute the salaries, baking in the assumption that those who don't respond are likely to be earning lower salaries (possibly even zero). However, some statisticians will argue against using imputation. They'll claim that one should just report the average of those who responded, because it is a horrible thing to modify the data - one should only correct egregious data entry mistakes but not otherwise touch the data. Are they making no assumptions?
They are making a huge assumption: they assume that the non-respondents have the same average salary as the respondents. What is the basis for that assumption? There is none, other than the desire to not tamper with the dataset. In fact, this assumption is almost surely wrong. As I said in Numbersense (link),
Beware of those who don't tamper with the dataset!
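To see the hidden assumption in numbers, here is a minimal sketch in Python with made-up salary figures (the 40% discount for non-respondents is my illustration, not data). Reporting only the respondents' average is mathematically the same as imputing every non-respondent with the respondents' mean.

```python
# Minimal sketch with hypothetical numbers: what the "no imputation" camp
# implicitly assumes about non-respondents.
import numpy as np

rng = np.random.default_rng(0)

n_grads = 1000
# The 50% who reported a valid salary.
respondents = rng.normal(60_000, 15_000, size=n_grads // 2)

# "Just report the average of those who responded" is equivalent to imputing
# every non-respondent at the respondents' mean.
complete_case_mean = respondents.mean()

# An explicit alternative assumption: non-respondents earn, say, 40% less on
# average (some may earn nothing). The 0.6 factor is invented for illustration.
imputed_nonrespondents = np.full(n_grads // 2, 0.6 * complete_case_mean)
adjusted_mean = np.concatenate([respondents, imputed_nonrespondents]).mean()

print(f"Complete-case mean:                          {complete_case_mean:,.0f}")
print(f"Mean assuming non-respondents earn 40% less: {adjusted_mean:,.0f}")
```

Both numbers rest on an assumption about people we never heard from; only one of them admits it.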
***
In this post, I'll cover several additional extrapolations that were used in analyzing the booster data.
The key immunogenicity results all measure antibody levels against the original virus (the so-called wild type), but as we've been constantly reminded, the only virus said to be currently in circulation is the Delta variant. How did Pfizer bridge this gap?
According to the FDA Briefing document (link to PDF), "Pfizer proposes to infer effectiveness of the booster dose against the Delta variant from exploratory descriptive analyses of 50% neutralizing antibody titers against this variant evaluated among subjects from the Phase 1 portion of the study."
"Exploratory descriptive analysis" is not regarded as a serious form of scientific evidence - the rules have changed during this pandemic emergency. I pity the statisticians working on these studies. They were assigned an impossible task. Recall that the gold-standard randomized clinical trial (RCT) has been thrown overboard, replaced by "immunobridging" analysis. The antibody levels stimulated by the booster shot are deemed comparable to those registered (by other individuals) after their second shots, and then the "non-inferiority" in antibody levels is assumed to result in the "non-inferiority" in vaccine effectiveness (a clinical outcome not directly measured).
The Delta variant introduces another complication: because it was not circulating at the time of the original Pfizer adult trial, researchers do not know the VE of the first two doses against Delta. This breaks the causal assumption of immunobridging.
We do not have even an approximation of the VE against the Delta variant for the 300-odd people in the booster trial. So even if one accepts that exposing the old blood samples to the Delta variant in the lab and showing comparable antibody levels is meaningful, one must make an even stronger assumption: that those antibodies will produce the same VE as was measured in the Pfizer adult trial.
This line of argument unfortunately results in a contradiction. Either the original vaccine works just as well against Delta as it did against the original strain, or it is less effective, which would explain the recent surge in cases. Only one of these scenarios can be true.
If the first scenario were true, then the immunobridging assumption holds, since VE against Delta is the same as VE against the wild type. But in this scenario, the recent surge in cases has nothing to do with Delta, and there is no need for a booster - VE hasn't changed.
If the second scenario were true, then the booster may be necessary, but in this scenario, the immunobridging methodology is broken as the prior value of VE no longer applies.
Since Pfizer applied for approval of the booster, it is assuming the second scenario. One might be tempted to prefer assumption #1 because it permits the immunobridging analysis. That is a popular justification you find in academic papers, in the same vein as "computational feasibility" or "tractability". It's risky to use this logic in studies with real-world consequences. Besides, it creates the contradiction described above.
Beyond the logic of the analysis method, notice that the Delta analysis is performed using only Phase 1 participants, so we are talking about 11 people between 18 and 55 years old, and 12 people between 65 and 75 years old. I'm not sure why blood from the other 300 people cannot be similarly analyzed. Without those results, the immunobridging assumption is expanded yet again to generalize from the 23 people to the 300-plus people (then to the 20K-plus in the adult trial, and finally to the U.S. population).
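A back-of-envelope calculation shows how little such small samples can pin down. Assuming a log-scale standard deviation of 0.8 for neutralizing titers - a made-up but plausible figure - the 95% confidence interval around a geometric mean titer is very wide at n=11 or 12:

```python
# How wide is a GMT confidence interval with only 11 or 12 subjects?
# The log-scale SD of 0.8 is assumed for illustration, not taken from the study.
import math

sd_log = 0.8
for n in (11, 12, 200):
    half_width = 1.96 * sd_log / math.sqrt(n)
    fold = math.exp(half_width)
    print(f"n = {n:3d}: 95% CI runs from GMT/{fold:.2f} to GMT*{fold:.2f}")
```

At n=11 or 12, the interval spans roughly a 1.6-fold range in either direction; at n=200, it shrinks to about 1.1-fold.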
A proper RCT would have yielded a much cleaner analysis of the direct clinical outcome.
***
Another missing piece of the puzzle is the provenance of the 300 or so Phase 2/3 trial participants who were "selected" to take part in the booster study. Nowhere in the FDA Briefing document can I find the selection mechanism. Usually, this means the selection was not random from the vaccine arm of the earlier trial. If it was not random, what were the criteria?
The 300 people were whittled down to about 200 who were ultimately used in any of the primary calculations. That's a drop-off of a third of the starting population, which is a high rate. (Note: Table 3 claims the evaluable immunogenicity population is 268, but the results shown in Tables 4 and 5 unexpectedly have N=210 and N=179.)
Some of the reasons for exclusion are perplexing. Let's review a few of these reasons.
Six people refused the booster shot. There is an argument that they should be excluded as they did not get the treatment under study, and there is no blood to be analyzed.
Fifteen were dropped because they did not have "at least 1 valid and determinate immunogenicity result within 28 to 42 days after the booster shot." This sets my alarm bells ringing because this exclusion can only be applied after the primary outcome of the study is measured. It's not clear what constitutes "valid" and "determinate". It's concerning to lose 5% of the study population by effectively saying we failed to obtain the primary endpoint.
Then, they dropped a further 30 people (10% of the study population) because their clinician decided that these people committed "protocol deviations" before the 3D+30 day evaluation time.
Last but not least, they also removed 34 people from the study population (15%) based on a key clinical endpoint. This exclusion criterion is described as "evidence of infection up to 1 month after booster dose". Notice it says "after" the booster dose, not before. So, it appears that the following happened: about 300 people were selected to start this study regardless of their prior infection status; various people were excluded for the reasons described above; blood samples were collected; PCR tests were performed (on the day of the shot, and then whenever participants self-reported symptoms); and anyone who got sick within 30 days of receiving the booster shot was kicked out of the study.
Remember the design of this study. The indirect outcome of antibody levels is used to infer the direct outcome of infection. So when someone is dropped because of infection, we should infer that the dropped person likely had inadequate antibody levels. These deletions bias the primary endpoint, raising the average antibody level of those who remain in the study.
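A toy simulation (all numbers invented) illustrates the direction of the bias: if participants with weaker antibody responses are more likely to become infected and be excluded, the survivors' average is pushed up.

```python
# Toy simulation of the exclusion bias: dropping participants infected within
# 30 days preferentially removes those with weaker antibody responses.
# Every number here is invented for illustration.
import numpy as np

rng = np.random.default_rng(2)

n = 300
log_titer = rng.normal(np.log(400), 0.8, size=n)

# Assume infection risk falls as titers rise (an arbitrary logistic relationship).
p_infection = 1 / (1 + np.exp(2.0 * (log_titer - np.log(150))))
infected = rng.random(n) < p_infection

gmt_all  = np.exp(log_titer.mean())
gmt_kept = np.exp(log_titer[~infected].mean())

print(f"Excluded for infection: {infected.sum()} of {n}")
print(f"GMT, everyone:          {gmt_all:.0f}")
print(f"GMT, after exclusions:  {gmt_kept:.0f}  (higher, as expected)")
```

The size of the distortion depends on how strongly infection risk and antibody levels are linked, which is exactly the relationship the study is supposed to inform.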
Why is the case-counting window set to 30 days instead of the 7 days used for the 2nd dose? That's not explained in the Briefing document.
***
I may come back to the safety data some day but the sample size is so small that only really loud signals can be heard. If no adverse effects are found, one can't conclude that there are no adverse effects; one can only say that there are no adverse effects that can be detected by this study design.
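The "rule of three" gives a sense of scale: if zero events of a given type are observed among n subjects, the approximate 95% upper bound on the true event rate is 3/n. With about 300 booster recipients, any adverse effect rarer than about 1 in 100 could easily go undetected.

```python
# Rule of three: zero observed events in n subjects is still consistent with a
# true event rate of up to about 3/n (approximate 95% upper bound).
for n in (300, 3_000, 30_000):
    print(f"n = {n:6d}: zero observed events is consistent with a rate up to {3 / n:.2%}")
```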