In part 3 of my series on the AstraZeneca-Oxford (AO) vaccine trial results, as published in the Lancet paper (link), I'm giving you a health warning: we're going down the deep end. (The previous installments are here and here.)

As previously discussed, the AO trial experienced major mid-course corrections, upsetting the original test design, which necessarily complicates interpreting the data. Every number that is issued is now loaded with caveats, and rarely can one compare an efficacy calculation with another.

Reading through these analyses is like juggling: I'm forced to keep more and more balls in the air as I take in each supplementary analysis. The key is not to tire and let your guard down.

Here is Table 3 of the paper, which presents a series of supplementary analyses the researchers used to shed light on various specific questions. While reading this table, I'm always anchored to the headline vaccine efficacy (VE) number of this Phase 3 trial, which is 70%, a pooled result across separate U.K. and Brazil trials.

The Origin of the 90% Efficacy Claim

The first analysis shown in Table 3 had a starring role in AstraZeneca's press release, in the paragraph directly following the overall efficacy result. This analysis claims that after a two-dose treatment of the AO vaccine, the VE reaches 90% if the first dose is a "half" dose. This statement implies that other participants in the trial did not get a "half" first dose. Since the pooled figure is 70%, I can infer that the remaining subgroup had a VE lower than 70%, and Table 3 confirms that it is 60% if the first dose is a "standard" dose. Knowing the other side raises an immediate question, as it is unusual that decreasing dosage from the "standard" should improve efficacy.
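To keep the arithmetic straight, here is a minimal sketch of how an efficacy number falls out of case counts, and why a crude pooled figure sits between its subgroup figures when the arms are equal in size. The counts are hypothetical, picked only to land near 90% and 60%. They are not the trial's data, and the published estimates may be adjusted in ways this crude ratio is not.

```python
def vaccine_efficacy(cases_vax, n_vax, cases_ctl, n_ctl):
    """VE = 1 - (attack rate among vaccinated / attack rate among controls)."""
    return 1 - (cases_vax / n_vax) / (cases_ctl / n_ctl)

# HYPOTHETICAL counts, not from the paper: (cases_vax, n_vax, cases_ctl, n_ctl)
half_first_dose = (3, 1000, 30, 1000)
standard_first_dose = (40, 3000, 100, 3000)

# Crude pooling: add up cases and participants across the two subgroups.
pooled = tuple(a + b for a, b in zip(half_first_dose, standard_first_dose))

ve_half = vaccine_efficacy(*half_first_dose)
ve_standard = vaccine_efficacy(*standard_first_dose)
ve_pooled = vaccine_efficacy(*pooled)

print(f"half: {ve_half:.0%}  standard: {ve_standard:.0%}  pooled: {ve_pooled:.0%}")
```

The pooled value is pulled toward whichever subgroup contributes more cases, which is why a 90% subgroup cannot coexist with a 70% pooled figure unless the rest of the trial sits below 70%.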

This would have been an astounding finding if the trial had incorporated two **randomized** treatment arms, respectively getting lower doses (LD) and standard doses (SD) in the first shot. But the trial design stipulated just one treatment group getting two standard doses (SD/SD). The LD/SD subgroup was written into the trial when it was determined that a quarter of the first doses contained less vaccine than expected. This means **the difference between the LD/SD and SD/SD subgroups is more than just the size of the first dose**.

In fact, the paper disclosed June 10 as the point of demarcation: every participant in the LD/SD subgroup enrolled prior to June 10 whereas everyone in the SD/SD subgroup enrolled after that date. A quick glance at the case trend of the U.K. indicated that cases were trending downward in June, reaching a temporary plateau in early July.

No amount of magic can cure this factor because the two subgroups are disjoint on the basis of enrollment dates.

It turns out that this subgroup analysis does not include the entire SD/SD subset. We should not directly compare the 70% overall VE with the 90% subgroup VE since the latter excludes the Brazil contingent. The researchers justified the exclusion because no one in Brazil got a half dose. The same researchers previously convinced regulators that the Brazil cohort is similar enough to the U.K. cohort so they can conduct a pooled analysis. So, effectively, they have argued for pooling in one analysis but not the other.

Another sub-subgroup excluded from this subgroup analysis is the older cohort (ages 56 and over). This decision is justified because the LD/SD cohort did not contain any older people; that was because the design stipulated a staged enrollment, starting with lower-risk people and then moving to higher-risk people. As explained above, the LD/SD cohort was finished before the SD/SD cohort began. If they had included all ages, age would become another disjoint factor. So the researchers treat disjointness on age and country as a show-stopper, but not disjointness on enrollment time. This discretion is technically called "researchers' degrees of freedom". With freedom comes responsibility.

The implication is that these individual analyses generate a host of VE numbers, none of which can be directly compared to the others. These numbers are correlated, and yet they do not necessarily have to paint a consistent picture. When several numbers conflict with one another, it reveals the non-randomness lurking behind these comparisons, which denies our ability to clearly explain the discrepancies.

The Case of the Delayed Second Shot

That's not the end of the potential confounding factors, though. Virtually everybody in the LD/SD subgroup had to be called back for a second shot when AO changed the protocol in late July, implemented in August. As a result, virtually everyone got their second shot more than 8 weeks (56 days) after the first. By contrast, 40 percent of the SD/SD group took their second shot less than 8 weeks after the first.

To investigate the timing of the second dose, the investigators conducted another analysis by drilling down further. Now they removed everyone in the SD/SD group who received their second shot prior to 8 weeks. Why should the cutoff be set at 8 weeks and not, say, 3-4 weeks (which is the dose interval favored by other pharmas)? It's because the LD/SD subgroup did not contain any participants with dose intervals less than 8 weeks.

So the cutoff is determined by what data was available for analysis. And what data was available was decided by ... you guessed it, the mid-course alteration of the design. As indicated above, this dropped about 40% of the SD/SD group while retaining 100% of the LD/SD group.

At first blush, this created another counterintuitive result. The VE of the U.K. SD/SD subgroup aged 18-55 who received the second shot more than 8 weeks after the first is 66%, a bit higher than the VE of the otherwise similar participants who took the second dose less than 8 weeks after the first (60%)!

Don't forget that we are no longer analyzing two randomized treatment groups that are alike in everything except dose interval. When the shorter-delay sub-subgroup was taken out of the analysis data, did these participants exit evenly across the enrollment period? Could someone who got a first shot later receive a second shot earlier than someone else? We don't know because the participants rather than the investigators appeared to control when they accepted the second shot.

Further, the range around the 66% estimate is from 25 percent to 85 percent, which basically tells you this analysis is worthless, and the gap between 66% and 60% is meaningless.

The Moving Goalpost

Because the regulators were taken by a theory that the second dose should be delayed, they ordered yet another analysis.

This time, the investigators dumped the LD/SD subgroup altogether while recalling the Brazil cohort. They also called back the older people. There exist demographic differences between the U.K. and Brazil cohorts that received the SD/SD treatment. The U.K. cohort is twice as likely to be 56 and over. The Brazil cohort was two-thirds white while the U.K. cohort was 90% white. Ninety percent of the Brazil cohort were health and social care workers, compared to 60% in the U.K. Notably, the LD/SD cohort that enrolled earlier in the U.K. trial was 90% health and social care workers but is excluded from this analysis.

Despite these demographic differences, the data from two countries are pooled for this analysis, because the primary factor being investigated is the dose interval. The study population is split by when they received the second dose using a cutoff of 6 weeks. Why 6 weeks and not 8 weeks? It's anybody's guess. I suspect it's partly driven by the later start date of the Brazil trial, which means they have fewer long-interval participants -- the mirror image of the LD/SD cohort. It couldn't be placed earlier because there wouldn't be enough low-delay participants in the U.K. trial.

This analysis again gives a surprising finding, that those who got the second shot in fewer than 6 weeks did worse (53%) than those who took it in longer than 6 weeks (65%). The 12-percentage-point difference in vaccine efficacy appears meaningful.

Well, well, well. The range estimate of the short-delay sub-subgroup is ... -3% to 79%. Yes, that is *negative* 3%, and the interval spans more than 80 percentage points. Moreover, don't be lulled into thinking that the only difference between these two subgroups is the dose interval. The Brazil cohort was much more likely to have shorter intervals: 73% of the short-delay subgroup came from Brazil while 70% of the long-delay subgroup came from the U.K. Did the short-delay sub-subgroup do worse because of the dose interval or because those participants were much more likely to be in Brazil? Or, because the Brazil cohort is less white?
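Here is a rough sketch of how country mix alone could open up a gap between the two interval groups. Only the 73% and 70% mix figures come from the discussion above. The per-country VE values, the attack rate, and the group sizes are all hypothetical assumptions, and the sketch deliberately gives the dose interval no effect at all.

```python
def pooled_ve(strata):
    """Crude pooled VE over strata of (n_per_arm, control_attack_rate, ve).

    Assumes vaccine and control arms are the same size within each stratum,
    so the risk ratio reduces to vaccinated cases / control cases.
    """
    cases_ctl = sum(n * rate for n, rate, _ in strata)
    cases_vax = sum(n * rate * (1 - ve) for n, rate, ve in strata)
    return 1 - cases_vax / cases_ctl

# HYPOTHETICAL: VE differs by country for reasons unrelated to dose interval.
VE_BRAZIL, VE_UK = 0.50, 0.70
ATTACK_RATE = 0.02  # HYPOTHETICAL, same in both countries

# Mix taken from the text: short-delay group 73% Brazil, long-delay group 70% U.K.
short_delay = [(730, ATTACK_RATE, VE_BRAZIL), (270, ATTACK_RATE, VE_UK)]
long_delay = [(300, ATTACK_RATE, VE_BRAZIL), (700, ATTACK_RATE, VE_UK)]

ve_short = pooled_ve(short_delay)
ve_long = pooled_ve(long_delay)
print(f"short delay: {ve_short:.1%}  long delay: {ve_long:.1%}")
```

Under these made-up inputs the short-delay group looks worse even though the interval does nothing. That does not show this is what happened in the trial, only that the pooled comparison cannot rule it out.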

These numbers can't be compared to the overall 70% VE either because that number contains the LD/SD subgroup.

When It Rains, It Pours

The parade of "exploratory" analyses did not stop there. Somewhere embedded in the Lancet report is this gem of a sentence:

Although there is a possibility that chance might play a part in such divergent results, a similar contrast in efficacy between the LD/SD and SD/SD recipients with asymptomatic infections provides support for the observation (58·9% [95% CI 1·0 to 82·9] *vs* 3·8% [−72·4 to 46·3]).

The researchers are making an assertion that a lower dose might after all be better than a higher dose, since the gap between 59% and 4% is so bigly that it could not have been due to chance. To be fair, they provided us the range estimates. Did I see 1% to 83% and *negative* 72% to 46%? These are some crazy, crazy intervals. Sad, sad intervals. As any statistics student should know, the size of the confidence interval is a function of the sample size. The more samples we have, the narrower the interval, the greater our ability to rule out random chance.
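To see the sample-size effect in action, here is a minimal sketch that puts an approximate 95% interval around a VE estimate using a Wald interval on the log risk ratio. This is a textbook approximation I am swapping in for illustration, not the method used in the paper, and the case counts are hypothetical.

```python
import math

def ve_with_ci(cases_vax, n_vax, cases_ctl, n_ctl, z=1.96):
    """Return (ve, lower, upper) from a Wald interval on the log risk ratio."""
    rr = (cases_vax / n_vax) / (cases_ctl / n_ctl)
    # Standard error of log(RR) for two independent binomial proportions.
    se = math.sqrt(1 / cases_vax - 1 / n_vax + 1 / cases_ctl - 1 / n_ctl)
    rr_low = math.exp(math.log(rr) - z * se)
    rr_high = math.exp(math.log(rr) + z * se)
    # A high risk ratio means low efficacy, so the bounds swap places.
    return 1 - rr, 1 - rr_high, 1 - rr_low

# HYPOTHETICAL counts: same attack rates, ten times the participants and cases.
small = ve_with_ci(8, 2000, 20, 2000)
large = ve_with_ci(80, 20000, 200, 20000)

for label, (ve, low, high) in (("small", small), ("large", large)):
    print(f"{label}: VE {ve:.0%}, 95% CI {low:.0%} to {high:.0%}")
```

Both runs give the same point estimate, but the interval from the handful of cases is several times wider. When a subgroup analysis rests on a few dozen cases or fewer, intervals that stretch from near zero (or below) to above 80% are what you should expect.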
