This is a continuation of a series of posts on the AstraZeneca-Oxford (AO) vaccine trial results. The previous installment is here. These comments are based on the peer-reviewed paper in the Lancet (link).
At the end of the last post, I mentioned that the AO trial experienced multiple meaningful mid-course corrections. I'll discuss the implications of these changes in this post.
Deviations from original protocol
An architect publishes the blueprint for a new building, it's approved by regulators, and construction commences. Midway through the build, the architect revises the blueprint. The regulators approve the changes. The building is completed. When the new tenants inspect their new abodes, they are surprised to find huge pillars in the middle of the floor that weren't on the floor plans they were shown when they made the purchase. They feel both violated and helpless. If the change had been extra pipes in the bathroom, they might not have felt as bad.
Reading the Lancet paper (link) is like inspecting the new building: one learns that AO made many alterations to their pre-specified trial protocols. Several of these changes are not cosmetic but foundational. The following major changes were disclosed in the Lancet paper:
a) The vaccine arm was switched from one dose to two doses.
b) The vaccine arm was split into two subgroups, one of which was said to have received a lower-than-intended dose at the first shot.
c) Two separate trials - one in the U.K. and one in Brazil - were combined in a pooled analysis.
The one-dose, two-dose debate started much earlier than we thought
The vaccine treatment in the AO trials was switched from one dose to two doses, months after the trial started. It is ironic that the U.K. government is now using the trial results to justify a one-dose treatment (officially, the guidance is not to worry about when to take the second shot) when the investigators felt strongly enough about two doses to alter the pre-specified protocols midway through the trials. Critically, when making that change, they could have retained a one-dose arm but decided against it.
What makes this decision even more consequential is that AO eventually carved out a subgroup of the vaccine arm - those who received a lower dose at the first shot (more on this later). By the time the protocol was altered (in late July, and implemented in early August), this entire subgroup had already received the first dose, which means their treatment would have already been complete - all of them at Day 30 of follow-up or later - when they were suddenly asked to receive a second dose. This late protocol revision partly explains why many participants got the second shot much later than the 3-4 weeks recommended by Pfizer/Moderna or the 4-6 weeks targeted in the revised AO protocol.
While the Brazil trial started later than the U.K. trial, it too started out with a one-dose treatment and later switched to two doses without retaining a one-dose arm.
Splitting the vaccine treatment arm in two
Some coverage of the AO results highlights a 90% VE number, which is a lot more impressive than 70%. This number comes from the Lancet paper, describing a subgroup of trial participants (labeled LD/SD - low dose/standard dose) who are said to have inadvertently received a lower ("half") dose followed by a full dose of the vaccine. About a quarter of the vaccine arm has been classified as LD/SD.
The first thing to realize is that if one subgroup did better than the average VE of 70%, then the rest of the vaccine arm must have done worse than average. There is no way around this math. So, if we split the vaccine arm into two subgroups, LD/SD and SD/SD, the VE of the SD/SD group comes out to 62%. Any commentator who mentions 90% without mentioning 62% is cherry-picking the data. An exception can be made if one has a cogent argument as to why the SD/SD group is invalid - but, in this case, the SD/SD group represents the desired treatment!
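To see the arithmetic, here is a minimal sketch in Python. The case counts are hypothetical, chosen only to roughly reproduce the published 90% / 62% / 70% figures, and the two arms are assumed equal in size; VE is computed as one minus the relative risk of infection.

```python
# Minimal sketch: the overall VE must sit between the subgroup VEs.
# Case counts are hypothetical, chosen only to roughly match the
# published 90% / 62% / 70% figures; arm sizes are assumed equal.

def ve(cases_vaccine, n_vaccine, cases_placebo, n_placebo):
    """Vaccine efficacy = 1 minus the relative risk of infection."""
    return 1 - (cases_vaccine / n_vaccine) / (cases_placebo / n_placebo)

# (vaccine cases, vaccine N, placebo cases, placebo N)
ld_sd = (3, 1400, 30, 1400)    # low dose / standard dose subgroup
sd_sd = (27, 4400, 71, 4400)   # standard dose / standard dose subgroup

print(f"LD/SD VE:   {ve(*ld_sd):.0%}")   # ~90%
print(f"SD/SD VE:   {ve(*sd_sd):.0%}")   # ~62%

# Pooling the two subgroups: the overall VE is a case-weighted blend,
# so it must land between the two subgroup values.
pooled = tuple(a + b for a, b in zip(ld_sd, sd_sd))
print(f"Overall VE: {ve(*pooled):.0%}")  # ~70%
```

Since the overall VE is a blend of the two subgroup VEs, reporting a 90% subgroup mathematically forces the rest of the arm below the 70% average.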
The second thing to realize is that the trial design did not designate three randomized arms - LD/SD, SD/SD, and control - such that each arm could be compared to another under the "all else equal" assumption. The comparison of LD/SD and SD/SD was presented as if the data came from a proper randomized controlled experiment, but that is far from the truth. According to the researchers, the LD/SD subgroup arose unintentionally when they discovered that some participants got a lower-than-expected volume of the vaccine in the first dose (more on this in a future post).
As a result, the LD/SD and SD/SD subgroups differ not only by the amount of vaccine but also by a host of other factors, which include at least the following:
i) age: everyone in LD/SD was 55 or under, while 20% of SD/SD was over 55.
ii) country of abode: everyone in LD/SD came from the U.K. trial, while the SD/SD participants were split half and half between the U.K. and Brazil.
iii) timing of enrollment: everyone in LD/SD received the first shot before anyone in SD/SD got the first shot.
iv) timing of second dose: essentially no one in LD/SD received the second shot before week 8, while 40% of the SD/SD U.K. participants and 80% of the SD/SD Brazil participants did.
v) occupation: 90% of LD/SD are health and social care workers, compared to 60% of SD/SD U.K. and 88% of SD/SD Brazil.
This mess is a case study in why statisticians insist on careful test design and high-quality test execution. The attempt to salvage this trial led to the counterintuitive result that LD/SD had better efficacy than SD/SD. But this result could have been due to any combination of younger people, more health and social care workers, more Britons, or earlier enrollment (when the pandemic was less serious) in the LD/SD subgroup. Those are the known differences; there will also be unknown differences because the treatment was not randomly assigned. It is impossible to separate the confounded effects without making assumptions.
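To make the confounding concrete, here is a toy calculation with made-up attack rates. It assumes that VE varies by age (one of the known differences listed above) and that the dose makes no difference at all; the age mix alone then produces a gap of the same flavor as 90% versus 62%.

```python
# Toy illustration with made-up numbers: if VE varies by age, a
# subgroup that skews younger shows a higher crude VE even when the
# dose has no effect whatsoever.

attack_placebo = 0.02            # assumed attack rate without vaccine
ve_young, ve_old = 0.80, 0.50    # assumed true VE by age, same for any dose

def crude_ve(frac_young):
    """Crude VE of a subgroup with the given fraction of young participants."""
    rate_vaccine = (frac_young * (1 - ve_young)
                    + (1 - frac_young) * (1 - ve_old)) * attack_placebo
    return 1 - rate_vaccine / attack_placebo

print(f"All-young subgroup (like LD/SD): VE = {crude_ve(1.0):.0%}")  # 80%
print(f"Mixed-age subgroup (like SD/SD): VE = {crude_ve(0.5):.0%}")  # 65%
```

None of this proves that age explains the observed gap; it only shows that a non-randomized comparison cannot rule such explanations out.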
The counterintuitive result may even be a matter of pure chance. The original trial was sized to analyze the vaccine group in aggregate, not split into two parts. Thus, an analysis of these subgroups necessarily results in wider error bars. This is one reason why AO requested a pooled analysis combining results from the U.K. and Brazil trials. When combined, the SD/SD group was judged to be 62% effective, with a confidence interval of 40% to 75%. Without adding the Brazil cohort, the VE is 60%, with a confidence interval of 28% to 78%. This presents two problems: first, the 50-point range is unacceptably wide; second, the lower end of 28% falls below the critical number of 30%. (See this previous post for why the FDA is looking for VE > 30%.)
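For readers who want to see why, here is a sketch using the standard normal approximation for the confidence interval of a relative risk (and hence of VE = 1 - RR). The counts are hypothetical, picked to land near the published point estimates; the actual analysis used a more elaborate model, so treat this purely as an illustration of how fewer cases widen the error bars.

```python
# Sketch: confidence interval for VE via the log relative-risk
# approximation. Counts are hypothetical; fewer cases in a subgroup
# mean a larger standard error, hence a wider interval.
import math

def ve_ci(cases_v, n_v, cases_p, n_p, z=1.96):
    """Approximate 95% CI for VE = 1 - relative risk."""
    rr = (cases_v / n_v) / (cases_p / n_p)
    se = math.sqrt(1/cases_v - 1/n_v + 1/cases_p - 1/n_p)
    lo_rr = math.exp(math.log(rr) - z * se)
    hi_rr = math.exp(math.log(rr) + z * se)
    return 1 - hi_rr, 1 - rr, 1 - lo_rr   # inverting RR flips the bounds

for label, counts in [("full vaccine arm", (30, 5800, 101, 5800)),
                      ("SD/SD subgroup  ", (27, 4400, 71, 4400))]:
    lo, est, hi = ve_ci(*counts)
    print(f"{label}: VE {est:.0%} (95% CI {lo:.0%} to {hi:.0%})")
```

With these made-up counts, the subgroup interval comes out roughly ten points wider than the full-arm interval, purely because it contains fewer cases.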
Pooling data from two trials
Another alteration to the pre-specified trial protocol was to pool together the data from the U.K. and Brazil trials for analysis. This is different from Pfizer running one trial across different sites in different countries.
The potential problems are similar to the LD/SD issues. The following are some of the known differences between the two trials:
i) timing: the U.K. trial started about a month before the Brazil trial.
ii) placebo: in the U.K. trial, the placebo was originally the meningococcal vaccine and became saline when the second shot was added, while in Brazil, all placebo shots were saline.
iii) timing of second shot: 60% of the U.K. (SD/SD) group got their second shot after 8 weeks, while 80% of the Brazil (SD/SD) group got theirs before 8 weeks.
iv) race: 90% of the U.K. (SD/SD) group are white, compared to 60% in Brazil.
v) occupation: 60% of the U.K. (SD/SD) group are health and social care workers, compared to 88% in Brazil.
vi) the state of the pandemic, including public health policies, differed significantly between the two countries.
It's not that any one of these differences destroys the analysis; rather, by pooling the data, the analysts implicitly assume that none of those (and other unknown) differences matters. This decision was made to expedite the issuance of actionable results - another example of a shortcut taken in response to the public health emergency.
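One way the analysts could have supported the pooling assertion is to test for a treatment-by-study interaction. Below is a sketch with simulated data - the variable names and effect sizes are mine, not from the paper - fitting a logistic regression with an interaction term; a significant interaction would be evidence against pooling.

```python
# Sketch: probing the pooling assumption with a treatment-by-study
# interaction test. All data are simulated; nothing here comes from
# the actual trials.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 8000
df = pd.DataFrame({
    "treat": rng.integers(0, 2, n),   # 1 = vaccine, 0 = placebo
    "study": rng.integers(0, 2, n),   # 1 = Brazil trial, 0 = U.K. trial
})
# Simulate infections with a study effect on baseline risk but a
# common treatment effect, i.e., a world in which pooling is justified.
log_odds = -4.0 + 0.8 * df["study"] - 1.2 * df["treat"]
df["infected"] = rng.binomial(1, 1 / (1 + np.exp(-log_odds)))

# If the treat:study coefficient is far from significant, the two
# trials show no evidence of differing treatment effects.
model = smf.logit("infected ~ treat * study", data=df).fit(disp=False)
print(model.summary().tables[1])
```

A non-significant interaction is necessary but not sufficient: with so few infection cases, such a test has little power to detect real differences between the trials.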
Let me put it this way: I don't envy the data analysts; these protocol deviations cook up a big mess.
***
In summary, the significant alterations of the protocol pollute the original randomization of treatment, making it much harder to interpret the data because the groups being compared no longer differ only by the treatment but also by a host of known and unknown biases. In the Lancet paper, these less rigorous analyses are described as "exploratory" - so when you see that word, you know the analysts are treating non-randomized data as if they were randomized.
The reader must be very careful when interpreting any result from this AO analysis. Pretty much every analysis is based on a different set of participants, and most of these subsets cannot be interpreted as randomized.
This was all a bit of a mess. Deciding on what happens in Phase 1/2 trials and then what happens in Phase 3 is a bit of an art rather than a science. Phase 2 trials are rarely powered enough to make a definitive decision on dosing, etc., for Phase 3. After all, that is why they are Phase 2. They also made life difficult by not really doing a real Phase 2.
I think they panicked. They had actually designed their Phase 2/3 to detect a VE of 70%. They must have decided that a single dose was not going to be sufficient. It is possible that they compared the antibody response to that of another vaccine and realised they didn't have as good a vaccine. So they added a second dose. While it is messy, it doesn't invalidate the trial, as randomisation still holds.
The different countries aren't a problem. Their modelling included a study effect for the rate of infections. Somewhere in the protocol they also allowed for checking for a treatment-by-study interaction, which I don't think they found. Some statisticians would argue for a random effects model. Similarly, there are opinions that they should have ignored the LD/SD and SD/SD groups, as differences between them are likely to be spurious.
I would like to see a survival-analysis approach. This would allow for setting up a time-dependent effect of treatment, to see how the effect differs between the first and second dose.
Posted by: Ken | 02/18/2021 at 05:37 AM
Ken: Ordinarily, different countries are not a problem as evidenced by multi-site trials (like Pfizer). But when these are separate trials, done at different times, not controlling for demographics, and even having varying treatments (dose intervals) and placebos, that's a lot to stomach. I think they could have helped their case by releasing analyses to support the assertion that these trials can be pooled, as well as various other claims throughout. It's certainly possible but hard to tell without seeing supporting evidence.
Posted by: Kaiser | 02/18/2021 at 11:32 AM