I kept the previous post about Pfizer's Bayesian analysis of the vaccine trial data to a reasonable length. And then, I found some loose ends.
Bayesian analysis has the benefit of flexibility
Bayesians like to address the question head-on. If our question is about vaccine efficacy (VE), we model it directly, which yields a probability curve for VE. Once we have such a curve (which I reprinted below), we can answer all kinds of questions.
In the last post, I told you that the Pfizer protocol disclosed that the FDA requires efficacy above 30%, not 50% as has been reported. That means we want at least 95% of the probability mass to sit to the right of the VE = 30% point on the curve. With the results from the trial, this is almost certainly true.
What if the FDA had set 70% efficacy as the standard? We can use the same probability curve, this time looking at the probability mass to the right of 70%. It is still close to 100%.
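To make the "one curve, many questions" point concrete, here is a minimal Python sketch. It assumes the model is the one in the Pfizer protocol as I understand it: a Beta(0.700102, 1) prior on the vaccine's share of cases (VSC, the quantity the model actually works with, explained in the next section), updated with binomial case counts, so the posterior is Beta(0.700102 + vaccine cases, 1 + placebo cases). The 8-versus-86 split of the 94 cases is hypothetical, chosen only for illustration; the names `ve_to_vsc`, `beta_cdf` and `prob_ve_above` are my own.

```python
import math

# Assumed prior Beta(A0, B0) on the vaccine's share of cases (VSC)
A0, B0 = 0.700102, 1.0

# HYPOTHETICAL split of the 94 cases, for illustration only
VACCINE_CASES, PLACEBO_CASES = 8, 86


def ve_to_vsc(ve):
    """VSC implied by a given VE, assuming equal-sized trial arms."""
    return (1 - ve) / (2 - ve)


def beta_cdf(x, a, b, steps=100_000):
    """P(Beta(a, b) <= x) by the midpoint rule; adequate here because a > 1."""
    if x <= 0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    h = x / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h  # midpoint of the i-th slice, strictly inside (0, 1)
        total += math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log1p(-t))
    return total * h


def prob_ve_above(threshold, vaccine_cases, placebo_cases):
    """Posterior probability that VE exceeds `threshold`.

    VE > threshold is the same event as VSC < ve_to_vsc(threshold),
    because VSC falls as VE rises.
    """
    a = A0 + vaccine_cases
    b = B0 + placebo_cases
    return beta_cdf(ve_to_vsc(threshold), a, b)


for threshold in (0.30, 0.70):
    p = prob_ve_above(threshold, VACCINE_CASES, PLACEBO_CASES)
    print(f"P(VE > {threshold:.0%}) = {p:.4f}")
```

Only the threshold changes between the two questions; the posterior itself is computed once from the same counts.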
Bayesian updating
I mentioned, but never addressed, one issue in the last post: why did the statisticians switch from the more intuitive vaccine efficacy (VE) to the vaccine's share of cases (VSC)?
Here's one way to think about Bayesian models. We first set down our "prior" expectation, i.e. the pink line shown in the chart below (also reprinted from the last post):
This "prior" represents our subjective belief - prior to running the vaccine trial - of what the vaccine efficacy might be. The pink line is relatively flat across all values of VE with a slight lift towards 100%, indicating these researchers believe the most likely value of efficacy is close to 100% (!)
We can think of a Bayesian model as averaging a set of simulations. (This should remind you of the election forecasting models, which are also Bayesian.) For one such simulation, we randomly select a number from the pink curve (the "prior"). Given the flatness of the curve, pretty much any value of VE is equally likely, with a slight tilt towards numbers close to 100%. Let's say we pick 75%.
In this particular simulation, the VE is now assumed to be 75%. The statistician now asks: if the vaccine's efficacy were 75%, and if I ran the trial to 94 cases, what would be the split of cases between vaccine and placebo?
If we flip a coin 94 times, how many heads will we get? That depends on the probability of the coin landing heads. I didn't say the coin is fair, so this probability does not have to be 50%. Now replace "heads" with "vaccine" and "tails" with "placebo", and you can see that the two questions are mathematically equivalent.
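A minimal sketch of the coin-tossing step in Python: each of the 94 cases is one toss of a biased coin that lands "vaccine" with probability `p_vaccine`. The function name `toss_cases` and the example value 0.2 are illustrative choices of mine; the next paragraph explains what the right coin probability actually is.

```python
import random


def toss_cases(p_vaccine, n_cases=94, rng=random):
    """Split n_cases between vaccine and placebo, one biased coin toss per case.

    Each case lands in the vaccine group with probability p_vaccine.
    Returns (vaccine_cases, placebo_cases).
    """
    vaccine = sum(1 for _ in range(n_cases) if rng.random() < p_vaccine)
    return vaccine, n_cases - vaccine


rng = random.Random(2020)  # fixed seed so the run is repeatable
splits = [toss_cases(0.2, rng=rng) for _ in range(10_000)]
avg_vaccine = sum(v for v, _ in splits) / len(splits)
print(f"average vaccine cases out of 94: {avg_vaccine:.1f}")
```

Any single run gives one split; averaged over many runs, the vaccine count settles near 94 times the coin probability (about 18.8 for 0.2).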
What is the probability of the coin showing "heads" (i.e. "vaccine")? Is it 75%, the assumed VE for this scenario? Not really! A VE of 75% means the vaccine group's case rate is a quarter of the placebo group's. What we need is the probability that any one of the 94 cases comes from the vaccine group. That is the VSC, the proportion of cases coming from the vaccine group. With equally sized groups, case rates in a 1-to-4 ratio mean one case in five comes from the vaccine group, so the VSC is 20%, not 75%.
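The general relationship can be written out, assuming (for simplicity) that the vaccine and placebo groups are the same size and followed for the same length of time. Write $r_v$ and $r_p$ for the case rates in the two groups and $\theta$ for the VSC:

```latex
\mathrm{VE} = 1 - \frac{r_v}{r_p},
\qquad
\theta = \frac{r_v}{r_v + r_p}
       = \frac{r_v / r_p}{r_v / r_p + 1}
       = \frac{1 - \mathrm{VE}}{2 - \mathrm{VE}}.
```

For example, a VE of 90% gives $\theta = 0.1 / 1.1 \approx 0.091$: roughly 9 of every 100 cases would come from the vaccine group.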
This is why the Bayesian model works with VSC instead of VE. As I showed in the previous post, for each value of VSC, we can derive the value of VE so the model can be interpreted in terms of VE, which is more intuitive.
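The translation in both directions is a pair of one-line functions. This Python sketch uses my own names (`vsc_from_ve`, `ve_from_vsc`) and assumes equal-sized arms, as in the formula for VSC in terms of VE; the loop checks that a VE survives the round trip through VSC.

```python
def vsc_from_ve(ve):
    """Vaccine's share of cases implied by a VE (equal-sized arms assumed)."""
    return (1 - ve) / (2 - ve)


def ve_from_vsc(vsc):
    """VE implied by a vaccine share of cases; valid for 0 <= vsc < 1."""
    if not 0 <= vsc < 1:
        raise ValueError("vsc must be in [0, 1)")
    return (1 - 2 * vsc) / (1 - vsc)


for ve in (0.3, 0.5, 0.75, 0.95):
    vsc = vsc_from_ve(ve)
    print(f"VE {ve:.0%} -> VSC {vsc:.3f} -> VE {ve_from_vsc(vsc):.0%}")
```

Because the mapping is one-to-one, a probability curve over VSC can be relabelled as a curve over VE without losing any information.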
Now back to the simulations. Each simulation starts with a different randomly picked value of the VSC. This value of VSC is then used in our 94 coin tosses, after which we tally up how many "vaccine" sides turn up. So each simulation produces a split of cases between vaccine and placebo. We repeat these simulations many times and keep only those whose simulated split matches the split actually observed in the trial. The distribution of the VSC values behind those retained simulations, translated into VE, is the posterior probability curve (the black line) shown in the first chart of this post.
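The simulation picture can be run literally as rejection sampling: draw a VSC from the prior, toss 94 coins, and keep the draw only if the simulated split matches the data. This Python sketch assumes the Beta(0.700102, 1) prior that, as far as I know, the Pfizer protocol specifies, and a hypothetical observed split of 8 vaccine and 86 placebo cases. It is an illustration, not how the trial's analysis was computed; because this prior is a Beta distribution, the posterior is also available in closed form, which the sketch uses as a check.

```python
import random

A0, B0 = 0.700102, 1.0  # assumed Beta prior on VSC
N_CASES = 94
OBSERVED_VACCINE = 8    # HYPOTHETICAL observed split: 8 vaccine, 86 placebo


def simulate_posterior(n_sims=50_000, seed=11):
    """Rejection sampling: keep the prior draws whose simulated split matches the data."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_sims):
        vsc = rng.betavariate(A0, B0)                              # 1. draw VSC from the prior
        vaccine = sum(rng.random() < vsc for _ in range(N_CASES))  # 2. toss 94 coins
        if vaccine == OBSERVED_VACCINE:                            # 3. keep only matching splits
            kept.append(vsc)
    return kept


kept = simulate_posterior()
sim_mean = sum(kept) / len(kept)
# Closed-form check: the posterior is Beta(A0 + 8, B0 + 86)
exact_mean = (A0 + OBSERVED_VACCINE) / (A0 + B0 + N_CASES)
print(f"kept {len(kept)} of 50,000 simulations")
print(f"simulated posterior mean VSC {sim_mean:.4f} vs exact {exact_mean:.4f}")
```

Most simulations are thrown away, which is why nobody computes a posterior this way in practice; the retained draws, however, follow the posterior distribution exactly.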
P.S. So far, I have ignored one small complication. In all these charts, I limited VE to between 0% and 100% (from no effect to completely vanquishing the virus). However, VE can be negative! The vaccine could do more harm than good. In the VSC world, the share of cases can only range from 0% to 100%; no other values are possible. When VSC exceeds 50%, i.e. the vaccine group accounts for more than half of the infections, VE is negative. The negative side of the VE scale is tricky because, in theory, VE can go all the way to negative infinity (as the share of cases coming from the vaccine group approaches 100%).
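Inverting the share-of-cases formula (again assuming equal-sized groups) shows how the bounded VSC scale stretches into an unbounded VE scale, with $\theta$ standing for the VSC:

```latex
\mathrm{VE} = \frac{1 - 2\theta}{1 - \theta},
\qquad
\theta = 0 \Rightarrow \mathrm{VE} = 1,
\quad
\theta = \tfrac{1}{2} \Rightarrow \mathrm{VE} = 0,
\quad
\theta = 0.9 \Rightarrow \mathrm{VE} = -8,
\quad
\lim_{\theta \to 1^{-}} \mathrm{VE} = -\infty.
```

So a vaccine group with 90% of the cases corresponds to a VE of -800%.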
[Added 11/18/2020] Here is the chart of the prior probability lifted from Sebastian Kranz's post. The right half of the probability density corresponds to theta (the VSC) > 0.5, meaning there are more cases in the vaccine group than in the placebo group, i.e. VE is negative. Do they really believe that the vaccine has such a high chance (38%) of causing more cases than the placebo?
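The 38% figure can be checked directly, assuming the chart shows the Beta(0.700102, 1) prior on theta that, as far as I know, the Pfizer protocol specifies. A Beta(a, 1) distribution has the closed-form CDF x^a, so no simulation is needed. The same Python sketch also computes the VE at the prior mean of theta, which lands at about 30%, the threshold discussed earlier.

```python
A0 = 0.700102  # assumed first shape parameter of the Beta(A0, 1) prior on theta (VSC)

# For a Beta(a, 1) prior the CDF has the closed form P(theta <= x) = x ** a.
p_negative_ve = 1 - 0.5 ** A0       # P(theta > 0.5), i.e. P(VE < 0)

prior_mean_theta = A0 / (A0 + 1)    # mean of Beta(a, 1) is a / (a + 1)
ve_at_prior_mean = (1 - 2 * prior_mean_theta) / (1 - prior_mean_theta)

print(f"prior P(VE < 0) = {p_negative_ve:.3f}")
print(f"VE at the prior mean of theta = {ve_at_prior_mean:.3f}")
```

Algebraically, the VE at the prior mean simplifies to 1 - a, which is why the odd-looking 0.700102 puts the prior's centre almost exactly at VE = 30%.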
P.P.S. The analysis methodology is standard. As with most of statistics, the devil is in the definitions, the counting, and the exclusions.