


Aniko Szabo

Interesting, I would have interpreted the statement "the vaccine worked in 50% of people" differently. I imagine a binary status after being given a vaccine: either the vaccine worked in me (e.g., I had a good immune response) and now I am protected, or it did not and I am unprotected. Then, _if exposed to the virus_, the protected people will not get sick, but the unprotected ones will. So the number of people who were prevented from getting sick would still depend on the overall prevalence and the probability of being exposed to the virus.

I think this is in fact a reasonable first-order approximation to how vaccines and infections work, somewhat like an epidemiological SI model (susceptible/immune). Of course, other interpretations are also possible: perhaps the vaccine halves the probability of becoming sick given an exposure. This would produce the same observed outcome in the trial, but I would associate an effect of this type with, say, masks, not vaccines.
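A minimal sketch of the two interpretations above, under a deliberately simplified one-exposure model. The exposure and infection probabilities are made-up illustrative values, and the function names ("all-or-nothing" and "leaky") are labels chosen here for illustration. In this setup, both versions of a "50% effective" vaccine halve the infection risk, which is why the trial outcome alone cannot tell them apart.

```python
# Two hypothetical ways a "50% effective" vaccine could work, compared
# under a simple one-exposure model. All parameters are illustrative.

def infection_risk_all_or_nothing(p_protected: float, p_exposed: float,
                                  p_infect_if_exposed: float) -> float:
    """A fraction p_protected is fully immune; everyone else is unchanged."""
    return (1 - p_protected) * p_exposed * p_infect_if_exposed


def infection_risk_leaky(risk_reduction: float, p_exposed: float,
                         p_infect_if_exposed: float) -> float:
    """Everyone's chance of infection given an exposure is scaled down."""
    return p_exposed * p_infect_if_exposed * (1 - risk_reduction)


p_exposed = 0.02  # assumed chance of exposure during the trial
p_infect = 0.5    # assumed chance that an exposure causes infection
baseline = p_exposed * p_infect  # risk in the placebo arm

aon = infection_risk_all_or_nothing(0.5, p_exposed, p_infect)
leaky = infection_risk_leaky(0.5, p_exposed, p_infect)

print(f"placebo risk:        {baseline:.4f}")
print(f"all-or-nothing risk: {aon:.4f}")
print(f"leaky risk:          {leaky:.4f}")
```

Both vaccine arms end up with half the placebo risk, so the observed infection counts would look alike; only the story about *who* is protected differs.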

Anyway, I do wonder how a "random" person would interpret the concept of a vaccine "working"; maybe somebody has already researched this? The comments about the duration of protection are very salient, of course, and are a major concern for the trials. The trials seem to be optimized for the situation of long-lasting immunity, which is common but certainly not universal among viruses.


This is a bit of a quibble, but the rate of infection observed so far has probably been dampened by other measures like wearing masks and avoiding crowds. One of the benefits, or unintended consequences, of a vaccine would likely be that people stop taking those precautions, so in that scenario the rate of infection would likely be higher.


AS: Useful comment. Let me add the following. The actual definition is given by the CDC here. However, a recent comment by Dr. Fauci (heard on TV) suggests they might drop that metric and go with antibody response, which is what you're talking about.

But if generating antibodies is the first-order goal, and not necessarily preventing infections, then we don't need large-scale RCTs to measure it. I'd even argue that in the warp-speed environment, you don't need a placebo arm: just inject the vaccine, wait some days, and measure antibodies.

The rough math of the Moderna trial: 30,000 participants; 15,000 given placebo, with an expected infection rate of 0.75%, i.e. about 113 infections; 15,000 given the vaccine, with the infection rate halved to 0.375%, i.e. about 56 infections. So when the total number of infections reaches about 170 with roughly that split, the vaccine is declared efficacious. (This ignores dropouts and other complications but gets us close to the actual threshold.)
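The back-of-the-envelope arithmetic above can be reproduced in a few lines. This is only a sketch of the expected case counts under the stated assumptions (equal arms, a fixed attack rate over the observation window, vaccine halving the risk); it ignores dropouts, staggered enrollment and the trial's actual stopping rule.

```python
# Expected case counts for the rough trial design described above.
# Assumptions: equal arms, fixed attack rates, no dropouts.

participants_per_arm = 15_000
placebo_attack_rate = 0.0075                   # 0.75% over the observation window
vaccine_attack_rate = placebo_attack_rate / 2  # infection risk halved

placebo_cases = participants_per_arm * placebo_attack_rate
vaccine_cases = participants_per_arm * vaccine_attack_rate
total_cases = placebo_cases + vaccine_cases

print(f"placebo cases: {placebo_cases:.2f}")
print(f"vaccine cases: {vaccine_cases:.2f}")
print(f"total cases:   {total_cases:.2f}")
```

The unrounded totals come to 112.5 + 56.25 = 168.75, which is the "about 170" figure in the comment.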

What is interesting to think about is what we know about the ~99% of participants who are not infected when we look at the trial data. We cannot say for sure that they have been protected; all we know is that they would not have been infected during the observation period regardless of whether they got the vaccine or the placebo.

Other possibilities are: (a) they were not exposed to the virus during the trial, so the verdict is still out (this leads to my discussion of observation time versus duration of protection); (b) they were exposed but their body fought it off (this could be measured if every participant were tested meticulously, but that effort appears to have been deemed too costly); (c) they were infected but the infection was not detected.

Regarding your last question, I don't want to call a journalist a "random person", but it's pretty clear from that headline that 50% efficacy is being equated with 50% of people not getting infected, which is not what the CDC definition given above says.
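The gap between "50% efficacy" and "50% of people protected" can be made concrete with the standard attack-rate formula for vaccine efficacy, (ARU − ARV) / ARU, where ARU and ARV are the attack rates among the unvaccinated (placebo) and vaccinated groups. We assume this matches the CDC definition referred to above; the rates plugged in are the rough planning assumptions from earlier in the thread, not trial results.

```python
def vaccine_efficacy(attack_rate_unvaccinated: float,
                     attack_rate_vaccinated: float) -> float:
    """Proportional reduction in attack rate among the vaccinated."""
    return (attack_rate_unvaccinated - attack_rate_vaccinated) / attack_rate_unvaccinated


aru = 0.0075   # placebo-arm attack rate (assumed, from the rough trial math)
arv = 0.00375  # vaccine-arm attack rate (assumed to be half of placebo)

efficacy = vaccine_efficacy(aru, arv)
share_uninfected_vaccinated = 1 - arv
share_uninfected_placebo = 1 - aru

print(f"efficacy: {efficacy:.0%}")
print(f"vaccinated not infected: {share_uninfected_vaccinated:.3%}")
print(f"placebo not infected:    {share_uninfected_placebo:.3%}")
```

Efficacy comes out at 50%, yet over 99% of people in *both* arms stay uninfected, so the number says nothing about half of the vaccinated being "protected".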


TBW: I was shocked when I first learned of the assumption of such a low infection rate over 6 months. The people designing the trials are more knowledgeable, so I have no basis to object to it.

A side issue that I will write about soon is that we cannot assume that the sequence of enrollment is random, and so, analyzing the demographics and characteristics of the subgroup - including mask wearing behavior - used in the interim analysis is important.


A few points. There is a difference between approving a drug or vaccine and using it. Generally, any reasonable level of effectiveness is enough for registration. In some cases there are good reasons for a clinician to choose a treatment that isn't as effective as the best, so regulatory authorities make it available and then the market decides. In the case of COVID-19 vaccines, it will mainly be governments that choose the vaccines, as they will pay for them, and the choice will be based on price, effectiveness and risk. Moderna's vaccine will be priced 2 to 3 times higher, as they can't afford to subsidise it like the majors.

A vaccine with 50% effectiveness may well be all that is required, but it makes life a bit more complicated. If we assume that we can get the reproduction number down to 1.5 with some mild interventions, contact tracing and testing, then halving it with a vaccine would give 0.75, and the epidemic would disappear over a few months. It would be better to have 90% efficacy, as then we could just vaccinate and forget, as we do with measles, except that with COVID-19 we might achieve worldwide eradication.
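The reproduction-number arithmetic above can be sketched as follows. This assumes, as a simplification, that efficacy reduces transmission proportionally and that vaccinated people mix evenly with everyone else; the `coverage` parameter (share of the population vaccinated) is an addition for illustration, since the comment implicitly assumes everyone is vaccinated.

```python
# Sketch: effect of a vaccine on the reproduction number R, assuming
# efficacy scales transmission proportionally and mixing is homogeneous.

def effective_r(r_before: float, efficacy: float, coverage: float = 1.0) -> float:
    """R after vaccinating a fraction `coverage` with a vaccine of given efficacy."""
    return r_before * (1 - efficacy * coverage)


def min_coverage(r_before: float, efficacy: float) -> float:
    """Coverage at which effective R reaches exactly 1 under the same assumptions."""
    return (1 - 1 / r_before) / efficacy


r_with_mild_measures = 1.5  # figure assumed in the comment

for efficacy in (0.5, 0.9):
    print(f"efficacy {efficacy:.0%}: "
          f"R with everyone vaccinated = {effective_r(r_with_mild_measures, efficacy):.2f}, "
          f"coverage needed to reach R = 1 is {min_coverage(r_with_mild_measures, efficacy):.0%}")
```

Under these assumptions a 50%-efficacy vaccine needs roughly two-thirds coverage to bring R to 1, versus about 37% at 90% efficacy. If `min_coverage` returns a value above 1, vaccination alone cannot bring R below 1 in this model.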

The Moderna study specifies a total number of cases in advance, and the two arms are then compared using a stratified analysis to take into account any mismatch. Assumptions of a low infection rate are fine: if the rate is higher, the target number of cases is reached faster and the trial finishes sooner.
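Why a case-driven design finishes sooner when infections are more common can be seen with a rough calculation. This sketch assumes constant monthly infection rates, equal arms of 15,000, no dropouts, a case target of 170, and (as an assumption) that the 0.75% attack rate mentioned earlier accrues evenly over about six months; `months_to_case_target` is a hypothetical helper, not part of any trial protocol.

```python
# Sketch: time for a case-driven trial to accrue its target number of cases.
# Assumes constant monthly infection rates, equal arms, no dropouts.

def months_to_case_target(case_target: int, per_arm: int,
                          monthly_placebo_rate: float, efficacy: float) -> float:
    """Months until the pooled case count across both arms reaches case_target."""
    placebo_cases_per_month = per_arm * monthly_placebo_rate
    vaccine_cases_per_month = per_arm * monthly_placebo_rate * (1 - efficacy)
    return case_target / (placebo_cases_per_month + vaccine_cases_per_month)


# Assumption: 0.75% infected over roughly six months, i.e. 0.125% per month.
base_monthly_rate = 0.0075 / 6

for multiplier in (1 / 3, 1, 2):
    months = months_to_case_target(170, 15_000, base_monthly_rate * multiplier, 0.5)
    print(f"infection rate x{multiplier:.2f}: about {months:.1f} months to reach 170 cases")
```

The time to the target scales inversely with the infection rate: doubling the rate halves the wait, and cutting it to a third triples it.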

Also, it should be mentioned that the FDA will be overseeing everything that happens in these trials. They have on the order of 50 PhD-level statisticians, plus all the doctors and support staff.


Ken: I hope people are not thinking 50% efficacy is useless. Cutting the risk by half is a good thing. I just want to point out it's not true that half the people are protected.

Agree with most of what you wrote. One caveat: the sample size is based on the assumption of the low baseline - if that assumption proves too low, the threshold will be reached earlier, as you stated. However, this is bad statistics. If you plug the correct baseline into the sample size formula, you'd need a larger sample to hit the significance threshold.


The baseline is rather indeterminate. It depends very much on what measures the population and government take to minimise infection. If infection rates in America had been a third of what they have been, they would be waiting until sometime next year. There is a second important aspect to a drug trial: the rate of severe adverse events, which needs to be low. My recollection is that 15,000 subjects gives a 95% chance of finding a 1 in 5000 event. I think they need more than that. I expect that continued tracking of adverse events will be a requirement.
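The 95% figure recalled above can be checked directly, assuming each subject independently has a 1-in-5,000 chance of the event. It is the familiar "rule of three": 15,000 × 1/5,000 = 3 expected events, and the chance of seeing at least one is about 1 − e^(−3) ≈ 0.95.

```python
import math


def prob_at_least_one(n_subjects: int, event_rate: float) -> float:
    """Chance that at least one of n independent subjects has the event."""
    return 1 - (1 - event_rate) ** n_subjects


p_exact = prob_at_least_one(15_000, 1 / 5000)
p_approx = 1 - math.exp(-15_000 / 5000)  # Poisson approximation, 1 - e^(-3)

print(f"exact: {p_exact:.4f}, approx: {p_approx:.4f}")
```

So the recollection holds up; note that seeing at least one event is a much weaker requirement than estimating the event rate precisely.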


Yes, the safety evaluation is also important. We can do a quick analysis. Assuming we need to vaccinate 60 percent of the population, in the U.S. that translates to about 200 million people. A 1-in-5,000 rate means the severe adverse event would affect 40,000 people. That's a lot of people, depending on the meaning of severe. Also, adverse events sometimes take time to surface, so while it's great to have these results, we have to be patient.
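The same 1-in-5,000 rate looks very different at trial scale and at population scale. This sketch uses the 200 million figure from the comment and the 15,000-person vaccine arm discussed earlier; the event rate is the hypothetical one from the thread, not an observed result.

```python
population_to_vaccinate = 200_000_000  # about 60% of the U.S., per the comment
trial_vaccine_arm = 15_000             # vaccine-arm size discussed in the thread
event_one_in = 5_000                   # hypothetical 1-in-5,000 severe adverse event

expected_in_trial = trial_vaccine_arm / event_one_in
expected_in_population = population_to_vaccinate / event_one_in

print(f"expected in trial vaccine arm: {expected_in_trial:.0f}")
print(f"expected after mass vaccination: {expected_in_population:,.0f}")
```

A handful of events in the trial (about 3 expected) corresponds to tens of thousands of people once the vaccine is rolled out.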


Kaiser Fung. Business analytics and data visualization expert. Author and Speaker.
See my articles at Daily Beast, 538, HBR, Wired.



Numbers Rule Your World

Junk Charts Blog


Graphics design by Amanda Lee

Next Events

Jan: 10 NYPL Data Science Careers Talk, New York, NY

Past Events

Aug: 15 NYPL Analytics Resume Review Workshop, New York, NY

Apr: 2 Data Visualization Seminar, Pasadena, CA

Mar: 30 ASA DataFest, New York, NY


Principal Analytics Prep
