
I am somewhat surprised by the bolded statement in the last paragraph. An analysis with varying followup is only fallacious if the timing is ignored in the analysis. An entire subfield, survival analysis, was developed to properly analyze such data. Indeed, the protocol specifies Cox proportional hazards regression as its primary analysis method, which does incorporate the followup information. We can argue whether the proportional hazards assumption is reasonable, especially in the light of a second dose, but that does not seem to be your point.

Your concern about median followup restriction of 2 months makes sense in terms of safety concerns, but I don't think it is a direct threat to the validity of the survival analysis.

AS: Great comment. I agree that I should qualify the bolded statement by adding "without making assumptions and adjustments". Now that you opened this can of worms, I do wonder about the survival analysis.

Here's my understanding of how one would do survival analysis when the population has variable observation frames: on the day of analysis, every participant's data are frozen; the median observation time is computed, let's call it two months; the maximum observation frame, say, is three months; anyone who has been observed for less than three months and has not been infected or dropped out is regarded as censored (due to the interim analysis).
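To make the freeze-and-censor rule concrete, here is a minimal sketch of how each participant's record could be turned into a (follow-up time, event observed) pair. The function name `observed_time` and its arguments are hypothetical, chosen only for illustration; they are not taken from the trial protocol.

```python
from datetime import date
from typing import Optional, Tuple

def observed_time(enrolled: date, data_freeze: date,
                  infected: Optional[date] = None,
                  dropped_out: Optional[date] = None) -> Tuple[int, bool]:
    """Return (days of follow-up, event_observed) as of the data freeze."""
    # The freeze date always ends observation if nothing happened earlier.
    candidates = [(data_freeze, False)]
    # Infections and drop-outs after the freeze are invisible to the interim analysis.
    if infected is not None and infected <= data_freeze:
        candidates.append((infected, True))
    if dropped_out is not None and dropped_out <= data_freeze:
        candidates.append((dropped_out, False))
    # Earliest end point wins; on a tie, an infection counts as an event.
    end, is_event = min(candidates, key=lambda c: (c[0], not c[1]))
    return (end - enrolled).days, is_event

# Enrolled two months before the freeze, never infected: censored by the analysis date.
print(observed_time(date(2020, 8, 1), date(2020, 10, 1)))
# Infected one month in: an observed event.
print(observed_time(date(2020, 8, 1), date(2020, 10, 1), infected=date(2020, 9, 1)))
```

Note that a participant censored by the freeze and one who dropped out produce the same kind of record (event not observed), which is the equivalence that point c) below questions.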

Without doing a simulation, I'd guess that this means:
a) by design, we introduced a type of censoring determined by enrollment time (or observation time). The earlier a participant enrolls, the lower the chance of censoring.
b) almost everyone in this analysis is censored at a time between one day and three months minus one day.
c) this censoring is treated as if the participant had dropped out, although unlike a drop-out, it is forced by the analysis design (most of these censored observations would "disappear" if we waited until the full analysis).
d) the uncertainty band widens dramatically as observation time increases, since we have the full interim sample on day 1 and almost no one by month 3.
e) at the median observation time (two months), only about half of the interim sample still contributes to the estimate of the hazard function, so the uncertainty band is probably too wide to be useful.
f) basic survival analysis isn't magic; by assuming that the censoring is independent of the outcome, the data accumulated during the shortened observation frame can be combined with the rest of the data. The people who have been observed 3 weeks do not improve our estimate of the hazard function beyond 3 weeks, and if the goal is to establish that the protection period is longer than 2 months, I think we have a problem.
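Points d) and e) can be checked with a rough simulation. The sketch below assumes uniform enrollment over a 90-day window and ignores infections and drop-outs (on the assumption that they are rare enough to barely change the risk set); the function names and parameters are hypothetical. Under uniform enrollment the median follow-up comes out near 45 days rather than the two months in my example, which would require enrollment to be front-loaded, but the shrinking risk set shows up either way.

```python
import random
import statistics

def simulate_followup(n=10000, enrollment_window=90, seed=0):
    """Follow-up days at the data freeze under uniform staggered enrollment."""
    rng = random.Random(seed)
    # Each participant enrolls on a uniformly random day in the window;
    # the freeze falls at the end of the window, so follow-up is 1..90 days.
    return [enrollment_window - rng.randrange(enrollment_window) for _ in range(n)]

def at_risk(followup, day):
    """Number of participants whose follow-up reaches the given day."""
    return sum(1 for f in followup if f >= day)

followup = simulate_followup()
median_days = statistics.median(followup)
print("median follow-up (days):", median_days)
for day in (1, int(median_days), 85, 90):
    # The risk set is the full sample on day 1 and nearly empty by day 90.
    print("day", day, "at risk:", at_risk(followup, day))
```

The number at risk is what drives the width of the uncertainty band at each time point, so the last rows of the printout are where the estimate of the hazard is least reliable.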

To summarize, I'm concerned because forced censoring coexists with a reduced sample size for the interim analysis plus a compressed timeline due to early reading. And I'd like the vaccine to have at least two or even three months of protection.

Let me know if there are other adjustments I'm missing.

For reference, here is the letter from experts about rule #1. They have the same concern as I laid out here. https://www.statnews.com/pharmalot/2020/10/06/fda-covid19-coronavirus-pandemic-vaccines-trump/

Kaiser Fung. Business analytics and data visualization expert. Author and Speaker.