Real-world studies are hard because observational data are packed with known and unknown biases. One thing is for sure: any analysis that doesn't correct for known biases is clearly flawed.

When discussing real-world studies of Covid-19 vaccine effectiveness, I have often lamented the ill effect of "case-counting windows". Recall that none of the vaccine studies count all cases from the day the participants enter the trial and get their first shot. In the case of Pfizer, all Covid-19 cases that occurred prior to the 7th day after the second dose were removed, in effect nullified.

This procedure is feasible in the vaccine trial, where the same case-counting window can be applied to both vaccine and placebo arms. In real-world studies, the unvaccinated people do not take placebo (saline) shots, and therefore there is no second shot from which to count 7 days. As a result, the case-counting window removes cases only from the vaccinated group and not from the unvaccinated group.

**tl;dr: Scroll to the end of the post**

***

I'll show you the effect visually. We start with the cumulative case curve that was displayed in the Pfizer FDA Briefing document.

This chart is straightforward. As time passes, the number of cases grows in both the vaccine and placebo arms. At the time of the interim analysis (which led to the emergency use authorization), the cumulative case rate for the vaccine arm was 0.25%, i.e. 25 out of 10,000 participants were found to be sick, while that for the placebo arm was 1.4%, or 140 out of 10,000. Because of the trial setup, we can assume all else is equal, and so the vaccine is said to have cut the case rate by just over 80%. The vaccine efficacy is about 80%.
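The arithmetic behind that figure can be sketched in a few lines of Python. This is only an illustration using the rounded rates quoted above; the function name `vaccine_efficacy` is my own label, not something from the briefing document.

```python
def vaccine_efficacy(rate_vaccine: float, rate_placebo: float) -> float:
    """Relative reduction in the case rate: 1 - (vaccine rate / placebo rate)."""
    if rate_placebo <= 0:
        raise ValueError("placebo case rate must be positive")
    return 1.0 - rate_vaccine / rate_placebo


# Cumulative case rates at the interim analysis, counting every case from
# the first dose onward (rounded figures quoted in the text).
ve_all_cases = vaccine_efficacy(rate_vaccine=0.0025, rate_placebo=0.014)
print(f"VE, all cases counted: {ve_all_cases:.1%}")
```

With these rounded inputs the result is a little above 82%, which is the "just over 80%" in the text.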

But Pfizer said VE was 95%, not 80%. This is because of the case-counting window. All cases in either arm that occurred prior to 2D+7 (7 days after the second dose) are nullified. Like this:

This case-counting strategy knocks down the case rates for both placebo and vaccine arms. The placebo group now has a cumulative case rate of 0.8% at interim analysis, while the vaccine group has a cumulative case rate of 0.04%. The vaccine is now said to have cut the case rate by 95%. (But this number is not comparable to the 80% number above. The 95% efficacy applies only to people who did not get sick prior to 2D+7 while the 80% number applies to everyone in the vaccine trial.)

What happened is purely arithmetic. The second calculation is based on a subset of the data used in the first calculation.
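To see the subsetting in numbers, here is a small sketch using the rounded rates quoted above. It works out what share of each arm's cases falls before 2D+7 and is therefore dropped, and then recomputes efficacy on what remains. The variable names are mine.

```python
# Cumulative case rates per arm (rounded figures from the text), as fractions.
all_cases = {"placebo": 0.014, "vaccine": 0.0025}   # counted from the first dose
windowed = {"placebo": 0.008, "vaccine": 0.0004}    # counted from 2D+7 only

# The window only removes cases; it never adds any.
share_removed = {
    arm: (all_cases[arm] - windowed[arm]) / all_cases[arm] for arm in all_cases
}

ve_windowed = 1.0 - windowed["vaccine"] / windowed["placebo"]

for arm, share in share_removed.items():
    print(f"{arm}: {share:.0%} of cases occur before 2D+7 and are dropped")
print(f"VE within the case-counting window: {ve_windowed:.0%}")
```

On these figures the window drops a much larger share of the vaccine arm's cases than of the placebo arm's, which is why the ratio of the two rates moves.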

***

Next, we do a thought experiment. We pretend that the data came from a real-world study instead of a clinical trial. We will therefore relabel the placebo group as "unvaccinated" and the vaccine arm as "vaccinated". We will now compute a vaccine effectiveness using the methods from real-world vaccine studies.

Here is the picture:

The cumulative case rate for the unvaccinated group is 1.4% while that for the vaccinated group is 0.04%. Thus, the vaccine apparently cut the case rate by 97% - almost perfect protection, even better than what was seen in the vaccine trial.
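In code, the mixed comparison looks like this. It is a sketch with the rounded rates from the text; note that the two inputs come from different counting rules.

```python
# Rounded cumulative case rates from the text, as fractions.
unvaccinated_all_cases = 0.014   # no case-counting window applied
vaccinated_windowed = 0.0004     # only cases from 2D+7 onward are counted

ve_real_world_style = 1.0 - vaccinated_windowed / unvaccinated_all_cases
print(f"VE with the window applied to one group only: {ve_real_world_style:.1%}")
```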

What just happened is that I took the orange line (for unvaccinated) from the first chart and the blue line (for vaccinated) from the second chart. The blue line is uncontroversial: given the dates of vaccinations of the vaccinated people, the analyst can compute how many days after the first shot someone got sick with Covid-19, and therefore apply the case-counting window, removing all cases occurring prior to 2D+7. This mimics the formula used in the vaccine trial.

When it comes to the unvaccinated group, the analyst could not apply a case-counting window. That's because the unvaccinated group did not take any injections - unlike the placebo participants in the vaccine trial who took saline shots. As a result, the analyst just counts all the cases in the unvaccinated group during the entire study period. This case rate is then compared to the case rate of the vaccinated - but restricted to various case-counting windows.
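A record-level sketch may make the asymmetry concrete. Everything here is hypothetical: the `Person` record, the eight toy participants, and the assumption that the window opens on day 28 after study entry (second dose three weeks after the first, plus 7 days). For simplicity the denominators keep everyone in, which is cruder than a real analysis.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    vaccinated: bool
    # Day of Covid-19 diagnosis, counted from study entry; None if never a case.
    case_day: Optional[int]
    # Day the case-counting window opens (2D+7); None when no dose dates exist.
    window_start: Optional[int]


def counted_case(p: Person) -> bool:
    """Real-world-style rule: apply the window only where dose dates exist."""
    if p.case_day is None:
        return False
    if p.window_start is None:   # unvaccinated: nothing to anchor a window to
        return True
    return p.case_day >= p.window_start


def case_rate(people: list, vaccinated: bool) -> float:
    group = [p for p in people if p.vaccinated == vaccinated]
    return sum(counted_case(p) for p in group) / len(group)


# Made-up toy records: both groups have identical case days (10 and 40).
people = [
    Person(True, 10, 28), Person(True, 40, 28),
    Person(True, None, 28), Person(True, None, 28),
    Person(False, 10, None), Person(False, 40, None),
    Person(False, None, None), Person(False, None, None),
]

print("vaccinated:", case_rate(people, vaccinated=True))
print("unvaccinated:", case_rate(people, vaccinated=False))
```

Both toy groups got sick on exactly the same days, yet the vaccinated rate comes out at half the unvaccinated rate, purely because the day-10 case is dropped on one side only.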

The beauty of this thought experiment is that it allows us to estimate how much pro-vaccine bias has been introduced by this case-counting window. On the second chart, we know that if case-counting could be applied to the orange line, the case rate would have been 0.8%. On the third chart, we know that the case rate was treated as 1.4% when case-counting became infeasible. Thus, the baseline case rate is inflated by 70% due to this asymmetric application of case counting.
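The inflation factor is just the ratio of the two baselines. A quick sketch with the rounded rates quoted above:

```python
# Rounded placebo-arm rates from the text, as fractions.
baseline_without_window = 0.014   # what a real-world study ends up using
baseline_with_window = 0.008      # what the trial analysis uses

inflation_factor = baseline_without_window / baseline_with_window
print(f"Baseline inflation factor: {inflation_factor:.2f}")
```

With these rounded inputs the ratio comes out at 1.75, slightly above the roughly 70% inflation quoted in the text, most likely because the rates shown here are rounded.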

Without a doubt, these real-world studies produce vastly inflated estimates of the true vaccine efficacy. (And here, I have only covered one of many biases in these studies.)

**TL;DR**

Real-world data are messy because, unlike a clinical trial, there is no design to constrain biases. Real-world studies are evaluated on whether they have sufficiently corrected for known biases. One useful trick is to run the real-world study methodology through an unbiased dataset (e.g. data coming from a clinical trial or randomized experiment). One hopes that the RW methodology yields the same answer as when one analyzes the clinical trial using techniques for analyzing trials.

The analysis here shows that the most popular real-world study methodology over-estimates the effectiveness of vaccines. The reason is that the case-counting window is applied symmetrically in the trial analysis but only to the vaccinated subgroup in a real-world study. This asymmetry causes the baseline case rate to be inflated by 70% (in the case of Pfizer), and vaccine effectiveness is the relative improvement over that baseline.

The case-counting window for Pfizer starts at 7 days after the second dose, meaning that all reported infections happening before that day are nullified when analyzing the clinical trial. In the real-world studies, these earlier infections are nullified only for the vaccinated group but not for the control group. This is a pro-vaccine bias built into real-world data. The only remedy is a correction performed at analysis. A first-order correction is to divide the unvaccinated case rate by a factor of 1.7 before computing vaccine effectiveness. (This correction, however, does not adjust for many other biases found in real-world data.)
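Here is a minimal sketch of that first-order correction, applied to the rounded rates from the thought experiment. The function name and its default factor are my own labels; the 1.7 comes from the Pfizer trial data, so reusing it for another study or vaccine would be an assumption.

```python
def corrected_effectiveness(
    rate_vaccinated_windowed: float,
    rate_unvaccinated_all: float,
    inflation_factor: float = 1.7,   # derived from the Pfizer trial in this post
) -> float:
    """Deflate the unvaccinated baseline, then compute relative reduction."""
    deflated_baseline = rate_unvaccinated_all / inflation_factor
    return 1.0 - rate_vaccinated_windowed / deflated_baseline


# Rounded rates from the thought experiment, as fractions.
naive = 1.0 - 0.0004 / 0.014
corrected = corrected_effectiveness(0.0004, 0.014)

print(f"Uncorrected real-world-style VE: {naive:.1%}")
print(f"After dividing the baseline by 1.7: {corrected:.1%}")
```

On these inputs the corrected figure lands back near the 95% that the trial analysis reports, which is what the sanity check is supposed to show.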
