In the last post, I featured a real-world study of vaccine effectiveness sponsored by the Mayo Clinic. The preprint is available here. Having read several similar studies, I think this is one of the better ones.

Real-world studies are based on data as observed, not data collected from a designed experiment. As a result, the vaccinated group and the unvaccinated group are not twins except for vaccination status. Thus, the analyst must correct for any biases that may explain part, or even all, of the observed difference between the groups.

I'll be making further observations about this research, which reveal the limitations of what we can do in a real-world study.

How much covariate balance was achieved?

A key goal of statistical adjustment is to achieve covariate balance, which enables us to treat the two (adjusted) groups *as if* they were randomized. Matching by propensity score is not exact because the propensity score is a single number that summarizes all the matched variables (age, sex, race, ethnicity and number of prior PCR tests in the case of the Mayo Clinic study). Table 1 seems to confirm that, post-matching, the two groups have essentially the same age distribution.
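To see why a single score cannot guarantee balance on each covariate, here is a minimal sketch using a hypothetical logistic model with just two covariates; the coefficients are invented for illustration and are not taken from the paper:

```python
import math

# Hypothetical propensity model: the score collapses several
# covariates into one number (made-up coefficients).
def propensity(age, n_prior_tests):
    logit = -3 + 0.05 * age + 0.2 * n_prior_tests
    return 1 / (1 + math.exp(-logit))

# Two rather different people can carry (nearly) identical scores,
# and so look interchangeable to a score-based matching algorithm.
a = propensity(age=85, n_prior_tests=2)   # older, fewer prior tests
b = propensity(age=73, n_prior_tests=5)   # younger, more prior tests
print(round(a, 3), round(b, 3))           # scores agree to 3 decimals
```

Matching such a pair leaves the groups balanced on the score but not necessarily on age, which is why a closer look beyond summary tables is warranted.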

Matching on age is justified because this coronavirus affects older people more severely than younger people. The unvaccinated population, if not adjusted, is younger, so a simple analysis will understate the effect of the vaccine.

The true picture, shown in Figure S1(B), is messier:

Focus on the columns, each of which is an age group spanning 5 years. The blue columns represent vaccinated people, for whom the analysts sought unvaccinated twins (shown in orange). The matched unvaccinated group is underweight in people aged 80-90. This gap is bridged by adding people over 90 and people aged 70-80. The excess of over-90s in the control group is potentially an issue. This difference in distribution is obscured by the aggregate statistic shown in Table 1.
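The way an aggregate statistic can hide a distributional difference is easy to demonstrate with made-up ages:

```python
from collections import Counter

# Made-up ages: the two groups share the same mean, yet their
# decade-by-decade histograms differ (a deficit in the 80s offset
# by an excess of 90+), mirroring the Figure S1(B) pattern.
vax   = [72, 75, 83, 86]
unvax = [72, 75, 78, 91]

mean = lambda xs: sum(xs) / len(xs)
decades = lambda xs: Counter(a // 10 * 10 for a in xs)

print(mean(vax), mean(unvax))   # identical means: 79.0 79.0
print(decades(vax))             # two in their 70s, two in their 80s
print(decades(unvax))           # three in their 70s, one over 90
```

A table reporting only the mean (or median) age would declare these two groups balanced.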

The counterfactual

Figure S1(B) is also revealing when placed side by side with Figure S1(A), which displays the age distributions of the vaccinated and unvaccinated groups *prior to matching*.

The orange columns before (A) and after (B) matching look completely different. This means the matched unvaccinated group is far from representative of all unvaccinated people. This is expected because the statistical adjustment is used to carve out a subset that mirrors the vaccinated group.

The blue columns appear to be identical on both charts. This confirms that the analysts were able to find a match for every vaccinated individual. That is frequently impossible. But remember that upfront exclusions (such as zip codes with fewer than 25 vaccinated people, and people without prior PCR test results) already make the analysis set different from the set of all vaccinated people.

So what is the point of an observational study? It's to give a good answer to the **counterfactual** question - what would have been the case rate of the vaccinated group in the analysis set if they had not been vaccinated?

The other uses of real-world studies

Food for thought. What if we reverse the matching process - finding twins for the unvaccinated group? This analysis answers the reverse question: what would be the case rate of the unvaccinated group if they were vaccinated? I hope you can see that the resulting matches will be different.

In fact, this is the more interesting question. We expect most people to eventually get the shots, so this result can be used to project the near future.
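A toy nearest-neighbor match on a scalar score (scores invented for illustration, matching with replacement for simplicity) shows why reversing the direction changes the matched pairs:

```python
# Invented scores for two treated and three control individuals.
treated = [0.25, 0.8]
control = [0.1, 0.3, 0.9]

def match(sources, pool):
    # For each source, pick the closest score in the pool
    # (with replacement, to keep the sketch simple).
    return [min(pool, key=lambda c: abs(c - s)) for s in sources]

print(match(treated, control))   # twins for the treated: [0.3, 0.9]
print(match(control, treated))   # twins for the control: [0.25, 0.25, 0.8]
```

The two directions select different subsets of the data, so they answer different counterfactual questions: the effect on the vaccinated versus the effect on the unvaccinated.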

I don't get why all the real-world studies aim to replicate the finding of the randomized clinical trial (RCT). If they find a high VE, we learn yet again why the RCT is the gold standard. If their result is worse, we'll choose to believe the RCT because it is the gold standard. I just can't imagine a scenario in which the pharmas will conclude that the VE is lower than what was measured in the RCT, based on a real-world study.

Real-world studies have other uses. For example, they can provide additional color on subgroups. As we know, the RCTs did not enroll enough minorities to provide estimates with reasonable margins of error. They also did not have enough people to analyze geographical differences. Unfortunately, the studies so far have not focused on these aspects that can bring new knowledge.

The one quantity that can't be matched

I now address the biggest concern I have about these real-world studies. They claim an ability to trace the timeline of vaccine efficacy from Day 0 (first shot) through Day 7, 14, 21, ... etc. I'm going to explain why these studies don't yield accurate interim estimates. They should focus on total counts inside a wide case-counting window.

In the vaccine trial, the control group takes two placebo shots. Therefore, each unvaccinated participant has a well-defined Day 0, so that subsequent positive tests can be positioned relative to Day 0.

In a real-world study, there are no placebo shots. As explained in the last post, the Mayo Clinic-nference team defines Day 0 for an unvaccinated individual as the day of the first shot of the paired vaccinated twin. The problem is that there are many possible unvaccinated people to match to each vaccinated person.

Consider an unvaccinated person who gets infected on February 1. The study period for the Mayo Clinic research ran from December 1 to February 8. This person - based on zip code, sex, race, ethnicity and number of prior tests - is eligible to be matched to several vaccinated individuals. In each matched pair, Day 0 is fixed by the vaccinated person, and different vaccinated persons may have different Day 0s. Therefore, the case on February 1 can end up on any part of the cumulative case curve, depending on which vaccinated individual the matching algorithm pairs this person with!

If the unvaccinated person is matched to someone who got vaccinated on January 31, then the infection on February 1 is counted as a Day 1 infection. If s/he is matched to someone who got vaccinated on January 1, then the same infection is counted as a Day 31 infection.
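The arithmetic, counting the twin's shot date as Day 0 (dates taken from the example above):

```python
from datetime import date

# The same infection on February 1 lands on a different day of the
# cumulative case curve depending on the matched twin's shot date.
infection = date(2021, 2, 1)
for shot in [date(2021, 1, 31), date(2021, 1, 1)]:
    day = (infection - shot).days
    print(f"twin vaccinated {shot}: counted as a Day {day} infection")
```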

Smarter minds may figure out how to solve this issue. I do know it can be avoided by not analyzing the shape of the curve: focus instead on the total count recorded from Day 0 to the end of the study period. This recommended analysis does not depend on the timing of the cases.
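A minimal sketch of this timing-free comparison, with invented counts (not the study's numbers): vaccine effectiveness is one minus the ratio of the overall case rates in the two matched groups.

```python
# Invented totals over the full case-counting window.
cases_vax, n_vax = 30, 30000
cases_unvax, n_unvax = 120, 30000

# VE from total counts: 1 minus the rate ratio. No case needs to be
# placed on a Day-0 timeline for this calculation.
ve = 1 - (cases_vax / n_vax) / (cases_unvax / n_unvax)
print(f"VE over the full window: {ve:.0%}")   # 75%
```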

Which part of the cumulative case curve is most reliable?

Lastly, I want to draw your attention to Figure S2(A).

This chart brought me so much joy! When reviewing the RCTs and the real-world studies, I kept wanting to see this chart but other research teams failed to sate my hunger.

The chart shows the number of days each vaccinated person in the analysis set has been followed since Day 0 (first shot). This chart is pivotal to our understanding of the VE calculations in an interim analysis.

It seems like an eon ago when I called for the FDA to require two months of follow-up for every trial participant, and not the median trial participant. In the above chart, we can roughly place the median follow-up time at around Day 30. Half the people have more than 30 days of follow-up and half have fewer.
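The distinction between the median participant and every participant is easy to see with made-up follow-up times:

```python
import statistics

# Invented follow-up days: the median can look healthy while some
# participants have barely been observed at all.
followup_days = [1, 5, 12, 30, 45, 58, 60]
print("median follow-up: ", statistics.median(followup_days))  # 30
print("minimum follow-up:", min(followup_days))                # 1
```

A requirement stated in terms of the median says nothing about the people at the short end of this distribution.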

Imagine if this analysis were performed after all people have at least 60 days of observation. We should see a single column at Day 60.

Now look at the very first column on the left, sitting at Day 1. These are people who got their first shot the day before the study period ended, so the analysts have only one day of follow-up for them.

Take a quick look at the famous Pfizer cumulative case curve:

I don't have Pfizer's version of Figure S2(A), but imagine it. The first column on that chart would represent trial participants who have had a single day of follow-up. This means they can only impact the leftmost edge of the cumulative case curve.

By contrast, the right side of the cumulative case curve consists only of participants who have reached a long follow-up period.

In other words, the part of the curve with the greatest "support" is the left side. The further right you move along the curve, the fewer people are contributing to the counts, and the less reliable the data. If margins of error were shown, they would grow quite drastically moving left to right. What have most analysts been doing with the Pfizer curve? They chop off the left side (link).
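The notion of "support" can be made concrete: at each day d, count how many participants have been followed for at least d days (follow-up times invented for illustration):

```python
# Invented follow-up days for seven participants.
followup = [1, 7, 14, 30, 60, 90, 112]

# Support at day d = number of participants whose follow-up reaches
# day d, i.e. who can still contribute cases to that part of the curve.
for d in [1, 30, 112]:
    at_risk = sum(1 for f in followup if f >= d)
    print(f"day {d}: {at_risk} participants contributing")
```

Margins of error widen as this count falls moving rightward along the curve.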

This timing issue only affects interim analyses. Let's say we conduct a "full" analysis when all participants have reached 112 days of follow-up (the right edge of the curve above). If we had a Figure S2(A), we would see a single column at Day 112. In that case, every participant is affecting the entire length of the curve. (Technically, we have censoring due to testing positive, death or dropout but not due to interim analysis.)

***

With that, I finish my review of the Mayo Clinic paper. I will turn my attention to other real-world studies. There is considerable variability among these studies; as you might have noticed, there is quite a bit of art in this science.
