I offered a few high-level comments on the widely publicized CDC study of real-world effectiveness of the mRNA vaccines in my previous post. Today, I take a deeper dive into the study.
The main value of this real-world study comes from the weekly swabs requested from each participant. Unlike other real-world studies based on "found data", the CDC study is an organized effort with enrolled participants who agreed to send in swabs so their infection status could be determined each week. Of the vaccine trials I reviewed, only the AstraZeneca trial included this feature (and only for the U.K. trial, possibly only a part of it). These swabs reveal asymptomatic cases. As a result, the measured infection rate is likely to exceed what was observed during the vaccine trials.
The study reported an admirable compliance rate on those swabs. The median participant submitted all requested swabs.
Similar to the Danish study, the analysis population consists of specific higher-risk people. The CDC study focuses on healthcare workers and essential workers. Like all real-world studies, we must be careful when generalizing study results. It's one thing to say the vaccine was found to be highly effective for healthcare & essential workers; it's a completely different thing to claim that the study showed the vaccine to be highly effective for everyone (which is what the media and many "experts" have been touting all day).
Unlike the Danish study (based on comprehensive, found data), the CDC study does not include every healthcare worker or essential worker. Because of the enrollment requirement, the study population is self-selected, so the first question to ask is whether the analysis population is representative of all healthcare and essential workers. There is nothing in the paper to answer this question. Table 1 discloses that 7 out of 10 people in the study are under 49, over 80% are white and non-Hispanic, and 70% have zero chronic conditions (left undefined in the paper). The analysis population is thus significantly younger and healthier than the average American.
There are two validity questions. The first, discussed above, deals with the difference between the analysis population and the general population. The second concerns the differences between the vaccinated and unvaccinated groups within the analysis population.
There are two sentences in the CDC study that perfectly describe the challenges of any real-world effectiveness study.
For example, the infection rate in Miami, Florida was 8.6%, which is 65% higher than the study-wide average of 5.2%. Twelve percent of the unvaccinated group work in Miami, compared to 3.5% of the vaccinated group. Meanwhile, the infection rate in Portland, Oregon was 1%, much lower than average, and 90% of the Portlanders included in the study were in the vaccinated group.
Another example is occupation. The infection rate of primary healthcare workers was 2%, and over 90% of them were in the vaccinated group. Meanwhile, the infection rate of first responders was 9%, or 4.5 times as high, and 40% of them were unvaccinated.
Those two examples suffice to illustrate the central challenge of real-world studies. The researchers conclude that VE is very high, that is to say, the unvaccinated group has a much higher infection rate than the vaccinated group, and they attribute the entire observed difference in infection rates to the vaccine. But any of the following statements could also be true:
- Some or all of the difference is explained by the over-representation of higher-risk males in the unvaccinated group
- Some or all of the difference is explained by the over-representation of higher-risk Hispanics in the unvaccinated group
- Some or all of the difference is explained by the over-representation of higher-risk first responders (and under-representation of lower-risk healthcare personnel) in the unvaccinated group
- Some or all of the difference is explained by the over-representation of higher-risk people living in Arizona, Florida and Texas (and under-representation of lower-risk residents of Minnesota and Oregon) in the unvaccinated group
The two groups being compared differ not only by their vaccination status but also by gender, race, occupation and state of residence. The unvaccinated group has an over-representation of higher-risk individuals of each category.
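A toy calculation makes the danger concrete. Using hypothetical numbers loosely inspired by the Miami and Portland figures above, a vaccine with zero effect within each site can still show a large crude "effectiveness" when the higher-risk site is over-represented in the unvaccinated group:

```python
# Hypothetical illustration: within each site, vaccinated and unvaccinated
# people face the SAME infection risk (a null vaccine). The only difference
# between the two groups is where they live and work.

# site: (infection risk, n_vaccinated, n_unvaccinated)
sites = {
    "high-risk site": (0.086, 100, 900),  # mostly unvaccinated
    "low-risk site":  (0.010, 900, 100),  # mostly vaccinated
}

def crude_ve(sites):
    """Naive VE = 1 - (overall vaccinated rate / overall unvaccinated rate)."""
    cases_v = sum(r * nv for r, nv, nu in sites.values())
    cases_u = sum(r * nu for r, nv, nu in sites.values())
    n_v = sum(nv for _, nv, _ in sites.values())
    n_u = sum(nu for _, _, nu in sites.values())
    return 1 - (cases_v / n_v) / (cases_u / n_u)

print(f"crude VE of a null vaccine: {crude_ve(sites):.0%}")  # prints 78%
```

A vaccine that does nothing appears nearly 80% effective, purely because the unvaccinated group is concentrated at the higher-risk site. The real study's imbalances are milder than this caricature, but the mechanism is the same.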
If you recall my posts about the studies by Mayo Clinic and Israel's Clalit, this is the imbalance problem those studies addressed with a matching procedure.
For studies using regression models, the usual corrective mechanism is to include adjustment terms, similar to the calendar time bias adjustment used in the Danish study. For more details, see my previous post.
In the CDC study, the only factor they adjusted for is study-site bias; all the other imbalances listed above were ignored.
In a side comment, the researchers said they considered regression models that adjusted for sex, age, ethnicity and occupation ("individually"), and the change to VE was "<3%". They appear to be saying that those factors are unimportant because the change to VE is small. And yet, in Table 2, we learn that the study-site adjustment they did retain changed the VE from 91% to 90%, which is also a difference of just one percentage point.
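Part of the issue is that VE compresses differences near the top of the scale. Since VE = 1 − rate ratio, a one-point move from 91% to 90% is an appreciable relative change in the underlying rate ratio. A quick check:

```python
# VE = 1 - rate_ratio, so near VE = 90% a "small" one-point change in VE
# corresponds to a roughly 11% relative change in the rate ratio itself.
def rate_ratio(ve):
    return 1 - ve

rr_unadj = rate_ratio(0.91)  # VE of 91% -> rate ratio of 0.09
rr_adj   = rate_ratio(0.90)  # VE of 90% -> rate ratio of 0.10

relative_change = (rr_adj - rr_unadj) / rr_unadj
print(f"{relative_change:.0%}")  # prints 11%
```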
I'd have preferred to see a model that includes all variables that can explain the difference in infection rates between the vaccinated and unvaccinated groups. Even if the effects of some of these variables are not statistically significant, they need to be in the regression model to obtain a better estimate of the effect of vaccination status - this is because of the correlation between vaccination status and those demographic variables (which is due in part to self-selection).
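To illustrate what such an adjustment does, here is a minimal sketch using a Mantel-Haenszel pooled risk ratio rather than a regression model (the study itself used regression; the numbers below are hypothetical). Within each stratum the vaccine has zero effect, so the adjusted VE is zero, while the crude VE is large:

```python
# Sketch of confounder adjustment by stratification (Mantel-Haenszel pooled
# risk ratio). Pooling within-stratum comparisons removes the imbalance in
# where vaccinated and unvaccinated people are located.
# Hypothetical numbers: a null vaccine (identical risk within each site).

strata = [
    # (cases_vax, n_vax, cases_unvax, n_unvax) per site
    (9, 100, 81, 900),  # high-risk site: 9% risk in both groups
    (9, 900, 1, 100),   # low-risk site: 1% risk in both groups
]

def crude_risk_ratio(strata):
    a = sum(s[0] for s in strata); n1 = sum(s[1] for s in strata)
    c = sum(s[2] for s in strata); n0 = sum(s[3] for s in strata)
    return (a / n1) / (c / n0)

def mh_risk_ratio(strata):
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return num / den

print(f"crude VE:    {1 - crude_risk_ratio(strata):.0%}")  # prints 78%
print(f"adjusted VE: {1 - mh_risk_ratio(strata):.0%}")     # prints 0%
```

A regression model with all the relevant covariates accomplishes the same thing while handling several confounders at once, which is why leaving those terms out matters even when each one "individually" moves VE by only a few points.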
Curiously, the CDC study did not apply a calendar time bias adjustment, so the bias identified in the Danish study is present here as well. This bias arises from the drastic decline in infection rates during the first quarter of 2021. Upon vaccination, an individual migrates from the unvaccinated group to the vaccinated group. If we tally up the person-hours for the unvaccinated group, they skew toward the start of the study, when infection rates were highest, compared to those for the vaccinated group.
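The mechanism can be sketched with a deterministic toy calculation (all numbers hypothetical): weekly infection risk declines across a 13-week study, most participants get vaccinated in the first couple of weeks, and the vaccine is assumed to have zero true effect. A naive person-time comparison still produces a substantial apparent VE:

```python
# Hypothetical sketch of the calendar-time bias. Weekly infection risk falls
# over the study while participants migrate from the unvaccinated to the
# vaccinated group, so unvaccinated person-time is concentrated in the
# high-incidence early weeks.

weekly_risk = [0.03 * 0.8**w for w in range(13)]  # declining epidemic curve

# fraction of the cohort vaccinated by each week (shots concentrated early,
# with 25% never vaccinated, loosely mimicking the CDC study's cadence)
vax_by_week = [0.0, 0.3, 0.6, 0.7, 0.72, 0.74, 0.75, 0.75, 0.75,
               0.75, 0.75, 0.75, 0.75]

def apparent_ve(weekly_risk, vax_by_week):
    cases_u = pt_u = cases_v = pt_v = 0.0
    for risk, vaxed in zip(weekly_risk, vax_by_week):
        pt_u += 1 - vaxed              # unvaccinated person-weeks
        pt_v += vaxed                  # vaccinated person-weeks
        cases_u += (1 - vaxed) * risk  # null vaccine: same risk in both groups
        cases_v += vaxed * risk
    rate_ratio = (cases_v / pt_v) / (cases_u / pt_u)
    return 1 - rate_ratio

print(f"apparent VE of a null vaccine: {apparent_ve(weekly_risk, vax_by_week):.0%}")
```

With these made-up inputs the null vaccine shows an apparent VE of roughly 45-50% from the calendar-time skew alone, which is exactly why the Danish study adjusted for calendar time.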
The time bias in the CDC study might be mitigated by the unusual enrollment cadence. While the study ran from mid December (the start of vaccinations) to mid March, 60 percent of participants got their first doses in December. That leaves 40 percent. Since 25 percent remained unvaccinated through the end of the study, only 15 percent got their first shots between January and mid March. So the vaccinations were concentrated heavily in the first two weeks of the study.
This sheds some light on the self-selection problem described above: the 25 percent who remained unvaccinated appear to have chosen not to be vaccinated, since they had plenty of time to get their shots.
Further, because the case counting window applied only to the vaccinated group, infections in the first few weeks of the study period can appear only in the unvaccinated group, and we know the overall infection rate was higher in those weeks.
Adjusting for these remaining biases will not wipe out the vaccine effectiveness but will provide a more realistic and believable measurement.
I realize I still haven't gotten to the "partial vaccination" analysis. That will appear in a future post.