
A reader sent me a new study from the Veterans Health Administration about the Omicron Covid boosters introduced in September 2023 (link). These boosters were among the first of many reformulated vaccines targeting new coronavirus variants, all of which were approved by the FDA without new clinical trials. The VHA researchers used real-world data to measure the vaccine effectiveness (VE) of these boosters. In short, they found the boosters lacking: negative VE against cases, and low VE against hospitalization or death, topping out in the 30% range but waning to below 10% within the first six months.
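(A quick aside on what those numbers mean. VE is conventionally computed as one minus a risk or rate ratio, so VE turns negative whenever the treated group has the *higher* event rate. A minimal sketch with made-up numbers, not the paper's data:)

```python
# Vaccine effectiveness as 1 minus a risk ratio (illustrative numbers only).
def vaccine_effectiveness(events_vax, n_vax, events_unvax, n_unvax):
    """VE = 1 - (risk in vaccinated / risk in unvaccinated)."""
    risk_vax = events_vax / n_vax
    risk_unvax = events_unvax / n_unvax
    return 1 - risk_vax / risk_unvax

# 30% VE: the boosted group's risk is 70% of the unboosted group's risk.
print(round(vaccine_effectiveness(70, 10_000, 100, 10_000), 2))   # 0.3

# Negative VE: the boosted group has the higher event rate.
print(round(vaccine_effectiveness(120, 10_000, 100, 10_000), 2))  # -0.2
```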
Reading this report brings back unpleasant memories because it contains some of the same loose language that encourages sloppy thinking, misleads readers, and hides iffy science. I'll be quoting liberally from the paper linked above, but the same type of writing can be found in many similar papers I read during and after the Covid pandemic.
First, a quick refresher on observational data. Researchers pull health records, which include people's vaccination and medical history. Without a clinical trial, people self-selected into treatment; that is to say, they decided on their own to take the booster shot (or not). Therefore, any observed difference in outcomes could be partly due to inherent differences between the comparison groups (technically called confounding). The random assignment of treatment, as implemented in a clinical trial, ensures that the comparison groups are, on average, identical, removing this complication. A good analysis of observational data starts with a reasonable model of the self-selection of treatment, followed by effective adjustments to deal with these confounders.
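To make the confounding point concrete, here is a toy simulation (entirely made up, not the VHA data): frailer people are both more likely to take the booster and more likely to be hospitalized, so a naive comparison makes a booster with zero true effect look harmful.

```python
import random

random.seed(0)

# Toy model: frail people are both more likely to self-select into the
# booster AND more likely to be hospitalized. The booster itself has
# zero effect in this simulation.
people = []
for _ in range(100_000):
    frail = random.random() < 0.3
    p_boost = 0.6 if frail else 0.2          # self-selection into treatment
    boosted = random.random() < p_boost
    p_hosp = 0.10 if frail else 0.01         # outcome driven by frailty only
    hosp = random.random() < p_hosp
    people.append((boosted, hosp))

def risk(group):
    return sum(h for _, h in group) / len(group)

boosted = [p for p in people if p[0]]
unboosted = [p for p in people if not p[0]]
# Naive comparison: boosted group looks worse despite a true effect of zero.
print(risk(boosted) > risk(unboosted))  # True
```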
***
Now, I’ll point out three of the worst cases of loose language in these Covid studies:
“Exact matching”
One of the more popular techniques for dealing with confounding is “matching”, a procedure designed to artificially create pairs of treated and not treated subjects, with the hope that they might behave as if treatment were randomly assigned. To match pairs of subjects requires a set of matching variables, typically variables such as gender, age, and preexisting health conditions.
There are many matching methods. One such method – popular because it is easy to describe – is known as “exact matching”. For example, if we have a vaccinated subject who is a 59-year-old male diabetic, we want to pair him up with an unvaccinated subject who is also a 59-year-old male diabetic. Each matching variable must match exactly, age to age, gender to gender, diabetes condition to same.
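A sketch of what genuinely exact matching looks like (hypothetical records and my own variable names, not the paper's implementation): the matching key is the full tuple of values, and every variable must agree exactly.

```python
# Greedy 1:1 exact matching on (age, gender, diabetic) -- a sketch.
treated = [
    {"id": "T1", "age": 59, "gender": "M", "diabetic": True},
    {"id": "T2", "age": 30, "gender": "F", "diabetic": False},
]
controls = [
    {"id": "C1", "age": 59, "gender": "M", "diabetic": True},
    {"id": "C2", "age": 60, "gender": "F", "diabetic": False},  # close, not exact
]

def key(person):
    return (person["age"], person["gender"], person["diabetic"])

pool = {}
for c in controls:
    pool.setdefault(key(c), []).append(c)

pairs = []
for t in treated:
    candidates = pool.get(key(t), [])
    if candidates:                      # treated subjects without an exact
        pairs.append((t["id"], candidates.pop()["id"]))  # counterpart are dropped

print(pairs)  # [('T1', 'C1')] -- T2 has no exact counterpart
```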
But … in the cited paper, what is tagged as "exact" is not actually exact. For example, matching on age is done using broad age groups, rather than single-year age. Only three age groups were used: 18-64, 65-74, and 75+. So someone who is 30 can be matched to someone who is 60, which is not what one can reasonably describe as an exact match.
Similarly, another matching variable is the "CAN score", a predicted probability of dying within the next year. This probability score is also grouped into three broad buckets (0-50, 51-89, >=90) before matching, so that someone with a 51% chance of dying can be matched to someone with an 80% chance of dying. In this paper, that's an "exact" match.
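To see why bucketing defeats the word "exact": the matching key is computed on the bucket, not the raw value, so very different people collide into the same key. A sketch (bucket boundaries taken from the paper; the matching mechanics are my guess):

```python
# "Exact" matching after bucketing: the key uses the age group and the
# CAN-score bucket, so a 30-year-old can pair with a 60-year-old.
def age_group(age):
    if age < 65:
        return "18-64"
    return "65-74" if age < 75 else "75+"

def can_bucket(score):
    if score <= 50:
        return "0-50"
    return "51-89" if score <= 89 else ">=90"

def key(person):
    return (age_group(person["age"]), can_bucket(person["can"]))

a = {"age": 30, "can": 51}
b = {"age": 60, "can": 80}
print(key(a) == key(b))  # True: an "exact" match in this scheme
```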
“Unvaccinated”
Every Covid study compares a "vaccinated" group to an "unvaccinated" group, but in almost every case, the unvaccinated have taken at least one Covid vaccine shot – they just haven't taken the particular vaccine shot that is the subject of the study.
In the above linked paper, they studied the Omicron vaccine shot. But to be eligible for inclusion, a person must have “received at least one documented Covid-19 vaccine in the VA healthcare system at any time in the past”. Therefore, what they label as “unvaccinated” are vaccinated people who decided not to get the Omicron booster. What’s wrong with calling the treated group “Had Omicron booster” and the untreated group “No Omicron booster”?
Take a look at this table header from the article’s Supplement:
[Image: Supplement table header labeling the comparison groups "Updated COVID-19 vaccine" and "No vaccination"]
If you haven’t read all the details in the paper, you’d assume that they are comparing one group of people who have taken an updated Covid-19 vaccine against another group of people who have had “no vaccination”. But then you’d have misread the study.
The conclusion
Almost all conclusions of these Covid studies mislead through over-generalization.
In this paper, the conclusion stated:
“COVID-19 vaccines targeting the XBB.1.5 variant of Omicron were not effective in preventing infection and had relatively low VE against hospitalization and death, which declined rapidly over time.”
This conclusion only applies to people who (a) are in the VHA system, (b) meet the eligibility criteria, and (c) look like someone in the matched groups.
If a clinical trial were done by the VHA with the same eligibility criteria, then the (a) and (b) restrictions would still hold while (c) could be dropped because there would have been randomization in place of matching.
One key fact that is always glossed over is that the matched subpopulation does *not* look like the population. In matching, each Omicron vaccinee is matched to someone who didn't take the booster. So after matching, the no-booster group looks just like the booster group. But the no-booster group post-matching does not look like the no-booster group pre-matching! The magic of matching comes from dropping no-booster people who have no counterpart in the booster group.
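The mechanics can be sketched in a few lines, using the age-group percentages reported in the paper scaled to a small toy population (everything else is my own construction): after 1:1 matching, the retained controls mirror the treated group by construction, while the discarded controls, the bulk of the no-booster population, look nothing like them.

```python
from collections import Counter

# Toy population built from the paper's age distributions:
# booster group 27/33/40 (18-64 / 65-74 / 75+), no-booster group 48/23/29,
# with the no-booster group four times as large.
treated  = ["75+"] * 40 + ["65-74"] * 33 + ["18-64"] * 27
controls = ["75+"] * 29 * 4 + ["65-74"] * 23 * 4 + ["18-64"] * 48 * 4

pool = Counter(controls)
matched_controls = []
for t in treated:
    if pool[t] > 0:                 # 1:1 match on age group; leftovers dropped
        pool[t] -= 1
        matched_controls.append(t)

def dist(group):
    n = len(group)
    return {k: round(100 * v / n) for k, v in Counter(group).items()}

print(dist(treated))                # matched controls mirror this...
print(dist(matched_controls))       # ...by construction
print(dist(list(pool.elements())))  # dropped controls: a different population
```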
All of the above is reflected in the statistics included in the paper.
The following table (relegated to the Supplement) displays the pre-matching statistics.
[Table: pre-matching baseline characteristics, from the Supplement]
One immediately learns that the no-booster subpopulation is 4 times as large as the Omicron subpopulation, and that self-selected Omicron subjects are much older, less likely to be female, much more likely to have pre-existing comorbidities, more likely to have had many recent primary care encounters, and also more likely to have gotten a Covid vaccine within the last year.
What happens after matching? The two groups now have the same number of subjects. The column showing the Omicron group doesn’t change while the no-booster column looks almost the same as the Omicron group. (This is found in a Table in the paper.)
[Table: post-matching baseline characteristics]
Take age groups, for example: the post-match no-booster distribution is 27%, 33%, and 40% for ages 18-64, 65-74, and 75+, which is exactly the distribution of the pre-match Omicron group. But the pre-match no-booster group's age distribution is 48%, 23%, 29%! Evidently, the no-booster subjects dropped from the analysis don't look like those who were matched.
Importantly, this analysis of matched pairs can shed light only on the effect of the Omicron booster on those who chose to take it, which is only about 25% of the population under study. Matching is needed precisely because the no-booster group has different characteristics from the booster group (as seen in the pre-matching statistics); therefore, results from the subset of no-booster subjects who look like the booster group cannot be generalized to the unmatched no-booster people.
I'm not trashing the matching methodology. It's a very useful technique for analyzing observational data. It just doesn't generalize in the way that many researchers imply.
***
It's sad that paper after paper pushes misunderstanding by using loose language. We can and should do better.