There is one other curious thing about the matching procedure used to pre-process observational data in the paper analyzing the effect of the Omicron booster from my previous post (link).
As noted, in the table showing pre-matching statistics, there were 3.8 million people who didn't take the booster against 588,000 who did. So, there were about 6.5 non-boosted people for each boosted person. The matching task is to find one non-boosted person to pair up with each boosted person. We therefore expect the vast majority of the 3.8 million non-boosted people to be dropped from the study population.
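As a quick check on that arithmetic, here is a minimal Python sketch. It uses only the rounded counts quoted above and assumes one-to-one matching, so at most one non-boosted person is kept per boosted person; the variable names are mine, not the paper's.

```python
# Rounded pre-matching counts quoted in the post.
non_boosted = 3_800_000
boosted = 588_000

# Non-boosted people available per boosted person.
ratio = non_boosted / boosted

# With one control per boosted person, at most `boosted` controls survive
# matching, so at least this share of the non-boosted group is dropped.
min_dropped_share = (non_boosted - boosted) / non_boosted

print(f"available controls per boosted person: {ratio:.1f}")
print(f"share of non-boosted dropped: at least {min_dropped_share:.0%}")
```

Because the inputs are rounded, the outputs are approximate as well.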
Now, the article itself says that 511,352 unique control participants were matched. This implies that over 10% of these matched non-boosted people were matched to multiple boosted persons. Such re-use usually happens when there aren't enough untreated persons available for matching.
However, they had on average 6.5 untreated people available to match to each treated person. So they didn't run out of unique untreated people. The next possibility is that the treated and untreated groups were sufficiently different along certain dimensions that the same untreated person must be used multiple times. For example, if there were 10 women aged 95+ with a certain set of preexisting conditions in the boosted group, and there were only 2 such people in the non-boosted group, then the same two non-boosted women would appear on the other side of the pair for all 10 matched pairs. In other words, the more re-use there is, the stronger the suggestion that the two groups had some major differences.
In such circumstances, I'd prefer to drop the treated person from analysis rather than use an untreated person multiple times.
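To make the two options concrete, here is a minimal sketch of exact matching within a single segment, run once with re-use allowed and once without. It is a toy illustration, not the paper's procedure: the function `match_exact`, the segment label, the person ids, and the rule of picking the least-used control are all my own assumptions.

```python
from collections import Counter, defaultdict

def match_exact(treated, controls, with_replacement):
    """Pair each treated person with a control from the same segment.

    treated, controls: lists of (person_id, segment) tuples.
    Returns (pairs, dropped), where dropped lists unmatched treated ids.
    """
    pool = defaultdict(list)
    for cid, segment in controls:
        pool[segment].append(cid)

    pairs, dropped, uses = [], [], Counter()
    for tid, segment in treated:
        candidates = pool[segment]
        if not candidates:
            dropped.append(tid)  # no control left: drop the treated person
            continue
        if with_replacement:
            # Re-use is allowed: take the control used least often so far.
            cid = min(candidates, key=lambda c: uses[c])
        else:
            # No re-use: each control leaves the pool once matched.
            cid = candidates.pop(0)
        uses[cid] += 1
        pairs.append((tid, cid))
    return pairs, dropped

# The example from the text: 10 boosted and 2 non-boosted in one segment.
segment = "women 95+, same conditions"
treated = [(f"t{i}", segment) for i in range(10)]
controls = [(f"c{i}", segment) for i in range(2)]

pairs_r, dropped_r = match_exact(treated, controls, with_replacement=True)
pairs_n, dropped_n = match_exact(treated, controls, with_replacement=False)
reuse = Counter(cid for _, cid in pairs_r)

print("with re-use:", len(pairs_r), "pairs, control usage", dict(reuse))
print("without re-use:", len(pairs_n), "pairs,", len(dropped_n), "treated dropped")
```

With re-use, all 10 boosted women stay in the analysis but only two real people stand behind the 10 controls. Without re-use, the analysis keeps two pairs and drops the other eight boosted women.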
Think about this scenario. Let's say there is a particular demographic segment with 5 treated and 1 untreated. The one untreated person will be part of all five pairs. Now, assume that the untreated subject died. That death would have been replicated five times after matching. The matching turned a single death into five deaths out of five matched controls, a 100% death rate in that segment.
Alternatively, assume the untreated person did not get Covid during the observation window, but two of the five treated did. Replicated five times, the estimate for the untreated group will be zero cases out of five.
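Both scenarios can be tallied in a few lines. This is a minimal sketch with made-up person ids (`t0` to `t4` for the treated, `c0` for the lone untreated person) and a hypothetical helper `count_events`; it only counts outcomes on each side of the matched pairs.

```python
# Five treated people, all paired with the same single control "c0".
pairs = [(f"t{i}", "c0") for i in range(5)]

def count_events(pairs, outcome, side):
    """Count events on one side of the pairs (0 = treated, 1 = control)."""
    return sum(outcome[pair[side]] for pair in pairs)

# Scenario 1: the lone untreated person died.
died = {"c0": True}
control_deaths = count_events(pairs, died, side=1)

# Scenario 2: the untreated person stayed uninfected,
# while two of the five treated people got Covid.
infected = {"c0": False, "t0": True, "t1": True,
            "t2": False, "t3": False, "t4": False}
treated_cases = count_events(pairs, infected, side=0)
control_cases = count_events(pairs, infected, side=1)

n = len(pairs)
print(f"scenario 1: {control_deaths}/{n} matched controls died")
print(f"scenario 2: treated {treated_cases}/{n} cases, control {control_cases}/{n} cases")
```

In both cases the control column has five rows but carries the information of one person, so a single outcome decides the entire comparison for that segment.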
Matching sounds simple, but details matter. Most published studies don't disclose nearly enough for an outsider to figure out the quality of matching.