If it's not already clear, observational data is a great playground for statisticians. We sniff out where biases are hiding, and we devise methods to correct such biases. It's a lot of fun. It's also frustrating - because there are endless ways to make adjustments.
In today's post, I examine a couple of curious observations made by the Israeli researchers when they analyzed the data from vaccine boosters (link).
***
The researchers shrank the follow-up window when counting severe cases (relative to counting cases)
This decision surprised me: severe cases are a fraction of all cases, so I'd want a longer counting window, not a shorter one.
Recall that in the analysis of cases, the Israeli booster real-world study pulled data from the national database for the month of August. Because the case-counting window starts 12 days after the booster shot, in effect the study only counts cases from August 11 to August 30. Most people did not get the booster on July 31st, and so the average follow-up time is probably around 10 days. (See the previous post for more.)
For severe cases, the case-counting window shrank by 4 days: the researchers stopped counting on August 26 instead of August 30. Thus, the average follow-up time for severe illness is around 6 days. You read that right - protection against severe illness was demonstrated over less than a week of average follow-up.
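The window arithmetic above can be sketched as follows. This is my own reconstruction, not the authors' code; it assumes a fixed closing date for each outcome and the 12-day lag described in the study.

```python
from datetime import date

# Days in the case-counting window for a given booster date, assuming the
# window opens 12 days after the shot and closes on a fixed end date.
def counting_days(booster: date, window_close: date, lag: int = 12) -> int:
    window_open = date.fromordinal(booster.toordinal() + lag)
    return max(0, (window_close - window_open).days + 1)

cases_close = date(2021, 8, 30)    # all cases counted through Aug 30
severe_close = date(2021, 8, 26)   # severe cases counted through Aug 26

b = date(2021, 8, 1)  # example booster date
print(counting_days(b, cases_close))   # 18 days of follow-up for cases
print(counting_days(b, severe_close))  # 14 days - always 4 fewer
```

Whatever the booster date, the severe-case window is 4 days shorter (or zero, for late boosters whose window never opens), which is what drags the average follow-up for severe illness down to roughly 6 days.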
Here is a direct quote that explains why the case-counting window was shortened for severe cases relative to all cases:
In order to minimize the problem of censoring, the rate of severe illness was calculated on the basis of cases that had been confirmed on or before August 26, 2021. This schedule was adopted to allow for a week of follow-up (until the date when we extracted the data) for determining whether severe illness had developed.
In the Appendix, the researchers offered the following chart as supporting evidence:
This chart shows the lag between testing positive and the onset of severe illness for Covid-19 cases between November 2020 and March 2021 (conditional on people subsequently getting severely ill). The median lag is 4 days: among those who eventually became severely ill, half did so within 4 days of testing positive, and half after more than 4 days (up to almost 30 days).
I believe the logic is that for anyone who got sick in the final 4 days of August and became severely ill, the onset of severe illness would have taken place after the data cutoff for this study.
Consider the following implications:
- According to the chart above, half of past patients took more than 4 days, and up to 30 days, to develop severe illness. If we want to capture all severe cases, we ought to extend the follow-up period to at least 30 days (60 days if the cohort is enrolled over one month). A methodology based on the median lag time captures only half of the severe cases.
- Think about the severe cases that occurred between August 26 and August 30. These are tied to infections detected before August 26, most likely during August, yet they are excluded from the analysis. All of these severe cases should count. If we apply the researchers' logic consistently, it is instead the severe cases occurring in the first 4 days of the counting window that should be dropped, since those trace back to infections detected before the window opened. (Note that the infection rate was skyrocketing during August.)
- Now think about someone who took the booster shot on August 16. If this person became severely ill on August 26, the severe case is not counted. But this severe case represents a lag time of 10 days from detection of infection, which according to the chart above, happens more than 10% of the time. This issue arises because the case-counting window is given a fixed ending date regardless of when the booster was administered.
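To make the first bullet concrete, here is a toy calculation. The lag probabilities below are invented for illustration only (the paper shows a chart, not the numbers); they are chosen so the median lag is 4 days with a tail out to 30 days, matching the chart's description.

```python
# Toy lag distribution: P(lag = k days) from positive test to severe onset.
# These probabilities are an assumption for illustration, not the study's data.
lag_pmf = {1: 0.10, 2: 0.15, 3: 0.15, 4: 0.10, 5: 0.08, 7: 0.10,
           10: 0.10, 14: 0.08, 21: 0.08, 30: 0.06}

def capture_prob(followup_days: int) -> float:
    """Chance an eventual severe case is observed within the follow-up."""
    return sum(p for k, p in lag_pmf.items() if k <= followup_days)

# Stopping the count on Aug 26 buys about a week of follow-up before the
# data extraction - which still misses a sizeable share of severe onsets:
print(round(capture_prob(4), 2))   # 0.5  (the median: half are missed)
print(round(capture_prob(7), 2))   # 0.68 under this toy distribution
print(round(capture_prob(30), 2))  # 1.0  (need ~30 days to catch them all)
```

The exact numbers depend on the assumed distribution, but the qualitative point does not: a follow-up window sized to the median lag is guaranteed to miss a material fraction of severe cases.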
***
The researchers disclosed they faced an incurable imbalance problem.
Part of the paper's Appendix is devoted to alternative analyses, one of which is a matching study. I have discussed examples of matching studies before (Mayo Clinic and Clalit). Unfortunately, the disclosure did not contain enough details to evaluate the quality of the matching.
The following statement caught my eye: "Due to the very small number of severe cases, and the high censoring proportion, calculation for severe COVID-19 was not possible."
In my comments on the Clalit study (link), I mentioned a limitation of matching studies: if selection bias is severe, it may be hard to find acceptable matches, forcing the analysts to drop many unmatched cases ("high censoring proportion"). In the Clalit study, the matched population ended up much younger than the vaccinated population.
In the Israeli booster study, the researchers attempted to find, for each person who took the booster, one person who had not taken it. The matching variables are the usual suspects: age group, second-dose date, and so on. They appear to be saying that they failed to find adequate matches for a large proportion of the booster group. This is a sign of incurable covariate imbalance, and it also suggests that the regression adjustments used in the main analysis are insufficient.
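Here is a minimal sketch of 1:1 exact matching, my own illustration rather than the authors' procedure, with made-up records. Each boosted person needs an unused non-boosted match with the same age group and second-dose month; the unmatched share is the "censoring proportion" the quote refers to.

```python
from collections import defaultdict

# Hypothetical records: (age group, month of second dose)
boosted = [("60+", "Feb"), ("60+", "Feb"), ("60+", "Jan"), ("40-59", "Mar")]
non_boosted = [("60+", "Feb"), ("40-59", "Mar"), ("40-59", "Apr")]

# Pool candidate matches by their covariate profile
pool = defaultdict(list)
for person in non_boosted:
    pool[person].append(person)

matched, unmatched = [], []
for person in boosted:
    if pool[person]:
        matched.append((person, pool[person].pop()))  # use each match once
    else:
        unmatched.append(person)  # no acceptable match -> censored

censoring_prop = len(unmatched) / len(boosted)
print(censoring_prop)  # 0.5 - half the boosted group has no match
```

When the two groups have very different covariate profiles, the pool runs dry for common booster profiles, and the censoring proportion climbs; the analysis then describes only the matchable minority.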
Hi Kaiser,
So I have questions on the infections section, and I think also for the severe cases. What happens to the persons / events / person-days at risk for the 12 days between non-boosted and boosted status?
Also one observation. The infections over this study period (so here also only Aug 10-31) number 13,000+. For the whole period from March, they say it's 16,000. That is a very strong and sudden weakening of immunity - or what might be another explanation? Also, when looking at this number as some form of prevalence: could we use the historic infection rate as a baseline assumed infection rate over the study period, or, because these people were never infected, should we use the lifetime probability of infection as a base?
One more thing: after the boost, they say there may be behaviour differences. In this age group, from anecdotes, one behaviour is feeling ill for a few days (not mentioned).
Tx!
Posted by: A Palaz | 11/16/2021 at 05:18 PM
So just some sub-sample prevalence over the study period, computed from the exclusions, for comparison:
Group                       N          Infections  Prevalence
All                         1,186,000  13,474      1.14%
Unknown gender              779        4           0.51%
Boosted pre-July 30         3,076      41          1.33%
Returned travel in August   29,758     424         1.42%
In study                    1,137,804  13,009      1.14%
Posted by: A Palaz | 11/16/2021 at 07:00 PM
AP: In that particular study, they took 12 days out of the time at risk. But note that for the no-booster group, those 12 days are the first 12 days of the study for everyone, while for the booster group they vary with the date of the booster - which means the booster group's 12-day windows were always skewed later than the no-booster group's.
I don't like the pre-July exclusion, and I don't understand the travel exclusion. If the result is to apply to the entire population who took boosters, then that group always includes people who travelled.
I think in terms of prevalence, what you need is a cohort-adjusted number, but it's hard to get this information. When the formula is the number infected divided by the number exposed, the problem is that some get infected 10 days after the shot while others may get infected 100 days after the shot, and the simple division ignores this. Further, lifetime chance is more meaningful, but again, I think it's hard to get data on that.
Also bear in mind that because of the case counting window, all numbers are conditional! All we have is the chance of infection given that the person did not get infected within the first 12 days after the shot.
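A toy person-days calculation makes the point concrete. The numbers below are invented for illustration, not taken from the study: exposure counts only days after the 12-day run-in, so any rate computed this way is conditional on staying infection-free through day 12.

```python
# Illustrative follow-up records (made-up numbers, not the study's):
followups = [
    # (days since the shot at end of study, infected after day 12?)
    (30, True), (25, False), (20, False), (15, True), (10, False),
]

person_days = sum(max(0, d - 12) for d, _ in followups)   # run-in excluded
events = sum(1 for _, infected in followups if infected)
rate_per_day = events / person_days

# The person with only 10 days of follow-up contributes zero exposure,
# and any infection during anyone's first 12 days is never counted.
print(person_days, events, round(rate_per_day, 3))
```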
Posted by: Kaiser | 11/16/2021 at 08:13 PM
Hi Kaiser,
So I think the exclusions are interesting because they pose the question of why the prevalence of immune fading/senescence is higher or lower in each group.
One other problem I am having is replicating the days at risk for the boosted (~10m) and non-boosted (~5m). I can only do this with a short boosting cohort (less than 7 days) and, because the non-boosted are not counted until August 10, a separate non-boosted group. So an N to a b with N later.
Did you observe this too?
Posted by: A Palaz | 11/18/2021 at 05:48 PM
AP: Yes, the at-risk numbers make no sense to me either. In particular, why is there a huge drop-off in exposure time in the booster group for severe cases versus cases - one that is not replicated in the non-booster group? The problem with many of these Covid-19 studies is insufficient disclosure of where the numbers came from.
Posted by: Kaiser | 11/21/2021 at 06:20 PM