


A Palaz

Hi Kaiser,

I have some questions on the infections section, and I think they also apply to the severe-illness section. What happens to the persons/events/person-days at risk during the 12 days between non-boosted and boosted status?

Also, one observation. The number of infections over this study period (here only 10 to 31 August) is 13,000+. Over the whole period since March, they say it is 16,000. Is this a very strong, sudden and consistent weakening of immunity, or what other explanation might there be? Also, when looking at this number as some form of prevalence, could we use the historic infection rate as the baseline assumed infection rate over the study period, or, because these people were never infected, should we use the lifetime probability of infection as the base?

One more thing: after the booster, they say there may be behaviour differences. In this age group, from anecdotes, one such behaviour is feeling ill for a few days (which they do not mention).


A Palaz

Here is the prevalence over the study period in some subsamples, including the exclusions, for comparison:

Group                               People  Infections  Prevalence
All                              1,186,000      13,474       1.14%
Unknown gender                         779           4       0.51%
Boosted before July 30               3,076          41       1.33%
Returned from travel in August      29,758         424       1.42%
In study                         1,137,804      13,009       1.14%
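A quick way to recheck the prevalence column, and to see how noisy the smaller subsamples are, is sketched below. The counts are copied from the comment; reading the second column as the number of people is an assumption, and the Wilson 95% interval is an added, standard choice for proportions, not something the study or the comment uses.

```python
import math

# Counts from the comment: (group, people, infections).
# Assumption: the second number is the count of people in each subsample.
SUBSAMPLES = [
    ("All", 1_186_000, 13_474),
    ("Unknown gender", 779, 4),
    ("Boosted before July 30", 3_076, 41),
    ("Returned from travel in August", 29_758, 424),
    ("In study", 1_137_804, 13_009),
]


def prevalence(infections: int, people: int) -> float:
    """Crude prevalence as a percentage: infections / people * 100."""
    return 100.0 * infections / people


def wilson_interval(infections: int, people: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion, returned as percentages."""
    p = infections / people
    denom = 1 + z**2 / people
    centre = (p + z**2 / (2 * people)) / denom
    half = z * math.sqrt(p * (1 - p) / people + z**2 / (4 * people**2)) / denom
    return 100.0 * (centre - half), 100.0 * (centre + half)


for group, people, infections in SUBSAMPLES:
    lo, hi = wilson_interval(infections, people)
    print(f"{group:<32}{prevalence(infections, people):6.2f}%  (95% CI {lo:.2f}% to {hi:.2f}%)")
```

The interval for the unknown-gender group, built on only 4 infections, comes out far wider than the others, so its low 0.51% says little on its own.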


AP: In that particular study, they took 12 days out of the time at risk. But note that for the no-booster group, the 12 days are the first 12 days of the study for everyone, while for the booster group they vary with the date of the booster. This means the booster group's 12-day window is never earlier, and usually later, than the no-booster group's.
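To make the point about the 12-day window concrete, here is a minimal sketch. The study dates (30 July to 31 August 2021), the booster dates, and the inclusive day-counting convention are all hypothetical assumptions for illustration, not the study's actual protocol; `excluded_window` and `days_at_risk` are made-up helper names.

```python
from datetime import date, timedelta

EXCLUSION_DAYS = 12  # days removed from the time at risk

# Hypothetical study window, assumed for illustration only.
STUDY_START = date(2021, 7, 30)
STUDY_END = date(2021, 8, 31)


def excluded_window(anchor: date) -> tuple[date, date]:
    """First and last calendar day of the 12-day window starting on `anchor`."""
    return anchor, anchor + timedelta(days=EXCLUSION_DAYS - 1)


def days_at_risk(anchor: date, end: date = STUDY_END) -> int:
    """Days counted from the day after the window through `end`, inclusive."""
    _, last_excluded = excluded_window(anchor)
    return max(0, (end - last_excluded).days)


# No booster: the window is anchored at the study start for everyone.
no_booster_window = excluded_window(STUDY_START)

# Booster: the window is anchored at each person's (hypothetical) booster date.
booster_dates = [date(2021, 7, 30), date(2021, 8, 10), date(2021, 8, 25)]
booster_windows = [excluded_window(d) for d in booster_dates]

print("no booster:", no_booster_window, days_at_risk(STUDY_START), "days at risk")
for d, w in zip(booster_dates, booster_windows):
    print("booster on", d, "->", w, days_at_risk(d), "days at risk")
```

Under these assumptions, someone boosted late in August has the whole remaining study period swallowed by the window, while the no-booster window always sits at the very start.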

I don't like the exclusion of people boosted before July 30, and I don't understand the travel exclusion. If the result is to apply to the entire population who took boosters, then that group always includes people who travelled.

I think in terms of prevalence, what you need is a cohort-adjusted measure, but it's hard to get this information. When the formula is number infected divided by number exposed, the problem is that some get infected 10 days after the shot while others may get infected 100 days after the shot, and the simple division ignores this. Further, the lifetime chance is more meaningful, but again, I think it's hard to get data on that.
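The gap between a simple proportion and a time-aware measure can be sketched as follows. This swaps in a person-time incidence rate (infections per person-day of follow-up), one common way to account for differing follow-up; it is not necessarily the cohort adjustment meant above. The two cohorts are invented, and everyone is assumed to be followed for the same number of days within a cohort.

```python
import math


def simple_proportion(infected: int, exposed: int) -> float:
    """Number infected divided by number exposed; ignores follow-up time."""
    return infected / exposed


def rate_per_person_day(infected: int, person_days: float) -> float:
    """Number infected divided by total person-days of follow-up."""
    return infected / person_days


# Two hypothetical cohorts: same counts, very different follow-up per person.
COHORTS = {
    "short follow-up": {"exposed": 100, "infected": 10, "days_each": 10},
    "long follow-up": {"exposed": 100, "infected": 10, "days_each": 100},
}

for name, c in COHORTS.items():
    p = simple_proportion(c["infected"], c["exposed"])
    r = rate_per_person_day(c["infected"], c["exposed"] * c["days_each"])
    print(f"{name}: proportion={p:.2f}, rate={r:.4f} per person-day")
```

Both cohorts show the same 10% by simple division, yet the infection rate per day of follow-up differs tenfold.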

Also bear in mind that because of the case counting window, all numbers are conditional! All we have is the chance of infection given that the person did not get infected within the first 12 days after the shot.
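A small sketch of what "conditional" means here. The infection days are invented, `conditional_attack_rate` is a hypothetical helper, and day 1 is assumed to be the first day after the shot, so "within the first 12 days" means days 1 to 12.

```python
from typing import Optional, Sequence

WINDOW = 12  # infections on days 1..12 after the shot are not counted


def unconditional_attack_rate(infection_days: Sequence[Optional[int]]) -> float:
    """Share of everyone who was infected at any point during follow-up."""
    infected = sum(1 for d in infection_days if d is not None)
    return infected / len(infection_days)


def conditional_attack_rate(infection_days: Sequence[Optional[int]],
                            window: int = WINDOW) -> float:
    """Share infected after the window, among people not infected within it."""
    # Anyone infected inside the window leaves both numerator and denominator.
    survivors = [d for d in infection_days if d is None or d > window]
    infected_later = sum(1 for d in survivors if d is not None)
    return infected_later / len(survivors)


# Day after the shot on which each (invented) person was infected; None = never.
example = [3, 5, 20, 40, None, None, None, None, None, None]

print(unconditional_attack_rate(example))  # 0.4
print(conditional_attack_rate(example))    # 0.25
```

The two early infections simply vanish from the conditional figure, which is why it cannot be read as the overall chance of infection after the shot.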

A Palaz

Hi Kaiser,

So I think the exclusions are interesting because they pose the question of why prevalence, and with it immunity fading or immunosenescence, is higher or lower in each group.

One other problem I am having is replicating the days at risk for the boosted group (~10m) and the non-boosted group (~5m). I can only do this with a short boosting cohort (boosted over fewer than 7 days) and, because non-boosted time is not counted until August 10, a separate non-boosted group. So, in effect, an N-to-B transition with the N period counted later.

Did you observe this too?


AP: Yes, the at-risk numbers make no sense to me. In particular, why is there a huge drop-off in exposure time in the booster group for severe cases vs. all cases, which is not replicated in the non-booster group? The problem with many of these Covid-19 studies is insufficient disclosure of where the numbers came from.


Kaiser Fung. Business analytics and data visualization expert. Author and Speaker.