


A Palaz

Hey Kaiser,

So in the Danish study, I don't think there is much regression; they just reweight by patient-days. For example, in the care-home population they vaccinated early, leaving only 1,868 people unvaccinated after roughly day 15-17.

Here is my guess at the raw case counts among the unvaccinated, broken down by their time windows:

454 cases in the 0-14 day window, etc.

Why don't they present this? It's very annoying. Maybe you have an idea. Possibly they feel it will not look robust, given the shrunken unvaccinated sample?

Thinking about that, how should it be dealt with?

This connects to this post because the common thread I see across studies is which denominator is better for presenting results: patient-days at risk or patients.

I think days are better because of the relationship between infections and exposure, but obviously this produces the same effect you describe with Simpson's paradox.
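To make the denominator question concrete, here is a toy sketch in Python. The cohort, the `Person` fields, and every number are invented for illustration, and it assumes follow-up stops at infection. It contrasts grouping each person by whether they were ever vaccinated (a per-patient count) with splitting each person's days into unvaccinated and vaccinated person-days.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    days_unvaccinated: int      # follow-up days before vaccination (all days if never vaccinated)
    days_vaccinated: int        # follow-up days after vaccination
    infected_in: Optional[str]  # "unvax", "vax", or None; follow-up assumed to stop at infection


# Hypothetical cohort: all numbers are made up.
cohort = [
    Person(60, 0, "unvax"),
    Person(45, 0, "unvax"),
    Person(10, 50, None),
    Person(5, 55, "vax"),
    Person(20, 40, None),
    Person(60, 0, None),
]


def risk_per_person(people):
    """Per-patient risk, grouping each person by whether they were EVER vaccinated."""
    cases = {"unvax": 0, "vax": 0}
    persons = {"unvax": 0, "vax": 0}
    for p in people:
        group = "vax" if p.days_vaccinated > 0 else "unvax"
        persons[group] += 1
        cases[group] += (p.infected_in is not None)
    return {g: cases[g] / persons[g] for g in cases}


def rate_per_person_day(people):
    """Cases per person-day, with each person's days split at vaccination."""
    cases = {"unvax": 0, "vax": 0}
    days = {"unvax": 0, "vax": 0}
    for p in people:
        days["unvax"] += p.days_unvaccinated
        days["vax"] += p.days_vaccinated
        if p.infected_in is not None:
            cases[p.infected_in] += 1
    return {g: cases[g] / days[g] for g in cases}


print(risk_per_person(cohort))      # unvax 2/3, vax 1/3
print(rate_per_person_day(cohort))  # unvax 2/200, vax 1/145
```

In this made-up cohort the per-patient ratio is 0.5, while the person-day ratio is about 0.69: grouping by ever-vaccinated status credits the 35 infection-free pre-vaccination days of the three vaccinated people to the "vaccinated" group.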

So how does this connect here? Well, you made a slip in one post, saying "hours at risk", and for the CDC study that actually applies.

One more adjustment for these special high-risk groups would be to weight the exposure: hours of high-risk exposure multiplied by the degree of risk.
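One way to read this proposal is as a risk-weighted person-time denominator: each stretch of exposure contributes its hours multiplied by a degree-of-risk weight. The sketch below is a minimal illustration; the exposure records, the weights, and the function names are all hypothetical and do not come from the CDC study.

```python
# Hypothetical exposure log for one group: (hours of exposure, degree-of-risk weight).
# The weights are illustrative assumptions, e.g. 1.0 = baseline contact, 5.0 = very high-risk contact.
exposures = [
    (120.0, 1.0),
    (40.0, 3.0),
    (10.0, 5.0),
]


def weighted_exposure(records):
    """Sum of hours x degree-of-risk weight, used as a risk-adjusted denominator."""
    return sum(hours * weight for hours, weight in records)


def rate_per_weighted_hour(cases, records):
    """Cases per unit of risk-weighted exposure time."""
    return cases / weighted_exposure(records)


print(weighted_exposure(exposures))  # 120 + 120 + 50 = 290.0
print(rate_per_weighted_hour(3, exposures))
```

Comparing groups on this scale only makes sense if the weights are assigned the same way to vaccinated and unvaccinated people; choosing them is itself a modeling assumption.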

So, just to add one more item to your great list: some of the variance might also be explained, WITHIN and across risk groups, by such factors.

I now look with disappointment at any study that does not present detailed patient-days, and I ask why.


AP: I think the level of disclosure in these studies is well below what's required, especially since most of these are "interim" studies in which the data have not been baked in yet. Calendar time becomes much more crucial because of (a) the use of case-counting windows and (b) the changing environment. Also, I do not understand why they do not publish their model; the CDC study did not publish its model either. Saying it's a Cox regression is not enough.

The CDC study addresses the issue you brought up: you can't use infection rates per person when more and more of the cohort is getting vaccinated. So these new studies, unlike RCTs, use person-time as the denominator. As I said above, this effectively splits a vaccinated person's timeline into two parts, counted first as unvaccinated and then as vaccinated. This is the so-called Andersen-Gill extension to Cox. It's possible that the Danish study did this as well, but nothing in the paper tells us that.
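The timeline split can be sketched as a data-preparation step: each person becomes one or two (start, stop] rows, with vaccination status as a time-varying covariate. This is the counting-process layout that Andersen-Gill-style Cox models take as input. It is a minimal sketch under assumed conventions (day 0 = enrollment, follow-up ends at infection or censoring, status switches on the vaccination day with no case-counting lag window); the function name and fields are hypothetical, and this is not the code of either study.

```python
from typing import List, NamedTuple, Optional


class Interval(NamedTuple):
    person_id: int
    start: int       # interval opens after this day (exclusive, counting-process convention)
    stop: int        # interval closes on this day (inclusive)
    vaccinated: int  # 1 if vaccinated during this interval, else 0
    event: int       # 1 if the infection occurred at `stop`, else 0


def split_at_vaccination(person_id: int, end_day: int, infected: bool,
                         vax_day: Optional[int] = None) -> List[Interval]:
    """Split one person's follow-up (0, end_day] at the vaccination day.

    Assumes follow-up starts on day 0 and ends at infection or censoring on
    end_day, and that vaccination status switches on vax_day (no lag window).
    """
    event = int(infected)
    if vax_day is None or vax_day >= end_day:
        # Not vaccinated during follow-up: a single unvaccinated interval.
        return [Interval(person_id, 0, end_day, 0, event)]
    if vax_day <= 0:
        # Vaccinated at (or before) enrollment: a single vaccinated interval.
        return [Interval(person_id, 0, end_day, 1, event)]
    # Unvaccinated time first (no event possible here), then vaccinated time.
    return [
        Interval(person_id, 0, vax_day, 0, 0),
        Interval(person_id, vax_day, end_day, 1, event),
    ]


rows = split_at_vaccination(1, end_day=60, infected=True, vax_day=20)
for row in rows:
    print(row)
```

Summing `stop - start` by the `vaccinated` flag gives the person-time denominators; a Cox routine that accepts start/stop data, such as R's `survival::coxph` with `Surv(start, stop, event)`, takes rows in this shape.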

But that adjustment does not deal with the sharp decline in infection rates from December to March, or with the fact that the unvaccinated exposure falls primarily in the earlier weeks, when infection rates were much higher.
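A toy calculation shows why calendar time matters even with person-time denominators. The numbers are invented: in each period the vaccinated rate is exactly half the unvaccinated rate, but most unvaccinated person-days fall in the high-incidence early period. Stratifying by period with a Mantel-Haenszel rate ratio is one standard adjustment, shown here only as an illustration; neither study says it did this.

```python
# Hypothetical strata: period -> (vax_cases, vax_days, unvax_cases, unvax_days).
# Invented numbers: the vaccinated rate is half the unvaccinated rate in BOTH
# periods, but unvaccinated person-days are concentrated in the early period.
strata = {
    "early (high incidence)": (5, 1_000, 90, 9_000),
    "late (low incidence)": (9, 9_000, 2, 1_000),
}


def crude_rate_ratio(strata):
    """Pool all periods, then compare vaccinated vs unvaccinated rates."""
    a = sum(s[0] for s in strata.values())
    t1 = sum(s[1] for s in strata.values())
    b = sum(s[2] for s in strata.values())
    t0 = sum(s[3] for s in strata.values())
    return (a / t1) / (b / t0)


def mantel_haenszel_rate_ratio(strata):
    """Mantel-Haenszel rate ratio, stratified by calendar period."""
    num = den = 0.0
    for a, t1, b, t0 in strata.values():
        total = t1 + t0
        num += a * t0 / total
        den += b * t1 / total
    return num / den


for period, (a, t1, b, t0) in strata.items():
    print(period, "rate ratio:", (a / t1) / (b / t0))
print("crude:", crude_rate_ratio(strata))
print("Mantel-Haenszel:", mantel_haenszel_rate_ratio(strata))
```

Here the crude ratio is 14/92, about 0.15, which would read as roughly 85% effectiveness, while the within-period ratio is 0.5, or 50%. Using calendar time as the time scale in a Cox model is another common way to handle the same issue.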

I also think (and someone please correct me if I'm wrong) that the AG extension does not address the self-selection bias in this data. All it does is address a timing bias that would arise if a standard analysis were applied.


Kaiser Fung. Business analytics and data visualization expert. Author and Speaker.
Graphics design by Amanda Lee
