Yesterday, I laid out the framework for today's post, in which I compare intent-to-treat (ITT) analysis and per-protocol (PP) analysis of the colonoscopy trial. All the data I cite below are pulled from this paper and its supplement (link).

***

The outcome metric of the clinical trial is the risk of colon cancer (diagnosis) over a 10-year follow-up window (a final analysis will be performed after 15 years). The effect of screening is measured by the relative ratio of diagnosis rates between the treatment group and the usual-care (control) group. In the ITT analysis, the treatment group consists of everyone who was randomized to receive an invitation to colonoscopy, while in the PP analysis, the treatment group consists of the "compliers" only, those who responded to the invitation and underwent colonoscopy. Because compliers self-select into screening, they cannot be treated as a random subset of the invited group; the researchers therefore spun up a regression model to "remove biases".
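In symbols, as a sketch (the notation is mine, not the paper's): let $r_T$ be the 10-year diagnosis rate in the treatment group and $r_C$ the rate in the usual-care group. The effectiveness figures quoted in this post are relative reductions:

$$
\mathrm{RR} = \frac{r_T}{r_C}, \qquad \text{effectiveness} = 1 - \mathrm{RR} = 1 - \frac{r_T}{r_C}
$$

ITT and PP differ only in who is counted in $r_T$: everyone invited, or the compliers alone.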

In the chart below, I present the results of the ITT analysis from the paper, which was the analysis pre-specified for this trial. The trial was run in three countries, but Poland (64%) and Norway (31%) contributed 95% of the participants.

Two features of the data stand out immediately. First, the case rate in Poland (0.88%) is much lower than that in Norway (1.35%). That relative gap of 35% is larger than the relative gap between the treatment and control groups (18%). (In other words, the variability between countries is larger than the variability between treatment and control.) Second, the observed reduction in case rate was 24% among Norwegian participants but only 16% among Polish participants. Since Poland contributes nearly two-thirds of the participants, the reported aggregate effectiveness sits just north of Poland's value.
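Both comparisons in this paragraph are relative reductions: one minus the ratio of two case rates. Here is a minimal Python sketch that recomputes them from the rounded rates quoted in this post (invited vs usual care: 1.2% vs 1.57% in Norway, 0.83% vs 0.99% in Poland). Because the inputs are rounded, the results may differ from the paper's own figures in the last digit.

```python
# Relative reduction between two case rates: 1 - rate / reference.
# Rates are percentages, rounded as quoted in this post.

def relative_reduction(rate: float, reference: float) -> float:
    """Return how far `rate` sits below `reference`, as a fraction of `reference`."""
    return 1 - rate / reference

# ITT: treatment group = everyone invited; reference = usual care.
itt = {
    "Norway": relative_reduction(1.20, 1.57),
    "Poland": relative_reduction(0.83, 0.99),
}
# Cross-country gap: Poland's overall case rate relative to Norway's.
country_gap = relative_reduction(0.88, 1.35)

for country, effect in itt.items():
    print(f"{country}: ITT reduction {effect:.0%}")
print(f"Poland vs Norway case-rate gap: {country_gap:.0%}")
```

This prints reductions of 24% (Norway) and 16% (Poland), plus a 35% cross-country gap.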

The next chart shows what happens when the researchers switched to an "adjusted per-protocol" analysis. They split the invited group into compliers and noncompliers. The paper reported diagnosis metrics for Poland, Norway and the aggregate. (They did not disclose mortality metrics, reporting only the relative ratio, so no one can validate that part of the analysis. Nor did they break out Sweden's data.)

In Norway, the paper reports the compliers' case rate as 0.86%, about 30% below the average of the invited group. (The paper does not make clear how much of this shift is due to adjustment and how much to filtering out noncompliers, or even whether this rate is computed from data or estimated from a model.) In any case, the treatment group's case rate was marked down from 1.2% to 0.86%, and correspondingly, the effectiveness metric in Norway leaped from 24% to 45%!

In Poland, a completely different picture emerges. The compliers are said to have a case rate of 0.85%, almost the same as the 0.83% aggregate rate of the invited group. This means that the effectiveness metric in Poland has barely shifted compared to the ITT analysis. It's still about 15%.
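To see how the switch plays out country by country, the sketch below recomputes both metrics from the rounded rates in this post, swapping the invited group's rate for the compliers' rate. It is a back-of-the-envelope check, not the paper's adjustment model, which is undisclosed. With rounded inputs, Norway's complier rate comes out about 28% below the invited average and Poland's per-protocol reduction about 14%; unrounded data may shift these by a point or two.

```python
def relative_reduction(rate: float, reference: float) -> float:
    """Return 1 - rate / reference."""
    return 1 - rate / reference

# Rounded case rates (%) quoted in this post.
usual_care = {"Norway": 1.57, "Poland": 0.99}
invited = {"Norway": 1.20, "Poland": 0.83}    # ITT treatment group
compliers = {"Norway": 0.86, "Poland": 0.85}  # adjusted per-protocol treatment group

results = {}
for country in usual_care:
    itt = relative_reduction(invited[country], usual_care[country])
    pp = relative_reduction(compliers[country], usual_care[country])
    # How far the treatment group's rate moved when invitees were replaced by compliers.
    change = compliers[country] / invited[country] - 1
    results[country] = (itt, pp, change)
    print(f"{country}: ITT {itt:.0%} -> PP {pp:.0%}, "
          f"complier rate vs invited {change:+.0%}")
```

Norway's effectiveness nearly doubles (24% to 45%) because its treatment rate drops sharply; Poland's barely moves (16% to 14%) because its complier rate is essentially the invited rate.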

From this, we learn that the change in analysis strategy embeds the following key assumption:

(A) Compliers are very different from noncompliers in Norway but not in Poland

Such an assumption might be justifiable; I am in no position to judge. Hold on tight, as there are more assumptions to be unearthed.

Not explicitly stated in the paper is the fact that the case rate for noncompliers must have moved in the opposite direction, upward. In Norway, compliers form 60% of the invited group and noncompliers 40%, so we can back out the noncompliers' case rate implied by the adjusted per-protocol analysis: 1.77%. This observation reveals two other embedded assumptions:

(B) The diagnosis rate of colon cancer for Norwegians who refused to be screened when invited (1.77%) is higher than that for people who were neither invited to screening nor screened (1.57%). This requires explanation, since people were randomly selected to receive invitations.

(C) The implied gap between the case rates of compliers and noncompliers in Norway is a whopping 0.9 percentage points or so: noncompliers are diagnosed at about twice the rate of compliers.
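The back-calculation treats the invited group's rate as a compliance-weighted mix of the complier and noncomplier rates, and solves for the noncompliers. A minimal sketch follows; the function name is my own, and the 60% complier share for Norway is an assumption: it is the split consistent with a noncomplier rate near 1.77% (with the rounded inputs here the sketch gives about 1.71%), whereas a 40% complier share would imply only about 1.43%.

```python
def noncomplier_rate(invited_rate: float, complier_rate: float,
                     complier_share: float) -> float:
    """Back out the noncompliers' case rate from the invited group's rate.

    The invited group's rate is a weighted average:
        invited = share * compliers + (1 - share) * noncompliers
    so noncompliers = (invited - share * compliers) / (1 - share).
    """
    if not 0 <= complier_share < 1:
        raise ValueError("complier_share must be in [0, 1)")
    return (invited_rate - complier_share * complier_rate) / (1 - complier_share)

# Norway, rounded rates (%) from this post; the 60% complier share is an assumption.
norway_nc = noncomplier_rate(invited_rate=1.20, complier_rate=0.86, complier_share=0.60)
gap_points = norway_nc - 0.86   # (C): absolute gap, in percentage points
ratio = norway_nc / 0.86        # (C): noncomplier rate as a multiple of the complier rate

print(f"Norway noncompliers: {norway_nc:.2f}% (usual care: 1.57%)")
print(f"Gap vs compliers: {gap_points:.2f} points, {ratio:.1f}x")
```

Either way, the implied noncomplier rate sits well above the usual-care rate of 1.57%, which is the tension flagged in (B).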

In Poland, two-thirds of those invited did not comply. Using this information, we can back out the case rate of the noncompliers, and it's 0.82%, actually slightly below the aggregate value of 0.83% for the entire invited group. This leads to further observations:

(D) There is almost no difference in diagnosis rates in Poland between those who complied and those who didn't comply with colonoscopy screening. This is particularly jarring considering (C) above.

(E) Colonoscopy is 3 times more effective in Norway than in Poland (a 45% reduction versus about 15%). In Norway, it brought the case rate down from 1.57% (usual care) to 0.86% (compliers), about the same level as compliers in Poland, even though Poland's control group came in much lower, at 0.99%, than Norway's.
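The Poland back-calculation and the comparison in (E) can be checked the same way. A minimal sketch on the rounded rates in this post, assuming exactly one-third of Polish invitees complied; the helper names are my own. It gives a noncomplier rate of 0.82%, per-protocol reductions of about 45% (Norway) and 14% (Poland), and a ratio of roughly 3.2.

```python
def noncomplier_rate(invited_rate: float, complier_rate: float,
                     complier_share: float) -> float:
    # invited = share * compliers + (1 - share) * noncompliers, solved for noncompliers
    return (invited_rate - complier_share * complier_rate) / (1 - complier_share)

def relative_reduction(rate: float, reference: float) -> float:
    return 1 - rate / reference

# Poland, rounded rates (%) from this post; one-third of invitees complied.
poland_nc = noncomplier_rate(invited_rate=0.83, complier_rate=0.85, complier_share=1 / 3)

# (E): per-protocol effectiveness, compliers vs usual care, in each country.
pp_norway = relative_reduction(0.86, 1.57)
pp_poland = relative_reduction(0.85, 0.99)

print(f"Poland noncompliers: {poland_nc:.2f}%")
print(f"PP reduction: Norway {pp_norway:.0%}, Poland {pp_poland:.0%}, "
      f"ratio {pp_norway / pp_poland:.1f}x")
```

Note how (D) and (E) fall out of the same arithmetic: in Poland the complier and noncomplier rates are nearly identical, so almost all of the per-protocol gain is concentrated in Norway.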

***

The researchers decided to do an "adjusted per-protocol analysis" which means they did two things: they filtered noncompliers from the treatment group, and they made adjustments using a regression model that includes variables like country of origin, age, etc.

Since the paper does not disclose the model details, we have to infer what happened from the model outputs, summarized above as (A)-(E). Together, these changes to the data shifted the estimated effectiveness from 18% to 31%.

***

What I want to show is that current reporting practices for observational studies in medical journals are wholly inadequate. It should be much easier for readers to comprehend what adjustment models are doing. Much more space should be devoted to justifying how the adjustments affect the conclusions. Any of the above assumptions could make sense, but the paper offers nothing to back them up.

The models used to adjust data should be disclosed in full. Tables showing covariate balance between groups should be standard for every pair of subgroups that features in any of the metrics included in the publication. For example, age is clearly an important confounder in this study, since colon cancer tends to appear in older people, yet nothing in the paper shows the age distribution of the complier subset versus the usual-care group.
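For a continuous confounder like age, one common balance statistic is the standardized mean difference (SMD): the difference in group means divided by a pooled standard deviation. The sketch below is my illustration of the kind of check such a table could report, for example compliers versus usual care; the function names are hypothetical, and the 0.1 cut-off is a widely used rule of thumb, not anything from the paper.

```python
from statistics import mean, variance

def standardized_mean_difference(group_a: list[float], group_b: list[float]) -> float:
    """(mean_a - mean_b) / pooled SD, pooling the two sample variances equally."""
    pooled_sd = ((variance(group_a) + variance(group_b)) / 2) ** 0.5
    if pooled_sd == 0:
        raise ValueError("both groups have zero variance; SMD is undefined")
    return (mean(group_a) - mean(group_b)) / pooled_sd

def looks_balanced(group_a: list[float], group_b: list[float],
                   threshold: float = 0.1) -> bool:
    """Apply the common |SMD| < threshold rule of thumb for covariate balance."""
    return abs(standardized_mean_difference(group_a, group_b)) < threshold
```

Reporting one SMD per covariate, per pair of compared groups, would let readers see at a glance whether, say, compliers skew older or younger than the usual-care group.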
