Yesterday, I laid out the framework for today's post, in which I compare intent-to-treat (ITT) analysis and per-protocol (PP) analysis of the colonoscopy trial. All the data I cite below are pulled from this paper and its supplement (link).
***
The outcome metric of the clinical trial is the risk of colon cancer (diagnosis) over a 10-year follow-up window (a final analysis will be performed after 15 years). Risk is compared via the relative ratio of diagnosis rates between the treatment and usual-care (control) groups. In the ITT analysis, the treatment group consists of everyone who was randomized to receive an invitation to colonoscopy, while in the PP analysis, the treatment group consists of the "compliers" only - those who responded to the invitation and underwent colonoscopy. Because compliers self-select into screening, they cannot be treated as a random subset of the invited group, and so the researchers spun up a regression model to "remove biases".
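To fix notation for what follows, here is the headline metric as I understand it - a minimal sketch in Python (my notation, not the paper's):

```python
# The trial's headline metric, as I read the paper: the relative reduction
# in the 10-year diagnosis rate of a treatment group versus usual care.
def effectiveness(treated_rate: float, control_rate: float) -> float:
    """E.g., a return value of 0.18 means an 18% lower diagnosis rate."""
    return 1 - treated_rate / control_rate
```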
In the chart below, I present the results of the ITT analysis, which is the pre-specified analysis for this trial. The trial was run in three countries, but Poland (64%) and Norway (31%) contributed 95% of the participants.
Immediately, two features of this data come to light. First, the case rate in Poland (0.88%) is much lower than that in Norway (1.35%). That relative gap of 35% is larger than the relative ratio between the treatment and control groups (18%). (In other words, variability between countries is larger than variability between treatment and control.) Second, the observed reduction in case rate among Norwegian participants was 24% while that in Poland was 16%. Since Poland contributed about 64% of the overall population, the reported aggregate effectiveness sits just north of Poland's value.
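As a rough sanity check, a headcount-weighted average of the country-level reductions lands close to the reported aggregate. (This is a simplification of mine: the paper's aggregate is computed from pooled event counts, not from a weighted average of country effects, and Sweden's reduction is not broken out, so I plug in an assumed value purely for illustration.)

```python
# Back-of-envelope check using the shares and reductions quoted above.
shares = {"Poland": 0.64, "Norway": 0.31, "Sweden": 0.05}
reductions = {"Poland": 0.16, "Norway": 0.24}
reductions["Sweden"] = 0.20  # not disclosed; assumed, for illustration only

approx = sum(shares[c] * reductions[c] for c in shares)
print(f"{approx:.1%}")  # 18.7% - in the neighborhood of the reported 18%
```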
The next chart shows what happened when the researchers switched to an "adjusted per-protocol" analysis. They broke the invited group into compliers and noncompliers. The paper reported diagnosis metrics for Poland, Norway, and the aggregate. (It did not disclose mortality metrics, only the relative ratio, so no one can validate that part of the analysis. Nor did it break out Sweden's data.)
In Norway, the paper puts the compliers' case rate at 0.86%, roughly 30% below the average of the invited group. (It is not clear from the paper how much of this shift is due to adjustment and how much to filtering out noncompliers. It's not even clear whether this rate is computed from data or estimated from a model.) In any case, the treatment group's case rate was marked down from 1.2% to 0.86%, and correspondingly, the effectiveness metric in Norway leaped from 24% to 45%!
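The jump is simple arithmetic on the rounded rates quoted in this post, using Norway's usual-care rate of 1.57% which appears in (B) below - a quick sketch:

```python
# Reproducing Norway's effectiveness figures from the rounded rates in this post.
def effectiveness(treated_rate: float, control_rate: float) -> float:
    return 1 - treated_rate / control_rate

usual_care = 0.0157  # Norway usual-care case rate (see (B) below)
invited    = 0.0120  # Norway invited-group case rate (ITT)
compliers  = 0.0086  # Norway compliers' case rate (adjusted per-protocol)

print(f"ITT:         {effectiveness(invited, usual_care):.0%}")    # 24%
print(f"Adjusted PP: {effectiveness(compliers, usual_care):.0%}")  # 45%
```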
In Poland, a completely different picture emerges. The compliers are said to have a case rate of 0.85%, almost the same as the 0.83% aggregate rate of the invited group. This means that the effectiveness metric in Poland has barely shifted compared to the ITT analysis. It's still about 15%.
From this, we learn that the change in analysis strategy embeds the following key assumption:
(A) Compliers are very different from noncompliers in Norway but not in Poland.
Such an assumption might be justifiable; I am in no position to judge. Hold on tight, as there are more assumptions to be unearthed.
Not explicitly stated in the paper is that the case rate for noncompliers must have been moved in the other direction: since in Norway compliers form 40% of the invited group and noncompliers 60%, we can back out the case rate of the noncompliers implied by the adjusted per-protocol analysis - it's 1.77%. This observation reveals two more embedded assumptions:
(B) The diagnosis rate of colon cancer for Norwegians who refused screening when invited (1.77%) is higher than that for people who were not invited and not screened (1.57%). This requires explanation, since people were randomly selected to receive invitations.
(C) The implied gap between the case rates of the compliers and noncompliers in Norway is a whopping 0.9 percentage points (0.86% vs 1.77%), which is very wide.
In Poland, two-thirds of those invited did not comply. Using this information, we can back out the case rate of the noncompliers: it's 0.82%, actually slightly below the aggregate value of 0.83% for the entire invited group. (The sketch after observation (E) below works through this arithmetic.) This leads to further observations:
(D) There is almost no difference in diagnosis rates in Poland between those who complied and those who didn't comply with colonoscopy screening. This is particularly jarring considering (C) above.
(E) Colonoscopy is 3 times more effective in Norway than in Poland. In Norway, it brought the case rate down from 1.57% (usual care) to 0.86% (compliers), which is about the same level as the compliers in Poland (0.85%) - despite Poland's control group coming in much lower, at 0.99%, than Norway's.
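All the back-out calculations behind (B) through (E) rest on one identity: the invited group's case rate is a compliance-weighted mixture of the compliers' and noncompliers' rates. Here is a sketch; Poland's rounded figures reproduce the 0.82% exactly, while Norway's 1.77% presumably requires the unrounded counts from the supplement.

```python
# The invited group's case rate is a mixture:
#   invited = p * compliers + (1 - p) * noncompliers
# where p is the share of the invited group that complied. Rearranging
# lets us back out the noncompliers' rate from the other three quantities.
def backout_noncomplier_rate(invited: float, compliers: float, p: float) -> float:
    return (invited - p * compliers) / (1 - p)

# Poland: one-third complied; rounded rates quoted in this post.
print(f"{backout_noncomplier_rate(invited=0.0083, compliers=0.0085, p=1/3):.2%}")  # 0.82%
```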
***
The researchers decided to do an "adjusted per-protocol analysis" which means they did two things: they filtered noncompliers from the treatment group, and they made adjustments using a regression model that includes variables like country of origin, age, etc.
Since the paper does not disclose the model details, we have to look at the model outputs to infer what happened. What happened is listed above as (A)-(E). These are all changes to the data that ultimately caused the estimated effectiveness to shift from 18% to 31%.
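Any reconstruction of that model is guesswork. Purely to make the discussion concrete, here is the generic shape such an adjustment could take - a logistic regression fit to simulated data, with invented variable names and an assumed screening effect; nothing below is the paper's actual model:

```python
# Hypothetical sketch only: the paper does not disclose its adjustment model.
# Simulate data loosely echoing the figures in this post, then fit a logistic
# regression with compliance, country, and age as covariates.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 80_000
norway = rng.random(n) < 0.31                             # country share from this post
age = rng.integers(55, 65, size=n)                        # assumed age range
complied = rng.random(n) < np.where(norway, 0.40, 1 / 3)  # compliance shares from this post

base_risk = np.where(norway, 0.0157, 0.0099)              # usual-care rates from this post
cancer = rng.random(n) < base_risk * np.where(complied, 0.7, 1.0)  # assumed effect

X = sm.add_constant(np.column_stack([complied, norway, age]).astype(float))
fit = sm.Logit(cancer.astype(float), X).fit(disp=False)
print(fit.params[1])  # adjusted log-odds effect of compliance
```

The point is not that this is their model; it's that disclosing something of this shape, coefficients included, would cost half a page of the supplement.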
***
What I want to show is that current reporting practices for observational analyses in medical journals are wholly inadequate. It should be much easier for readers to comprehend what adjustment models are doing, and much more space should be devoted to justifying the effects of adjustments on the conclusions. Any of the above assumptions might make sense, but there is nothing in the paper to back them up.
The models used to adjust data should be disclosed in full. Tables showing covariate balance should be standard, for every pair of subgroups that features in any metric included in the publication. For example, age is clearly an important confounder in this study, since colon cancer tends to appear in older people, yet there is nothing in the paper that shows the age distribution of the compliers subset versus the usual-care group.
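For what it's worth, such a balance table is a few lines of code once the participant-level data exists. A sketch with invented column names - the trial's microdata is not public, so I simulate a stand-in:

```python
# Hypothetical sketch of a covariate-balance table; `df` stands in for the
# (non-public) participant-level data, and the column names are invented.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "group": rng.choice(["compliers", "usual care"], size=1_000),
    "age": rng.integers(55, 65, size=1_000),
    "male": (rng.random(1_000) < 0.5).astype(int),
})

# Mean and spread of each covariate by group - the kind of table that
# should accompany every adjusted comparison in the paper.
print(df.groupby("group")[["age", "male"]].agg(["mean", "std"]).round(2))
```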
Thank you for confirming my suspicion that biostatistics are the slipperiest of all, and the most abused.
Posted by: Godfree Roberts | 10/27/2022 at 07:58 PM
Very informative and well-presented analysis. I think most people would agree with your critique of disclosure and documentation, especially since this kind of analysis seems generic enough to be distilled into a "best practice" R script.
But I think there are good(?) reasons why so little is done in this respect. Among them are historical reasons - shelf space in scientific papers(!) was/is scarce - and the sociology of research production: what gets rewarded is the number of papers, not the amount of disclosure, plus maybe the fear of losing the "secret sauce" recipe to competitors.
Posted by: gg | 10/28/2022 at 04:57 AM
gg: since there are no limits to what one can put in the "supplementary appendix" these days, there is really no excuse for not providing the details. If "secret sauce" is a concern, then the material in question is not publishable - it belongs to business, not science.
Providing a script is important for reproducibility and is more precise in presenting details such as data-processing steps, but it also raises the burden on readers: it's not easy to read R scripts if you don't know R. Scripts also do not contain the model outputs.
This leads to the state of "peer review". How is it possible for reviewers to judge the validity of models (for statistical adjustments) when they haven't seen the model outputs? If I had to review this paper without additional disclosure, my judgment would merely reflect whether I believe the effectiveness is closer to 30% or 20%.
Posted by: Kaiser | 10/28/2022 at 10:47 AM