[After communicating with Frakt, Humphreys and Dean Eckles, I realize that I was confused about Frakt's description of the Humphreys paper, which does not perform PP analysis. So when reading this post, consider it a discussion of ITT versus PP analysis. I will post about Humphreys's methodology separately.]
The New York Times plugged a study of the effectiveness of Alcoholics Anonymous (AA) (link). The author (Austin Frakt) used this occasion to advocate "per-protocol" (PP) analysis over "intent-to-treat" (ITT) analysis. He does a good job explaining the potential downside of ITT, but gets into a mess explaining PP and never properly addresses the downside of PP. It's a missed opportunity, and I fear the article leaves readers even more confused about an important topic.
The key issue at play is non-compliance in a randomized experiment. If some patients are assigned to AA treatment and others are assigned to some other treatment, typically some subset of patients will "cross over" (or drop out altogether), and usually such cross-over is associated with the outcome being measured--for example, a patient assigned to AA treatment felt that AA was not working and switched to the other treatment on their own; or vice versa.
ITT and PP differ in how they deal with the subset of non-compliers. In ITT, you analyze everyone in the experiment based on their initial assignment, ignoring non-compliance. In PP, you drop all non-compliers from the study, and analyze the subset of compliers only. (Each analysis is "extreme" in its own way.)
Between these two, I usually prefer ITT. The PP analysis answers the question: "If everyone complied with the treatment, what would be its effect?" I don't find the assumption of zero non-compliance realistic. ITT answers a different question: "Of those who are given the treatment, what would be the expected effect?" This effect is an average of those who complied and those who did not comply, weighted by the proportion of compliers.
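To make that weighting concrete, here is a toy calculation in Python. All the numbers are invented for illustration; nothing here comes from the AA study.

```python
# Toy numbers, invented for illustration only.
p_comply = 0.8              # proportion who stick with their assignment
effect_compliers = 10.0     # average treatment effect among compliers
effect_noncompliers = 2.0   # average effect among those who cross over

# ITT estimates the compliance-weighted average of the subgroup effects.
itt_effect = p_comply * effect_compliers + (1 - p_comply) * effect_noncompliers
print(itt_effect)  # 8.4
```

Note that the ITT estimate (8.4) sits between the two subgroup effects, pulled toward the complier effect in proportion to the compliance rate.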
Frakt lost me when he said:
In a hypothetical example, imagine that 50 percent of the sample receive treatment regardless of which group they've been assigned to. And likewise imagine that 25 percent are not treated no matter their assignment. In this imaginary experiment, only 25 percent would actually be affected by random assignment.
First of all, the arithmetic does not work. If we ignore assignment as he suggested in the first two sentences, then the patients can either have received treatment or not. But 50 percent plus 25 percent leaves 25 percent of the patients unaccounted for.
Here is an illustration of what I think Frakt wanted to get across (counts per 100 patients):

                           Received treatment   Did not receive treatment
    Assigned to treatment          45                       5
    Assigned to no treatment       30                      20

Of the 50% assigned to the treatment, 90% (45 out of 50) complied and 10% crossed over. Of the other half, initially assigned to no treatment, 60% (30 out of 50) crossed over to the treatment. All in all, 75% of the study population received treatment and 25% did not... regardless of their initial assignment.
In an ITT analysis, all patients in the table are analyzed. We compare the top row with the bottom row. By contrast, in a PP analysis, we only analyze the patients along the top-left, bottom-right diagonal, namely, the 65% of the patients who complied with the assigned treatment. So, we compare the top left corner with the bottom right corner.
The important question is whether this 65% subset constitutes a random sample. Frakt implies it is: "only 25 percent [i.e. 65 percent in my example] would actually be affected by random assignment." Maybe when he said "affected by," he didn't really mean randomized, because it should be obvious that treatment is no longer randomized within the 65% subset.
If the 65% subset were randomly drawn from the initial population, we should still have equal proportions of treated versus non-treated but in fact, we have 70% treated versus 30% not treated. Said differently, the not-treated patients are more likely to cross over than the treated patients.
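The arithmetic in the example can be checked in a few lines of Python, using the counts per 100 patients described above:

```python
# Counts per 100 patients, from the example above.
trt_complied = 45   # assigned to treatment, stayed on it
trt_crossed = 5     # assigned to treatment, crossed over
ctl_crossed = 30    # assigned to no treatment, crossed over to treatment
ctl_complied = 20   # assigned to no treatment, stayed off it

treated = trt_complied + ctl_crossed        # 75% received treatment
untreated = trt_crossed + ctl_complied      # 25% did not
compliers = trt_complied + ctl_complied     # 65%: the PP subset

# Within the PP subset, treatment is no longer split 50/50.
share_treated = trt_complied / compliers
print(treated, untreated, compliers, round(share_treated, 2))  # 75 25 65 0.69
```

The last line is the point: a truly random 65% subset would preserve the 50/50 assignment split, but the PP subset is roughly 70% treated to 30% not treated.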
Cross-over isn't something that happens randomly. Patients are assessing their own health during the experiment, and thus, the opting out is frequently related to the observed (albeit incomplete) outcome.
In the article, Frakt states that the study of Humphreys et al. "corrects for crossover by focusing on the subset of participants who do comply with their random assignment". I call this "filtering" rather than "correcting".
Does analyzing this subset lead to an accurate estimate of the treatment effect? I don't think so.
By filtering out the cross-overs, the researchers introduce a survivorship bias. If patients cross over because they are unhappy with their assigned treatment, then these patients, had they been forced to continue the original treatment, would likely have below-par outcomes compared to those who did not cross over. In a PP analysis, this subset is removed. Practically, this means that the treatment effect from a PP analysis is too optimistic.
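A minimal simulation shows the mechanism. All parameters below are my own invention, not from the study: assume the treatment truly adds 1 point on average, but patients with a poor prognosis are more likely to cross over. Dropping them, as PP does, inflates the estimated effect.

```python
import random

random.seed(0)
TRUE_EFFECT = 1.0       # invented: the treatment truly adds 1 point
n = 100_000

outcomes_all = []       # everyone assigned to treatment, had they stayed on it
outcomes_pp = []        # compliers only (the PP subset)

for _ in range(n):
    baseline = random.gauss(0, 1)        # patient's underlying prognosis
    outcome = baseline + TRUE_EFFECT     # outcome if they stay on treatment
    # Patients doing poorly are far more likely to abandon the treatment.
    crosses_over = baseline < -1 and random.random() < 0.8
    outcomes_all.append(outcome)
    if not crosses_over:
        outcomes_pp.append(outcome)

mean_all = sum(outcomes_all) / len(outcomes_all)
mean_pp = sum(outcomes_pp) / len(outcomes_pp)
print(round(mean_all, 2), round(mean_pp, 2))  # PP mean exceeds the true mean
```

Because the dropped patients all come from the low end of the outcome distribution, the complier-only mean sits well above the true average outcome under treatment.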
Frakt is careless with his language when it comes to discussing the downside of PP analysis. He says (my italics):
it’s not always the case that the resulting treatment effect is the same as one would obtain from an ideal randomized controlled trial in which every patient complied with assignment and no crossover occurred. Marginal patients may be different from other patients...Despite the limitation, analysis of marginal patients reflects real-world behavior, too.
"Not always" leaves the impression that PP analysis is usually right except for rare situations. Note how he uses the word "limitation" above (paired with "despite"), and below, when discussing ITT analysis:
For a study with crossover, comparing treatment and control outcomes reflects the combined, real-world effects of treatment and the extent to which people comply with it or receive it even when it’s not explicitly offered. (If you want to toss around jargon, this type of analysis is known as “intention to treat.”) A limitation is that the selection effects introduced by crossover can obscure genuine treatment effects.
The choice of words leaves the impression that ITT is more limited than PP when both analyses suffer from problems arising from the same source: patients with worse outcomes are more likely to cross over.