When I read Covid-19 medical research during the pandemic, one recurring practice bugs me a lot. Researchers seem to believe that a simple adjustment term inserted into a regression model *eliminates* bias in analyzing observational studies. To wit, if I suspect that age is a confounder, all I have to do is to insert an age term into my regression equation, and somehow that magically resolves the age bias fully.

That's just odd. Such a simple adjustment scheme works for simple biases. Specificially, one must get the age strata defined correctly so that within each age stratum, the treatment status is as if random (conditional on any other confounders used). In most real-world scenarios, this conditional independence assumption is hard to believe.

The age bias example is the tip of the iceberg. Even more bewildering is calendar time adjustment. This is used as a cure-all for all time-varying confounders. If one just inserts a calendar-time term into the regression model, all time-varying biases disappear! Shhh, don't even ask how they determine the appropriate time strata (days, weeks, months, etc.). [Whispering: you don't say the passage of time does not cause confounding.]

Then, I came across Andrew's recent paper - coauthored with Lauren Kennedy - called "causal quartets" (link to PDF), which nicely lays out some of the problems I've tossed around in my head.

Let me summarize the key points of this paper.

***

The methodology being discussed is "a causal model using linear regressions without interactions". This is the dominant type of causal models used in "real-world" studies of Covid-19 vaccines (should nicely generalize to logistic regressions, Cox regressions, etc.). The researchers identify a set of potential confounders - almost always, these are convenient variables that are pulled out of large-scale government databases, such as age, gender, prior hospital visits, etc. - then throw these confounders as main effects in a regression model (typically, a Cox regression). Then, the practice is to take "the coefficient of the treatment variable" as the "causal effect" of the treatment. As Andrew and Lauren stated, this causal effect is really an "average" treatment effect.

It's an average effect in the sense that the treatment under study has an effect for each individual, which may vary across individuals, and the single number that summarizes this distribution is the average of individual effects.

The existence of an average effect does not imply that the effect is identical to all individuals. Indeed, the causal quartet presents different distributions of individual effects that lead to the same average effect.

[Readers will be reminded of the so-called Anscombe's quartets, a favorite of statistics professors to warn against running "trendlines" on any and all datasets.]

Frequently, individuals can be organized into groups (e.g. age groups). Each age group has an average treatment effect among its individuals, and the average treatment effect is an average of the group averages. The value of the average clearly then depends on age group distribution within the population of individuals.

For more, see the Prologue of **Numbersense** (link), where I discuss how this plays into the Simpson's paradox.

***

To nail down the ideas, let's look at a simplest case in which age is the only confounder. The age-adjusted linear regression model is:

Outcome (given Treatment and AgeGroup) = a + b1*Treatment + b2(j)*AgeGroup(j)

where AgeGroup is one of j=3 ("young", "mid", "old"), and "Not Treated" and "old" are considered the reference levels, thus b2("old") = 0 and b1 is set to 0 for Not Treated.

Under this model, we have the following calculations:

Outcome (given Treated, "old") = a + b1

Outcome (given Not Treated, "old") = a

Treatment Effect (given "old") = a + b1 - a = b1

Similarly,

Outcome (given Treated, "young") = a + b1 + b2("young")

Outcome (given Not Treated, "young") = a + b2("young")

Treatment Effect (given "young") = a + b1 + b2("young") - a - b2("young") = b1

In sum, for all age groups, the model gives the same treatment effect of b1. Thus, b1 is the average treatment effect for the entire population. In this case, there is no difference between weighted and unweighted average; the age distribution does not matter.

The madness begins when people take the above and conclude that "the treatment effect is the same for all age groups".

This gets things *backwards*. This conclusion is the starting point, not the ending point, of the regression model; it is a necessary consequence of its structure.

Let's look at the regression equation again:

Outcome (given Treatment and AgeGroup) = a + b1*Treatment + b2(j)*AgeGroup(j)

Three terms together constitute the outcome. The first term ("a") applies to every individual regardless of treatment status or age. Its value is set to be the outcome of the Not Treated in the "old" group (not the only possibility but what I'm using in this discussion). Think of selecting the Not Treated, "old" group as the reference group, and measuring other groups against it.

Then, for each Treated person, regardless of age, we add the second term "b1".

Finally, for each person in AgeGroup j, we add the third term "b2(j)", except the "old" group (for which we add zero).

Now, let's consider an alternative hypothesis: that the treatment effect decreases by age group; the older the individual, the lower the effect. Under the above regression model, however, the treatment effect is always b1 regardless of age. Therefore, it is impossible to represent the alt hypothesis with this model!

We need b1 to vary by age group. What we need is an "interaction" or "cross" term in the regression model, something like this:

Outcome (given Treatment and AgeGroup) = a + b1*Treatment + b2(j)AgeGroup(j) + b3(j)*Treatment*AgeGroup(j)

where, for simplicity of presentation, I set b3(j) = 0 for Not Treated regardless of Age Group so that I don't have to make b3(t, j)

Under this model, we can compute:

Outcome (given Treated, "old") = a + b1 + b3(Treated, "old")

Outcome (given Not Treated, "old") = a

Treatment Effect (given "old") = b1 + b3(Treated, "old")

Similarly,

Outcome (given Treated, "young") = a + b1 + b2("young") + b3(Treated, "young")

Outcome (given Not Treated, "young") = a + b2("young")

Treatment Effect (given "young") = b1 + b3(Treated, "young")

Now, the treatment effect consists of two parts, b1 which is independent of age, and b3 which varies with age group, and so in total, the treatment effect varies with age group.

***

By using the first regression model, the modeler makes the assumption that the treatment effect is the same for all age groups. If the modeler concludes, after fitting that model, that the treatment effect is the same for all age groups, s/he has set up a tautology, and has proven nothing.

Unfortunately, that's the kind of thing that is common in Covid-19 observational studies. They don't even publish the regression equations, coefficients, measures of fit, etc. They should have also tried the model with interactions, and explain why that is not preferred.

In addition, it appears that most of these same studies take b1 from the first model, and call it the treatment effect. SInce this first model assumes that the effect is constant for all subgroups, we can conveniently ignore the relative sizes of these subgroups.

If we have to compute the average treatment effect under the second model (the one with interactions), it would be:

Average treatment effect = Proportion "young" * (b1 + b3(Treated, "young")) + Proportion "mid" * (b1 + b3(Treated, "mid")) + Proportion "old" * (b1 + b3(Treated, "old")) = b1 + weighted average of b3(Treated, j)

The only scenario when we can ignore the age distribution is when the treatment effect does not depend on age. In this case, the weighted average equals the unweighted average so that the second model "converges" to the first model. As already noted, that is an assumption of the researchers, not a conclusion of the fitted model.

If the true treatment effect varies by age, then the form of the first regression model is inappropriate, and the average treatment effect estimated by this model is not the required weighted average effect. That's one of the key points of the Andrew and Lauren paper.

(Scroll to the end of the post to see a visualization of the two models.)

***

Andrew and Lauren suggest that the practice of using regression models without interactions is due to lack of data. It takes "16 times the sample size" to estimate one interaction effect relative to the main effect.

I get it when the observational data came from the professor's undergraduate class of 100 students but in Covid-19 studies, they have easily hundreds of thousands, if not millions, of subjects.

Nevertheless, the point of the causal quartets paper is to advocate including varying treatment effects in modeling observational data. In Section 3.1, they put some numbers behind the concepts I covered above. The study had a treatment effect of 25% higher survival rate (55% survived in the treatment group vs 30% in the control group).

If the regression model does not contain interactions, and the treatment effect (b1) from that model is interpreted as the average treatment effect, one expects each individual who takes the drug should have a 25% higher survival rate. That is only one possibility. Another possibility is that only 25% of the patients benefit from the drug, and for these, the chance of longer survival is 100%. The average treatment effect is 25%*100% + 0%*0% = 25%. Gelman and Lauren then concludes that "once we accept the idea that the drug works on some people and not others,..., we realize that 'the treatment effect' of any given study will depend entirely on the patient mix." Yep, bingo, 100%, +1.

The second possibility is more plausible than the first, and it is excluded by the structure of the regression model without interactions.

In later sections, Gelman and Lauren explore the implication of the fact that the intervention in most observational studies has no impact on a good portion of the subjects. In particular, they note that even if there is a huge impact on a small proportion of the subjects, the average treatment effect may still be small because of averaging. This calls into questions studies that (a) show huge average treatment effects and (b) for which most treated subjects would have no benefit.

Next, they point out that if we knew who might benefit from treatment, we'd have a better measurement if we restricted the test population to that subgroup. This effectively removes the majority of the subjects expected not to benefit, and is widely practiced in all Covid-19 studies (long lists of exclusions). All well and good... until the researchers then pretend that they didn't exclude anyone, and apply the treatment effect from the restricted study to the entire population. Instead, "when generalizing to a larger population, some modeling is necessary conditional on any information used in patient selection." (Gelman & Lauren) Unfortunately, I have not seen this done even once in any of the Covid-19 studies I read.

***

Here is an illustration of the differences between the two regression models discussed above, for those visually inclined:

The second model allows the three green dots for Treated to be placed anywhere: the profile of the Treated line (green) along the age dimension can be anything, does not have to be linear or parallel to the Not Treated line (blue). In this illustration, I arbitrarily decided that the treatment effect is below average for "mid" and "old" age groups; that it is the same for "mid" as for "old"; and that the treatment effect for "young" is higher. By assumption for simplicity, the blue lines (Not Treated) are equivalent on both panels but in general, two of the three blue dots on the right panel can also vary.

P.S. [3-1-23] Fixed the title of the right panel chart.

## Recent Comments