I love "data-driven" arguments. You probably do too if you're reading this blog.

What does it mean to be "data-driven"? One prerequisite is that **the conclusion should change as the data change.** The analog that I write about frequently on the dataviz blog is to qualify as a data visualization (as opposed to just a visualization), **the visual should change as the data change**.

***

Recently, there have been numerous arguments that are immutable in the face of changing data.

John Burn-Murdoch at FT wrote this tweet:

John dislikes this argument:

If there are spare beds, this proves ex-post that mitigation measures were not necessary.

There are different potential problems with this argument.

John is talking about **limiting factors**. In a complex system, many factors combine to produce an outcome. Typically, only one or a few factors are "limiting" in the sense that if those factors were either relaxed or tightened, the outcome would be directly affected. Meanwhile, there are many other factors for which manipulating them would not alter the outcome. These other factors may matter under other combinations but they don't matter right now.

Another counter-argument is related to the **direction of causality**. One reason why the government may shut Nightingale hospitals (for those in the U.S., these are akin to the makeshift beds set up at conference centers) is because mitigation measures were successful in reducing the demand.

Yet another counter-argument is the assumed counterfactual. Because mitigation measures were taken, we could not observe what could have happened if they weren't. The absence of evidence is being taken as evidence of absence.

***

But I think the best way to counter this type of argument is to point out its inherent **data-agnostic autonomous structure**:

I) On the one hand, if the outcome is X, then we conclude Y.

II) On the other hand, if the outcome is not X, then we conclude Y.

In other words, in all cases, whether X or not X, we conclude Y.

In the concern over Nightingale hospitals, this structure translates to:

I) On the one hand, if there are spare beds, then we conclude that mitigation measures were unnecessary, causing harm without benefit.

II) On the other hand, if they run out of beds, then we conclude that mitigation measures were ineffective.

In all cases, we conclude that mitigation measures were unwise.

The problem is that the same conclusion results regardless of the data (need for spare beds). So this is not a data-driven argument despite the presence of data.

***

The Bayesian way of thinking explicitly endorses the premise that conclusions should change as the data changes. Bayesian analysis leads to a posterior probability estimate, which explicitly depends on the data. If you feed the analysis different data, you get a different probability distribution. For an example of this, see this post in which I showed how the Pfizer vaccine efficacy estimate changes, assuming different trial outcomes.

The classicial (frequentist) method of hypothesis testing also produces different conclusions given different data inputs. The "null" hypothesis does not change but with different data, the location of the observed sample with respect to the "null" distribution changes, leading to different p-values.

[P.S. 1-1-2021. I switched from "agnostic" to "autonomous" as I think the argument being autonomous of the data conveys the meaning better.]

## Recent Comments