One of the less discussed fallacies in statistical arguments is tautology. By this, I mean saying the same thing twice, or reaching a conclusion that is true by assumption.
A recent example is the following popular argument:
(A) Our economy has suffered greatly because of anti-pandemic measures such as lockdowns, as evidenced by layoffs and GDP decline.
(B) Meanwhile, the deaths due to Covid-19 are insignificant - relative to the lost jobs. (*)
(C) Therefore, the anti-pandemic measures were useless and misguided.
The conclusion is that public health measures were useless.
Every statistical data argument has assumptions. What is a key assumption behind the jump from (B) to (C)? It is that the deaths due to Covid-19 as reported would be the same without the anti-pandemic measures that were imposed. Said differently, the assumption is that anti-pandemic measures are useless.
If one assumes anti-pandemic measures are useless, one will conclude that anti-pandemic measures are useless. So the argument amounts to saying the same thing twice. It is tautological.
Imagine if the opposite assumption were made, that anti-pandemic measures are effective at suppressing the death toll. Then, one will conclude that those measures are useful.
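To see the circularity in numbers, here is a minimal sketch. The function name `deaths_averted`, the effectiveness values, and the death count are all hypothetical, made up for illustration and not taken from any dataset. The observed count is held fixed; only the assumed effectiveness of the measures changes.

```python
def deaths_averted(observed_deaths: float, assumed_effectiveness: float) -> float:
    """Deaths averted implied by an ASSUMED effectiveness, i.e. the assumed
    fraction of no-measures deaths that the measures prevented."""
    if not 0 <= assumed_effectiveness < 1:
        raise ValueError("assumed_effectiveness must be in [0, 1)")
    # Counterfactual deaths (no measures) implied by the assumption.
    counterfactual = observed_deaths / (1 - assumed_effectiveness)
    return counterfactual - observed_deaths


OBSERVED = 1000.0  # hypothetical observed death count, same in every scenario

for eff in (0.0, 0.5, 0.8):
    averted = deaths_averted(OBSERVED, eff)
    print(f"assumed effectiveness {eff:.0%}: about {averted:,.0f} deaths averted")
```

The same observed data yields "nothing averted" or "thousands averted" depending solely on the value plugged in for the assumption. The data never gets a chance to contradict it.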
It turns out it's very easy to fall into the trap of tautologies in making statistical (data) arguments. That's because not everything is measured or measurable. Conclusions are an amalgam of data and assumptions. It's not always clear even to the analyst what assumptions have been made.
(*) Some article I came across this morning said there were 330 layoffs for every Covid-19 death in the U.S. This layoff-to-Covid-death metric is apples-to-oranges. All layoffs are counted as if there would not have been a single layoff in the absence of the pandemic (untrue) while only deaths that are confirmed with a positive SARS-CoV-2 diagnosis are counted in the denominator (under-counted).
***
Data science is not immune to tautologies. Here's one:
(A) We run an e-commerce website, and have developed an algorithm to recommend products to our customers.
(B) After the recommendation engine was launched, browsing (page views) of the top recommended products increased significantly relative to past trends.
(C) Therefore, our recommendation engine made the right recommendations.
The conclusion is that the engine recommended the right set of products. What is a key assumption behind the jump from (B) to (C)? It is that page views of other products would not increase similarly if they were recommended to the customers. Said differently, the assumption is that the engine recommended the right products to these customers.
If one assumes the recommendation engine works, one will conclude that the recommendation engine works. The argument amounts to saying the same thing twice. It is tautological.
Alternatively, one can assume that the engine is useless. Whatever is recommended to customers will get top views. In that case, one concludes that the recommendation engine is useless.
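The alternative assumption can be sketched as a toy simulation. Everything here is hypothetical: the catalog size, the baseline views, the `PLACEMENT_BOOST` factor, and the function names are illustrative assumptions, not a description of any real recommendation system. The simulated world is one in which the engine is useless, because any product shown in the recommendation slot gets the same boost.

```python
import random

N_PRODUCTS = 50        # hypothetical catalog size
BASE_VIEWS = 100.0     # hypothetical page views per product before launch
PLACEMENT_BOOST = 1.5  # assumed lift from merely occupying the recommendation slot


def views_after_launch(recommended: set) -> list:
    """Page views per product in a world where the engine is useless:
    whatever is placed in the slot gets the same boost."""
    return [
        BASE_VIEWS * (PLACEMENT_BOOST if p in recommended else 1.0)
        for p in range(N_PRODUCTS)
    ]


def lift(recommended: set) -> float:
    """Average post-launch views of the recommended products,
    relative to the pre-launch baseline."""
    after = views_after_launch(recommended)
    return sum(after[p] for p in recommended) / (len(recommended) * BASE_VIEWS)


rng = random.Random(0)
engine_picks = set(range(5))                           # stand-in for the engine's top picks
random_picks = set(rng.sample(range(N_PRODUCTS), 5))   # control: products chosen at random

print("lift for engine picks:", lift(engine_picks))
print("lift for random picks:", lift(random_picks))
```

In this toy world both sets show the same lift, so a before-and-after comparison of the recommended products cannot tell a good engine from a useless one. A comparison against randomly chosen recommendations is one way to let the data, rather than the assumption, decide.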
***
Have you come across examples of tautologies? Comment below!
"while only deaths that are confirmed with a positive SARS-CoV-2 diagnosis are counted in the denominator (under-counted)."
Are they? Excess deaths are partly a function of temporarily inadequate health facilities. And there are certainly cases of "Covid present" interpreted as "Covid cause". There may be some undercounting, but not nearly as much as often claimed.
More importantly, the forecast of all-time deaths from Covid keeps on collapsing, while the advance towards herd immunity is rapid as we understand more about immunity being a thing for large proportions of the population, and about how testing to date, including antibody testing, has missed so many.
Interesting piece about tautologies, just being picky about one point.
Though I am not sure you have made your point well.
The jump from B to C is not dependent on A. Indeed lockdown might* have been extremely effective in reducing covid deaths, while the jump from B to C remains legitimate.
* In fact all we could say for certain is lockdown might have been effective in delaying deaths - by 6 months or so in care homes**, by perhaps 2 years in other cases, and until the second and third wave for others.
** In countries like the UK, excess deaths have been slightly negative for the past month, implying that at least some of the Covid victims might have been due to go soon (few care home patients stay there for 2 years).
Posted by: Michael Droy | 08/06/2020 at 06:40 PM
I’m gonna go get the papers get the papers.
Posted by: Brian | 08/06/2020 at 07:30 PM
MD: I didn't make it clear in the post but the author used raw death counts, not excess deaths. From what I've seen so far, confirmed Covid deaths are not sufficient to explain all excess deaths but I've written here before that excess deaths require time to "mature". Also, good point about the (A) part of the setup; it just gives some context which should be stated separately.
Posted by: Kaiser | 08/07/2020 at 01:53 AM
Every conspiracy theory is an example of P(A|D)=1 because P(A)=1.
(A=assumption, D=data)
Posted by: Antonio Rinaldi | 08/08/2020 at 11:09 AM
While this is not a tautology, it is an assumption without evidence.
"the advance towards herd immunity is rapid"
Herd immunity, depending on the disease, generally requires greater than 90% of the potential infection pool to be immune/resistant. More infectious diseases (which I would expect COVID-19 to be classified as) can require population level immunity of 99% or even higher.
Even the most aggressive estimates of the infection rate for the US work out to a single-digit percentage of the US population (which is the relevant pool). As of this writing, the CDC puts the US case rate at 1.5% after about 6 months. Even if we assume that the CDC numbers undercount exposures by a factor of 10 (meaning a true potentially immune pool of 15% today), we'd still need several years at the current run rate to reach 90% exposure. That does not meet my definition of "rapid".
Posted by: Joshua Jendza | 08/10/2020 at 11:54 AM
It is a bit like the year 2000 problem that wasn't. A company I worked for started dealing with these problems in the eighties, either converting dates to 4 digits or changing the calculations so that they worked properly. As a result, payroll worked, ordering and recording worked, and so did everything else. It cost them a lot of money, but otherwise business would have stopped.
Posted by: Ken | 10/24/2020 at 05:34 AM