I bet you can't believe I'm going to say I don't need/want data on a blog about intelligent thinking about data.
I'm going to argue that intelligent thinking about data includes recognizing when you don't need data - and by extension, when you don't need more data.
This post is motivated by opinions that are circulating about how certain public health measures against infectious diseases such as masks and lockdowns are 100% useless - although if you start thinking about other domains, you can find pertinent examples.
Let's start with lockdowns first as that's more black and white. I don't need any data to believe that lockdowns reduce infections. I'd need data if I want to establish the magnitude of protection but for the direction, I don't need data. Why? Because I have theory.
What theory? Infectious diseases spread through infections, which require contact. Turning this around, no infection can occur unless there is contact. During the early part of the pandemic, and especially before vaccines were available, in key hotspots, schools and businesses were closed; people were asked to stay home; employees were allowed to work from home. As a result, the frequency of contacts each of us had with others was drastically reduced, and for some, reduced to near zero. By the theory, the dramatic reduction of contacts resulted in a reduction of infections.
Do I need data to prove that statement? No. [The word "theory" is weird. By theory, I mean an immutable phenomenon, such as incorporating the law of gravity in a physical model. There's probably a better word for this. Structural model?]
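To make the direction-versus-magnitude distinction concrete, here is a minimal sketch. It assumes, purely for illustration, that each contact transmits with the same probability `p`, independently of the others; the function name `infection_risk` and that independence assumption are mine, not something taken from data.

```python
def infection_risk(contacts: int, p: float) -> float:
    """Chance of at least one transmission over `contacts` contacts.

    Illustrative assumption: every contact transmits with the same
    probability p, independently of the others.
    """
    if contacts < 0 or not 0.0 <= p <= 1.0:
        raise ValueError("contacts must be >= 0 and p must lie in [0, 1]")
    return 1.0 - (1.0 - p) ** contacts


# Whatever p is, fewer contacts never means more risk,
# and zero contacts means zero risk.
for p in (0.001, 0.05, 0.5):
    risks = [infection_risk(c, p) for c in (0, 1, 5, 20)]
    assert risks[0] == 0.0
    assert risks == sorted(risks)
```

Data would be needed to pin down `p`, which is the magnitude question. The direction holds for every admissible `p`, which is why it needs no data.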
***
Would I like to have data? Sure. But it's not a must-have. If I had data, I would hope to answer more questions, like: how much of a reduction? Does it affect all demographic segments equally?
Would I not want to have data? There is even an argument for this (although this sentiment won't be universally shared). I suspect strongly that any data that could be made available to me would be almost impossible to interpret - because we cannot run randomized experiments on lockdowns. Thus, it would take a lot of effort to clean the data, and to adjust the data, and none of this can be accomplished without making lots of subjective assumptions, sure to enrage some and delight others.
For example, all such data confound the effects of lockdowns and vaccinations because most places that had lockdowns also pushed vaccines simultaneously. What's more, masking, social distancing, and other measures were also put to work at the same time. So, if someone had the data, the data would be more likely to confuse than to illuminate. Let's not forget about enforcement of lockdowns, and compliance with lockdowns. Contacts might not have been reduced if the lockdown was in name only!
So if I had data, it could show infections going up, down or sideways, and the effect of lockdowns would be masked by all the other different factors. More data could be useless.
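A toy illustration of how this can happen, under assumptions I am making up for the purpose: the regions, their baseline contact rates, the compliance levels and the constant `P_TRANSMIT` are all hypothetical. Lockdowns here can only reduce contacts, yet a naive comparison of locked-down and other regions points the wrong way.

```python
from statistics import mean

P_TRANSMIT = 0.02  # hypothetical expected infections per contact


def infections(baseline_contacts: float, lockdown: bool, compliance: float) -> float:
    """Expected infections per person; contacts shrink only as far as people comply."""
    if lockdown:
        contacts = baseline_contacts * (1.0 - compliance)
    else:
        contacts = baseline_contacts
    return P_TRANSMIT * contacts


# Hypothetical regions: (baseline contacts per day, locked down?, compliance in [0, 1]).
regions = [
    (30.0, True, 0.5),   # dense hotspot, partial compliance
    (24.0, True, 0.0),   # lockdown in name only
    (10.0, False, 0.0),
    (8.0, False, 0.0),
]

locked = mean([infections(b, ld, c) for b, ld, c in regions if ld])
unlocked = mean([infections(b, ld, c) for b, ld, c in regions if not ld])
naive_difference = locked - unlocked  # what a raw comparison would report

# Structural effect: each locked-down region compared with itself, had it not locked down.
structural_effects = [
    infections(b, True, c) - infections(b, False, c) for b, ld, c in regions if ld
]

print(f"naive difference: {naive_difference:+.2f}")
print("structural effects:", [f"{e:+.2f}" for e in structural_effects])
```

In this made-up setup the locked-down regions show more infections than the others, simply because they started with more contacts and complied unevenly, while the structural effect in every region is zero or negative.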
***
More data, however, could not overturn my theory... otherwise, my theory would be mere conjecture. If the data happened to show that, after adjusting for all kinds of other factors, lockdowns have zero or negative impact on infections, I'd be much more likely to reject some of the subjective assumptions or the indirect data proxies than to question the theory. Questioning the theory, in this case, would mean claiming I have discovered evidence that infections occur without contacts.
Even if surveillance data show that compliance with lockdowns was nonexistent, the proper way to interpret the data is to say that lockdowns by themselves should reduce infections but their effect is masked by the lack of compliance, so that data confounding both factors show that infections did not drop. Embedded in there is still the theory that reducing contacts reduces infections.
The theory - the structural model - is an immutable part of the larger statistical model. We expect data to conform to this structure. We don't use data to modify the theory. (This structural model is not the same thing as a prior in Bayesian models. Bayesian priors are subject to updating on observing data, so they do not represent anything immutable.)
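One way to picture a structure that data cannot modify is a fit in which the theory is built in as a constraint. The sketch below is only an illustration under my own assumptions: the relation `after ≈ (1 - r) * before`, the function name `fit_reduction`, and the grid search are hypothetical choices. The data can inform the size of the reduction `r`, but `r` is confined to [0, 1], so the fit can never report that reduced contacts raised infections.

```python
def fit_reduction(before: list[float], after: list[float], steps: int = 1000) -> float:
    """Least-squares estimate of r in after ≈ (1 - r) * before, with r kept in [0, 1]."""

    def loss(r: float) -> float:
        return sum((a - (1.0 - r) * b) ** 2 for b, a in zip(before, after))

    # Search only over values the structure allows; the sign is not up for estimation.
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=loss)


# Clean data: the magnitude is recovered.
print(fit_reduction([100.0, 200.0, 300.0], [60.0, 120.0, 180.0]))

# Confounded data in which infections rose: the estimate stops at the boundary r = 0,
# a signal to revisit the adjustments and proxies rather than the structure.
print(fit_reduction([100.0, 200.0], [110.0, 210.0]))
```

An estimate sitting on the boundary is the model's way of saying the data, as adjusted, are not informative about the magnitude.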
***
There are other examples in other domains as well. Here is Andrew's model of golf putting (link). The section called "modeling from first principles" is an example of a statistical model that embeds a (geometric) theory. He also later uses the term "geometry-based" model.
The simple geometry is not enough, and further iterations of the model add other factors, but the original geometry is still in there.
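For readers who have not seen it, here is a minimal sketch of a geometry-based putting model, written from my recollection of that case study rather than copied from it. The geometry fixes the form: a putt from distance x goes in if the aiming angle is off by less than arcsin((R - r) / x), where R and r are the hole and ball radii. The only thing left for data to estimate is the spread of the angle error. The function names and the value `sigma_rad = 0.03` are hypothetical.

```python
import math

HOLE_RADIUS_FT = (4.25 / 2) / 12  # regulation hole diameter is 4.25 inches
BALL_RADIUS_FT = (1.68 / 2) / 12  # regulation ball diameter is 1.68 inches


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def putt_success_probability(distance_ft: float, sigma_rad: float) -> float:
    """P(putt goes in) when the only error is a normally distributed aiming angle."""
    clearance = HOLE_RADIUS_FT - BALL_RADIUS_FT
    if distance_ft <= clearance:
        return 1.0  # the ball already overlaps the hole
    threshold = math.asin(clearance / distance_ft)  # largest tolerable angle error
    return 2.0 * normal_cdf(threshold / sigma_rad) - 1.0


for feet in (2, 5, 10, 20):
    print(feet, round(putt_success_probability(feet, sigma_rad=0.03), 3))
```

Whatever value the data give for `sigma_rad`, the success probability falls with distance; that part comes from the geometry, not from the data.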
***
This post records my first thoughts on this topic. I'll get to masks next.
I need more than a theory if you are going to demand I wear a mask.
I need more than a theory if you are going to demand school children <15 wear masks at school when we have data that the school children themselves are at negligible risk.
And I need a lot lot more than a theory if you are going to lock down societies. The costs in education are well defined (starting with a lot of data from a teachers strike in Brazil).
Quite simply I need to know that the risk can be quantified because the cost/annoyance/reduction in basic education can be.
The costs of masks and especially lockdowns are both simple theories backed up with data.
Posted by: Michael Droy | 02/17/2023 at 03:31 PM
MD: I know you want to talk about masks. I will talk about that next - it's not as simple as lockdowns.
Posted by: Kaiser | 02/17/2023 at 09:28 PM
Michael, I think you are right that you should ask for more than a theory to comply with a policy (not in general, but in this case), but that doesn't contradict what Kaiser is saying. His point is that you don't need data to show that you won't catch an infection if you don't have contact. Compliance, cost of intervention, and so on is an entirely different matter. Here is a less controversial example: I will guarantee you, without any data, that if you adopt the following diet, you will lose weight: drink three glasses of water per day and have one banana per day for one month. If you run a trial for this intervention, you'd better analyze it using the Intention To Treat (ITT) population, i.e., good luck with compliance.
Posted by: Eric Novik | 02/21/2023 at 06:21 PM
And this shows 1) the problem with using the word theory when something is well established (like germ "theory") and 2) that many people (like commenters here) don't understand what is meant by "theory"
Posted by: David Norman | 02/22/2023 at 12:39 AM
The point here is the ability to generalize claims. Generalization is done by statistical inference, from sample to population, by mechanistic models and, yes, "intuition". In the latter case no data is needed. See https://www.youtube.com/watch?v=ADs7fWIvuVk
Posted by: Ron Kenett | 02/26/2023 at 02:42 AM
Another terrific post.
Just a few side notes.
We have a theory stating that lockdown and masking and vaccination work together to reduce spread: the Swiss cheese model: https://en.wikipedia.org/wiki/Swiss_cheese_model; so it is our own theory that makes a measurement of each factor difficult or impossible.
Unless something like an RCT emerges... like in the UK due to a /random/ accident: https://warwick.ac.uk/fac/soc/economics/research/centres/cage/publications/workingpapers/2020/does_contact_tracing_work_quasi_experimental_evidence_from_an_excel_error_in_england/
Theory - or structural model - in Bayes formula: isn't it represented by the likelihood function form?
Posted by: Antonio Rinaldi | 02/26/2023 at 11:08 AM