In this weekend's **pandemonium**, it emerged that the White House Rose Garden ceremony was probably a super-spreader event. Some commentators appear unpleasantly surprised that an outdoor event could have been responsible. This suggests the following mental model of the risk of infection:

(1) risk(outdoor) = 0

The above statement does not stand alone. There is a second, implicit statement:

(2) risk(indoor) > 0

When reporters asked these people why they do not wear masks or keep a distance of six feet or more, they explained that such measures are not necessary when congregating outdoors. Thus, we should restate (1) as:

(1b) risk(outdoor + no mask + no distancing) = 0

Many elaborate that when indoors, they put on masks and keep their distance. So we also modify statement (2):

(2b) risk(indoor + mask + (distancing > 6 ft)) = Higher

Since they elect to wear masks and keep distances when indoors, they must also believe in the following:

(3) risk(indoor + no mask + no distancing) = Highest

In other words, they believe that not covering their faces and not distancing raise the risk of infection indoors.

***

We diagram these statements. For simplicity of exposition, I combine the effects of masks and distancing into one factor.

The **base case** used in this post is an outdoor event without masking or distancing ("Out-noMD"). Some people perceive the base-case risk to be zero. If the event is moved indoors, they will put on masks and keep their distance, so they recognize that the risk of infection in that setting is higher ("In-MD"). If they take off their masks or stop distancing while indoors, they believe the risk to be the highest ("In-noMD").

Let IO be the increase in risk between indoor and outdoor events, and MD be the decrease in risk due to masking and distancing.

This simple mental model can be represented using the following equation:

Risk = baseline + IO - MD

The baseline risk is zero, or more accurately negligible, corresponding to an outdoor event with no masks and no distancing. Both IO and MD are positive numbers.

The first diagram is just an extraction of three of the four scenarios (black bars). This is known as an **additive model**, in which each factor adds to or subtracts from the cumulative risk, independent of the levels of the other factors.

In addition, the magnitude of IO is believed to be notably larger than that of MD. This reflects the widespread *assumption* that the indoor/outdoor factor is dominant.
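To make the additive model concrete, here is a minimal Python sketch. The values of IO and MD are invented for illustration (arbitrary risk units); the only constraints taken from the text are that the baseline is zero, both effects are positive, and IO is larger than MD.

```python
# Hypothetical sketch of the additive mental model: Risk = baseline + IO - MD.
# The numbers are invented (arbitrary risk units). Constraints from the text:
# baseline = 0, IO > 0, MD > 0, and IO > MD.
BASELINE = 0  # outdoor, no masks, no distancing ("Out-noMD")
IO = 5        # assumed increase in risk from moving indoors
MD = 2        # assumed decrease in risk from masking and distancing


def additive_risk(indoor: bool, masked: bool) -> int:
    """Each factor shifts risk by a fixed amount, regardless of the other factor."""
    risk = BASELINE
    if indoor:
        risk += IO
    if masked:
        risk -= MD
    return risk


# The three scenarios people actually talk about (the black bars).
scenarios = {
    "Out-noMD": additive_risk(indoor=False, masked=False),
    "In-MD": additive_risk(indoor=True, masked=True),
    "In-noMD": additive_risk(indoor=True, masked=False),
}
print(scenarios)  # {'Out-noMD': 0, 'In-MD': 3, 'In-noMD': 5}
```

Any positive values with IO > MD reproduce the same ordering of zero, higher, and highest.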

***

Our simple mental model contains a couple of flaws.

Consider the risk at an outdoor event at which participants wear masks and keep their distance. This means:

baseline = risk(outdoor + no mask + no distancing)

risk(outdoor + mask + (distancing > 6 ft)) = baseline - MD

Since the baseline risk is perceived to be zero, our model assigns this scenario a negative risk, which is impossible.

One way out is to assume MD = 0, that is, that masks and distancing confer no benefit. This assumption, however, breaks the other part of the model, because people do say that at an indoor event, putting on a mask and keeping a distance reduce infection risk.
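Both flaws can be checked with the same kind of sketch. Again the numbers (baseline 0, IO 5, MD 2) are made up, and the helper `additive_risk` is a hypothetical name used only for illustration.

```python
# Hypothetical sketch: the additive model's two flaws.
# Invented values (arbitrary units): baseline = 0, IO = 5, MD = 2.
def additive_risk(indoor: bool, masked: bool,
                  io: int = 5, md: int = 2, baseline: int = 0) -> int:
    """Risk = baseline + IO (if indoors) - MD (if masked and distanced)."""
    return baseline + (io if indoor else 0) - (md if masked else 0)


# Flaw 1: with MD > 0, an outdoor event with masks and distancing
# falls below the zero baseline.
out_md = additive_risk(indoor=False, masked=True)
print(out_md)  # -2, an impossible risk

# Flaw 2: forcing MD = 0 removes the negative value, but then masks and
# distancing make no difference indoors either.
in_md = additive_risk(indoor=True, masked=True, md=0)
in_nomd = additive_risk(indoor=True, masked=False, md=0)
print(in_md == in_nomd)  # True
```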

***

Our mental model is too simple, and cannot express the following concept:

effect of masking and distancing = 0 if outdoors

... < 0 if indoors

Statisticians call this an **interaction effect**. The reduction in risk due to masking (or distancing) is not a constant but varies depending on whether the event is held indoors or outdoors. In our simple additive model, MD has the same effect in both settings. We've got to fix that.

In the diagram of our mental model, we cannot shift the two black bars down by the same amount. Instead of two brown arrows of equal length, we now allow the top brown arrow, which stands for the effect of masks and distancing at an indoor event, to be longer than the bottom arrow. Like this:

When we assume that MD varies depending on indoor/outdoor, it follows that IO now depends on masks and distancing. The green arrows are also different in length.

The equation for this model with interactions is:

Risk = baseline + IO - MD + IO*MD
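One conventional way to write a model with an interaction uses 0/1 indicator variables and a separate coefficient for the product term. The sketch below assumes that reading of the shorthand equation above; the coefficient names and values are hypothetical. Setting the outdoor effect of masks and distancing to zero and giving the interaction a negative coefficient reproduces the mental model: masking matters only indoors, and the indoor penalty is smaller for a masked, distanced crowd.

```python
# Hypothetical sketch of a model with an interaction term, using 0/1 indicators:
#   risk = B0 + B_IO*indoor + B_MD*masked + B_INT*indoor*masked
# All coefficient values are invented (arbitrary risk units).
B0 = 0      # baseline: outdoor, no masks, no distancing
B_IO = 5    # indoor effect when nobody masks or distances
B_MD = 0    # effect of masks/distancing outdoors (the mental model says zero)
B_INT = -3  # extra change in risk from masking/distancing when indoors


def risk(indoor: int, masked: int) -> int:
    """Risk for one of the four scenarios; indoor and masked are 0 or 1."""
    return B0 + B_IO * indoor + B_MD * masked + B_INT * indoor * masked


# The effect of masks and distancing now depends on the setting (brown arrows)...
md_effect_outdoors = risk(0, 1) - risk(0, 0)
md_effect_indoors = risk(1, 1) - risk(1, 0)
# ...and the indoor effect depends on masking and distancing (green arrows).
io_effect_no_mask = risk(1, 0) - risk(0, 0)
io_effect_mask = risk(1, 1) - risk(0, 1)
print(md_effect_outdoors, md_effect_indoors, io_effect_no_mask, io_effect_mask)
# prints: 0 -3 5 2
```

None of the four scenarios now has a negative risk. In this parameterization the interaction coefficient is negative because it carries the entire indoor benefit of masks and distancing; the sign of an interaction term depends on how the other terms are coded.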

***

What are the takeaways for a data scientist?

- Articulate your intuition in a mental model, which can be expressed as a diagram or an equation, whichever you prefer.
- Make sure all hidden assumptions in your mental model are revealed.
- Make sure your model is complex enough to express the complexity of your thinking.
- Now open your software.
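As a first step in that software, here is a minimal sketch. It takes four hypothetical scenario risks (invented, arbitrary units, consistent with the mental model) and asks how well the additive model fits them. With one value per cell of a balanced 2x2 layout, the least-squares additive fit is row mean + column mean - grand mean, and the misfit it leaves is governed by the interaction contrast.

```python
# Hypothetical sketch: how much does the additive model miss?
# Invented risks (arbitrary units) for a 2x2 layout:
# key = (indoor, masked), each 0 or 1.
cells = {
    (0, 0): 0,  # Out-noMD
    (0, 1): 0,  # Out-MD
    (1, 0): 5,  # In-noMD
    (1, 1): 2,  # In-MD
}

grand = sum(cells.values()) / 4
row_mean = {i: (cells[(i, 0)] + cells[(i, 1)]) / 2 for i in (0, 1)}
col_mean = {j: (cells[(0, j)] + cells[(1, j)]) / 2 for j in (0, 1)}

# Least-squares main-effects (additive) fit for a balanced 2x2, one value per cell.
additive_fit = {k: row_mean[k[0]] + col_mean[k[1]] - grand for k in cells}
residuals = {k: cells[k] - additive_fit[k] for k in cells}

# Interaction contrast: zero exactly when the additive model fits perfectly.
interaction = cells[(1, 1)] - cells[(1, 0)] - cells[(0, 1)] + cells[(0, 0)]

print(additive_fit)
print(residuals)
print(interaction)  # -3
```

With these made-up numbers, every residual has magnitude 0.75, a quarter of the interaction contrast, and the additive fit even assigns the outdoor, masked scenario a risk of -0.75, the same impossibility noted earlier. A model that includes the interaction term fits all four cells exactly.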

P.S. The signs <, =, > should all be interpreted statistically. For example, the equal sign does not mean strict equality; it means no statistically significant difference.
