Over the holiday weekend, news came out that Cornell University, one of the Ivies, has decided to re-open for the Fall term 2020. This means no more online instruction, a full-scale return to in-person teaching. The decision is supported by a mathematical model built by a team of professors and graduate students. They published a very detailed report here, and noted that it is a work in progress.
The report makes a strikingly "counterintuitive" claim, that the number of infections within the Cornell community will be one-fifth lower in the "Full Re-open" scenario relative to the "No Re-open" scenario.
Outcome is sensible if you know the assumptions
At the superficial level, everyone expects infections to go rampant if a large college community like Cornell (~ 35,000 including faculty, staff and students) opens up. But if one takes account of the assumptions behind the model, carefully laid out in the Cornell report, one must conclude that the research conclusion is entirely non-controversial.
In the Full Re-open scenario, Cornell will successfully implement a full-scale test-trace-isolate program which the U.S. government failed to do at all levels. Every member of the community is assumed to be tested for the novel coronavirus every five days. Anyone who tests positive or self-reports positive will be quarantined. Their contacts will be traced and quarantined as well. On top of all that, mask wearing and social distancing policies will be implemented.
Before anyone is allowed to return to campus, they must take two diagnostic tests. The first test is administered at home, and if someone gets a positive result, they must self-isolate at home and delay their return. The second test is required upon arrival, and anyone testing positive is immediately separated from the community.
With the above assumptions, the researchers have described a community in which almost no one coming back to campus will have the virus, infections are detected early, asymptomatic carriers are identified, and infectious or suspected infectious people are isolated, so it is no surprise that after 16 weeks, only 3.5 percent of the community is expected to have been infected.
In the No Re-open scenario, a test-trace-isolate program is unavailable, community spread is not under control, and so a lot more people end up getting sick.
Some unsavory implications
If these modeling results are accepted, they make rather disturbing commentary on several related issues beyond Cornell.
First, the model provides solid evidence that a full-scale test-trace-isolate program constrains community spread and saves lives. This report then is a repudiation of U.S. anti-pandemic policies up to now.
Second, the report also rebukes the previous decisions by American universities, including Ivy League colleges, in March, to shut down their campuses, sending students home. By switching to virtual instruction, these schools have unwittingly contributed to a large growth in infections.
Third, the researchers stress again and again the importance of broad-based, repeated testing to stem infections by asymptomatic carriers, something I've been advocating for a while here. They acknowledge that such testing is not available to the U.S. public at large, which is the primary reason the No Re-open virtual instruction scenario looks so much worse. The report also concludes that masks and social distancing are necessary but not sufficient. This speaks to the continuing failure of U.S. policy.
Notes on the modeling framework
The researchers adopted a traditional framework for the analysis. They set up a simulation which divides the community into many "compartments," including not just the usual Susceptibles and Infectious, but also Exposed, Quarantined, Asymptomatics and so on. Every day, people are moved from one compartment to another, such as from Susceptible to Exposed, or from Symptomatic to Recovered. After 16 weeks, they observe the outcome: the proportion of people in each compartment. Each run of the simulation represents a possible future.
The pattern of inter-compartment movements is governed by a set of about 15 parameters. These parameters are assumptions of unmeasurable quantities, such as the average number of contacts per day per person, the probability of getting infected upon contact with an infectious person, the proportion of asymptomatics, and so on.
A subset of these parameters are specified as "stochastic", which is a reason why different simulation runs generate different outcomes. For example, the model assumes an average number of contacts between infectious and susceptible people. Without the stochastic element, given the current mix of infectious and susceptible people, the simulation runs should produce the same number of contacts. The stochastic element introduces variability around this average number so that in some runs, the number is above average, and in others, below average.
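To make the compartment mechanics and the role of the stochastic element concrete, here is a minimal sketch in Python. It is not the Cornell model: it uses only three compartments (Susceptible, Infectious, Recovered), and the population size, transmission probability and recovery probability are hypothetical values I picked for illustration. Only the 8.3 contacts per day and the 16-week horizon come from the report.

```python
import random

def simulate(days=112, population=1000, initial_infectious=5,
             contacts_per_day=8.3, p_transmit=0.02, p_recover=0.1,
             stochastic=True, seed=None):
    """Toy three-compartment simulation stepped one day at a time.

    All defaults except contacts_per_day and days (16 weeks) are
    hypothetical illustration values, not taken from the report.
    Returns the cumulative fraction infected, seed cases included.
    """
    rng = random.Random(seed)
    s = population - initial_infectious  # Susceptible
    i = initial_infectious               # Infectious
    r = 0                                # Recovered
    for _ in range(days):
        # Average number of new infections given today's mix of S and I.
        expected = i * contacts_per_day * p_transmit * s / population
        if stochastic:
            # Random variation around the average: each susceptible is
            # infected independently, each infectious person recovers
            # independently.
            p = min(1.0, expected / s) if s > 0 else 0.0
            new_infections = sum(rng.random() < p for _ in range(s))
            new_recoveries = sum(rng.random() < p_recover for _ in range(i))
        else:
            # No stochastic element: move exactly the average amounts.
            new_infections = expected
            new_recoveries = i * p_recover
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
    return (population - s) / population

# Without the stochastic element every run is identical.
print(simulate(stochastic=False), simulate(stochastic=False))
# With it, each run is a different possible future.
print([simulate(seed=k) for k in range(5)])
```

The deterministic runs print the same number twice, while the stochastic runs scatter around it, which is the variability the report's error bands describe.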
This simulation framework is old-school, in the sense that most parameters are fixed to average values. A modern Bayesian model treats all parameters as "random variables" each with an underlying probability distribution. Instead of fixing, for example, the average daily contact rate to 8.3, a Bayesian treats the average rate as a variable, with values that are spread around 8.3. Because of this additional source of variability, a Bayesian model will show wider error bands than the Cornell framework.
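The widening effect can be illustrated with a small sketch. This is not a full Bayesian model (there is no prior-to-posterior updating here); it only propagates uncertainty about the average contact rate into the outcome. The day-to-day spread (`daily_sd`) and the spread of the rate itself (`rate_sd`) are hypothetical numbers chosen for illustration; only the 8.3 average and the 112-day horizon come from the report.

```python
import random
import statistics

def run_totals(n_runs, rate_sd, mean_rate=8.3, days=112, daily_sd=2.0, seed=0):
    """Total contacts per person over the term, for n_runs simulated futures.

    rate_sd = 0 fixes the average contact rate at mean_rate in every run.
    rate_sd > 0 draws a different average rate for each run.
    daily_sd and rate_sd are hypothetical illustration values.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_runs):
        rate = rng.gauss(mean_rate, rate_sd) if rate_sd > 0 else mean_rate
        totals.append(sum(rng.gauss(rate, daily_sd) for _ in range(days)))
    return totals

fixed = run_totals(500, rate_sd=0.0)      # average rate pinned at 8.3
uncertain = run_totals(500, rate_sd=1.0)  # average rate spread around 8.3

print("spread with fixed rate:    ", round(statistics.stdev(fixed), 1))
print("spread with uncertain rate:", round(statistics.stdev(uncertain), 1))
```

Both sets of runs share the same day-to-day noise. The second set is far more spread out because each run also carries its own value of the average rate.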
The researchers first nailed down a base case scenario ("nominal"). They then did a classical best case and worst case analysis. For instance, the asymptomatic proportion is 48% in the base case, 27% in the best case, and 68% in the worst case. Not all parameters have three settings. In all three scenarios, they assume a 50 percent success rate in contact tracing.
Key drivers of the model
In this section, I speculate about the key drivers of the outcomes of the Cornell model. Your level of comfort with the model outcomes depends on whether you think these assumptions are reasonable.
Compliance with the test-trace-isolate program is a key structural assumption. The current version of the Cornell model assumes full compliance. This means every community member submits to testing every five days. This means everyone testing or reporting positive is quarantined. This means everyone is following masking and social distancing rules.
Full compliance is crucial to stemming community spread via asymptomatic carriers. One counter-intuitive insight from the model is that infections are reduced when the proportion of asymptomatic carriers is raised. Because of the test-trace-and-isolate protocol, they assume that an asymptomatic case will be detected even earlier than a self-reported case; earlier detection means less time for the virus to spread around. In the No Re-open scenario, cases are detected only through self-reporting so the virus has more time to propagate.
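A rough way to see why scheduled testing shortens the time an undetected carrier spends in circulation is to compute the average wait until the next scheduled test. The sketch below makes strong simplifying assumptions that are mine, not the report's: infections occur at a uniformly random point in the testing cycle, the test catches every case, and results are instant.

```python
import random

def mean_days_to_detection(test_interval_days, n=100_000, seed=0):
    """Average wait from infection to the next scheduled test.

    Simplifying assumptions (not from the report): infection time is
    uniform within the testing cycle, tests are perfectly sensitive,
    and results come back immediately.
    """
    rng = random.Random(seed)
    waits = [test_interval_days - rng.uniform(0, test_interval_days)
             for _ in range(n)]
    return sum(waits) / n

for interval in (5, 7):
    print(interval, "day cycle:", round(mean_days_to_detection(interval), 2))
```

Under these assumptions the average wait is half the testing interval, so a case on a five-day cycle is caught after about two and a half days. Any self-reporting delay longer than that favors the tested population.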
Another strong structural assumption is the on-campus, off-campus living situation of the undergrads. The researchers assume in the Full Re-open scenario, all returning undergrads live on campus while in the No Re-open scenario, most undergrads still move back to Ithaca, where Cornell is located, but all of them live off campus. This is no small matter because the model provides test-trace-isolate only to on-campus students. If some returning undergrads live off-campus in the Full Re-open scenario, they would not be regularly tested and may become a source of infections.
Another important assumption is low prevalence, on and off campus at the start of term. The base-case prevalence for the county outside Cornell is assumed to be 0.28%, an exceptionally low value; and for people's home locations, 2%. The researchers cite a Cornell Vet School source with a single positive test result for the former number. For many weeks, New York State Governor Cuomo has claimed that the prevalence in the state is above 10%, and that it is 3.6% in less populated parts of the state. (I am not arguing for Cuomo's number as readers of my blog know I don't trust Cuomo's data because his state health department that conducted the antibody tests has not made any information available about those tests. I'm juxtaposing these two numbers to show the range of possibilities.)
The double testing requirement regardless of symptoms prior to and upon arrival ensures that few returnees carry the virus with them. Thus, the researchers assume that only 0.09% of the returnees are infectious. They apply this low prevalence rate to people who already reside on the Cornell campus at the start of term "for expedience".
Another assumption concerns the average number of daily contacts between people in the community. This is set to 8.3. I think this is too low. It is defined as "an interaction between two people that has the potential for transmission of infection," and can include multiple contacts with the same person. The value is really a plug in the Cornell model because it's one of several factors leading to the implied value of R0, which they want to match to a CDC-endorsed value. If they increase daily contacts, they will decrease the probability of infection per contact to arrive at the same R0.
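The trade-off can be sketched with a common textbook simplification in which R0 is the product of daily contacts, the probability of infection per contact, and the number of infectious days. The Cornell report may define R0 differently. The target R0 of 2.5 and the five infectious days below are hypothetical placeholders, not values from the report.

```python
def implied_transmission_prob(target_r0, contacts_per_day, infectious_days):
    """Per-contact infection probability needed to hit a target R0.

    Assumes the simplified relation
        R0 = contacts_per_day * p_transmit * infectious_days,
    which may not be the exact formula used in the report.
    """
    return target_r0 / (contacts_per_day * infectious_days)

TARGET_R0 = 2.5        # hypothetical target value
INFECTIOUS_DAYS = 5.0  # hypothetical duration

for contacts in (8.3, 12.0, 16.6):
    p = implied_transmission_prob(TARGET_R0, contacts, INFECTIOUS_DAYS)
    print(f"contacts/day={contacts:5.1f}  implied p={p:.4f}  "
          f"R0={contacts * p * INFECTIOUS_DAYS:.2f}")
```

Doubling the contacts halves the implied probability per contact, and R0 stays pinned at the target. That is why the contact number works as a plug rather than as an independent estimate.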
As with other epidemiological models used for this pandemic, the Cornell model assumes immunity. In other words, anyone who has recovered becomes immune, taken out of circulation. The researchers are certain there will be no deaths so the only path after infection is recovery and immunity, with a few hospitalizations.
Yet another feature worth noting is the fixed proportion of people assigned to four levels of severity: asymptomatic, mild symptoms, hospitalized, ICU.
Good data to estimate infection rates and severity rates are hard to come by. In the assumptions used in this model, the undergrads fall into the 18-44 age group, which has the greatest chance (52%) of being asymptomatic (by comparison, the 65-and-above group is assumed to have a 13% chance of being asymptomatic). In the No Re-open scenario, the entire off-campus population is assumed to belong to this age group, which means they are mostly asymptomatic carriers. Because they are not tested while off campus, the model gives the virus a longer time to spread, and this pretty much explains the headline result.
Sociology
At this stage, the Cornell model is a feat of engineering. It will be improved with some sociological input.
First is the thorny issue of compliance. Everything from mandatory testing to wearing masks is an inconvenience. A college isn't an autocracy so it's hard to imagine that full compliance is possible. What will the school do when people don't show up for testing? The sensitivity analysis shows 41% more infections when testing is reduced to every seven days.
Second is complacency. In the world predicted by the Cornell model, there are barely any infections. Most tests will come back negative. People will start feeling safe and get complacent. Compliance rate will likely deteriorate over time.
Third is gaming. Recall the double testing for returnees to ensure low prevalence at the start of term. At some point, Cornell will announce a date on which this rule takes effect. At this time, a fraction of people will arrange to arrive just before that date.
I'm not just speculating here. This human behavior is well documented in various countries that have announced a date after which all visitors must be quarantined on arrival. Some visitors then scheduled their flights to arrive just before that date. Journalists were interviewing "disappointed" passengers whose flights were unexpectedly delayed, leading to the "unfortunate" situation in which they had to be quarantined.
Cornell is surely going to have to extend the test-trace-isolate program to students in off-campus housing in Ithaca. Otherwise, the well-intentioned re-opening plan will fail. What is the university going to do when these returning students choose to live off-campus, rather than on-campus as the model assumes? Every such student brings the Full Re-open scenario closer to the No Re-open scenario.
Gaming the rules lets the virus creep onto campus. The current model does not account for this possibility.
Fourth is unhappiness. All these rules and restrictions adversely affect campus life. It also appears that the contact tracing by the county health department may be lip service. They estimate that contact tracing takes only one day. Also, they do not test contacts discovered during the tracing, and thus Cornell will likely quarantine all contacts whether they test positive or not. The model estimates that 700-1,000 people will be quarantined at peak. While this is great for public health, I'm not sure how the community will receive it. Further, given 100% in-class instruction, what's the plan for quarantined students to keep up with school?
These are just a few predictable issues that should be included in the model.
The "No Re-open" alternative
The one definitely unfortunate aspect of this study is that the labels of the two scenarios fail to describe what is being modeled. As the study authors repeatedly explain, the biggest difference between the two scenarios is the presence of a strict test-trace-isolate program. Full re-opening without test-trace-isolate will not produce the predicted outcome.
How is the No Re-open (i.e. 100% virtual instruction) scenario defined? The fraction of the community already on campus (all faculty and staff plus half the graduate students) is assumed to continue to stay there. Of the remaining students, mostly undergrads, they assume about half of them will return to Ithaca and attend on-line classes from off-campus housing while the other half will stay at home or not return to school.
The two scenarios are closer than one might expect. In the base case, the Full Re-open scenario assumes 10,000 undergrads return to campus while the No Re-open scenario assumes 9,000 undergrads move back to Ithaca (and live off campus). What sets them apart is that students who live off campus will not participate in the test-trace-isolate program.
This setup is curious to someone admittedly not familiar with the campus housing situation at Cornell. The Full Re-open scenario suggests that they have on-campus housing for at least 10,000 undergrads. The No Re-open scenario suggests that there are at least 9,000 undergrads who can find off-campus housing. With a total of 15,000 undergrads, there must be quite a few vacancies during normal times at both on- and off-campus housing.
The model further assumes that those living off-campus in Ithaca will have zero interaction with the on-campus community. It also assumes that the average number of contacts will be the same whether the student lives on campus or off campus. I should note that those two assumptions contradict each other. The former implies fewer total contacts in the No Re-open scenario (with off-campus undergrads) while the latter implies an equal number of total contacts.
Pitfalls
The parameters in the model are by and large determined independently but some of these variables are clearly correlated while others are likely affected by the same lurking variables. This creates difficulty of interpretation and potential for contradiction, especially when looking at sensitivities.
I'll give two examples. In the base case Full Re-open scenario, 10,000 undergrads are expected to return to campus, and the prevalence of Covid-19 in their hometowns is assumed to be 2%. In the worst case scenario, 12,000 will return, and the prevalence is doubled to 4%. The decision to return to campus is likely correlated with the student's perception of risk, which depends partly on hometown prevalence. So those two parameters probably aren't independent.
In Figure 12, they remark that the model outcome is insensitive to the time delay of contact tracing. The model assumes one day's delay in all scenarios while the sensitivity analysis explores up to seven days. Sensitivity analysis is done one factor at a time while holding all other parameters constant. One such parameter is the fraction of contacts identified and traced, which is fixed at 50 percent. Is it plausible that the extra six days of tracing do not increase this proportion?
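The concern about one-factor-at-a-time analysis can be shown with a toy example. The response function below is entirely hypothetical and has nothing to do with the Cornell equations. It only shows how the same sweep over tracing delay gives a different picture once the traced fraction is allowed to move with the delay. The `gain_per_day` coupling is an assumption made up for illustration.

```python
def toy_infections(delay_days, traced_fraction, baseline=1000.0):
    """Hypothetical response surface, NOT the Cornell model.

    Tracing is more effective when more contacts are traced and less
    effective when tracing takes longer.
    """
    effectiveness = traced_fraction / (1.0 + 0.1 * delay_days)
    return baseline * (1.0 - 0.5 * effectiveness)

def one_at_a_time(delays, traced_fraction=0.5):
    """Vary the delay while holding the traced fraction at 50 percent."""
    return {d: toy_infections(d, traced_fraction) for d in delays}

def coupled(delays, base_fraction=0.5, gain_per_day=0.05):
    """Assumption: each extra day of tracing finds 5 points more contacts."""
    return {
        d: toy_infections(d, min(1.0, base_fraction + gain_per_day * (d - 1)))
        for d in delays
    }

delays = [1, 3, 5, 7]
print("one at a time:", {d: round(v) for d, v in one_at_a_time(delays).items()})
print("coupled:      ", {d: round(v) for d, v in coupled(delays).items()})
```

The two sweeps agree at a one-day delay and then diverge. A sensitivity chart built from the first sweep alone says nothing about what happens when correlated parameters move together.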
Speaking of Figure 12, I already put up a blog post about the problematic graphical representation of the sensitivity analysis results. See here.
***
The Cornell model is complicated, and by their account, assembled in haste. While it has become fashionable to damn Covid-19 modeling efforts, one should realize the value of the thinking behind models. Building them forces us to think hard and deeply about cause and effect, direction and magnitude, relative importance of different factors, data quality, and human behavior. The output is infinitely better than mouthing off.
The decision to re-open campus is more complex than this type of model can capture. There are many other issues involving the school's operational model, debt load, customer satisfaction, capability to handle an extra class of students, status of international and out-of-state students, parents' impatience with having kids at home, etc. etc.
If the decision is to be based on just this model, I'd not expect the outcome under Full Re-Open to materialize. I think the students will not comply with such stringent rules, and when they game the system, they adulterate the conditions that produce the "counter-intuitive" result that captured the media's attention.