It's August, and many universities in the U.S. are attempting to bring some or all of their students back for in-person instruction. Within weeks, sometimes even before the first class, several campuses have already reported dozens or even hundreds of confirmed cases of infection by the contagious novel coronavirus. School administrators are justifiably worried that the virus, once it reaches older or more vulnerable community members, could cause grave illness and even deaths.
It's heartening to see that campus groups are demanding more data transparency from their schools (e.g. this open letter from public health professors at the University of Georgia). In response, some universities have rolled out dashboards, which is just a fancy name for data reports, typically sprinkled with statistical graphics.
Nevertheless, the standard practice of minimal reporting of campus-wide totals represents much motion but little substance. Serious analysts will learn next to nothing from tallies that mix entry testing with ongoing surveillance testing, multiple tests of the same individual with tests of different individuals, and so on. In addition, the focus of data reporting must expand from reporting outcomes (how many cases are there?) to reporting operations (are people complying with testing? are people wearing masks?) in order to reveal opportunities for mid-course correction.
In this post, I outline a comprehensive plan for data reporting that would arm university communities with the data they need to assess the level of risk on their campuses.
I call this the aspirational plan because few school administrators will commit to this degree of transparency. Data reporting is political - whether it happens at schools, businesses, governments or organizations. It's quite possible that some officials secretly monitor some of the listed items. That'd be better than not tracking them at all. It falls on campus activists to push their administrators to collect the right data, and release more information.
As you read through this plan, I hope you develop some respect for the profession of data reporting (in business, this usually belongs to the business intelligence team). This is not some boring job but a creative and thoughtful endeavor of utmost importance.
***
The ultimate goal of any data reporting plan is to keep the campus safe, and to keep it open once it has re-opened.
In putting together the aspirational data reporting plan, I start by listing the keys to sustained re-opening, based on my interpretation of statistical models built by Cornell and Georgia Tech researchers. Click these previous posts (here and here) to learn more about these models.
(A) Stem the import of virus onto campus.
This can be broken down into two objectives:
(A1) Ensure there is no virus on campus at the start of the semester. This is typically accomplished through comprehensive testing of all returning people, and isolation of the positive cases. [Notice that many Covid19 measures are as aspirational as my data reporting plan. We hope there is no imported virus but everyone assumes leakage is inevitable.]
(A2) Ensure there is no import of virus onto campus during the semester. Various measures are implemented to identify visitors and require self-reporting of health statuses.
(B) Snip transmission chains aggressively:
If the virus seeps onto campus (a certainty), we need to
(B1) find the infected cases quickly (bearing in mind silent spreaders), and
(B2) prevent the virus from spreading widely.
If a campus achieves both of those objectives, the mathematical models predict that infections can be contained during the semester, and students won't be sent home.
***
A data reporting plan must address the above two success factors separately. Over-aggregation of data is an adversary throughout the report design process.
How to monitor the import of virus at the start of the semester
This is one of the simpler tasks. Most dashboards include the number of tests, and the number of positive results. These metrics are not sufficient.
The tallies must not mix together entry testing and ongoing surveillance testing. Entry testing should have a section of its own. In order to interpret the number of tests, we must report the number of returners. It's even better to also count the number of unique individuals tested. Some schools require those already on campus to get tested. If so, those data should be reported in a separate section, not mixed with the data for returners.
[If your school isn't doing any entry testing, or only a watered-down version of it, you're dealing with a different problem. Allowing the virus to seed itself around campus is the surest way of enabling outbreaks; this is the key epidemiological lesson of the past few months.]
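To make this concrete, here is a minimal sketch, in Python with pandas, of what a deduplicated entry-testing tally could look like. Everything here is hypothetical: the column names (person_id, test_type, result) and the returner count are placeholders, not any school's actual schema.

```python
import pandas as pd

# Hypothetical test log: one row per test administered.
tests = pd.DataFrame({
    "person_id": ["s01", "s01", "s02", "s03", "s03", "f01"],
    "test_type": ["entry", "surveillance", "entry", "entry",
                  "surveillance", "entry"],
    "result": ["negative", "negative", "positive", "negative",
               "negative", "negative"],
})

returners = 4  # hypothetical number of people returning to campus

# Entry testing gets its own section, never pooled with surveillance.
entry = tests[tests["test_type"] == "entry"]

report = {
    "entry_tests_administered": len(entry),
    "entry_unique_individuals": entry["person_id"].nunique(),
    "entry_positives": int((entry["result"] == "positive").sum()),
    # Denominators matter: unique individuals over returners, not tests.
    "share_of_returners_tested": entry["person_id"].nunique() / returners,
}
print(report)
```

The point of the sketch is the denominators: tests administered, unique individuals tested, and returners are three different numbers, and a reader needs all three to interpret the positives.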
How to track the import of virus throughout the semester
The next task is to track the import of virus during the semester. We've heard many college presidents and deans of student life issue warnings blaming the mixing of on-campus and off-campus groups for community spread, and yet we've not seen any university with a data reporting plan capable of tracking this crucial factor. In fact, the Cornell model (see here) assumes, gasp, no mixing of on- and off-campus people, so in a sense this is a blind spot for administrators.
To simplify the language, I'll use the word "visitors" for anyone showing up on campus who does not reside on campus. The largest groups of visitors are off-campus students and staff, and guests (such as academic collaborators and vendors). At a minimum, a data reporting plan must include an estimated number of visitors circulating on campus. Roughly speaking, this is a function of entries to and exits from campus, which can be monitored. (Please talk to the statisticians on your campus.)
In other words, if the data report contains the number of entries to campus (or campus buildings), the number should be broken down into visitors versus non-visitors. Lest you worry, we don't care about the names of people entering and exiting; for example, if one person enters 10 buildings, we count 10 entrances because we want a measure of contacts, not individuals.
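Here is a minimal sketch of that visitor breakdown, again in Python with pandas and with made-up column names (date, direction, is_visitor). Per the reasoning above, it counts entrances, not people.

```python
import pandas as pd

# Hypothetical door-swipe log: one row per building entry or exit.
# "is_visitor" flags anyone who does not reside on campus.
events = pd.DataFrame({
    "date": ["2020-08-24"] * 5 + ["2020-08-25"] * 3,
    "direction": ["in", "in", "out", "in", "out", "in", "in", "out"],
    "is_visitor": [True, False, True, True, False, True, False, False],
})

# Count entrances, not unique people: a person entering 10 buildings
# contributes 10 potential contact events.
entries = (events[events["direction"] == "in"]
           .groupby(["date", "is_visitor"])
           .size()
           .unstack(fill_value=0)
           .rename(columns={True: "visitors", False: "non_visitors"}))
print(entries)
```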
"We don't have the data" is the excuse frequently used by those who prefer not to see the data. Having a reasonable estimate of the number of visitors circulating on campus is so critical to monitoring on- and off-campus mixing and any course correction that administrators should sit down with statisticians to come up with a plan. (Or, hire me or other consultants. We're not that scary!)
How to find virus that has already entered your campus
The next section of the report deals with finding any virus that has entered the campus. This is another tally of test results. The biggest concern is compliance, an issue I raised when discussing the Cornell and Georgia Tech models (see prior posts here and here).
Some schools, including Cornell and Georgia Tech, have adopted a comprehensive (aka surveillance) testing plan. Most schools take a "see no evil" approach, asking people with symptoms to volunteer for testing (an approach that has already failed at the national level). Some schools use a limited random testing program. For comprehensive or random testing, the school should report how many invitations were sent out, and how many people complied. For self-driven testing, the proportion of unique community members who have been tested in the last X days may be of interest.
It's impossible to interpret aggregate test data. Any serious report about testing must break down the tallies by the type of testing (comprehensive, randomized, self-reported). Other useful breakdowns are type of affiliation (undergrads, grads, faculty, staff, etc.), age group, gender, and subdivision (e.g. schools). Such analyses allow administrators to see early warnings and take action, such as when they discover that certain communities are flouting testing requirements.
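A sketch of what the compliance breakdown could look like, with invented numbers; the affiliation groups and counts are purely illustrative.

```python
import pandas as pd

# Hypothetical invitation log for comprehensive or random testing.
invites = pd.DataFrame({
    "group":    ["undergrad", "grad", "faculty", "staff"],
    "invited":  [4000, 1200, 600, 900],
    "complied": [3400, 1100, 540, 620],
})

# Compliance rate by group: low compliance anywhere means the
# aggregate positivity figures cannot be taken at face value.
invites["compliance_rate"] = (invites["complied"] / invites["invited"]).round(2)
print(invites)
```

In this made-up example, staff compliance (about 69 percent) lags every other group, which is exactly the kind of early warning that aggregate tallies would bury.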
How to measure community spread
The last section of the report is also the most interesting: we want to measure community spread. The standard practice of counting the number of confirmed cases is useless; it's like standing by hopelessly, watching a runaway train. We should focus on the process of spreading, not the outcome of spreading. The goal is to estimate the number of contacts.
The easiest starting metric is class attendance. The entries and exits from buildings, already mentioned, should also be part of this section.
Tried-and-true data collection tools are very useful here. Literally count the number of people walking past selected hotspots on campus. Count the number of masks. I'm sure you can find a few statistics or data science students willing to help out!
In addition, new-fangled AI tools can help. There are dozens of businesses selling (out your) location data collected from cellphones and apps. Surveillance camera footage can also reveal traffic intensity of pedestrians, cyclists, and vehicles. If this is too esoteric, head over to the data science, computer vision, or engineering departments and I'm sure they'll be more than happy to assist.
The data report should not print all these details. It should track a few metrics, such as class attendance, head counts, mask counts, and vehicle counts. These tallies can be samples (say, at hotspots), since we are only interested in trends.
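A sketch of how that trending could work, using hypothetical daily head counts and mask counts at sampled hotspots. The rolling average is one common smoothing choice, not a prescription.

```python
import pandas as pd

# Hypothetical daily samples at selected hotspots.
obs = pd.DataFrame({
    "date": pd.date_range("2020-08-24", periods=7, freq="D"),
    "heads": [310, 295, 340, 360, 330, 180, 150],
    "masks": [260, 250, 270, 275, 250, 140, 120],
})

obs["mask_rate"] = obs["masks"] / obs["heads"]
# A short rolling mean smooths day-to-day noise so that a genuine
# drift in mask-wearing stands out from sampling variation.
obs["mask_rate_smoothed"] = obs["mask_rate"].rolling(3, min_periods=1).mean()
print(obs[["date", "heads", "mask_rate", "mask_rate_smoothed"]])
```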
How to monitor operating capacity
For schools that have credible contact tracing and quarantine procedures, there should also be a separate report on the relevant metrics. The number of people under quarantine is important, but so is the average or median length of time spent in quarantine. The number and types of contacts traced shed light on the transmission chains.
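To illustrate, a sketch of those quarantine metrics, with a hypothetical log in which the end date is blank (NaT) for people still in quarantine.

```python
import pandas as pd

# Hypothetical quarantine log: one row per quarantined person.
q = pd.DataFrame({
    "person_id": ["s10", "s11", "s12", "s13"],
    "start": pd.to_datetime(["2020-08-20", "2020-08-22",
                             "2020-08-23", "2020-08-25"]),
    "end": pd.to_datetime(["2020-08-30", "2020-09-03", None, None]),
})

today = pd.Timestamp("2020-09-01")

# Still in quarantine: no end date yet, or an end date in the future.
currently_in = q["end"].isna() | (q["end"] > today)

# Length of stay, computed only on completed quarantines.
completed = q.dropna(subset=["end"])
stay_days = (completed["end"] - completed["start"]).dt.days

print("currently under quarantine:", int(currently_in.sum()))
print("median completed stay (days):", stay_days.median())
```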
***
The last piece of the puzzle
If any school can roll out the aspirational data report, I'd be overjoyed. The community members will have so much useful data, and administrators will be able to course correct based on trends. As my readers should expect from me, I must point to the elephant in the room.
I must admit that even this aspirational plan has one missing element. It does not measure what economists call the "externalities" of the re-opening. What is the impact of re-opening on the public health of the surrounding community, the people who are not affiliated at all with the university?
Local residents live in neighborhoods, or even the same buildings, that are popular with off-campus students and staff. These community members may come into contact with students or staff at neighborhood restaurants, in supermarkets, on public transport, or just by walking past each other. From what we know about the coronavirus, outbreaks on campus easily spill over to the neighbors. Some coordination with local public health authorities will be needed to address this difficult problem.
To put it bluntly, even if there are no deaths on campus, some hospitalizations and deaths in the surrounding community are by-products of the re-opening. Universities should not turn a blind eye to those.