This is the third part of my discussion of the Covid Symptom Tracker app, launched in the U.K. by a King’s College team and supported in the U.S. by Harvard and Stanford.
In the first post, I explained how the researchers analyzed their data to arrive at two attention-grabbing conclusions: that around 13 percent of the U.K. population had been infected with SARS-CoV-2 by the end of March, and that loss of taste and smell is a better predictor of having the virus than the usual suspects such as persistent cough. I also pointed out why both conclusions are questionable. The issues include the unwarranted assumptions of unbiased data, perfect model accuracy and perfect test accuracy.
In the second post, I explained how the research team pre-processed the app data. Pre-processing is absolutely necessary when working with observational data over which we exercise minimal control, but some of these steps could introduce additional bias into the dataset. Certain information, such as the severity and sequencing of symptoms, is dropped; that is acceptable for a quick first-pass analysis, but it would be good practice to describe how those decisions were made.
At this stage of the pandemic, we are fully aware of the importance of knowing how the data came into being. This is the focus of today's post.
***
#1 Self-selection
The biggest issue with any tracker app is self-selection. Mobile app users are not a random sample of the U.K. population. For example, Covid-19 hits the elderly especially hard, yet they are far less likely to use smartphones. Some of this bias can be corrected by appropriate statistical adjustments.
It's clear that certain groups in the population are more or less inclined to use a tracking app. Will someone who has recovered sign up and send daily reports to the app? Will healthy people with no symptoms use the app? The more symptoms someone has, and the more severe those symptoms, the more likely that person is to find out about, and use, the Symptom Tracker app. And in a country where tests are rationed, someone who has been turned away from testing is more likely to use the Symptom Tracker.
It appears that app users are more likely to be infected than non-users, which undermines the goal of estimating prevalence in the general population.
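To give a sense of what those "appropriate statistical adjustments" might look like, here is a minimal post-stratification sketch in Python. The age bands, population shares and column names are made up for illustration; the point is only that reweighting can correct for skew on variables we observe (like age), not for the unobserved tendency of symptomatic people to download the app in the first place.

```python
import pandas as pd

# Hypothetical app data: each row is one user, with an age band and a
# self-reported outcome (1 = reported symptoms consistent with Covid-19).
app = pd.DataFrame({
    "age_band": ["18-39", "18-39", "18-39", "40-64", "40-64", "65+"],
    "symptomatic": [1, 0, 1, 0, 1, 0],
})

# Hypothetical population shares for the same age bands (e.g. from census data).
population_share = {"18-39": 0.35, "40-64": 0.40, "65+": 0.25}

# Share of each age band in the app sample.
sample_share = app["age_band"].value_counts(normalize=True)

# Post-stratification weight: population share divided by sample share,
# so under-represented groups (here the elderly) count for more.
app["weight"] = app["age_band"].map(population_share) / app["age_band"].map(sample_share)

# Unweighted vs. weighted estimate of the symptomatic rate.
print("unweighted:", app["symptomatic"].mean())
print("weighted:  ", (app["symptomatic"] * app["weight"]).sum() / app["weight"].sum())
```

In other words, reweighting helps with the "elderly are under-represented" problem, but not with the "sick people are over-represented" problem.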
#2 Self-reporting
Everything – demographics, symptoms and test results – is self-reported. There is little the app developers can do to verify user-submitted responses. Reported symptoms are likely to be self-diagnosed. Users also choose which pieces of information to provide, and the resulting missing data present the analytical challenges described in the previous post.
#3 Outreach
The authors are very proud of the 1.6 million app downloads, mentioning this fact five times in the short paper. The data were collected between 24 and 29 March 2020. The app was launched on 24 March 2020. (Since some app users were presumably disqualified for various reasons, total downloads must have exceeded 1.6 million in those five days.) A couple of app traffic experts I spoke to felt that such an impressive number requires explanation. What marketing tactics created awareness of the app? Where did they advertise to drive downloads? What motivated users to adopt the app? Other than random selection (with a 100% response rate), any method of outreach introduces selection bias.
#4 Analytical Sample
In any case, the 1.6 million statistic is completely meaningless. In an accompanying table of the preprint, we learn that 412,000 users “answered questions on their symptoms”. The other 1.1 million played no role in the analysis. To catch a glimpse of them, we must visit the top few rows of Table 1, which shows demographic summaries. Of the 412,000, about 1,200 reported their PCR test results. Only those 1,200 made it into the predictive model. The remaining 410K were later scored by the model, leading to the 13% prevalence claim.
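To make the funnel concrete, here is a hedged sketch of the scoring workflow as I understand it: fit a model on the users with test results, then score everyone else and aggregate the predictions into a prevalence figure. The simulated data, feature names and logistic regression below are stand-ins, not the team's actual specification.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical app data: binary symptom indicators plus a test result that is
# only observed for a small fraction of users (the triage-tested subgroup).
n = 10_000
df = pd.DataFrame({
    "loss_of_smell": rng.integers(0, 2, n),
    "persistent_cough": rng.integers(0, 2, n),
    "fever": rng.integers(0, 2, n),
})
df["test_result"] = np.where(rng.random(n) < 0.01,           # ~1% get tested
                             rng.integers(0, 2, n), np.nan)  # 0/1 if tested, NaN otherwise

features = ["loss_of_smell", "persistent_cough", "fever"]
tested = df.dropna(subset=["test_result"])
untested = df[df["test_result"].isna()]

# Step 1: fit the model on the tested subgroup only.
model = LogisticRegression().fit(tested[features], tested["test_result"])

# Step 2: score the untested users and average the predicted probabilities.
predicted = model.predict_proba(untested[features])[:, 1]
print("implied prevalence among untested users:", predicted.mean())
```

Whatever biases live in the roughly 1,200 tested users flow straight through the model into the aggregate, which is why the 13% figure inherits every selection problem discussed in this post.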
#5 Selection by triage testing
In addition to the self-selection discussed in #1, the analytical sample is biased by the U.K. government’s triage testing policy. The analytical sample retains only people who have test results, and the group that has been tested is, by definition, the group that qualified for testing. During March, that subpopulation consisted primarily of people with severe symptoms of Covid-19. Because of triage testing, the subgroup with test results are those with severe symptoms, while the subgroup without tests have no or mild symptoms. The two groups barely overlap, which presents extreme analytical challenges.
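One quick diagnostic for this problem is to compare a severity proxy, such as the number of reported symptoms, between the tested and untested groups. A toy illustration (the data are made up):

```python
import pandas as pd

# Made-up data illustrating the triage-testing problem: symptom_count is a
# crude severity proxy, and only the sickest users were offered a test.
df = pd.DataFrame({
    "symptom_count": [0, 0, 1, 1, 2, 5, 6, 7, 8, 9],
    "tested":        [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
})

# Compare the severity distributions of the tested and untested groups.
print(df.groupby("tested")["symptom_count"].describe())
```

If the two distributions barely overlap, a model fitted on the tested group is extrapolating well outside its training data every time it scores an untested user.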
#6 Aided questions
Those familiar with marketing research know the difference between aided and unaided brand awareness. To measure aided awareness, the questionnaire may ask whether someone has heard of the brand Snap. By mentioning Snap, the questionnaire reminds the respondent of the brand, and so brand awareness may be over-estimated. An unaided question instead asks the respondent to list the names of social media brands they know of; this sort of question produces a messy pile of data.
In the Covid Symptom Tracker App, every question is aided. The questionnaire designer works off a list of known symptoms, asking app users if they experienced each symptom on the list.

#7 Default answers
Looking at the screengrabs, I noticed the pre-populated answers. To settle my doubts, I found and watched a few YouTube videos of people using the Covid Symptom Tracker app. I was surprised to see that the questions about symptoms come with default answers, which appear to be “no”. With these defaults, it is impossible to distinguish between a user choosing to skip the question and a user specifically answering no. There will be a massive bias toward the default answers. This is, in a word, fatal.
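A toy example of why this is fatal: once a skipped question is stored as “no”, the analyst can no longer separate genuine negatives from non-responses, and any rate computed from that column is dragged toward the default. The column name below is hypothetical.

```python
import numpy as np
import pandas as pd

# What the app would ideally record: NaN for a skipped question.
truth = pd.Series([1, 0, np.nan, np.nan, np.nan], name="loss_of_smell")

# What a "no by default" design records instead: every skip becomes 0.
recorded = truth.fillna(0)

# Symptom rate among users who actually answered vs. the rate in the recorded data.
print("among actual answers:", truth.mean())      # NaN is excluded -> 0.5
print("with defaults as no: ", recorded.mean())   # skips counted as no -> 0.2
```

Because the skips are indistinguishable from real negatives, no post-hoc adjustment can recover the difference.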
#8 Default answers super-sized
In one of those videos, the host found that, after he indicated he was feeling fine, the daily report ended immediately, with no need to answer individual questions about symptoms. From what I could tell, the app then filled in every symptom question with “no” by default. (If that wasn’t the behavior, this user would have zero reports of symptoms, which would exclude him from the analytical sample under the criteria laid out by the research team.)
So an unintentional bifurcation occurs here. Users who say they feel fine get what amounts to an "unaided" assessment of symptoms, while users who say they don't feel right are given the aided questionnaire. (See #6 for the definition of aided vs unaided.) Further, the adverse effect is magnified among those who have tested positive, and presumably recovered, as this subgroup is most likely to be feeling healthy.
#9 Timing of symptoms
It’s also crucial to ask the users who reported test results to list only the symptoms they experienced before taking the test. One cannot predict an event using information that was not known until after the event. With the design of this app, however, it's not clear how symptoms can be attached to an earlier date rather than the date of reporting.
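Here is a hedged sketch of the filter one would want to apply before modeling: keep only symptom reports dated on or before the user's test date, so the predictors contain no information that post-dates the event being predicted. The table layout and column names are hypothetical; as noted above, the app as designed does not appear to capture the dates needed to do this.

```python
import pandas as pd

# Hypothetical per-report data: one row per daily symptom report.
reports = pd.DataFrame({
    "user_id":     [1, 1, 1, 2, 2],
    "report_date": pd.to_datetime(["2020-03-24", "2020-03-26", "2020-03-28",
                                   "2020-03-25", "2020-03-27"]),
    "loss_of_smell": [0, 1, 1, 0, 1],
})

# Hypothetical test dates for the users who reported a PCR result.
tests = pd.DataFrame({
    "user_id":   [1, 2],
    "test_date": pd.to_datetime(["2020-03-27", "2020-03-26"]),
})

# Keep only symptoms reported on or before the test date, so the model never
# "predicts" a result from symptoms that appeared after the test was taken.
merged = reports.merge(tests, on="user_id")
pre_test = merged[merged["report_date"] <= merged["test_date"]]
print(pre_test)
```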
***
It's important to say that the presence of bias is a fact of life when working with observational data. Bias does not automatically invalidate an analysis. Recognizing its existence is the first step; sizing up its direction and magnitude is next; and designing appropriate adjustments to mitigate it is the last.
I should applaud the research team for making a very responsible disclosure of the limitations, referencing many of the issues I’ve raised in this and my previous posts.
After acknowledging that all their data are self-reported, that anosmia might appear after other symptoms or after recovery, that testing positive is not identical to being infected, and that triage testing means their sample is “not fully representative”, the research team then, in the very next sentence, recommended that the WHO add loss of smell and taste to its symptom list (which it has since done, along with various other symptoms).
It’s a curious practice of academic science that making such a disclosure serves as an act of exorcism, after which authors are free to make recommendations while ignoring the disclosed deficiencies.
[P.S. See my previous comments on the data pre-processing and the analytical methods of the Covid Symptom Tracker study.]