News over the weekend from Asia was that even as domestic transmission appears to have been contained, a small number of new coronavirus cases came from people returning home from overseas trips.
I immediately thought of the Wald paradox, an engaging historical example known to many statistics students. I featured it years ago on this blog (here and here).
Wald was a famous statistician during WW2. He was tasked with figuring out how to improve fighter planes. The data he had came from inspections of planes that returned from war zones. By common sense, one can compute the amount of damage suffered in different parts of the plane, and conclude that those parts of the plane that suffered the most damage need to be reinforced.
Wald saw the fallacy of common sense. The statistician realized that those parts of the plane that suffered the least damage should be reinforced. That's because the planes that never came back were missing from his dataset. This is an example of survivorship bias. The planes that got shot down were not the same as those that survived: the surviving planes did not get hit in their most vulnerable parts. If one analyzes only data from survivors, one draws the wrong lesson.
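To make the logic concrete, here is a minimal sketch with made-up numbers. The section names, the hit probabilities, and the assumption that each plane takes exactly one hit are all hypothetical; none of this comes from Wald's actual data. It computes what share of the damage seen on returning planes falls on each section.

```python
# Hypothetical numbers chosen only for illustration; not Wald's data.
SECTIONS = {
    # section: (probability the hit lands there, probability such a hit downs the plane)
    "engine":   (0.25, 0.80),
    "cockpit":  (0.25, 0.60),
    "fuselage": (0.25, 0.10),
    "wings":    (0.25, 0.05),
}


def observed_hit_share(sections):
    """Share of hits, by section, among planes that made it back.

    Simplifying assumption: every plane takes exactly one hit.
    """
    # Fraction of all planes that were hit in a section AND survived.
    survivors = {
        name: p_hit * (1 - p_down)
        for name, (p_hit, p_down) in sections.items()
    }
    total = sum(survivors.values())
    return {name: frac / total for name, frac in survivors.items()}


shares = observed_hit_share(SECTIONS)
# List sections from least to most damage observed on returning planes.
for name, share in sorted(shares.items(), key=lambda kv: kv[1]):
    print(f"{name:9s} {share:.0%} of hits seen on returning planes")
```

Under these assumptions every section is equally likely to be hit, yet the engine shows the least damage among survivors, precisely because engine hits are the ones that keep planes from coming back.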
***
In recent weeks, the media has been promoting the (likely) myth that the epicenters of the COVID-19 pandemic were first China, then cruise ships, then South Korea, then Iran and then Italy. Those were the "unsafe" countries reporting the most cases and deaths. The "safe" countries were shutting off travel to these unsafe countries.
Imagine a group of Asian tourists who decided to cancel their trip to Italy or South Korea. They could switch their itinerary to visit the safe countries, like the U.S. or Germany, where there were (at the time) negligible COVID-19 cases. They made a common-sense decision based on the available data.
Alas, as in the fighter planes example, the most useful data are the missing data. In this case, the "unsafe" countries are those that are conducting a large number of tests and thus producing data. The "safe" countries are those with limited testing.
Traveling to these "safe" countries is not as safe as the data made it seem. The coronavirus may be in the community, and these tourists may unknowingly bring the virus back to their home countries.
This is an example of reporting bias. The data we have come from countries that are conducting a lot of testing, while the data we don't have would have come from those that do limited testing. Opting for "safe" countries is the wrong decision.
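A small sketch shows how testing volume alone can drive the headline numbers. The two countries, their test counts, and the 3% positive rate are hypothetical figures picked for illustration, not real data. Both countries are assumed to have the same share of positive tests; only the number of tests differs.

```python
# Hypothetical figures for illustration only; not real country data.
countries = {
    # name: (tests performed, share of tests that come back positive)
    "heavy_tester": (200_000, 0.03),
    "light_tester": (2_000, 0.03),
}


def confirmed_cases(tests, positive_rate):
    """Confirmed cases are simply the tests that came back positive."""
    return round(tests * positive_rate)


for name, (tests, rate) in countries.items():
    cases = confirmed_cases(tests, rate)
    print(f"{name}: {cases} confirmed cases from {tests} tests "
          f"({cases / tests:.1%} positive)")
```

Judged by confirmed cases, the light tester looks a hundred times "safer", even though the rate of positives per test is identical by construction. Cases per test is still an imperfect measure, since who gets tested differs from place to place, but it at least removes the effect of testing volume.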
(That was a very long process to register and sign in.)
The CDC boss has admitted that some back-testing has shown that flu victims actually had the coronavirus.
Meanwhile, the coronavirus is everywhere, and pretty evenly so if, as you suggest, you look at test results as a function of tests.
This suggests it has been spreading slowly for a long time, not in lightning strikes linked to a travelling Chinese person from Wuhan.
As the Chinese Foreign Spokesman pointed out (misreported, but this was his point), there is quite a reasonable chance that the virus did not first start in Wuhan, and might as easily have travelled from the US to China (or anywhere to China) as the other way around.
The data is there: detailed DNA sequences of strains can, with AI, identify how the virus evolved. It will certainly be known by many groups of researchers by year end. China has been generous with sharing the results of its testing and should be congratulated. Others have barely begun testing and revealed almost nothing.
Posted by: Michael Droy | 03/16/2020 at 11:37 AM