Statistics is the subject of generalizing from data. In statistics, we study how to use imperfect data to draw general conclusions. We often have to wait for the future to reveal the evidence of success.
Ten years ago, McGraw-Hill published my first book, a best-seller, Numbers Rule Your World (link). I hoped to explain five of the most profound concepts from my field in a way that anyone can understand and apply to their daily lives. I attempted the balancing act of featuring specific stories while bringing out the general principles. In the last couple of months, the Covid-19 pandemic has drawn immense attention to data and statistics, and given me confirmation, ten years on, that the contents of Numbers Rule Your World (link) have general applicability. It affirms that statistics is a great subject!
If you’re approaching my book for the first time, or re-reading it, here’s how you can work your way through it, making connections at various points to the Covid-19 crisis.
***
Chapter 2: Bad Spinach / Bad Scores
This is your perfect starting point. Half of the chapter details a disease outbreak investigation. You learn about the information networks the CDC puts in place to monitor unusual patterns of disease around the country. You discover data analysts whose job is to set off the alarm. How do they decide if the first handful of cases foretell an epidemic?
You encounter “shoe leather,” the painstaking collection of data to pinpoint the origin of an outbreak. It’s much harder than you might think. U.S. politicians are making much noise about the early days of Wuhan. An epidemiologist may learn that 10 of the first 15 patients visited the Wuhan seafood market before they got sick. But the Wuhan market might be a hugely popular destination, like Central Park in New York City. Finding that 10 out of 15 people went to Central Park over a weekend could be more common than you think. If all 10 touched a particular rare species of wildlife, that's another story.
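To see why the popularity of the location matters, here is a minimal sketch of the arithmetic. The base rates are hypothetical, chosen only for illustration, and the sketch assumes each of the 15 people visited the location independently with the same probability.

```python
from math import comb

def prob_at_least(k_min, n, p):
    """P(at least k_min of n people visited), if each visits independently with probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

# Hypothetical base rates: the share of the general population that visited recently.
for base_rate in (0.1, 0.5, 0.7):
    chance = prob_at_least(10, 15, base_rate)
    print(f"base rate {base_rate:.0%}: P(10 or more of 15 visited) = {chance:.4f}")
```

If the destination is rarely visited, 10 out of 15 is a striking signal. If half the city goes there anyway, the same count is far less surprising.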
Another controversy of the moment is the huge cost of shutting down the economy, weighed against the benefit of saving lives. I invite readers to think about this cost-benefit tradeoff. In the chapter’s case, because the disease outbreak was traced to a spinach processing plant in California, the FDA ordered a recall of all spinach, which depressed sales of salad greens for six months or so. The cost was much lower than for this pandemic, but the bacteria are well known and thus predictable, and the lives saved were most likely zero, and even harder to pin down than in the current crisis.
People are already complaining that social distancing and other measures have not improved our numbers, saying deaths were “unexpectedly low”. You discover how hard it is to prove the effectiveness of a policy decision because you need the counterfactual: what would have happened if the FDA had not recalled the spinach.
Chapter 4: Timid Testers / Magic Lassoes
Everything you should know about the statistics of diagnostic testing and predictive models is in one place. The motivating examples are (a) the fight against performance-enhancing drugs in sports via anti-doping tests and (b) the false promise of terrorism prediction algorithms.
I wrote several posts (1, 2, 3) about the flawed Stanford antibody study that estimated 50 times more infections in Santa Clara County in California than the reported case count. Despite this headline, the raw data are also consistent with a scenario in which nobody has antibodies. The statistical reason is the low underlying prevalence of antibodies, which caps the number of true positives in any sample. The number of false positives will be large relative to true positives, because the false positive rate applies to those without antibodies, which is the bulk of the population.
An extreme case of this phenomenon is terrorist prediction using a dragnet surveillance dataset. The set of terrorists is a drop in the ocean of people under mass surveillance. For a predictive model to capture most of these terrorists (low false negative rate), it must flag a lot of people – even a tiny false positive rate when applied to a huge population results in a lot of false accusations.
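The arithmetic behind both examples can be sketched in a few lines. Every input below (population size, prevalence, sensitivity, false positive rate) is a hypothetical value picked to show the effect of low prevalence; none comes from the Stanford study or from any real surveillance program.

```python
def screening_outcomes(population, prevalence, sensitivity, false_positive_rate):
    """Expected counts of true and false positives when everyone is tested."""
    with_condition = population * prevalence
    without_condition = population - with_condition
    true_positives = with_condition * sensitivity
    false_positives = without_condition * false_positive_rate
    flagged = true_positives + false_positives
    share_correct = true_positives / flagged if flagged else 0.0
    return true_positives, false_positives, share_correct

# Hypothetical scenarios: (population, prevalence, sensitivity, false positive rate).
scenarios = {
    "antibody survey": (3_000, 0.01, 0.90, 0.015),
    "dragnet surveillance": (100_000_000, 0.00001, 0.99, 0.001),
}
for name, args in scenarios.items():
    tp, fp, share = screening_outcomes(*args)
    print(f"{name}: {tp:,.0f} true positives, {fp:,.0f} false positives, "
          f"{share:.1%} of flagged cases are real")
```

In both scenarios the false positives outnumber the true positives, even though the false positive rate looks small, because it is applied to the much larger group that does not have the condition.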
You see the distinction between a “chemical” false negative (positive) and a “practical” false negative (positive). To evade steroid testing, dopers use all kinds of tactics, such as handing in someone else’s urine. The diagnostic test finds a true negative but in practical terms, it’s a false negative. For swab testing, one may test negative if the swab doesn’t reach a virus-rich area, or if the patient is tested at the “wrong” time.
All tests have errors. What are the consequences of errors? Also, how likely is the error to be discovered? For anti-doping, do dopers inform the testing lab that it made a mistake?
Chapter 1: Fast Passes / Slow Merges
The first chapter gives important insights on managing capacity, which is a pivotal issue for hospitals during the coronavirus crisis. I cover two case studies: (a) Disney’s management of theme-park capacity, and (b) states’ management of highway congestion.
The strategy of building more capacity is largely unavailable during a crisis, and wasteful in general. In this extraordinary time, some localities were able to set up temporary hospitals. Transportation experts discovered that policies of regulating congestion are more effective. Ramp meters slow down the influx of new vehicles onto the highway, and are turned on before congestion arrives. The key is to delay as much as possible the onset of congestion. Sounds familiar? It’s the analogue of “flattening the curve”. Social distancing is supposed to reduce contacts, slowing down community spread. This tactic delays the peak demand for hospital resources. The chapter explains the scientific support for why such tactics work.
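One way to picture “flattening the curve” is with a textbook SIR (susceptible, infected, recovered) model. This is not the analysis in the book; it is a minimal sketch, and the contact rates, recovery rate, and starting share of infected people are all hypothetical values.

```python
def simulate_sir(contact_rate, recovery_rate=0.1, initial_infected=1e-4, days=730):
    """Daily-step SIR model; returns the infected share of the population for each day."""
    s, i = 1.0 - initial_infected, initial_infected
    infected = [i]
    for _ in range(days):
        new_infections = contact_rate * s * i  # contacts between susceptible and infected
        recoveries = recovery_rate * i
        s -= new_infections
        i += new_infections - recoveries
        infected.append(i)
    return infected

def peak(infected):
    """Return (day of peak, infected share at peak)."""
    day = max(range(len(infected)), key=infected.__getitem__)
    return day, infected[day]

# Hypothetical contact rates: halving contacts stands in for social distancing.
for label, rate in (("no distancing", 0.4), ("with distancing", 0.2)):
    day, share = peak(simulate_sir(rate))
    print(f"{label}: peak on day {day}, {share:.1%} of people infected at once")
```

Under these assumptions, the lower contact rate produces a peak that is both later and smaller, which is the same logic as metering cars onto a highway before it jams.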
Chapter 5: Jet Crashes / Jackpots
The topic of this chapter is statistical significance, one of the most common and also most commonly misused concepts in statistics.
In the Stanford antibody study referenced above, statisticians argued that the observed level of positive results is consistent with zero prevalence. This is a way of saying the result is not statistically significant. What we mean is that the study’s result (50 positives out of over 3,000 tests) would be considered normal even if nobody in the population actually had antibodies. This conclusion does not assert that the true prevalence is zero. The topic of statistical significance is made inaccessible to non-mathematicians by textbook authors piling on equations, but it need not be so. I even made a video explaining "not statistically significant" (link).
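Here is a rough sketch of what “consistent with zero prevalence” means in numbers. It asks: if nobody had antibodies, so that every positive is a false positive, how likely is it to see 50 or more positives? The sample size of 3,000 is an assumed round number standing in for “over 3,000”, and the false positive rates are hypothetical, not the study’s own estimates.

```python
from math import exp, lgamma, log

def binomial_tail(n, p, k_min):
    """P(X >= k_min) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    def log_pmf(k):
        return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                + k * log(p) + (n - k) * log(1 - p))
    return sum(exp(log_pmf(k)) for k in range(k_min, n + 1))

n_tests, positives = 3000, 50  # 3,000 is an assumed stand-in for "over 3,000"
for fpr in (0.005, 0.015, 0.02):  # hypothetical false positive rates
    print(f"false positive rate {fpr:.1%}: expect {n_tests * fpr:.0f} false positives, "
          f"P(50 or more) = {binomial_tail(n_tests, fpr, positives):.3f}")
```

If the test’s false positive rate sits toward the higher values, 50 positives is an ordinary outcome even with no true cases. That does not show the prevalence is zero; it shows the data cannot rule it out.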
Before you apply a test of statistical significance, you must make sure you’re comparing apples to apples. In the context of air safety, you shouldn’t compare U.S. and foreign airlines on all routes; you should only compare them on routes they share, in which case, you’ll find no statistical difference. One of the bad stats circulating during this pandemic is the purported harm of ventilators. This is usually stated as follows: the vast majority of patients put on ventilators eventually die (presumably, relative to those patients not put on ventilators). But only the most serious patients are placed on ventilators. The proper analysis should focus on patients who qualify for ventilators, and compare those who were put on ventilators to those who got other treatments.
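The ventilator comparison can be illustrated with made-up numbers. The counts below are invented purely to show how the naive comparison and the like-for-like comparison can point in opposite directions; they say nothing about what real ventilator data show.

```python
# Hypothetical counts, invented for illustration: (patients, deaths) by severity and treatment.
counts = {
    ("severe", "ventilator"): (80, 48),
    ("severe", "other"): (20, 16),
    ("mild", "other"): (900, 18),
}

def death_rate(treatment, severity=None):
    """Death rate for a treatment, optionally restricted to one severity group."""
    rows = [v for (sev, trt), v in counts.items()
            if trt == treatment and severity in (None, sev)]
    patients = sum(n for n, _ in rows)
    deaths = sum(d for _, d in rows)
    return deaths / patients

print(f"all patients:    ventilator {death_rate('ventilator'):.1%} "
      f"vs other {death_rate('other'):.1%}")
print(f"severe patients: ventilator {death_rate('ventilator', 'severe'):.1%} "
      f"vs other {death_rate('other', 'severe'):.1%}")
```

In this invented dataset, ventilated patients die at a far higher rate than everyone else, yet among the severe patients who qualify for a ventilator, the ventilated group does better. The first comparison is apples to oranges; the second is apples to apples.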
We have already seen a host of invalid comparisons. Comparing complete data from years ago with incomplete data this year. Comparing regions that are in different phases of the epidemiological timeline. Comparing countries that have taken different approaches to managing the crisis.
Chapter 3: Item Bank / Risk Pool
This chapter addresses a deeper issue in using statistical averages. An illustrative example is educational testing like the SAT. Critics pointed to the gap in the average scores across racial groups as evidence of bias. But the racial grouping masks a third factor, which is differential ability between students. Due to various socio-economic factors, white students have higher average ability than black students. So the real question is: Among those students with comparable abilities, does the test still show bias toward one racial group? (The answer is not straightforward.)
During the pandemic, the “denier” faction has argued that the mortality rate of Covid-19 is “on a par with” that of seasonal influenza. Are those two rates comparable? The simple answer is no. Comparing mortality rates masks a third factor, which is the level of immunity in the population. Even on the most optimistic estimate, the current level of immunity to the novel coronavirus is less than 20 percent. (This statement generously presumes that (a) having antibodies confers immunity without question and (b) the New York City test result is not a scam, and the test is 100% accurate.) By contrast, because of vaccination and recurrence, a good chunk of people have immunity to influenza. The mortality rate of Covid-19 should be compared to that of influenza among a population that has no immunity to either, or a population with similar levels of immunity to each.
The other example in this chapter explains how the insurance industry is an outgrowth of the law of large numbers. In any given year, health or auto insurers cannot predict with high accuracy which covered individuals will file claims, but the magic of statistics allows them to predict with great accuracy the aggregate claims. Disaster insurers run into trouble when rare events happen, such as when Category 5 hurricanes hit Florida. Suddenly, potentially all covered individuals tap into the insurance pool at once. Insurers try to pool together risks of different types and different areas. The global pandemic is a huge threat to the insurance industry. Perhaps there are lawyers reading this, and they might tell me there are natural disaster exclusions. If those don’t apply, we will see a gigantic number of covered entities filing claims all at once.
Florida’s hurricane insurance market was also dysfunctional because the risk is predictable along some parameters. Those living inland realized they were subsidizing the coastal dwellers and wanted out. The coronavirus also creates different risk subgroups. The younger generations have much lower risks than the elderly. When social distancing and other practices are applied to the whole community, the subgroup with lower risk peels away. A key difference is community transmission: the risk of hurricane damage does not spread by contact.
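A small sketch shows why pooling works for everyday risks and fails for disasters. It assumes a simplified world in which each policyholder files at most one claim of equal size per year; the 5 percent claim probability is hypothetical. “Relative spread” here means the standard deviation of total claims divided by the expected total.

```python
from math import sqrt

def relative_spread(pool_size, claim_prob, independent=True):
    """Standard deviation of total claims divided by expected total claims.

    Assumes every policyholder files at most one claim of equal size.
    independent=False models the extreme case where all file together or none do.
    """
    per_person = sqrt((1 - claim_prob) / claim_prob)
    return per_person / sqrt(pool_size) if independent else per_person

claim_prob = 0.05  # hypothetical annual chance that a policyholder files a claim
for pool_size in (100, 10_000, 1_000_000):
    indep = relative_spread(pool_size, claim_prob)
    corr = relative_spread(pool_size, claim_prob, independent=False)
    print(f"pool of {pool_size:>9,}: independent risks {indep:.3f}, "
          f"fully correlated risks {corr:.3f}")
```

With independent risks, the uncertainty in aggregate claims shrinks as the pool grows, which is the law of large numbers at work. When everyone claims at once, a bigger pool offers no such relief.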
***
The above covers less than half of the book. Almost everything in those pages has direct relevance to the Covid-19 world. It’s quite heartening to find, ten years later, that numbers indeed rule our world.
Here’s a link to get your copy of Numbers Rule Your World. There are also translated copies in Chinese (simplified and traditional), Japanese, Korean, and Portuguese, as far as I know.