


Dan Vargo
Remember we are in flu season. According to the CDC, influenza and pneumonia cause 56,000 fatalities in the U.S. each year, making them the eighth leading cause of death. The fatality rate is 17 per 100,000, or 0.02%. This implies an annual infection count of 327 million cases of the common flu (despite the widespread use of the flu vaccine). On average, that's almost 2 million infections a day if we assume the flu season lasts 6 months (the bulk of infections actually occurs between December and March).

This would mean that nearly every person in the US gets the flu (or pneumonia) every year! This is clearly not the case!
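For anyone who wants to check the arithmetic behind the quoted figures, here is a minimal sketch. It treats the quoted "17 per 100,000" as deaths per infection, which is the reading the quoted paragraph uses. The 331 million population figure and the 182-day season are assumptions for illustration. Because the rate is rounded to 17, the result lands near 329 million rather than exactly on the quoted 327 million.

```python
# Figures quoted in the comment above.
deaths_per_year = 56_000
rate_per_100k = 17             # quoted as "17 per 100,000"

# Assumptions for illustration, not taken from the CDC document.
us_population = 331_000_000
season_days = 182              # roughly 6 months

# Reading the rate as deaths per infection, back out the infection count.
implied_infections = deaths_per_year / (rate_per_100k / 100_000)

# Compare the implied count with the population, and spread it over the season.
share_of_population = implied_infections / us_population
infections_per_day = implied_infections / season_days

print(f"implied infections: {implied_infections:,.0f}")
print(f"share of population: {share_of_population:.1%}")
print(f"infections per day: {infections_per_day:,.0f}")
```

The implied count comes out at nearly the entire population, which is the implausibility this comment is pointing at.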


DV: Yes, the number seems high. I got the fatalities count and the rate from this CDC doc (PDF). Page 17, Table 1. Am I misinterpreting the numbers? I suppose people can get sick more than once in a flu season.

Dan Vargo

I saw the data and agree those numbers are correct. Because this category combines two different conditions, there are a few possible sources of misinterpretation:
1. One condition may be substantially more common AND/OR deadly than the other
2. People who get pneumonia and not influenza are not relevant to your analysis
3. People who get pneumonia and influenza may not be relevant as the combination's effects are not shown (how much worse is getting both than one or the other?)
4. A single person may contract the illness several times during the reporting period

I can conclude with a high degree of certainty that 327 of 331 million people in the US (98.8%, which would mean 99.985% of people get it within any 2-year period) aren't getting it yearly. I like your content normally, but this should have been obvious at a glance.
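The two percentages in this comment can be reproduced with a short sketch. It assumes that each infection is a distinct person and that the two years are independent, which is what the two-year figure implicitly relies on.

```python
# Figures from the comment above.
infections = 327_000_000
population = 331_000_000

# Chance that a given person is infected in one year, assuming each
# infection is a distinct person (an assumption, per point 4 in the list).
p_year = infections / population

# Chance of at least one infection across two years, assuming the
# two years are independent (also an assumption).
p_two_years = 1 - (1 - p_year) ** 2

print(f"{p_year:.1%}")       # 98.8%
print(f"{p_two_years:.3%}")  # 99.985%
```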


DV: Sure, any of those things could be true. The estimate I've seen in the media for the number of deaths per year from the common flu is 30,000 to 50,000, so on #2, it seems the bulk of those cases are influenza-related. We agree on #4, and I was referring to cases of infection, not unique people infected. #3 is a relevant consideration. I just noticed the name of the condition says "A AND B" but it is actually an inclusive "A OR B". If it were really "AND", we should expect to find another data series that says "Influenza only."

I think the message here is that if you take the CDC numbers at face value, and look at the number of deaths and the fatality rate, there is a surprising proportion of people who catch the flu each year. One way or another, they are counting 300 million infections.

None of what you said is wrong, and I'd never criticize anyone for being skeptical :)

Antonio Rinaldi

4) The connection between infections and deaths.
The number of infections refers to today, while the number of deaths refers to infections from several days earlier. This is especially important at the beginning of an epidemic, since the exponential growth of cases makes the new numbers much greater than the past numbers.

7) Rates not counts. Counts are misleading.
I'm not sure. Mortality of 1% when 1% is infected is equivalent to mortality of 0.1% when 10% is infected: they yield the same number of deaths.
When serious cases need intensive care and the number of intensive care beds is fixed, counts count a lot (no pun intended).
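The equivalence in point 7 can be written out as a small sketch. The population of one million is a hypothetical figure chosen only to make the numbers concrete.

```python
import math

def deaths(population, infection_rate, fatality_rate):
    """Deaths as population times share infected times share of infected who die."""
    return population * infection_rate * fatality_rate

population = 1_000_000  # hypothetical, for illustration only

# Scenario A: 1% infected, 1% mortality among the infected.
deaths_a = deaths(population, 0.01, 0.01)
infected_a = population * 0.01

# Scenario B: 10% infected, 0.1% mortality among the infected.
deaths_b = deaths(population, 0.10, 0.001)
infected_b = population * 0.10

# Same number of deaths, but ten times as many infections in scenario B.
print(math.isclose(deaths_a, deaths_b))
print(round(infected_b / infected_a))
```

The deaths match, but scenario B has ten times as many people infected at once, which is where a fixed number of beds comes in.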


AR: on #4, yes we need the cohort-adjusted fatality rate, which is tough to estimate in the beginning because we need enough patients to have passed through the treatment period. The unadjusted fatality rate will go up for sure if the growth of new cases slows, and that will be another moment when the media may turn good news into bad.
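To make the lag effect concrete, here is a rough simulation sketch. Every number in it (the 2% true fatality rate, the 14-day delay, the daily growth factors, the 100 starting cases) is hypothetical. It compares two constant growth rates rather than modeling a slowdown partway through an epidemic.

```python
def fatality_rates(growth, days=40, lag_days=14, true_cfr=0.02):
    """Return (unadjusted, lag-adjusted) fatality rates on the final day.

    All parameters are hypothetical. Cases grow by a constant daily factor,
    and deaths occur exactly lag_days after the case is counted.
    """
    new_cases = [100 * growth ** t for t in range(days)]
    new_deaths = [
        true_cfr * new_cases[t - lag_days] if t >= lag_days else 0.0
        for t in range(days)
    ]
    total_deaths = sum(new_deaths)
    # Unadjusted: deaths to date over all cases to date.
    unadjusted = total_deaths / sum(new_cases)
    # Adjusted: deaths to date over cases old enough to have resolved.
    adjusted = total_deaths / sum(new_cases[: days - lag_days])
    return unadjusted, adjusted

fast_unadjusted, fast_adjusted = fatality_rates(growth=1.20)
slow_unadjusted, slow_adjusted = fatality_rates(growth=1.05)

print(f"fast growth: unadjusted {fast_unadjusted:.2%}, adjusted {fast_adjusted:.2%}")
print(f"slow growth: unadjusted {slow_unadjusted:.2%}, adjusted {slow_adjusted:.2%}")
```

In this toy setup the adjusted rate recovers the assumed 2% in both cases, while the unadjusted rate is well below it and sits higher when growth is slower.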

on #7, you are absolutely right that counts matter for treatment. For figuring out whether we might get a pandemic or a manageable crisis, the rates are more useful because the count is a product of two rates.


Kaiser Fung. Business analytics and data visualization expert. Author and Speaker.