


S. Frazier

Did some simple research; please test my thinking:

Ca - Cardiac Arrest
S - Symptoms, being shortness of breath and chest pains
P(S|Ca) = 53% probability of symptoms given cardiac arrest
P(S) = 8% (overall population data, really rough)
P(Ca) = 0.8% (800/100,000 people suffer Ca)
Using a Bayesian analysis:
P(Ca|S) = P(S|Ca) x P(Ca) / P(S) = 53% x 0.8% / 8% = 5.3%
Your chance of cardiac arrest given the symptoms is 5.3%, meaning you may not need to run to the hospital. You certainly need a control group to factor out issues such as panic attacks, etc., that can cause the same symptoms.
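
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the same Bayes calculation. The function name `posterior` and its parameter names are made up for illustration; the three inputs are the rough figures quoted in the comment, not vetted estimates.

```python
def posterior(p_s_given_ca: float, p_ca: float, p_s: float) -> float:
    """Bayes' rule: P(Ca|S) = P(S|Ca) * P(Ca) / P(S)."""
    if not 0 < p_s <= 1:
        raise ValueError("P(S) must be in (0, 1]")
    return p_s_given_ca * p_ca / p_s


# Rough inputs taken from the comment above.
p_ca_given_s = posterior(p_s_given_ca=0.53, p_ca=0.008, p_s=0.08)
print(f"P(Ca|S) = {p_ca_given_s:.1%}")  # prints "P(Ca|S) = 5.3%"
```

Holding P(Ca) and P(S) fixed, the result scales in direct proportion to P(S|Ca), so a symptom that shows up less often among cardiac arrest cases gives a proportionally smaller posterior.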


SF: Thanks for your contribution. A back-of-the-envelope calculation is always worth doing. If we did a similar analysis on the other symptoms, the number would be even smaller, given the much weaker correlation.


Big data doesn't imply a lack of control groups. Lazy analysts don't use the available data to build an appropriate control group.

Lazier journalists re-print this as useful information.


Chris: Big data is mostly observational data, and it takes both a lot of time and a lot of statistical expertise to build "appropriate control groups", so I'm not surprised this is not being done. Sometimes you just can't build control groups from existing data. For example, if you launch a new version of an iPhone app, Apple is not going to let you keep both the new and old versions in the same store; if you want to measure the impact of the new app, you are forced to perform a pre-post analysis. Any creation of a control group would require uncomfortably strong assumptions.
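
To make the pre-post point concrete, here is a small Python sketch. Every number in it is made up for illustration (hypothetical daily users, in thousands), and the "trend-adjusted" figure rests on exactly the kind of strong assumption mentioned above, namely that the pre-launch trend would have continued unchanged.

```python
from statistics import mean

# Hypothetical daily users (thousands); NOT real data.
pre = [100, 102, 104, 106]   # four days before the new version launched
post = [112, 114, 116, 118]  # four days after

# Naive pre-post estimate: credits the whole change to the launch.
naive = mean(post) - mean(pre)

# Assumption: usage would have kept growing at the pre-launch rate.
slope = (pre[-1] - pre[0]) / (len(pre) - 1)
projected = [pre[-1] + slope * (i + 1) for i in range(len(post))]

# Estimate relative to the projected no-launch baseline.
trend_adjusted = mean(post) - mean(projected)

print(naive, trend_adjusted)
```

With these invented numbers, the naive estimate is 12 while the trend-adjusted estimate is 4. The gap comes entirely from the assumed baseline, which the data alone cannot confirm.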


Kaiser Fung. Business analytics and data visualization expert. Author and Speaker.