
S. Frazier

Did some simple research; please test my thinking:

Ca = cardiac arrest
S = symptoms (shortness of breath and chest pains)
P(S|Ca) = 53% (probability of symptoms given cardiac arrest)
P(S) = 8% (overall population data, really rough)
P(Ca) = 0.8% (800 per 100,000 people suffer Ca)
Using a Bayesian analysis:
P(Ca|S) = P(S|Ca) x P(Ca) / P(S) = 53% x 0.8% / 8% = 5.3%
Your chance of cardiac arrest given the symptoms is 5.3%, meaning you may not need to run to the hospital. You certainly need a control group to factor out issues such as panic attacks, etc., that can cause the same symptoms.
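
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the same calculation. The function name `posterior` and its argument names are made up for illustration; the three inputs are the rough figures quoted in the comment above, not vetted clinical data.

```python
def posterior(p_s_given_ca: float, p_ca: float, p_s: float) -> float:
    """Bayes' rule: P(Ca|S) = P(S|Ca) * P(Ca) / P(S)."""
    if not 0 < p_s <= 1:
        raise ValueError("P(S) must be in (0, 1]")
    return p_s_given_ca * p_ca / p_s


# Rough inputs taken from the comment (assumed, not verified):
p_s_given_ca = 0.53   # P(S|Ca): symptoms given cardiac arrest
p_ca = 0.008          # P(Ca): 800 per 100,000 people
p_s = 0.08            # P(S): symptoms in the overall population

p_ca_given_s = posterior(p_s_given_ca, p_ca, p_s)
print(f"P(Ca|S) = {p_ca_given_s:.1%}")
```

The result scales linearly with P(S|Ca) and inversely with P(S), so the estimate is only as good as those two rough inputs.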


SF: Thanks for your contribution. It's always good to do a back-of-the-envelope calculation. If we did a similar analysis on the other symptoms, the number would be even smaller, given the much weaker correlation.


Big data doesn't imply a lack of control groups. Lazy analysts don't use the available data to build an appropriate control group.

Lazier journalists re-print this as useful information.


Chris: Big data is mostly observational data, and it takes both a lot of time and a lot of statistical expertise to build "appropriate control groups", so I'm not surprised this is not being done. Sometimes you just can't build control groups from existing data. For example, if you launch a new version of an iPhone app, Apple is not going to let you keep both the new and old versions in the same store; if you want to measure the impact of the new app, you are forced to perform a pre-post analysis. Any creation of a control group would require uncomfortably strong assumptions.


