


S. Frazier

I did some simple research; please test my thinking:

Ca - Cardiac arrest
S - Symptoms, namely shortness of breath and chest pains
P(S|Ca) = 53% (probability of symptoms given cardiac arrest)
P(S) = 8% (overall population data, really rough)
P(Ca) = 0.8% (800 out of 100,000 people suffer Ca)
Using a Bayesian analysis:
P(Ca|S) = P(S|Ca) x P(Ca) / P(S) = 53% x 0.8% / 8% = 5.3%
Your chance of cardiac arrest given the symptoms is 5.3%, meaning you may not need to run to the hospital. You certainly need a control group to factor out issues such as panic attacks, etc., that can cause the same symptoms.
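
The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the function name `posterior` is made up for illustration, and the inputs are the rough figures quoted in the comment, not validated clinical data.

```python
def posterior(p_s_given_ca: float, p_ca: float, p_s: float) -> float:
    """Bayes' rule: P(Ca|S) = P(S|Ca) * P(Ca) / P(S)."""
    if not 0 < p_s <= 1:
        raise ValueError("P(S) must be in (0, 1]")
    return p_s_given_ca * p_ca / p_s


# Rough figures from the comment (assumed, not verified):
# P(S|Ca) = 53%, P(Ca) = 0.8%, P(S) = 8%
p_ca_given_s = posterior(p_s_given_ca=0.53, p_ca=0.008, p_s=0.08)
print(f"P(Ca|S) = {p_ca_given_s:.1%}")
```

Because the result scales linearly with P(S|Ca), a symptom that shows up less often among cardiac arrest cases (with the same P(S)) gives a proportionally smaller posterior.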


SF: Thanks for your contribution. It's always good to do a back-of-the-envelope calculation. If we did a similar analysis on the other symptoms, the numbers would be even smaller, given their much weaker correlation.


Big data doesn't imply a lack of control groups. Lazy analysts don't use the available data to build an appropriate control group.

Lazier journalists re-print this as useful information.


Chris: Big data is mostly observational data, and it takes both a lot of time and a lot of statistical expertise to build "appropriate control groups", so I'm not surprised this is not being done. Sometimes you just can't build control groups from existing data. For example, if you launch a new version of an iPhone app, Apple is not going to let you keep both the new and old versions in the same store; if you want to measure the impact of the new app, you are forced to perform a pre-post analysis. Any creation of a control group would require uncomfortably strong assumptions.


Marketing and advertising analytics expert. Author and Speaker. Currently at Columbia. See my full bio.

