
Interesting topic; I'm looking forward to the next episode.

A little nitpicking:
"For later reference, just remember that 0.5 inch (12 mm) is a big error on this scale. Half an inch is the difference between the median person and the 97.5th percentile person. So our tolerance for inaccuracy is described in small fractions of an inch."

Is this right? The standard deviation for persons (as opposed to samples of 900) is 7.8 inches, so the median to 97.5th percentile should be 15.6 inches.
The 97.5th percentile is where this sample sits relative to all other samples of 900 persons. A person in the sample 0.5 inches above the median would be at the 52.5th percentile.
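The percentile arithmetic in this comment can be checked with Python's standard-library `statistics.NormalDist`. This is a sketch that assumes individual heights are normally distributed with the commenter's SD of 7.8 inches; the mean of 0 is a placeholder, since only distances from the median matter.

```python
from statistics import NormalDist

# Assumption: individual heights ~ Normal with SD 7.8 inches (the commenter's figure).
# The mean is a placeholder; only distances from the median matter here.
person = NormalDist(mu=0.0, sigma=7.8)

# Percentile of a person 0.5 inch above the median.
pct_half_inch = 100 * person.cdf(0.5)

# Distance from the median to the 97.5th percentile (1.96 SDs, roughly 2 SDs).
gap_975 = person.inv_cdf(0.975) - person.median

print(f"0.5 inch above median: {pct_half_inch:.1f}th percentile")
print(f"Median to 97.5th percentile: {gap_975:.1f} inches")
```

This gives about the 52.6th percentile, close to the comment's 52.5th, and a gap of about 15.3 inches (1.96 SDs; the comment's 15.6 uses a round 2 SDs).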

And 7.8/sqrt(900)
From memory, shouldn't this be 7.8/sqrt(899)? (Serious nitpicking!)
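On the n versus n − 1 question: in the usual convention, the n − 1 (Bessel's correction) enters when the sample SD is computed from the data, and the standard error then divides that SD by √n. A sketch using Python's `statistics.stdev` (which divides by n − 1) and `statistics.pstdev` (which divides by n); the heights below are made-up illustrative values, not the post's sample.

```python
import math
import statistics

# Illustrative data only (not the post's sample): a few heights in inches.
heights = [64.0, 66.5, 70.0, 68.0, 72.5, 61.0, 67.0, 69.5]
n = len(heights)

# Sample SD: sum of squared deviations divided by n - 1.
s = statistics.stdev(heights)
# Population-formula SD: divided by n.
s_pop = statistics.pstdev(heights)

# Standard error of the mean: sample SD over sqrt(n).
se = s / math.sqrt(n)

# With the thread's numbers, sqrt(899) vs sqrt(900) barely moves the SE.
se_900 = 7.8 / math.sqrt(900)
se_899 = 7.8 / math.sqrt(899)
print(f"SE with sqrt(900): {se_900:.4f}  SE with sqrt(899): {se_899:.4f}")
```

At n = 900 the switch moves the SE from 0.2600 to about 0.2601 inch, so it makes no practical difference here.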

MD: You raised a common point of confusion. To clarify the situation, think about the overall objective of a statistical study. We have a sample of data, and we want to extrapolate from that sample to the unknown population, so we don't know what the SD of the population is. All we have is data on 900 people, with a sample SD of 7.7. The sample SD is not a good estimate of the population SD, which is not surprising given 900 versus 25,000 people. One of the most magical formulas in all of statistics is the standard error, which measures the variability of the sample average from sample to sample.
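The idea that the standard error measures how much the sample average varies from sample to sample can be checked by simulation. A minimal sketch, assuming a hypothetical normal population with SD 7.5 inches and an arbitrary mean of 67 inches (neither value is taken from the post's data): draw many samples of 900, then compare the SD of their averages with the formula 7.5/√900 = 0.25.

```python
import random
import statistics

random.seed(2024)  # fixed seed so the run is reproducible

POP_MEAN = 67.0   # hypothetical population mean (inches)
POP_SD = 7.5      # hypothetical population SD (inches)
N = 900           # sample size, as in the post
REPS = 1000       # number of repeated samples

# Draw REPS samples of size N and record each sample average.
sample_means = [
    statistics.fmean(random.gauss(POP_MEAN, POP_SD) for _ in range(N))
    for _ in range(REPS)
]

# Spread of the sample averages across repeated samples...
simulated_se = statistics.stdev(sample_means)
# ...versus the textbook formula SD / sqrt(n).
formula_se = POP_SD / N ** 0.5

print(f"simulated SE: {simulated_se:.3f}  formula SE: {formula_se:.3f}")
```

The simulated spread should land close to 0.25 inch, even though individual heights in this hypothetical population vary by several inches.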

Given our objective, the error is defined as how far our sample average is from the population average. Therefore, we care about the variability of the sample average, and the relevant quantity is the standard error.

Does the population the sample is drawn from have an SD of 0.25 inches? That's what a 0.5-inch gap between the mean and the 97.5th percentile implies.

A quick Google search shows that the SD of human height is about 3 inches, so the difference between the mean and the 97.5th percentile is about 6 inches. That is consistent with my experience, too: I am around 70 inches tall, and an SD of 0.25 inches would imply that almost everyone is nearly exactly my height.

JK: The 0.25-inch gap is on the sampling distribution, not the population distribution. So, reversing the SE formula, 0.25 inch * sqrt(900) = 7.5 inches is an estimate of the population SD of heights. I took the population values from this CDC report (Table 8) and assumed a normal distribution of heights.
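The reversal is just the SE formula solved for the SD: if SE = SD/√n, then SD = SE × √n. A small sketch; the function names are hypothetical, chosen for illustration.

```python
import math

def standard_error(sd: float, n: int) -> float:
    """Standard error of a sample average: SD / sqrt(n)."""
    return sd / math.sqrt(n)

def sd_from_standard_error(se: float, n: int) -> float:
    """Invert the SE formula to back out the SD: SE * sqrt(n)."""
    return se * math.sqrt(n)

implied_sd = sd_from_standard_error(0.25, 900)  # 0.25 * 30 = 7.5 inches
round_trip = standard_error(implied_sd, 900)    # back to 0.25 inch
print(f"implied SD: {implied_sd:.2f} in, SE: {round_trip:.2f} in")
```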

Instead of "almost everyone is nearly exactly my height", think: almost every sample average height (from repeated draws of 900 people) is nearly the same as the average of the sample we're looking at.

It has always been difficult for me to understand the limits of regression analysis, so I am very happy that you are writing this series of posts!
Thanks a lot for taking the time to write it, and even more for sharing it with us here!
I am looking forward to reading it!

Kaiser:
I agree that 0.5 is a big error.
It was specifically the following statement:
"Half an inch is the difference between the median person and the 97.5th percentile person."
That seems odd to me.

Enjoying this series.

MD: Let me think this through. SE = 0.25 inch. The margin of error is 2*SE on each side of the mean, and 2*SE = 0.5 inch. For a normal distribution, median = mean, and the interval covers the middle 95%, spanning the 2.5th to the 97.5th percentile. So the distance from the 50th to the 97.5th percentile is half the width of that interval, which is 2*SE. Did I screw something up?

Kaiser: I think your calculation is right for the percentiles of the distribution of sample-average heights. I am fairly certain it's wrong for the percentiles of the distribution of people's actual heights. 50th to 97.5th percentile for the latter distribution should be 2 SDs, not 2 SEs.
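The two calculations in this exchange can be put side by side. A sketch assuming both distributions are normal, with SE = 0.25 inch for sample averages and the 7.5-inch SD backed out from SE × √900 earlier in the thread for individual heights; the center of 0 is a placeholder.

```python
from statistics import NormalDist

SE = 0.25     # standard error of the sample average (inches)
SD = 7.5      # SD of individual heights implied by SE * sqrt(900)
CENTER = 0.0  # placeholder mean; only distances from it matter

margin_of_error = 2 * SE  # the "2 SE" rule of thumb: 0.5 inch

averages = NormalDist(CENTER, SE)     # sampling distribution of the average
individuals = NormalDist(CENTER, SD)  # distribution of individual heights

# Median-to-97.5th-percentile gap: 1.96 SEs for averages, 1.96 SDs for people.
gap_averages = averages.inv_cdf(0.975) - averages.median
gap_individuals = individuals.inv_cdf(0.975) - individuals.median

print(f"averages: {gap_averages:.2f} in, individuals: {gap_individuals:.1f} in")
```

The gap is about 0.49 inch for sample averages but about 14.7 inches for individual people, which is why the word "person" in the original sentence was the sticking point.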

JK: Thank you for persisting. I see what you and MD are complaining about. It's the word "person". I've changed it to "sample".

Kaiser Fung. Business analytics and data visualization expert. Author and Speaker.