You made one very common error in your discussion: the precision of a survey-based estimate has almost _nothing_ to do with what proportion of the population is being sampled (as long as you are not sampling almost the entire population). I am sure you know the soup-tasting analogy. So the wide margin of error of the estimates is not because 140,000 is a small proportion of all the businesses, but because the business-to-business variability of the change in the number of employees is large.
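To illustrate the point about sampling proportion, here is a minimal sketch of the standard error of a sample mean with the finite population correction. It assumes simple random sampling, which is a simplification and not a description of how the actual survey is designed. The standard deviation `SD` and the population sizes are hypothetical values chosen only for illustration; only the sample size of 140,000 comes from the discussion.

```python
import math


def standard_error(sd, n, population=None):
    """Standard error of a sample mean under simple random sampling.

    If `population` is given, the finite population correction is applied.
    """
    se = sd / math.sqrt(n)
    if population is not None:
        se *= math.sqrt((population - n) / (population - 1))
    return se


SD = 50.0            # hypothetical business-to-business standard deviation
N_SAMPLE = 140_000   # sample size mentioned in the discussion

# Hypothetical population sizes: the sampled fraction changes a lot,
# but the standard error barely moves.
for population in (1_000_000, 10_000_000, 100_000_000):
    se = standard_error(SD, N_SAMPLE, population)
    fraction = N_SAMPLE / population
    print(f"population={population:>11,}  fraction={fraction:.4f}  SE={se:.4f}")
```

Under these assumptions the standard error scales directly with `SD` and with one over the square root of the sample size, while the population size enters only through a correction factor that stays close to 1 unless most of the population is sampled.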

In discussing the confidence interval you state:

What this means is that when they report -54,000, what they actually mean is that any number between +46,000 and -154,000 is consistent with the data that was observed. So in fact, the statisticians have no idea whether employment grew or shrank in August.

I think you are stretching it a bit to say that statisticians have "no idea" whether employment grew or shrank. They do have an idea: their best guess is that employment shrank by 54,000 jobs. Yes, their sample was noisy, and even if the true value were 0, they would get sample estimates like this one more than 10 percent of the time, but I think it is a statistically valid claim to state "our data suggest it is more likely than not that employment shrank."
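The "more than 10 percent of the time" remark can be checked with a rough calculation. The sketch below assumes the estimate is approximately normally distributed and that the quoted interval (+46,000 to -154,000) is a symmetric 90% interval, so the margin of error is 100,000. Both are assumptions made here for illustration; the function name is hypothetical.

```python
from statistics import NormalDist

ESTIMATE = -54_000     # reported change in employment
MARGIN_90 = 100_000    # half-width of the quoted 90% interval
STD_NORMAL = NormalDist()


def implied_standard_error(margin, confidence):
    """Back out the standard error from a symmetric margin of error."""
    z_critical = STD_NORMAL.inv_cdf(0.5 + confidence / 2)
    return margin / z_critical


se = implied_standard_error(MARGIN_90, 0.90)
z_score = ESTIMATE / se

# Chance of an estimate at least this far from zero if the true change were 0.
two_sided_p = 2 * STD_NORMAL.cdf(-abs(z_score))

# Highest confidence level at which the interval would still exclude zero.
excluding_level = 1 - two_sided_p

print(f"implied SE      = {se:,.0f}")
print(f"z-score         = {z_score:.3f}")
print(f"two-sided p     = {two_sided_p:.3f}")
print(f"excludes 0 up to {excluding_level:.1%} confidence")
```

Under these assumptions the two-sided p-value is well above 10 percent, and an interval around the estimate excludes zero only at a confidence level well below 90%.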

Aniko: You read a lot more into that sentence than I intended - and I realize that what I wrote could be misleading so thanks for bringing this up. If everyone in the population were surveyed, then we would have complete information and there could be no sampling error. If we can only collect partial information, then the larger the sample, the smaller the error. But after a certain point, increasing the sample doesn't reduce the error enough to matter so we like to say proportions don't matter. Hope I clarified that.

Also I want to clarify your statement that the large error is not due to sample size but due to variability. The sample size is designed to filter out a certain level of noise (conversely, read a certain level of signal); if the survey had been designed to read changes in employment of 10,000, then the statistician would have called for a lot more than 140K businesses to be surveyed.
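As a rough sense of scale for "a lot more than 140K", the sketch below assumes simple random sampling, unchanged variability, and a margin of error proportional to one over the square root of the sample size. The real survey design is more complex, so treat the result as an order-of-magnitude illustration only. The function name is hypothetical, and the current margin of 100,000 is taken from the quoted interval.

```python
import math


def required_sample_size(current_n, current_margin, target_margin):
    """Sample size needed to shrink the margin of error to `target_margin`.

    Assumes margin is proportional to 1 / sqrt(n), so n scales with the
    square of the ratio of margins. Ignores the finite population correction.
    """
    return math.ceil(current_n * (current_margin / target_margin) ** 2)


n_needed = required_sample_size(
    current_n=140_000,       # businesses surveyed, per the discussion
    current_margin=100_000,  # margin implied by the quoted 90% interval
    target_margin=10_000,    # change the survey would need to detect
)
print(f"businesses needed: {n_needed:,}")
```

Cutting the margin by a factor of 10 multiplies the required sample by 100 under these assumptions. If the resulting number approaches or exceeds the number of businesses that exist, the finite population correction can no longer be ignored and this simple scaling stops applying.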

Aaron: A margin of error comes at a certain level of confidence, in this case, 90%. Any statement like "our data suggest it is more likely than not that employment shrank" is valid ONLY if we accept a lower confidence level. 90% is already a lower threshold than typically used so one must be careful when issuing such statements.

I have a fundamental objection to this line of thinking because it is equivalent to saying that when the sampling error is large, we should just ignore the variability and use the average value (point estimate) as the most likely value. It is precisely when the sampling error is large that we must pay attention to it. Otherwise, we might declare all the research on confidence levels and margins of error useless!

In Australia they include a trend line, but most commentators ignore it. I'm going to use the accuracy of unemployment figures as an example in a basic stats course this semester.


