In the previous post, we discussed xyopia and causal inference relating to poll results.
In this post, I want to discuss the administration of the poll. As a reminder, here is the snapshot I'm reacting to.
How many people were polled? How did these people learn about the poll? Was there any pre-screening?
I found the USA Today article related to this poll, in which they claim they had "more than 1,000" respondents. So let's assume they had 1,000 respondents.
That is not enough disclosure! The statement that employees are happier because of free food is a statement about the average employee at the average company. There are two levels of sampling involved here. It is unlikely that they first selected companies and then found employees within those companies. It's more likely that they found respondents and asked them where they work.
We don't know how many companies were represented by the 1,000 respondents. That depends on how they collected the data.
Let's consider two extreme cases.
If I told you that all 1,000 respondents came from two companies, one serving free food and one not, you'd be very skeptical of the result. You could be confident that company A's employees are happier than company B's, but any conclusion beyond those two companies is dubious.
If I told you that the 1,000 respondents came from 1,000 companies, you should also be skeptical. This means that the happiness measure for each company is based on a sample size of one employee. (Averaging across companies mitigates this issue a little.)
In practice, you get different numbers of respondents by company, which introduces yet another wrinkle in the analysis of the data. Some companies are given more weight than others.
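To make the weighting issue concrete, here is a minimal sketch with made-up data (the company names and responses are hypothetical, not from the poll). It compares the happiness rate you get by pooling all respondents with the rate you get by first averaging within each company and then averaging across companies.

```python
# Hypothetical data: company -> one flag per respondent
# (1 = rated happiness in the top 2 boxes, 0 = did not).
responses = {
    "A": [1, 1, 1, 1, 1, 1, 0, 0],  # 8 respondents, 75% happy
    "B": [1, 0],                    # 2 respondents, 50% happy
    "C": [0],                       # 1 respondent, 0% happy
}

def pooled_rate(responses):
    """Treat every respondent equally; big companies dominate."""
    flags = [f for company_flags in responses.values() for f in company_flags]
    return sum(flags) / len(flags)

def company_average_rate(responses):
    """Treat every company equally, regardless of respondent count."""
    rates = [sum(f) / len(f) for f in responses.values()]
    return sum(rates) / len(rates)

pooled = pooled_rate(responses)
by_company = company_average_rate(responses)
print(f"pooled over respondents: {pooled:.1%}")
print(f"averaged over companies: {by_company:.1%}")
```

In this toy example, the pooled rate is pulled toward company A because it contributes most of the respondents, while the company-level average gives the single respondent at company C as much say as all eight at company A. Neither is automatically right; the point is that the analyst has to choose, and the poll report does not tell us which choice was made.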
The article discloses that only 16% of poll respondents work at companies offering free food. That would be 160 respondents, of whom 107 rated their happiness in the top 2 boxes. How many companies did those 100-odd people represent?
Of note, n=160 is very close to the minimum sample size needed for the observed difference to reach the conventional 95% confidence level for statistical significance, if both groups had about 160 respondents.
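A rough way to check this kind of claim is a two-proportion z-test. The sketch below is only illustrative: the 107-out-of-160 figure comes from the numbers above, but the 56% rate for the comparison group is a hypothetical placeholder that I am assuming for the sake of the arithmetic, and the function name is my own.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """z statistic for the difference between two sample proportions,
    using the pooled proportion for the standard error."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

p_free_food = 107 / 160   # from the poll figures discussed above
p_other = 0.56            # ASSUMPTION: hypothetical placeholder rate

# Both groups at about 160 respondents each.
z_equal = two_proportion_z(p_free_food, 160, p_other, 160)
print(f"z with 160 vs 160: {z_equal:.2f} (1.96 is the 95% threshold)")
```

Under that assumed rate, the z statistic lands right around the 1.96 cutoff, which is what "very close to the minimum" looks like in practice. A different comparison rate would move the answer.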
This means that the large size of the no-free-food sample (~840) is crucial to the statistical conclusion. A sample of that size makes the estimate for the no-free-food group very "precise", plus/minus 3%. The estimate for the free-food group is not precise, about plus/minus 7%.
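These margins can be reproduced with the usual normal-approximation formula for a proportion, assuming simple random sampling (which, given the clustering by company discussed above, is generous). The free-food proportion uses the 107 out of 160 figure; for the no-free-food group I assume the worst case of p = 0.5, since the margin barely changes for proportions near the middle.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of an approximate 95% confidence interval
    for a proportion p estimated from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

moe_free_food = margin_of_error(107 / 160, 160)
# ASSUMPTION: p = 0.5 is the worst case, used as a stand-in here.
moe_no_free_food = margin_of_error(0.5, 840)

print(f"free food (n=160):    +/- {moe_free_food:.1%}")
print(f"no free food (n=840): +/- {moe_no_free_food:.1%}")
```

The smaller group's margin is more than twice as wide, so most of the uncertainty in the comparison comes from the 160 free-food respondents.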
One hopes that poll reporting comes with an appendix containing the missing information. It's useful for establishing trust.