In a previous post, we saw a statistical reason why the observed distribution of birth months of NHL players may be remarkably more variable than that of the population at large, purely because of the random sampling of 761 people from millions. It is not at all surprising that a given month could account for, say, 10% of the births of NHL players, although it would be surprising if this happened in the US population as a whole.
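To get a feel for how much monthly shares move around by chance alone, here is a minimal Python sketch (not the code used for this post) that draws hypothetical rosters of 761 people. It assumes every month is equally likely (1/12, about 8.3%), which only approximates real birth patterns. With 761 people, the standard error of a single month's share is sqrt((1/12)(11/12)/761), roughly one percentage point, so a month at 10% sits less than two standard errors above 8.3%.

```python
import numpy as np

N_PLAYERS = 761   # sample size from the post
N_SIMS = 1000     # number of simulated rosters (illustrative choice)
# Assumption: each month is equally likely; real birth-month shares
# vary slightly by month and by country.
p_month = np.full(12, 1 / 12)

# Standard error (in percentage points) of one month's share.
se_pct = 100 * np.sqrt(p_month[0] * (1 - p_month[0]) / N_PLAYERS)
print(f"standard error of one month's share: {se_pct:.2f} points")

rng = np.random.default_rng(seed=1)  # arbitrary seed for reproducibility

# Each row is one simulated roster: birth counts for Jan..Dec.
counts = rng.multinomial(N_PLAYERS, p_month, size=N_SIMS)
percent = 100 * counts / N_PLAYERS

# Largest monthly share within each simulated roster.
max_share = percent.max(axis=1)
print(f"median largest monthly share: {np.median(max_share):.1f}%")
print(f"fraction of rosters with some month >= 10%: {(max_share >= 10).mean():.2f}")
```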
Next, is it unusual to have higher-than-8% values in the spring months and lower-than-8% values in the winter months? Again, we want to know whether the pattern we observed could just happen by chance. The answer is contained in the following histogram.
Here, I drew 1000 random samples of 761 people. For each sample, I fitted a line through the monthly percentages. If the slope of the line is significantly different from 0, the line is not flat, which provides evidence that a month-of-year effect exists. By convention, a p-value of 0.05 or smaller (for the t-test of the month coefficient) is taken to indicate that the slope is not flat.
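The simulation described above can be sketched in Python as follows. This is an illustrative reconstruction under my own assumptions, not the original code: months are coded 1 to 12 (January to December) as the regressor, every month is assumed equally likely, and scipy.stats.linregress supplies the two-sided p-value for the null hypothesis that the slope is zero. The seed is arbitrary, so the exact count of small p-values will differ somewhat from the 49 reported in this post.

```python
import numpy as np
from scipy import stats

N_PLAYERS = 761
N_SIMS = 1000
# Assumption: no month-of-year effect, so every month is equally likely.
p_month = np.full(12, 1 / 12)
# Assumption: months coded 1..12 (Jan..Dec) as the regressor.
month = np.arange(1, 13)

rng = np.random.default_rng(seed=2)
p_values = np.empty(N_SIMS)
slopes = np.empty(N_SIMS)
for i in range(N_SIMS):
    counts = rng.multinomial(N_PLAYERS, p_month)
    percent = 100 * counts / N_PLAYERS
    fit = stats.linregress(month, percent)  # OLS line; pvalue tests slope == 0
    slopes[i] = fit.slope
    p_values[i] = fit.pvalue

n_sig = int((p_values <= 0.05).sum())
print(f"{n_sig} of {N_SIMS} p-values are <= 0.05")

# Text histogram of the p-values in 20 bins of width 0.05.
hist, edges = np.histogram(p_values, bins=20, range=(0, 1))
for h, lo in zip(hist, edges[:-1]):
    print(f"{lo:4.2f}-{lo + 0.05:4.2f} {'#' * (h // 2)}")
```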
The histogram collects the p-values of all 1000 regression lines. The great majority of them are greater than 0.05; only 49 of the 1000 p-values are 0.05 or smaller. Thus, we conclude that it is exceedingly unlikely to see a significant trend, such as the downward one from spring to winter, if the 761 people were indeed randomly selected from the population at large.
"Exceedingly unlikely", however, does not mean impossible. Below are the data and the regression lines for the first 25 simulations. The one labelled p-value = 0.03 is one of the 49 non-flat scenarios (shown by red lines) and closely resembles the observed data! In our case, the simulation tells us that the probability of observing a non-flat trend under random selection is about 0.049 (= 49/1000). Faced with such a small probability, we elect to believe that the assumption of random selection (no month-of-year effect) is incorrect, rather than accept that we have witnessed an exceedingly rare event.
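As a quick check on how precise the estimate 49/1000 is, the binomial standard error of a proportion estimated from 1000 simulations can be computed directly. This small arithmetic sketch uses only the counts reported above; the 95% interval uses the usual normal approximation.

```python
import math

n_sims = 1000
n_sig = 49  # number of p-values <= 0.05 reported in the post
p_hat = n_sig / n_sims

# Binomial standard error of a proportion estimated from n_sims draws.
se = math.sqrt(p_hat * (1 - p_hat) / n_sims)
low, high = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"estimate {p_hat:.3f}, standard error {se:.4f}")
print(f"approx. 95% interval: {low:.3f} to {high:.3f}")
```

The interval, roughly 0.036 to 0.062, contains 0.05, which is what one would expect: when there is no month-of-year effect, a test at the 0.05 level flags a non-flat slope about 5% of the time.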
To sum up, the fact that the NHL line fluctuates much more wildly than the population lines is not surprising and is easily explained by sample size. However, the downward trend over the year deserves attention, as it is highly unlikely to occur if the 761 players were randomly selected. (To get an even better picture, it may be worthwhile to work out the likelihood of a downward trend conditional on there being a trend.)
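One way to follow up on the suggestion in the parentheses is to split the significant simulated slopes by sign. The Python sketch below rests on my own assumptions (equally likely months coded 1 to 12, an arbitrary seed, scipy.stats.linregress for the line fit) and estimates the probability of a downward trend given that a significant trend was found.

```python
import numpy as np
from scipy import stats

N_PLAYERS = 761
N_SIMS = 1000
p_month = np.full(12, 1 / 12)  # assumption: no month-of-year effect
month = np.arange(1, 13)       # assumption: months coded 1..12 (Jan..Dec)

rng = np.random.default_rng(seed=3)
# One OLS fit of monthly percentage on month per simulated roster.
fits = [
    stats.linregress(month, 100 * rng.multinomial(N_PLAYERS, p_month) / N_PLAYERS)
    for _ in range(N_SIMS)
]
slopes = np.array([f.slope for f in fits])
p_values = np.array([f.pvalue for f in fits])

significant = p_values <= 0.05
downward = significant & (slopes < 0)
n_sig = int(significant.sum())
n_down = int(downward.sum())
print(f"P(trend) ~ {n_sig / N_SIMS:.3f}")
print(f"P(downward trend) ~ {n_down / N_SIMS:.3f}")
if n_sig:
    print(f"P(downward | trend) ~ {n_down / n_sig:.2f}")
```

Because the no-effect model treats upward and downward slopes symmetrically, the conditional probability should come out near one half, which would make a significant downward trend roughly half as likely as a significant trend of either sign.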