The Super Bowl game was quite a bore, but I latched on to the TV host's comment that one of the players who scored did so on his birthday. What is the probability of a player scoring on his birthday? Is it rare?
A key element determining this probability is the chance that one or more players' birthdays fall on game day (Feb 9). This question reminds me of the famous "birthday problem". The birthday problem is a textbook staple trotted out in front of every student of probability: what is the probability that at least two people in a group (e.g. students in a classroom) share the same birthday?
The birthday problem is popular as a tool of instruction because the answer is surprising. One thinks that it should be quite rare to find two people sharing a birthday in a small group, say of 30 people, since there are 365 possible birthdays. But working out the math leads to the conclusion that, in a class of 30 students, the chance that at least two share a birthday is around 70%. In a class of 60 students, the chance rises to 99%.
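For readers who want to check these figures, here is a minimal sketch of the standard calculation, assuming 365 equally likely birthdays and ignoring leap years. The function name `prob_shared_birthday` is my own label for illustration.

```python
def prob_shared_birthday(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday."""
    p_all_distinct = 1.0
    for i in range(n):
        # Person i+1 must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


for n in (30, 60):
    print(n, round(prob_shared_birthday(n), 3))
```

The loop multiplies one factor per person, which is the same term-by-term structure discussed later in the post.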
An NFL team roster has 53 players, of whom 48 can be activated for any given game. For simplicity, I'm going to use 50 players per team, or 100 players for two teams on game day. (Some of these, e.g. the backup quarterbacks, may not get to play at all but let's ignore that complication.)
What is the probability that at least one of the 100 players was born on Feb 9? The chance is about 1 in 4. Even with 250 players, the chance is still only 50 percent. With 400 players, the chance is two-thirds. Four hundred exceeds the number of days in the year, but remember that more than one player could share the same birthday. (All these calculations, including the classic birthday problem, assume births are evenly distributed throughout the year.)
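The fixed-date calculation is even shorter. This is a sketch under the same assumption of 365 equally likely birthdays; `prob_someone_born_on_date` is a name I made up for illustration.

```python
def prob_someone_born_on_date(n: int, days: int = 365) -> float:
    """Probability that at least one of n people was born on one fixed date."""
    # Each person independently misses the date with probability (days - 1) / days.
    p_nobody = ((days - 1) / days) ** n
    return 1.0 - p_nobody


for n in (100, 250, 400):
    print(n, round(prob_someone_born_on_date(n), 3))
```

Running it for 100, 250 and 400 players gives values close to one-quarter, one-half and two-thirds respectively.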
***
The larger the group, the less surprising each scenario is. Why does the probability grow much more slowly in the NFL problem than in the classic birthday problem?
The key factor is the fixing of the date to the game day. In the classic birthday problem, the shared birthday could fall on any day of the year so there are many more scenarios that fit the requirement.
This is confirmed when we compare the two formulas, term by term. Each formula addresses the complement of the original problem. That is to say, they compute the probability of no two students sharing a birthday, and of no player having a Feb 9 birthday. Each term is associated with an individual in the group. In the NFL problem, every term is the same because the only requirement is that the birthday does not fall on Feb 9. In the birthday problem, every term is different because as we account for each student, another date is excluded so that we don't have two birthdays falling on the same day.
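Written out for a group of n individuals (the symbol n is my shorthand here), and assuming 365 equally likely birthdays, the two complement probabilities take the standard forms:

```latex
P(\text{no two of } n \text{ students share a birthday})
  = \frac{365}{365} \cdot \frac{364}{365} \cdot \frac{363}{365} \cdots \frac{365 - n + 1}{365}

P(\text{none of } n \text{ players born on Feb 9})
  = \left( \frac{364}{365} \right)^{n}
```

Each fraction in the first product belongs to one student, and each of the n identical factors in the second belongs to one player.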
Thus, beyond the first couple of individuals, each term in the birthday problem is smaller than the corresponding term in the NFL problem. The more individuals we include in the problem, the wider the gap between the pair of terms. In other words, the complement probability vanishes much faster in the birthday problem than in the NFL problem. Invert that, and the required probability increases much faster in the birthday problem than in the NFL problem. (Strictly speaking, the first term in the birthday problem is larger than the first term in the NFL problem because in the birthday problem, the "first" student can have any birthday while in the NFL problem, the Feb 9 date is excluded for every player; the second terms are equal. However, this effect is small and dwarfed by all the other terms.)
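To see the gap open up, one can print the terms side by side. This is only an illustrative sketch, assuming 365 equally likely birthdays; the helper names `birthday_term` and `nfl_term` are mine.

```python
DAYS = 365


def birthday_term(i: int) -> float:
    """i-th factor (counting from 1) of P(all birthdays distinct)."""
    return (DAYS - (i - 1)) / DAYS


def nfl_term(i: int) -> float:
    """i-th factor of P(nobody born on the fixed date); identical for every i."""
    return (DAYS - 1) / DAYS


for i in (1, 2, 3, 30, 100):
    print(i, round(birthday_term(i), 4), round(nfl_term(i), 4))
```

The birthday term starts slightly above the NFL term, matches it for the second individual, and falls below it from the third individual onward.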
***
The original question concerns the probability of a player scoring on his birthday. From above, we have a 25 percent chance that at least one of the 100 players was born on Feb 9, the game day. Only these players could score on their birthdays. So we need to multiply this by the probability of any of these players scoring.
This starts to get trickier. We can't justify assuming that all players have the same chance of scoring in a game. Clearly, first-team players have a higher chance of scoring, as do offensive players relative to defensive players. We would need a model of the probability of scoring that depends on variables such as time on the field, position of the player, etc. One factor we won't need to include is date of birth (I mean month and day): that's because I assume that the ability to score is unaffected by day of birth, which I think is uncontroversial, and this "independence" assumption simplifies the formula.
Now, from the first part, we generate a scenario with k players born on Feb 9; then, from the scoring model, we compute the probability of at least one of these k players scoring. We'd also need some baseline statistics such as what proportion of the players are offensive. Finally, we compute the weighted average over all these scenarios.
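The weighted average can be sketched as follows. This is not a real scoring model: the split into two groups and the per-player scoring probabilities are hypothetical placeholders, and the sketch further assumes that players score independently of one another and of their birthdays. All function names are mine.

```python
from math import comb

Q_BIRTHDAY = 1 / 365  # chance that a given player was born on game day


def binom_pmf(k: int, n: int, q: float) -> float:
    """P(exactly k of n players have a game-day birthday)."""
    return comb(n, k) * q**k * (1 - q) ** (n - k)


def prob_no_birthday_scorer(n: int, p_score: float, q: float = Q_BIRTHDAY) -> float:
    """Weighted average, over k = 0..n, of P(none of the k birthday players scores)."""
    return sum(binom_pmf(k, n, q) * (1 - p_score) ** k for k in range(n + 1))


def prob_birthday_scorer(groups, q: float = Q_BIRTHDAY) -> float:
    """groups: list of (number of players, per-player scoring probability)."""
    p_none = 1.0
    for n, p_score in groups:
        # Groups are treated as independent, so their "no scorer" chances multiply.
        p_none *= prob_no_birthday_scorer(n, p_score, q)
    return 1.0 - p_none


# HYPOTHETICAL inputs: 50 offensive players at 10% each, 50 defensive at 1% each.
groups = [(50, 0.10), (50, 0.01)]
print(round(prob_birthday_scorer(groups), 4))
```

As a sanity check, if every player were certain to score, the result would collapse to the roughly 1-in-4 chance that someone on the field has a game-day birthday.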