Last week, I pointed out the futility of using data as proof or disproof in Deflate-gate. Emphatically, a case of "N=All" does not make things better. I later edited the post for HBR (link).

In this post, I want to address a couple of subtler technical issues with the Sharp analysis, which can be summarized as follows:

1. New England is an outlier on the plays per fumble lost metric, performing far better than any other team (1.8x above league average).

2. Different ways of visualizing and re-stating the metric yield the same conclusion that New England is the outlier.

3. There is a dome effect of about 10 plays per total fumble: teams that play indoors ("dome") typically run about 10 more plays per fumble than teams that play outdoors ("non-dome"). New England, an outdoor team, performs better than most dome teams on the plays per total fumble metric. If dome teams are removed from the analysis, New England is an outlier.

4. Assuming that the distribution of the metric by team is a bell curve, the chance that New England could have achieved such an extraordinary level of plays per fumble lost is extremely remote.

5. Therefore, it is "nearly impossible" for any team to have New England's ability to prevent fumbles... unless the team is cheating.

***

Focus on Point 4 for the moment. This is a standard technique used by statisticians, and the basis of any analysis of "statistical significance". In statistical significance testing, we appeal to the normal distribution (bell curve) to estimate how close the observed sample is to the "average sample". The big question being addressed is: IS THIS AVERAGE?

Let's say we want to measure the effect of genetic modification on the size of fish. If the Fracken-fish sample is far from the average of natural-fish samples, we conclude that Fracken-fish are statistically different from (in this case, larger than) natural fish. A crucial requirement of this analysis is that the samples are randomly drawn.
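The fish comparison can be sketched as a one-sample z-test. A minimal sketch in Python, with all numbers invented for illustration:

```python
import math

def z_test(sample_mean, sample_n, pop_mean, pop_sd):
    # z-score of the sample mean against a known population distribution,
    # plus a two-sided p-value computed from the normal CDF (math.erf)
    z = (sample_mean - pop_mean) / (pop_sd / math.sqrt(sample_n))
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

# Invented numbers: natural fish average 30 cm (sd 4 cm);
# a random sample of 50 Fracken-fish averages 33 cm.
z, p = z_test(sample_mean=33, sample_n=50, pop_mean=30, pop_sd=4)
# z is about 5.3 and p is tiny, so we would declare the Fracken-fish
# sample statistically different from (larger than) natural fish.
```

This is exactly the "IS THIS AVERAGE?" question: the test measures how far the observed sample sits from the average sample, under the crucial assumption that the sample was randomly drawn.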

But for Deflate-gate, the big question is: IS THIS EXTREME? The statistical significance tool is not designed to answer this question. The analysis tells us that the Patriots do not look like the average random sample from the NFL. Saying that something is not average is far from saying that it is an outlier! Indeed, statistical significance testing is frequently (and controversially) used to detect "small effects".
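The same machinery shows why "statistically significant" is not the same as "extreme". With a large enough sample, a trivially small difference clears the significance bar; a sketch with invented numbers:

```python
import math

def p_value(sample_mean, sample_n, pop_mean, pop_sd):
    # two-sided p-value for a sample mean against a known population
    z = (sample_mean - pop_mean) / (pop_sd / math.sqrt(sample_n))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# A sample mean of 100.2 against a population mean of 100 (sd 10) is a
# negligible difference, yet with n = 100,000 it is "highly significant".
p_small_effect = p_value(100.2, 100_000, 100, 10)
# p is well below 0.001, but an individual value of 100.2 sits nowhere
# near the tail of a population whose standard deviation is 10.
```

Significance measures distance from the average in units shrunk by sample size; it says nothing about whether the observation is extreme in any practical sense.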

If the Patriots' sample had been randomly drawn from the NFL, then Point 4 would have provided evidence of an extreme value; but there is no random selection here. The Patriots were singled out for testing precisely because their number looked extreme. This takes us back to the point of my first post: the Patriots could belong to a group of elite NFL teams that have more "skill" in preventing fumbles, or there could be many other possibilities.
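One way to see the selection problem is simulation: give 32 identical teams the same fumble rate, then single out the best performer after the fact. That team will look "extreme" against a bell curve fit to its own league, through luck alone. A sketch with invented rates and play counts (and a normal approximation to each team's binomial fumble count):

```python
import math
import random
import statistics

random.seed(1)

def best_team_z(n_teams=32, plays=5000, fumble_rate=0.01):
    # Simulate a league of identical teams; return the z-score of the
    # best plays-per-fumble figure against that league's own bell curve.
    mu = plays * fumble_rate
    sd = math.sqrt(plays * fumble_rate * (1 - fumble_rate))
    ppf = []
    for _ in range(n_teams):
        # normal approximation to the team's binomial fumble count
        fumbles = max(round(random.gauss(mu, sd)), 1)
        ppf.append(plays / fumbles)
    return (max(ppf) - statistics.mean(ppf)) / statistics.stdev(ppf)

# Averaged over many simulated leagues, the luckiest team lands roughly
# two standard deviations above its league mean with no skill involved.
zs = [best_team_z() for _ in range(500)]
```

Testing the team you picked because it looked extreme is circular: the bell-curve calculation in Point 4 assumes the team was drawn at random, which it was not.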

***

The other point of interest is that Points 1-4 say essentially the same thing: that the Patriots stand far apart from the rest of the NFL on the plays per fumble metric. Point 2 is the visual equivalent of the mathematics of Point 4.

Point 3 sounds different, but it really isn't. Points 2 and 4 say the Patriots don't fumble much. But dome teams fumble less because they play indoors; their presence in the analysis thus makes the advantage of the Patriots (a non-dome team) less pronounced. So, in constructing Point 3, Sharp removed dome teams. It's the same data, viewed through a different lens.
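The dome adjustment amounts to filtering the same table and recomputing the comparison. A sketch with invented plays-per-fumble figures (the dome/non-dome labels are realistic; the numbers and five-team league are not):

```python
# Invented plays-per-fumble figures for a toy five-team league
teams = {
    "Patriots": (105, False),  # (plays per fumble, plays in a dome?)
    "Falcons":  (95, True),
    "Saints":   (92, True),
    "Jets":     (75, False),
    "Bills":    (72, False),
}

def league_average(data, exclude_domes=False):
    # average plays per fumble, optionally dropping the dome teams
    vals = [ppf for ppf, dome in data.values()
            if not (exclude_domes and dome)]
    return sum(vals) / len(vals)

with_domes = league_average(teams)                         # 87.8
without_domes = league_average(teams, exclude_domes=True)  # 84.0

# The Patriots' edge over the league average widens once the dome
# teams are filtered out: same data, different lens.
edge_all = teams["Patriots"][0] - with_domes
edge_no_domes = teams["Patriots"][0] - without_domes
```

Removing the dome teams lowers the league average, so the Patriots' gap above it widens; no new information has entered the analysis.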

Stating the same statistic repeatedly does not make an argument. I'm not saying Sharp should not have performed these steps; I'd have done many of these analyses myself. But they play the role of quality control. The reiterations don't strengthen the argument, and they sound a bit like Sunday morning talk shows.
