A number of readers sent me Warren Sharp's piece about the ongoing New England Patriots "deflate-gate" scandal (link to Slate's version of this), so I suppose I should say something about it. For those readers who are not into American football, the Super Bowl is soon upon us. New England, one of the two finalists, has been accused of using footballs inflated below the pressure requirements in the rulebook, hence "deflate-gate".
The Slate piece is a good example of the brand of data journalism that is possible in today's world where anyone can get a hold of a lot of data. The quality of the analysis is above average as far as these pieces go. I like the use of different visualizations to understand the variability of the plays per fumble across teams. It's also clear to me that the histogram is easily the best of the bunch.
The chart is just missing a label: the one team standing far right is the New England Patriots.
When reading these pieces, pay attention to the structure of the statistical argument. Here is how I would summarize the argument:
1. New England is an outlier in the plays per fumbles lost metric, performing far better than any other team (1.8x above league average).
2. Different ways of visualizing and re-stating the metric yield the same conclusion that New England is the outlier.
3. There is a dome effect of about 10 plays per total fumbles, meaning that teams who play indoors ("dome") typically suffer 10 fewer fumbles than teams who play outdoors ("non-dome"). New England is an outdoor team that performs better than most dome teams on the plays per total fumble metric. If dome teams are removed from the analysis, New England is an outlier.
4. Assuming that the distribution of the metric by team is a bell curve, the chance that New England could have achieved such an extraordinary level of plays per fumble lost is extremely remote.
5. Therefore, it is "nearly impossible" for any team to have the New England type ability to prevent fumbles... unless the team is cheating.
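Point 4 is a standard tail-probability argument. A minimal sketch of how such a calculation goes, using entirely made-up numbers (these are not Sharp's actual figures; the mean, standard deviation, and New England value below are hypothetical placeholders):

```python
import math

# Hypothetical numbers for illustration only -- not Sharp's actual data.
# Suppose the league's plays-per-fumble-lost values average 105 with a
# standard deviation of 12, and New England sits at 187 (about 1.8x average).
league_mean = 105.0
league_sd = 12.0
patriots = 187.0

# z-score: how many standard deviations New England sits above the mean
z = (patriots - league_mean) / league_sd

# Upper-tail probability under the bell-curve (normal) assumption,
# computed from the complementary error function
p_tail = 0.5 * math.erfc(z / math.sqrt(2))

print(f"z = {z:.1f}, tail probability = {p_tail:.1e}")
```

The force of the argument rests entirely on the normality assumption: with a z-score near 7, the normal tail probability is astronomically small, which is what licenses phrases like "nearly impossible". If the true distribution has fatter tails, the same observed value is far less surprising.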
Actually, Sharp, to his credit, did not argue Point 5. If you jump to the end of the article, all he said was that the extreme value of New England's plays per fumble performance is not random fluctuation, and that there are many possible explanations, both legitimate and illegitimate. So not a lot of bang for the buck from running the analysis.
However, any reader of the Slate article, especially an anti-Patriots fan, will be tempted to cite these statistics and actually conclude that the Patriots cheated. All he/she requires is some story about how deflating footballs reduces the chance of fumbling. Notice I say "story" because I don't think we have any scientific theory connecting the two, yet.
This is a good example of the limitation of data analysis, including Big Data analysis. However suggestive, the data cannot prove guilt. In this case, it is hard to find data showing that the Patriots could not have achieved the result in a legal way.
In fact, the "dome" analysis for me weakens rather than strengthens the argument. Sharp switched from plays per fumbles lost to plays per total fumbles, the difference being fumbles that were recovered by the fumbling team. Since I can't see how deflating the football could help the Patriots recover more fumbles after the ball hits the ground, I prefer plays per total fumbles as the measure. On this measure, the Patriots are not an outlier at all; they are second to the Falcons. Only when Sharp removed all dome teams (the Falcons being one) could he make the argument that the Patriots are the outlier. But this tells me there are legitimate ways to perform as well as or slightly better than the Patriots did: just look at the Falcons.
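The two metrics differ only in whether recovered fumbles count against a team. A toy calculation, with made-up season totals, shows why the choice of denominator matters:

```python
# Hypothetical season totals for one team -- illustrative only.
plays = 1100
fumbles_total = 10       # all fumbles, whether recovered or not
fumbles_recovered = 4    # fumbles the fumbling team got back
fumbles_lost = fumbles_total - fumbles_recovered

plays_per_fumble_lost = plays / fumbles_lost      # the metric in Points 1-2
plays_per_total_fumble = plays / fumbles_total    # the "dome" analysis metric

# A team that recovers more of its own fumbles looks better on the
# first metric without actually fumbling any less often.
print(plays_per_fumble_lost)    # 183.33...
print(plays_per_total_fumble)   # 110.0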
The only true proof of cheating is if New England is caught red-handed. Show me a record of the pressure readings of the footballs and you have a believer. Show me someone who saw the footballs being deflated. Similarly, drug testing (statistics) did not nail Lance Armstrong (I wrote about it a lot); what finally brought him down was eyewitness testimony and law-enforcement investigations.
Also, this is an example of what Andrew Gelman has been calling "reverse causation" problems (link). We learn that New England did spectacularly on a metric, and we want to know what caused it. This is the opposite structure from an A/B test where we vary some causes, and observe how the variations affect an outcome. The reverse causation problem is one of the big issues of the Big Data era that isn't getting enough attention.
As this post is dragging on, I will leave other comments for Part 2. One issue I have with the outlined statistical argument is that Points 1, 2, 3 and 4 are essentially re-stating the same thing four times.