A number of readers sent me Warren Sharp's piece about the ongoing New England Patriots' deflate-gate scandal (link to Slate's version of this), so I suppose I should say something about it. For those readers who are not into American football, the Super Bowl is soon upon us. New England, one of the two finalists, has been accused of using footballs inflated below the pressure requirements in the rulebook, hence "deflate-gate".
The Slate piece is a good example of the brand of data journalism that is possible in today's world where anyone can get a hold of a lot of data. The quality of the analysis is above average as far as these pieces go. I like the use of different visualizations to understand the variability of the plays per fumble across teams. It's also clear to me that the histogram is easily the best of the bunch.
The chart is just missing a label: the one team standing at the far right is the New England Patriots.
***
When reading these pieces, pay attention to the structure of the statistical argument. Here is how I would summarize the argument:
1. New England is an outlier in the plays per fumbles lost metric, performing far better than any other team (1.8x above league average).
2. Different ways of visualizing and re-stating the metric yield the same conclusion that New England is the outlier.
3. There is a dome effect of about 10 plays per total fumbles, meaning that teams who play indoors ("dome") typically suffer 10 fewer fumbles than teams who play outdoors ("non-dome"). New England is an outdoor team that performs better than most dome teams on the plays per total fumble metric. If dome teams are removed from the analysis, New England is an outlier.
4. Assuming that the distribution of the metric by team is a bell curve, the chance that New England could have achieved such an extraordinary level of play per fumbles lost is extremely remote.
5. Therefore, it is "nearly impossible" for any team to have the New England type ability to prevent fumbles... unless the team is cheating.
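Point 4 can be sketched numerically. The sketch below uses made-up numbers (the mean, standard deviation, and team value are my assumptions for illustration, not Sharp's actual figures) to show how the bell-curve argument turns an outlier into a tail probability:

```python
from statistics import NormalDist

# Hypothetical illustration of Point 4. These numbers are invented,
# not Sharp's data: suppose plays per fumbles lost across teams is
# roughly bell-shaped with mean 105 and standard deviation 20, and
# one team posts 187 (about 1.8x the league average).
league_mean = 105.0
league_sd = 20.0
team_value = 187.0

# Standardize the team's value, then ask how likely a value at least
# this extreme is under the assumed normal distribution.
z = (team_value - league_mean) / league_sd
tail_prob = 1 - NormalDist().cdf(z)

print(f"z-score: {z:.2f}")                      # about 4.1
print(f"upper-tail probability: {tail_prob:.6f}")  # a tiny number
```

Under these assumed inputs the tail probability comes out vanishingly small, which is the sense in which the argument calls the performance "nearly impossible". Note the whole calculation hinges on the bell-curve assumption in Point 4.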
Actually, Sharp, to his credit, did not argue Point 5. If you jump to the end of the article, all he said was that the extreme value of New England's plays per fumble performance is not random fluctuation, and that there are many possible explanations, both legitimate and illegitimate. So not a lot of bang for the buck of running the analysis.
However, any reader of the Slate article, especially an anti-Patriots fan, will be tempted to cite these statistics and actually conclude that the Patriots cheated. All he/she requires is some story about how deflating footballs reduces the chance of fumbling the football. Notice I say story because I don't think we have any scientific theory connecting the two, yet.
This is a good example of the limitation of data analysis, including Big Data analysis. However suggestive, the data cannot prove guilt. In this case, it is hard to find data showing that the Patriots couldn't have achieved the result in a legal way.
In fact, the "dome" analysis for me weakens rather than strengthens the argument. Sharp switched from plays per fumbles lost to plays per total fumbles, the difference being fumbles that were recovered by the fumbling team. Since I can't understand how deflating the football could help the Patriots recover more fumbles after the ball hits the ground, I prefer plays per total fumbles as the measure. By this measure, the Patriots are not an outlier at all, and rank second to the Falcons--only when Sharp removed all dome teams (the Falcons being one) could he make the argument that the Patriots are the outlier. But this tells me there are legitimate ways to perform equally well or slightly better than the Patriots did: just look at the Falcons.
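To make the distinction between the two metrics concrete, here is a minimal sketch with made-up counts (not actual NFL data) showing how recovery luck drives the two measures apart:

```python
# Hypothetical counts, invented for illustration only.
# The two metrics differ solely in the denominator: "fumbles lost"
# excludes fumbles that the fumbling team recovered itself.
plays = 1000
total_fumbles = 20
fumbles_recovered_by_own_team = 8
fumbles_lost = total_fumbles - fumbles_recovered_by_own_team  # 12

plays_per_fumble_lost = plays / fumbles_lost    # flattered by recovery luck
plays_per_total_fumble = plays / total_fumbles  # measures the fumbling itself

print(plays_per_fumble_lost)   # about 83.3
print(plays_per_total_fumble)  # 50.0
```

A team that happens to recover many of its own fumbles looks much better on plays per fumbles lost even if it fumbles at an ordinary rate, which is why the choice of denominator matters to the argument.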
The only true proof of cheating is if New England is caught red-handed. Show me the record of the pressures of the footballs and you have a believer. Show me someone who saw the footballs being deflated. Similarly, drug testing (statistics) did not nail Lance Armstrong (I wrote about it a lot); what finally brought him down was eyewitness testimony and law-enforcement investigations.
Also, this is an example of what Andrew Gelman has been calling "reverse causation" problems (link). We learn that New England did spectacularly on a metric, and we want to know what caused it. This is the opposite structure from an A/B test where we vary some causes, and observe how the variations affect an outcome. The reverse causation problem is one of the big issues of the Big Data era that isn't getting enough attention.
***
As this post is running long, I will leave other comments to Part 2. One issue I have with the statistical argument outlined above is that Points 1, 2, 3 and 4 are essentially restating the same thing four times.
I looked at some of the data factors: the players in NE and their records of fumbling and the number of fumbles by other players at the running back position. I didn't check receivers.
What I saw is that NE appears to choose lower-quality, more ball-secure running backs. All the better running backs in the league fumbled much more often. Comparing just meant getting a list and typing each name into the career stats page. Their last true top-quality rusher, Corey Dillon, fumbled at the typical rate for top running backs. Their lead back for the last few seasons, Stevan Ridley - who was injured much of this season - actually fumbled at the same rate as other top running backs, but he's played fewer years so his totals are lower.
I also checked the small number of cases where an NE player played elsewhere. I mention this to show the poor quality of some of the additional analysis I've read about this: more than once I've seen claims that Danny Woodhead's plays per fumble have dropped since leaving NE. The problem is he fumbled once in NE and once in San Diego as a running back (and once in each place as a receiver) but has played fewer years in SD, both because he's been there less time and because he's been hurt (and now he'll play less because he's also getting older).
In fact, if you go through the numbers, what jumps out is that the analysis basically reduces to the fact that BenJarvus Green-Ellis didn't fumble as a Patriot. Replace this one guy with a better running back who fumbles more often and you get no meaningful difference in plays per fumble. Interestingly, Green-Ellis was signed as a free agent by Cincinnati, asked to take the top running back role, and started to fumble. Not a lot, but some, and this turned out to be the end of his career: he didn't play in 2014, suggesting a guy without top-back quality who loses his special ball-security ability is toast. (There may be a difference in roles that's hard to quantify: in NE, he mostly ran the ball on safe plays when the team was ahead or when ball security was a major emphasis, but checking that would require figuring out how many of each type of play, etc.) If the entire argument is about this one guy, that's not much of an argument at all.
Posted by: jonathan | 01/27/2015 at 11:19 AM
jonathan: That's an interesting angle for investigating causes other than cheating. It is often more helpful to ask the more general question: is there a continuum of risk-taking among running backs? (Or are there types of running backs that result in more or less risk as measured by total fumbles?)
I also wonder about types of play calls. Another possibility is that NE calls less risky running plays; this is analogous to QBs who are not asked to take any risks (e.g. Alex Smith) and who rack up the plays with few turnovers.
Finally, it's always about the one guy. The question is whether that one guy is special, or that one guy could be any guy.
That's why this is a great example of the challenges of doing reverse causation analysis.
Posted by: junkcharts | 01/27/2015 at 12:19 PM
Can we revisit one point? "...the extreme value of New England's plays per fumble performance is not random fluctuation."
It needs to be emphasized that there is no such thing as "random fluctuation". We have done a great disservice in our mathematics education by trying to tell people that there is some magical "chance" that makes things happen. Random events are simply events whose causes are unknown (as you describe in asking what *caused* this measure).
Briggs says it very well: "It is to assume murky, occult causes are at work, pushing variables this way and that so that they behave properly. To say about a proposition X that “X is normal” is to ascribe to X a hidden power to be “normal” (or “uniform” or whatever). It is to say that dark forces exist which cause X to be normal, that X somehow knows the values it can take and with what frequency ... This is all incoherent. Each and every grade Sally received was caused, almost surely by a myriad of things, probably too many for us to track. But suppose each grade was caused by one thing and the same thing. If we knew this cause, we would know the value of x; x would be deduced from our knowledge of the cause. "
http://wmbriggs.com/post/14656
Posted by: Nate | 02/05/2015 at 10:16 AM
Nate: I didn't get into that issue because in this case, the plays per fumble statistic is an average and thus, by the Central Limit Theorem, it will be approximately normally distributed. Briggs is talking about assuming normality for the population. That said, I'm okay with Briggs's statements: (a) what is often called random error could better be called unexplained variance; and (b) nothing is distributed exactly as modeled. But I consider (a) not a common fallacy among people who practice statistics, and (b) true but inconsequential when the probability model is appropriate.
Posted by: junkcharts | 02/05/2015 at 07:50 PM