Gabe Murray wrote to Andrew Gelman, asking for comments about the accusations hurled at the current Tour de France front-runner Chris Froome. He said:
This post by VeloClinic has been getting a lot of media attention in the past few days, within the context of Chris Froome's dominant performance in the Tour de France:
The assumptions seem very dubious to me, and I would love to see a critique of their methods.
Andrew forwarded the email to me. I wrote about steroid testing and the Tour de France in my first book, Numbers Rule Your World. The gist was that those tests fail to catch most dopers, which of course is not a controversial claim today. But before the Armstrong confession, people believed all sorts of things.
I clicked on the link, and read through the nicely-written summary of how they estimated the probability of doping, and wrote back to Gabe:
It looks like a reasonable way of proceeding to me. When you say "The assumptions seem very dubious to me, and I would love to see a critique of their methods.", what do you mean exactly? Are you unhappy with the method or are you unhappy with the assumptions (such as the prevalence of doping, the effect of doping, etc.)?
He replied:
The part of the analysis that seemed most dubious to me was the estimate of the prevalence of doping in the peloton today, estimated at 25%. The authors seemed to think that was a conservative estimate, but it sounds very high to me. If it was 10 or 15 years ago, sure, I think the number would be even higher than that. But given the monitoring that has been put into place by UCI since then, including the biological passport and year-round monitoring, if the number of riders doping was nearly 25% then I would expect that the number of positive doping tests would be much higher than it is. But of course it's hard to know, and perhaps the doping is just very sophisticated micro-dosing.
Also, for the distribution of W/kg, the parameters of the Gaussian distribution for clean riders (shown in the first plot) suggest that there is a very narrow range of W/kg values for clean riders in the peloton. I don't know enough about sports physiology to know whether this is true, which is part of why I was looking for critical analysis from others. I would have expected the standard deviation to be larger.
Finally, I wonder if the Gaussian distribution is even appropriate here when modelling the distribution of W/kg values for the most elite athletes in the world.
That was a nice deconstruction of the model used for the prediction. Two normal distributions are postulated, one for clean riders and one for dopers. Then you mix those distributions.
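As a sketch of what mixing the two distributions implies, here is Bayes' rule for such a model. The symbols are my own notation, not VeloClinic's: \pi is the assumed prevalence of doping, x is the observed W/kg, and \varphi(x; \mu, \sigma) is the normal density with mean \mu and standard deviation \sigma, with subscripts c and d for clean riders and dopers.

```latex
P(\text{doped} \mid x)
  = \frac{\pi \,\varphi(x;\ \mu_d, \sigma_d)}
         {\pi \,\varphi(x;\ \mu_d, \sigma_d) + (1 - \pi)\,\varphi(x;\ \mu_c, \sigma_c)}
```

Each of Gabe's three points maps to one piece of this formula: the 25% prevalence is \pi, the narrow clean-rider range is \sigma_c, and the choice of a Gaussian is \varphi itself.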
Now my response to Gabe's critique.
Generally speaking, this type of conversation is in the same vein as "deflategate" in the NFL. Statistical modeling allows us to put a probability estimate on the chance that there is foul play. Note that it is an estimate, based on a model of the world. Further, note that the cheaters are adversaries who will hide their tracks, and most analyses simply fail to take that into account.
All models can be challenged on the grounds that you don't like the structure of the model. I personally don't like that kind of critique unless someone can offer an alternative model that both sides agree to be better.
On the specifics about the prevalence of doping, I wrote:
This is the issue of the "prior" in Bayesian statistics. In theory, you can run sensitivity analysis to see how the 25% assumption changes things.
In reality, I don't think it matters. Their model is that the bell curve shifts upwards for dopers (which I think is reasonable). This means that if you focus on the extreme tail of the combined distribution, it is always going to be the case that the majority of those extreme people would come from the doping distribution.
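To make the point about the prior concrete, here is a minimal sketch of that kind of sensitivity analysis. The means and standard deviations are made-up illustrative values, not the VeloClinic parameters, and the function names are my own.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Hypothetical (mean, sd) in W/kg, chosen only for illustration.
CLEAN = (5.8, 0.15)
DOPED = (6.2, 0.15)  # same bell curve, shifted upwards


def p_doped(w_per_kg, prior, clean=CLEAN, doped=DOPED):
    """Posterior probability of doping given W/kg, via Bayes' rule."""
    num = prior * normal_pdf(w_per_kg, *doped)
    den = num + (1.0 - prior) * normal_pdf(w_per_kg, *clean)
    return num / den


if __name__ == "__main__":
    # Compare a mid-range performance with an extreme one across priors.
    for prior in (0.05, 0.25, 0.50):
        mid = p_doped(6.0, prior)
        tail = p_doped(6.3, prior)
        print(f"prior={prior:.2f}  P(doped|6.0)={mid:.3f}  P(doped|6.3)={tail:.3f}")
```

Under these assumed numbers, the posterior at the midpoint between the two curves simply equals the prior, while at 6.3 W/kg it stays above 0.9 whether the prior is 5%, 25%, or 50%. That is the sense in which the prior stops mattering in the extreme tail.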
Later, Gabe confirmed what I just said, as he played with the spreadsheet from VeloClinic:
Based on your comments, I should clarify that I am not opposed to attempts to estimate the probability of doping given power, and I accept that all models are approximations. I do think it's appropriate to question whether the parameters of a model are sensible, and I gave some specific examples that I thought could be improved. Also, I am not at all opposed to them using W/kg as a proxy measure.
Playing around with their provided spreadsheet ... does confirm that changing the prior does not have a big impact. However, changing the standard deviation of the clean rider distribution does.
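Gabe's observation about the standard deviation can be reproduced in a rough way with the sketch below. All parameter values are hypothetical placeholders rather than the spreadsheet's numbers; only the clean-rider standard deviation is varied, while the prior and the doper distribution are held fixed.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_doped(w_per_kg, clean_sd, prior=0.25,
                    clean_mean=5.8, doped_mean=6.2, doped_sd=0.15):
    """P(doped | W/kg) for a two-normal mixture.

    Every default here is an assumed, illustrative value.
    """
    num = prior * normal_pdf(w_per_kg, doped_mean, doped_sd)
    den = num + (1.0 - prior) * normal_pdf(w_per_kg, clean_mean, clean_sd)
    return num / den


if __name__ == "__main__":
    # Same performance, same prior; only the clean-rider spread changes.
    for clean_sd in (0.15, 0.20, 0.30):
        p = posterior_doped(6.3, clean_sd)
        print(f"clean sd={clean_sd:.2f}  P(doped | 6.3 W/kg)={p:.3f}")
```

With these assumed values, widening the clean-rider spread from 0.15 to 0.30 pulls the posterior at 6.3 W/kg down substantially, because a wider clean distribution puts more clean riders in the tail.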
To which I responded:
I think the question that gets to the core of this is: what should a reasonable model predict to be the chance of doping given extreme levels of performance? When you get into the extremes, it seems to me any reasonable model would say the same thing this model is saying - that the more extreme the performance, the more likely the athlete has doped. I don't think statistical modeling can give you the kind of answers you are looking for.
I'd add that given the revelations around Armstrong, we should be modeling: the more naturally gifted a performer, the more likely he/she dopes. People said the opposite before, but they were wrong. Because of this, it is even more difficult to separate out doping from talent!
In the email exchange, I didn't comment on the standard deviation point that Gabe raised. By increasing the standard deviation of the clean riders, you are explicitly allowing for more extreme performers who are clean, so what he said sounds right. In this scenario, I'd have shifted the dopers' distribution in proportion to the standard deviation.
I say there is great progress because reporters no longer believe the story that if you pass dozens of tests, you must be clean!
[I should add that Gabe later pointed out that it is no longer clear whether Froome was an outlier or not on that proxy metric for performance.]