
Tiger tiger

Picked up the Metro paper the other day and found them ventilating about the possibility that Tiger Woods used steroids; the news was that a Canadian doctor whom he (and other professional athletes) had hired was caught with HGH and drug equipment. In the section on why Tiger couldn't be doping, the following chart appeared:


According to this line of argument, since steroids should improve driving distances, and since driving distance determines overall performance, the fact that his average driving distance "remained almost constant throughout the years" proved that he did not dope.

Now, I have no idea if he dopes or not. But this particular argument is full of holes. In the modern era, steroids are used not just for enhancing brute strength but also for shortening recovery times, prolonging training, etc. Also, the argument holds only if overall performance depends heavily on driving distance.

The bar chart has multiple problems:

  • The choice of starting the vertical scale at 250 is completely arbitrary and, as has been shown before, cutting off the bottoms of bars is a bad idea -- the lengths of the remaining parts are no longer proportional to the stated data.
  • The choice of the three years is also unexplained, especially since 2001 is not midway between 1997 and 2009.
  • The horizontal gridlines are totally redundant since all three numbers sit in the very last section (290-300).  
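
To see how much a cut-off axis distorts bar lengths, here is a minimal sketch. The two values are simply the edges of the 290-300 gridline band, used for illustration; they are not Tiger's actual averages, and the helper name `drawn_ratio` is made up for this example.

```python
def drawn_ratio(a, b, baseline):
    """Ratio of the bar lengths for b and a when the axis starts at `baseline`."""
    return (b - baseline) / (a - baseline)

# Illustrative values only: the edges of the 290-300 band, not real data.
low, high = 290.0, 300.0

true_ratio = drawn_ratio(low, high, baseline=0.0)         # bars start at zero
truncated_ratio = drawn_ratio(low, high, baseline=250.0)  # bars start at 250

print(f"ratio with zero baseline: {true_ratio:.3f}")
print(f"ratio with 250 baseline:  {truncated_ratio:.3f}")
```

With the axis starting at 250, a difference of about 3% in the data is drawn as a 25% difference in bar length.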

Why were those three years chosen?  The following line chart that plots all the data may give us a clue:


The choice of 2001 and 2009 means we missed the peak of his driving distance performance.  Looking at the standardized units, we see that at its peak, the driving distance was about 2.6 standard deviations above his career average (the zero line using the scale on the right).

The difference between 1997 and the peak was about 20, which looks large compared to the standard deviation of 6 over this entire period. Establishing a reference point is very important when interpreting any observed difference.
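
The arithmetic behind that comparison can be sketched in a couple of lines. The two inputs are the approximate figures quoted above; the function name `in_sd_units` is just a label for this example.

```python
def in_sd_units(difference, sd):
    """Express a raw difference as a multiple of the standard deviation."""
    return difference / sd

gap = 20.0  # approximate gap between 1997 and the peak, as quoted in the post
sd = 6.0    # approximate standard deviation over the whole period, as quoted

gap_in_sd = in_sd_units(gap, sd)
print(f"gap is about {gap_in_sd:.1f} standard deviations")
```

A gap of more than three standard deviations is hard to describe as "almost constant".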

This is one of the few occasions where double axes can be recommended.  The two axes in fact plot the same data, only reflecting a difference in scale.
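
The reason the two axes are harmless here is that one is just a linear rescaling of the other. A minimal sketch of that mapping follows; the standard deviation of 6 comes from the post, but `CAREER_MEAN` is a placeholder I made up, since the career average is not stated here.

```python
CAREER_MEAN = 295.0  # placeholder value, NOT from the source data
CAREER_SD = 6.0      # approximate standard deviation quoted in the post

def to_sd_units(distance):
    """Map a value on the left axis to standardized units (right axis)."""
    return (distance - CAREER_MEAN) / CAREER_SD

def to_distance(z):
    """Inverse mapping: standardized units back to the left-axis scale."""
    return CAREER_MEAN + z * CAREER_SD

# The career average sits on the zero line of the right axis.
print(to_sd_units(CAREER_MEAN))

# A peak at 2.6 standard deviations maps back to a single left-axis value.
peak = to_distance(2.6)
print(round(peak, 1), round(to_sd_units(peak), 1))
```

Because each tick on the left has exactly one counterpart on the right, the two scales cannot be shifted independently to manufacture a pattern. If you draw this with matplotlib, a forward/inverse pair like this is the kind of thing `Axes.secondary_yaxis` accepts through its `functions` argument.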

Reference: "Three reasons to believe he's totally clean", Metro USA, Dec 16 2009.



Jon Peltier

If you select your data carefully, you can let yourself believe anything.
