Knowledge transfer

Graphs are indispensable if one is to make sense of large data sets.  Kraig W. pointed us to some of the "bump charts" he made of the 2007 Tour de France, and indeed they are quite powerful.  (Because of the amount of data, you'd need to see the pop-up image to make sense of it.)

[Figure: bump chart of rider rankings by stage, 2007 Tour de France]

As someone who only has cursory knowledge of the Tour, I learnt a lot from this graph alone.  The chart traced the ranking of each rider through 20 stages in the competition.

  • Roughly, I am aware that the winner wears the yellow jersey so the yellow line traces the progress of the eventual champion.  I also know that the green jersey has something to do with sprinting and so I surmise from this chart that the sprint stages are close to the beginning of the tour and the best sprinter either lost interest or faded away over the course of the race.
  • At least the green jersey winner didn't bail out of the race.  Another thing we see is that about 180 riders started on day "0" and about 140 finished the tour.  (The hash marks on the right play a crucial role here.)
  • The bailout lines (that shoot to the skies) should be removed, both because the same information is provided by the gradual step-down of the lines and because these "explosions" are very ugly.
  • Especially intriguing to me is the variance across stages in their effect on the ranking.  Some stages like 1, 2, 5 and 6 pretty much preserved the ranks.  However, stages like 5, 7 and 8 resulted in wholesale redistribution of ranks, and not by small changes either.  Is this tactical movement dictated by the teams, or a stage-specific influence?
  • Then, for stages 9 and 14, only the front half of the ranks were shaken.  Stage 13 also stood out: here, almost everyone shifted ranks but only by a little.
  • I'm not sure where the pre-Tour ranking comes from (stage -1).  The Tour organizers certainly did not reference those.
  • I'd imagine that if different teams were plotted with different colors, we may see team tactics in motion.
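
One way to put a number on how much each stage shook up the ranks is the average absolute rank change from one stage to the next. The sketch below is only illustrative: the `ranks` dictionary is made-up data, not taken from the chart, and `stage_churn` is a hypothetical helper name.

```python
# Hypothetical data: ranks[rider] holds one general-classification rank per
# stage; None marks a rider who has left the race.
ranks = {
    "A": [1, 1, 3, 2],
    "B": [2, 2, 1, 1],
    "C": [3, 4, 2, 3],
    "D": [4, 3, None, None],
}


def stage_churn(ranks):
    """Mean absolute rank change for each stage-to-stage transition.

    Riders without a rank on either side of a transition are skipped.
    """
    n_stages = len(next(iter(ranks.values())))
    churn = []
    for s in range(1, n_stages):
        moves = [
            abs(r[s] - r[s - 1])
            for r in ranks.values()
            if r[s] is not None and r[s - 1] is not None
        ]
        churn.append(sum(moves) / len(moves) if moves else 0.0)
    return churn


churn = stage_churn(ranks)
for stage, value in enumerate(churn, start=1):
    print(f"stage {stage}: mean rank change {value:.2f}")
```

A low value corresponds to a stage that "pretty much preserved the ranks"; a high value corresponds to wholesale redistribution. A single average would not separate a stage where only the front half moved from one where everyone moved a little, so the distribution of changes would need a closer look for that.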

Would it have been better to plot the "lag times from the leader" rather than ranks?  Hard to say.  Plotting time differentials would tell us more, since ranks discard the magnitude of the gaps.  However, it could make the chart look even messier.
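
To see what ranks throw away, here is a minimal sketch that derives both the rank and the gap to the leader from cumulative times. The times are invented for illustration, and `ranks_and_gaps` is a hypothetical helper, not anything used to build the chart.

```python
# Hypothetical cumulative times, in seconds, after some stage.
times = {"A": 3600, "B": 3605, "C": 3900}


def ranks_and_gaps(times):
    """Map each rider to (rank, seconds behind the leader).

    Ties keep the input order, because sorted() is stable.
    """
    ordered = sorted(times, key=times.get)
    leader_time = times[ordered[0]]
    return {
        rider: (position, times[rider] - leader_time)
        for position, rider in enumerate(ordered, start=1)
    }


table = ranks_and_gaps(times)
for rider, (rank, gap) in table.items():
    print(rider, rank, gap)
```

In this made-up case B and C sit one rank apart, yet B trails the leader by 5 seconds and C by 300. A rank chart draws those two steps identically; a gap chart would not.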

Graphs are efficient in transferring knowledge.  Imagine having to stare at a large table of rankings instead!

Source: BikeTechReview.com, KDUBlog, July 30 2007.

Comments

Hadley Wickham

There are more examples of using parallel coordinates for the Tour de France at http://statisticalgraphics.blog.com/, including ranking, time per stage and cumulative time.

Michael S.

The bailout lines do help show that the eventual winner owes his position somewhat to the disqualification of riders ahead of him (stages 15 & 16--did it happen twice? I can't remember) but there must be a more elegant way to indicate this.

derek

I'm a little surprised that the convention is for rank 1 to be at the bottom of the chart. I would have had the high ranks high.

JB

1. The green jersey is the "points" leader. Points are given for reaching sprint finishes first, second, third, etc. These finishes can be at the end of a stage or somewhere during the middle.

2. Sprinters are typically not good climbers. As soon as the race hits the mountains, the green jersey holder is just trying not to get eliminated.

3. The complete redistribution of rankings is usually due to the stage or its format, not a team maneuver. The prologue (Stage 0) and Stage 13 were individual time trials (riders leave one or two minutes apart for a relatively short ride). Stage 5 was the first climbing stage. 7, 8, and 9 were in the Alps. 14-16 were the Pyrenees.

4. "Bailing out" can mean that a rider "crashes out", arrives at the finish too late and is then DQ'ed, or just quits. It depends on the stage, but you have to finish within a certain percentage of the stage winner's time (e.g. 150%).

5. I think the bail-out lines are useful because they indicate the position of the riders who bailed. The most important bail-out was after Stage 16, when the race leader was pulled from the Tour by his team after allegations he had dodged off-season drug testing.

Steve

I agree that the bail-out lines are useful and important, but I fear that they might be constructed artificially by setting each bailing-out rider's next rank to some out-of-range value like 1000 for the stage after the rider bails out. That may be why they are not parallel. I think it might be better to simply remove the skyrocketing bail-out lines and mark the end point of each of those rider's lines with say a red dot or something to indicate where they were ranked when they left the Tour. Otherwise a very fascinating and educational chart.
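
A rough sketch of that alternative, assuming a dropout is stored as a missing value (`None`) rather than an out-of-range rank: the rider's line simply ends at the last stage ridden, and that last point doubles as the marker. The function name and the sample series are hypothetical.

```python
def split_for_plot(series):
    """Return (points, exit_marker) for one rider's rank series.

    points: (stage, rank) pairs for every stage with a recorded rank.
    exit_marker: the last recorded (stage, rank) if the rider left before
    the final stage, otherwise None.
    """
    points = [(stage, rank) for stage, rank in enumerate(series) if rank is not None]
    left_early = bool(points) and points[-1][0] < len(series) - 1
    return points, (points[-1] if left_early else None)


# Hypothetical rider who abandons after stage 1 of a four-stage race.
points, marker = split_for_plot([5, 4, None, None])
print(points)  # the line to draw
print(marker)  # where to put the red dot
```

With no sentinel value in the data there is nothing to shoot off the top of the chart, and the markers can be counted directly.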

Alpha Chen

Speaking of graphs being efficient in transferring knowledge, you should check out this great talk at TED which made amazing use of graphs and data:

http://www.youtube.com/watch?v=hVimVzgtD6w

zbicyclist

You are only eligible for the green jersey if you finish the race, so the green jersey winner has to stay in by definition -- you don't get the jersey if you drop out, even if you have the most points.

derek

Does anybody know where to find a reasonable data set of "General Classification" rank for each stage? The examples I got from a brief googling session were highly inconvenient, and mostly did not even include ranks lower than ten or fifteen.

9.2.5

Might be interesting to include a small second graph along the X axis showing the profile of each stage - or even simply icons for stage type - time trial (individual race v. clock), mountain, or flat. It could lend some quick explanation to which stages are the ones that re-organized the GC.

Might also be fun to explore how to show which stage(s) the eventual jersey winners wore the jersey during the race and showing stage winners - perhaps both could be done by varying line weights?

Stat

Hi,

I blog about statistics too. I've linked to you.

Hopefully you can link back.

Thanks,
Stat

dark_angel

hi there these web site is wack

jb

zbicyclist:
VeloNews has the stage-by-stage results.

http://www.velonews.com/tour2007/results/

derek

jb, I expect you meant me, not zbicyclist (it's a confusing effect of the design of the comments section that there is a line between the comment and the commenter, but only white space between the previous commenter and the next comment).

It was velonews's list that I extracted in the end, but I have to say their professional consistency was appalling; the results seem to have been created anew by a different reporter every day, and they each had a different idea about how the list should be arranged, and what the names of the competitors were. Get a database you guys!

Still, this is what I produced after some data cleaning: graph of Tour de France 2007

* I reversed the order of the ranks, with the leader at the top instead of the bottom
* I marked the riders who quit or were canned by a blue marker instead of the shooting lines. It's now much easier to count them and note their rank at the stage they left.
* I labelled the stages with flat, mountain etc.
* I named the three winners, and one loser

Unusually, I chose a dark background, because I couldn't think of an alternative to the iconic convenience of yellow line=yellow jersey.

At ~350Kb, the graphic is a much larger file size than I normally use, because of the complexity of all those lines in a raster image. I'm not really sure they add much value per kilobyte: the rest of the graph would be not bad even by itself, and take up many fewer kilobytes. But if your display technology is high res and kilobytes don't cost much, then there's no harm, I suppose, and Tufte would approve :-)

Kaiser

Derek: excellent effort! I also like 9.2.5's idea of combining it with a "relief map" of the course. This addition reminds me of the famous Napoleon Russian campaign map.
