Yellow, green and polka dots

Reader Joran recalled our earlier feature on Tour de France bumps charts, made at the time by Kraig, and decided to make his own for this year's Tour. (He typically blogs about Nordic skiing.)

Here are some highlights:

You'd notice a similar pattern in 2010 as in 2007. The yellow jersey pretty much stays at the front of the pack throughout, the green jersey (sprints) eventually fades away, and the polka dot jersey (mountains) improves as the tour continues.

From the design perspective, one decision concerns whether the colored lines track the jersey or the current owner of the jersey. Over the course of the tour, a jersey changes owners, possibly multiple times. What to do?
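To make the two options concrete, here is a minimal sketch. The riders, ranks and jersey holders are made up for illustration, and the function names are my own, not anything from Joran's code. One line follows the jersey from rider to rider; the other follows a single rider for the whole race.

```python
# Hypothetical data: ranks[rider][i] is the overall rank after stage i+1,
# and green_holder[i] is the rider wearing the green jersey after stage i+1.
ranks = {
    "A": [1, 2, 3, 5],
    "B": [4, 1, 1, 2],
    "C": [2, 3, 2, 1],
}
green_holder = ["C", "C", "A", "A"]


def jersey_line(ranks, holder):
    """Overall rank of whoever holds the jersey at each stage (line follows the jersey)."""
    return [ranks[rider][stage] for stage, rider in enumerate(holder)]


def owner_line(ranks, rider):
    """Overall rank of one fixed rider at every stage (line follows the person)."""
    return list(ranks[rider])


print(jersey_line(ranks, green_holder))       # [2, 3, 3, 5]
print(owner_line(ranks, green_holder[-1]))    # [1, 2, 3, 5]
```

The two lines coincide only for the stages in which the chosen rider actually wears the jersey, so the choice changes what the reader sees in the earlier stages.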




Notice that the top of the chart slopes downwards; this is due to riders withdrawing during the course of the race.

In the second chart, Joran brings this out by tracking each withdrawn rider up to the stage at which they dropped out, so we can see where they ranked at the time they faltered.
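A rough sketch of how the withdrawal points might be pulled out of the rank data, assuming (my assumption, not necessarily how Joran's data is laid out) that a rider's rank is recorded as `None` for every stage after they leave the race. The rider names and numbers are invented.

```python
# Hypothetical input: rank after each stage, None once a rider has left the race.
ranks = {
    "Rider A": [3, 2, 2, 1],
    "Rider B": [10, 15, None, None],
    "Rider C": [7, None, None, None],
}


def withdrawals(ranks):
    """Map each withdrawn rider to (last stage completed, rank at that stage)."""
    out = {}
    for rider, series in ranks.items():
        # Keep only the stages the rider actually finished, numbered from 1.
        completed = [(i + 1, r) for i, r in enumerate(series) if r is not None]
        # A rider counts as withdrawn if they finished some, but not all, stages.
        if completed and len(completed) < len(series):
            out[rider] = completed[-1]
    return out


print(withdrawals(ranks))
# {'Rider B': (2, 15), 'Rider C': (1, 7)}
```

Each (stage, rank) pair is where a red dot would go; the truncated series up to that point is the foreground line.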

This shows good use of foreground/background to bring out aspects of the data. In the original post, when you mouse over the red dots, a label appears showing the name of the rider.



In this next chart, a small multiples format is adopted, with the riders from each team plotted together and each team in a separate plot. This allows us to see the relative performance easily. Joran tried using one plot with many colors and, not surprisingly, discovered that the resulting chart is unreadable. The small multiples format is a solution to this problem.

As someone not too familiar with the race, I find the high variance of the ranking within each team to be unexpected. I can't explain why this would be. In particular, even when a team (Saxobank) has a highly ranked cyclist, it's interesting that the other members of the team are ranked much lower. I thought that team members try to cluster together and protect the team leader. Well, you may be able to make more sense out of this than I can.

I think these charts are ordered alphabetically by the name of the team -- I'd order them by the rank of the leading cyclist of each team.
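The reordering is a one-line sort. A minimal sketch, with made-up team names and final ranks, where a team's "leading cyclist" is taken to be whichever of its riders has the best (lowest) final rank:

```python
# Hypothetical data: final overall ranks of the riders on each team.
final_rank = {
    "Team X": [12, 45, 3],
    "Team Y": [1, 80, 95],
    "Team Z": [30, 31, 29],
}


def order_by_leader(final_rank):
    """Return team names sorted by the best (lowest) rank among each team's riders."""
    return sorted(final_rank, key=lambda team: min(final_rank[team]))


print(order_by_leader(final_rank))
# ['Team Y', 'Team X', 'Team Z']
```

The panels would then be laid out in that order, so the team with the overall leader comes first.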


Another improvement is to label the stages as Mountain vs. Sprint. This can be done by coloring the column for the respective stage... sort of like those economic charts where they color the periods of recession. This helps explain what we are seeing, why some riders achieve drastic improvements (or reductions) in ranks over some stages.
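One way to prepare the shading, sketched with an invented stage list: collapse consecutive stages of the same type into bands, each running from half a stage before its first stage to half a stage after its last, so the band is centered on the stage columns. The half-stage padding is my own choice. Each band could then be handed to whatever your plotting tool uses to shade a vertical region (in matplotlib, for instance, `axvspan`).

```python
from itertools import groupby

# Hypothetical stage profile, one entry per stage in race order.
stage_types = ["sprint", "sprint", "mountain", "mountain", "mountain", "sprint"]


def stage_spans(stage_types):
    """Collapse runs of same-type stages into (x_start, x_end, type) bands."""
    spans = []
    stage = 1  # stages are numbered from 1 on the x-axis
    for kind, run in groupby(stage_types):
        n = len(list(run))
        spans.append((stage - 0.5, stage + n - 0.5, kind))
        stage += n
    return spans


for span in stage_spans(stage_types):
    print(span)
# (0.5, 2.5, 'sprint')
# (2.5, 5.5, 'mountain')
# (5.5, 6.5, 'sprint')
```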


What is clear is that having domain knowledge is an important asset to making good charts. Research is key. This is something Joran also realized, and it's useful to read his commentary about the issues of interpreting the data, being able to recognize typos, etc.



Re: ranking variance within the team: this is exactly what I would expect from team members protecting the team leader. They work to protect the team leader stage-by-stage *without regard for their own overall ranking*. So the people who are good in the mountains will not do anything heroic on the sprint stages, and vice versa; then on the stages that are their specialty, when the team leader needs their skills, they'll step up and help him out, while the other specialties hang back. A chart of the stage-by-stage results should probably show that the team leader is never far from his teammates, but it's a different set of teammates depending on the stage profile.

Rob Meekings

It would be interesting to colour the line segments in the small multiples chart where riders are wearing one of the jerseys. This could then be used to look for correlations between, for example, the sprint jersey and leader's jersey in the early stages, etc.



Ranks alone are by far not enough to understand what is going on within the Tour, though they are one interesting perspective to start with.
Take a look at this post to get all three views:
(i) stage times
(ii) cumulative times
(iii) ranks

There is also a link to the data and the software that allows you to create these charts on your own and select the features you are interested in.

I am about to add the rider types (i.e., climber, helper, etc.), which will be a nice additional insight, and which I will post in a special after the Tour has finished.
(Btw, you will find visualizations and data of the last 5 Tours there as well.)
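The three views in the comment above are linked by simple arithmetic: cumulative times are running sums of stage times, and ranks come from sorting the cumulative times after each stage. A minimal sketch with invented times (in seconds); ties are broken by dictionary order here, which is an arbitrary choice on my part and not the official race rule.

```python
from itertools import accumulate

# Hypothetical stage times in seconds, one list per rider.
stage_times = {
    "A": [100, 210, 95],
    "B": [102, 200, 99],
    "C": [101, 205, 90],
}


def cumulative(stage_times):
    """Running total of each rider's stage times."""
    return {rider: list(accumulate(times)) for rider, times in stage_times.items()}


def overall_ranks(cum):
    """Rank riders after each stage by cumulative time (1 = fastest)."""
    n_stages = len(next(iter(cum.values())))
    out = {rider: [] for rider in cum}
    for s in range(n_stages):
        order = sorted(cum, key=lambda rider: cum[rider][s])
        for position, rider in enumerate(order, start=1):
            out[rider].append(position)
    return out


cum = cumulative(stage_times)
print(cum)                  # {'A': [100, 310, 405], 'B': [102, 302, 401], 'C': [101, 306, 396]}
print(overall_ranks(cum))   # {'A': [1, 3, 3], 'B': [3, 1, 2], 'C': [2, 2, 1]}
```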


Cool plots, Martin! Lots of different ways to look at stage race data like this... I used the rest day today to go back and grab data for this year's Giro! Should be fun...
