Buffer time

As this report from the Department of Transportation makes clear, congestion on our roadways causes travellers to add "buffer time" to their planned journeys.  So, for instance, one may have to allocate 32 minutes for a trip that would have taken 20 minutes in uncongested traffic if one would like to guarantee on-time arrival.  The 12 minutes would either become time spent sitting on the road or wasted time due to arriving too early.
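
The arithmetic in that example is simple enough to write down. The sketch below is only an illustration of the 32-versus-20-minute case; the function name `buffer_time` is made up here and is not a Department of Transportation definition.

```python
def buffer_time(planned_minutes, uncongested_minutes):
    """Return (buffer in minutes, buffer as a fraction of the uncongested time)."""
    buffer = planned_minutes - uncongested_minutes
    return buffer, buffer / uncongested_minutes

# The example from the text: allow 32 minutes for a 20-minute trip.
buffer, share = buffer_time(32, 20)
print(f"buffer: {buffer} min, {share:.0%} on top of the uncongested trip")
```

Expressing the buffer as a share of the uncongested time makes trips of different lengths comparable.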

Buffer time can be applied to graphs too.  Some graphs require readers to spend time fishing out the information.  The chart used to illustrate travel time belongs to this category. 
The clock analogy fails; in fact, it confuses matters as the hour hand just sits there serving no purpose.  The buffer time between staring and comprehending is too much!

Only four numbers underlie this chart: travel time when uncongested and buffer time to guarantee on-time arrival, for 1982 and 2001.  The following version gets to the point without fuss. 
It shows that the travel time increased significantly even under uncongested traffic; worse, the buffer time multiplied.

Reducing buffer time is always good but some buffer time may be inevitable.  In the traffic analogy, to eliminate all buffer time would mean lots of unused capacity.  In the context of graphs, more complicated charts would require more time; the key is whether the reader is rewarded for the time spent figuring out the chart.

Source: "Traffic Congestion and Reliability", Department of Transportation.

Structuring a chart

This chart from the NYT was intended to show how the EPA has moved the bar on vehicle mileage ratings: 2008 estimates were lower than 2007 estimates across the board, regardless of manufacturer, model and city/highway.

The chart was built from one basic component, repeated for each model. 
I like the discreet gridlines (the white ticks) which enable readers to count off the mileage ratings.

The data are rich: ratings were given along three dimensions (model, year of estimate and city/highway).  Readers would benefit from stronger guidance on where to look for the most pertinent information.  As the chart stands, it is merely a container for the data.  It fails our self-sufficiency test: all the data were printed on the chart, and the bars add little.

In the junkart version, I use knowledge of the data to structure the chart. First, noting that sedans, hybrids and trucks/SUVs/minivans have different levels of mileage ratings, I clustered the models into three groups.  Second, the city and highway ratings were separated into two columns, as I consider the between-model comparisons more important than city-highway comparisons. 
The chart is a dot plot, with a vertical tick for 2007 estimates and a dot for 2008 estimates.  It's easy to see that all dots sit to the left of vertical ticks.

More subtly, we can also see that the hybrids appeared to have been penalized more.  Or perhaps, the higher the rating, the larger the downward adjustment...
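
For readers who want to try the structure described above (three clusters, a tick for 2007, a dot for 2008), here is a rough plain-text sketch. The model names and mpg values are invented placeholders, not the NYT figures, and only one column (say, city ratings) is drawn; `dot_row` and `render` are names made up for this illustration.

```python
# Hypothetical ratings in mpg, NOT the NYT data: (model, 2007 estimate, 2008 estimate)
GROUPS = {
    "Sedans": [("Sedan A", 24, 21), ("Sedan B", 30, 26)],
    "Hybrids": [("Hybrid A", 60, 48), ("Hybrid B", 40, 34)],
    "Trucks/SUVs/minivans": [("Truck A", 16, 14), ("SUV A", 19, 17)],
}

def dot_row(old, new, lo=10, hi=60):
    """One text row on a shared mpg scale: '|' is the 2007 estimate, 'o' the 2008 one."""
    cells = [" "] * (hi - lo + 1)
    cells[old - lo] = "|"
    cells[new - lo] = "o"  # written last, so the dot wins if the two coincide
    return "".join(cells)

def render(groups):
    """Stack the rows, one cluster at a time, so similar vehicles sit together."""
    lines = []
    for group, models in groups.items():
        lines.append(group)
        for name, old, new in models:
            lines.append(f"  {name:<10}{dot_row(old, new)}")
    return "\n".join(lines)

print(render(GROUPS))
```

Because every row shares the same scale, a dot to the left of its tick reads immediately as a lower 2008 estimate, and the size of the gap can be compared across clusters.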

Source: "Mileage Ratings Are Still Estimates, Though Closer to Reality", New York Times, Sept 16 2007.

Knowledge transfer

Graphs are indispensable if one is to make sense of large data sets.  Kraig W. pointed us to some of the "bump charts" he made of the 2007 Tour de France, and indeed they are quite powerful.  (Because of the amount of data, you'd need to see the pop-up image to make sense of it.)


As someone who only has cursory knowledge of the Tour, I learnt a lot from this graph alone.  The chart traced the ranking of each rider through 20 stages in the competition.

  • Roughly, I am aware that the winner wears the yellow jersey so the yellow line traces the progress of the eventual champion.  I also know that the green jersey has something to do with sprinting and so I surmise from this chart that the sprint stages are close to the beginning of the tour and the best sprinter either lost interest or faded away over the course of the race.
  • At least the green jersey winner didn't bail out of the race.  Another thing we see is that about 180 riders started on day "0" and about 140 finished the tour.  (The hash marks on the right play a crucial role here.)
  • The bailout lines (that shoot to the skies) should be removed, because the same information is provided in the gradual step-down of the lines, and not least because these "explosions" are very ugly.
  • Especially intriguing to me is the variance in the effect of different stages on ranking.  Some stages like 1, 2, 5 and 6 pretty much preserved the ranks.  However, stages like 5, 7 and 8 resulted in wholesale redistribution of ranks: not small changes either.  Is this tactical movement dictated by the teams or stage-specific influence?
  • Then, for stages 9 and 14, only the front half of the ranks were shaken.  Stage 13 also stood out: here, almost everyone shifted ranks but only by a little.
  • I'm not sure where the pre-Tour ranking comes from (stage -1).  The Tour organizers certainly did not reference those.
  • I'd imagine that if different teams were plotted with different colors, we might see team tactics in motion.

Would it have been better to plot the "lag times from the leader" rather than ranks?  Hard to say.  Plotting time differentials would tell us more, since ranks discard the magnitude information.  However, it could make the chart look even messier.
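
To make the rank-versus-lag trade-off concrete, here is a small sketch that derives both from cumulative times and also measures how much each stage shuffles the ranks. The riders and times are invented for illustration (this is not the Tour data behind the chart), and the helper names are hypothetical.

```python
# Hypothetical cumulative times in seconds after each stage, NOT real Tour data.
CUMULATIVE = {
    "Rider A": [3600, 7300, 11200],
    "Rider B": [3605, 7250, 11500],
    "Rider C": [3700, 7400, 11210],
}

def ranks_and_gaps(times):
    """times: {rider: cumulative seconds}. Returns {rider: (rank, gap to leader)}."""
    ordered = sorted(times, key=times.get)
    leader_time = times[ordered[0]]
    return {r: (i + 1, times[r] - leader_time) for i, r in enumerate(ordered)}

def per_stage(cumulative):
    """Standings after every stage, as a list of ranks_and_gaps() results."""
    n_stages = len(next(iter(cumulative.values())))
    return [ranks_and_gaps({r: t[s] for r, t in cumulative.items()})
            for s in range(n_stages)]

def mean_rank_shift(before, after):
    """Average absolute change in rank between two consecutive stages."""
    return sum(abs(after[r][0] - before[r][0]) for r in before) / len(before)

stages = per_stage(CUMULATIVE)
for s in range(1, len(stages)):
    shift = mean_rank_shift(stages[s - 1], stages[s])
    print(f"stage {s + 1}: mean rank shift = {shift:.2f}")
```

In this toy data, second place is 5 seconds behind the leader after the first stage and 10 seconds behind after the third, yet both show up as the same rank. That is the magnitude information a bump chart of ranks throws away.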

Graphs are efficient in transferring knowledge.  Imagine having to stare at a large table of rankings instead!

Source: BikeTechReview.com, KDUBlog, July 30 2007.

Read fast, pay the price

At first, this looks like a decent chart despite the donut construct, which I cannot stand (but the Economist loves).


The accompanying text proclaimed: "Rock stars are famous for excess, and some pay the price".  The rest of the paragraph points out drug- and alcohol-related deaths, plus deaths due to "unhealthy lifestyles", which apparently include cancer and cardiovascular disease.

There is a gaping hole between what's on the chart and what's in the text.  They just talk past each other.

  • The chart invites us to compare the European experience to the American experience. Each donut presents the proportion of total deaths by cause of death. The top donut presents American rock-star deaths, the bottom European ones. But this comparison has zilch to do with the key point, which is how rock stars are different from the rest of us.  The chart tells us nothing about the rest of us.  The 20% of deaths attributed to cancer would be entirely unremarkable if 20% of non-rock-star deaths also were attributed to cancer!
  • We must also bear in mind that the base populations are rock stars who died young. This is a very specific demographic segment, and so the only valid point of reference is people who died young.  If we think along those lines, then among unmusical people who died young, what might have been the causes of death?  Drugs? Alcohol?  Accidents?  Suicide?  You bet.  I am not sure which source is authoritative for such data, but the CDC reported that among Americans aged 15-34 who died, the leading causes were "unintentional injury", suicides, homicides, cancer and heart disease.  Not much different from the above list...
  • The deaths depicted in the two donuts totaled fewer than 100, and yet percentages are given to one decimal place.  This creates a false sense of precision not justified by the sample size.
  • The deaths occurred over about 50 years.  It is very likely that the causes of premature death have shifted during this time span, making an aggregate analysis questionable.
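
The point about false precision can be checked with a back-of-the-envelope calculation. The sketch below uses the textbook normal approximation for the margin of error of a proportion; the inputs (a 20% share in a donut of 50 deaths) are assumptions for illustration, since the text only says the two donuts together cover fewer than 100 deaths.

```python
import math

def proportion_margin(p, n, z=1.96):
    """Half-width of the normal-approximation 95% interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical inputs: a 20% share in a donut built from 50 deaths.
p, n = 0.20, 50
margin = proportion_margin(p, n)
print(f"share: {p:.1%}, margin of error: about {margin:.0%}")
```

With a sample this small, the uncertainty is on the order of ten percentage points, so a digit after the decimal point carries no information. The normal approximation is itself crude at this sample size, which only strengthens the argument.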

Charting is much more than just aesthetics.  Some basic statistical common sense goes a long way.  This was observed long ago by Huff.

Source: "Rock stars: live fast, die young", Economist, Sept 4 2007.