
A multidimensional graphic that holds a number of surprises, via NYT

The New York Times has an eye-catching graphic illustrating the Amtrak crash last year near Philadelphia. The article is here.

The images accompanying the article differ in how much contextual detail they offer readers.

This graphic provides an overview of the situation:


Initially, I had a fair amount of trouble deciphering this chart. I was searching hard for the contrast between the orange (labeled RECENT TRAINS) and the red (labeled TRAIN # 188). The orange color forms a wavy area akin to a river on a map. The red line segments suggest bridges that span the river banks. The visual cues kept telling me train #188 was a typical train, but that conclusion was obviously wrong.

The confusion went away after I read the next graphic:


This zoomed-in view offered some helpful annotation. The data came from three days of trains prior to the accident. Surprisingly, the orange band does not visualize a range of speeds. The width of the orange band fluctuates with the median speed over those three days. And then, the red line segments represent the speed of train #188 as it passed through specific points on the itinerary.

The key visual element to look for is the red lines exceeding the width of the orange band as train #188 rounds Frankford Junction.


In the second graphic, the speeding is more visible. But it can be made even more prominent. For example, instead of line segments, use the same curvy element to portray the speed of train #188. Then through line width or color, emphasize train #188 and push the average train to the background.


Notice that there is an additional line snaking through the middle of the orange band. The data have been centered around this line. This type of centering is problematic: the excess speed relative to the median train has been split into halves. The reader must mentally reassemble the halves. The impact of the speeding has therefore been artificially muted.
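To see how much the centering mutes the message, here is a minimal sketch in Python. The function names and the two speeds are illustrative assumptions, not values taken from the NYT data. It compares how far the red segment sticks out past the band on each side when both are centered on the same line with how far it would stick out if both were measured from a common baseline.

```python
def centered_overhang(median_speed, train_speed):
    """Excess visible on EACH side of the band when the band and the
    train's segment are both centered on the same line."""
    return (train_speed - median_speed) / 2


def baseline_deviation(median_speed, train_speed):
    """Excess visible when both speeds are measured from one baseline."""
    return train_speed - median_speed


# Hypothetical speeds in mph, chosen only for illustration.
median_speed = 50.0
train_speed = 100.0

per_side = centered_overhang(median_speed, train_speed)
full = baseline_deviation(median_speed, train_speed)

print(f"overhang per side when centered: {per_side} mph")
print(f"deviation from a common baseline: {full} mph")
```

With these made-up numbers, the centered encoding shows two overhangs that are each half the size of the true difference, which is the reassembly job the reader is being asked to do.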

In this next version, I keep that midpoint line and use it to indicate the median speed of the trains. Then, I show how train #188 diverged from the median speed as it neared the Junction.



This version brings out one other confusing element of the original. The line that marks the median speed also traces the geographic path of the train. In fact, the line does not encode speed itself--it only sets the reference level for speed. The graphic above creates an impression that train #188 "ran off track" if the reader interprets the green line as a railroad track on a map. But the train is off in speed, not in physical location.






Adam Schwartz

It doesn't seem like the lead-up data to the derailment yields much information (although it reminds me of Minard's "Napoleon's March" graphic without all the valuable info). Train 188 pretty much stayed near the median speed of trains at every marked point - until it didn't. If the train was speeding along the length of the journey then the accident would seem, well, less accidental. But as it is, if the operator really just lost a sense of where they were on the route it seems to me that's a lousy safeguard for proper speed control to have the operator "know" the course. We don't do that with automobile drivers; that's why we have speed limit signs.


I don't have anything to add other than that I am a regular reader of your blog and love it. Keep up the great work.


I think the way that I would really like to see this is with the first map, as is, aligned with two charts, above or below, that map the speeds.

One, a standard line chart, with the median speed as one line, and train 188 as another. The x axis aligning roughly with the spatial layout of the route.

The other, a variance from median chart.

I think having the visual of the route is great, but I don't think the bands and lines give any good understanding of the speeds at all.

Something like this (very very very rough) mock-up:


Is median the best statistic to use here? It tells us that the train entered the bend faster than average, but not that it necessarily entered too quickly. Did other trains over the 3 days enter the bend at the same speed? Maybe that driver often approaches the bend at that speed and that's accounted for in the median by another driver always entering the bend extremely slowly.

It does show that before the bend, the train was travelling at the same speed as the average train and has gone into the bend faster than you would expect based on its journey until that time.
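The point about the median can be illustrated with a rough sketch in Python. All the speeds below are hypothetical and are not taken from the three days of data behind the graphic. Two sets of recent trains share the same median at the bend, yet the same fast train is an outlier in one set and within the observed range in the other.

```python
from statistics import median


def summarize(speeds):
    """Return (median, minimum, maximum) of a list of speeds."""
    return median(speeds), min(speeds), max(speeds)


# Hypothetical speeds (mph) of recent trains at the bend.
tight = [48, 49, 50, 51, 52]   # every train enters at about the same speed
wide = [20, 35, 50, 90, 98]    # same median, very different spread

train = 95  # hypothetical speed of the train of interest

for name, speeds in (("tight", tight), ("wide", wide)):
    med, low, high = summarize(speeds)
    print(f"{name}: median={med}, range={low}-{high}, "
          f"above median={train > med}, above all recent trains={train > high}")
```

Showing the range, or every individual run, alongside the median would let the reader tell these two situations apart.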


To be honest I think these are awful. I would suggest a small multiples approach. I think the map is good, but I might do one map coloured by speed for the typical train and put another one for the accident train next to it.
