A great visual of complicated schedules

Reader Joe D. tipped me off to a nice visualization project by a pair of grad students at WPI (link). They displayed data about the Boston subway system (i.e., the T).

The project has many components, one of which is the visualization of the location of every train in the Boston T system on a given day. This results in a very tall chart, the top of which I clipped:


I recall that Tufte praised this type of chart in one of his books. It is indeed an exquisite design, attributed to Marey. It provides data on both time and space dimensions in a compact manner. The slope of each line is positively correlated with the velocity of the train (I say correlated because the actual distances between stations are not constant, as this chart portrays them to be). The authors acknowledge the influence of Tufte in their credits, and I recognize a couple of signatures:

  • For once, I like how they hide the names of the intermediate stations along each line while retaining the names of the key stations. Too often, modern charts banish all labels to hover-overs, which is a practice I dislike. When you move the mouse horizontally across the chart, you will see the names of the unnamed stations.
  • The text annotations on the right column are crucial to generating interest in this tall, busy chart. Without those hints, readers may get confused and lost in the tapestry of schedules. If you scroll to the middle, you find an instance of train delay caused by a disabled train. Even with the hints, I find that it takes time to comprehend what the notes are saying. This is definitely a chart that rewards patience.

Clicking on a particular schedule highlights that train, pushing all the other lines into the background. The side panel provides a different visual of the same data, using a schematic subway map.


Notice that my mouse is hovering over the 6:11 am moment (represented by the horizontal guide on the right side). This generates a snapshot of the entire T system, shown on the left. The map shows the location of every train in the system at 6:11 am. The circled dot is the particular Red Line train I clicked on earlier.

This is a master class in linking multiple charts and using interactivity wisely.


You may feel that the chart using the subway map is more intuitive and much easier to comprehend. It also becomes very attractive when the dots (i.e., trains) are animated and shown to move through the system. That is the image that the project designers have blessed with the top position on their GitHub page.

However, the image above allows us to see why the Marey diagram is the far superior representation of the data.

What are some of the questions you might want to answer with this dataset? (The Q of our Trifecta Checkup)

Perhaps figure out which trains were behind schedule on a given day. We can define behind-schedule as slower than the average train on the same route.
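That definition is simple enough to sketch in a few lines of Python. This is only an illustration under my own assumptions: the trip records, route names and running times below are made up, not taken from the project's data, and `behind_schedule` is a hypothetical helper, not something the designers provide.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trips: (trip id, route, end-to-end running time in minutes).
# These numbers are invented for illustration, not real MBTA data.
trips = [
    ("r1", "Red", 42.0),
    ("r2", "Red", 44.0),
    ("r3", "Red", 58.0),
    ("b1", "Blue", 20.0),
    ("b2", "Blue", 19.0),
    ("b3", "Blue", 21.0),
]


def behind_schedule(trips):
    """Return ids of trips that took longer than their route's average."""
    times_by_route = defaultdict(list)
    for _, route, minutes in trips:
        times_by_route[route].append(minutes)

    # Average running time per route.
    average = {route: mean(times) for route, times in times_by_route.items()}

    # A trip is "behind schedule" if it is slower than its route's average.
    return [
        trip_id
        for trip_id, route, minutes in trips
        if minutes > average[route]
    ]


late = behind_schedule(trips)
print(late)
```

Note that under this definition some trains will be flagged on almost any day, since it compares each train against an average. In practice one might prefer a margin, say a few minutes above the average, but that is a refinement beyond what I defined above.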

It is impossible to figure this out on the subway map. The static version presents a snapshot while the dynamic version has moving dots, from which readers are challenged to estimate their velocities. The Marey diagram shows all of the other schedules, making it easier to find the late trains.

Another question you might ask is how a delay in one train propagates to other trains. Again, the subway map doesn't show this at all but the Marey diagram does, although here one can nitpick and say that even the Marey diagram suffers from overcrowding.


On that last question, the project designers offer up an alternative Marey. Think of this as an indexed view. Each trip is indexed to its starting point. The following setting shows the morning rush hour compared to the rest of the day:


I think they could make better use of this display if they showed the hourly average instead of every single schedule. Instead of letting readers play with the time scale, they should pre-compute the periods that are the most interesting, which, according to the text, are the morning rush, afternoon rush, midday lull and evening lull.
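Here is a rough sketch of what I mean by indexing each trip to its starting point and then averaging by hour. The trips are hypothetical (times in minutes after midnight at three stations), and I assume every trip stops at the same stations in the same order. The names `index_trip` and `hourly_average` are mine, not the project's.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trips: arrival time at each station, in minutes after midnight.
# The first entry is the departure from the starting station.
trips = [
    [7 * 60 + 5, 7 * 60 + 9, 7 * 60 + 15],     # departs 7:05
    [7 * 60 + 35, 7 * 60 + 41, 7 * 60 + 49],   # departs 7:35
    [13 * 60 + 10, 13 * 60 + 13, 13 * 60 + 18],  # departs 13:10
]


def index_trip(stops):
    """Re-express a trip as minutes elapsed since its own departure."""
    start = stops[0]
    return [t - start for t in stops]


def hourly_average(trips):
    """Average the indexed trips, grouped by hour of departure."""
    profiles_by_hour = defaultdict(list)
    for stops in trips:
        hour = stops[0] // 60
        profiles_by_hour[hour].append(index_trip(stops))

    # zip(*profiles) lines up the same station across all trips in the hour.
    return {
        hour: [mean(column) for column in zip(*profiles)]
        for hour, profiles in profiles_by_hour.items()
    }


averages = hourly_average(trips)
for hour, profile in sorted(averages.items()):
    print(hour, profile)
```

Each hour then contributes a single line to the chart, so the number of trains running in that hour no longer affects how dense the display looks. The same grouping could be done by period (morning rush, midday lull, and so on) rather than by clock hour.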

The trouble with showing every line is that the density of lines is affected by the frequency of trains. The rush hours have more trains, causing the lines to be denser. The density gradient competes with the steepness of the lines for our attention, and completely overwhelms it.


There really is a lot to savor in this project. You should definitely spend some time reviewing it. Click here.

Also, there is still time to sign up for my NYU chart-making workshop, starting on Saturday. For more information, see here.

