
A multidimensional graphic that holds a number of surprises, via NYT

The New York Times has an eye-catching graphic illustrating the Amtrak crash last year near Philadelphia. The article is here.

The images accompanying this article differ in how much contextual detail they offer readers.

This graphic provides an overview of the situation:


Initially, I had a fair amount of trouble deciphering this chart. I was searching hard to find the contrast between the orange (labeled RECENT TRAINS) and the red (labeled TRAIN # 188). The orange color forms a wavy area akin to a river on a map. The red line segments suggest bridges that span the river. The visual cues kept telling me train #188 is a typical train, but that conclusion was obviously wrong.

The confusion went away after I read the next graphic:


This zoomed-in view offered some helpful annotation. The data came from three days of trains prior to the accident. Surprisingly, the orange band does not visualize a range of speeds. The width of the orange band fluctuates with the median speed over those three days. And then, the red line segments represent the speed of train #188 as it passed through specific points on the itinerary.

The key visual element to look for is the red lines exceeding the width of the orange band as train #188 rounds Frankford Junction.


In the second graphic, the speeding is more visible. But it can be made even more prominent. For example, instead of line segments, use the same curvy element to portray the speed of train #188. Then through line width or color, emphasize train #188 and push the average train to the background.


Notice that there is an additional line snaking through the middle of the orange band. The data have been centered around this line. This type of centering is problematic: the excess speed relative to the median train has been split into halves. The reader must mentally reassemble the halves. The impact of the speeding has therefore been artificially muted.
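To make the halving concrete, here is a minimal sketch. The speeds are made-up round numbers, not readings from the NYT graphic, and the function names are my own. It compares how much of the excess speed shows on each side of a center line with how much shows when both speeds are measured from a common baseline.

```python
def centered_excess(train_speed, median_speed):
    """Excess visible on each side of the center line when both
    the band and the train's segment are centered on that line."""
    half = (train_speed - median_speed) / 2
    return half, half


def baseline_excess(train_speed, median_speed):
    """Excess visible in one piece when both speeds start from the same baseline."""
    return train_speed - median_speed


# Hypothetical speeds in mph, for illustration only.
train, median = 100, 60

print(centered_excess(train, median))
print(baseline_excess(train, median))
```

With centering, the reader sees two small overhangs and has to add them up; with a common baseline, the whole gap appears at once.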

 In this next version, I keep that midpoint line and use it to indicate the median speed of the trains. Then, I show how train #188 diverged from the median speed as it neared the Junction.



This version brings out one other confusing element of the original. The line that traces the median speed also traces the geographic path of the train. Strictly speaking, the line does not encode speed; it only marks the reference level of speed. The graphic above creates an impression that train #188 "ran off track" if the reader interprets the green line as a railroad track on a map. But it is off in speed, not in physical location.




The many-faced area chart is not usually your best choice

I found this chart about the exploding U.S. debt levels on ZeroHedge (link), sourced from Citibank.

Citi debt total

The top-line story is pretty easy to see: total debt levels have almost reached the peak of the 1930s. (Ignore that dreadful labeling of the years on the horizontal axis.)

Now, the three colors supposedly carry further insights related to the components of the debt. The problem is it is very hard to figure out which component(s) are responsible for the debt explosion. The choice of the area chart adds to our trouble.

Here are two other area charts that display the same three data series.


Just look at the yellow patch. The left chart gives the wrong impression of steep growth, refuted by the right chart. For the three data series, there are six unique area charts that one can produce!
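The count of six is simply the number of ways to order three layers. The sketch below enumerates the orderings and shows why position in the stack matters. The component names are placeholders (the post names only government debt) and the values are invented for illustration, not taken from the Citi data.

```python
from itertools import permutations

# Placeholder names: only government debt is identified by name in the post.
components = ["government", "component_b", "component_c"]

# Made-up values for illustration only, not the Citi data.
series = {
    "government": [1.0, 2.0, 4.0],   # growing
    "component_b": [1.0, 1.0, 1.0],  # flat
    "component_c": [2.0, 2.0, 3.0],
}


def stacked_tops(order, data):
    """Upper boundary of each layer when layers are stacked bottom-first in `order`."""
    running = [0.0] * len(data[order[0]])
    tops = {}
    for name in order:
        running = [r + v for r, v in zip(running, data[name])]
        tops[name] = running
    return tops


# Every stacking order gives a different-looking area chart.
orders = list(permutations(components))
print(len(orders))

# The flat series drawn at the bottom versus on top of the growing series.
on_bottom = stacked_tops(("component_b", "government", "component_c"), series)
on_top = stacked_tops(("government", "component_b", "component_c"), series)
print(on_bottom["component_b"])
print(on_top["component_b"])
```

The flat series has a level top edge when it sits at the bottom, but its top edge climbs when it rides on a growing layer. The thickness of the patch is the same in both cases; only the layer on the bottom has a flat baseline to judge it against.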

The following smoothed line chart gives an accurate picture of the relative changes in levels of the debt components:

Government debt was the primary driver of the exploding debt both in the 1930s and in the present era. The other debt components also rose but not quite as much. All data series are converted into indices, with 1920 as the reference year.
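For readers who want to reproduce the indexing step, here is a minimal sketch. It assumes each series is a mapping from year to level; the helper name `to_index`, the base of 100, and the sample values are my own choices and not the actual debt data.

```python
def to_index(series, ref_year, base=100.0):
    """Rescale a {year: value} series so that the reference year equals `base`."""
    ref = series[ref_year]
    if ref == 0:
        raise ValueError("reference-year value must be non-zero")
    return {year: base * value / ref for year, value in series.items()}


# Invented values, for illustration only.
sample = {1920: 50.0, 1930: 75.0, 1940: 60.0}
print(to_index(sample, 1920))
```

Once every series starts from the same index value, the lines can be compared on relative growth rather than on absolute size.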

A scatter plot with connecting lines sometimes produces a more visual portrayal of "home-coming", although in this case I am not sure the advantage is clear.


This chart requires more attentive reading. It does make the point that by 2015, the level of government debt has exceeded the previous peak (1950) while the other two debt components are fast reaching the prior peak (1934).



Super-informative ping-pong graphic

Via Twitter, Mike W. asked me to comment on this WSJ article about ping pong tables. According to the article, ping pong table sales track venture-capital deal flow:


This chart is super-informative. I learned a lot from this chart, including:

  • Very few VC-funded startups play ping pong, since the highlighted reference lines show 1000 deals and only 150 tables (!)
  • The one San Jose store interviewed for the article is the epicenter of ping-pong table sales, therefore they can use it as a proxy for all stores and all parts of the country
  • The San Jose store only does business with VC startups, which is why they attribute all ping-pong tables sold to these companies
  • Startups purchase ping-pong tables in the same quarter as their VC deals, which is why they focus only on within-quarter comparisons
  • Silicon Valley startups only source their office equipment from Silicon Valley retailers
  • VC deal flow has no seasonality
  • Ping-pong table sales have no seasonality either
  • It is possible to predict the past (VC deals made) by gathering data about the future (ping-pong tables sold)

Further, the chart proves that one can draw conclusions from a single observation. Here is what the same chart looks like after taking out the 2016 Q1 data point:


This revised chart is also quite informative. I learned:

  • At the same level of ping-pong-table sales (roughly 150 tables), the number of VC deals ranged from 920 to 1020, about one-third of the vertical range shown in the original chart
  • At the same level of VC deals (roughly 1000 deals), the number of ping-pong tables sold ranged from 150 to 230, about half of the horizontal range of the original chart

The many quotes in the WSJ article also tell us that people in Silicon Valley are no more data-driven than people in other parts of the country.

The surprising impact of mixing chart forms

At first glance, this Wall Street Journal chart seems unlikely to impress as it breaks a number of "rules of thumb" frequently espoused by dataviz experts. The inconsistency of mixing a line chart and a dot plot. The overplotting of dots. The ten colors...


However, I actually like this effort. The discontinuity of chart forms nicely aligns with the split between the actual price movements on the left side and the projections on the right side.

The designer also meticulously placed the axis labels with monthly labels for actual price movements and quarterly labels for projections.

Even the ten colors are surprisingly manageable. I am not sure we need to label all those banks; maybe just the ones at the extremes. If we clear out some of these labels, we can make room for a median line.


How good are these oil price predictions? It is striking that every bank shown is predicting that oil prices have hit a bottom, and will start recovering in the next few quarters. Contrast this with the left side of the chart, where the line is basically just tumbling down.

Step back six months earlier, to September 2015. The same chart looks like this:


Again, these analysts were calling a bottom in prices and predicting a steady rise over the next quarters.

The track record of these oil predictions is poor:


The median analyst predicted that oil prices would reach $50 by Q1 of 2016. Instead, prices fell to $30.
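To put a number on the miss, here is a small sketch using the two figures quoted above. The function name is my own, and expressing the miss as a percentage of the actual price is just one of several reasonable conventions.

```python
def forecast_error(predicted, actual):
    """Return (absolute miss, miss as a percentage of the actual value)."""
    miss = predicted - actual
    return miss, 100.0 * miss / actual


# $50 predicted versus $30 realized, as quoted in the text.
miss, pct = forecast_error(50.0, 30.0)
print(miss, round(pct, 1))
```

On these figures the median forecast overshot by $20, or about two-thirds of the price that actually prevailed.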

Given this track record, it's shocking that these predictions are considered newsworthy. One wonders how these predictions are generated, and how the analysts justified ignoring the prevailing trend.