« August 2023 | Main | October 2023 »

Dataviz in camouflage

This subway timetable in Tokyo caught my eye:


It lists the departure times of all trains going toward Shibuya on Saturdays and holidays.

It's a "stem-and-leaf" plot.

The stem-and-leaf plot is a crude histogram. In this version, the stem is the hour of the day (24-hour clock) and the leaf is the minute (between 0 and 59). The longer the leaf, the higher the frequency of trains.
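To make the stem-and-leaf idea concrete, here is a minimal Python sketch that builds the same layout from a list of "HH:MM" departure times: the hour becomes the stem and each minute becomes a leaf. The function names `stem_and_leaf` and `render` and the sample times are hypothetical illustrations, not data from the actual timetable.

```python
from collections import defaultdict


def stem_and_leaf(departures):
    """Group 'HH:MM' departure times into a stem-and-leaf table.

    Stem = hour of the day (24-hour clock), leaf = minute (0-59).
    Returns a dict mapping each hour to its sorted list of minutes.
    """
    table = defaultdict(list)
    for t in departures:
        hour, minute = (int(part) for part in t.split(":"))
        table[hour].append(minute)
    # Sort the stems, and the leaves within each stem.
    return {hour: sorted(mins) for hour, mins in sorted(table.items())}


def render(table):
    """Format one row per hour; a longer row means more trains in that hour."""
    lines = []
    for hour, minutes in table.items():
        leaves = " ".join(f"{m:02d}" for m in minutes)
        lines.append(f"{hour:2d} | {leaves}")
    return "\n".join(lines)


# Hypothetical departure times, not taken from the real timetable.
sample = ["08:05", "08:12", "08:20", "08:31", "09:02", "09:33", "10:15"]
print(render(stem_and_leaf(sample)))
```

Counting the leaves in each row (`len(minutes)`) gives the bar heights of the equivalent histogram, which is why the row lengths can be read as train frequency.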

We can see that there isn't one peak but rather a plateau between hours 9 and 18.


Contrast this with the weekday schedule in blue:


We can clearly see two rush hours: one peak at hour 8 and a second at hours 17-18.

Love seeing dataviz in camouflage!


What is the question is the question

I picked up a copy of Fortune magazine while traveling and saw this bag-of-bubbles chart.


This chart is visually appealing, and that must be said. Each circle represents the reported revenues of corporations on the “Global 500 Companies” list, aggregated by the location of their headquarters, and is labeled with that city. The largest bubble is Beijing, the capital of China, indicating that companies based in Beijing together account for $6 trillion in revenues. The color of the bubbles shows large geographical units; the red bubbles are cities in Greater China.

I appreciate a couple of the design decisions. The chart title and legend are placed at the top, making it easy to get one’s bearings – effective yet non-intrusive. The labeling signals a layering: the first and biggest group has icons; the second-biggest group has both names and values inside the bubbles; the third group has values inside the bubbles but names outside; the smallest group has no labels.

Note the judgement call the designer made. For cities that readers might not be familiar with, a country name (typically abbreviated) is added. This is a tough call since familiarity varies from reader to reader.


As I discussed before (link), the bag of bubbles does not elevate comprehension. Try answering any of the following questions, which any of us might have, using only the bag of bubbles:

  • What proportion of the total revenues are found in Beijing?
  • What proportion of the total revenues are found in Greater China?
  • What are the top 5 cities in Greater China?
  • What are the ranks of the six regions?
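By contrast, once the same data sit in a plain table, each of these questions becomes a one-line computation. The Python sketch below is a minimal illustration under stated assumptions: the `Record` type, the helper functions, and every city, region, and revenue value are hypothetical placeholders, not the Fortune Global 500 figures.

```python
from collections import defaultdict
from typing import NamedTuple


class Record(NamedTuple):
    city: str
    region: str
    revenue: float  # revenue attributed to HQs in this city (hypothetical units)


def share_of_total(records, keep):
    """Fraction of all revenue held by the records for which keep(record) is True."""
    total = sum(r.revenue for r in records)
    return sum(r.revenue for r in records if keep(r)) / total


def top_cities(records, region, n=5):
    """Names of the n largest cities (by revenue) within one region."""
    in_region = [r for r in records if r.region == region]
    ranked = sorted(in_region, key=lambda r: r.revenue, reverse=True)
    return [r.city for r in ranked[:n]]


def rank_regions(records):
    """Regions ordered from largest to smallest total revenue."""
    totals = defaultdict(float)
    for r in records:
        totals[r.region] += r.revenue
    return sorted(totals, key=totals.get, reverse=True)


# Hypothetical placeholder data, not the real chart values.
data = [
    Record("City A", "Region 1", 6.0),
    Record("City B", "Region 1", 2.0),
    Record("City C", "Region 2", 3.0),
    Record("City D", "Region 3", 1.0),
]

print(share_of_total(data, lambda r: r.city == "City A"))  # 0.5
print(round(share_of_total(data, lambda r: r.region == "Region 1"), 3))  # 0.667
print(top_cities(data, "Region 1"))  # ['City A', 'City B']
print(rank_regions(data))  # ['Region 1', 'Region 2', 'Region 3']
```

The four calls map onto the four questions in order: one city's share, one region's share, the top cities within a region, and the ranking of regions.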

If we apply the self-sufficiency test and remove all the value labels, it’s even harder to figure out what’s what.



Moving to the D corner of the Trifecta Checkup, we aren’t sure how to interpret this dataset. It’s unclear whether these companies derive most of their revenues locally or internationally. A company headquartered in Washington D.C. may earn most of its revenues elsewhere. Even if Beijing-based companies serve mostly Chinese customers, only a minority of their revenues would be drawn directly from Beijing. Some U.S. corporations may choose their headquarters based on tax considerations. It’s a bit misleading to assign all of a company’s revenues to one city.

As we explore this further, it becomes clear that the designer must establish a target – a strong idea of what question s/he wants to address. The Fortune piece comes with an accompanying paragraph, which suggests that an important story is the spatial dispersion of corporate revenues in different countries. The writers point out that U.S. corporate HQs are more geographically distributed than Chinese corporate HQs, which tend to be concentrated in the key cities.

There is a disconnect between the Question and the Data used to create the visualization. There is also a disconnect between the Question and the Visual display.