Unlocking the secrets of a marvellous data visualization

The graphics team at my hometown paper, SCMP, has developed a formidable reputation in data visualization, and I lapped up every drop of goodness in this beautiful graphic showing how the coronavirus spread around Hong Kong (in the first wave in April). Marcelo uploaded an image of the printed version to his Twitter. This graphic occupied the entire back page of that day's paper.

An online version of the chart is found here.

The data graphic is a masterclass in organizing data. While it looks complicated, I had no problem unpacking the different layers.

Cases were divided into imported cases (people returning to Hong Kong) and local cases. A small number of cases are considered in-betweens.


The two major classes then occupy one half page each. I first looked at the top half, where my attention is drawn to the thickest flows. The majority of imported cases arrived from the U.K., and most of those were returning students. The U.S. is the next largest source of imported cases. The flows are carefully ordered by continent, with the Americas on the left, followed by Europe, Middle East, Africa, and Asia.


Where there are interesting back stories, the flow blossoms into a flower. An annotation explains the cluster of cases. Each anther represents a case. Eight people caught the virus while touring Bolivia together.


One reads the local cases in the same way. Instead of flowers, think of roots. The biggest cluster by far was a band that played at clubs in three different parts of the city, infecting a total of 72 people.


Everything is understood immediately, without a need to read text or refer to legends. The visual elements carry that kind of power.


This data graphic presents a perfect amalgam of art and science. In a flow chart, the data are encoded in the relative thickness of the lines. This leaves two unused dimensions of these lines: the curvature and the lengths. The order of the countries and regions takes up the horizontal axis, but the vertical axis is free. Unshackled from the data, the designer introduced curves into the lines, varied their lengths, and dispersed their endings around the white space in an artistic manner.

The flowers/roots present another opportunity for creativity. The only data constraint is the number of cases in a cluster. The positions of the dots, and the shape of the lines leading to the dots are part of the playground.

What's more, the data visualization is a powerful reminder of the benefits of testing and contact tracing. The band cluster led to the closure of bars, which helped slow the spread of the coronavirus. 


The windy path to the Rugby World Cup

When I first saw the following chart, I wondered whether it is really that challenging for these eight teams to get into the Rugby World Cup, currently being played in Japan:


Another visualization of the process conveys a similar message. Both of these are uploaded to Wikipedia.


(This one hasn't been updated and still contains blank entries.)


What are some of the key messages one would want the dataviz to deliver?

  • For the eight countries that got in (not automatically), track their paths to the World Cup. How many competitions did they have to play?
  • For those countries that failed to qualify, track their paths to the point that they were stopped. How many competitions did they play?
  • What is the structure of the qualification rounds? (These are organized regionally, in addition to certain playoffs across regions.)
  • How many countries had a chance to win one of the eight spots?
  • Within each competition, how many teams participated? Did the winner immediately qualify, or face yet another hurdle? Were the losers immediately eliminated, or were they offered another chance?

Here's my take on this chart:



The ebb and flow of an effective dataviz showing the rise and fall of GE

A WSJ chart caught my eye the other day – I spotted someone looking at it in a coffee shop, and immediately got hold of a copy. The chart plots the ebb and flow of GE’s revenues from the 1980s to the present.

What grabbed my attention? The less-used chart form, and the appealing but not too gaudy color scheme.

The chart presents a highly digestible view of the structure of GE’s revenues. We learn about GE’s major divisions, as well as how certain segments split from or merged with others over time. Major acquisitions and divestitures are also depicted; if these events are the main focus, the designer should find ways to make these moments stand out more.

An interesting design decision concerns the sequence of the divisions. One possible order is by increasing or decreasing importance, typically indicated by proportional revenues. This is complicated by the changing nature of the business over the decades. So financial services went from nothing to the largest division by far to almost disappearing.

The sequencing need not be data-driven; it can be design-constrained. The merging and splitting of business units are conveyed via linking arrows. Longer arrows are unsightly, and meshes of arrows are confusing.

On this chart, the long arrow pointing from the orange to the gray around 2004 feels out of place. What if the financial services block is moved to the right of the consumer block? That will significantly shorten the long arrow. It won’t create other entanglements as the media block is completely disjoint and there are no other arrows tying financial services to another division.



To improve readability, the bars are spaced out horizontally. The addition of whitespace distorts the proportionality. So, in 2001, the annotation states that financial services (orange) accounted for “about half of the revenues,” which is directly contradicted by the visual perception – readers find the orange bar to be clearly shorter than the total length of the other bars. This is a serious deficiency of the chart form but this chart conveys the "ebb and flow" very well.

The merry-go-round of investment bankers

Here is the start of my blog post about the chart I teased the other day:



Today's post deals with the following chart, which appeared recently at Business Insider (hat tip: my sister).

It's immediately obvious that this chart requires a heroic effort to decipher. The question shown in the chart title "How many senior investment bankers left their firms?" is the easiest to answer, as the designer places the number of exits in the central circle of each plot relating to a top-tier investment bank (aka "featured bank"). Note that the visual design plays no role in delivering the message, as readers just scan the data from those circles.

Anyone persistent enough to explore the rest of the chart will eventually discover these features...


The entire post including an alternative view of the dataset is a guest blog at the JMP Blog here. This is a situation in which plotting everything will make an unreadable chart, and the designer has to think hard about what s/he is really trying to accomplish.

Made in France stereotypes

France is on my mind lately, as I prepare to bring my dataviz seminar to Lyon in a couple of weeks.  (You can still register for the free seminar here.)

The following Made in France poster brings out all the stereotypes of the French.


(You can download the original PDF here.)

It's a sankey diagram with so many flows that it screams "it's complicated!" This is an example of a graphic in want of a story. In a Trifecta Checkup, it's failing in the Q(uestion) corner.

It's also failing in the D(ata) corner. Take a look at the top of the chart.


France exported $572 billion worth of goods. The diagram then plots eight categories of exports, ranging from wines to cheeses:


Wine exports totaled $9 billion which is about 1.6% of total exports. That's the largest category of the eight shown on the page. Clearly the vast majority of exports are excluded from the sankey diagram.
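The arithmetic is worth a quick check. Here is a minimal sketch using only the two figures quoted above; the upper bound rests on the assumption, stated in the text, that wine is the largest of the eight categories shown, so the eight together cannot exceed eight times the wine figure.

```python
# Figures quoted in the post, in billions of US dollars.
total_exports = 572
wine_exports = 9
categories_shown = 8

# Share of total exports accounted for by wine.
wine_share = wine_exports / total_exports

# Wine is the largest category shown, so all eight categories
# together are worth at most 8 x the wine figure.
upper_bound_share = categories_shown * wine_exports / total_exports

print(f"Wine share of total exports: {wine_share:.1%}")
print(f"Upper bound for all eight categories: {upper_bound_share:.1%}")
```

Even under the most generous reading, the eight categories cover roughly an eighth of French exports at best.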

Are the 8 the largest categories of exports for France? According to this site, those are (1) machinery (2) aircraft (3) vehicles (4) electrical machinery (5) pharmaceuticals (6) plastics (7) beverages, spirits, vinegar (8) perfumes, cosmetics.

Compare: (1) wines (2) jewellery (3) perfume (4) clothing (5) cheese (6) baked goods (7) chocolate (8) paintings.

It's stereotype central. Name 8 things associated with the French brand and cherry-pick those.

Within each category, the diagram does not show all of the exports either. It discloses that the bars for wines show only $7 of the $9 billion worth of wines exported. This is because the data only capture the "Top 10 Importers." (See below for why the designer did this... France exports wine to more than 180 countries.)

Finally, look at the parade of key importers of French products, as shown at the bottom of the sankey:


The problem with interpreting this list of countries is best felt by attempting to describe which countries ended up on this list! It's the list of countries that belong to the top 10 importers of one or more of the eight chosen products, ordered by the total value of imports in those 8 categories only but only including the value in any category if it rises to the top 10 of the respective category.

In short, with all those qualifications, the size or rank of the black bars does not convey any useful information.
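To see why the ranking is unreliable, here is a small sketch of the selection rule as I understand it from the poster. The function name, the countries "A", "B", "C" and all the numbers are made up for illustration, and I use a top-2 cutoff instead of top-10 to keep the example short.

```python
from collections import defaultdict

def importer_totals(imports, top_n=10):
    """Rank importers the way the poster appears to.

    imports: {category: {country: value}} (hypothetical structure).
    A country's value in a category is counted only if the country
    is among the top_n importers of that category.
    """
    totals = defaultdict(float)
    for by_country in imports.values():
        # Keep only the top_n importers within this category.
        top = sorted(by_country.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
        for country, value in top:
            totals[country] += value
    # Order countries by their truncated totals, largest first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Made-up import values for two categories and three countries.
imports = {
    "wine": {"A": 10, "B": 8, "C": 7},
    "cheese": {"C": 6, "A": 5, "B": 1},
}

truncated = importer_totals(imports, top_n=2)  # poster-style rule
complete = importer_totals(imports, top_n=3)   # every value counted

print("truncated:", truncated)
print("complete: ", complete)
```

In this toy data, country C ranks second when every value is counted but drops to last under the truncated rule, because its wine imports just miss the cutoff. The bars can therefore misorder countries even within the eight chosen categories.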


One feature of the chart that surprised me was the absence of flows in the Wine category from France to Italy or Spain. (Based on the above discussion, you should realize that no flows does not mean no exports.) So I went to the Comtrade database that is referenced in the poster, and pulled out all the wine export data.

How does one visualize where French wines are going? After fiddling around with the numbers, I came up with the following diagram:


I like this type of block diagram which brings out the structure of the dataset. The key features are:

  • The total wine exports to the rest of the world was $1.4 billion in 2016
  • Half of it went to five European neighbors, the other half to the rest of the world
  • On the left half, Germany took a third of those exports; the UK and Switzerland together took another third; and the final third went to Belgium and the Netherlands
  • On the right half, the countries in the blue zone accounted for three-fifths with the unspecified countries taking two-fifths.
  • As indicated, the two-fifths (in gray) represent 20% of total wine exports, and were spread out among over 180 countries.
  • The three-fifths of the blue zone were split in half, with the first half going to North America (about 2/3 to USA and 1/3 to Canada) and the second half going to Asia (2/3 to China and 1/3 to Japan)
  • As the title indicates, the top 9 importers of French wine covered 80% of the total volume (in litres) while the other 180+ countries took 20% of the volume
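The nested fractions in the list above can be checked for consistency with a few lines of code. This is only a sketch: the fractions are the rounded ones quoted in the bullets (several are described as "about"), not the underlying Comtrade figures.

```python
from fractions import Fraction as F

# Top-level split: half to five European neighbors, half to everyone else.
europe = F(1, 2)
rest = F(1, 2)

# Right half: three-fifths blue zone, itself split evenly between
# North America and Asia; two-fifths unspecified countries.
north_america = rest * F(3, 5) * F(1, 2)
asia = rest * F(3, 5) * F(1, 2)

shares = {
    "Germany": europe * F(1, 3),
    "UK + Switzerland": europe * F(1, 3),
    "Belgium + Netherlands": europe * F(1, 3),
    "USA": north_america * F(2, 3),
    "Canada": north_america * F(1, 3),
    "China": asia * F(2, 3),
    "Japan": asia * F(1, 3),
    "Other 180+ countries": rest * F(2, 5),
}

named = sum(v for k, v in shares.items() if k != "Other 180+ countries")

for name, share in shares.items():
    print(f"{name:<24}{float(share):.0%}")
print("Named importers:", named, " Total:", sum(shares.values()))
```

The nine named importers add up to four-fifths and the unspecified countries to one-fifth, which matches the 80/20 split in the title.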

 The most time-consuming part of this exercise was finding the appropriate structure which can be easily explained in a visual manner.



Visualizing movements of people

Long-time reader Daniel L. sends in this chart illustrating a large data set of inter-state migration flows in the U.S. The original chart is at Vizynary by way of Daily Kos.



There is no denying that this chart is beautiful to look at. But what is its message? That there are people migrating from and to every state? (assuming all fifty states are present)

Daily Kos describes how one can hover over any state to see its individual patterns. Something like this:


This is a great way, perhaps the only way, to consume the chart. Essentially, the reader is asked to generate a small-multiples panel of charts. The chart does a better job at showing the pairs of states between which people migrate than at showing the relative size of the flows. The size of the flows is coded in the width of the arcs. The widths are too similar to tell apart; and it doesn't help that no legend is provided.

The choice of color is curious. Each region of the country is its own color, in a "nominal" way. It is a design decision to emphasize regions.

Another decision is to hide information on the distances of the migrations. Evidently, the designer sacrificed that information in order to create the neat circular arrangement of states.

A shortcoming of this representation is one missing dimension: the direction of the flow. Given any pair of states A and B, I can't tell whether the net migration is into A or into B.


I propose a solution using the map while preserving the interactive element of the original.

On this map, when you hover over a particular state, it highlights all other states for which there are migration flows into or out of that state. For color, use a blue-white-red scheme with blue indicating net inflow, red indicating net outflow, and white for near-zero flows. Include a legend.

Another important decision for the designer is absolute versus relative scales. In an absolute scheme, you rank the entire set of flows for all pairs of states; obviously, the resulting colors would be influenced by the state populations. Alternatively, you rank the flow sizes within each state; in this case, the smaller states will feel exaggerated.
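The difference between the two schemes is easier to see with numbers. Below is a rough sketch, not the method behind any actual map: the flow counts are invented, the function names are my own, and I scale by the largest net flow instead of ranking, which is a simplification. Values run from -1 (deep red, net outflow) through 0 (white) to +1 (deep blue, net inflow).

```python
def net_flows(flows):
    """Net inflow into state a from state b, for every ordered pair.

    flows: {(origin, destination): people moved} (hypothetical data).
    A positive value means a gains residents from b on balance.
    """
    states = {s for pair in flows for s in pair}
    net = {}
    for a in states:
        for b in states:
            if a != b:
                net[(a, b)] = flows.get((b, a), 0) - flows.get((a, b), 0)
    return net

def scale_absolute(net):
    """Scale every pair by the largest net flow in the whole country."""
    biggest = max(abs(v) for v in net.values()) or 1
    return {pair: v / biggest for pair, v in net.items()}

def scale_relative(net):
    """Scale each state's pairs by that state's own largest net flow."""
    scaled = {}
    for (a, b), v in net.items():
        biggest = max(abs(w) for (x, _), w in net.items() if x == a) or 1
        scaled[(a, b)] = v / biggest
    return scaled

# Invented flows: two large states and two small ones.
flows = {
    ("CA", "TX"): 900, ("TX", "CA"): 400,
    ("VT", "NH"): 30, ("NH", "VT"): 20,
    ("CA", "VT"): 5,
}

net = net_flows(flows)
abs_scaled = scale_absolute(net)
rel_scaled = scale_relative(net)

print("NH hovering, absolute:", abs_scaled[("NH", "VT")])
print("NH hovering, relative:", rel_scaled[("NH", "VT")])
```

Under the absolute scheme, the small-state flow is nearly white because the large states set the scale. Under the relative scheme, the same flow is painted at full saturation, which is the exaggeration mentioned above.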

The map has the additional advantage of showing the approximate distance (and direction) moved, which, for me, is a useful piece of information.

New but is it better?

Conventionally, the bracket in a sports tournament is presented like this (link):


In the Euro 2012 that's happening right now, the group stage is followed by the knockout stage (quarterfinals, semifinals and final).

The knockout stage is pretty straightforward. The group stage presents some challenges because it's difficult to present the chronology together with the team standings at the same time.


The official site of Euro 2012 has an innovative "Tournament Map" that is an attempt to improve upon the traditional design. (link)


I have mixed feelings about this presentation. It's easier to get a sense of how each team performed chronologically over the course of the competition. But then, I can't figure out what day the winner of a quarterfinal would play in the semifinal.

Ron Paul confuses the charts

Andrew Sullivan (link) re-printed this grouped column chart showing the result of a Washington Post-ABC poll on how voters say they would react to Ron Paul running as an independent candidate in next year's U.S. presidential election.


One aspect of this chart bothers me: depending on one's familiarity with election politics, one needs to read carefully the titles at the bottom of the chart, the legend, and possibly also the title of the chart (or know that the Republican wears red and the Democrat blue) in order to orient oneself. You can experiment by blocking out one or two of these three items.

Here's the same chart with a small number of fixes. Printing the legend onto the bars themselves makes the data more readable. This change necessitates flipping the columns over to horizontal bars. There are pros and cons to using a stacked chart versus a grouped chart.


Neither of these charts answers the burning question in the reader's mind, which is likely to be from whom Paul would take his votes. The key message from above is that the insertion of Paul is projected to make the identity of the Republican candidate irrelevant. The following flow chart emphasizes the shift in votes as opposed to the vote totals.


It appears that the Others/Undecided voters who can still swing the election do not consider Ron Paul as a desirable alternative. Most of Ron Paul's supporters would come from voters who would have cast their votes for the Republican or Democratic candidate (by a ratio of 3 Republican votes to 1 Democratic vote if Romney is running, or 3 to 2 if Gingrich is running).

Lost in complexity

Felix Salmon (link) and others linked to this BBC News graphic about European debt recently.



At first sight, the use of arrows inside a ring, enhanced by an interactive filter by country, seems to be an inspired idea.

Then, I started clicking. Here is the German view.


According to the paragraph beneath the headline, the arrows show how much money is owed by each country to banks in other nations. So, it appears that German banks have borrowed about equal amounts from France and Italy, and also good amounts from the U.S., the U.K. and Japan. And German banks would be affected if these debtors were to default.

Now, take a look at the right column where BBC tells us "The biggest European economy is exposed to Greek, Irish and Portuguese, but mostly, Spanish debt." Say what?

Much more important than appearance, the designer must ensure that the data and conclusions make sense. Here, the chart doesn't support the discussion.


See also my previous post about European debt.



Reading the landscape

Here are some posts I find worth reading on other graphics blogs:

Nick has done wonderful work on the evolution of the rail industry in the U.S., with a flow chart showing how mergers have produced the four giants of today, as well as a small multiples of maps showing how they split up the country.

A lovely feature of the flow chart is the use of red lines to let readers see at a glance that Union Pacific is the only rail company that has lasted the entire 4 decades, while the other 3 giants came into being within the last 20 years.

On the maps, notice a slight inconsistency between the left and right columns: on the right side, both maps have the same set of anchor cities, which act as "axes" to help readers compare the maps; on the left side, the sets of anchor cities are not identical. It would also be interesting to see a version with all four route maps superimposed and differentiated by color. That may bring out the competitive structure better.


Georgette has a nice post summarizing issues with picking colors when producing charts. Her blog is called Moved by Metrics.


Meanwhile, Martin finds a shockingly poor pie chart here.


There was a time when you'd find the kind of heatmaps featured here by Nathan as wallpaper in my office. It's a great visualization tool for exploring temporal patterns in large data sets. However, I'd never even think of putting these in a presentation. It's a starting point, not an end-point, of an analysis project. Some things are wonderful for consumption only in private!