Made in France stereotypes

France is on my mind lately, as I prepare to bring my dataviz seminar to Lyon in a couple of weeks.  (You can still register for the free seminar here.)

The following Made in France poster brings out all the stereotypes of the French.


(You can download the original PDF here.)

It's a sankey diagram with so many flows that it screams "it's complicated!" This is an example of a graphic for want of a story. In a Trifecta Checkup, it's failing in the Q(uestion) corner.

It's also failing in the D(ata) corner. Take a look at the top of the chart.


France exported $572 billion worth of goods. The diagram then plots eight categories of exports, ranging from wines to cheeses:


Wine exports totaled $9 billion which is about 1.6% of total exports. That's the largest category of the eight shown on the page. Clearly the vast majority of exports are excluded from the sankey diagram.

Are the 8 the largest categories of exports for France? According to this site, those are (1) machinery (2) aircraft (3) vehicles (4) electrical machinery (5) pharmaceuticals (6) plastics (7) beverages, spirits, vinegar (8) perfumes, cosmetics.

Compare: (1) wines (2) jewellery (3) perfume (4) clothing (5) cheese (6) baked goods (7) chocolate (8) paintings.

It's stereotype central. Name 8 things associated with the French brand and cherry-pick those.

Within each category, the diagram does not show all of the exports either. It discloses that the bars for wines show only $7 of the $9 billion worth of wines exported. This is because the data only capture the "Top 10 Importers." (See below for why the designer did this... France exports wine to more than 180 countries.)

Finally, look at the parade of key importers of French products, as shown at the bottom of the sankey:


The problem with interpreting this list of countries is best felt by attempting to describe which countries ended up on this list! It is the list of countries that rank among the top 10 importers of at least one of the eight chosen products. The countries are ordered by the total value of their imports across those eight categories, but a category's value counts toward a country's total only if the country made the top 10 in that category.

In short, with all those qualifications, the size or rank of the black bars does not convey any useful information.
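To see why that selection rule scrambles the ranking, here is a minimal sketch in Python. The export figures, the two categories and the cutoff of 2 (standing in for the poster's top 10) are all made up for illustration; only the rule itself comes from the description above.

```python
from collections import defaultdict

# Hypothetical export values ($ millions) by category and importer.
# These numbers are invented for illustration; they are not from the poster.
exports = {
    "wine": {"USA": 900, "UK": 800, "Germany": 700, "Italy": 50},
    "cheese": {"Germany": 600, "Italy": 400, "UK": 300, "USA": 100},
}

TOP_N = 2  # the poster uses 10; 2 keeps the toy example readable


def poster_totals(exports, top_n):
    """Sum an importer's value only over categories where it ranks in the top N."""
    totals = defaultdict(int)
    for by_country in exports.values():
        ranked = sorted(by_country.items(), key=lambda kv: kv[1], reverse=True)
        for country, value in ranked[:top_n]:
            totals[country] += value
    return dict(totals)


def full_totals(exports):
    """Sum an importer's value over every category, with no cutoff."""
    totals = defaultdict(int)
    for by_country in exports.values():
        for country, value in by_country.items():
            totals[country] += value
    return dict(totals)


shown = poster_totals(exports, TOP_N)
actual = full_totals(exports)

for country in sorted(actual, key=actual.get, reverse=True):
    print(f"{country:8s} actual={actual[country]:5d} shown={shown.get(country, 0):5d}")
```

In this toy data, Germany imports the most overall but would get only the third-longest bar, because its wine imports miss the cutoff and are dropped from its total.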


One feature of the chart that surprised me was the absence of flows in the Wine category from France to Italy or Spain. (Based on the above discussion, you should realize that no flows does not mean no exports.) So I went to the Comtrade database that is referenced in the poster, and pulled out all the wine export data.

How does one visualize where French wines are going? After fiddling around with the numbers, I came up with the following diagram:


I like this type of block diagram which brings out the structure of the dataset. The key features are:

  • The total wine exports to the rest of the world was $1.4 billion in 2016
  • Half of it went to five European neighbors, the other half to the rest of the world
  • On the left half, Germany took a third of those exports; the UK and Switzerland together took another third; and the final third went to Belgium and the Netherlands
  • On the right half, the countries in the blue zone accounted for three-fifths with the unspecified countries taking two-fifths.
  • As indicated, the two-fifths (in gray) represent 20% of total wine exports, and were spread out among over 180 countries.
  • The three-fifths of the blue zone were split in half, with the first half going to North America (about 2/3 to USA and 1/3 to Canada) and the second half going to Asia (2/3 to China and 1/3 to Japan)
  • As the title indicates, the top 9 importers of French wine covered 80% of the total volume (in litres) while the other 180+ countries took 20% of the volume
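
The nested fractions in the list above can be multiplied out to get each block's share of the total. This is a rough sketch using exact fractions; the splits are the rounded ones quoted in the bullets ("about 2/3", "a third"), not values recomputed from the Comtrade data, and the dictionary labels are my own.

```python
from fractions import Fraction as F

europe = F(1, 2)        # five European neighbors
rest = F(1, 2)          # everyone else
blue = rest * F(3, 5)   # named non-European importers
gray = rest * F(2, 5)   # the 180+ unspecified countries

OTHER = "Other 180+ countries"
shares = {
    "Germany": europe * F(1, 3),
    "UK + Switzerland": europe * F(1, 3),
    "Belgium + Netherlands": europe * F(1, 3),
    "USA": blue / 2 * F(2, 3),
    "Canada": blue / 2 * F(1, 3),
    "China": blue / 2 * F(2, 3),
    "Japan": blue / 2 * F(1, 3),
    OTHER: gray,
}

# Share held by the nine named importers.
top9 = sum(v for k, v in shares.items() if k != OTHER)

for name, share in shares.items():
    print(f"{name:22s} {float(share):6.1%}")
print(f"{'Top 9 importers':22s} {float(top9):6.1%}")
```

The pieces sum to the whole, the gray block works out to one-fifth, and the nine named importers to four-fifths, which matches the 80/20 split in the title.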

 The most time-consuming part of this exercise was finding the appropriate structure which can be easily explained in a visual manner.



Visualizing movements of people

Long-time reader Daniel L. sends in this chart illustrating a large data set of inter-state migration flows in the U.S. The original chart is at Vizynary by way of Daily Kos.



There is no denying that this chart is beautiful to look at. But what is its message? That there are people migrating from and to every state? (assuming all fifty states are present)

Daily Kos describes how one can hover over any state to see its individual patterns. Something like this:


This is a great way, perhaps the only way, to consume the chart. Essentially, the reader is asked to generate a small-multiples panel of charts. The chart does a better job at showing the pairs of states between which people migrate than at showing the relative size of the flows. The size of the flows is coded in the width of the arcs. The widths are too similar to tell apart; and it doesn't help that no legend is provided.

The choice of color is curious. Each region of the country is its own color, in a "nominal" way. It is a design decision to emphasize regions.

Another decision is to hide information on the distances of the migrations. Evidently, the designer sacrificed that information in order to create the neat circular arrangement of states.

A shortcoming of this representation is one missing dimension: the direction of the flow. Given any pair of states A and B, I can't tell whether the net migration is into A or into B.


I propose a solution using the map while preserving the interactive element of the original.

On this map, when you hover over a particular state, it highlights all other states for which there are migration flows into or out of that state. For color, use a blue-white-red scheme with blue indicating net inflow, red indicating net outflow, and white for near-zero flows. Include a legend.

Another important decision for the designer is absolute versus relative scales. In an absolute scheme, you rank the entire set of flows for all pairs of states; obviously, the resulting colors would be influenced by the state populations. Alternatively, you rank the flow sizes within each state; in this case, the smaller states will feel exaggerated.
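
A small sketch may make the two scaling choices concrete. Everything here is an assumption for illustration: the migration counts are invented, the three states are arbitrary, and the 10% cutoff for "near-zero" is a placeholder a designer would tune.

```python
# Hypothetical gross migration counts: flows[(origin, destination)] = people moved.
flows = {
    ("CA", "TX"): 60000, ("TX", "CA"): 35000,
    ("CA", "WY"): 1500, ("WY", "CA"): 1000,
    ("TX", "WY"): 800, ("WY", "TX"): 1200,
}


def net_inflows(state, flows):
    """Net inflow into `state` from each partner (positive means net inflow)."""
    net = {}
    for (origin, dest), n in flows.items():
        if dest == state:
            net[origin] = net.get(origin, 0) + n
        elif origin == state:
            net[dest] = net.get(dest, 0) - n
    return net


def color(value, scale, threshold=0.1):
    """Blue for net inflow, red for net outflow, white when small relative to scale."""
    ratio = value / scale
    if ratio > threshold:
        return "blue"
    if ratio < -threshold:
        return "red"
    return "white"


states = {s for pair in flows for s in pair}

# Absolute scheme: one scale shared by every pair of states.
global_scale = max(abs(v) for s in states for v in net_inflows(s, flows).values())

# Relative scheme: the scale is set by the hovered state's own largest net flow.
hovered = net_inflows("WY", flows)
local_scale = max(abs(v) for v in hovered.values())

absolute_colors = {p: color(v, global_scale) for p, v in hovered.items()}
relative_colors = {p: color(v, local_scale) for p, v in hovered.items()}
print("absolute:", absolute_colors)
print("relative:", relative_colors)
```

With these made-up counts, hovering over the small state shows all white under the absolute scheme, since its flows are dwarfed by the large-state pair, while the relative scheme paints the same flows in saturated blue and red.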

The map has the additional advantage of showing the approximate distance (and direction) moved, which, for me, is a useful piece of information.

New but is it better?

Conventionally, the bracket in a sports tournament is presented like this (link):


In the Euro 2012 that's happening right now, the group stage is followed by the knockout stage (quarter-, semi- and final).

The knockout stage is pretty straightforward. The group stage presents some challenges because it's difficult to present the chronology together with the team standing at the same time.


The official site of Euro 2012 has an innovative "Tournament Map" that is an attempt to improve upon the traditional design. (link)


I have mixed feelings about this presentation. It's easier to get a sense of how each team performed chronologically over the course of the competition. But then, I can't figure out what day the winner of a quarterfinal would play in the semifinal.

Ron Paul confuses the charts

Andrew Sullivan (link) re-printed this grouped column chart showing the result of a Washington Post-ABC poll on how voters say they would react to Ron Paul running as an independent candidate in next year's U.S. presidential election.


One aspect of this chart bothers me. Depending on one's familiarity with election politics, one needs to read carefully the titles at the bottom of the chart, the legend, and possibly also the title of the chart (or know that the Republican wears red and the Democrat blue) in order to orient oneself. You can experiment by blocking out one or two of these three items.

Here's the same chart with a small number of fixes. Printing the legend onto the bars themselves makes the data more readable. This change necessitates flipping the columns over to horizontal bars. There are pros and cons to using a stacked chart versus a grouped chart.


Neither of these charts answers the burning question in the reader's mind, which is likely to be from whom Paul would take his votes. The key message from above is that the insertion of Paul is projected to make the identity of the Republican candidate irrelevant. The following flow chart emphasizes the shift in votes as opposed to the vote totals.


It appears that the Others/Undecided voters who can still swing the election do not consider Ron Paul as a desirable alternative. Most of Ron Paul's supporters would come from voters who would have cast their votes for the Republican or Democratic candidate (by a ratio of 3 Republican votes to 1 Democratic vote if Romney is running, or 3 to 2 if Gingrich is running).

Lost in complexity

Felix Salmon (link) and others linked to this BBC News graphic about European debt recently.



At first sight, the use of arrows inside a ring, enhanced by an interactive filter by country, seems to be an inspired idea.

Then, I started clicking. Here is the German view.


According to the paragraph beneath the headline, the arrows show how much money is owed by each country to banks in other nations. So, it appears that German banks have lent/borrowed about equal amounts to/from France and Italy, and also good amounts to/from the U.S., the U.K. and Japan. And German banks would be affected if these debtors were to default.

Now, take a look at the right column where BBC tells us "The biggest European economy is exposed to Greek, Irish and Portuguese, but mostly, Spanish debt." Say what?

Much more important than appearance, the designer must ensure that the data and conclusions make sense. Here, the chart doesn't support the discussion.


See also my previous post about Europe's debt.



Reading the landscape

Here are some posts I find worth reading on other graphics blogs:

Nick has done wonderful work on the evolution of the rail industry in the U.S., with a flow chart showing how mergers have produced the four giants of today, as well as a small multiples of maps showing how they split up the country.

A lovely feature of the flow chart is the use of red lines to let readers see at a glance that Union Pacific is the only rail company that has lasted the entire 4 decades, while the other 3 giants came into being within the last 20 years.

On the maps, notice a slight inconsistency between the left and right columns: on the right side, both maps have the same set of anchor cities, which act as "axes" to help readers compare the maps; on the left side, the sets of anchor cities are not identical. It would also be interesting to see a version with all four route maps superimposed and differentiated by color. That may bring out the competitive structure better.


Georgette has a nice post summarizing issues with picking colors when producing charts. Her blog is called Moved by Metrics.


Meanwhile, Martin finds a shockingly poor pie chart here.


There was a time when you'd find the kind of heatmaps featured here by Nathan as wallpaper in my office. It's a great visualization tool for exploring temporal patterns in large data sets. However, I'd never even think of putting these in a presentation. It's a starting point, not an end-point, of an analysis project. Some things are wonderful for consumption only in private!






False promises of equality and structure

(Here's something especially for those like me who are stuck in their homes in the Northeast USA this weekend.)

A few readers weren't impressed by Nielsen's presentation of the smartphone marketplace:


This chart type is very popular, both among business consultants and statisticians. Consultants call them "marimekko charts" while statisticians call them "mosaic charts". It's got multiple names as it has been reinvented multiple times. I have nightmares from having to produce this sort of chart in PowerPoint by hand (deconstructing and reconstructing column charts), and I have written before about my dislike of them (see here, and here).


Supporters point to two advantages of this type of chart:

  • Equality: it puts the two dimensions of the market place -- operating system/software, and producer/brand -- on equal footing. As an added bonus, the areas of the rectangles are meaningful: they correspond to the relative market shares.
  • Structure: the chart often reveals interesting aspects of the structure of the data. For instance, here it shows that certain smartphones have "closed" systems where the OS and producer form a one-to-one relationship while some producers like HTC make phones with different operating systems.

A little thought exposes these as false promises.

The two dimensions are, in fact, not equal. Look at the one contiguous column for Apple versus two separate sections for HTC. In order to know the market share of HTC, the reader needs to do additions... in his/her head. While this is not so hard when HTC appears only twice, your reader would not be amused if HTC appears seven times on the same mosaic. It is a limitation of this chart type that one cannot get the column sections to be of one piece without destroying the one-piece structure of the row sections.

In addition, I don't think it is easy to compare the areas of fat rectangles versus narrow rectangles, or squares versus long strips, etc. On consulting style charts, you almost always find the entire data set printed, which is to say, this chart is rendered not self-sufficient. On statistical charts, you typically find axis labels; this is not much better because of the difficulty in estimating relative areas.

The extent to which one can learn the structure of the data is restricted by our ability to estimate and sum areas.


In the junkart version, I use a flow chart. Special attention is paid to expressing as clearly as possible the structure of the marketplace, thus the separate sections for the "open" versus "closed" systems, as well as the many-to-many relationships among the "open"-system players.


The thickness of the flows is proportional to the market shares. I added a few data points to anchor the scale. The two dimensions of the data are treated symmetrically.

There is also no need to startle readers with a kaleidoscope of colors so typical of marimekkos.






The cross-hairs of religions

Long-time reader Nick B. found this attractive flow chart.


The chart was produced by the Internet Monk blog. The data was culled from this report (PDF) by the Pew Forum.

The cross-hairs trumpet excitement but the reader is left without much. One could tell that the unaffiliated proportion (red) has more than doubled, mostly at the expense of Catholics (green); that most religions retain the vast majority of their faithful (at least by internal proportions); and that people of one or another faith move to one or another faith.

Yet none of these high-level insights requires a chart that contains data on movement between each pair of religions.

One smart thing about this chart is the inclusion of "unaffiliated / no religion", which completes the picture; otherwise, some previously faithful people would drop off the chart (literally).

The other smart thing is its self-sufficiency: none of the data is printed on the chart, and I doubt readers miss them.


Here, I attempt an alternative, which is a variant of the Web of Debt chart discussed here.


Note the economy of colors, lines, etc. I have chosen to use the number of people with a particular childhood faith as the base for all the percentages; other bases can be selected. For example, the unaffiliated has grown by 144% of the childhood base, with about half of that growth coming from previous Protestants; meanwhile, an exodus of Catholics has occurred. (PS. the data for other faiths being incomplete in the aforementioned report, I made up some of the data so as to finish the chart.)

If the line thickness is made proportional to the percentages, that would eliminate the need to have all those numbers on the chart.

Untangling Europe's debt web

A number of blogs have hailed this NYT diagram/chart/infographic as "nice". The accompanying article is here.


Whether this is nice or not depends on what message you want to convey with this graphic. If it is entanglement, then yes, the graphic reveals the complexity very well. If one wants to understand the debt situation in Europe, then no, this chart doesn't make it clear at all.

From the perspective of someone wanting to dissect the debt web, an enhanced data table is hard to beat.


The first section looks at the interdependency between the five troubled countries, collectively known as PIIGS on Wall Street. The additional debt owed to Britain, Germany and France is shown below. Notice that the original chart does not treat these three countries the same way as PIIGS: we do not know what the values are of the arrows pointing from these three into PIIGS.


Expressed in per-capita terms, Ireland stands out as the worst of the bunch while the citizens of the other four countries are bearing roughly equal amounts of debt per person.


I tried to come up with something more "fun", as below:

Here, I opted to use a small multiples chart to split the countries. In so doing, I accepted redundancy in search of clarity. Each amount is plotted twice: once as a borrowing (red line) and once as a lending (black arrow).

It is immediately clear why Greece is the most urgent issue.

Perhaps the chart type is not as important as the transformations I did to the data:

1) All amounts shown are "net" amounts between any pair of countries. In the original data, there are two arrows between each pair. For example, Italy owes Ireland $46 million but Ireland owes Italy $18 million; this means Italy owes Ireland $28 million net.

2) All amounts are expressed per capita. Since the populations of these countries vary from 4.5 million (Ireland) to 60 million (Italy), the total debts of different countries cannot and should not be compared to each other.

3) Not shown here but I also expressed the net amount lent/borrowed per dollar of GDP. This is another metric that makes sense. The nominal GDP of these countries ranges from $0.2 trillion to $2 trillion. The PPP GDP has a similar range.

4) One item I did not fix is the currency. Given the fluctuation in exchange rate between the Euro and US$, I think it may be better to express all the numbers in Euros.
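
The first two transformations can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet behind the chart: it uses the single Italy/Ireland pair and the two populations quoted above, keeps amounts in the units the text quotes, and the function names are my own.

```python
# Gross amounts owed, keyed by (debtor, creditor), in the units quoted in the text
# (millions of dollars). Only the one pair mentioned above is included.
gross = {
    ("Italy", "Ireland"): 46,
    ("Ireland", "Italy"): 18,
}

# Populations in millions, as quoted in point 2.
population_millions = {"Ireland": 4.5, "Italy": 60}


def net_positions(gross):
    """Collapse the two arrows between each pair of countries into one net amount."""
    net = {}
    for (debtor, creditor), amount in gross.items():
        reverse = gross.get((creditor, debtor), 0)
        if amount > reverse:
            net[(debtor, creditor)] = amount - reverse
    return net


def per_capita(amount, country):
    """Millions of dollars divided by millions of people gives dollars per person."""
    return amount / population_millions[country]


net = net_positions(gross)
for (debtor, creditor), amount in net.items():
    print(f"{debtor} owes {creditor} {amount} net")
    print(f"  per resident of {debtor}: {per_capita(amount, debtor):.2f}")
    print(f"  per resident of {creditor}: {per_capita(amount, creditor):.2f}")
```

The same net amount weighs more than ten times as much per Irish resident as per Italian resident, which is the reason for not comparing raw totals across countries of such different sizes.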

A next step would be to include Britain, Germany and France.

Reference: "In and Out of Each Other's European Wallets", New York Times, April 30, 2010.

PS. Reader JF pointed out an inconsistency in the numbers on the chart. I revised the chart to fix this issue. In the current chart, one can read the information as: the average Portuguese owes Spain $5,453, owes Italy $141, while having lent $903 to Greece and $1,561 to Ireland. Each chart can be interpreted from the perspective of the average citizen in that particular country. (For details, see the comments below.)