Made in France stereotypes

France is on my mind lately, as I prepare to bring my dataviz seminar to Lyon in a couple of weeks.  (You can still register for the free seminar here.)

The following Made in France poster brings out all the stereotypes of the French.

Made_in_france_small

(You can download the original PDF here.)

It's a sankey diagram with so many flows that it screams "it's complicated!" This is an example of a graphic for want of a story. In a Trifecta Checkup, it's failing in the Q(uestion) corner.

It's also failing in the D(ata) corner. Take a look at the top of the chart.

Madeinfrance_totalexports

France exported $572 billion worth of goods. The diagram then plots eight categories of exports, ranging from wines to cheeses:

Madeinfrance_exportcategories

Wine exports totaled $9 billion which is about 1.6% of total exports. That's the largest category of the eight shown on the page. Clearly the vast majority of exports are excluded from the sankey diagram.

Are the 8 the largest categories of exports for France? According to this site, those are (1) machinery (2) aircraft (3) vehicles (4) electrical machinery (5) pharmaceuticals (6) plastics (7) beverages, spirits, vinegar (8) perfumes, cosmetics.

Compare: (1) wines (2) jewellery (3) perfume (4) clothing (5) cheese (6) baked goods (7) chocolate (8) paintings.

It's stereotype central. Name 8 things associated with the French brand and cherry-pick those.

Within each category, the diagram does not show all of the exports either. It discloses that the bars for wines show only $7 of the $9 billion worth of wines exported. This is because the data only capture the "Top 10 Importers." (See below for why the designer did this... France exports wine to more than 180 countries.)

Finally, look at the parade of key importers of French products, as shown at the bottom of the sankey:

Madeinfrance_topimporters

The problem with interpreting this list of countries is best felt by attempting to describe which countries ended up on it! It's the list of countries that rank among the top 10 importers of one or more of the eight chosen products. The countries are ordered by the total value of imports in those eight categories only, and a category's value is counted only if the country makes the top 10 in that category.

In short, with all those qualifications, the size or rank of the black bars does not convey any useful information.

***

One feature of the chart that surprised me was the absence of flows in the Wine category from France to Italy or Spain. (Based on the above discussion, you should realize that no flows does not mean no exports.) So I went to the Comtrade database that is referenced in the poster, and pulled out all the wine export data.

How does one visualize where French wines are going? After fiddling around with the numbers, I came up with the following diagram:

Redo_jc_frenchwineexports

I like this type of block diagram which brings out the structure of the dataset. The key features are:

  • Total wine exports to the rest of the world were $1.4 billion in 2016
  • Half of it went to five European neighbors, the other half to the rest of the world
  • On the left half, Germany took a third of those exports; the UK and Switzerland together took another third; and the final third went to Belgium and the Netherlands
  • On the right half, the countries in the blue zone accounted for three-fifths with the unspecified countries taking two-fifths.
  • As indicated, the two-fifths (in gray) represent 20% of total wine exports, and were spread out among over 180 countries.
  • The three-fifths of the blue zone were split in half, with the first half going to North America (about 2/3 to USA and 1/3 to Canada) and the second half going to Asia (2/3 to China and 1/3 to Japan)
  • As the title indicates, the top 9 importers of French wine covered 80% of the total volume (in litres) while the other 180+ countries took 20% of the volume
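The nested fractions in the list above can be multiplied out to check that they are mutually consistent. The sketch below uses only the rounded fractions quoted in the bullets (halves, thirds, fifths), so the resulting shares are approximate; the variable names are mine, not from the Comtrade data.

```python
from fractions import Fraction as F

# Rounded fractions as quoted in the bullet list (approximations, not raw data).
europe = F(1, 2)                 # five European neighbors
rest = F(1, 2)                   # everyone else
blue = rest * F(3, 5)            # named non-European importers
gray = rest * F(2, 5)            # unspecified 180+ countries

shares = {
    "Germany": europe * F(1, 3),
    "UK + Switzerland": europe * F(1, 3),
    "Belgium + Netherlands": europe * F(1, 3),
    "USA": blue * F(1, 2) * F(2, 3),
    "Canada": blue * F(1, 2) * F(1, 3),
    "China": blue * F(1, 2) * F(2, 3),
    "Japan": blue * F(1, 2) * F(1, 3),
    "Other 180+ countries": gray,
}

# The blocks must tile the whole diagram.
assert sum(shares.values()) == 1

top9 = 1 - shares["Other 180+ countries"]
for name, share in shares.items():
    print(f"{name:24s}{float(share):6.1%}")
print(f"Top 9 importers: {float(top9):.0%}")
```

Under these rounded fractions, the gray block works out to 20% of the total and the nine named importers to 80%, matching the last two bullets.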

 The most time-consuming part of this exercise was finding the appropriate structure which can be easily explained in a visual manner.

 

 


Big Macs in Switzerland are amazing, according to my friend

Bigmac_ch

Note for those in or near Zurich: I'm giving a Keynote Speech tomorrow morning at the Swiss Statistics Meeting (link). Here is the abstract:

The best and the worst of data visualization share something in common: these graphics provoke emotions. In this talk, I connect the emotional response of readers of data graphics to the design choices made by their creators. Using a plethora of examples, collected over a dozen years of writing online dataviz criticism, I discuss how some design choices generate negative emotions such as confusion and disbelief while other choices elicit positive feelings including pleasure and eureka. Important design choices include how much data to show; which data to highlight, hide or smudge; what research question to address; whether to introduce imagery, or playfulness; and so on. Examples extend from graphics in print, to online interactive graphics, to visual experiences in society.

***

The Big Mac index seems to never want to go away. Here is the latest graphic from the Economist, saying what it says:

Econ_bigmacindex

The index never made much sense to me. I'm in Switzerland, and everything here is expensive. My friend, who is a U.S. transplant, seems to have adopted McDonald's as his main eating-out venue. Online reviews indicate that the quality of the burger served in Switzerland is much better than the same thing in the States. So, part of the price differential can be explained by quality. The index also confounds several other issues, such as local inflation and exchange rates.

Now, on to the data visualization, which is primarily an exercise in rolling one's eyeballs. In order to understand the red and blue line segments, our eyes have to hop over the price bubbles to the top of the page. Then, in order to understand the vertical axis labels, unconventionally placed on the right side, our eyes have to zoom over to the left of the page, and search for the line below the header of the graph. Next, if we want to know about a particular country, our eyes must turn sideways and scan from bottom up.

Here is a different take on the same data:

Redo_jc_econbigmac2018

I transformed the data as I don't find it compelling to learn that Russian Big Macs cost 60% less than American Big Macs. Instead, on my chart, the reader learns that the price paid for a U.S. Big Mac will buy almost two and a half Big Macs in Russia.

The arrows pointing left indicate that in most countries, the values of their currencies are declining relative to the dollar from 2017 to 2018 (at least from the Big Mac Index point of view). The only exception is Turkey, where in 2018, one can buy more Big Macs for the price of one U.S. Big Mac than in 2017.

The decimal differences are immaterial so I have grouped the countries by half Big Macs.
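For readers who want to reproduce the transformation, here is a minimal sketch. It assumes the input is the percentage by which the local Big Mac is cheaper than the U.S. one; the function names are hypothetical, and rounding to the nearest half is my guess at a reasonable grouping rule, not necessarily the one used for the chart.

```python
def big_macs_per_us_price(pct_below_us: float) -> float:
    """Number of local Big Macs that the price of one U.S. Big Mac buys.

    pct_below_us: how much cheaper the local burger is, in percent
    (60 means it costs 60% less than in the U.S.).
    """
    if pct_below_us >= 100:
        raise ValueError("discount must be below 100%")
    local_price_ratio = 1 - pct_below_us / 100  # local price / U.S. price
    return 1 / local_price_ratio


def group_by_half(big_macs: float) -> float:
    """Round to the nearest half Big Mac (assumed grouping rule)."""
    return round(big_macs * 2) / 2


russia = big_macs_per_us_price(60)
print(russia, group_by_half(russia))
```

A burger that costs 60% less is priced at 0.4 of the U.S. burger, and 1 / 0.4 is 2.5, which is where "almost two and a half" comes from.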

This example demonstrates yet again that to make good data visualization, one has to describe an interesting question, make appropriate transformations of the data, and then choose the right visual form. I describe this framework as the Trifecta - a guide to it is here.

(P.S. I noticed that Bitly just decided unilaterally to deactivate my customized Bitly link that was configured years and years ago, when it switched design (?). So I had to re-create the custom link. I have never grasped why "unreliability" is a feature of the offering by most Tech companies.)


Some Tufte basics brought to you by your favorite birds

Someone sent me this via Twitter, found on the Data is Beautiful reddit:

Reddit_whichbirdspreferwhichseeds_sm

The chart does not deliver on its promise: It's tough to know which birds like which seeds.

The original chart was also provided in the reddit:

Reddit_whichbirdswhichseeds_orig_sm

I can see why someone would want to remake this visualization.

Let's just apply some Tufte fixes to it, and see what happens.

Our starting point is this:

Slide1

First, consider the colors. Think for a second: order the colors of the cells by which ones stand out most. For me, the order is white > yellow > red > green.

That is a problem because for this data, you'd like green > yellow > red > white. (By the way, it's not explained what white means. I'm assuming it means the least preferred, so not preferred that one wouldn't consider that seed type relevant.)

Compare the above with this version that uses a one-dimensional sequential color scale:

Slide2

The white color still stands out more than necessary. Fix this using a gray color.

Slide3

What else is grabbing your attention when it shouldn't? It's those gridlines. Push them into the background using white-out.

Slide4

The gridlines are also too thick. Here's a slimmed-down look:

Slide5

The visual is much improved.

But one more thing. Let's re-order the columns (seeds). The most popular seeds are shown on the left, and the least on the right in this final revision.

Slide6

Look for your favorite bird. Then find out which are its most preferred seeds.

Here is an animated gif to see the transformation. (Depending on your browser, you may have to click on it to view it.)

Redojc_birdsseeds_all_2

 

PS. [7/23/18] Fixed the 5th and 6th images, as well as the animated gif. The row labels were scrambled in the original version.

 


Foodies say, add dataviz spice please

This Buzzfeed article proves that foodies love their food served with dataviz (tip: Chris P.). Menus are an undertapped resource when it comes to data visualization.

There are several examples worth discussing.

Buzzfeed-venn-menu

Venn diagrams are not easy to read, people.

Plus they are hard to construct well... note the asymmetric areas.

Here is one without circles:

Jc_redo_vennmenu_1

Then, I pared it down to its essence:

Jc_redo_vennmenu_2

***

This beer map is pretty great:

Buzzfeed-beer-menu

Some of its virtues:

  • The spacious layout utilizing two dimensions, instead of a one-dimensional list of dense text
  • Ordering using two dimensions relevant to the decision problem (assuming those two dimensions are the most important for their clients)
  • Unconventional, attention-grabbing
  • More equitable: different readers will read the chart in different orders. I'll hypothesize that they will end up with a more even distribution of drink orders than with a list in which everyone reads top to bottom

Potential problems:

  • Not enough space to explain the drinks. Don't the clients want to know what's in them?
  • I wonder how they measured the degree of "classic"-ness.

***

This next menu contains an error:

Buzzfeed-coffee-menu

When the drink comes in one size, only one price is listed. If it comes in two sizes, two prices should be listed.

Is the cafe owner shading Americans as not good at math?


Fantastic visual, but the Google data need some pre-processing

Another entry in the Google Newslab data visualization project that caught my eye is the "How to Fix It" project, illustrating search queries across the world that ask "how." The project web page is here.

The centerpiece of the project is an interactive graphic showing queries related to how to fix home appliances. Here is what it looks like in France (It's always instructive to think about how they would count "France" queries. Is it queries from google.fr? queries written in French? queries from an IP address in France? A combination of the above?)

Howtofixit_france_appliances

I particularly appreciate the lack of labels. When we see the pictures, we don't need to be told this is a window and that is a door. The search data are encoded in the relative sizes of the appliances. The red dotted lines show the relative popularity of searches for the respective appliances in aggregate.

By comparison, the Russian picture looks very different:

Howtofixit_russia_appliances

Are the Russians more sensible? Their searches are far and away about the washing machine, which is the most complicated piece of equipment on the graphic.

At the bottom of the page, the project looks at other queries, such as those related to cooking. I find it fascinating to learn what people need help making:

Howtofixit_world_cooking

I have to confess that I searched for "how to make soft boiled eggs". That led me to a lot of different webpages, mostly created for people who search for how to make a soft boiled egg. All of them contain lots of advertising, and the answer boils down to cook it for 6 minutes.

***

The Russia versus France comparison brings out a perplexing problem with the "Data" in this visualization. For competitive reasons, Google does not provide data on search volume. The so-called Search Index is what is being depicted. The Search Index uses the top-ranked item as the reference point (100). In the Russian diagram, the washing machine has a Search Index of 100 and everything else pales in comparison.

In the France example, the window is the search item with the greatest number of searches, so it has a Search Index of 100; the door has an Index of 96, which means it has 96% of the search volume of the window; the washing machine, with an Index of 49, has about half the searches of the window.

The numbers cannot be interpreted as proportions. The Index of 49 does not mean that washing machines account for 49% of all France queries about fixing home appliances. That is really the meaning of popularity we want to have but we don't have. We can obtain true popularity measures by "normalizing" the Search Index: just sum up the Index Values of all the appliances and divide the Search Index by the sum of the Indices. After normalizing, the numbers can be interpreted as proportions and they add up to 100% for each country. When not normalized, the indices do not add to 100%.

Take the case in which we have five appliances, and let's say all five appliances are equally popular, comprising 20% of searches each. The five Search Indices will all be 100 because the top-ranked item is given the value of 100. Those indices add to 500!

By contrast, in the case of Russia (or a more extreme case), the top-ranked query is almost 100% of all the searches, so the sum of the indices will be only slightly larger than 100.

If you realize this, then you'd understand that it is risky to compare Search Indices across countries. The interpretation is clouded by how much of the total queries is accounted for by the top query.
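The normalization described above is a one-liner. The sketch below applies it to the equal-popularity case from the text and to a made-up lopsided case; the appliance names and the lopsided index values are hypothetical, chosen only to illustrate the point. Note that the resulting shares are proportions of the searches for the listed appliances, not of all searches.

```python
def normalize(search_index: dict[str, float]) -> dict[str, float]:
    """Convert Search Index values into shares that sum to 1."""
    total = sum(search_index.values())
    return {item: value / total for item, value in search_index.items()}


# Five equally popular appliances: every index is 100, and they sum to 500.
equal = {f"appliance_{i}": 100 for i in range(1, 6)}

# Hypothetical lopsided country: one item dominates, so the sum is barely over 100.
lopsided = {"washing machine": 100, "window": 2, "door": 1, "fridge": 1, "toilet": 1}

for label, index in [("equal", equal), ("lopsided", lopsided)]:
    shares = normalize(index)
    print(label, "sum of indices:", sum(index.values()))
    for item, share in shares.items():
        print(f"  {item:16s}{share:6.1%}")
```

An index of 100 corresponds to a 20% share in the first case and to roughly a 95% share in the second, which is why the raw indices cannot be compared across countries.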

In our Trifecta Checkup, this is a chart that does well in the Question and Visual corners, but there is a problem with the Data.

 

 


Hog wild about dot maps

Reader Chris P. sent me this chart.

This was meant to be "light entertainment." See the Twitter discussion below.

9gag_hogsmap

***

Let's think a bit about the dot map as a data graphic.

Dot maps are one dimensional. The dot's location is used to indicate the latitude and longitude and therefore the x,y coordinates cannot encode any other data. If we have basically a black/white chart, as in this hog map, the dot can only encode binary data (yes/no).

The legend says "each dot represents 5,000 hogs." Think about how that statement applies to these scenarios:

  • Do you expect to see something different between the dot representing 4,200 and the one showing 4,900?
  • Do you expect to see something different between the dot representing 400 and 4,000?
  • Do you expect to see something different between the location with 4,800 hogs and 9,600 hogs?


Based on the legend, the designer would need two dots to represent 10,000 hogs. But those two dots pertain to the same location. Sometimes, "jitter" is added, and the two dots are placed side by side. However, with the scale of the map of the U.S., and the dots representing seemingly small neighborhoods, jitter creates more confusion than anything. Also, what about 3, 4, 5, ... dots in the same location?
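To make the scenarios above concrete, here is a small sketch of how many dots a location would get under three possible rounding rules. The legend does not say which rule the designer used, so all three are assumptions, and the function name is hypothetical.

```python
HOGS_PER_DOT = 5_000  # from the legend: "each dot represents 5,000 hogs"


def dots(hogs: int, rule: str) -> int:
    """Number of dots drawn for a location under an assumed rounding rule."""
    if rule == "floor":      # a dot only for each complete 5,000
        return hogs // HOGS_PER_DOT
    if rule == "nearest":    # round half up to the nearest dot
        return (hogs + HOGS_PER_DOT // 2) // HOGS_PER_DOT
    if rule == "ceil":       # any hogs at all earn a dot
        return -(-hogs // HOGS_PER_DOT)
    raise ValueError(f"unknown rule: {rule}")


for hogs in (400, 4_000, 4_200, 4_800, 4_900, 9_600, 10_000):
    counts = {rule: dots(hogs, rule) for rule in ("floor", "nearest", "ceil")}
    print(f"{hogs:>6,} hogs -> {counts}")
```

Whichever rule is assumed, 4,200 and 4,900 hogs look identical on the map, and under two of the three rules so do 400 and 4,000.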

 9gag_hogmap_inset

Looking at the details above, are the dots jittered or do they represent neighboring locations?

Sometimes, colors are used to encode data on a dot map. But each dot can only contain one color, so it typically shows only the top category in each location.

Dot maps are very limited. Think before you use them.

 


The less-is-more story, and its meta

The Schwab magazine has an interesting discussion of a marketing research study purportedly showing "less is more" when it comes to consumer choice. They summarized the experimental setup and results in the following succinct graphic:

Schwab jam displays - Jun 4 2017 - 3-45 PM - p3

The data consist of nested proportions. For example, among those seeing display 1, 60% stopped to look at the jams, and among those who stopped, 3% purchased.

The nesting is presented as overlap in this design. The blue figures on pink are those shoppers who stopped as well as purchased. The blue figures with no background are those who stopped but did not purchase. The blue figures disregarding background color include everyone who stopped. What about the gray? Those are the shoppers who did not stop at the jam display, which is not a key number. To understand what proportion of shoppers stopped, the reader must take in the entire set of figures, in effect giving the blue and blue/pink figures a change of clothes.
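The nested proportions quoted for display 1 can be unpacked with a little arithmetic. This sketch uses only the two rates given above (60% stopped, and 3% of those purchased) and a base of 100 shoppers; the function name and the base are my choices for illustration.

```python
def nested_counts(total: float, stop_rate: float, purchase_rate_given_stop: float) -> dict[str, float]:
    """Split a group of shoppers into three mutually exclusive segments."""
    stopped = total * stop_rate
    purchased = stopped * purchase_rate_given_stop
    return {
        "did not stop": total - stopped,
        "stopped, did not purchase": stopped - purchased,
        "stopped and purchased": purchased,
    }


display1 = nested_counts(total=100, stop_rate=0.60, purchase_rate_given_stop=0.03)
for segment, count in display1.items():
    print(f"{segment:28s}{count:5.1f}")

# Purchases as a share of everyone who saw the display, not just those who stopped.
overall_purchase_rate = display1["stopped and purchased"] / 100
print(f"overall purchase rate: {overall_purchase_rate:.1%}")
```

Out of every 100 shoppers who saw display 1, about 60 stopped and fewer than 2 bought, so the 3% purchase rate among those who stopped is only about 1.8% of all shoppers.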

***

In this version, we make it easier to estimate the proportions:

Redo_schwab_jams

Each branch starts with 100 figures. The nesting structure is clearly depicted.

***

It turns out that the original design messed up the numbers. They were trying to be precise. The right side (Display 2) had 29 figures on each row, summing to 260, exactly the number of subjects in that treatment cell. The left side had 28 figures per row (one fewer!), summing to 233. However, according to the research paper being cited, they analyzed 242 subjects who saw Display 1. Nine shoppers went missing.

The extra precision, even if correctly rendered, interferes with our comprehension of proportions. Less is more, indeed!

***

P.S. If you know someone interested in upgrading their skills to join the expanding business analytics workforce, send them to my new venture, Principal Analytics Prep, a next-gen bootcamp that helps people transition careers. Contact me for more information.


Layered donuts have excess fats and oils

Via Twitter, Nicholas S. sent this chart:

Usda_donutchart

It's a layered donut. There isn't much context here except that the chart comes from USDA. Judging from the design, I surmise that the key message is the change in proportion by food groups between 1970 and 2014. I am assuming that these food groups are exhaustive so that it makes sense to put them in a donut chart, with all pieces adding up to 100%.

The following small-multiples line chart conveys most of the information:

Redo_usdadonutchart_jc

The story is the big jump in "Added fats and oils".  In the layered donut, the designer highlighted this by a moire effect, something to be avoided.

Note the parenthetical 2010 next to the Added fats and oils label. The data for all other food groups come from 2014 but the number for the most important category is four years older. The chart would be more compelling if they used 2010 data for everything.

One piece of information is ostensibly absent in the line chart version - the growth in the size of the pie. The total of the data increased about 20% from 1970 to 2014. In theory, the layered donut can convey this growth by the perimeters of the circles. But it doesn't appear that the designer saw this as an important insight since the total area of the outer donut is clearly more than 20% larger than the area of the inner donut.

 


A quick lesson in handling more than one message on one chart

Between teaching two classes, and a seminar, and logging two coast-to-coast flights, I was able to find time to rethink the following chart from the Wall Street Journal: (link to article)

Uk_drinks

I like the right side of this chart, which helps readers interpret what the alcohol consumption guidelines really mean. When we go out and drink, we order beers, or wine, or drinks - we don't think in terms of grams of alcohol.

The left side is a bit clumsy. The biggest message is that the U.K. has tightened its guidelines. This message is delivered by having the U.K. appear twice in the chart, the only country to repeat. In order to make this clear, the designer highlights the U.K. rows. But the style of highlighting used for the two rows differs, because the current U.K. row has to point to the right side, but not the previous U.K. row. This creates a bit of confusion.

In addition, since the U.K. rows are far apart, figuring out how much the guidelines have changed is more work than desired.

The placement of the bars by gender also doesn't help. A side message is that most countries allow men to drink more than women but the U.K., in revising its guidelines, has followed Netherlands and Guyana in having the same level for both genders.

***

After trying a few ideas, I think the scatter plot works out pretty well. One advantage is that it does not arbitrarily order the data men first, women second as in the original chart. Another advantage is that it shows the male-female balance more clearly.

Redo_ukalcohol_2

An afterthought: I should have added the words "Stricter" and "Laxer" on the two corners of the chart. This chart shows both that the U.K. is getting stricter and that it joins Guyana and the Netherlands as countries which treat men and women equally when it comes to drinking.