At first glance, this Wall Street Journal chart seems unlikely to impress as it breaks a number of "rules of thumb" frequently espoused by dataviz experts. The inconsistency of mixing a line chart and a dot plot. The overplotting of dots. The ten colors...

However, I actually like this effort. The discontinuity of chart forms nicely aligns with the split between the actual price movements on the left side and the projections on the right side.

The designer also meticulously placed the axis labels with monthly labels for actual price movements and quarterly labels for projections.

Even the ten colors are surprisingly manageable. I am not sure we need to label all those banks; maybe just the ones at the extremes. If we clear out some of these labels, we can make room for a median line.

***

How good are these oil price predictions? It is striking that every bank shown is predicting that oil prices have hit a bottom, and will start recovering in the next few quarters. Contrast this with the left side of the chart, where the line is basically just tumbling down.

Step back six months earlier, to September 2015. The same chart looks like this:

Again, these analysts were calling a bottom in prices and predicting a steady rise over the next quarters.

The track record of these oil predictions is poor:

The median analyst predicted oil prices to reach $50 by Q1 of 2016. Instead, prices fell to $30.

Given this track record, it's shocking that these predictions are considered newsworthy. One wonders how these predictions are generated, and how the analysts justified ignoring the prevailing trend.

*When this post appears, I will be on my way to Seattle. Maybe I will meet some of you there. You can still register here.*

I held onto this tip from a reader for a while. I think it came from Twitter:

The Economist found a fun topic but what's up with the axis not starting at zero?

The height x weight gimmick seems cool but on second thought, weight is not the same as girth so it doesn't make much sense!

In the re-design, I use bubbles to indicate weight and vertical location to indicate height. The data aren't as interesting as one might think. All the actors pretty much stayed true to the comic-book ideal, with Adam West being the closest. I also changed the order of the actors.

I left out the Lego, as it creates a design challenge that does not justify the effort.

I have been looking forward to reading Alberto Cairo’s new book since he started teasing about it last year. I enjoyed his first book, *The Functional Art*, mostly because we share the desire to bring the design and the analytical schools of data visualization closer together. His new book, *The Truthful Art*, represents another step in this ambitious project, and I found much to like about it.

*The Truthful Art* is really two books for the price of one. There is one book about analytical thinking, and interspersed between these chapters, there is a short book about graphical design. The chapters on analytical thinking take readers--presumably many will be journalists--through the standard diet of statistics, from summary statistics (ch. 6) to distributions (ch. 7) to correlations (ch. 9) to sampling theory (ch. 11) to time-series data (ch. 8). Cairo also devotes some delightful pages to cognitive biases (ch. 3) and research design (ch. 4).

Readers meet these analytical chapters as if they are enjoying a chef’s tasting menu at a top-line restaurant. Small delicious bites of knowledge are served quickly, with the expectation that readers will pursue the advanced reading list, curated by the author and printed at the end of each chapter. Cairo’s love of reading bursts through the pages.

At various points, Cairo delves into equations. In a wise move to balance the book, and to keep readers awake (we all know how boring statistics is), he weaves in several chapters on basic graphical design. There are materials on chart forms (ch. 5), on maps (ch. 10), and on visual design (ch. 2, a nice summary of his previous book). My favorite is chapter 12, which is a kaleidoscope of data visualization projects, at once celebrating the vitality of the field, and revealing its unruly, sometimes clashing, strands.

A good book is one that leaves the reader with lingering thoughts that transcend its pages. *The Truthful Art* succeeds in this respect. The purview of this book intersects directly with several lines of my own work: on the ethics of data analytics, and the teaching of statistical reasoning.

In chapter 11, Cairo tells a story familiar to anyone who is paying attention to the current U.S. presidential elections. In some past elections, El Pais, a Spanish newspaper, published a headline proclaiming that "Catalan public opinion swings to 'no' for independence," when the margin of difference was well within the margin of error of the survey. Cairo complains about the misleading headline, and points out the need to look at the margin of error. It’s refreshing that a journalist points this out. I used to see this as a matter of statistical illiteracy, but now I see this as an ethical issue.

Let’s say a pollster runs a new poll every hour. Because the race is a dead heat, the hourly results would flip-flop, as if one were observing a sequence of coin flips. A journalist would report the sequence as *A*, *A*, *B*, *A*, *B*, *B*, *A*, … while a statistician would write down *tie, tie, tie, tie, ….* Notice how boring this last sequence is, despite it being more truthful. I don't believe that journalists report the horse-race because of ignorance of margins of error--they just choose to ignore them.
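For readers who like to see this in action, here is a minimal sketch of the hourly-poll thought experiment. Everything in it is my own assumption rather than anything from Cairo's book: a two-candidate race with no undecided voters, 1,000 respondents per poll, a true 50/50 split, and the textbook 95% margin of error for a proportion. The function names are made up for illustration.

```python
import math
import random


def margin_of_error(share, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion."""
    return z * math.sqrt(share * (1 - share) / n)


def journalist_call(share_a):
    """Report whoever is nominally ahead, however slim the lead.

    An exact 50/50 split goes to B here, which is an arbitrary choice.
    """
    return "A" if share_a > 0.5 else "B"


def statistician_call(share_a, n):
    """Report a tie unless A's share is outside the margin of error around 50%."""
    if abs(share_a - 0.5) <= margin_of_error(share_a, n):
        return "tie"
    return journalist_call(share_a)


def simulate_polls(num_polls=8, n=1000, true_share_a=0.5, seed=1):
    """Simulate repeated polls of n respondents in a race with a known true split."""
    rng = random.Random(seed)
    results = []
    for _ in range(num_polls):
        votes_a = sum(rng.random() < true_share_a for _ in range(n))
        share_a = votes_a / n
        results.append(
            (share_a, journalist_call(share_a), statistician_call(share_a, n))
        )
    return results


if __name__ == "__main__":
    for share_a, headline, verdict in simulate_polls():
        print(f"A at {share_a:.1%}: journalist says {headline}, "
              f"statistician says {verdict}")
```

With 1,000 respondents, the margin of error is roughly three percentage points, so most simulated polls should land inside it. The journalist's column flips between A and B on noise alone, while the statistician's column mostly reads "tie".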

Readers of my blogs and books know what I am about to say about the teaching of statistics. Cairo follows a conventional approach, even including some equations, in an otherwise very readable account. The convention is the “how-to” approach that assumes learning comes from knowing formulas. It’s also an approach that has earned statistics departments at all universities a reputation for being uninspiring and obtuse.

Take hypothesis testing and p-values for example. Cairo’s account is more readable than most textbooks but at heart, it is a step-by-step manual for how to do hypothesis testing. To me, this method of instruction solves the wrong problem. The real issue is whether journalists are equipped to separate the wheat from the chaff when they read peer-reviewed journal articles, all of which use conventional hypothesis testing and attain p < 0.05. There are many things in life we learn to use without knowing any formulas. We learn to use a smartphone app without knowing how to code an app. We learn to ride a bike without having to learn mechanical engineering formulas.

These comments are not a specific criticism of Cairo’s project. I leave them here to encourage some creative thinking around the problem of statistical illiteracy that seems to never go away. I’m suggesting two shifts: from a set of formulas to a system of thinking; from imparting knowledge to promoting ethics.

***

Cairo’s book is an important contribution to bringing together the design and analytical perspectives on data visualization. He is an entertaining and lucid writer and thinker. Since he does not have mathematical training, he is able to explain the analytical materials in a way that would make sense to readers with non-technical backgrounds. So I highly recommend that you get a copy, get hooked, and do the advanced reading.

Chris Y. asked how to read this BBC Sports graphic via Twitter:

These are managers of British football (i.e. soccer) teams. Listed are some of the worst tenures of some managers. But what do the numbers mean?

The character "V" holds the key. When I first read the chart title, I wondered why managers are opposed to win percentages. Also, the legend at the bottom right confused me. Did they mean "W" when they printed "V"? "Games W%" seems like a shorthand for winning percentage.

After looking up John Carver's not-so-impressive record, I learned that the left column is the total number of matches managed and the right column is the winning percentage expressed as a number between 0 and 100.

I think even the designer got confused by those scales. Witness the little bar charts in the middle:

The two numbers are treated as if they are on the same scale. The left column is assumed to be the number of matches won while the right column is treated as the number of matches lost (or vice versa). Under this interpretation, the bar charts would depict the winning percentages. Let me fix the data:
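The fix amounts to putting both numbers on the same scale: convert the winning percentage back into a count of matches won, so that wins and non-wins add up to the matches managed. Here is a minimal sketch of that arithmetic. The function names are my own, and the record in the usage example is a hypothetical one, not any manager's actual numbers.

```python
def wins_from_record(games, win_pct):
    """Recover the number of wins from games managed and a 0-100 winning percentage."""
    return round(games * win_pct / 100)


def split_record(games, win_pct):
    """Split a record into wins and matches not won (draws and losses combined)."""
    wins = wins_from_record(games, win_pct)
    return {"games": games, "wins": wins, "not_won": games - wins}


if __name__ == "__main__":
    # Hypothetical record: 20 matches managed at a 15% winning percentage.
    print(split_record(20, 15))
```

A bar split into `wins` and `not_won` shows the winning percentage by its proportions, and its total length shows the tenure.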

While these managers have compiled similar losing records on a relative basis, some of them lasted longer than others. The following chart brings out the difference in tenure while keeping the winning percentages: (I have re-sorted the managers.)

When they finally got the sack, they reached the end of the line.