Book review: The Truthful Art by Alberto Cairo, and the Enduring Problem of Statistical Illiteracy
Apr 05, 2016
I have been looking forward to reading Alberto Cairo’s new book since he started teasing about it last year. I enjoyed his first book, The Functional Art, mostly because we share the desire to bring the design and the analytical schools of data visualization closer together. His new book, The Truthful Art, represents another step in this ambitious project, and I found much to like about it.
The Truthful Art is really two books for the price of one. There is one book about analytical thinking, and interspersed between these chapters, there is a short book about graphical design. The chapters on analytical thinking take readers--presumably many will be journalists--through the standard diet of statistics, from summary statistics (ch. 6) to distributions (ch. 7) to correlations (ch. 9) to sampling theory (ch. 11) to time-series data (ch. 8). Cairo also devotes some delightful pages to cognitive biases (ch. 3) and research design (ch. 4).
Readers meet these analytical chapters as if they are enjoying a chef’s tasting menu at a top-line restaurant. Small delicious bites of knowledge are served quickly, with the expectation that readers will pursue the advanced reading list, curated by the author and printed at the end of each chapter. Cairo’s love of reading bursts through the pages.
At various points, Cairo delves into equations. In a wise move to balance the book, and to keep readers awake (we all know how boring statistics is), he weaves in several chapters on basic graphical design. There are materials on chart forms (ch. 5), on maps (ch. 10), and on visual design (ch. 2, a nice summary of his previous book). My favorite is chapter 12, which is a kaleidoscope of data visualization projects, at once celebrating the vitality of the field, and revealing its unruly, sometimes clashing, strands.
A good book is one that leaves the reader with lingering thoughts that transcend its pages. The Truthful Art succeeds in this respect. The purview of this book intersects directly with several lines of my own work: on the ethics of data analytics, and the teaching of statistical reasoning.
In chapter 11, Cairo tells a story familiar to anyone who is paying attention to the current U.S. presidential elections. In some past elections, El Pais, a Spanish newspaper, published a headline proclaiming that "Catalan public opinion swings to 'no' for independence," when the difference between the two sides was well within the survey's margin of error. Cairo complains about the misleading headline, and points out the need to look at the margin of error. It’s refreshing that a journalist points this out. I used to see this as a matter of statistical illiteracy, but now I see this as an ethical issue.
Let’s say a pollster runs a new poll every hour. Because the race is a dead heat, the hourly results would flip-flop, as if one were observing a sequence of coin flips. A journalist would report the sequence as A, A, B, A, B, B, A, … while a statistician would write down tie, tie, tie, tie, …. Notice how boring this last sequence is, despite it being more truthful. I don't believe that journalists report the horse race because they are ignorant of margins of error--they just choose to ignore them.
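For readers who want to try the thought experiment themselves, here is a minimal Python sketch of it. Everything specific in it is my assumption for illustration, not something from Cairo's book: the sample size of 1,000 respondents, the 24 hourly polls, the two-candidate race, and the function names. The "journalist" reports whoever is ahead in the sample, while the "statistician" calls a tie whenever candidate A's share sits within the usual 95% margin of error of 50%.

```python
import random


def margin_of_error(sample_size, z=1.96):
    """Approximate 95% margin of error for one candidate's share (worst case p = 0.5)."""
    return z * (0.25 / sample_size) ** 0.5


def simulate_polls(n_polls=24, sample_size=1000, true_share=0.5, seed=1):
    """Simulate repeated polls of a two-candidate race.

    Returns a list of (share_a, journalist_call, statistician_call) tuples.
    All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    moe = margin_of_error(sample_size)
    results = []
    for _ in range(n_polls):
        # Each respondent independently backs A with probability true_share.
        votes_a = sum(rng.random() < true_share for _ in range(sample_size))
        share_a = votes_a / sample_size

        # Journalist: report whoever is ahead in this sample.
        if share_a > 0.5:
            journalist = "A"
        elif share_a < 0.5:
            journalist = "B"
        else:
            journalist = "tie"

        # Statistician: call a tie unless A's share is outside 50% +/- margin of error.
        statistician = "tie" if abs(share_a - 0.5) <= moe else journalist

        results.append((share_a, journalist, statistician))
    return results


if __name__ == "__main__":
    polls = simulate_polls()
    print("journalist:  ", ", ".join(j for _, j, _ in polls))
    print("statistician:", ", ".join(s for _, _, s in polls))
```

With a true 50/50 split, the first printed line should flip between A and B from poll to poll, while the second should be almost entirely ties. Changing `true_share` to something like 0.56 shows the other side of the coin: once the lead is larger than the margin of error, both reporters agree.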
Readers of my blogs and books know what I am about to say about the teaching of statistics. Cairo follows a conventional approach, even including some equations, in an otherwise very readable account. The convention is the “how-to” approach that assumes learning comes from knowing formulas. It’s also an approach that has earned statistics departments at all universities a reputation for being uninspiring and obtuse.
Take hypothesis testing and p-values for example. Cairo’s account is more readable than most textbooks but at heart, it is a step-by-step manual for how to do hypothesis testing. To me, this method of instruction solves the wrong problem. The real issue is whether journalists are equipped to separate the wheat from the chaff when they read peer-reviewed journal articles, all of which use conventional hypothesis testing and attain p < 0.05. There are many things in life we learn to use without knowing any formulas. We learn to use a smartphone app without knowing how to code an app. We learn to ride a bike without having to learn mechanical engineering formulas.
These comments are not a specific criticism of Cairo’s project. I leave them here to encourage some creative thinking around the problem of statistical illiteracy that seems to never go away. I’m suggesting two shifts: from a set of formulas to a system of thinking; from imparting knowledge to promoting ethics.
Cairo’s book is an important contribution to bringing together the design and analytical perspectives on data visualization. He is an entertaining and lucid writer and thinker. Since he does not have mathematical training, he is able to explain the analytical materials in a way that would make sense to readers with non-technical backgrounds. So I highly recommend that you get a copy, get hooked, and do the advanced reading.
The reaction to the Australian unemployment figures http://www.abs.gov.au/ausstats/abs@.nsf/mf/6202.0 always amazes me. To a statistician they have a fairly obvious 95% CI of +/-0.2 pp, so they vary about 0.1 to 0.2 per month, occasionally more, but it is purely sampling variation. It takes at least 3 months, maybe more, of data before it is obvious what is happening in the economy. Even economists, who you would hope would know better, can't seem to grasp this.
This graph will be updated, but if you're looking at the February 2016 one, it is going down because of a massive property bubble. Something that economists can't understand either.
There really needs to be better understanding of sampling variation. A general course on statistics would work well without any hypothesis testing, only descriptive statistics and confidence intervals. Simulation could be used to explain the concepts.
Posted by: Ken | Apr 07, 2016 at 01:22 AM
Very interesting view. When you speak about ethics, my first thought goes to this video: https://www.youtube.com/watch?v=jWmUnU7HS-I (see in particular the animation at 1:20). It's a curious coincidence (really?) that I am the second reader who cited statistics about unemployment.
Posted by: Antonio | Apr 19, 2016 at 03:52 PM