
Batmen not as interesting as it seems

When this post appears, I will be on my way to Seattle. Maybe I will meet some of you there. You can still register here.

I held onto this tip from a reader for a while. I think it came from Twitter:

20160326_woc432_1 batman

The Economist found a fun topic but what's up with the axis not starting at zero?

The height x weight gimmick seems cool but on second thought, weight is not the same as girth so it doesn't make much sense!

In the re-design, I use bubbles to indicate weight and vertical location to indicate height. The data aren't as interesting as one might think. All the actors pretty much stayed true to the comic-book ideal, with Adam West being the closest. I also changed the order of the actors.

Redo_batman

I left out the Lego, as it creates a design challenge that does not justify the effort.

 

 


Book review: The Truthful Art by Alberto Cairo, and the Enduring Problem of Statistical Illiteracy

Truthfulart_cover


I have been looking forward to reading Alberto Cairo’s new book since he started teasing it last year. I enjoyed his first book, The Functional Art, mostly because we share the desire to bring the design and the analytical schools of data visualization closer together. His new book, The Truthful Art, represents another step in this ambitious project, and I found much to like about it.

The Truthful Art is really two books for the price of one. There is one book about analytical thinking, and interspersed between these chapters, there is a short book about graphical design. The chapters on analytical thinking take readers--presumably many will be journalists--through the standard diet of statistics, from summary statistics (ch. 6) to distributions (ch. 7) to correlations (ch. 9) to sampling theory (ch. 11) to time-series data (ch. 8). Cairo also devotes some delightful pages to cognitive biases (ch. 3) and research design (ch. 4).

Readers meet these analytical chapters as if they are enjoying a chef’s tasting menu at a top-line restaurant. Small delicious bites of knowledge are served quickly, with the expectation that readers will pursue the advanced reading list, curated by the author and printed at the end of each chapter. Cairo’s love of reading bursts through the pages.

At various points, Cairo delves into equations. In a wise move to balance the book, and to keep readers awake (we all know how boring statistics is), he weaves in several chapters on basic graphical design. There are materials on chart forms (ch. 5), on maps (ch. 10), and on visual design (ch. 2, a nice summary of his previous book). My favorite is chapter 12, which is a kaleidoscope of data visualization projects, at once celebrating the vitality of the field, and revealing its unruly, sometimes clashing, strands.

A good book is one that leaves the reader with lingering thoughts that transcend its pages. The Truthful Art succeeds in this respect. The purview of this book intersects directly with several lines of my own work: on the ethics of data analytics, and the teaching of statistical reasoning.

In chapter 11, Cairo tells a story familiar to anyone who is paying attention to the current U.S. presidential election. During a past election, El Pais, a Spanish newspaper, published a headline proclaiming that "Catalan public opinion swings to 'no' for independence," when the difference between the two sides was well within the survey's margin of error. Cairo complains about the misleading headline and points out the need to look at the margin of error. It’s refreshing that a journalist points this out. I used to see this as a matter of statistical illiteracy, but now I see it as an ethical issue.

Let’s say a pollster runs a new poll every hour. Because the race is a dead heat, the hourly results would flip-flop, as if one were observing a sequence of coin flips. A journalist would report the sequence as A, A, B, A, B, B, A, … while a statistician would write down tie, tie, tie, tie, …. Notice how boring this last sequence is, despite it being more truthful. I don't believe that journalists report the horse race because they are ignorant of margins of error; they just choose to ignore them.
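The hourly-poll thought experiment can be sketched as a small simulation. This is only an illustration under assumptions of my own choosing: a two-candidate race, simple random samples of 1,000 respondents, a true 50/50 split, and the textbook 95% margin of error for a proportion. The function names are hypothetical, not taken from Cairo's book.

```python
import math
import random


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for one candidate's share (simple random sample)."""
    return z * math.sqrt(p * (1 - p) / n)


def simulate_polls(num_polls=12, sample_size=1000, true_share_a=0.5, seed=1):
    """Return (journalist_calls, statistician_calls) for a sequence of polls.

    All defaults are assumptions for illustration only.
    """
    rng = random.Random(seed)
    moe = margin_of_error(sample_size)
    journalist, statistician = [], []
    for _ in range(num_polls):
        votes_a = sum(rng.random() < true_share_a for _ in range(sample_size))
        share_a = votes_a / sample_size
        lead = share_a - (1 - share_a)  # A's lead over B in a two-way race

        # The journalist reports whoever is nominally ahead.
        journalist.append("A" if lead > 0 else "B" if lead < 0 else "tie")

        # In a two-way race the lead equals 2 * share_a - 1, so its margin
        # of error is twice that of a single share.
        if abs(lead) <= 2 * moe:
            statistician.append("tie")
        else:
            statistician.append("A" if lead > 0 else "B")
    return journalist, statistician


if __name__ == "__main__":
    j, s = simulate_polls()
    print("journalist:  ", ", ".join(j))
    print("statistician:", ", ".join(s))
```

With samples of 1,000, the margin of error on a single share is about 3 percentage points, so the lead has to exceed roughly 6 points before the statistician's column calls a winner. In a true dead heat that should happen in only about one poll in twenty.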

Readers of my blogs and books know what I am about to say about the teaching of statistics. Cairo follows a conventional approach, even including some equations, in an otherwise very readable account. The convention is the “how-to” approach that assumes learning comes from knowing formulas. It’s also an approach that has earned statistics departments at all universities a reputation for being uninspiring and obtuse.

Take hypothesis testing and p-values for example. Cairo’s account is more readable than most textbooks but at heart, it is a step-by-step manual for how to do hypothesis testing. To me, this method of instruction solves the wrong problem. The real issue is whether journalists are equipped to separate the wheat from the chaff when they read peer-reviewed journal articles, all of which use conventional hypothesis testing and attain p < 0.05. There are many things in life we learn to use without knowing any formulas. We learn to use a smartphone app without knowing how to code an app. We learn to ride a bike without having to learn mechanical engineering formulas.

These comments are not a specific criticism of Cairo’s project. I leave them here to encourage some creative thinking around the problem of statistical illiteracy that seems to never go away. I’m suggesting two shifts: from a set of formulas to a system of thinking; from imparting knowledge to promoting ethics.

***

Cairo’s book is an important contribution to bringing together the design and analytical perspectives on data visualization. He is an entertaining and lucid writer and thinker. Since he does not have mathematical training, he is able to explain the analytical materials in a way that would make sense to readers with non-technical backgrounds. So I highly recommend that you get a copy, get hooked, and do the advanced reading.


Football managers on the hot seat

Chris Y. asked how to read this BBC Sports graphic via Twitter:

Bbcsports_managers_winprecent

These are managers of British football (i.e. soccer) teams. The chart lists some of the worst managerial tenures. But what do the numbers mean?

The character "V" holds the key. When I first read the chart title, I wondered why managers are opposed to win percentages. The legend at the bottom right also confused me. Did they mean "W" when they printed "V"? "Games W%" seems like a shorthand for winning percentage.

After looking up John Carver's not-so-impressive record, I learned that the left column is the total number of matches managed and the right column is the winning percentage, expressed as a number between 0 and 100.

I think even the designer got confused by those scales. Witness the little bar charts in the middle:

Bbcsports_managers_winpercent1

The two numbers are treated as if they are on the same scale. The left column is assumed to be the number of matches won while the right column is treated as the number of matches lost (or vice versa). Under this interpretation, the bar charts would depict the winning percentages. Let me fix the data:
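To make the scale mix-up concrete, here is a minimal sketch with made-up numbers (20 matches managed, 15 percent won); these are not figures read off the BBC chart. The first function mimics what the little bars appear to do, and the second converts the two columns into counts that do share a scale. Both function names are hypothetical.

```python
def bar_share_as_drawn(matches, win_pct):
    """Share of the bar given to the left column when a match count and a
    percentage are (wrongly) treated as two counts on the same scale."""
    return matches / (matches + win_pct)


def wins_and_non_wins(matches, win_pct):
    """Convert (matches managed, winning percentage) into counts on one scale.

    Non-wins include draws as well as losses.
    """
    wins = round(matches * win_pct / 100)
    return wins, matches - wins


# Hypothetical manager: 20 matches managed, 15% of them won.
matches, win_pct = 20, 15
print(bar_share_as_drawn(matches, win_pct))  # 20 / 35, unrelated to the win rate
print(wins_and_non_wins(matches, win_pct))   # counts a split bar could depict
```

For this hypothetical manager, the bar as drawn splits 20 against 15, while a bar that depicts the winning percentage would split 3 wins against 17 non-wins.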

Redo_bbcsport_managers

While these managers have compiled similar losing records on a relative basis, some of them lasted longer than others. The following chart, in which I have re-sorted the managers, brings out the difference in tenure while keeping the winning percentages:

  Redo_bbcsports-managers-2

 When they finally got the sack, they reached the end of the line.