
Sheep tramples sense

Merry Christmas, readers.


A Twitter follower pointed me to this visual:


I have yet to understand why the vertical axis of the top chart keeps changing scales over time. The white dot labelled "Peak 1982" (70 million) is barely above the other white dot for "2007" (38 million). This chart hides a clear trend: the population of sheep in New Zealand has plunged by 45% over 25 years.
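The size of the decline can be checked directly from the two figures printed on the chart. The snippet below is only a quick arithmetic sketch; the function name `percent_change` is my own and is not part of the original graphic or article.

```python
def percent_change(old, new):
    """Percentage change from `old` to `new` (negative means a decline)."""
    return (new - old) / old * 100.0


# Figures as labelled on the chart: the 1982 peak and the 2007 count.
peak_1982 = 70_000_000
count_2007 = 38_000_000

decline = percent_change(peak_1982, count_2007)
print(f"{decline:.1f}%")  # prints -45.7%
```

A drop of this size should be obvious at a glance on an honest scale, yet the two white dots sit at almost the same height.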

To address the question of sheep versus human, one should plot the ratio of sheep-to-human directly. In this case, the designer probably faced a problem: because of the plunging population of sheep, the ratio has plunged steeply in 25 years. To make a point that "people are outnumbered more than 9 to 1", the designer didn't want to show a plunging trend. (Could this be the reason why the human population in 1982 was not printed?)

This is a case of too many details. Instead of manipulating the scale to distort the data, one can simply show the current ratio, or the average ratio in the last five years.


As the reader scans down to the bottom set of charts, a cognitive wedge appears: the curved scale of the New Zealand chart gives way to a normal, uniform scale. These smaller charts are no less confusing, however.


The two lines on these two charts appear almost the same, and yet the Australian chart (on the left) shows a ratio of 4 to 1 while the Icelandic chart (on the right) shows a ratio of 1.5 to 1. It makes you wonder whether each of the small multiples has a dual axis.

Again, I'm not convinced that the time series adds anything to the message.


Cloudy and red

Note: I'm traveling during the holidays so updates will be infrequent.


Reader Daniel L. pointed me to a blog post discussing the following weather map:


The author claimed that many readers misinterpreted the red color as meaning high temperatures when he intended it to show higher-than-normal temperatures. In other words, the readers did not recognize that a relative scale is in play.

That is a minor issue that can be fixed by placing a label on the map.
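The difference between an absolute and a relative scale can be made concrete with a small sketch. Everything here is hypothetical: the two-color scheme, the function names, and the temperatures are invented for illustration and are not taken from the map or the blog post.

```python
def anomaly(observed, normal):
    """Departure of the observed temperature from the seasonal normal."""
    return observed - normal


def relative_color(observed, normal):
    """Color on a relative scale: red = warmer than normal, blue = colder."""
    diff = anomaly(observed, normal)
    if diff > 0:
        return "red"
    if diff < 0:
        return "blue"
    return "neutral"


# Hypothetical readings in degrees Fahrenheit.
cold_city = relative_color(observed=30, normal=20)  # chilly, but above normal
warm_city = relative_color(observed=75, normal=80)  # warm, but below normal
print(cold_city, warm_city)  # prints: red blue
```

On such a scale, a place at 30 degrees can be painted red while a place at 75 degrees is painted blue, which is exactly why the map needs a label saying the colors are relative to normal.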

There are several more irritants, starting with the abundance of what Ed Tufte calls chartjunk. The county boundaries do not serve a purpose, nor is it necessary to print so many place names. The state boundaries, too, are too imposing. The legend fails to explain what the patch of green in Florida means.

The article itself links to a different view of this data on a newly launched site called Climate Prediction Center, by the National Oceanic and Atmospheric Administration (link). Here is a screenshot of the continental U.S.


This chart is the other extreme, bordering on too simple.

I'd suggest adding a little bit of interactivity to this chart, such as:

  • Hiding the state boundaries and showing them on hover only
  • Selectively printing the names of major cities to help readers orient themselves
  • Selectively printing the names of larger cities around the color boundaries
  • Using a different background map that focuses on the U.S. rather than the entire North American continent

This is a Type V chart.

How to take exams

Here is something different: I wrote a piece on exam-taking tips. It's on a new website, Cafe, which has lots of good (non-quant) reads. The motivation for the piece is my observation that most American students are not taught how to take exams. As a professor, I notice that many students get lower scores than they deserve because of this. 

In this article, I describe five things that are often neglected here, but are common knowledge in exam-heavy cultures.

The link is here.

Three axes or none

Catching up on some older submissions. Reader Nicholas S. saw this mind-boggling chart about Chris Nolan movies when Interstellar came out:


This chart was part of an article by Vulture (link).

It may be the first time I have seen not one, not two, but three different scales on the same chart.

First we have Rotten Tomatoes score for each movie in proportions:


The designer chopped off 49% of each column. So the heights of the columns are not proportional to the data.

Next we see the running time of movies in minutes (dark blue columns):


For this series, the designer hid 40 minutes worth of each movie below the axis. So again, the heights of the columns do not convey the relative lengths of the movies.

Thirdly, we have light blue columns representing box office receipts:


Or maybe not. I can't figure out what scale is used here. The same-size chunks shown above display $45,000 in one case, and $87 million in another!

So the designer kneaded together three flawed axes. Or perhaps the designer just banished the idea of an axis. But this experiment floundered.
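The damage done by a chopped-off axis, as in the Rotten Tomatoes and running-time columns, is easy to quantify. The sketch below uses two hypothetical running times (they are not read off the Vulture chart) together with the 40-minute cut mentioned above; the function names are my own.

```python
def drawn_height(value, baseline):
    """Visible column height when the axis starts at `baseline`, not zero."""
    return value - baseline


def apparent_ratio(a, b, baseline):
    """Ratio of two column heights as drawn on a truncated axis."""
    return drawn_height(a, baseline) / drawn_height(b, baseline)


# Hypothetical running times in minutes, for illustration only.
long_film, short_film = 160, 100

true_ratio = long_film / short_film                               # 1.6
shown_ratio = apparent_ratio(long_film, short_film, baseline=40)  # 120 / 60

print(true_ratio, shown_ratio)  # prints: 1.6 2.0
```

With 40 minutes hidden below the axis, a film that is 60% longer is drawn as if it were twice as long. Only a zero baseline makes column heights proportional to the data.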


Here is the data in three separate line charts:



In a Trifecta Checkup (link), the Vulture chart falls into Type DV. The question might be the relationship between running time and box office, and between Rotten Tomatoes Score and box office. These are very difficult to answer.

The box office number here refers to the lifetime gross ticket receipts from theaters. The movie industry insists on publishing these unadjusted numbers, which are completely useless. At the minimum, these numbers should be adjusted for inflation (ticket prices) and for population growth, if we are to use them to measure commercial success.

The box office number is also suspect because it ignores streaming, digital, syndication, and other forms of revenues. This is a problem because we are comparing movies across time.
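One rough way to make grosses from different years comparable is to convert dollars into estimated tickets sold, and then divide by the population at the time. This is a minimal sketch of that idea, not an industry-standard method; all of the grosses, ticket prices, and population figures are hypothetical placeholders.

```python
def tickets_per_capita(gross, avg_ticket_price, population):
    """Estimated tickets sold per person.

    Dividing the gross by the average ticket price of the release year
    removes ticket-price inflation; dividing by population removes
    growth in the potential audience.
    """
    estimated_tickets = gross / avg_ticket_price
    return estimated_tickets / population


# Hypothetical films: every number below is invented for illustration.
older = tickets_per_capita(gross=50_000_000, avg_ticket_price=5.0,
                           population=250_000_000)
newer = tickets_per_capita(gross=80_000_000, avg_ticket_price=8.0,
                           population=320_000_000)

print(older, newer)  # prints: 0.04 0.03125
```

In this made-up case the newer film has the larger unadjusted gross, yet it sold the same number of tickets to a larger population, so it reached a smaller share of the public.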

You might have noticed that both running time and box office numbers have gone up over time. (That is to say, running time and box office numbers are highly correlated.) Do you think that is because moviegoers are motivated to see longer films, or because movies are just getting longer?



PS. [12/15/2014] I will have a related discussion on the statistics behind this data on my sister blog. Link will be active Monday afternoon.

Where a scatter plot fails

Found this chart in the magazine that Charles Schwab sends to customers:


When there are two variables, and their correlation is of interest, a scatter plot is usually recommended. But not here!

The text labels completely dominate this chart. The designer tried very hard to place them, but a careful look reveals that some boxes sit above their dots while others sit to the right, and the label for "Short Treasuries" takes refuge quite a distance from its dot. This means the locations of the text boxes do not substitute for the dots.


Here is a different view of this data:


I am using a bumps-style chart, which allows the labels to be written horizontally outside the canvas. Instead of plotting all categories on the same chart, I use a small-multiples setup to differentiate three types of risk-return relationships.