Fifty-nine intersections supporting forty dots of data

My friend Ray V. asked how this chart can be improved:


Let's try to read this chart. The Economist is always the best at writing headlines, and this one is simple and to the point: the rich get richer. This is about inequality but not just inequality - the growth in inequality over time.

Each country has four dots, divided into two pairs. From the legend, we learn that the line represents the gap between the rich and the poor. But what is rich and what is poor? Looking at the sub-header, we learn that the population is divided by domicile, and the per-capita GDP figures of the poorest and richest regions are plotted. This is an indirect metric, and may or may not be good, depending on how many regions a country is divided into, the dispersion of incomes within each region, the distribution of population between regions, and so on.

Now, looking at the axis labels, it's pretty clear that the data depicted are not in dollars (or any currency), despite the reference to GDP in the sub-header. The numbers represent indices, relative to the national average GDP per head. For many of the countries, the poorest region produces about half the per-capita GDP of the richest region.

Back to the original question. A growing inequality would be represented by a longer line below a shorter line within each country. That is true in some of these countries. The exceptions are Sweden, Japan, and South Korea.

It doesn't jump out that the key task requires comparing the lengths of the two lines. Another issue is the outdated convention of breaking up a line (Britain) when the line is of extreme length - particularly unwise given that the length of the line encodes the key metric in the chart.

Further, it has a low data-ink ratio, à la Tufte. The gridlines, reference lines, and data lines weave together in a complex pattern, creating 59 intersections in a chart that contains only 36 numbers.


I decided to compute a simpler metric - the ratio of rich to poor. For example, in the UK, the richest area produced about 20 times as much GDP per capita as the poorest one in 2015. That is easier to understand than an index to the average region.
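Here is a minimal sketch of that computation. Because both endpoints are indexed to the same national average, the average cancels when one index is divided by the other. The index values are invented for illustration, not read off the Economist chart, and `rich_poor_ratio` is a hypothetical helper name.

```python
def rich_poor_ratio(richest_index, poorest_index):
    """Ratio of richest-region to poorest-region GDP per head.

    Both inputs are indices relative to the national average
    (average = 100), so the national average cancels in the division.
    """
    if poorest_index <= 0:
        raise ValueError("poorest_index must be positive")
    return richest_index / poorest_index


# Hypothetical index values, NOT taken from the chart.
regions = {
    "Country A": {"richest": 150.0, "poorest": 75.0},
    "Country B": {"richest": 400.0, "poorest": 50.0},
}

for country, idx in regions.items():
    ratio = rich_poor_ratio(idx["richest"], idx["poorest"])
    print(f"{country}: richest region produces {ratio:.1f}x the poorest")
```

A single ratio per country per year is also what makes a Bumps chart or paired columns workable: there is one number to compare across time instead of two line lengths.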

I had fun making the following chart, although many standard forms like the Bumps chart (i.e. slopegraph) or paired columns and so on also work.


This chart is influenced by Ed Tufte, who spent a good number of pages in his first book advocating stripping even the standard column chart to its bare essence. The chart also acknowledges the power of design to draw attention.



PS. Sorry, I counted incorrectly. The chart has 36 dots, not 40.

Making people jump over hoops

Take a look at the following chart, and guess what message the designer wants to convey:


This chart accompanied an article in the Wall Street Journal about Wells Fargo losing brokers due to the fake account scandal, and using bonuses to lure them back. Like you, my first response to the chart was that little has changed from 2015 to 2017.

The intention behind the whitespace inserted to split the four columns into two pairs is a bit mysterious. It's not obvious that UBS and Merrill are different from Wells Fargo and Morgan Stanley. This device might have been used to overcome the difficulty of reading four columns side by side.

The additional challenge of this dataset is the outlier values for UBS, which elongate the range of the vertical axis, squeezing together the values of the other three banks.

In this first alternative version, I play around with irregular gridlines.


Grouped column charts are not great at conveying changes over time, as they cause our eyes to literally jump over hoops. In the second version, I use a bumps chart to compactly highlight the trends. I also zoom in on the quarterly growth rates.


The rounded interpolation removes the sharp angles from the typical bumps chart (aka slopegraph) but it does add patterns that might not be there. This type of interpolation, however, respects the values at the "knots" (here, the quarterly values), while a smoother may move those points. On balance, I like this treatment.


PS. [6/2/2017] Given the commentary below, I am including the straight version of the chart, so you can compare. The straight-line version is more precise. One aspect of this chart form I dislike is the sharp angles. When there are more lines, it gets very entangled.


Sorting out the data, and creating the head-shake manual

Yesterday's post attracted a few good comments.

Several readers don't like the data used in the NAEP score chart. The authors labeled the metric "gain in NAEP scale scores" which I interpreted to be "gain scores," a popular way of evaluating educational outcomes. A gain score is the change in test score between (typically consecutive) years. I also interpreted the label "2000-2009" as the average of eight gain scores, in other words, the average year-on-year change in test scores during those 10 years.

After thinking about what reader mankoff wrote, which prompted me to download the raw data, I realized that the designer did not compute gain scores. "2000-2009" really means the difference between the 2009 score and the 2000 score, ignoring all values between those end points. So mankoff is correct in saying that the 2009 number was used in both "2000-2009" and "2009-2015" computations.
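The two readings of "2000-2009" can be put side by side in a short sketch. The scores below are invented for illustration (they are not the NAEP data), and both function names are hypothetical.

```python
def endpoint_difference(scores, start_year, end_year):
    """What the designer computed: last score minus first score."""
    return scores[end_year] - scores[start_year]


def average_gain_score(scores, start_year, end_year):
    """What I first assumed: the mean change between consecutive tests."""
    years = sorted(y for y in scores if start_year <= y <= end_year)
    gains = [scores[b] - scores[a] for a, b in zip(years, years[1:])]
    return sum(gains) / len(gains)


# Hypothetical scale scores by test year, NOT the real NAEP numbers.
scores = {2000: 220, 2003: 226, 2005: 229, 2007: 232,
          2009: 233, 2011: 234, 2013: 235, 2015: 234}

for start, end in [(2000, 2009), (2009, 2015)]:
    print(start, end,
          endpoint_difference(scores, start, end),
          round(average_gain_score(scores, start, end), 2))
```

The individual gains sum to the endpoint difference, so the two metrics differ only by the division by the number of intervals. That division matters precisely when the two periods are of unequal length, and the 2009 score enters both periods either way.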

This treatment immediately raises concerns. Why is a 10-year period compared to a 7-year period?

Andrew prefers to see the raw scores ("scale scores") instead of relative values. Here is the corresponding chart:


I placed a line at 2009, just to see if there is a reason for that year to be a special year. (I don't think so.) The advantage of plotting raw scores is that it is easier to interpret. As Andrew said, less abstraction. It also soothes the nerves of those who are startled that the lines for white students appear at the bottom of the chart of gain scores.

I suppose the reason why the original designer chose to use score differentials is to highlight their message concerning change in scores. One can nitpick that their message isn't particularly cogent because if you look at 8th grade math or reading scores, comparing 2009 and 2015, there appeared to be negligible change, and yet between those end-points, the scores did spike and then drop back to the 2009 level.

One way to mitigate the confusion that mankoff encountered in interpreting my gain-score graphic is to use "informative" labels, rather than "uninformative" labels.


Instead of saying the vertical axis plots "gain scores" or "change in scores," directly label one end as "no progress" and the other end as "more progress."

Everything on this chart is progress over time, and the stalling of progress is their message. This chart requires more upfront learning, after which the message jumps out. The chart of raw scores shown above has almost no perceptual overhead but the message has to be teased out. I prefer the chart of raw scores in this case.


Let me now address another objection, which pops up every time I convert a bar chart to a line chart (a type of Bumps chart, which has been called slope graphs by Tufte followers). The objection is that the line chart causes readers to see a trend when there isn't one.

So let me make the case one more time.

Start with the original column chart. If you want to know that Hispanic students have seen progress in their 4th grade math scores grind to a halt, you have to shake your head involuntarily in the following manner:


(Notice how the legend interferes with your line of sight.)

By the time you finish interpreting this graphic, you would have shaken your head in all of the following directions:


Now, I am a scavenger. I collect all these lines and rearrange them into four panels of charts. That becomes the chart I showed in yesterday's post. All I have done is to bring to the surface the involuntary motions readers were undertaking. I didn't invent any trends.

Involuntary head-shaking is probably not an intended consequence of data visualization

This chart is in the Sept/Oct edition of Harvard Magazine:


Pretty standard fare. It is even Tufte-esque in the sparing use of axes, labels, and other non-data-ink.

Does it bug you how much work you need to do to understand this chart?

Here is the junkchart version:


In the accompanying article, the journalist declared that student progress on NAEP tests came to a virtual standstill, and this version highlights the drop in performance between the two periods, as measured by these "gain scores."

The clarity is achieved through proximity as well as slopes.

The column chart form has a number of deficiencies when used to illustrate this data. It requires too many colors. It induces involuntary head-shaking.

Most unforgivably, it leaves us with a puzzle: does the absence of a column mean no progress or unknown?


PS. The inclusion of 2009 on both time periods is probably an editorial oversight.



Political winds and hair styling

Washington Post (link) and New York Times (link) published dueling charts last week, showing the swing-swang of the political winds in the U.S. Of course, you know that the pendulum has shifted riotously rightward towards Republican red in this election.

The Post focused its graphic on the urban / not urban division within the country:


Over Twitter, Lazaro Gamio told me they are calling these troll-hair charts. You certainly can see the imagery of hair blowing with the wind. In small counties (right), the wind is strongly to the right. In urban counties (left), the straight hair style has been in vogue since 2008. The numbers at the bottom of the chart drive home the story.

Previously, I discussed the Two Americas map by the NY Times, which covers a similar subject. The Times version emphasizes the geography, and is a snapshot while the Post graphic reveals longer trends.

Meanwhile, the Times published its version of a hair chart.


This particular graphic highlights the movement among the swing states. (Time moves bottom to top in this chart.) These states shifted left for Obama and marched right for Trump.

The two sets of charts have many similarities. They both use curvy lines (hair) as the main aesthetic feature. The left-right dimension is the anchor of both charts, and sways to the left or right are important tropes. In both presentations, the charts provide visual aid, and are nicely embedded within the story. Neither is intended as exploratory graphics.

But the designers diverged on many decisions, mostly in the D(ata) or V(isual) corner of the Trifecta framework.


The Times chart is at the state level while the Post uses county-level data.

The Times plots absolute values while the Post focuses on relative values (cumulative swing from the 2004 position). In the Times version, the reader can see the popular vote margin for any state in any election. The middle vertical line is keyed to the electoral vote (plurality of the popular vote in most states). It is easy to find the crossover states and times.

The Post's designer did some data transformations. Everything is indexed to 2004. Each number in the chart is the county's current leaning relative to 2004. Thus, left of vertical means said county has shifted more blue compared to 2004. The numbers are cumulative moving top to bottom. If a county is 10% left of center in the 2016 election, this effect may have come about this year, or 4 years ago, or 8 years ago, or some combination of the above. Again, left of center does not mean the county voted Democratic in that election. So, the chart must be read with some care.

One complaint about anchoring the data is the arbitrary choice of the starting year. Indeed, the Times chart goes back to 2000, another arbitrary choice. But clearly, the two teams were aiming to address slightly different variations of the key question.

There is a design advantage to anchoring the data. The Times chart is noticeably more entangled than the Post chart. There is far more criss-crossing. This is particularly glaring given that the Times chart contains many fewer lines than the Post chart, due to state versus county.

Anchoring the data to a starting year has the effect of combing one's unruly hair. Mathematically, they are just shifting the lines so that they start at the same location, without altering the curvature. Of course, this is double-edged: the re-centering means the left-blue / right-red interpretation is co-opted.
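To make the anchoring concrete, here is a minimal sketch. The margins are invented for illustration (positive means a Republican lead, in percentage points), and `anchor_to_first` is a hypothetical helper, not anything the Post published.

```python
def anchor_to_first(margins):
    """Shift a series so that its first value becomes zero.

    Every point moves by the same amount, so the election-to-election
    changes (the shape of the line) are unchanged.
    """
    baseline = margins[0]
    return [m - baseline for m in margins]


# Hypothetical county margins for 2004, 2008, 2012, 2016.
raw = [-20, -25, -24, -18]      # Democratic lead in every election
anchored = anchor_to_first(raw)
print(anchored)                 # [0, -5, -4, 2]

# The changes between consecutive elections are identical before and after.
steps = lambda xs: [b - a for a, b in zip(xs, xs[1:])]
print(steps(raw) == steps(anchored))  # True
```

In this made-up county, the anchored 2016 value sits to the right of the vertical line even though the county voted Democratic in all four elections. That is the co-opting of the left-blue / right-red reading described above.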

On the Times chart, they used a different coping strategy. Each version of their charts has a filter: they highlight the set of lines to demonstrate different vignettes: the swing states moved slightly to the right, the Republican states marched right, and the Democratic states also moved right. Without these filters, the readers would be winking at the Times's bad-hair day.


Another decision worth noting: the direction of time. The Post's choice of top to bottom seems more natural to me than the Times's reverse order but I am guessing some of you may have different inclinations.

Finally, what about the thickness of the lines? The Post encoded population (voter) size while the Times used electoral votes. This decision is partly driven by the choice of state versus county level data.

One can consider electoral votes as a kind of log transformation. The effect of electorizing the popular vote is to pull the extreme values to the center. This significantly simplifies the designer's life. To wit, in the Post chart (shown below), they have to apply a filter to highlight key counties, and you notice that those lines are so thick that all the other counties become barely visible.



Bumps chart goes mainstream

It’s a happy day when one of my favorite chart types, the Bumps chart, makes it to the Wall Street Journal, and the front page no less! (Link to article)

This chart shows the ground shifting in global auto production in the next five years, with Mexico and India gaining in rank over Germany and South Korea.


The criss-crossing of lines is key to reading these charts. A crossing ("bump") necessarily means one entity has surpassed the other entity in absolute terms, even though we are looking at the relative rank.
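A small sketch shows why a crossing implies an overtake in absolute terms: ranks are derived from the underlying values, so two lines can only swap order if the values swapped order. The producers and volumes are placeholders, not the figures behind the WSJ chart, and the function names are hypothetical. Ties are not given any special treatment here.

```python
from itertools import combinations


def ranks(values):
    """Map each name to its rank, where 1 is the largest value."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}


def crossings(before, after):
    """Pairs of names whose rank order flips between two periods."""
    r0, r1 = ranks(before), ranks(after)
    return [(a, b) for a, b in combinations(sorted(before), 2)
            if (r0[a] - r0[b]) * (r1[a] - r1[b]) < 0]


# Hypothetical production volumes (millions of vehicles).
before = {"A": 5.9, "B": 4.5, "C": 3.8, "D": 3.4}
after = {"A": 6.0, "B": 4.4, "C": 4.6, "D": 5.0}

print(ranks(after))               # {'A': 1, 'D': 2, 'C': 3, 'B': 4}
print(crossings(before, after))   # [('B', 'C'), ('B', 'D'), ('C', 'D')]
```

What the ranks throw away is the size of the gaps: A keeps rank 1 in both periods whether it leads by a little or by a lot.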

Of course, there is no Swiss Army Knife of charts. This graphic provides no clue as to the share of world production. It's quite possible that the first few countries account for the majority of the world's production, so that the rank shifts toward the bottom of the chart are relatively inconsequential. Wikipedia says that the top player (China) produces a quarter of the world's vehicles, and twice as many as the next biggest producer. Any country ranked below 4 accounts for less than 5 percent of global volume.


I made a few minor edits in this version below. For example, it's unclear why both 2014 and 2015 are depicted since there were no rank shifts and also the 2015 data is a projection. (I don't have any problem with the two red lines even though I didn't carry over the color scheme.)


A startling chart about income inequality, with interpretative difficulties

Reader Robbi B. submitted the following chart posted to Twitter by Branko Milanovic:


The chart took a little time to figure out. This isn't a bad chart. Robbi wondered if there are alternative ways to plot this information.

The U.S. population is divided into percentiles across the horizontal axis, presumably based on the income distribution in some year (I'm guessing 2007, the start of the recession). For each percentile of people, the real per capita growth (decline) in disposable income is computed for two periods: the blue line shows the decline during the recession (2007-2010) and the orange shows the growth (in some cases further decline) during the recovery (2010-2013).

This chart draws attention to the two tails of the distribution, namely, the bottom 10 percent, and the top 5 percent. At one level, these two groups (excepting the bottom 2%) experienced the best of the recovery. But then, they also suffered the worst declines during the recession.


Here is one possible view of the same data, in a format with which I have been experimenting recently. You might call this a Bumps panel or a slopegraph panel.


The slopes draw attention to the relative magnitude of the declines and the subsequent recoveries. (I thinned the middle 80% substantially because there isn't much going on in that part of the dataset.) If I had more time, I'd have chosen a different color instead of grayscale for those lines.

I ignored any questions I have about the underlying data. How is disposable income defined and measured? Does it carry the same meaning across the entire spectrum of income distribution? etc. (Milanovic points to the Survey of Consumer Finances as the source.)


One reason for the reading difficulty is the absence of a reference point. It's unclear how to judge the orange line. Two answers are suggestive (but problematic). One is the zero line: which segments of the population experienced a recovery and which didn't? Another is the mirror image of the blue line: how much of what one lost during the recession did one recover by 2013 (roughly speaking)?

Both of these easy interpretations worry me because they carry an assumption of equal guilt (blue line) and/or equal spoils (orange line). It is very possible that the unwarranted risk-taking or fraud was not evenly spread out amongst the percentiles, and if so, it is impossible to judge whether the distribution exhibited in the blue line was "fair". It is then also impossible to know if the distribution contained in the orange line was "fair". Indeed, if the orange line mirrored the blue line, then all segments recovered similarly what they lost--this would only make sense if all segments are equally culpable in the recession.

Where a scatter plot fails

Found this chart in the magazine that Charles Schwab sends to customers:


When there are two variables, and their correlation is of interest, a scatter plot is usually recommended. But not here!

The text labels completely dominate this chart. The designer tried very hard to place them, but a careful look reveals that some boxes are placed above the dots while others are placed to their right, and the box for "Short Treasuries" takes refuge quite a way from its dot. This means the locations of the text boxes do not substitute for the dots.


Here is a different view of this data:


I am using a bumps-style chart, which allows the labels to be written horizontally outside the canvas. Instead of all categories plotted on the same chart, I use a small multiples setup to differentiate three types of risk-return relationships.

Circular but insufficient

One of my students analyzed the following Economist chart for her homework.


I was looking for it online, and found an interactive version that is a bit different (link). Here are three screen shots from the online version for years 2009, 2013 and 2018. The first and last snapshots correspond to the years depicted in the print version.


The online version is the self-sufficiency test for the print version. In testing self-sufficiency, we want to see if the visual elements (i.e. the circular sectors on the print version) pull their own weight. The quick answer is no. The reader can't tell how much in sales is represented in each sector, nor can they reliably estimate the relative scales of print versus ebook (pink/red vs yellow/orange) or year-to-year growth rates.

As usual, when we see the entire data set printed on the chart itself, it is a giveaway that the visual elements are mere ornaments.

The online version does not have labels unless you hover over the hemispheres. But again it is a challenge to learn anything from the picture.

In the Trifecta checkup, this is a Type V chart.


This particular dataset is made for the bumps-style chart:





Respect the reader's time

A graphic illustrating how Americans spend their time is a perfect foil to make the important case that the reader's time is a scarce resource. I wrote about this at the ASA forum in 2011 (link).

In the same WSJ that carried the DSL speed chart (link), they boldly placed the following graphic in the center of the front page of the printed edition:


The visual form is a treemap displaying the recently released results of the Time Use Survey (link to pdf).

What does the designer want us to learn from this chart?


What jumps out first is the importance of various activities, starting with sleep, then work, TV, leisure/sports, etc.

If you read the legend, you'll notice that the colors mean something. The blue activities take up more time in 2013 compared to 2003. Herein, we encounter the first design hiccup.

The size of the blocks (which codes the absolute amount) and the color of the blocks (which codes the relative change in the amount) compete for our attention. According to Bill Cleveland's research, size is perceived more strongly than color. Thus, the wrong element wins.

Next, if we have time on our hands, we might read the data labels. Each block has two labels, the absolute values for 2003 and for 2013. In this, the designer is giving an arithmetic test. The reader is asked to compute the change in time spent in his or her head.

It appears that the designer's key message is "Aging Americans sleep more, work less", with the subtitle "TV remains No.1 hobby".


Now compare the treemap to this set of "boring" bar charts.

This visualization of the same data appears in WSJ online in lieu of the treemap. Here, the point of the article is made clear; the reader need not struggle with mental gymnastics.

(One can grumble about the red-green color-blindness blindness but otherwise, the graphic is pretty good.)



When I see this sort of data, I like to make a Bumps chart. So here it is:


The labeling of the smaller categories poses a challenge because the lines are so close together. However, those numbers are so small that none of the changes would be considered statistically significant.


From a statistical/data perspective, a very important question must be raised. What is the error bar around these estimates? Is there anything meaningful about an observed difference of fewer than 10 minutes?

Amusingly, the ATUS press release (link to pdf) has a technical note that warns us about reliability of estimates but nowhere in the press release can one actually find the value of the standard error, or a confidence interval, etc. After emailing them, I did get the information promptly. The standard error of one estimate is roughly 0.025-0.05 hours, which means that the standard error of a difference is roughly 0.05-0.1 hours, which means that a confidence interval around any estimated difference is roughly 0.1-0.2 hours, or 6-12 minutes.
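For readers who want to redo this arithmetic, here is a rough sketch using the textbook formula for the standard error of a difference. It assumes the 2003 and 2013 estimates are independent and that a normal approximation applies; both are my assumptions, not statements from the ATUS technical note. The function names are hypothetical.

```python
import math


def se_of_difference(se_a, se_b):
    """Standard error of (a - b), ASSUMING the two estimates are independent."""
    return math.sqrt(se_a ** 2 + se_b ** 2)


def margin_minutes(se_hours, z=1.96):
    """Half-width of an approximate 95% interval, converted to minutes."""
    return z * se_hours * 60


# The range of per-estimate standard errors quoted above, in hours.
for se in (0.025, 0.05):
    se_diff = se_of_difference(se, se)
    print(f"SE per estimate {se} h -> SE of difference {se_diff:.3f} h, "
          f"margin about {margin_minutes(se_diff):.1f} min")
```

Under these assumptions the margin comes out at roughly 4 to 8 minutes, somewhat tighter than the rounder 6-12 minutes quoted above, which simply doubles at each step. Either way, the margins are of the same order as the differences being reported.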

Except for the top three categories, it's hard to know if the reported differences are due to sampling.


A further problem with the data is its detachment from reality. There are two layers of averaging going on, once at the population level and once at the time level. In reality, not everyone does these things every day. This dataset is really only interesting to statisticians.

So, in a Trifecta Checkup, the treemap is a Type DV and the bar chart is a Type D.