When a chart does nothing for the story

There is some banter on Twitter about a chart that appeared in The Atlantic, "Pixar's Sad Decline--in One Chart" (@thewhyaxis, @jschwabish, @tealtan).

Link to article


It's a bit horrible but not the worst chart ever.

The most offensive aspect is the linear regression line. It's clearly an inappropriate model for this dataset.

I also don't like charts that include impossible values on the axis; in this case, the Rotten Tomatoes score never goes above 100%.

If the chart is turned on its side, the movie titles can be read horizontally.


I am compelled by the story, but the chart doesn't help at all. Of course, it would be better if they could find data on the profitability of each movie. Readers should ask how strongly the Rotten Tomatoes score is correlated with box office, and also what the relative costs of producing these different movies are. Jon has a chart of score against profit (link).


Breaking every limb is very painful

This Financial Times chart is a big failure:


Look at the axis. Usually a break in the axis is reserved for outliers. If there is one bar in a bar chart that extends way beyond the rest of the data, then you would sever that bar to let readers know that the scale is broken. Here, the designer broke every bar in the entire chart. It's as if the designer knows we'll complain about not starting the chart at zero -- so the bars all start at zero except they jump from zero to 70 right away.


The biggest issue with this chart is not its graphical element. It's the other two corners of the Trifecta Checkup: what is the question being asked? And what data should be used to address that question?

The accompanying article complains about the dearth of H-1B visas for technical talent at businesses, but it never references the data being plotted.

It's hard for me even to understand what the chart is saying. I think it is saying that in Bloomington-Normal, IL, 94.8 percent of its H-1B visa requests are science-related. There is no way to interpret this number without knowing the corresponding percentage for the entire country. It is most likely true that H-1B visas are primarily used to recruit technical talent from overseas, so the proportion of such requests that are STEM-related is high everywhere. In this sense, it's not clear that the STEM share of H-1B requests is a useful indicator of a dearth of technical talent.

Second, it is highly unlikely that the decimal place is meaningful. Because the total number of requests varies widely across locations, a tenth of a percentage point represents very different numbers of requests in different places.
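To make the point about the decimal place concrete, here is a small Python sketch that converts a tenth of a percentage point into a number of requests. The location names and totals are hypothetical round numbers, not data from the FT chart.

```python
# Hypothetical totals of H-1B requests in three locations; these are NOT
# figures from the FT chart, just round numbers with a wide spread.
totals = {"small metro": 400, "mid-size metro": 5_000, "large metro": 60_000}


def requests_per_tenth_point(total):
    """Number of requests that 0.1 percentage point of `total` stands for."""
    return total / 1000  # 0.1% = 1/1000


for place, total in totals.items():
    print(f"{place}: 0.1 point = {requests_per_tenth_point(total):g} requests")
```

With these made-up totals, the tenth of a point stands for less than half a request in the smallest location and for 60 requests in the largest.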

I'd prefer to look at the absolute number of requests for this type of analysis, given that Silicon Valley has orders of magnitude more technical jobs than most of the other listed locations. But requests aren't even a good indicator of labor shortage. Typically H-1B visas run up against the quota sometime during the year, after which companies stop requesting new visas since there is no chance of getting one approved. This is a form of survivorship bias. Wouldn't it be easier to collect data on the number of vacant technical jobs in each location?



Interpreting some charts about guns

Felix linked to a set of charts about guns in the U.S. (and elsewhere). The original charts, by Liz Fosslien, are found here.

I like the clean style used by Fosslien. Some of the charts are thought-provoking. Many of them may raise more questions than they answer. Here are a few that caught my eye.


A simplistic interpretation would claim that banning handguns is futile, and may even have an adverse impact on the murder rate. However, this chart does not reveal the direction of causality. Did some countries ban handguns because they were reacting to higher violence? If that is the case, the chart merely confirms that the countries with handgun bans are a self-selected group.
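A toy calculation shows how self-selection alone can produce this pattern. It assumes, purely for illustration, that only the most violent countries adopt a ban and that each ban genuinely lowers that country's rate by 20%; the ten baseline rates are made up.

```python
# Hypothetical baseline homicide rates for ten imaginary countries.
baseline = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

BAN_THRESHOLD = 7  # assumption: only the most violent countries adopt a ban
BAN_EFFECT = 0.8   # assumption: a ban lowers a country's rate by 20%

with_ban, without_ban = [], []
for rate in baseline:
    if rate >= BAN_THRESHOLD:
        with_ban.append(rate * BAN_EFFECT)  # ban adopted, and it helps
    else:
        without_ban.append(rate)


def mean(xs):
    return sum(xs) / len(xs)


# Every ban lowered its own country's rate, yet the ban group still looks worse.
print(mean(with_ban), mean(without_ban))
```

Even though every ban helps in this toy world, the countries with bans still have the higher average rate, because they started out more violent.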



The U.S. is an outlier, both in terms of firearm ownership and firearm homicides. This makes the analysis much harder because the U.S. is really in a class of its own. It's not at all clear whether there is a positive correlation in the cluster below, and even if there is, extending a straight line up to the U.S. dot is dubious.



Fosslien is being cheeky in denying us the identity of the other outlier, the country with few firearms but an even higher death rate from intentional homicide. By the way, scatter plots like these are great for showing bivariate distributions.



I'd still prefer a line chart for this type of data, but this particular paired bar chart works for me as well. The contents of this chart are a shock to me.



I just don't get this one. Why is there a fan?

Mountain, molehill

Reader Jordan G. wasn't impressed by an attempt to visualize medal counts by country and sport in the Olympics over 112 years by Christian Gross at Visualizing.org (link).

The author chose the metaphor of "mountains" to portray the cumulative medals earned by each country. Each country is treated as a unit in a small-multiples-style presentation (see right). The bars represent different sports, and they are arranged like lanes in a swimming race: the largest haul in the middle, the second largest on its left, the third largest on its right, and so on.
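The lane arrangement can be sketched in a few lines of Python. The function name and the medal counts are hypothetical; this is one reading of the layout described above, not the designer's code.

```python
from collections import deque


def swim_lane_order(medals):
    """Arrange (sport, count) pairs with the largest in the middle,
    the 2nd largest to its left, the 3rd to its right, and so on."""
    ranked = sorted(medals.items(), key=lambda kv: kv[1], reverse=True)
    lanes = deque()
    for i, item in enumerate(ranked):
        if i % 2 == 1:
            lanes.appendleft(item)  # 2nd, 4th, ... fill outward to the left
        else:
            lanes.append(item)      # 1st, 3rd, 5th, ... fill outward to the right
    return list(lanes)


# Made-up medal counts for one country.
medals = {"athletics": 30, "swimming": 25, "gymnastics": 12, "rowing": 8, "boxing": 5}
print([sport for sport, _ in swim_lane_order(medals)])
```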

This exercise highlights two important considerations from the designer's perspective.

The first is scaling. You'll notice that the first page (for Athens, 1896; excerpted below) is essentially unreadable. This is because the designer uses the same scale for every single page and plots the cumulative number of medals over time. Together, these two decisions mean that the early pages have much lower values than the later pages.


It also means that on other pages, the extreme values run off the edge of the chart area. (I think the reason is that if the scale had been tailored to present these extreme values, then pretty much every chart without extreme values would have become unreadable.)


The choice of making countries the units (discussed further below) creates some awkwardness in later years, as the medals became spread out among more countries. In the first Olympics, only 10 countries won any medals, but in 2008, 127 different countries won at least one medal, and 50 or so of them had never won more than 20 medals in all sports combined. This skewed distribution leads the designer to break one of the cardinal rules of small multiples: the design of each unit must be the same, with only the data varying. Here, the top countries have their data plotted on a different scale from that of the other countries, as we can see from the different-sized squares. When you mouse over a particular bar for a particular country, that sport is colored red and the corresponding bars are highlighted for every other country; the problem is that the scales are not the same, so the lengths of the bars give misleading comparisons.


The second consideration is pagination. The data set has three dimensions: country, sport and time. In this presentation, the designer places sport within country within time. Put differently, the time categories are placed furthest apart; in fact, the reader must load a different page to see the evolution from one Olympics to another. The sport categories are placed closest to each other, in the same chart unit, so it requires the least effort for the reader to compare the number of medals won by the US in athletics (say) with gymnastics.

This goes back to the top corner of our Trifecta Checkup. What is the most important question the designer is trying to address? If it is the evolution over time, then the time dimension should not be placed furthest apart. If it is comparison across sports, then the sport dimension should be placed innermost.

For me the country dimension is the least important because everyone knows the US typically wins the most medals, and the top 5-10 countries are quite stable. Within a sport, I might wonder if certain countries are dominant in certain periods, and if certain countries started developing particular sports from a certain time period onwards. In this case, I'd place country and time within sport. 
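The order of nesting is just a choice of grouping keys. This hypothetical helper regroups flat (year, country, sport, medals) records in either order; the countries and counts are placeholders.

```python
from collections import defaultdict


def nest(records, keys):
    """Group flat records by `keys`, outermost first; leaves hold medal totals."""
    if not keys:
        return sum(r["medals"] for r in records)
    head, *rest = keys
    groups = defaultdict(list)
    for r in records:
        groups[r[head]].append(r)
    return {k: nest(v, rest) for k, v in groups.items()}


# Made-up records, for illustration only.
records = [
    {"year": 2004, "country": "Country A", "sport": "gymnastics", "medals": 3},
    {"year": 2008, "country": "Country A", "sport": "gymnastics", "medals": 4},
    {"year": 2008, "country": "Country B", "sport": "gymnastics", "medals": 5},
    {"year": 2008, "country": "Country A", "sport": "athletics", "medals": 6},
]

by_time = nest(records, ["year", "country", "sport"])   # the original layout
by_sport = nest(records, ["sport", "country", "year"])  # country and time within sport
print(by_sport["gymnastics"])
```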

The following gives an idea of an alternative way of visualizing this data:


Apologies for not completing the dataset. Both charts are missing countries as well as years of history. But you can see where I'm going with this. There would be one chart per sport.

In gymnastics, we see that the US and China are latecomers, Russia has been the superpower until recently while Japan and Germany have stagnated.

Look what I found: two amazing charts

While doing some research for my statistics blog, I came across a beauty by Lane Kenworthy from almost a year ago (link) via this post by John Schmitt (link).

How embarrassing is the cost effectiveness of U.S. health care spending?


When a chart is executed well, no further words are necessary.

I'd only add that the other countries depicted are "wealthy nations".


Even more impressive is this next chart, which plots the evolution of cost effectiveness over time. An important point to note is that the U.S. started out in 1970 similar to the other nations.


Let's appreciate this beauty:

  • Let the data speak for itself. Time goes from bottom left to upper right. As more money is spent, life expectancy goes up. However, the slope of the line is much smaller for the US than the other countries. There is no need to add colors, data labels, interactivity, animation, etc.
  • Recognize what's important, what's not. The US line is in a different color, much thicker, and properly placed in the foreground of the chart.
  • Rather than clutter up the chart, the other 19 lines are anonymized. They all have the same color and thickness, and they share one aggregate label. This is an example of overcoming loss aversion (see this post for more): it is OK to suppress some of the data.
  • The axis labeling is superb. Tufte preaches this clean style. There is no need to use regularly spaced axis labels; use data-informed labels. Unfortunately, software is way behind on this issue. You can do this in R but that's about it.
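Data-informed labels are easy to compute even where charting software does not offer them. This sketch, with a hypothetical helper and placeholder numbers rather than Kenworthy's data, puts ticks only at the start and end values of each line; the resulting lists could be handed to matplotlib's `Axes.set_xticks` and `Axes.set_yticks`.

```python
def data_informed_ticks(series, decimals=0):
    """Collect tick positions at the first and last point of every series.

    `series` maps a label to a list of (x, y) points in time order.
    Returns (x_ticks, y_ticks), each sorted with duplicates removed.
    """
    xs, ys = set(), set()
    for points in series.values():
        for x, y in (points[0], points[-1]):  # start and end of each line
            xs.add(round(x, decimals))
            ys.add(round(y, decimals))
    return sorted(xs), sorted(ys)


# Placeholder numbers, not Kenworthy's data: (spending, life expectancy).
series = {
    "US": [(100, 70.0), (400, 75.5), (900, 78.0)],
    "Other": [(100, 71.0), (300, 77.0), (500, 81.0)],
}
x_ticks, y_ticks = data_informed_ticks(series)
print(x_ticks, y_ticks)
```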


Someone submits a good infographic

Reader Chris P. sent me to this Mint infographic showing the income distribution in the U.S. (link). I found the second section more interesting, so this post will focus on that one chart. But I want to let Chris have his say too, so we have a double post. Chris's comments on the chart are here.

Here is the chart from the second section:


What do I like about this chart?

It tells a story without appealing directly to the data. I see only 7 × 2 = 14 numbers on the chart, all embedded in the legend/scale. So many charts of this type send readers immediately into a twister by bombarding our eyes with data.

In the middle of the chart, for instance, states like MD and MA contrast with states like MI and MS. Poorer people are in the yellow segments while richer people are in the greener segments. So we can see that in MD and MA, the green part extends below the first horizontal gridline while in MI and MS, that gridline cuts into the orange. The implication is that there are more rich people in MD and MA than in MI and MS.

The horizontal gridlines are subtle but surprisingly functional, allowing readers to pick out the information. The gridlines divide each column into 4 equal parts, so each part is a quarter (quartile) of the state population. In MD and MA, at least the top 25% of the population is rich by national standards. Rich, as indicated by green in the legend, means household income greater than $75,000. In both those states, the top 25% earn at least $100,000.

Similarly, by looking at the color of the segment that crosses the lowest horizontal gridline, we know how much the bottom 25% earn in each state. The poorest segment seems to be smaller in AK than in other states.
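Reading a gridline amounts to finding which income bracket contains a given quantile of households. Here is a minimal sketch, assuming made-up bracket labels and shares (not the Mint data), with brackets ordered from poorest to richest.

```python
from bisect import bisect_left
from itertools import accumulate

# Hypothetical brackets, poorest first, and made-up household shares for one state.
BRACKETS = ["<25k", "25-50k", "50-75k", "75-100k", "100-150k", "150k+"]
shares = [0.15, 0.20, 0.18, 0.14, 0.17, 0.16]


def bracket_at(shares, q):
    """Return the bracket containing the q-th quantile of households (0 < q < 1)."""
    cumulative = list(accumulate(shares))
    return BRACKETS[bisect_left(cumulative, q * cumulative[-1])]


print(bracket_at(shares, 0.25))  # bracket crossing the lowest gridline
print(bracket_at(shares, 0.75))  # bracket crossing the highest gridline
```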

The row of state boundaries at the bottom of the chart is very cute, and it encodes information, which is a wonderful touch. I believe (though I haven't verified it) that the color of each state map tells us the mean household income within the state.


A few improvements would make this column chart better. One shouldn't place the national average above the chart horizontally using a different scale. Just place it as an additional column next to the other 50+ columns, with a slight offset and proper labeling. This allows direct lookup of how a state compares to the national average.

Also, try ordering by income inequality. The alphabetical order does the reader no favors. The ordering is particularly important because the main finding of the chart is that income distribution exhibits only moderate variability by state - most states look alike.


Given the low variability, the challenge is how to bring out the mild differences: which parts of the income distribution of which state show variance against the national average?

In the following attempt, we plot the "excess" proportion relative to the national average by state. 

For example, in the most "unequal" "state", District of Columbia (first chart), we find that it has a shortage (negative excess) of people earning below $75,000, and an excess of people earning above $75,000 when compared to the national income distribution. The proportion of "excess" increases with each higher income bracket (moving from left to right of the chart).
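The "excess" being plotted is simply a bracket-by-bracket difference between two distributions. A minimal sketch with made-up shares (not the actual DC or national figures):

```python
def excess(state_shares, national_shares):
    """Percentage-point difference between a state's share of households in each
    income bracket and the national share (positive = excess, negative = shortage)."""
    if len(state_shares) != len(national_shares):
        raise ValueError("both distributions need the same brackets")
    return [s - n for s, n in zip(state_shares, national_shares)]


# Made-up shares (%) for six brackets, poorest first; each list sums to 100.
national = [25, 25, 18, 12, 12, 8]
dc_like = [20, 20, 15, 12, 17, 16]

print(excess(dc_like, national))  # shortage at the bottom, excess at the top
```

Because both distributions sum to 100%, the excesses always sum to zero, so a shortage in some brackets must be offset by an excess in others.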


I have grouped and ordered the states by the orientation of the line plots. The first group of states, boxed in red, are all similar to DC, in the sense that they have a shortage of low earners and an excess of high earners.

Some states, like Texas, Pennsylvania and Georgia, have an income distribution that almost exactly mirrors the national average. Then, those states boxed in aquamarine have a small excess of poor people and a shortage of rich people compared to the national average. Not unexpectedly, Puerto Rico is on its own.


One has to be careful with this type of data because the income distributions are highly skewed. How are the income brackets determined?

Lumping everyone in the top 4% or so (earning $200,000 or more) into one bracket obscures the tremendous income inequality even within that bracket. In fact, for my chart above, I had to decide where to put the last data point, i.e. the people earning $200,000 or more, because "$200,000 or more" is not a point on the horizontal axis but an open-ended range. I just used $300,000, but the better thing to do is to find out the average income within that top bracket and place the point there.
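When the average within the open-ended top bracket is not published, one standard option is Pareto interpolation, a swapped-in technique rather than the approach used for the chart above: assume the top tail follows a Pareto distribution, estimate its shape from two cumulative shares, and use the Pareto mean. The function name and the shares (8% of households at or above $150,000, 4% at or above $200,000) are hypothetical.

```python
import math


def pareto_mean_above(lower, share_above_lower, upper, share_above_upper):
    """Estimate the mean income of the open-ended bracket (income >= upper)
    by fitting a Pareto tail through two cumulative shares.

    share_above_x is the fraction of households earning at least x.
    """
    # Pareto tail: share above x is proportional to x ** -alpha.
    alpha = math.log(share_above_lower / share_above_upper) / math.log(upper / lower)
    if alpha <= 1:
        raise ValueError("alpha <= 1: the Pareto mean is infinite")
    return alpha * upper / (alpha - 1)


print(round(pareto_mean_above(150_000, 0.08, 200_000, 0.04)))
```

With those assumed shares, alpha is about 2.41 and the estimated mean of the top bracket is roughly $342,000.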


The meaning of pretty pictures and the case of 15 scales

When we call something a "pretty picture", what do we mean? 

Based on the evidence out there, it would seem like "pretty" means one or more of the following:

  • unusual: not your Grandma's bar chart or line chart
  • visually appealing: say, have irregular shapes, lots of colors, curved lines and so on
  • complex: if you don't get the point right away, the chart must be smart, and must contain a lot of information
  • data-rich: a variant of complex


I pondered that question while staring at this chart, reprinted in the NYT Magazine, in which the editors pitched a new book by Craig Robinson called "Flip Flop Fly Ball". According to the editors, the book is a "beautiful, number-crunched (sic) combination of statistical and graphic-design geekery". So here's Exhibit A:

This chart is supposed to tell us whether a big payroll equals success in Major League Baseball, with success measured variously by making the playoffs, making the championship series or winning the championship. It nicely uses a relatively long time horizon of 15 years.

The problem: how are we supposed to learn the answer to the question?

To learn it, we have to go through these steps:

  • Read the fine print under the title, which tells us that the vertical scale is the rank by payroll: within each season, the top spender is at the top and the bottom spender at the bottom. (Strictly speaking, there are 15 different scales; see the discussion below.)
  • Figure out that the black row has all of the championship teams aligned at the same vertical level.
  • Realize that the more teams listed below the black line, the bigger the payroll of the championship team in that season.
  • Alternatively, the more teams found above the black line, the smaller the winning team's payroll that year.

From that, we see that for almost every season in the last 15 years, the winner comes from a relatively free-spending team. Florida in 2003 is a big outlier.


Maybe that isn't too bad. Now, try to interpret the blue boxes, which label all the playoff teams in every season. Are playoff teams also bigger spenders than non-playoff teams?

To learn this, try the following step:

Ignore the relative height of the columns from season to season, and focus only on the relative positions of the blue slots within each column.

Are these blue slots more likely to be crowded towards the top of the column than the bottom?

The answer should be obvious but why does it feel so hard?


You may be confused by the vertical scale. Is it the case that in 2003, the entire league decided to splurge on spending? Does the protruding tower in 2003 indicate especially high payrolls?

No, it doesn't. It turns out there are really 15 separate vertical scales on this one chart; each column has to be viewed separately. There is a ranking within each column, but the relative height from one column to the next means nothing. Each column is hinged to the black row, which holds the championship team at whatever payroll rank it had that season.

The decision to anchor the columns in this way is what dooms this chart. In the junkart version below, I reversed this decision and ended up with a much clearer picture:


It's now clear that almost all the playoff teams come from the top quartile or top third of the table in terms of payroll. In more recent years, the correlation between spending and success seems less assured; perhaps that's partly a result of the analytics revolution, as nicely portrayed in Moneyball. It is still true that any team in the bottom third of the payroll scale has little chance of making the playoffs; however, once a small-payroll team makes the playoffs, it appears to do well: in three of the last four seasons, a small-payroll team has made the finals.

Note that I grayed out the four cells at the bottom left: there were only 28 teams before 1998. I also removed the names of the teams that didn't make the playoffs, which serve no purpose in a chart like this.
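The difference between the two layouts comes down to what each column is aligned on. A minimal sketch with four made-up teams shows how hinging each season on the champion moves the same top spender to different heights, whereas the redrawn version simply plots the rank.

```python
def anchored_offsets(ranks, champion):
    """Original layout: vertical offset of each team from the champion's row
    (negative = above the champion, i.e. a bigger payroll)."""
    return {team: r - ranks[champion] for team, r in ranks.items()}


# Same made-up payroll ranks in two seasons (1 = biggest payroll);
# only the champion differs.
ranks = {"A": 1, "B": 2, "C": 3, "D": 4}

print(anchored_offsets(ranks, "C")["A"])  # -2: top spender sits two rows up
print(anchored_offsets(ranks, "A")["A"])  # 0: top spender sits on the black row
print(ranks["A"])                         # 1 in both seasons in the redrawn chart
```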


So much for the descriptive statistics. It's really hard to draw robust conclusions from such data. You could say it's harder for small-payroll teams to perform consistently well over the regular season but easier to win a short playoff series, so in a sense we are looking at luck, not skill.

But could it be that those small-payroll teams, having made the playoffs, must have enjoyed unusual success that season, perhaps because they discovered some young talent that cost next to nothing? If so, the fact that they made the playoffs despite a small payroll would be a good predictor that they would do well in the playoffs.

The other important issue is that by plotting the rank of payroll rather than the actual payroll, the chart takes the scale of payroll differences out of the picture. The team listed at the median rank most likely spent much less than half as much as the team at the top of the table. With the actual payroll amounts, there is much more you can do to display this data.
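Ranking throws away the size of the gaps. In this sketch, two made-up leagues produce identical payroll ranks even though the top team outspends the second by about 5% in one league and by 150% in the other.

```python
def payroll_ranks(payrolls):
    """Rank teams by payroll, 1 = biggest spender."""
    ordered = sorted(payrolls, key=payrolls.get, reverse=True)
    return {team: i + 1 for i, team in enumerate(ordered)}


# Two made-up leagues (payrolls in $ millions) with very different spreads.
compressed = {"A": 100, "B": 95, "C": 90, "D": 85}
stretched = {"A": 200, "B": 80, "C": 60, "D": 40}

print(payroll_ranks(compressed) == payroll_ranks(stretched))  # True
print(compressed["A"] / compressed["B"], stretched["A"] / stretched["B"])
```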


A good question deserves good data

The last chart in the infographics on OECD education data asks another intriguing question: do countries that pay teachers more achieve better test scores?


This chart suffers from the same ill as the one previously discussed (here): the data is not suitable to address the question. It is mighty hard to see any pattern in the set of bar charts on offer. This lack of correlation can be confirmed by displaying the data in a scatter plot:

The scatter plot on the left presents the data as shown in the original, with a regression line that appears to indicate a positive correlation between higher spending and higher achievement.

Here, spending is measured by the ratio of primary teacher pay after 15 years of service to GDP per capita, while achievement is indicated by the proportion of students who attain a "top" level of proficiency in any or all of the three test subjects.

But notice the solitary point sitting on the top right corner (labelled "1"). That point is Korea, which has both the highest achievement and the highest spending (by far). Korea is an outlier (known as a leverage point). The chart on the right is the same as the one on the left with Korea removed. What appears to be a moderate positive correlation vanishes. (The numbers plotted are the ranking of countries by the proportion of students attaining top proficiency, the metric on the vertical axis.)
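The effect of a single leverage point can be checked with ordinary least squares by hand. The numbers below are invented (six points with no upward trend plus one extreme point), not the OECD data; the leverage formula is the standard hat value for simple linear regression.

```python
def ols_slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def hat_values(xs):
    """Leverage of each point in a simple linear regression: 1/n + (x - mean)^2 / Sxx."""
    n = len(xs)
    mx = sum(xs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    return [1 / n + (x - mx) ** 2 / sxx for x in xs]


# Invented data: six points with no upward trend, plus one far-out point.
xs = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 2.5]
ys = [10, 6, 12, 8, 11, 7, 25]

print(ols_slope(xs, ys))            # positive slope, driven by the last point
print(ols_slope(xs[:-1], ys[:-1]))  # slightly negative without it
print(max(hat_values(xs)))          # the far-out point has by far the most leverage
```

A common rule of thumb flags hat values above 2p/n (here 2 × 2/7, about 0.57); the extreme point's hat value is about 0.90.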

So either the message is that achievement and spending are uncorrelated (for every country except Korea), or we have a measurement problem. I think the latter is more likely, and would defer to psychometricians on what acceptable measures of spending and achievement are. Does the pay of primary teachers with 15 years or more of service represent "education spending"? Do top students adequately capture general achievement in the education system?


The original chart contains a serious misinterpretation of the data (source: Education at a Glance 2009, OECD). It falsely assumes that the proportions of students attaining top proficiency in each subject can be added. In fact, because the same student could be top in more than one subject, such a sum double-counts students, and its base is not 100%.

In my version, the metric used is the proportion of students who attain top proficiency in 1, 2 or all 3 subjects. This metric is computed off a 100% base.
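A tiny example with ten hypothetical students shows why summing per-subject proportions goes wrong and what the 100%-based alternative computes.

```python
# Ten hypothetical students (ids 0-9); each set holds the ids that reached
# top proficiency in that subject.
n_students = 10
top = {
    "reading": {0, 1, 2},
    "math": {0, 1, 3, 4},
    "science": {0, 2, 4},
}

# What the original chart implicitly does: add the per-subject proportions.
summed = sum(len(ids) for ids in top.values()) / n_students

# Proportion of students who are top in 1, 2 or all 3 subjects (a 100% base).
any_subject = len(set().union(*top.values())) / n_students

print(summed, any_subject)  # 1.0 0.5
```

Summing the three subject proportions gives 100%, although only half of the students are top in any subject.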

I also removed the breakdown by gender. It creates clutter, and I don't find the male or female data interesting.


See also our first post on this infographic.


Eye heart this

Dan at Eye Heart New York has a fantastic post on the recent release of restaurant health inspection data by New York City. The release has caused a furor among restaurant owners because they are now required to post their A/B/C grades front and center. Dan collected some data (which he also posted), made some charts, and reported some interesting insights.

Here is an overview chart that shows the distribution of scores (the higher the score, the lower the grade). He called it a "scatter plot" but it is really a histogram where the bucket size is 1 except for the rightmost bucket.


I like the use of green, yellow and red colors to indicate (without words) the conversion scale from scores (violation points) to grades (A/B/C). The legend "Count" is an Excel monstrosity. I'd have used a bucket size of at least 5, which would smooth out the gyrations in the green zone.
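Coarser buckets are easy to derive from width-1 counts. This sketch uses made-up counts for scores 0 to 9; the open-ended rightmost bucket in Dan's chart would need to be handled separately.

```python
def rebucket(counts, width=5):
    """Merge width-1 histogram counts (counts[i] = restaurants scoring i points)
    into buckets of `width` consecutive scores, keyed by each bucket's lowest score."""
    merged = {}
    for score, count in enumerate(counts):
        start = (score // width) * width
        merged[start] = merged.get(start, 0) + count
    return merged


# Made-up counts for scores 0..9.
print(rebucket([3, 7, 2, 8, 4, 6, 1, 9, 5, 5]))  # {0: 24, 5: 26}
```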

A more typical way to summarize numeric data in groups is Tukey's boxplot, as shown below.


I used Dan's raw data for this chart: 1 = A, 2 = B, 3 = C. What is group 4?

It turns out Dan has removed this group from all of his analysis. A little research shows that group 4 are restaurants that have been closed by the Dept of Health. Interestingly, the scores of these restaurants are spread widely so the DOH appears to be closing restaurants not just for health violations. (In the rest of this post, I have removed group 4.)

For those not familiar with box plots: the box contains the middle 50% of the data (in this case, the scores of the middle half of the restaurants in the respective group); the line inside the box is the median score; and the dots above (or below, though there are none here) the whiskers are outliers. As Dan pointed out, group C has lots of outliers at the high end of the scores.
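The numbers behind a Tukey box plot can be computed with Python's standard `statistics.quantiles`. The scores are made up, and quartile conventions differ between tools, so other software may draw slightly different boxes.

```python
import statistics


def box_summary(scores):
    """Quartiles, median and Tukey outliers (points more than 1.5 IQR beyond the box)."""
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [s for s in scores if s < low_fence or s > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


# Made-up violation scores for one grade group.
print(box_summary([28, 30, 31, 33, 35, 36, 38, 40, 42, 95]))
```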

Just for fun, I pulled the violations of the highest-scoring restaurant (111 violation points). What I find intriguing is the huge fluctuation in scores over the last 5 inspections. Does this happen to other restaurants too? What does that say about the grading system?



Next, Dan attempted to address two questions: Did scores vary across the 5 boroughs? And did scores vary across cuisine groups? This is the concept covered in Chapter 1 of my book: always look at the variation around averages, because that's where the most interesting stuff is.

He calculated the means and standard deviations of different subgroups. It is simpler to visualize the data, again using boxplots.

Here's one dealing with boroughs, and it is clear that there is not much to pick between them. You could possibly say Staten Island is better than the other 4 boroughs.


Here's one dealing with cuisine groups, using Dan's definitions.


The cuisine groups are ordered by median score, from lowest on the left to highest on the right. Again, there is no drastic difference. It is certainly not the case that Asian/Latin American restaurants are worse than, say, European or American ones.

About half of the restaurants in the desserts, drinks, misc., African, and others groups received As, while a bit fewer than half in the other cuisine groups got As. Some of the cuisine groups had few egregious violators (African, Middle East), but this data may be skewed by the removal of the "closed" restaurants.

One shortcoming of the traditional boxplot is the omission of how large each group is. For groups that are too small, it is difficult to draw any statistical conclusions. We know from Dan's table, for instance, that there were only 17 restaurants classified as "African".
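One established remedy, swapped in here rather than taken from Dan's analysis, is the variable-width box plot of McGill, Tukey and Larsen (1978), in which box width is proportional to the square root of group size; R's `boxplot` offers this through `varwidth = TRUE`. A sketch of the width calculation, with the 17 African restaurants from Dan's table and a hypothetical larger group for comparison:

```python
import math


def box_widths(group_sizes, max_width=0.8):
    """Box widths proportional to sqrt(group size); the largest group gets max_width."""
    biggest = max(group_sizes.values())
    return {g: max_width * math.sqrt(n / biggest) for g, n in group_sizes.items()}


# 17 African restaurants per Dan's table; the larger group is hypothetical.
widths = box_widths({"African": 17, "Hypothetical large group": 1700})
print(widths)
```

The resulting widths, listed in group order, could be passed to the `widths` argument of matplotlib's `Axes.boxplot`.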

(Unfortunately, Excel does not have built-in capability for generating boxplots.)

Reader's indigestion

Reader Chris B. pointed us to this unfortunate chart, based on a one-question online poll conducted by Reader's Digest.

The data is highly structured: for each country, respondents, identified as male or female, are asked about their favorite methods to discipline their kids. (At first, I thought the "male" and "female" meant what methods they would apply to sons versus daughters but based on the summary paragraph, I now feel they refer to the genders of the respondents.)

The textual summary is extremely well-written, and successfully points to the most salient information (my italics and bolding):

Spare the rod, period. That's what parents across the globe told us when we asked how they discipline their children. Respondents in all 16 countries in this month's global survey picked a good talking-to as the best tactic for teaching a lesson, by a wide margin. Taking away a privilege placed second. Two other traditional forms of discipline (sending kids to their rooms and spanking) were the least favored choices in all but two countries. Among respondents who did favor physical punishment, men outnumbered women in every country except Canada, France, and India. Not a single woman in the United States expressed a preference for spanking.


Unfortunately, the graphical summary is a complete failure.

One feature plotting against the designer is that the general profiles of the responses are very similar between countries, and so the differences are well hidden inside this small-multiples display.

It also takes on an elongated form, making it almost impossible to compare the top two countries with the bottom two countries.

When data has such strong structure, it is a blessing to the chart designer. In my first chart, I made a set of profile charts in small multiples. On average, parents everywhere act very similarly. There are some subtle differences: one common pattern, occurring in the Philippines, Malaysia, India, France, Brazil, etc., is a preference for a talking-to over all other methods; another pattern, applying to the Netherlands, Spain, Australia, Canada, etc., is a talking-to, followed by taking away privileges, with sparing use of the other two methods.


In some countries, like Australia, Brazil, Canada, Spain, Italy, etc., the gender of respondents mattered little but in the United States for instance, female respondents are more likely to prefer a talking-to while men liked using sticks. 

Is it really the case that parents punish sons and daughters using the same methods? This poll seems to think so.


If we want to expose the minute differences at the level of country-gender, then something like this would do:


The purpose is to surface any outliers, and I really can't say there are any here. The supposed reversal of responses by gender in India, France, and Canada is hardly worth noting since the physical punishment category is hardly used. (Is that a reflection of reality, or response bias due to a sensitive subject?)

Notice that these new charts do not have the data printed on them - the graphical elements are sufficient to show what the data is; readers are not auditors.