Another simple Excel chart needs help

Twitter friend Jimmy A. asked if I can help Elon Musk make this chart "more readable".


Let's start with a couple of things he did right. Placing SpaceX, his firm's data, at the bottom of the chart is perfect, as the bottom part of a stacked column chart is the only part that is immediately readable. Combining all of Europe into one category and Other U.S. into one group reduces the number of colors needed.

Why is this chart unreadable? Here is a line-up of the culprits:

  • Red Russia is stealing the thunder
  • SpaceX is sharing the blues with Japan/China/Other U.S.
  • The legend is sorted in the opposite order from the column segments (courtesy of Excel defaults)
  • Axis labels are given to two decimal places, even though the market share is split only a small number of ways
  • It's unclear what "market share" means: is it share of the number of launches or the revenues generated by those launches? Is the "base" of the market share changing over time?
  • The last two columns are speculative and these are the two years in which SpaceX has a noticeable advantage (unless they are talking about contracts already concluded)

According to the underlying data, there are some very big changes afoot. The following small-multiples chart shows what is going on:




This one takes time to make, takes even more time to read

Reader Matt F. contributed this confusing chart from Wired, accompanying an article about Netflix viewing behavior. 


Matt doesn't like this chart. He thinks the main insight - most viewers drop out after the first episode - is too obvious. And there are more reasons why the chart doesn't work.

This is an example of a high-effort, low-reward chart. See my return-on-effort matrix for more on this subject.

The high effort is due to several design choices.

The most attention-grabbing part of the chart is the blue, yellow and green bars. The blue and yellow together form a unity, while the green color refers to something else entirely. The shows in blue are classified as "savored," meaning that "viewers" on average took in less than two hours per day "to complete the season." The shows in yellow are just the opposite and labeled "devoured." The distinction between savored and devoured shows appears to be a central thesis of the article.

The green cell measures something unrelated to the average viewer's speed of consumption. It denotes a single episode, the "watershed" after which "at least 70 percent of viewers will finish the season." A watershed episode exists for every show; the only variability is which episode it is. The variability is small because all shows experience a big drop-off in audience after episode 1, the audience curve flattens with further episodes, and these shows have a small number of episodes (6 to 13). In the shows depicted, with the single exception of BoJack Horseman, the watershed occurs in episode 2, 3, or 4.

Beyond the colors, readers will consider the lengths of the bars. The labels are typically found on the horizontal axis but here, they are found facing the wrong way on pink columns on the right edge of the chart. These labels are oriented in a way that makes readers think they represent column heights.

The columns look like they are all roughly the same height but on close inspection, they are not! Their heights are not given on top of the columns but on the side of the vertical axis.

The bar lengths show the total number of minutes of season 1 of each of these shows. This measure is a peripheral piece of information that adds little to the chart.

The vertical axis indicates the proportion of viewers who watched all episodes within one week of viewing. This segmentation of viewers is related to the segmentation of the shows (blue/yellow) as they are both driven by the speed of consumption. 

Not surprisingly, the higher the elevation of the bar, the more likely it is yellow. A higher bar means more people are binge-watching, which should imply the show is more likely classified as "devoured". Despite the correlation, these two ways of measuring the speed of consumption are not consistent. The average show on the chart has about 7 hours of content. If consumed within one week, it requires only one hour of viewing per day... so the average show would be classified as "savored" even though the average viewer can be labeled a binge-watcher who finishes in one week.


[After taking a breath of air] We may have found the interesting part of this chart - the show Orange is the New Black is considered a "devoured" show and yet only half the viewers finish all episodes within one week, a much lower proportion than most of the other shows. Given the total viewing hours of about 12, if the viewer watches two hours per day, it should take 6 days to finish the series, within the one-week cutoff. So this means that the viewers may be watching more than one episode at a time, but taking breaks between viewing sessions. 
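The arithmetic behind the last few paragraphs can be checked with a few lines of Python. This is only a sketch: the two-hour threshold and the rough running times (7 and 12 hours) come from the discussion above, while the function names are made up for illustration, and treating exactly two hours per day as "devoured" is an assumption since the article does not spell out the boundary.

```python
import math

# Threshold as quoted from the article: shows averaging less than two hours
# of viewing per day are "savored"; the rest are "devoured".
# Assumption: a pace of exactly two hours per day counts as "devoured".
SAVORED_MAX_HOURS_PER_DAY = 2.0


def classify_show(avg_hours_per_day):
    """Apply the savored/devoured rule as described in the post."""
    if avg_hours_per_day < SAVORED_MAX_HOURS_PER_DAY:
        return "savored"
    return "devoured"


def days_to_finish(total_hours, hours_per_day):
    """Whole days needed to watch total_hours at a steady daily pace."""
    return math.ceil(total_hours / hours_per_day)


# Average show: about 7 hours of content, finished within one week.
avg_pace = 7 / 7  # hours per day
print(avg_pace, classify_show(avg_pace))

# Orange is the New Black: about 12 hours, watched at two hours per day.
print(days_to_finish(12, 2))
```

The first case shows how a viewer can finish within a week (a binge-watcher by the vertical-axis measure) while the show still lands in the "savored" bucket. The second shows that a steady two hours per day would finish the 12-hour series inside the one-week cutoff.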

The following chart brings out the exceptional status of this show:


PS. Above image was replaced on 7/19/2017 based on feedback from the commenters. Labels and legend added.

The art of arranging bars

Twitter friend Janie H. asked how I would visualize a hypothetical third column of this chart that contains the change from 2016 to 2017:


This table records the results from a survey question by eMarketer, asking respondents ("marketers") to identify their top 5 technology priorities in the next 12 months.

I suggested the following:


A hype-chasing phenomenon is clearly at play. Internet of Things and wearable technology are so last year. This year, it's all about A.I. Interestingly, something like "Big data" has been able to sustain the hype for another year.

A design decision I made is to encode the magnitude of the change in the bar lengths while encoding the direction of the change in the colors. One can of course follow the more canonical design of placing the negative bars on the left side of the data labels. My decision is a subtle way of imposing the hierarchy - first I care about magnitude, then I care about direction.
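For readers who want to try this encoding, here is a rough sketch in plain Python. The percentages are invented placeholders, not the eMarketer figures, and the function names and color names are hypothetical; the point is only that the bar length takes the absolute value of the change and the color takes its sign.

```python
def encode_changes(share_prev, share_curr, up_color="blue", down_color="red"):
    """Split each year-over-year change into a magnitude and a direction.

    The bar length carries the magnitude (absolute change); the color
    carries the direction (increase or decrease).
    """
    rows = []
    for label in share_prev:
        change = share_curr[label] - share_prev[label]
        rows.append({
            "label": label,
            "length": abs(change),
            "color": up_color if change >= 0 else down_color,
        })
    return rows


def render(rows):
    """Draw a crude text version of the chart, one bar per row."""
    lines = []
    for row in rows:
        bar = "#" * round(row["length"])
        lines.append(f"{row['label']:<22} {bar} ({row['color']})")
    return "\n".join(lines)


# Hypothetical percentages, for illustration only.
share_2016 = {"A.I.": 10, "Internet of Things": 30, "Big data": 40}
share_2017 = {"A.I.": 25, "Internet of Things": 18, "Big data": 41}

print(render(encode_changes(share_2016, share_2017)))
```

Because all bars grow in the same direction, their lengths can be compared directly, and the reader turns to the color only as a second step.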

Here is a third way:


This design imposes a different hierarchy. Your eyes are drawn to the top/bottom of the chart.

Any of these designs beats the data table by a mile. It's just too much work for the reader to figure out the value of the changes from the table.

Wheel of fortune without prizes: the negative report about negativity

My friend, Louis V., handed me a report from Harvard's Shorenstein Center, with the promise that I can make a blog post or two from it. And I wasn't disappointed.

This report (link) caught some attention a few months ago because of the click-bait headline that the media is "biased" against Trump in his first 100 days. They used the most naive definition of "bias". The metric is the amount of coverage that is "negative," with the unspoken standard that the media should be 50% negative.

In a court of law, it is already established that, for example, a loan company cannot be sued for racial bias simply because the rejection rate of loans for black Americans is higher than that of white Americans. Similarly, a university is not necessarily biased against black applicants even if the proportion of black students is found to be below the national proportion of black Americans (in a statistically significant way).

The appropriate amount of negative coverage is a function of the content of the President's actions, which is a hard standard to nail down, much harder than in the two examples above. That alone suggests such an analysis is futile from start to finish. Then there is the irony of generating a negative media headline criticizing the negative tone of media coverage.


Now let's turn to their use of visuals. The following pair of pie charts is used to show the differences in coverage between U.S. and European media.


These pie charts are inspired by Wheel of Fortune, without the prizes. 


Notice how our attention is caught by certain colors (red, orange, etc.) and the size of the slices. The largest red slice is labeled "Other Foreign/Defense" in the U.S. pie chart, although it did not merit a mention in the accompanying writeup so it's not clear what that category means.

Instead of ordering the slices by their sizes, the design puts the larger slices as far apart as possible. Further, each color is used twice in a mirrored way, causing us to infer an association between categories that doesn't exist.


There are lots of conventional ways to display this data better. I decided to experiment with word clouds (using the Wordle tool).

Here is one in which the color indicates whether the coverage is American or European. Each word appears twice, and in proximity to one another for comparison.


One can directly compute a discrepancy metric between the two regions. This next chart shows the difference in importance accorded each topic by American versus European media:
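One simple way to compute such a discrepancy is to subtract the European share of coverage from the American share, topic by topic. The sketch below assumes that definition, which may not be the exact metric behind the chart. The function name is made up and the shares are placeholders rather than numbers from the Shorenstein report; only the "Other Foreign/Defense" label is taken from the pie chart discussed above.

```python
def topic_discrepancy(us_share, eu_share):
    """Return (topic, US share minus European share), largest gap first.

    A topic missing from one region is treated as having zero share there.
    """
    topics = sorted(set(us_share) | set(eu_share))
    gaps = [(t, us_share.get(t, 0) - eu_share.get(t, 0)) for t in topics]
    # Positive values mean American media gave the topic more weight.
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)


# Hypothetical percentage shares of coverage, for illustration only.
us = {"Immigration": 17, "Health care": 12, "Other Foreign/Defense": 20}
eu = {"Immigration": 21, "Health care": 5, "Other Foreign/Defense": 14}

for topic, gap in topic_discrepancy(us, eu):
    print(f"{topic:<24} {gap:+d}")
```

Sorting by the signed gap puts the topics favored by American media at the top and those favored by European media at the bottom.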