When design goes awry

One can't accuse the following chart of lacking design. Strong is the evidence of departing from convention but the design decisions appear wayward. (The original link on Money here)

Mc_cellphones_money17

 

The donut chart (right) has nine sections. Eight of the sections (excepting A) have clearly all been bent out of shape. It turns out that section A does not have the right size either. The middle gray circle is not really in the middle, as seen below.

Redo_mc_cellphone

The bar charts (left) suffer from two ills. Firstly, the full width of the chart is at the 50 percent mark, so readers are forced to read the data labels to understand the data. Secondly, only the top two categories are shown, thus the size of the whole is lost. A stacked bar chart would serve better here.

Here is a bardot chart; the "dot" part of it makes it easier to see a Top 2 box analysis.

Redo_jc_mc_cellphone_2

I explain the bardot chart here.

 

 PS. Here is Jamie's version (from the comment below):

Jamie_mc_cellphone

 

 


Layered donuts have excess fats and oils

Via Twitter, Nicholas S. sent this chart:

Usda_donutchart

It's a layered donut. There isn't much context here except that the chart comes from USDA. Judging from the design, I surmise that the key message is the change in proportion by food groups between 1970 and 2014. I am assuming that these food groups are exhaustive so that it makes sense to put them in a donut chart, with all pieces adding up to 100%.

The following small-multiples line chart conveys most of the information:

Redo_usdadonutchart_jc

The story is the big jump in "Added fats and oils".  In the layered donut, the designer highlighted this by a moire effect, something to be avoided.

Note the parenthetical 2010 next to the Added fats and oils label. The data for all other food groups come from 2014 but the number for the most important category is four years older. The chart would be more compelling if they used 2010 data for everything.

One piece of information is ostensibly absent in the line chart version - the growth in the size of the pie. The total of the data increased about 20% from 1970 to 2014. In theory, the layered donut can convey this growth by the perimeters of the circles. But it doesn't appear that the designer saw this as an important insight since the total area of the outer donut is clearly more than 20% larger than the area of the inner donut.

 


An unsuccessful adaptation of a classic

Found this chart in Hemispheres magazine on board a United flight:

United_sfemploy_sm

A quick self-sufficiency test reveals the biggest shortcoming of this visual presentation.

United_sfemployment_sufficiency

What would you guess is the difference in areas between the two white-ish sectors (pointing at 9 o'clock and 2 o'clock)? The actual numbers are 18.3% and 12.5%. So roughly, if one takes the 2-o'clock sector (right), halve it and add it back to itself, one should obtain the area of the 9-o'clock sector (left). Clearly, the piece on the left is much too big.

The following chart shows the index of exaggeration increasing with the value of the data. (For example, the highest value of 18.3% is about 8 times the lowest value of 2.3% but the ratio of the areas depicted is ~500 times.)

United_employment_exag

The distortion is larger than usual because the designer encodes the data twice, once in the angle of the sector, and again in the radius. Both those quantities contribute to the area of a sector.

Readers must look at the data in order to read this chart properly; therefore, the visual elements are not self-sufficient. Further, if readers chose to judge the relative sizes of the sectors, they would misread the data massively.
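To see how quickly double encoding inflates the picture, here is a minimal sketch. It assumes that both the angle and the radius of each sector are scaled linearly with the data value; the function names are mine, and the designer's actual scaling is not known.

```python
import math

def sector_area(angle_rad, radius):
    """Area of a circular sector: (1/2) * r^2 * theta."""
    return 0.5 * radius ** 2 * angle_rad

def double_encoded_area(percent):
    # Assumption: the angle is the value's share of the full circle,
    # and the radius is also proportional to the same value.
    angle = 2 * math.pi * percent / 100.0
    radius = percent
    return sector_area(angle, radius)

high, low = 18.3, 2.3  # the two percentages quoted in the text
data_ratio = high / low
area_ratio = double_encoded_area(high) / double_encoded_area(low)

print(f"ratio of data values: {data_ratio:.1f}")
print(f"ratio of sector areas: {area_ratio:.0f}")
```

Under these assumptions the area grows with the cube of the data (once through the angle, twice through the radius), so a data ratio of about 8 turns into an area ratio a bit over 500.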

***

The designer was probably inspired by the Nightingale rose diagram (link to Wikipedia):

800px-Nightingale-mortality

In the original, Nightingale does not encode data into the angles. The circle is divided evenly into 12 pieces to display the 12 months of the year (She might have taken into account 28-31 days; it's hard to tell by inspection). The data is encoded once along the radial axes.

Another difference between the two charts is the ordering of the data. In Nightingale's version, the order is logically determined by the passing of time. In the Hemispheres chart, the order is chosen based on taste. A more natural order would be by the proportion of employment but I think the resulting chart would look like a snail's shell, or worse. I must say a more balanced "rose diagram" looks nicer but it forces my eyes to jump around to answer a simple question such as which are the top three employment sectors in San Francisco.


Two charts that fail self-sufficiency

My twitter followers have been sending in several howlers.

Twitter (link) made a bunch of bold claims about its own influence by using the number of tweets about the Oscars as fodder. They also adopt the euphemism common to the digital marketing universe, the so-called "view", which, to their credit, they define as "how many times tweets are displayed to users". Yes, you read that right, displaying is the same as viewing in this world - and Twitter is just a follower, not a trend setter, here.

For @dtellom, it is this bubble chart about the Ellen tweet that displeased him:

Twitter_ellenimpressions_0

 

In the meantime, @wilte found this unfortunate donut chart, created by PWC in the Netherlands.

PWCG_donut

Both designers basically appropriated a graphical form and deprived it of data. In one, the designer threw the concept of scale to the wind. In the other, the designer dumped the law of total probability. In either case, the fundamental rationale for the particular graphical form is sacrificed.

Both are examples that fail our self-sufficiency test. This test says if a visual display cannot be understood unless the entire data set is printed on the chart, then why create a visual display? In both charts, if you block out the numbers, you are left with nothing!

***

The PWC chart was submitted by @graphomate, who also submitted the following KPMG chart:

KPMG_donut

The complaint was the total adding up to 101%. I'm not really bothered by this as it is a rounding issue. That said, I like to "hide" such rounding issues. I have never understood why it is necessary to display the imperfection. Flip a coin and remove the decimals from one of the categories!
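For those who prefer something more systematic than a coin flip, one common alternative is the largest-remainder method. This is a swapped-in technique, not what KPMG or I actually did: round everything down, then hand the leftover points to the categories with the largest fractional parts. The function name and the sample shares below are hypothetical, and the sketch assumes the unrounded shares add up to 100.

```python
import math

def round_to_100(shares):
    """Round percentages to whole numbers that sum to exactly 100.

    Assumes the unrounded shares add up to (about) 100.
    """
    floors = [math.floor(s) for s in shares]
    shortfall = 100 - sum(floors)
    # Indices sorted by fractional part, largest first.
    order = sorted(range(len(shares)),
                   key=lambda i: shares[i] - floors[i],
                   reverse=True)
    for i in order[:shortfall]:
        floors[i] += 1
    return floors

shares = [40.6, 30.7, 28.7]           # hypothetical data, not the KPMG numbers
naive = [round(s) for s in shares]     # each rounded on its own
fixed = round_to_100(shares)

print(naive, sum(naive))
print(fixed, sum(fixed))
```

With these made-up shares, rounding each number independently gives a total of 101, while the adjusted version adds up to 100.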


The exception to the rule against dual axes

Dual axes are almost always a bad idea. But there is one situation in which I'd use them.

***

Last week, Alberto Cairo (link) engaged in a Twitter/blogging debate about a chart that first appeared in Reuters concerning the state of the woman CEO in the Fortune 500 companies. Here is the chart under discussion:

Original_women_ceo_left

This chart already is cleaner and more useful than the original original, which came from a research report from Catalyst (link):

Catalyst_us_ceos

Jonathan Keller re-made the Reuters chart as follows:

Keller_women_ceo_left

 

Cairo Jorge Camões contributed this version:

  Cairo_women_ceo_left

The Voila blog (link) has yet another take:

Voila_women_ceo_left

Then Chris Moore, responding to Cairo, created this view and also left some insightful comments:

Women_ceo_cmoore

***

What's at stake here? There are really three related topics of discussion.

First, there is the matter of the upper limit of the vertical axis. Three solutions were suggested: 100 percent, 50 percent, and 4 percent. (Cairo at one point suggested 25 percent, which can be wrapped into the 50 percent bucket.) In reality, this is an argument over which of two key messages should be emphasized. The first message is that women still comprise a pathetically small proportion of Fortune 500 CEOs. The second message is more hopeful, that the growth in this proportion has been quite rapid since 1995.

All versions of the chart actually display both messages. In the Reuters chart (as well as Moore and Cairo), the message about the absolute proportion of women is given as an annotation while the Keller and Voila versions extend the vertical axis, thus encoding this message directly to the chart. Conversely, the Keller and Voila versions deemphasize the growth in proportions, and so I'd have preferred to see a note about that growth when using their versions.

Voila selects a 50% upper limit because the 50/50 split has an intuitive meaning in the context of gender balance. Because the resulting chart is so visually arresting, and so biased to one of the two key messages, I'd only consider it if the point of the display is to draw attention to the female deficit.

***

The second disagreement is in using absolute counts versus relative proportions. Moore chose absolute counts. I am in this camp as well. This is primarily because we are talking about Fortune 500 and the 500 number is an idee fixe. In Moore's version, I find the data labels distracting since all the numbers are small and insignificant.

Finally, the linkage between the absolute and the relative numbers also produces multiple solutions. Cairo's post pinpoints this issue. His solution is to include an inset pie chart with an arrow to explicitly link the two views. Moore likes the inset idea, but experimented with a donut chart or a partition in place of the pie chart. He also removes the explicit guiding arrow.

***

It turns out this dataset is perfectly made for the dual axes. The absolute counts and relative proportions are in one to one correspondence because it's really only one data series expressed twice. This happy situation leads to one line that can be cross-referenced on two axes, one side showing counts and the other side showing proportions. This is shown in my version below (the orange line).

Redo_women_ceo
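The one-to-one correspondence is easy to sketch in code. Because the denominator is fixed at 500, every position on the count axis maps to exactly one position on the proportion axis, and back. The function names and tick positions below are illustrative assumptions, not the data behind my chart.

```python
TOTAL_COMPANIES = 500  # the Fortune 500 denominator

def count_to_percent(count):
    """Map a count of women CEOs to a percentage of the Fortune 500."""
    return 100.0 * count / TOTAL_COMPANIES

def percent_to_count(percent):
    """Inverse mapping: a percentage back to a count."""
    return percent * TOTAL_COMPANIES / 100.0

# Illustrative tick positions for the count axis (not data).
count_ticks = [0, 5, 10, 15, 20]
paired_ticks = [(c, count_to_percent(c)) for c in count_ticks]

for count, percent in paired_ticks:
    print(f"{count:>2} CEOs  <->  {percent:.0f}%")
```

Since the two scales are just relabelings of each other, the single line reads correctly against either axis. In matplotlib, for instance, a pair of forward and inverse functions like these could be handed to `Axes.secondary_yaxis` to draw the second axis.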

In addition to having two axes, I have plotted two related data series. The second series (in red) shows the incremental change in the number of women CEOs from the previous year (also shown in both counts and proportions).

The first series (the same one everyone plotted) draws attention to the hopeful message, that the growth in women CEOs has been quite strong since 1995. The second series is a bit of a downer on that message, suggesting that from the absolute count perspective, the progress (only one or two additions per year) has been painfully slow, and not that impressive.

Thanks again to Alberto for making me aware of this discussion. This has been fun!

 

PS. I have left out the other chart and may return to it in a future post.


What's in a cronut? Let me find out

Analyticsseo_ga

Reader Ross S. did not join the line for this cronut, illustrating the popularity of different makers of tracking software on 1.3 million websites.

Original by Analytics SEO is here.

***

The biggest beef I have with this cronut is the quality of the data. As I read their description of the underlying data, I see several red flags.

The analysis is hobbled by ignoring the competitive landscape in tracking software. Google Analytics carves out a huge share of the market by virtue of offering a richly featured product for free. (They justify this by establishing a gigantic spying operation on unsuspecting users.) However, industry insiders know that Omniture (owned by Adobe) is the heavyweight enterprise solution, with a complete feature set.

In other words, most of the 670,000 "customers" of Google Analytics are tiny websites; in addition, many large websites run Google Analytics alongside Omniture since the former is free. It would be great if the researcher gave us one of two alternative views of market share: the share of revenues in the tracking software market, or the share of e-commerce revenues represented by the customers of each tracking software vendor. These two views give a fuller picture of the competitive landscape.

You'll notice this is the same game Google is playing in the mobile universe. Android has the most users but Apple makes the bulk of revenues.

***

The SEO agency says the chart is "based on 1.3 million e-commerce websites in May 2013". Are there really 1.3 million websites out there selling us stuff? How do they define e-commerce? Is NYTimes.com an e-commerce website, for example? Or facebook.com for that matter?

In the summary, they made a pretty startling claim--that "a large number of websites have no tracking software at all". The only problem is readers can't find out what proportion of websites don't track users. The data in the cronut excluded sites without tracking, which is a big problem.

***

Here is the link to the annual Top 500 Retailers report by Internet Retailer magazine. In Sep 2011, they found that 217 out of the top 500 use Omniture, 161 use Google Analytics, and 103 use Coremetrics (now owned by IBM).

Another place to look for corroborating evidence is Google Trends, which measures the popularity of search keywords. The relative order of the major vendors (excluding Google Analytics) does not match well with the data shown by Analytics SEO.

Googletrends_on_tracking

Compared to:

Analyticsseo_gatabletop

Coremetrics is way down in the list compiled by Analytics SEO.


English donuts rival Spanish donuts

On my holiday travel, I found a disguised donut chart in the Delta Sky Magazine (Dec 2012), talking about manufacturing jobs in the U.S. Then, flipping through the Spanish section at the back of the same magazine, I found the translated article, plus a translated chart. To my surprise, they look different:

Delta_skymag_dec12

Surprise No. 1: the sizes of the cog wheels are different. Even though the color is still mapped to year in the same way, somehow one of these authors decided to take liberties with the relative size. The suspect is the Spanish author who decided to make 2009 much larger and Jan to Sep 2012 much smaller.

Surprise No. 2: the use of commas within a number, and the format of dates differ by culture. That explains why the Spanish author removed the commas from the numbers, making it harder for me (English-speaking) to comprehend. Also, the swap from "01/12-09/12" to "Sep. 2012" suggests that Spanish speakers don't like the month/year formatting of dates. It also suggests that the Spanish readers have no trouble inferring that the "Sep. 2012" data point refers to "Jan. 2012 to Sep. 2012".

Surprise No. 3: The Spanish author improved the chart in one way. He grouped the annual data together via overlapping, leaving the 2012 partial-year data point by itself.

***

There are some problems with both charts. The most serious is the failure to project the 2012 jobs number. The chart seems to indicate that 2012 is a lackluster year, at best level with the previous years but in fact, the number of jobs in three quarters has already exceeded the full-year count of 2011, 2010 and 2009. Unless the fourth quarter is a particularly bad quarter for manufacturing jobs, it would seem that the message should be that 2012 is a great year of recovery. You can't tell from these charts: in particular, the Spanish author decided to shrink the 2012 cog wheel into insignificance.

The issue here is providing context for comparison. Even if the projected 2012 full-year number is provided, that may not be enough to judge whether manufacturing is healthy. Other useful context can be the growth rate of manufacturing versus other sectors of the economy; and the growth rate of jobs in relation to the population/work force growth rate.

As usual, a simple line chart displays the time-series data more clearly. (I simply linearly extrapolated the 2012 full-year number, which is probably an over-estimate. In practice, you can look up the data and figure out the ratio of Jan-Sept jobs to full-year jobs on average and inflate the number that way.)

Redo_deltaskymag
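The two ways of projecting a full-year number from nine months of data can be sketched as follows. The function names, the jobs figure and the historical share are all hypothetical placeholders; none of them come from the magazine's data.

```python
def linear_full_year(partial_total, months_observed, months_in_year=12):
    """Naive projection: assume every month contributes equally."""
    return partial_total * months_in_year / months_observed

def ratio_full_year(partial_total, historical_partial_share):
    """Projection using the average share of the full-year total that
    the same partial period represented in past years."""
    return partial_total / historical_partial_share

jan_sep_jobs = 900      # hypothetical Jan-Sep count, for illustration only
historical_share = 0.8  # hypothetical: Jan-Sep averaged 80% of past full years

linear_estimate = linear_full_year(jan_sep_jobs, months_observed=9)
ratio_estimate = ratio_full_year(jan_sep_jobs, historical_share)

print(f"linear extrapolation: {linear_estimate:.0f}")
print(f"ratio-based estimate: {ratio_estimate:.0f}")
```

Linear extrapolation implicitly assumes Jan-Sep is 75% of the year's total. If the first nine months historically account for a larger share than that, as in the made-up 80% here, the linear number overshoots.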

 

 

 


Gelman joins in the fun

The great Andrew Gelman did a Junk Charts style post today, and very well indeed.

The offending Economist plot is the donut chart, which is a favorite of that magazine.  I commented on this type of chart before.

Econ_timespent

Andrew created two alternatives. One is a line chart (profile chart), which is often a better option (despite the data being categorical); the other is more creative, and the better of the two.

Redo_timespent1

 

Redo_timespent2

Some of Gelman's readers complained that he arbitrarily "standardized" the data by indexing against the average of the countries depicted; one can further grumble that a 50% "excess" may sound impressive but it would be equivalent to less than an hour, perhaps not as startling. These types of complaints are fair but do realize that blog posts like these are primarily concerned with how data is best visualized. If one prefers a different indexing method, or a different set of countries, or a different color for the lines, etc., one can easily revise the chart to reflect those preferences.
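For readers who want to try a different indexing scheme, here is a minimal sketch of indexing against the average of the group. The country labels and minutes are hypothetical, chosen only to show how a 50% "excess" can correspond to a small absolute gap; they are not the Economist's or Gelman's data.

```python
def index_against_average(values):
    """Return each value's relative excess over the group average.

    0.5 means 50% above the average; -0.5 means 50% below.
    """
    average = sum(values.values()) / len(values)
    return {name: v / average - 1.0 for name, v in values.items()}

# Hypothetical minutes per day spent on one activity.
minutes = {"Country A": 90, "Country B": 60, "Country C": 30}

excess = index_against_average(minutes)
average_minutes = sum(minutes.values()) / len(minutes)

for name, e in excess.items():
    gap = minutes[name] - average_minutes
    print(f"{name}: {e:+.0%} vs. average ({gap:+.0f} minutes)")
```

Swapping in a different baseline (say, one reference country instead of the group average) only requires changing how `average` is computed.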

The easiest way to see why the third chart is better than the first is that the strongest message coming off the first chart is that there are no material differences between these six countries in terms of time usage but in the third chart, the designer (here, it's Gelman) is asserting that there are interesting differences.


Have data graphics progressed in the last century?

Received a wonderful link via reader Lonnie P. to this website that presents a historical reconstruction of W.E.B. DuBois's exhibit of the "American negro" at the 1900 Paris Expo. Amusingly, DuBois presented a large series of data graphics to educate the world on the state (plight) of blacks in America over a century ago.

You can really spend a whole afternoon examining these charts (and more); too bad the charts have poor resolution and it is often hard to make out the details.

***

Judging from this evidence, we must face up to the fact that data graphics have made little progress during these eleven decades. Ideas, good or bad, get reinvented. Disappointingly, we haven't learned from the worst ones.

Exhibit A 

  Dubois_a

(see discussion here)

Exhibit B

Dubois_b

 (see discussion here)

Exhibit C 

  Dubois_c

(See discussion here.)

Exhibit D

Dubois_dd
 (see the Vampire chart here)

Exhibit E

Dubois_e
(see the discussion here.)

Exhibit F

Dubois_f
(see discussion here.)