Highway Safety Agency goes rogue

A reader sends me to Adam Obeng, who did the dirty work deconstructing a set of charts by the U.S. National Highway Traffic Safety Administration on his blog. Here's an example of these charts:


Aside from the sneaker chart, they concocted a pop stick, a pencil, a tower of Hanoi, etc. These objects are ones I think should be evaluated as art. Adam gamely tells us that the proportions are totally off, and that the charts are inconsistent both internally and externally.


I'll add two small points to Adam's post.

First, these charts pass my self-sufficiency test, that is to say, they did not print the entire data set (just one number here) on the page. Alas, given the distortion identified by Adam, not printing the data means everyone is free to create their own data. Herein lies the problem: there is an argument for allowing a small degree of distortion in exchange for "beauty" but these charts without any data have gone too far.

Second, see Adam's last point (the footnote). The original data is something quite convoluted: “3 out of 4 kids are not as secure in the car as they should be because their car seats are not being used correctly.” (How would they know this, I wonder.) This is a statistic about kids while the picture shows a statistic about their parents (or drivers).



Mix percent metaphors, add average confusion, and serve

Sometimes, a chart just strains your mind. Such is the case with the following, a tip from Augustine F. (@acfou):


There are just so many percentages on the chart it's really hard to figure out which is which.

Under the title, it hints that they are showing results from a poll. The legend implies that the poll asks for estimates of budget and revenue allocations: one imagines the questions were what proportion of your marketing budget is allocated to digital? and what proportion of your revenues is attributed to digital? On top of the bars are some percentages, presumably percentages of respondents. Perhaps, or perhaps not. The column labels clearly add up to over 100% since there are two columns in the 30-35% range.

Under the axis, we have buckets of percentages. Are they percentages of people, of budgets or of revenues? Why and how are they bucketed?

My best guess is that the survey is a multiple-choice with 11 choices corresponding to the groups of columns. The axis labels refer to both percentage of budget and percentage of revenues, depending on which column you're looking at.

What is maximally confusing is the last set of columns, labeled "Average", with values in the 35% range. It is most likely not a choice in the survey. They somehow came up with an average based on the responses. So maybe I was wrong about the multiple-choice format: if the raw data comes in buckets like 61 to 70%, there is no easy way to average these responses. Maybe they asked for two exact percentages, and then grouped them afterwards.


To sum all that up, the percentages on top of the columns are percentages of respondents, except in the last set of columns, where they are percentages of budget (or revenues). The percentages of budget (or revenues) are sitting on the horizontal axis, except in the last label, called "Average", where it means the average respondent.


There is a problem with my interpretation. It makes the chart completely worthless!

What use is it to learn that "16% of the respondents say they allocate 11-20% of their budget to digital while 12% of the respondents say they derive 11-20% of their revenues from digital"?

You might be interested in whether there is a return on investment to the money spent on digital marketing. You'd then need to know, for a given company, what proportion of budget was spent on digital marketing versus what proportion of revenues was attributed to that marketing. In this chart, there is no linkage -- the companies who say they spend 11-20% on digital may or may not be the same set of companies who say they derive 11-20% from digital spend.

If the survey asked for exact percentages, then I'd prefer to see a scatter plot, showing proportion of budget on one axis, and proportion of revenues on the other axis, each dot representing a respondent.
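To see why the pairing matters, here is a minimal Python sketch using made-up numbers. The two sets of respondents, the `bucket` helper and its 10-point bucket edges are all hypothetical and not taken from the survey. Both sets would produce exactly the same bucketed column chart, yet in one of them revenue share rises with budget share and in the other it falls.

```python
from collections import Counter

def bucket(pct):
    """Map a percentage to a 10-point bucket label such as '11-20%' (assumed bucketing)."""
    if pct <= 10:
        return "0-10%"
    low = ((pct - 1) // 10) * 10 + 1
    return f"{low}-{low + 9}%"

def marginals(pairs):
    """Bucket counts for budget share and revenue share, each computed on its own."""
    budget = Counter(bucket(b) for b, _ in pairs)
    revenue = Counter(bucket(r) for _, r in pairs)
    return budget, revenue

# Hypothetical respondents: (percent of budget on digital, percent of revenues from digital)
aligned = [(5, 5), (15, 15), (25, 25), (35, 35)]
reversed_ = [(5, 35), (15, 25), (25, 15), (35, 5)]

# The column chart only shows the marginals, so it cannot tell these two apart.
same_chart = marginals(aligned) == marginals(reversed_)
print("Identical bucketed charts:", same_chart)
```

A scatter plot of the raw pairs would separate the two cases immediately, which is the information a reader interested in return on investment actually needs.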


A final note: it is worth asking what types of people answer this survey. Pretty much the only people in a company who can answer this question accurately are the heads of marketing. If you are working for the head of marketing, you likely know the details of a particular segment of marketing but not the aggregate numbers. If you work in a different department, there is little to no chance that you have any useful knowledge about marketing budgets and revenue allocations.

One would also appreciate it if all such pictures include the sample size.

Bracket as target

The MLB found an innovative way to present the play-off matchup and results:


I took this photo at the MLB Fan Cave in Manhattan. This was a marketing gimmick in which a bunch of guys were placed into this "fishbowl" and watched every game in the past baseball season. Excuse the scaffolding that was blocking the view.

I like the metaphor of hitting the bull's eye target, and the smooth progression from outside circles to inside. The design also accommodated the wild-card round well.


By contrast, here is the usual bracket presentation:


(This image came from the MLB Fan Cave website.)

Bloomberg issues a health warning dressed up as a fast-food menu

NYC mayor Michael Bloomberg is getting mixed reviews for his proposal to ban super-sized sugary drinks. Reader John O. wasn't impressed with this graphical effort (link):



The key problem: this picture is not scary at all. The reason it's not horrifying is that there is no context. People who have knowledge about healthy eating habits will get the message but that's preaching to the choir.

If you know that the recommended consumption of daily sugars for adults is roughly 20-36 grams, then you can see that one sugary drink of 12 ounces or higher would take you over the daily limit. A 64-ounce drink would give you more than 7 times what you need in a day. That's a powerful message but you won't know it from this chart. Not from the sugar cubes doubling as shadows, which is a cute, creative concept.

Also, make use of the chart-title real estate! Instead of "Sugar & Calories per Fountain Drink", say something memorable. "Fountain drinks make you fat and sick".


There is something else fishy about this graphic. What are the most prominent data being displayed?

You got it. They're 7, 12, 16, 32, 64. Where have we seen this type of data display?

Yup. This format is lifted from a menu in a Starbucks or a McDonald's (without prices).

Is this a health warning? Or a restaurant menu?


John wrote:

Also slightly confused about the slightly non-linear relationship between calories and drink size.  Maybe volume of ice is held constant...

It is in fact a proportional relationship. The confusion arises from the non-linear increase in cup size from 7 to 64 ounces. The math is roughly 11 calories per ounce, and 3g of sugar per ounce. I wonder if it is better to show those two numbers instead of the ten not-very-memorable numbers shown on the chart itself.
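The proportional relationship can be checked with a few lines of Python. This is only a sketch: the per-ounce rates are the rounded figures above, the 20-36 gram daily range is the rough guideline mentioned earlier in this post, and the names (`drink_totals`, `CUP_SIZES_OZ`, etc.) are mine. The numbers printed on the chart itself will differ somewhat because of the rounding.

```python
CAL_PER_OZ = 11            # rough figure from this post
SUGAR_G_PER_OZ = 3         # rough figure from this post
DAILY_SUGAR_G = (20, 36)   # rough daily range for adults, as cited in this post
CUP_SIZES_OZ = [7, 12, 16, 32, 64]

def drink_totals(ounces):
    """Return (calories, grams of sugar) for a drink, assuming constant per-ounce rates."""
    return ounces * CAL_PER_OZ, ounces * SUGAR_G_PER_OZ

for oz in CUP_SIZES_OZ:
    calories, sugar = drink_totals(oz)
    # Express the sugar as a multiple of the upper and lower ends of the daily range.
    low_mult = sugar / DAILY_SUGAR_G[1]
    high_mult = sugar / DAILY_SUGAR_G[0]
    print(f"{oz:>2} oz: ~{calories} cal, ~{sugar} g sugar, "
          f"{low_mult:.1f}x to {high_mult:.1f}x the daily range")
```

Two rates plus the cup size reproduce every number in the table, which is the argument for showing the rates rather than ten separate values.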


In case you're wondering, the heights (thus areas) of the cups have no relationship with any of the data, not calories, not sugars, and not the cup size.


PS. John also wrote: "The soda cup graph reminds me of the chart from Pravda that Tufte cites in 'Cognitive Style of Powerpoint'. " If you know what he's talking about, please post a link to the chart. Thanks.

Motion-sick, or just sick?

Reader Irene R. was asked by a client to emulate this infographic movie, made by UNIQLO, the Japanese clothing store.

Here is one screen shot of the movie:


This is the first screen of a section; from this moment, the globes dissolve into clusters of photographs representing the survey respondents, which then parade across the screen. Irene complains of motion sickness, and I can see why she feels that way.

Here is another screen shot:


Surprisingly, I don't find this effort completely wasteful. This is because I have read a fair share of bore-them-to-tears compilations of survey research results - you know, those presentations with one multi-colored, stacked or grouped bar chart after another, extending for dozens of pages.

There are some interesting ideas in this movie. They have buttons on the lower left that allow users to look at subgroups. You'll quickly find the limitations of such studies by clicking on one or more of those buttons... the sample sizes shrink drastically.

The use of faces animates the survey, reminding viewers that the statistics represent real people. I wonder how they chose which faces to highlight, and in particular, whether the answers thus highlighted represent the average respondent. There is a danger that viewers will remember individual faces and their answers more than they recall the average statistics.


If the choice is between a thick presentation gathering dust on the CEO's desk and this vertigo of a movie that perhaps might get viewed, which one would you pick?


Reading behind the chart

I could have filed this one under Light Entertainment but it's too good a chart to lay to waste:

(The chart is from Internet Retailer.)

Let's focus on the (mis)match between the question being addressed and the data collected to address it. The intention of the analyst is fully divulged in the title of the accompanying article: "Don't sell Twitter short: those 140-character messages reach an affluent and engaged audience". The chart supports this claim by showing that Twitter users disproportionately represent the types of consumers that marketers most covet, i.e. those with advanced education, and those earning higher incomes.


Up top, the concept "user" is very pliable. Is it someone who has a Twitter feed or is it someone who reads Twitter feeds or is it someone who subscribes to Twitter feeds? Is it someone who is registered or everyone who visits? Is it someone who has visited the site in the last x months? or posted a tweet in the last x months? If a writer, does it include someone with no subscribers? no page views? What about people who simultaneously publish multiple feeds (like John Cook who writes one of my recommended blogs)?

Now, we don't expect the analyst to describe fully how a "user" is defined but while interpreting this chart, we should ask appropriate questions of the data.

Next, the analyst establishes a reference level called "general population". Is this the right metric? This depends on what the chart is used for. If you are choosing between spending money on Twitter, and say spending the money on national TV advertising, then perhaps this comparison is valid. If you are selecting between Twitter and say Google, then absolutely not. For most readers, I think a more relevant point of comparison would be the general Web user, rather than the general population. This is an important distinction because the general Web user also earns higher incomes and has higher educational attainment than the general population, thus the current set of data exaggerates the "value" of Twitter exposure.

Finally, if you are a marketer looking to spend with Twitter, you are also worried about "reach". Say, 50% of website ABC's users are rich people compared to 10% of your reference population. That sounds like an amazing opportunity. Well, only if website ABC has a sufficient number of users! If ABC is a niche website serving only 1% of your reference population, then despite the benefit of targeting, its scale is too small for your needs.
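The trade-off between targeting and reach in the ABC example comes down to simple arithmetic, sketched here in Python. The `reach` function and its parameter names are hypothetical, and the sketch assumes ABC's users are a subset of the reference population.

```python
def reach(site_share_of_population, target_share_on_site, target_share_in_population):
    """Return (lift, coverage) for a site.

    lift:     how over-represented the target group is among the site's users
    coverage: fraction of ALL target consumers in the population who use the site
    """
    lift = target_share_on_site / target_share_in_population
    targets_on_site = site_share_of_population * target_share_on_site
    coverage = targets_on_site / target_share_in_population
    return lift, coverage

# Website ABC: 1% of the population, 50% of its users are rich, vs. 10% in the population.
lift, coverage = reach(0.01, 0.50, 0.10)
print(f"lift: {lift:.1f}x, share of all rich consumers reached: {coverage:.1%}")
```

Under these assumptions, ABC's audience is five times as rich as the reference population, but advertising there reaches only about 5% of the rich consumers you are after.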

Don't take a chart on its surface. Read behind the chart!



Experiments in simplification

Julien D. sent us to this link, where a design agency picks up everyday objects and investigates what happens if the designs were to be simplified. Here is Nutella simplified:


I reckon this should be wonderful inspiration for chart designers.

Take your final design, remove components and interrogate whether you need those components.

(Related posts: self-sufficiency.)

Rightsizing: the graph edition

Reader Alex C. alerted me to this sensible note from Allan Reese, complaining about a piece of marketing by a software vendor (Aptech Systems), shown here:

This graph purportedly demonstrates the power of 3-dimensional plots, which presumably is a feature of the software GAUSSplot. As Reese pointed out, it rather unfortunately demonstrated the weaknesses of 3D plots, and it did so amusingly well.

Reese: "Apart from a Daliesque charm as abstract art, I can see nothing professional or commendable in this graph."

Yes, every decision rasps. Reese noted that the chart is named a "3D contour plot" by the vendor and yet we see a paired column chart arranged in an L-shape. The diskettes indicate the average of the 3 column values, which serves to obstruct our view of the underlying data. The legend--which is a palette of colors--plays the redundant role of gridlines. The diskettes take on various colors but the label shows it in orange. The two identical axis labels run in opposite directions, with a third running horizontally atop the color legend. The title announces a comparison of natural and synthetic fabrics, which explains how the eight fabrics were divided into two groups but will be missed easily. We surmise that the chart designer typically reads from top to bottom, right to left given the orientation of the fabric categories.

What most disturbs Reese--and surely anyone who is informed of it--is the optical illusion rendered by the use of three different shades of gray for the three panels. The chart literally creates two duelling images, the cube with rainbow strips crawling all over it, and the corner with two walls with rainbow columns sticking up from the floor. The second image, the intended one, is unstable because it would be hard to create a lighting scheme that would render one wall dark while the other wall is lighted.


The following 2D dot plot has no razzle-dazzle but makes the point.

I realized the grouping by synthetic v. natural late and just decided to box the synthetic ones. One can certainly make this a two-panel chart with the synthetics on the left and the naturals on the right.

The use of only 3 samples is highly questionable. This chart shows that the only reasonable conclusion is that Acrylic and Nylon have higher burn times than the rest. With so few samples, it is hard to tell if the remaining fabrics are truly different.


There may be situations where a 3D chart is preferred to a 2D chart but this set of data is certainly not such a situation. The software may in fact produce great 3D charts but this particular chart does not show off the software as the marketers may have hoped. One of the designer's most important tasks is to examine the structure of the data, and "right-size" the chart; throwing in extra dimensions is often counter-productive.




Weekend reading

Lots of ideas from readers have been gathering dust in my mailbox. Here are a bunch of links, with a few comments of mine.


This first link I'm not sure what to make of. I think the architects and graphic designers amongst you may be able to make sense of it. Not me. It came with this description: "dr. dr. crash and dr. trash of m-a-u-s-e-r analyzed worlds most junk magazines and visualized their data." For the intrepid (and I claim no liability):

    "Jetistics: The Analysis of Junk. The Junk of Analysis?"


This is yet another instance of the trend of infographics infiltrating PR releases.

This is yet another example of a map adding little or no value to the data. The presence of geographic data is not an excuse to give a lesson on maps.

It would be one thing if the geographic location helps the readers understand the data but in most such charts, the map merely says "Reader, I presume you are map illiterate, so let me tell you South Africa is at the southern tip of the African continent..."

Also notice that the bar charts are sorted by average size of invoices, which is definitely less meaningful than total amount invoiced. This, I suspect, is the failure to ask the pertinent question, which is at the top of the Trifecta checkup.



"19 Must-See Biotech Infographics", according to Kelly Davis of the BioBlogging Project.

#2 on this list is a chart (rather old data) on GM food, an issue of concern to me. In the Trifecta checkup, this addresses an important question, and displays very relevant data but uses a poor chart... too many colors, colors not carrying any meaning, hard-to-read labels.

Of the other links, these are more interesting: #10, #12, #17, #19, #8, #9.