
8 Red Flags about the "Useful chartjunk" paper

Reader Darryl guessed correctly that I'd be interested in this paper in which the authors assert that chartjunk of the USA Today type is more "useful" than Tuftian "plain" graphics. (Via Information Aesthetics) I applaud the attempt to put Ed Tufte's theories to the statistical test, and I have written about Bill Cleveland's experiments. However, after reading their paper carefully, I must conclude that the design of the statistical experiment contains so many major flaws that it is hard to take their conclusion seriously.


Please see the companion post on the book blog for technical comments. This post focuses on conceptual issues.



RED FLAG 1: The sample size consisted of 20 students. Flipping open any elementary statistics textbook, you will find the standard advice to ignore experiments with fewer than 30 observations.


RED FLAG 2: No mention of how participants were recruited. (Or, for that matter, how the experimenters were recruited; see RED FLAG 4.)


RED FLAG 3: The charts used in the study were mind-numbingly simple. The five examples given in the paper all contained data series with exactly 5 numbers. Many of the charts had little of interest: for example, one chart was titled "Diamonds were a girl's best friend" and showed a rise and then a fall in diamond prices -- huh?


RED FLAG 4: The degree of subjectivity in this experiment is mind-boggling. Instead of a multiple-choice test, a "single experimenter" conducted interviews with participants, asking open-ended questions. The answers were later scored by a "single experimenter". Whether the interviewer and the scorer were the same person is not disclosed; neither is the identity of the experimenter, his/her affiliation, or how he/she was recruited.


RED FLAG 5: The interviewer was allowed, in fact instructed, to "prompt" participants until he/she was satisfied with the final answer. Multiple prompts were allowed, yet the scoring recorded only whether any prompting was needed; the number of prompts was ignored.


RED FLAG 6: The responses of participants were scored against a "checklist", but the checklist was not released with the paper. The guidelines for scoring were described in detail, but they appear to leave room for discretion (e.g. 2 points for "providing most of the relevant information" -- what counts as "most"?). The transcripts of the interviews were not published, so it is hard to gauge the effect of (multiple) prompting.


RED FLAG 7: Some of the questions posed to the participants after they viewed the charts were very silly. Q2 (values) was "What are the displayed categories and values?" Is a good chart defined as one that leads readers to retain displayed values? Not in my book.


RED FLAG 8: The participants were asked to inspect a succession of 14 charts "for as long as they needed", and then answer questions. As a result, the effect being measured is hopelessly confounded with (1) memory capacity and (2) how much time each participant chose to spend reading the charts.


Just to underline RED FLAG 4 above, I quote the paragraph in which the researchers described their subjective "scoring" system. (By "Holmes", they mean the USA Today-style chartjunk.)

To a participant looking at the Holmes 'Monstrous Costs' chart, we would ask question Q3: 'What is the basic trend of the chart?' If the participant responded, 'I don't understand', we would elaborate: 'Tell me whether the chart shows any changes and describe these changes.' The participant might answer 'The teeth get bigger every year.' This answer would score 1 point, as it is not a complete answer (with incorrect information about the period of the data reported) but provides at least some information that the bars increase. The experimenter would then provide additional prompts starting with 'Can you be more specific?' A complete answer scoring four points might be 'The chart shows that campaign expenditures by the house increased by about 50 million dollar every two years, starting in 1972 and ending in 1982.'



Alex Kerin

Two other flags for me:

Their suggestions for the 'middle' ground of chart adornment resulted in one ugly chart and one unreadable chart.

Second, for the prompts they asked things like "Did you remember the chart called Monstrous Costs?" Because of the labeling, the subject would be much more likely to remember that the chart looked like a monster if they had seen that version.


"The sample size consisted of 20 students. Flipping open any elementary statistics textbook, you will find the standard advice to ignore experiments with fewer than 30 observations."

Umm, I guess you haven't flipped open that same elementary statistics book? The number of participants is based entirely on _power_, not an arbitrary cutoff. Many famous studies have had a dozen or fewer participants.
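To put a rough number on the power point: here is a back-of-the-envelope sketch (a hypothetical illustration only -- it assumes the 20 participants had been split into two independent groups of 10, which is not necessarily how the study was designed, and it uses a normal approximation that slightly overstates the power of the corresponding t-test):

```python
import math

def normal_cdf(x):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison.

    effect_size is Cohen's d. The normal approximation is a slight
    overestimate of the power of the corresponding t-test.
    """
    z_crit = 1.959964  # critical z for alpha = 0.05, two-sided
    z_effect = effect_size * math.sqrt(n_per_group / 2.0)
    return normal_cdf(z_effect - z_crit)

# 20 participants split into two groups of 10, large effect (d = 0.8):
print(round(approx_power(0.8, 10), 2))  # about 0.43
```

Even to detect a large effect (d = 0.8), such a design would have power of only about 0.43, well under the conventional 0.8 target -- which would take roughly 26 per group. Power, not any fixed n >= 30 rule, is what a sample of 20 should be judged against.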

You're right in many of your other criticisms, but the main value of this paper is its willingness to ask the question in the first place. Much of Tufte has been taken as gospel, and this is the first time someone has checked whether his guidelines should always be followed. Note that the paper clearly states that the value of chartjunk depends on the author's goal, and that ample additional studies are needed.

Hi Kaiser, I'm very happy to read a critique of this paper by you. The thesis of the paper is very appealing to me, but I couldn't believe it would be embraced unanimously by the charting community.

About your red flag 7 --

Is a good chart defined as one that leads readers to retain displayed values? Not in my book.

This is key. In some contexts this is the whole purpose of a display; in others, it is completely secondary. It depends on, say, the intended audience, the subject, and the degree of engagement of the reader.

I work at the OECD. For charts that appear in specialized publications read by researchers, I advise staying completely Tufte-compliant, with a very clean and sober design. People who find these charts went looking for them; they will spend as much time and energy as they need to understand them.

But for charts that appear on leaflets, on the side of our websites, or on posters at our HQs, which need to win the reader's attention, I'm happy if they are a little quirky.

Andrew Gelman

Interesting that the Information Aesthetics people fell for this study. Perhaps because it told them what they wanted to hear?


Surely there's a bigger picture here. Tuftian design principles are for decision-makers who are already interested in the underlying data and who have to analyse it day in, day out. If we ask a random person in the street whether they remember a chart of random numbers they aren't interested in, or a stylised picture of a chorus-line girl, most people (weeks later) will remember the chorus girl. They may then remember that the "graph" traced the line of her leg and that sales went up, then down. Our brains are hard-wired to remember images. However, for decision-makers this isn't an option... unless we hire a graphic artist every time our diamond sales are updated. Sales are flat: the chorus-line girl is doing the splits. Sales are up: she's doing a high kick. Sales are up even more: she's doing a high kick a few degrees higher than last month. If you have one piece of information to impart to an ambivalent audience, then junk charts have value. Otherwise, stick to the facts and only the facts. Just my thoughts!



If you had bothered to read the paper, you would know that the authors clearly state that whether chartjunk is useful depends on the context -- on what you are trying to do. That doesn't make their study invalid.

