At first, this looks like a decent chart despite the donut construct, which I cannot stand (but the Economist loves).
The accompanying text proclaimed: "Rock stars are famous for excess, and some pay the price". The rest of the paragraph points out drug- and alcohol-related deaths, plus deaths due to "unhealthy lifestyles", which apparently include cancer and cardiovascular disease.
There is a gaping hole between what's on the chart and what's in the text. They just talk past each other.
The chart invites us to compare the European experience to the American experience. Each donut presents the share of total deaths by cause of death. The top donut presents American rock-star deaths, the bottom European ones. But this comparison has zilch to do with the key point, which is how rock stars are different from the rest of us. The chart tells us nothing about the rest of us. The 20% of deaths due to cancer would be entirely unremarkable if 20% of non-rock-star deaths were also attributed to cancer!
We must also bear in mind that the base populations are rock stars who died young. This is a very specific demographic segment, and so the only valid point of reference is people who died young. If we think along those lines, then among unmusical people who died young, what might have been the causes of death? Drugs? Alcohol? Accidents? Suicide? You bet. I am not sure who is the authoritative source of such data, but the CDC reported that among Americans aged 15-34 who died, the leading causes were "unintentional injury", suicides, homicides, cancer and heart disease. Not much different from the above list...
The deaths depicted in the two donuts totaled fewer than 100, and yet percentages are given to one decimal place. This creates a false sense of precision not justified by the sample size.
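To put a rough number on that false precision, here is a minimal sketch. It assumes the generous case of exactly 100 deaths in one donut (the text only says the two donuts together total fewer than 100) and uses the usual binomial standard error for a proportion; the function name is mine, not anything from the Economist.

```python
import math

def proportion_precision(share, n_deaths):
    """Return (step, standard_error), both in percentage points.

    step: how far the percentage moves when a single death is reclassified.
    standard_error: binomial standard error of the observed share.
    """
    step = 100.0 / n_deaths
    standard_error = 100.0 * math.sqrt(share * (1.0 - share) / n_deaths)
    return step, standard_error

# Assumption: n = 100 is an upper bound on the sample; the real donuts hold fewer deaths.
step, standard_error = proportion_precision(0.20, 100)
print(f"one death moves the share by {step:.1f} points")
print(f"standard error of a 20% share: about {standard_error:.1f} points")
```

Even in this best case, one reclassified death moves a slice by a full percentage point, so the digit after the decimal point carries no information.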
The deaths occurred over about 50 years. It is very likely that the causes of premature death have shifted during this time span, making an aggregate analysis questionable.
Charting is much more than just aesthetics. Some basic statistical common sense goes a long way. Darrell Huff observed this long ago in "How to Lie with Statistics".
David Leonhardt wrote in the NYT of a shocking incident of statistical abuse committed by Lou Dobbs and the CNN crew.
On several recent occasions, while commenting on the red-hot immigration issue, Lou and company remarked that "there had been 7,000 cases of leprosy in this country over the previous three years, far more than in the past". (Leprosy is a flesh-eating disease prevalent among immigrants, particularly of Asian or Latin American origin.)
When asked about fact-checking, Lou reportedly said: "If we reported it, it's a fact." A quick visit to the government's leprosy program website immediately reveals the time-series chart, shown on the left. With annual counts at about 150 in the last 5 years or so, one is hard pressed to find the 7,000 alleged cases!
Furthermore, because this chart plots raw counts with no reference to the size of the population, we fail to see that 150 cases out of a population of 300 million represent a minuscule risk.
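The arithmetic behind both complaints fits in a few lines. This is only a back-of-the-envelope sketch using the round numbers quoted here (about 150 cases a year, 300 million people, 7,000 claimed cases over three years); the variable names are mine.

```python
annual_cases = 150            # approximate yearly count read off the chart
population = 300_000_000      # round population figure used in the text
claimed_cases = 7_000         # the figure reported on air
claimed_years = 3             # "over the previous three years"

# What three years of cases would look like at the charted rate.
expected_cases = annual_cases * claimed_years

# How many times larger the claim is than the charted figure.
overstatement = claimed_cases / expected_cases

# Annual cases per million residents, a scale on which the risk is easier to judge.
cases_per_million = annual_cases / population * 1_000_000

print(f"expected over {claimed_years} years: {expected_cases} cases")
print(f"claim is about {overstatement:.0f} times too high")
print(f"annual cases per million residents: {cases_per_million:.1f}")
```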
A slight downward trend is evident in the last 20 years or so; this record is even more impressive when we realize the population grew during this period. These points can be made clearer in multivariate plots.
This graphic appeared on the front page of the British paper, the Independent. I find it to be effective, although definitely not efficient a la Tufte: the data-ink ratio is abysmal. Two data points on the entire page, with both data labels drawn in extra large font!
It can be improved if the 24 guys are given a different color so we can see the amount of improvement between 1971 and "NOW".
Some may complain that the use of percentages obscured population growth during this period. Perhaps there should be fewer men on the left than on the right. Unfortunately, that would in turn obscure the comparison of percentages.
A bit of research into the data (at Cancer Research UK) reveals that the average survival rate hides a very wide range of rates (by type of cancer, by gender, by gender and type, etc.). One might argue that the average is quite meaningless for most users.
An alternative construct is a time series chart showing the increase in survival rate over time. It would plot more data and depict a trend (or lack thereof). I'd have to agree with the editor that such a chart would look unattractive on the newsstand.
I've been reading my friend's anti-smoking tome, and traced this "infographic" back to its source (World Health Organization).
I was very intrigued by the "lines of death", which seemed to make the point that the risk of death had a spatial correlation: specifically, that the death risk for male smokers was higher in the northern hemisphere (above the line), primarily developed countries, as compared to the southern hemisphere, mostly developing nations.
I found that somewhat counter-intuitive, but in a fascinating book like this, which brings together scientific, psychological and societal commentary, I was expecting to learn new things.
Looking at the legend, the red areas were regions in which deaths from tobacco use accounted for over 25% of "total deaths among men and women over 35". This explained some of it, as perhaps there were more reasons to die (warfare, other diseases, mine accidents, etc.) in developing nations than in developed nations, or perhaps they had larger populations (so more deaths even at lower rates).
However, the description of the "lines of death" raised my eyebrows. It is now claimed that more than 25% of middle-aged people (35-69 years old) die from tobacco use in the red regions.
Did they mean 25% of the dead middle-aged people die from smoking? Or 25% of all middle-aged folks die from smoking? A gigantic difference!
Percentages are very tricky things to use. Every time I see a percentage, the first thing I ask is what the base population is. Here, the base appeared to have gotten lost in translation.
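To see how gigantic the difference is, consider a made-up cohort. Every number below is hypothetical and chosen only to make the arithmetic easy; none of it comes from the WHO maps.

```python
# HYPOTHETICAL cohort: all three counts are invented for illustration.
middle_aged_people = 1_000    # everyone aged 35-69 in some region
deaths_all_causes = 100       # members of the cohort who die during the period
deaths_from_tobacco = 25      # deaths attributed to tobacco use

# Base = people who died (the legend's reading).
share_of_deaths = deaths_from_tobacco / deaths_all_causes

# Base = all middle-aged people (the reading the description suggests).
share_of_people = deaths_from_tobacco / middle_aged_people

print(f"{share_of_deaths:.1%} of middle-aged deaths are due to tobacco")
print(f"{share_of_people:.1%} of all middle-aged people die from tobacco")
```

The numerator is identical in both cases. Only the base changes, and with it the claim shrinks tenfold in this example.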
This set of maps also shows the peril of focusing too much on entertainment value, and losing the plot.
For those concerned about the effect of smoking on our society and our children, I highly recommend Dr. Rabinoff's very readable new book, "Ending the Tobacco Holocaust". It contains lots of interesting tidbits and brings together every cogent argument that exists, including the common ones you've heard and others you haven't.
Behind the smokescreen lies the informative conclusion: among households with smokers, about 40% smoke in residence all the time while about half never smoke in residence.
This graphic, unfortunately chosen, contains many distractions from the main message, including:
the liberal sprinkling of colors
the inclusion of data for 1, 2, 3, 4, 5, 6 days, almost all of which were effectively zero
the redundant vertical scale, as all the data already appeared on the chart itself
the comparison of smokers to "total sample" (rather than non-smokers)
The last point merits special attention. The total sample contains households with smokers as well as households without smokers. Any data from the total sample is a weighted average of these two types of households. It is better to directly compare the two household types than to indirectly compare one type to the overall.
Further, households without smokers should be extremely likely to have no smoking in residence all week. And if most households have no smokers (76% of this sample), then the statistics of the total sample will mimic those of no-smoker households. That is to say, the total sample statistics do not add much to the analysis. Our junkart version below corrects for this as well as other things.
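The weighted-average point can be checked with a short sketch. The 24%/76% split and the "about half" figure for smoker households come from the discussion above; the 98% figure for no-smoker households is purely my assumption, since that number is exactly what the original graphic fails to show.

```python
def total_sample_share(smoker_share, nonsmoker_share, smoker_weight):
    """Share in the total sample as a weighted average of the two household types."""
    return smoker_weight * smoker_share + (1.0 - smoker_weight) * nonsmoker_share

smoker_weight = 0.24                   # 76% of the sample has no smokers
never_in_smoker_households = 0.50      # "about half" never smoke in residence
never_in_nonsmoker_households = 0.98   # HYPOTHETICAL: not given in the source

never_in_total_sample = total_sample_share(
    never_in_smoker_households, never_in_nonsmoker_households, smoker_weight
)

print(f"smoker households:    {never_in_smoker_households:.1%}")
print(f"no-smoker households: {never_in_nonsmoker_households:.1%}")
print(f"total sample:         {never_in_total_sample:.1%}")
```

Under these assumptions the total-sample figure sits much closer to the no-smoker households than to the smoker households, which is why comparing smokers to the total sample understates the contrast between the two groups.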
One of the key functions of a graph is data reduction, i.e. to aggregate data in such a way as to expose the information contained within. Typically, a graph that uses aggregated data is clearer and stronger than one that plots every piece of data. In this example, by combining 1-6 days into a single category ("smokes in residence part of the week"), we have a graph that is much more readable.
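As a rough illustration of this kind of data reduction, the sketch below collapses an eight-category distribution (0 through 7 days) into three. The endpoints roughly match the figures quoted earlier (about half never, about 40% every day); the small values for 1 to 6 days are hypothetical placeholders, since the original numbers are not reproduced here.

```python
def collapse_days(share_by_days):
    """Collapse shares keyed by days-per-week (0..7) into three categories."""
    return {
        "never": share_by_days[0],
        "part of the week": sum(share_by_days[d] for d in range(1, 7)),
        "all week": share_by_days[7],
    }

# Percent of smoker households by number of days with smoking in residence.
# Values for days 1-6 are HYPOTHETICAL; days 0 and 7 follow the text's rough figures.
share_by_days = {0: 50, 1: 2, 2: 2, 3: 1, 4: 1, 5: 2, 6: 2, 7: 40}

collapsed = collapse_days(share_by_days)
for category, share in collapsed.items():
    print(f"{category}: {share}%")
```

Three bars instead of eight, and nothing the reader cares about is lost.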
I want to thank Dr. Mike Rabinoff for inspiring me to look up these second-hand smoking statistics. Mike recently published a book called "Ending the Tobacco Holocaust", which tells you more than you want to know about the tobacco industry.
The recent coverage of obesity in the US media produced at least two very good data maps. The New York Times printed this snapshot of the nation in 2004.
Because of a judiciously chosen color scheme, we can easily discern the pattern of obesity: more severe between the Lakes and the Gulf; least in the West and Northeast, especially in Colorado; quite bad in the middle and the South.
The legend is deserving of much praise: in defiance of popular but simplistic usage, the data were not split into four equal groups (quartiles); rather, the designer selected four unequal parts so as to reveal the geographical pattern on the map. Besides, the complete range of the data was shown as is, where most would have artificially widened the range to 15.0% on one end and 30.0% on the other.
All in all, this is a simple graphic conveying a clear message. Well done.
And yet -- the dynamic aspect of obesity growth alarms even more:
Within many states, more and more people are becoming obese
Nationally, more and more states have high obesity rates
These trends, along with others, are perfectly captured by the following terrific, dynamic data map, thanks to the CDC. It is a wonderful example of how the electronic medium (animated GIF) can do wonders for the graphics designer. [You may need to click on the map to see the animation in a pop-up window.]
The time dimension is experienced rather than drawn on paper or screen. This experience is in fact distorted, since what we feel is compressed time, but the distortion improves rather than deters our ability to see trends.
The states between the Lakes and the Gulf led the nation throughout this period.
The ever expanding legend ingeniously draws attention to the fact that the worst states have gotten worse over time.
No single state has been spared: by 2001, only Colorado had an obesity rate below 15%; just 7 years earlier, in 1994, the entire Western half of the U.S. had obesity rates below 15%.
One small gripe: a reader skimming the map can be forgiven for thinking that "white" indicates 0% obesity. Not so! "White" actually means "no data". I'd prefer to use a neutral color for "no data"; when they started tracking, these states turned out to be no less obese than others. By 1994, every state had started tracking obesity.
This dynamic map is really rich in information. Feel free to leave comments about what else strikes you about it.
Junk artists have no shortage of raw materials; those living in NYC will understand what I mean. So today I saw this graphic in the Economist. It supports an article that describes the "progress and problems" with WHO's 3x5 campaign to fight AIDS in poorer countries.
In my junkchart version, I have switched the emphasis from the absolute number of people in need to the percentage of those who have (or have not) received ARV therapy.
I dislike the common practice of the cut-off (see the sub-Sahara bar): our brain just isn't capable of extrapolating and understanding how far the bar would have stretched off the page.
The grid-lines are avoided by providing data labels.
As usual, the sub-title gives the main point of the graphic, and anything minor (namely, the date) is put to the periphery.
Reference: "Moving Targets", Economist, June 30 2005