Oct 24, 2007

Light entertainment

Christopher P submitted this chart, which is great for our light entertainment series.
Dutchdocs
Apparently it came from the Netherlands and showed how privileged their citizens are compared to the rest of the world.  It would appear that they need to reverse the color scheme (and font size?) to highlight the privileged.  Comments welcome.

Source: AdsoftheWorld.com

Sep 27, 2007

A challenge

The Gelman blog has issued a challenge on how to present the following Venn diagram in a more comprehensible way.  This one is pretty tough.
Gelmanvenn

Antony Unwin sent in this entry:

Unwinvenn_2
Do you have other ideas?







Sep 04, 2007

Read fast, pay the price

At first, this looks like a decent chart despite the donut construct, which I cannot stand (but the Economist loves).

Rockstars

The accompanying text proclaimed: "Rock stars are famous for excess, and some pay the price".  The rest of the paragraph points out drug- and alcohol-related deaths, plus deaths due to "unhealthy lifestyles", which apparently include cancer and cardiovascular disease.

There is a gaping hole between what's on the chart and what's in the text.  They just talk past each other.

  • The chart invites us to compare the European experience to the American experience. Each donut presents the proportion of total deaths by causes of death. The top donut presents American rock-star deaths, the bottom European ones. But this comparison has zilch to do with the key point, which is how rock stars are different from the rest of us.  The chart tells us nothing about the rest of us.  The 20% death by cancer would be entirely unremarkable if 20% of non-rock-star deaths also were attributed to cancer!
  • We must also bear in mind that the base populations are rock stars who died young. This is a very specific demographic segment, and so the only valid point of reference is people who died young.  If we think along those lines, then among unmusical people who died young, what might have been the causes of death?  Drugs? Alcohol?  Accidents?  Suicide?  You bet.  I am not sure who the authoritative source of such data is, but the CDC reported that among Americans aged 15-34 who died, the leading causes were "unintentional injury", suicides, homicides, cancer and heart disease.  Not much different from the above list...
  • The deaths depicted in the two donuts totaled fewer than 100, and yet percentages are given to one decimal place.  This creates a false sense of precision not justified by the sample size.
  • The deaths occurred over about 50 years.  It is very likely that the causes of premature death have shifted during this time span, making an aggregate analysis questionable.
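
The false-precision point can be checked with a little arithmetic. The sketch below is only an illustration: the group size of 50 is a hypothetical number (the post says only that the two donuts together cover fewer than 100 deaths), and the standard error uses the usual binomial approximation.

```python
import math

def share_resolution(n):
    """Percentage points by which a share moves when one death is reclassified."""
    return 100.0 / n

def share_std_error(p, n):
    """Approximate binomial standard error, in percentage points, of a share p out of n."""
    return 100.0 * math.sqrt(p * (1.0 - p) / n)

n = 50     # hypothetical number of deaths in one donut
p = 0.20   # a 20% share, such as the cancer figure discussed above

print(f"one death moves a share by {share_resolution(n):.1f} points")
print(f"standard error of a 20% share: {share_std_error(p, n):.1f} points")
```

With a group this small, a single death shifts a share by whole percentage points, so the digit after the decimal point carries no information.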

Charting is much more than just aesthetics.  Some basic statistical common sense goes a long way.  This was observed long ago by Huff.

Source: "Rock stars: live fast, die young", Economist, Sept 4 2007.

May 31, 2007

If we report it, it's a fact

David Leonhardt wrote in the NYT of a shocking incident of statistical abuse committed by Lou Dobbs and the CNN crew.

On several recent occasions, while commenting on the red-hot immigration issue, Lou and company remarked that "there had been 7,000 cases of leprosy in this country over the previous three years, far more than in the past".  (Leprosy is a flesh-eating disease prevalent among immigrants, particularly of Asian or Latin American origin.)

Nyt_leprosy When asked about fact-checking, Lou reportedly said: "If we reported it, it's a fact."  A quick visit to the government's leprosy program web-site immediately turns up the time-series chart shown on the left.  With annual counts of about 150 cases in the last 5 years or so, one is hard pressed to find the 7,000 alleged cases!

Furthermore, because the chart shows raw counts with no basis for comparison, we fail to see that 150 cases out of a population of 300 million represent a minuscule risk.

A slight downward trend is evident in the last 20 years or so; this record is even more impressive when we realize the population grew during this period.  These points can be made clearer in multivariate plots.
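
A back-of-envelope check makes both points concrete. This is a rough sketch using only the round numbers quoted above (about 150 cases a year, a population of about 300 million, and the claimed 7,000 cases in three years); the variable names are mine.

```python
def cases_per_100k(cases, population):
    """Annual cases per 100,000 residents."""
    return cases / population * 100_000

annual_cases = 150                # approximate annual count read off the chart
population = 300_000_000          # approximate U.S. population
claimed_three_year_cases = 7_000  # the figure reported on air

rate = cases_per_100k(annual_cases, population)
expected_three_year_cases = 3 * annual_cases
overstatement = claimed_three_year_cases / expected_three_year_cases

print(f"{rate:.2f} cases per 100,000 per year")
print(f"{expected_three_year_cases} cases expected over three years")
print(f"the claim is roughly {overstatement:.1f} times that")
```

At these round numbers the annual rate is about 0.05 per 100,000, and three years of cases come to about 450, not 7,000.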

Source: "Truth, Fiction and Lou Dobbs", New York Times, May 30, 2007; U.S. National Hansen's Disease web-site.

 

May 17, 2007

People picture

Ind_cancersurvival This graphic appeared on the front page of the British paper, the Independent.  I find it to be effective, although definitely not efficient a la Tufte: the data-ink ratio is abysmal.  Two data points on the entire page, with both data labels drawn in extra large font!

It can be improved if the 24 guys are given a different color so we can see the amount of improvement between 1971 and "NOW".

Some may complain that the use of percentages obscured population growth during this period.  Perhaps there should be fewer men on the left than on the right.  Unfortunately, that would in turn obscure the comparison of percentages.

A bit of research into the data (at Cancer Research UK) reveals that the average survival rate hides a very wide range of rates (by type of cancer, by gender, by gender and type, etc.).  One might argue that the average is quite meaningless for most users.

An alternative construct is a time series chart showing the increase in survival rate over time.  It would plot more data and depict a trend (or lack thereof).  I'd have to agree with the editor that such a chart would look unattractive on the newsstand.

Source: "Cancer: the good news", The Independent, May 16, 2007


Mar 12, 2007

Lines of death

I've been reading my friend's anti-smoking tome, and traced this "infographic" back to its source (World Health Organization). 

Who_tobacco I was very intrigued by the "lines of death", which seemed to make the point that the risk of death had a spatial correlation: specifically, that the death risk for male smokers was higher in the northern hemisphere (above the line), primarily developed countries, than in the southern hemisphere, mostly developing nations.

I find that somewhat counter-intuitive, but in a fascinating book like this, which brings together scientific, psychological and societal commentary, I was expecting to learn new things.

Looking at the legend, the red areas were regions in which deaths from tobacco use accounted for over 25% of "total deaths among men and women over 35".  This explained some of it, as perhaps there were more competing causes of death (warfare, other diseases, mine accidents, etc.) in developing nations than in developed nations, or perhaps they had larger populations (so more deaths even at lower rates).

Who_tobacco2 However, the description of the "lines of death" raised my eyebrows.  It is now claimed that more than 25% of middle-aged people (35-69 years old) die from tobacco use in the red regions. 

Did they mean that 25% of the middle-aged people who die, die from smoking?  Or that 25% of all middle-aged people die from smoking?  A gigantic difference!

Percentages are very tricky things to use.  Every time I see a percentage, the first thing I ask is what is the base population.  Here, the baseline appeared to have gotten lost in translation.
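
To see how much the base matters, here is a small sketch with made-up numbers. The population of one million and the 10,000 deaths are hypothetical, chosen only to show the size of the gap; neither comes from the WHO atlas.

```python
def count_from_share(share, base):
    """Number of people implied by applying a percentage to a given base."""
    return share * base

population = 1_000_000   # hypothetical number of middle-aged people in a region
deaths = 10_000          # hypothetical number of them who die in the period
share = 0.25             # the 25% quoted in the legend

tobacco_deaths_if_base_is_deaths = count_from_share(share, deaths)
tobacco_deaths_if_base_is_population = count_from_share(share, population)

print(tobacco_deaths_if_base_is_deaths)      # 25% of those who died
print(tobacco_deaths_if_base_is_population)  # 25% of everyone in the age group
```

Under these assumptions the two readings differ by a factor of 100, which is simply the ratio of the two base populations.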

This set of maps also shows the peril of focusing too much on  entertainment value, and losing the plot. 

For those concerned about the effect of smoking on our society and our children, I strongly recommend Dr. Rabinoff's highly readable new book, "Ending the tobacco holocaust".  It contains lots of interesting tidbits and brings together every cogent argument that exists, including the common ones you've heard and others you haven't.

Reference: "Ending the tobacco holocaust" by Michael Rabinoff; The Tobacco Atlas by the World Health Organization

Dec 01, 2006

Smoking-Screening

Smokeathome2

Behind the smokescreen lies the informative conclusion: among households with smokers, about 40% smoke in residence all the time while about half never smoke in residence.

This graphic, unfortunately chosen, contains many distractions from the main message, including:

  • the liberal sprinkling of colors
  • the inclusion of data for 1, 2, 3, 4, 5, 6 days, almost all of which were effectively zero
  • the redundant vertical scale, as all the data already appeared on the chart itself
  • the comparison of smokers to "total sample" (rather than non-smokers)
     

The last point merits special attention.  The total sample contains households with smokers as well as households without smokers. Any data from the total sample is a weighted average of these two types of households.  It is better to directly compare the two household types than to indirectly compare one type to the overall.

Further, households without smokers should be extremely likely to have no smoking in residence all week.  And if most households have no smokers (76% of this sample), then the statistics of the total sample will mimic those of no-smoker households. That is to say, the total sample statistics do not add much to the analysis.  Our junkart version below corrects for this as well as other things.
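
The weighted-average argument can be sketched in a few lines. The 24% smoker share follows from the 76% figure above and the 50% comes from the "about half" finding; the 98% for households without smokers is a hypothetical value I made up for illustration, not a number from the survey.

```python
def pooled_share(share_a, value_a, value_b):
    """Weighted average of two groups, where group A is share_a of the sample."""
    return share_a * value_a + (1.0 - share_a) * value_b

smoker_share = 0.24         # 1 - 0.76, households with smokers
never_in_smoker_hh = 0.50   # "about half" never smoke in residence
never_in_other_hh = 0.98    # hypothetical value for households without smokers

total = pooled_share(smoker_share, never_in_smoker_hh, never_in_other_hh)

print(f"total sample: {total:.1%} never smoke in residence")
print(f"gap to no-smoker households: {never_in_other_hh - total:.1%}")
print(f"gap to smoker households:    {total - never_in_smoker_hh:.1%}")
```

Because three quarters of the weight sits on the no-smoker households, the total lands far closer to their figure than to the smokers' figure, which is why comparing smokers to the total understates the contrast.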

Redo_smokeathome One of the key functions of a graph is data reduction, i.e. to aggregate data in such a way as to expose the information contained within.  Typically, a graph that uses aggregated data is clearer and stronger than one that plots every piece of data.  In this example, by combining 1-6 days into a single category ("smokes in residence part of the week"), we have a graph that is much more readable.
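
As a minimal sketch of this kind of data reduction, the code below collapses an eight-category distribution (0 through 7 days) into three groups. The percentages are invented for illustration and only loosely echo the pattern described above; they are not the survey's figures.

```python
def collapse_days(pct_by_days):
    """Collapse percentages keyed by days per week (0-7) into three groups."""
    groups = {"never": 0, "part of the week": 0, "all week": 0}
    for days, pct in pct_by_days.items():
        if days == 0:
            groups["never"] += pct
        elif days == 7:
            groups["all week"] += pct
        else:
            groups["part of the week"] += pct
    return groups

# Hypothetical percentages of smoker households by days of smoking in residence
pct_by_days = {0: 50, 1: 2, 2: 2, 3: 1, 4: 1, 5: 2, 6: 2, 7: 40}

reduced = collapse_days(pct_by_days)
for label, pct in reduced.items():
    print(f"{label}: {pct}%")
```

Three bars carry the same message as eight, and the six near-empty categories no longer compete for attention.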

I want to thank Dr. Mike Rabinoff for inspiring me to look up these second-hand smoking statistics.  Mike recently published a book called "Ending the Tobacco Holocaust", which tells you more than you want to know about the tobacco industry.


Reference: "Second Hand Smoke Survey: Final Report", Madison Department of Public Health, Dec 2003.

Aug 25, 2005

Obesity bad, maps good

The recent coverage of obesity in the US media produced at least two very good data maps.  The New York Times printed this snapshot of the nation in 2004.

24obese_graphic_lg

Because of a judiciously chosen color scheme, we can easily discern the pattern of obesity: more severe between the Lakes and the Gulf; least in the West and Northeast, especially in Colorado; quite bad in the middle and the South. Nyt_obesity_legend_1

The legend is deserving of much praise: in defiance of popular but simplistic usage, the range was not divided into four equal parts; rather, the designer selected four unequal parts so as to reveal the geographical pattern on the map. Besides, the complete range of the data was shown as is, where most would have artificially widened the range to 15.0% on one end and 30.0% on the other.

All in all, this is a simple graphic conveying a clear message.  Well done.

And yet -- the dynamic aspect of obesity growth alarms even more:

  • Within many states, more and more people are becoming obese
  • Nationally, more and more states have high obesity rates

These trends, along with others, are perfectly captured by the following terrific, dynamic data map, thanks to CDC.  It is a wonderful example of how the electronic medium (animated gif) can do wonders for the graphics designer. [You may need to click on the map to see the animation in a pop-up window.]

Cdc_obesity_slides
The time dimension is experienced rather than drawn on paper or screen.  The experience is in fact distorted, since what we feel is compressed time, but the distortion improves rather than deters our ability to see trends.

  • The states between the Lakes and the Gulf led the nation throughout this period.
  • The ever expanding legend ingeniously draws attention to the fact that the worst states have gotten worse over time.
  • No single state has been spared: by 2001, only Colorado had an obesity rate below 15%; just 7 years earlier, in 1994, the entire Western half of the U.S. had obesity rates below 15%.


One small gripe: if read quickly, the reader can be forgiven for thinking that "white" indicates 0% obesity.  Not so!  "White" actually means "no data".  I'd prefer to use a neutral color for "no data"; when they started tracking, these states turned out to be no less obese than others.  By 1994, every state had started tracking obesity.

This dynamic map is really rich in information.  Feel free to leave comments about what else strikes you about it.

Reference: "Obesity Rate Is Nearly 25%, Group Said", New York Times, August 24, 2005; CDC Obesity Trends.

Jul 20, 2005

The right emphasis

Junk artists have no shortage of raw materials; those living in NYC will understand what I mean.  So today I saw this graphic in the Economist. Redoeconomistaids
It supports an article that describes the "progress and problems" with WHO's 3x5 campaign to fight AIDS in poorer countries.

In my junkchart version, I have switched the emphasis from the absolute number of people in need to the percentage of those who have (or have not) received ARV therapy.

I dislike the common practice of the cut-off (see the sub-Sahara bar): our brain just isn't capable of extrapolating and understanding how far the bar would have stretched off the page.

Data labels are provided, which makes the grid-lines unnecessary.

As usual, the sub-title gives the main point of the graphic, and anything minor (namely, the date) is put to the periphery.

Reference: "Moving Targets", Economist, June 30 2005
