


I sense a lurking variable here...


Agreed -- these are insanely effective.


Tyson: I made no comments about causality. However, I don't share your "sense". What exactly are you speaking of? If you have a variable in mind, you can check the correlation without much trouble.
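
A minimal sketch of that check, assuming you have one value of the candidate variable per country. The numbers below are placeholders for illustration only, not OECD figures, and `pearson_r` is a hypothetical helper name.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Placeholder values for illustration only; NOT OECD figures.
candidate_variable = [1.0, 2.0, 3.0, 4.0, 5.0]
life_expectancy = [78.0, 79.0, 79.5, 80.5, 81.0]

r = pearson_r(candidate_variable, life_expectancy)
print(f"r = {r:.3f}")
```

A strong correlation here would only flag the variable as worth a closer look; it would not by itself show that the variable explains the pattern in the chart.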


"Let the data speak for itself." Except that the graph itself gives no indication of whether or not the 19 other rich countries represent many, most, all, or a few cherry-picked rich countries.

"Recognize what's important, what's not." Apparently, anything other than plain-vanilla length-of-life is not -- not anything that might be contained in data related to child mortality, not health care dollars relative to median household income -- nothing but dollars and years.

"Rather than clutter up the chart, the other 19 lines are anonymized." See above. The flipside of 'anonymized' is 'nondescript to the point of being immune to follow-up research.'

"The axis labeling is superb." This is, frankly, false. Spending per capita per *what*? Not per lifespan -- being born costs > $7K in pretty much every 'rich' country. As for the x-axis, I am not exaggerating when I say that I have no idea what it represents. I'm not playing obtuse. I don't know what that 2,000 is.

Look -- I get it, and I agree: end-of-life spending on health care in the US has no relation to a return in lifespan. Or maybe what I agree with in these charts is that yearly spending in the US doesn't result in higher life expectancy. The point, whatever it is, is well-taken: US health spending is stupid, compared to what we get for it.

But these are *terrible* charts. They're bludgeons, built to serve an agenda other than providing transparency into the data on which they rest.

Are those *all* rich countries? Are there rates attached to the movement of the lines in the second chart? 'Cause that matters, and it's completely invisible here. Also, are the $ over time adjusted by floating, point-in-time exchange rates, or, as the note seems to imply, just a conversion to $ based on, I guess, now? That also totally matters -- the variance in slope might be nothing more than a vestige of devaluation of the dollar.

How were the 19 other countries chosen? What's rich? Are there others that were left out? Why? What happens to those life-expectancy lines if we control for things that aren't related to health-care expenditure?

But mostly, these charts are the END of a data discussion, not the BEGINNING of one, and I stand firmly on the side of 'charts don't make facts, the World makes facts'. Charts are no substitute for data or its analysis, and these pictures don't help me decide what to look at next. And because of that, they're infographics at best and propaganda at worst.

Sorry. Maybe I'm cranky, but I doubt it since I just had yakatori, which might have another 'i' in it, but since it's a transliteration, i think i'm safe. I just really, really don't like these graphs. I think they make their point very strongly, which would be fine if graphs were for proving points, rather than helping look for the truth.


That second plot is fantastic.

I am curious what you mean when you say "You can do this in R but that's about it." Do you mean any programming language with a decent graphics package, i.e., that you need to write code to get such a graphic? I lament this too. But if you really do mean that you need R to create this graphic, I can think of a number of languages with graphics support for such visualizations (Python, IDL, MATLAB, PV-WAVE, IGOR, not to mention more than a few JavaScript charting libraries...)


You can do it in Excel, but you have to fiddle with it. Effectively, you have to treat the scales as graph series in their own right; which is not a bad philosophy, but not one that Excel normally encourages. Excel by default treats scales as a support element to graphs, not graphs in themselves.

The effort you have to go to in Excel might be considered "programming" Excel to do this, which needn't be more onerous than the programming you have to do in R to achieve the same result.

The unnecessary physical contact of the two orthogonal scales in the graphs above makes me think of Cleveland, not Tufte. Tufte would have them separated by a gap.


You can do unevenly spaced axes pretty easily in matplotlib.
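
A minimal sketch of one reading of that claim: tick marks placed at chosen data values instead of at evenly spaced intervals, with the axis lines limited to the data range. The numbers are placeholders for illustration only, not OECD figures, and the choice of which values to label is an assumption.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt

# Placeholder values for illustration only; NOT OECD figures.
spending = [2000, 2600, 3100, 3400, 7000]
life_expectancy = [78.2, 79.5, 80.1, 81.0, 77.9]

fig, ax = plt.subplots()
ax.scatter(spending, life_expectancy)

# Put ticks at selected data values, so their spacing follows the data
x_ticks = [min(spending), 3100, max(spending)]
y_ticks = [min(life_expectancy), 80.1, max(life_expectancy)]
ax.set_xticks(x_ticks)
ax.set_yticks(y_ticks)

# Hide the top and right spines, and draw the remaining two
# only across the range actually covered by the data
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.spines["bottom"].set_bounds(min(spending), max(spending))
ax.spines["left"].set_bounds(min(life_expectancy), max(life_expectancy))

ax.set_xlabel("Health spending per capita (placeholder values)")
ax.set_ylabel("Life expectancy in years (placeholder values)")

# Render to an in-memory PNG; pass a filename instead to write a file
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

Because the two bounded spines stop at the data extremes, they no longer meet at the corner of the plot.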


How did they pick the other "rich countries"? Just to make the US look bad? It seems unlikely that one country would be so far off from 19 others that are so close together.


Henry: One hopes you held the yakatori down; it's unfortunate that you had to look at these charts while eating your dinner.

"But these are *terrible* charts. They're bludgeons, built to serve an agenda other than providing transparency into the data on which they rest."
I'm sorry to have to disclose that there are no "objective" charts just like there is no "objective" journalism. A chart represents data filtered through the designer's point of view. Every chart has a designer so every chart is subjective, even the ones you make yourself. In fact, if you read my blog regularly, you will notice that I consider having a point of view to be one of the most crucial elements of any chart.

"The flipside of 'anonymized' is 'nondescript to the point of being immune to follow-up research.'"
The data set behind these charts is accessible. You can do follow-up research if you're up to it.

"Except that the graph itself gives no indication of whether or not the 19 other rich countries represent many, most, all, or a few cherry-picked rich countries."
He did cite OECD as the data source. Besides, the names of the countries can be read directly off the scatter plot. If you (or Hmm for that matter) disagree with the selection, tell us which countries you don't consider "rich" and which "rich" country should have been included.

"these pictures don't help me decide what to look at next."
Use your imagination.


Josh, Sloan: Yes, any programming language like R should be able to produce customized axes. When I wrote that, I was thinking of graphing software for the mass market. It would be interesting to compare the level of effort in R, Matlab, Python, etc.


@Henry & Hmm: Lazy trolling... Google "OECD health" and you'll find the datasets in 2 minutes (OK, I'll help you: http://stats.oecd.org/index.aspx?DataSetCode=HEALTH_STAT#). To be fair, there are 34 countries with complete data in 2007 (including Mexico, Chile, Turkey...). So I was curious too, and I've replicated the first plot myself (R + ggplot2, 3 lines of code, mostly cut & paste from here: http://had.co.nz/ggplot2/geom_text.html)... And guess what? The cloud looks exactly the same, with the US as an unbelievable outlier. I tried the "expenditure per capita in US$ purchasing power parity" measure, but there are many export options (% of GDP...) on the OECD website, so you can try to hide this disturbing, stubborn pattern with 'devaluation of the dollar'... or whatever.


What's a good rule of thumb on when the axes should start at 0 vs when they should start at some "reasonable" value like the Y axis does here (starting at 77)?
It seems that sometimes the choice of lower bound on the Y axis is used to exaggerate the differences in data? I guess it's especially offensive when used when the Y axis is a percentage, but are there other good guidelines?
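
One way to put a number on that worry, as a rough sketch rather than a settled rule: compare the ratio of two values as drawn above the axis baseline with their actual ratio. This mainly matters for bars and areas, where the drawn length encodes the value; for dots and lines, where position encodes the value, a non-zero baseline is generally considered less of a problem. The values are placeholders for illustration only, and `drawn_ratio` is a hypothetical helper name.

```python
def drawn_ratio(a, b, baseline=0.0):
    """Ratio of the drawn heights of b and a when the axis starts at baseline."""
    if a <= baseline or b <= baseline:
        raise ValueError("both values must lie above the baseline")
    return (b - baseline) / (a - baseline)


# Placeholder life expectancies for illustration only; NOT OECD figures.
low, high = 78.0, 81.0

true_ratio = drawn_ratio(low, high, baseline=0.0)        # axis starts at 0
truncated_ratio = drawn_ratio(low, high, baseline=77.0)  # axis starts at 77

print(f"zero baseline: {true_ratio:.3f}x, baseline at 77: {truncated_ratio:.1f}x")
```

With these placeholder values, a bar drawn from a baseline of 77 makes a difference of about 4% look like a fourfold difference.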


Somebody's got to say it... there are 19 "other countries" besides the U.S., making the line for the U.S. a one-in-twenty outlier. One in twenty: why does that sound eerily familiar? Hmm wonders how they picked the other 19, but in reality such cherry-picking wouldn't even be necessary. In honesty, it's very unlikely that this correlation is spurious, but the statistical power is at least worth noting (and I haven't read the original article, in which it probably is).

Steve T

I think the chart is 'effective' in the sense that it communicates something clearly, but in reality it just raises more questions than it answers. What other factors contribute to life expectancy? How far ahead of other countries is the US in medical research, and how does that impact the cost of our health care? And so on. Chart authors have to be wary of becoming merely persuasive without connection to the truth.


If you say Organisation for Economic Co-operation and Development, people say "what's that?" If you say "other rich countries" they say "how did you pick them?" If you say by picking the OECD countries they say "what's OECD?". At some point you have to conclude the push-back is for convenience, not a genuine request for clarification, and challenge the pushers-back to cite a single country that has a health expenditure and life expectancy that is in the ballpark of the US, and a name everyone recognises as a fair example of a rich country.

If they say there's cherry-picking, they should point to the unpicked cherry.


Perhaps tyson's lurking variable is funding source? I can imagine that if these data sets were split into private and public spending, you might get some even more interesting results.



Kaiser Fung. Business analytics and data visualization expert. Author and Speaker.