
Seth's Rules

(Via Gelman blog)

Prominent marketer Seth Godin came up with some sensible rules for making "graphs that work".  We pretty much agree with most of what he says here, unlike the last time he talked about charting.

One must recognize that he has a very specific type of chart in mind, the purpose of which is to facilitate business decisions.  And not surprisingly, he advocates simple, predictable story-telling.

His first rule: dispense with Excel and PowerPoint.  Agreed, but to our dismay, there are not many alternatives out there that sit on corporate computers.  So we need a corollary: assume that Excel will unerringly pick the wrong option, whether it is the gridlines, axis labels, font sizes, colors, etc.  Spend the time to edit every single aspect of the chart!

His second rule: never show a chart for exploration or one that says nothing.  I used to call these charts that murmur but do not opine.  (See here, for example.)  This pretty much condemns the entire class of infographics as graphs that don't work.  This statement will surely drive some mad.  One of the challenges facing infographics is to bridge the gap between exploration and enlightenment, between research and insight.  As I have said repeatedly, I value the immense amount of effort taken to impose structure and clarity on massive volumes of data -- but more is needed for these to jump out of the research lab.

In rules 3 and 4, Seth apparently makes a distinction between rules made to be followed and rules made to be broken.  In his view, time going left to right belongs to the former while not using circles belongs to the latter.  He gave a good example of why pictures of white teeth are preferred to pie charts, bravo.  I hope all those marketers are listening.

As readers know, I cannot agree with "don't connect unrelated events".  He's talking about using line charts only for continuous data.  This rule condemns the whole class of profile plots, including interaction charts in which statisticians routinely connect average values across discrete groupings.  The same rule has created the menace of grouped bar charts used almost exclusively to illustrate market research results (dozens to hundreds of pages of these for each study).  I'd file this under rules made to be broken!

What menace?

[Figure: Menace1]

What menace?

[Figure: Menace2]

What menace?

[Figure: Menace3]

What menace?

Alright, I made my point.  If you don't work in market research, the mother lode of cross-tabs and grouped bars, consider yourself lucky.  If you do, will you start making line charts please?
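
For the record, here is a minimal sketch of the kind of profile (line) chart I mean, in Python with matplotlib.  The segments and percentages are made up purely for illustration, not taken from any actual study:

```python
import matplotlib.pyplot as plt

# Hypothetical cross-tab: % agreeing with a statement, by segment and year
# (made-up numbers for illustration only)
segments = ["18-24", "25-34", "35-44", "45+"]
pct_2007 = [62, 55, 48, 40]
pct_2008 = [58, 54, 50, 45]
pct_2009 = [51, 52, 53, 49]

# Instead of three clusters of bars per segment, connect each segment's
# values across the waves so the trend within each segment is visible
years = [2007, 2008, 2009]
for seg, series in zip(segments, zip(pct_2007, pct_2008, pct_2009)):
    plt.plot(years, series, marker="o", label=seg)

plt.xticks(years)
plt.ylabel("% agreeing")
plt.legend(title="Segment")
plt.title("Profile chart in place of grouped bars")
plt.show()
```

One line per discrete grouping, waves on the horizontal axis: the eye follows each segment's trajectory instead of hunting for it across clusters of bars.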






Of placebos and straw men

Note: This post is purely about statistics, and it runs long, as I try to discuss somewhat technical issues.

(Via Social Sciences Statistics blog.)

This article in Wired (Aug 24, 2009) is a must-read.  It presents current research on the "placebo effect", that is, the observation that some patients show improvement if they believe they are being treated (say, with pills) even though they have received "straw men" (say, sugar pills) that have no therapeutic value.

The article is a great piece, and a terrible piece.  It fascinated and frustrated me in equal measure.  Steve Silberman did a good job bringing up an important topic in a very accessible way.  However, I find the core arguments confused.

Let's first review the setting: in order to prove that a drug can treat a disease, pharmas are required by law to conduct "double-blind placebo-controlled randomized clinical trials".  Steve did a great job defining these: "Volunteers would be assigned randomly to receive either medicine or a sugar pill, and neither doctor nor patient would know the difference until the trial was over."  Those receiving real medicine are known as the treatment group; those receiving sugar pills are the placebo control group.  Comparing the two groups at the end of the trial allows us to establish the effect of the drug (net of the effect of believing that one is being treated).

(I have run a lot of randomized controlled tests in a business setting and so have experience interpreting such data.  I have not, however, worked in the pharma setting so if you see something awry, please comment.)

Two key themes run through the article:

1) An increasing number of promising drugs are failing to prove their effectiveness.  Pharmas suspect that this is because too many patients in the placebo control group are improving without getting the "real thing".  They have secretly combined forces to investigate this phenomenon.  The purpose of such research is "to determine which variables are responsible for the apparent rise in the placebo effect."

2) The placebo effect means that patients can get better without getting expensive medicine.  Therefore, studying it may help improve health care while lowering cost.

Theme #1 is misguided and silly, and of little value to patients.  Theme #2 is worthwhile, even overdue, and of great value to patients.  What frustrated me was that by putting these two together without sufficiently delineating them, Steve allowed Theme #1 to borrow legitimacy from Theme #2.

To understand the folly of Theme #1, consider the following stylized example:

Effect on treatment group = effect of the drug + effect of belief in being treated

Effect on placebo group = effect of belief in being treated

Thus, the difference between the two groups = effect of the drug, since the effect of belief in being treated affects both groups of patients.

Say the treatment group came in at 15 and the placebo group at 13; then the effect of the drug = 15 - 13 = 2.


A drug fails because the effect of the drug is not high enough above the placebo effect.  If you are one of the pharmas cited in this article, you describe this result by saying the placebo effect is "too high".  Every time we see "the placebo effect is too high", substitute "the effect of the drug is too low".

Consider a test of whether a fertilizer makes your plant grow taller.  If the fertilized plant is the same height as the unfertilized plant, you would say the fertilizer didn't work.  Who would conclude that the unfertilized plant is "unexpectedly tall"?  That is what the pharmas are saying, and that is what they are supposedly studying as Theme #1.  They want to know why the plant that grew on unfertilized soil was "so tall", as opposed to why the fertilizer was impotent.  (One should of course check that the soil was indeed unfertilized as advertised.)

Take the above example, where the effect on the placebo group was 13.  Say it "unexpectedly" increased by 10 units.  Since the effect on the treatment group = effect of the drug + effect of believing that one is treated, the effect on the treatment group would also go up by 10.  Because both the treatment group and the control group believe they are being treated, any increase in the placebo effect affects both groups equally and leaves the difference the same.  This is why, in randomized controlled tests, we focus on the difference in the metrics and don't worry about the individual levels.  This is elementary stuff.
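
For the skeptical, a quick simulation makes the cancellation concrete.  This is a sketch with assumed effect sizes and noise levels, not data from any actual trial:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000          # patients per arm
drug_effect = 2.0   # assumed true effect of the drug

def run_trial(placebo_effect):
    # Both arms believe they are treated, so both get the placebo effect;
    # only the treatment arm gets the drug effect on top of it
    treated = drug_effect + placebo_effect + rng.normal(0, 5, n)
    control = placebo_effect + rng.normal(0, 5, n)
    return treated.mean() - control.mean()

print(run_trial(13))  # estimated drug effect: ~2
print(run_trial(23))  # placebo effect "unexpectedly" up by 10: still ~2
```

Crank the placebo effect up as high as you like; the estimated difference between the arms stays put.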

One of their signature findings is that some cultures may produce people who tend to show high placebo effects.  The unspoken conclusion we are supposed to draw is that if these trials were conducted closer to home, the drug would have passed rather than failed.  I have already explained why this is wrong: the higher placebo effect lifts the metrics of both the treatment and the control groups, leaving the difference the same.

There is one way in which cultural difference can affect trial results: if the effect of the drug is not common to all cultures; in other words, the drug is effective for Americans (say) but not for Koreans (say).  Technically, we say there is a significant interaction effect between the treatment and cultural upbringing.  In that case, it would be wrong to run the trial in Korea and then generalize the finding to the U.S.  Note that I am talking about the effect of the drug, not the effect of believing one is being treated (which is always netted out).  To investigate this, one just needs to repeat the same trial in America; one does not need to examine why the placebo effect is "too high".
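
Extending the earlier simulation shows what an interaction looks like.  The country labels and effect sizes here are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
placebo_effect = 13.0                    # common to both arms; nets out
drug_effect = {"US": 2.0, "Korea": 0.0}  # hypothetical interaction: the
                                         # drug works in one place only

for country, effect in drug_effect.items():
    treated = effect + placebo_effect + rng.normal(0, 5, n)
    control = placebo_effect + rng.normal(0, 5, n)
    print(country, round(treated.mean() - control.mean(), 2))

# A trial run only in Korea would estimate an effect near 0, which it
# would be wrong to generalize to the US, where the assumed effect is 2
```

Note that the placebo effect is identical everywhere in this sketch; the divergence comes entirely from the drug effect differing across populations.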

I have sympathy for a different explanation, advanced for psychiatric drugs.  "Many experts are starting to wonder if what drug companies now call depression is even the same disease that the HAM-D [traditional criterion] was designed to diagnose".  The idea is that as more and more people are diagnosed as needing treatment, the average effect of the drug relative to the placebo group gets smaller and smaller.  This is absolutely possible: the marginal people being diagnosed are those with milder problems, who thus derive less value from the drug; in other words, they could more easily get better via placebo.  This too is elementary: in the business world, it is well known that if you throw discounts at loyal customers who don't need the extra incentive, all you are doing is increasing your cost without changing your sales.
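
A toy calculation shows the dilution.  Assume, purely for illustration, that the drug's benefit over placebo grows with the severity of the illness; as the diagnostic threshold is loosened, the average benefit among the diagnosed shrinks:

```python
import numpy as np

rng = np.random.default_rng(2)
severity = rng.uniform(0, 10, 100_000)  # hypothetical severity scores
benefit = 0.5 * severity                # assumed: benefit grows with severity

# Loosen the diagnostic threshold and watch the average effect shrink
for threshold in (8, 6, 4, 2):
    diagnosed = severity >= threshold
    print(threshold, round(benefit[diagnosed].mean(), 2))
```

Nothing about the drug changed between the rows; only the population being averaged over did.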

No matter how the pharmas try, the placebo effect affects both groups and will always cancel out.  Steve even recognizes this: "Beecher [who discovered the placebo effect] demonstrated that trial volunteers who got real medication were *also subject to placebo effects*."  It is too bad he didn't emphasize this point.

On the other hand, Theme #2 is great science.  We need to understand if we can harness the placebo effect.  This has the potential of improving health care while at the same time reducing its cost.  Of course, this is not so useful for pharmas, who need to sell more drugs.

I think it is no accident that the Theme #2 research cited by Steve is done in academia while the Theme #1 research is done by an impressive roster of pharmas, with the help of the NIH.

The article also tells us some quite startling facts:

- if they tell us, they have to kill us: "in typically secretive industry fashion, the existence of the project [Theme #1] itself is being kept under wraps."  Why?
- "NIH staffers are willing to talk about it [Theme #1] only anonymously, concerned about offending the companies paying for it."
- Eli Lilly has a database of published and unpublished trials, "including those that the company had kept secret because of high placebo response".  Substitute: low effect of the drug.  This is the publication bias problem.
- Italian doctor Benedetti studies "the potential of using Pavlovian conditioning to give athletes a competitive edge undetectable by anti-doping authorities".  This means "a player would receive doses of a performance-enhancing drug for weeks and then a jolt of placebo just before competition."  I hope he is on the side of the catchers, not the cheaters.
- I learned the term "nocebo effect": patients develop negative side effects because they anticipate them.

Again, highly recommended reading even though I don't agree with some of the material.  The article should have focused on Theme #2 and talked to people outside pharma about Theme #1.




Lining them up

Consider this simple and effective display accompanying the NYT article titled "Recession Drives Women Back to the Work Force".  That finding is somewhat surprising because jobs are harder to come by during a recession.

[Figure: Nyt_laborforce]


The journalist also told us that the interpretation of this data is controversial among economists.  For 25-to-34-year-old women, the trend seems to have been a decline in participation until 2004 and a rise since then.  If there is an up-trend, it appeared before the start of this recession, and the participation rate dropped after the 2001 recession.  Much is still under study.

The use of small multiples is a good idea, and the color coordination between the two charts is well thought out.  A minor miss is the vertical scale: for any small multiples chart, one should always use the same scale.  In this case, using the same scale would allow readers to compare the levels of participation between men and women.
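
In matplotlib, for instance, this is a one-argument fix.  A minimal sketch with made-up participation rates, just to show the mechanism:

```python
import matplotlib.pyplot as plt

# Made-up labor force participation rates (%), for illustration only
years = [1990, 1995, 2000, 2005, 2009]
women = [74, 75, 76, 74, 75]
men = [94, 93, 92, 91, 90]

# sharey=True forces one vertical scale on both panels, so the levels
# for men and women can be compared directly across the small multiples
fig, axes = plt.subplots(1, 2, sharey=True, figsize=(8, 3))
axes[0].plot(years, women)
axes[0].set_title("Women, 25 to 34")
axes[1].plot(years, men)
axes[1].set_title("Men, 25 to 34")
axes[0].set_ylabel("Participation rate (%)")
plt.show()
```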

The following chart plots a related data set, focusing on the difference between men and women among those who are married.  As before, the participation rates sit at very different levels: the male rate has steadily declined in recent decades while the female rate rose from the 30-percent range in the 1960s to about 70% in the 1990s.

[Figure: Redo_laborforce]

Reference: "Recession Drives Women Back to the Work Force", New York Times, September 19, 2009; Bureau of Labor Statistics.


Comment on a comment


Policy on Comments

Unlike some blogs, I do not censor comments (except for obvious spam comments, including commercials that are unrelated to the content of the posts).  Junk Charts readers have been very impressive in contributing comments that are almost always relevant, constructive and provocative.  In this regard, I am very grateful.

Because I don't censor, I typically respond only to comments that react to the contents of my posts.


J's Comment

Yesterday, J left a general comment about the entire premise of Junk Charts.  I will give a general response here, and take the opportunity to share some thoughts about the blog, which I rarely do.

First of all, those who do not see any value in my blog are welcome to tune out.  The blog has a particular point of view and that won't change.  I do not market the blog so if you are reading it, you have found your way here, and I am sorry that you have not found it interesting.


Point of View

I believe the primary purpose of charts in the mass media is to convey information in a clear manner.  I do understand that editors like to entertain readers, and have written occasionally about the sometimes conflicting objectives of clarity and beauty.  The best charts manage to attain both.  The point of view of Junk Charts is that when there is a conflict between the two, clarity comes first.  This perspective is not new; Ed Tufte has preached it for years, and I am a big fan of his work.


Blogging and I

I have a full-time job as a statistician in industry.  I work on the blog in my own free time.  A typical blog post takes about three to five hours of work: carefully studying the original chart, collecting data, testing different alternative charts, and writing up the post.  Some posts take days.

I started writing Junk Charts five years ago to connect with other people who are interested in how data can best be communicated through charts.  I have been heartened to find so many kindred spirits out there, as evidenced by the variety of commentators and the numerous submissions from readers.  Thank you!

I do not make money from this blog.  I do not serve ads.  I also do not pitch graphing software on the blog.  (Marketers please note: I am happy to write about charts created by your software that highlight the software's strengths; I just don't have time to learn your software from scratch.)  I do publish a wish list at the end of each year of books I'd love to have, and I am gratified that a few of you have liked the blog enough to contribute to my library.  Thanks again.


Why I Don't Publish Professional Charts

The point of Junk Charts is to discuss how the featured charts are conceived, identify strengths and weaknesses, and explore alternative concepts.  The alternative charts posted here contain sketches, hints, suggestions and illustration of the commentary; they are never intended to be publish-ready charts that can be dropped into any publication.  Creating charts for this blog is not my full-time job.


Do Graphics Designers Read This Blog?

You bet.  I have received a lot of favorable feedback from professionals in the graphics community.  Many of them, whose work is discussed on Junk Charts, regard the blog as a great resource and a treasure trove of ideas.  They appreciate that there are people out there who spend considerable time examining their handiwork.

Take the case of the New York Times: their graphics feature frequently on Junk Charts.  If their work were not consistently interesting, do you think I would bother writing about them?  I love the Times, and I love their commitment to printing large numbers of thought-provoking charts.  USA Today provides plenty of chartjunk, but you won't see much of it on Junk Charts.  Economist charts show up infrequently because the magazine uses only about five types of charts, which rarely inspire posts.


Why Aren't There More Positive Posts?

From the start, I intended to post about both good and bad charts.  Over time, the not-so-good charts have outnumbered the good ones.  That's right: the unfortunate state of affairs is that good, innovative charts are not in abundance.  Periodically, I ask readers to send in examples of good charts, but roughly 95% of all submissions are examples of what not to do.  I would gladly put up any good charts sent to me.


Self-Sufficiency

J's comment unfortunately garbled the very important notion of self-sufficiency.  Self-sufficiency has nothing to do with whether a chart is publish-ready.  It is the point of view that graphical elements should add to the data, not merely duplicate the data.  If every data element is printed on the chart next to each graphical construct, are the graphical elements adding anything?  Is the reader, in effect, just reading a data table?


Finally, I wish to thank all loyal readers for your continued support.


Serving donuts

David Leonhardt's article on the graduation rates of public universities caught my attention for both graphical and statistical reasons.


[Figure: Nyt_gradrate]

David gave a partial review of the new book "Crossing The Finish Line", focusing on its conclusion that public universities must improve their four-year graduation rates in order for education in the U.S. to progress.  This conclusion was arrived at through statistical analysis of detailed longitudinal data (collected since 1999).

This chart is used to illustrate the conclusion.  We will come to the graphical offering later, but first I want to fill in some details omitted from David's article by walking through how a statistician would look at this matter, and what it means to "control for" something.

The question at hand is whether public universities, especially less selective ones, have "caused" students to lag behind in graduation rate.  A first-order analysis would immediately find the overall graduation rate at less selective public universities to be lower, about 20% lower, than at more selective public universities.

A doubter appears and suggests that less selective schools are saddled with lower-ability students, and that this is the "cause" of lower graduation rates, as opposed to anything the schools actually do to students.  Not so fast: the statistician now disaggregates the data and looks at the graduation rates within subgroups of students of comparable ability (in this instance, the researchers used GPA and SAT scores as indicators of ability).  This is known as "controlling for the ability level".  The data now show that the same gap of about 20% exists at every ability level: about 20% fewer students graduate at the less selective colleges than at the more selective ones.  This eliminates the mix of abilities as a viable "cause" of lower graduation rates.
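
In code, "controlling for ability" is just a within-group comparison.  Here is a minimal sketch in Python with pandas, using made-up rates that mimic the pattern described, not the book's actual data:

```python
import pandas as pd

# Hypothetical graduation rates by ability band and school selectivity
# (made-up numbers that mimic the reported ~20-point gap)
df = pd.DataFrame({
    "ability":     ["high", "high", "mid", "mid", "low", "low"],
    "selectivity": ["more", "less"] * 3,
    "grad_rate":   [0.90, 0.70, 0.75, 0.55, 0.60, 0.40],
})

# Compare rates within each ability band, i.e. controlling for ability
gap = (df.pivot(index="ability", columns="selectivity", values="grad_rate")
         .assign(gap=lambda t: t["more"] - t["less"]))
print(gap)
# The gap persists within every band, so the mix of student ability
# cannot explain the overall difference between the school types
```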

The researchers now conclude that conditions of the schools (I think they blame the administrators) "caused" the lower graduation rates.  Note, however, that this does not preclude factors other than mix of abilities and school conditions from being the real "cause" of lower graduation rates.  But as far as this analysis goes, it sounds pretty convincing to me.

That is, if I ignore the fact that graduation rates are really artifacts of how much the administrators want to graduate students.  As the book review pointed out, the less selective colleges may want to reduce graduation rates in order to save money, since juniors and seniors are more expensive to support due to smaller class sizes and so on.  On the other hand, the most selective colleges have an incentive to maintain a near-perfect graduation rate since US News and other organizations typically use this metric in their rankings -- if you were the administrator, what would you do?  (You didn't hear it from here.)

Back to the chart, or shall we say the delivery of 16 donuts?

First, it fails the self-sufficiency principle.  If we remove the graphical bits, nothing much is lost from the chart: with or without them, the display is equally impenetrable.

A far better alternative is shown below, using a type of profile chart.

[Figure: Redo_gradrate]

Finally, I must mention that in this particular case, there is no need to draw all four lines.  Since the finding of a 20% gap essentially holds for all subgroups, no information is lost by collapsing the subgroups and reporting the average line instead (with a note explaining that the same effect holds for every subgroup).
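
A sketch of the collapsed version, again with made-up rates that mimic the reported pattern:

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up graduation rates for four ability subgroups (rows) across
# four tiers of college selectivity (columns), for illustration only
tiers = ["Most selective", "2nd tier", "3rd tier", "Least selective"]
subgroups = np.array([
    [0.90, 0.82, 0.76, 0.70],
    [0.85, 0.77, 0.71, 0.65],
    [0.80, 0.72, 0.66, 0.60],
    [0.75, 0.67, 0.61, 0.55],
])

# The four subgroup lines are essentially parallel, so one average line
# carries the story; a note can say the pattern holds in every subgroup
plt.plot(tiers, subgroups.mean(axis=0), marker="o")
plt.ylabel("Graduation rate")
plt.title("Average across ability subgroups")
plt.show()
```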

By the way, that is the difference between the statistical grapher, who is always looking to simplify the data, and the information grapher, who aims for fidelity.




Reference: "Colleges are lagging in graduation rates", New York Times, Sept 9, 2009; "Book review: (Not) Crossing the Finish Line", Inside Higher Education, Sept 9 2009.