The gulf between infographics and statistical graphics, that is.
Stan at Mashable praised "5 Amazing Infographics for the Health Conscious". They belong to the class of "pretty things" touted all over the Web, but from a statistical graphics perspective, they are dull.
Reader Mike L. poked me about the snake oil chart (right) while I was writing up this post. The snake oil chart is by David McCandless whose Twitter chart I liked quite a bit.
This one, not very much.
I continue to love his pithy text labels though; the "worth it line", truly.
The data (if verified) is pretty useful though, since there are so many health supplements out there, and as a consumer, it's impossible to know which ones are shams. (Ben Goldacre's site may help.)
Now, let's run through the low lights of the rest:
I'm still trying to figure out what plus-minus means in the Dirty Water graphic.
The fact that the four buildings are not considered one complete unit also trips me up. The Truckee Meadows is depicted as 7 buildings, not divisible by 4. In addition, if 2 short buildings + 1 tall + 1 medium = 200,000 people, how many people live in 2 tall + 1 medium + 4 short buildings?
The obesity charts are pinatas.
The cost of health care chart is boring, just a prettied up data table. Why are life expectancy statistics expressed in 2 decimal places, and not in years and months?
Why 78.11 years and not 78 years (or 78 years, 1 month)?
The scatter chart relating survival rates of people with various ailments to the survival rates of viruses/bacteria left outside our bodies is all right, but do we care about this correlation?
I hate to be so negative but I can't believe these are examples of good infographics.
My appeal for readers to send in positive examples still stands!
Economists have their misery index; dentists, it seems, have a mystery index.
Laird Harrison, senior editor at DrBicuspid.com, an online newsletter for the dental community, pointed me to this chart when he interviewed me about how to interpret the findings in the latest Quarterly Survey of Economic Confidence, conducted by the American Dental Association. (Note: you have to register to read his article. Registration is free.)
When faced with an index, the first thing to do is to find out what the reference level (here, the zero level) means. Although the report is littered with dozens of similar graphs showing all kinds of indices, I cannot find any definition of the reference level, not even in the methodology appendix. The closest is the following directive for reading the chart I printed above:
For example, [this figure] illustrates that the Net Income Index improved by approximately 10% between 3rd and 4th quarters in 2009, an increase that was driven by 6% fewer dentists responding that net income had declined, approximately 2% more dentists indicating that net income was about the same, and 5% more dentists reporting that net income had increased.
For this survey question, respondents could answer that their Net Income increased, stayed about the same or decreased, and correspondingly, these answers were scored +1, 0, -1. But we still do not know what zero means in the Net Income Index.
Fortunately, the raw data was also provided. I plotted the net score differential, essentially the difference in proportion between those who reported income increase and those who reported income decrease:
The shape of this line looks eerily familiar. But what is the zero level?
After some investigation, I found the answer. The reference level is the net score differential, averaged over the six quarters shown on the chart. In essence, the blue line from this chart, if shifted up by the average net score differential, becomes the green line from the first chart.
How would we interpret such an index? The current quarter's differential was about -40% which was 3% below the average net score differential between 2008Q3 and 2009Q4 (which was -37%).
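To make the construction concrete, here is a sketch in Python. The quarterly proportions are made up for illustration (the actual survey values are not reproduced here); the point is only the mechanics: the index is the net score differential re-centered by its six-quarter average, so the zero level is that average.

```python
# Sketch of the ADA-style index construction, using made-up numbers.
# For each quarter: differential = % reporting increase - % reporting decrease.
pct_increase = [20, 15, 10, 12, 18, 25]   # hypothetical, in percent
pct_decrease = [55, 60, 58, 55, 50, 45]   # hypothetical, in percent

differentials = [inc - dec for inc, dec in zip(pct_increase, pct_decrease)]
avg = sum(differentials) / len(differentials)

# The published index re-centers the differential by the six-quarter average,
# so the zero level is that (forward-looking) average.
index = [d - avg for d in differentials]

print(differentials)  # [-35, -45, -48, -43, -32, -20]
print(round(avg, 1))
print([round(i, 1) for i in index])
```

By construction, the index values sum to zero over the six quarters, which is another way of seeing that the reference level is baked into the very data being charted.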
This index is very problematic. The choice of the past six quarters seems completely arbitrary and ignores any seasonality effect. The use of an unweighted average to average the score differentials assumes that there are no quarterly variations in the data.
But the biggest problem surfaces if one focuses attention on, say, 2008Q3. The top chart says that the net score differential for 2008Q3 was 2% above the average differential from 2008Q3 to 2009Q4. But this is a forward-looking number because in 2008Q3, it was not yet known what the net score differentials would be in the next 5 quarters. Usually, indices are constructed using historical data to establish the reference level.
The mystery is why indexing is even needed. What's wrong with plotting the change in net score differentials?
Reference: "Quarterly Survey of Economic Confidence, Fourth Quarter 2009", American Dental Association, Jan 29 2010.
Here are some things I have been reading while I'm traveling (the posting schedule will be erratic):
Does the vaccine matter? Shannon Brownlee and Jeanne Lenzer investigate for The Atlantic. About 100 million Americans get the flu shot each year; what benefit does it confer? This is an excellent article.
Some provocative quotes:
Flu comes and goes with the seasons, and often it does not kill people directly, but rather contributes to death by making the body more susceptible to secondary infections like pneumonia or bronchitis. For this reason, researchers studying the impact of flu vaccination typically look at deaths from all causes during flu season, and compare the vaccinated and unvaccinated populations.
The estimate of 50 percent mortality reduction is based on “cohort studies,” which compare death rates in large groups, or cohorts, of people who choose to be vaccinated, against death rates in groups who don’t. But people who choose to be vaccinated may differ in many important respects from people who go unvaccinated—and those differences can influence the chance of death during flu season. [Ed: people who can afford the flu shot vs. those who can't; people who are more health-conscious vs. those who aren't, etc.]
“For a vaccine to reduce mortality by 50 percent and up to 90 percent in some studies means it has to prevent deaths not just from influenza, but also from falls, fires, heart disease, strokes, and car accidents. That’s not a vaccine, that’s a miracle.”
In the flu-vaccine world, Jefferson’s call for placebo-controlled studies is considered so radical that even some of his fellow skeptics oppose it. ... “It is considered unethical to do trials in populations that are recommended to have vaccine,” a stance that is shared by everybody from the CDC’s Nancy Cox to Anthony Fauci at the NIH. They feel strongly that vaccine has been shown to be effective and that a sham vaccine would put test subjects at unnecessary risk of getting a serious case of the flu.
Yet another pie chart, Business Insider. Not on the same scale as the one above but still why?
Clean Water Act Violations, New York Times. Can we trust tap water? As usual, a set of small bars would work better than concentric circles.
How does your state compare to California? (via Pew and Mother Jones) This is a nice illustration that often it is better to plot data derived from the raw data, as opposed to the raw data itself. Since the designer decided to hide the information, let's figure out what were the cut-off points for the color categories. If the size of each category is not the same, the designer needs to explain the scale. Also, the two shades of light blue are hard to tell apart. But all in all, a good effort here.
Here are some of my favorite links from other places:
A spatial journey illustrating a very long scale, created by the Genetic Science Learning Center (here)
Long scales are very difficult to deal with in charts; I have never been satisfied with log scales, since they address the designer's challenge of trying to fit everything onto one page but do not deal with the reader's need to compare the elements accurately
Not sure how this helps but perhaps some of you will figure it out
Says there is no optimal chart type. A type that works very well for one data set may get hopelessly cluttered for another, similar data set.
From fellow bloggers (especially Jorge), a whole series of views of the U.S. unemployment figures by state over time. Alternatives that are much more interesting to look at than the typical line chart. Jorge even found something in Excel that looks good.
This is the second post on the immigration paradox study, first discussed on the Gelman blog. My prior post on the graphing aspect is here; this post focuses on the statistical aspects. I am working backwards on Andrew's discussion points.
Which difference is most interesting?
5. Agree with Andrew; they should publish similar analyses on other minority groups as soon as possible. One thing that strikes me when looking at the interaction plot is that the U.S. born non-Latino whites have a much higher incidence of mental illness. The differences between subgroups of Latinos paled in comparison to the difference between non-Latinos and Latinos, and this latter difference is especially acute among the U.S. born, compared with the immigrants. The importance of the Latino analysis hinges upon whether the "paradox" is also found among other minority groups.
(Chris P also pointed this out in his comment on the previous post.)
Disaggregation, Practical Significance, and the Meaning of Not Significant
2. Andrew is also right in expressing moderate skepticism about this sort of disaggregation exercise. He connects this to the subtle statistical point that "the difference between significant and not significant is not significant." A related but less abstruse issue is that as one disaggregates any data, the chance of seeing variations that stray from the average gets higher and higher. This is because the sample size is decreasing, and so the statistical estimates are less reliable.
(To give a flavor of the scale, there were a total of 2500 Latinos in the sample, with 500 Puerto Rican Latinos. The analysis drilled down to the level of different types of mental disorders, subgroups of Latinos, and also adjusted for demographics. The details of the demographic adjustment are not available but in any case, one should be concerned about whether there were sufficient numbers of say, male immigrant Puerto Rican Latinos age 18-25 with income < $10,000 living in a rental apartment, for such an elaborate exercise.)
Expanding on this point further, one observes that the measured gap between U.S. born and immigrant Puerto Rican Latinos was about 5%. But this 5% is probably of considerable practical significance since the base rate of incidence is about 30% (I say probably since I am not an expert in mental illness). The current statistical analysis judged this to be insignificant -- if the sample size were larger, this difference could conceivably be statistically significant, and also practically significant.
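A rough back-of-the-envelope calculation shows how sample size drives this. The group sizes and prevalences below are hypothetical (the paper's exact cell counts are not given here); the sketch uses a standard pooled two-proportion z-test to show that a 5% gap on a 30% base rate is not statistically significant with a couple hundred subjects per group, but the very same gap would be with a thousand per group.

```python
from math import sqrt

def two_prop_z(p1, p2, n1, n2):
    """Two-proportion z-statistic using the pooled standard error."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se

# Hypothetical split: roughly 250 immigrant vs. 250 U.S.-born Puerto Ricans,
# with lifetime prevalence 30% vs. 35% (a 5% gap on a 30% base rate).
z_small = two_prop_z(0.30, 0.35, 250, 250)
z_large = two_prop_z(0.30, 0.35, 1000, 1000)  # same gap, larger sample

print(round(z_small, 2))  # about 1.19: not significant at the 5% level (|z| < 1.96)
print(round(z_large, 2))  # about 2.39: the identical 5% gap becomes significant
```

So "not significant" here is as much a statement about the sample size as about the gap itself.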
Doesn't the significance test deal with the small sample size problem?

Yes, if the authors merely described the Puerto Rico result as inconclusive. Here, as is done very commonly, insignificance is equated to "no difference": they said
No differences were found in lifetime prevalence rates between migrant and U.S.-born Puerto Rican subjects.
In reality, a difference of 5% was found in the sample that was analyzed. The statistical procedure found that this difference could have been a result of chance -- notice "could", not "must". If the measured difference had been 0.5% on 30%, then I might be willing to accept a finding of "no difference"; at 5% on 30%, I would like to see a larger sample analyzed.
The Meaning of Paradox
1. Andrew was perplexed by why the phenomenon is known as a "paradox". I had the same issue until I read the paper. The authors were a bit sloppy in the abstract. In the paper itself, they explained that the conventional wisdom has it that immigrants should be more likely to have mental illness because of the stress from the immigration process, and yet the statistics showed the exact opposite. That is the paradox.
I was a little shocked to see the data tables that gave all the estimates of the various effects at the various subgroup levels: shocked because the authors were allowed (or asked) to include only the p-values that were below some unspecified level (which I surmised is 10% although a 5% significance level is used to judge significance as per convention). This is publication bias within publication bias. P-values that are not significant still provide valuable information and should not be omitted. They did provide confidence intervals but for each subgroup separately, rather than for the difference -- and as they noted, such intervals by themselves are inconclusive when they overlap moderately.
Andrew Gelman has a great post about a so-called Immigrant paradox here, which should be interesting to our readers too.
He posed a set of sharp questions. My read, in reverse order:
6. The graph is pretty effective, I agree. This is known as an "interaction plot". The message the authors were trying to send was that the gap between immigrants and U.S. born in terms of prevalence of mental illness is not constant across sub-groups of Latinos. For example, the gap for Mexicans (light blue) is larger than the gap for Puerto Ricans (pink). Thus, the authors concluded that one should be careful about speaking of an aggregate (average) gap.
The graph lays this out clearly. The steeper the line, the bigger the gap between the immigrants and non-immigrants.
When Andrew showed this, I knew for sure someone would cry foul that a line is drawn between unrelated, discrete things. Indeed, the very first commenter weighed in with this complaint. In fact, whenever I show such charts to non-statisticians, a lot of people have this reaction.
So I'll take this as another chance to convince you to release interaction plots from jail.
Typically, a dissenter will offer up a dot plot as an alternative. So let's look at the same chart without the lines. Since the reader is supposed to figure out how the gap between U.S. born and immigrant groups varies across different subgroups of Latinos, the proverbial nose is tracing a line from a left dot to a right dot. Thus, to follow one's nose is to mentally draw the lines I just removed. The chart designer has done us a favor by making the lines explicit.
In addition, as Andrew pointed out, it is always better to try to get rid of the legend and put the line labels directly onto the chart.
One shortcoming of the interaction plot is that it does not disclose the relative importance of the different lines, which correspond to the relative proportions of people in these subgroups. Without this information, the reader will likely assume the lines have equal weight. This assumption, as I will explain in a future post, may be a problem.
This post dealt with the graphical aspect. I will have more to say about Andrew's other points on the statistics in a future post.
Note: This post is purely on statistics, and is long as I try to discuss somewhat technical issues.
(Via Social Sciences Statistics blog.)
This article in Wired (Aug 24, 2009) is a must-read. It presents current research on the "placebo effect", that is, the observation that some patients show improvement if they believe they are being treated (say, with pills) even though they have received "straw men" (say, sugar pills) that have no therapeutic value.
The article is a great piece, and a terrible piece. It fascinated and frustrated me in equal measure. Steve Silberman did a good job bringing up an important topic in a very accessible way. However, I find the core arguments confused.
Let's first review the setting: in order to prove that a drug can treat a disease, pharmas are required by law to conduct "double-blind placebo-controlled randomized clinical trials". Steve did a great job defining these: "Volunteers would be assigned randomly to receive either medicine or a sugar pill, and neither doctor nor patient would know the difference until the trial was over." Those receiving real medicine are known as the treatment group, and those receiving sugar pills are the placebo control group. Comparing the two groups at the end of the trial allows us to establish the effect of the drug (net of the effect of believing that one is being treated).
(I have run a lot of randomized controlled tests in a business setting and so have experience interpreting such data. I have not, however, worked in the pharma setting so if you see something awry, please comment.)
Two key themes run through the article:
1) An increasing number of promising drugs are failing to prove their effectiveness. Pharmas suspect that this is because too many patients in the placebo control group are improving without getting the "real thing". They have secretly combined forces to investigate this phenomenon. The purpose of such research is "to determine which variables are responsible for the apparent rise in the placebo effect."
2) The placebo effect meant that patients could get better without getting expensive medicine. Therefore, studying this may help improve health care while lowering cost.
Theme #1 is misguided and silly, and of little value to patients. Theme #2 is worthwhile, even overdue, and of great value to patients. What frustrated me was that by putting these two together, not sufficiently delineating them, Steve allowed Theme #1 to borrow legitimacy from Theme #2.
To understand the folly of Theme #1, consider the following stylized example:
Effect on treatment group = effect of the drug + effect of belief in being treated
Effect on placebo group = effect of belief in being treated
Thus, the difference between the two groups = effect of the drug, since the effect of belief in being treated affects both groups of patients.
A drug fails because the effect of the drug is not high enough above the placebo effect. If you are the pharmas cited in this article, you describe this result as the placebo effect is "too high". Every time we see "placebo effect is too high", substitute "the effect of the drug is too low".
Consider a test of whether a fertilizer makes your plant grow taller. If the fertilized plant is the same height as the unfertilized plant, you would say the fertilizer didn't work. Who would conclude that the unfertilized plant is "unexpectedly tall"? That is what the pharmas are saying, and that is what they are supposedly studying as Theme #1. They want to know why the plant that grew on unfertilized soil was "so tall", as opposed to why the fertilizer was impotent. (One should of course check that the soil was indeed unfertilized as advertised.)
Take the above example where the effect on the placebo group was 13. Say, it "unexpectedly" increased by 10 units. Since the effect of the treatment group = effect of drug + effect of believing that one is treated, the effect of the treatment group also would go up by 10. Because both the treatment group and the control group believe they are being treated, any increase in the placebo effect would affect both groups equally, and leave the difference the same. This is why in randomized controlled tests, we focus on the difference in the metrics and don't worry about the individual levels. This is elementary stuff.
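The cancellation is easy to verify in a few lines of Python. Only the belief effect of 13 and the 10-unit rise come from the example above; the drug effect of 7 is made up for illustration.

```python
# Stylized illustration: the belief effect hits both arms equally,
# so the treatment-vs-placebo difference isolates the drug effect.
drug_effect = 7       # hypothetical
belief_effect = 13    # effect of believing one is being treated

treatment = drug_effect + belief_effect   # both effects apply
placebo = belief_effect                   # belief effect only
print(treatment - placebo)                # 7: the drug effect

# Now suppose the placebo effect "unexpectedly" rises by 10 units.
belief_effect += 10
treatment = drug_effect + belief_effect
placebo = belief_effect
print(treatment - placebo)                # still 7: the difference is unchanged
```

No matter what value the belief effect takes, it appears in both arms and drops out of the difference.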
One of their signature findings is that some cultures may produce people who tend to show high placebo effects. The unspoken conclusion we are supposed to draw is that if these trials had been conducted closer to home, the drug would have passed rather than failed. I have already explained why this is wrong: the higher placebo effect lifts the metrics for both the treatment and the control groups, leaving the difference the same.
There is one way in which cultural difference can affect trial results. This is if the effect of the drug is not common to all cultures; in other words, the drug is effective for Americans (say) but not so for Koreans (say). Technically, we say there is a significant interaction effect between the treatment and the cultural upbringing. Then, it would be wrong to run the trial in Korea and then generalize the finding to the U.S. Note that I am talking about the effect of the drug, not the effect of believing one is being treated (which is always netted out). To investigate this, one just needs to repeat the same trial in America; one does not need to examine why the placebo effect is "too high".
I have sympathy for a different explanation, advanced for psychiatric drugs. "Many experts are starting to wonder if what drug companies now call depression is even the same disease that the HAM-D [traditional criterion] was designed to diagnose". The idea is that as more and more people are being diagnosed as needing treatment, the average effect of the drug relative to placebo group gets smaller and smaller. This is absolutely possible: the marginal people who are getting diagnosed are those with lighter problems, and thus those who derive less value from the drug, in other words, could more easily get better via placebo. This is also elementary: in the business world, it is well known that if you throw discounts at loyal customers who don't need the extra incentive, all you are doing is increasing your cost without changing your sales.
No matter how the pharmas try, the placebo effect affects both groups and will always cancel out. Steve even recognizes this: "Beecher [who discovered the placebo effect] demonstrated that trial volunteers who got real medication were *also subject to placebo effects*." It is too bad he didn't emphasize this point.
On the other hand, Theme #2 is great science. We need to understand if we can harness the placebo effect. This has the potential of improving health care while at the same time reducing its cost. Of course, this is not so useful for pharmas, who need to sell more drugs.
I think it is not an accident that Theme #2 research, as cited by Steve, is done in academia while Theme #1 research is done by an impressive roster of pharmas, with the help of NIH.
The article also tells us some quite startling facts:
- if they tell us, they have to kill us: "in typically secretive industry fashion, the existence of the project [Theme #1] itself is being kept under wraps." Why?
- "NIH staffers are willing to talk about it [Theme #1] only anonymously, concerned about offending the companies paying for it."
- Eli Lilly has a database of published and unpublished trials, "including those that the company had kept secret because of high placebo response". Substitute: low effect of the drug. This is the publication bias problem.
- Italian doctor Benedetti studies "the potential of using Pavlovian conditioning to give athletes a competitive edge undetectable by anti-doping authorities". This means "a player would receive doses of a performance-enhancing drug for weeks and then a jolt of placebo just before competition." I hope he is on the side of the catchers not the cheaters.
- Learnt the term "nocebo" effect, which is when patients develop negative side effects because they were anticipating them.
Again, highly recommended reading even though I don't agree with some of the material. Steve should have focused on Theme #2 and talked to people outside pharma about Theme #1.
I hinted at it in the last post, and some readers also made similar suggestions. What happens if we plot the U.S. life expectancy data in relative terms (indexed) rather than in absolute terms?
The result is highly revealing, and that is why we should always look at the data many ways. While in the original chart, the differences in the race/gender segments were essentially obscured by the overall slowly-growing trend, in our new chart, we took out the trend, isolating the growth rates.
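As a sketch of the re-expression (with made-up life expectancy numbers, not the CDC figures), each series is divided by its own starting value, so every segment begins at 100 and only the relative growth rates remain:

```python
# Convert an absolute time series to an index (first year = 100),
# which removes the overall level and isolates relative growth.
def to_index(series, base=100.0):
    return [base * v / series[0] for v in series]

# Hypothetical life expectancies (years) for two segments over four years.
segment_a = [70.0, 70.7, 71.4, 72.1]
segment_b = [77.0, 77.3, 77.6, 77.9]

print([round(v, 1) for v in to_index(segment_a)])  # [100.0, 101.0, 102.0, 103.0]
print([round(v, 1) for v in to_index(segment_b)])
```

On the indexed scale, the segment that starts lower but grows faster (here, the hypothetical segment_a) visibly pulls away, even though on the absolute scale its line sits well below the other.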
The reconstructed chart showed that:
Reference: "CDC says life expectancy in the US is up, deaths not", Miami Herald, Aug 19 2009. CDC Life expectancy data.