
Whither complexity?

The ever-interesting Gelman blog ("Too clever by half") ponders this enterprising NYT chart. Whatever its merits, this is one that requires close study.


Reception is generally positive. Andrew himself learnt an important fact, that there are still more white people than other races in America! In statistics, we distinguish between two types of errors, the significant kind and the ignorable kind. From this perspective, plotting raw admissions counts, which mostly reflect how large each group is, is a gigantic problem of the significant kind; it renders the rest of the chart useless. So I agree with Andrew. As ever, picking the right scale is the beginning of making a nice chart.
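To make the scale point concrete, here is a minimal Python sketch. The group names and counts are made up for illustration and are not the NYT figures. It converts raw counts into within-group shares, so that groups of very different sizes can be compared on one axis.

```python
# Hypothetical admissions counts by age band; NOT the NYT figures.
counts = {
    "group_a": {"under_40": 8000, "40_plus": 2000},
    "group_b": {"under_40": 800, "40_plus": 400},
}


def to_shares(by_band):
    """Convert raw counts to proportions that sum to 1 within a group."""
    total = sum(by_band.values())
    return {band: n / total for band, n in by_band.items()}


shares = {group: to_shares(by_band) for group, by_band in counts.items()}

for group, by_band in shares.items():
    print(group, {band: round(p, 2) for band, p in by_band.items()})
```

On the count scale, group_a dominates every age band simply because it is larger. On the share scale, group_b turns out to have the larger proportion of admissions aged 40 and over.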

We can also use this example to discuss the concept of "interactions". When we go about presenting small multiples, i.e. comparisons of subgroups within a population, it's because we have observed differences between those subgroups; otherwise, it is both simpler and clearer to present the aggregate results. The present chart presents subgroups defined by race, gender, age and substance abused; that is quite a lot of subgroups.

Focusing on the first row (Alcohol), we note that the colored mass has shifted to the right, indicating more older people abused alcohol. This trend appeared for all races. Now scanning the other rows, we discover that only heroin abuse showed a distinctly different pattern, but only among whites. For every other row, it seemed that the change from 1996 to 2005 was similar across races.

By breaking out substance abused, the designer added 21 little charts (7 sets of 3). Only one set (heroin) added information to what was true in aggregate, i.e. that substance abusers got older. The incremental gain in information does not justify the added complexity.
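One rough way to decide whether a breakout earns its space is to compare each subgroup's change with the aggregate change and keep only the subgroups that deviate. The sketch below assumes made-up shift values, placeholder subgroup names and an arbitrary tolerance; none of these come from the chart.

```python
# Hypothetical change in mean admission age (in years) between two periods.
# Illustrative values only; not derived from the NYT data.
shift_by_subgroup = {
    "substance_1": 4.0,
    "substance_2": 3.5,
    "substance_3": -2.0,  # the one subgroup that moves the other way
}
aggregate_shift = 3.0  # hypothetical change for the whole population


def informative_subgroups(shifts, overall, tolerance=2.0):
    """Return subgroups whose change differs from the aggregate by more
    than `tolerance` years. The tolerance is an arbitrary assumption."""
    return [name for name, s in shifts.items() if abs(s - overall) > tolerance]


flagged = informative_subgroups(shift_by_subgroup, aggregate_shift)
print(flagged)
```

Subgroups that are not flagged tell the same story as the aggregate, so their panels could be dropped or merged without losing much.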

Nevertheless, the chart has many positives, such as judicious use of axes and gridlines and letting the graphical constructs speak for themselves (without accompanying data labels).


Reference: "Why is Mum in Rehab?",  New York Times, Jun 14 2008.




I kept getting confused and forgetting which year was the line (1996) and which was the filled area (2005)-- wait, are they getting older, or younger?-- so I was constantly having to refer back to the key in one corner of the graph.

It's funny that people put many long dark gridlines across their data in case we can't remember where the scale is, but when it comes to categories, they let us rely on our memory of what's what, or make us look off to one side to keep checking.

Sean Carmody

The "whither complexity" and "small multiples" links seem to have the same content.


Sean, this article is the latest one to have the "small multiples" category tag, as seen in the tag cloud on the right. If you scroll down a bit, you'll see earlier articles in that category.


The text talks about change in the over 40 population, so that should be what the graphs display. Maybe proportion over 40 in each period or some sort of population standardised admission rate for the over 40 population might work.
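As a sketch of the two measures suggested here, the Python below computes the share of admissions aged over 40 and an admission rate per 100,000 people aged over 40. All numbers, including the population figures, are hypothetical placeholders.

```python
def share_over_40(admissions_over_40, admissions_total):
    """Proportion of all admissions that are aged over 40."""
    return admissions_over_40 / admissions_total


def rate_per_100k(admissions_over_40, population_over_40):
    """Admissions per 100,000 people aged over 40 in the population."""
    return admissions_over_40 / population_over_40 * 100_000


# Hypothetical inputs for two periods; NOT real admissions or census data.
periods = {
    "period_1": {"adm_over_40": 200, "adm_total": 1000, "pop_over_40": 400_000},
    "period_2": {"adm_over_40": 300, "adm_total": 1000, "pop_over_40": 500_000},
}

summary = {
    name: {
        "share": share_over_40(p["adm_over_40"], p["adm_total"]),
        "rate": rate_per_100k(p["adm_over_40"], p["pop_over_40"]),
    }
    for name, p in periods.items()
}

for name, s in summary.items():
    print(name, round(s["share"], 2), round(s["rate"], 1))
```

The share answers "how much of the treatment population is over 40", while the rate also accounts for growth in the over-40 population itself.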

Joe Harris

The worst thing about this chart is that it looks "semi-Tuftian" when it's nothing of the sort. It's disqualified on the data/ink principle alone.

As you pointed out there is only one interesting variance displayed. Also, the male/female display is terrible. I can't make any valid comparison between sexes.

I would like to see the variables compared 2 at a time in small multiples. If I get an hour today I'll try and cook this up.


I actually liked this set of graphs quite a lot. The graphs are clutter-free, which makes reading them relatively easy, once you have a question you want to find an answer for. There's a lot of information here, which would serve numerous different questions, and so this isn't a graph that can be easily taken in at one glance, rather it is a resource you can revisit a number of times. A good graph doesn't necessarily need to be just simple.

I do agree that as this is treatment admissions data, there is a whole luggage of bias issues that go with these figures, making many comparisons, at least between groups, very difficult. Yet, for instance, the graphs on alcohol or smoked cocaine in the black group are interesting, as they seem to suggest that more or less the same demographic that was doing these drugs in 1997 has kept on doing them, as the peak has just shifted right by about 10 years. Are younger people doing less of these or just being admitted less, or are we now getting (as the NYT article suggests) a bigger proportion of old users admitted? These are questions that need to be asked before drawing conclusions, though the answers would be hard to find.

But suppose these biases are smaller than the massive effect that this graph seems to show. If there was this massive "bump" in 30 year old crack users admitted to treatment in 1997, why are they still in the system, as 40 year old crack users?

(Of course, two measurements 10 years apart don't mean any of the people making the 30-yr bump in -97 data reoccur in the -07 data, but it would seem more plausible to suggest that the crack problem has persisted within this same part of population rather than the junkies of the late nineties all kicking the habit and a new slice of their generation now taking it up ten years later.)

Jonathan Dursi

I just can't see some of the criticisms here. There's a lot of good stuff that comes out of this chart. For smoked cocaine, and for all but marijuana in the black group, it is essentially the same cohort seeking treatment. Does that say something about the drug culture at the time, or about one group's increased willingness to seek treatment? For alcohol & marijuana, it's primarily males being admitted, but the others are much more gender neutral. That very robust double bump in marijuana is interesting to see -- kids and 20 somethings? That hollowed-out region in the 30s for black and white alcohol abuse; what caused that? Which other opiates caused such an increase in admission to treatment, and why did the heroin demographics change so suddenly?

The fact that the numbers have so much structure is genuinely interesting, and inspires more detailed investigation.

Is complexity just always to be avoided? What if the underlying data is complex, and interestingly so?


Perhaps we should recognize the difference between graphics for exploration and graphics for communication?

I think one step is missing here which is to move from exploration to communication. The charts (if the right scale is chosen) help us identify the information in this data; the next step would be to focus on the most pertinent parts that illustrate the story-line.

Without a doubt, this chart is much richer than most we see daily.

Luiz Pires

We can evaluate any chart at either the content level (Are the right questions asked? Is the analysis sound? Are the comparisons valid?) or the implementation level (Does it leverage the appropriate visual tools? Is it misleading? Is it easy for the viewer to grasp the main points and visually explore related questions? Is it visually appealing? Is the implementation parsimonious? etc.)

While the former is always most important (a beautiful implementation of bad analysis is still nonsense), I will leave this task to others.

From the implementation point of view, however, I am struck by this chart's ambition. It examines "number of admissions" (a continuous variable) across 5 independent dimensions:
1. Age of admission (continuous)
2. Year (2 values - 1996, 2005)
3. Gender (2 values - Male, Female)
4. Ethnicity (3 values - White, Black, Hispanic)
5. Type of drug (8 values - alcohol, etc.)
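As a quick back-of-the-envelope check on that ambition, the following Python sketch multiplies the number of levels of the four categorical dimensions listed above. It assumes one age curve is drawn per combination, which is my reading of the list rather than something stated about the chart.

```python
from math import prod

# Number of levels per categorical dimension, taken from the list above.
# Age is continuous and forms the horizontal axis of each curve.
levels = {"year": 2, "gender": 2, "ethnicity": 3, "drug": 8}

# Assumption: one age curve per combination of the categorical dimensions.
curves = prod(levels.values())
print(curves)
```

Under that assumption, the reader is asked to compare close to a hundred age distributions.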

At least from this narrow technical standpoint, this is an impressive example of a small multiples visualization. That they include a visualization of such high-dimensional data in a mainstream publication (NYT) is heartening.


No, it is not heartening that they include it in the NYT, because hardly anyone will look closely enough to gain any information. What is needed is a chart that helps understanding of the text, so it should be limited to concepts that are discussed in the text. As Kaiser has commented, this is a graph for exploration. There is nothing wrong with putting it in the appendices of a report. It is also the type of thing that is useful when developing ideas for analysis. It is probably fine to show in a presentation where the important aspects can be identified and explained, but it is excessive to include with an article.
