
Small multiples re-imagineered


This chart gave me trouble.  I kept staring at it, staring.  Searching for the legend.  What could the several lines, in different colors, represent?  Take a look yourself.

Well, it turns out all three graphs were duplicates.  In each one, a different line was colored dark blue to highlight a particular amusement park.

I have not seen this tactic used before.  This is like a small multiples concept except that every chart contains the same data.  Is it better than having just one chart?

Reference: "Will Disney Keep Us Amused?", New York Times, Feb 10 2008.

PS. [4/6/2008]  Here are two alternative charts contributed by our readers.  See comments below.

Derek suggested using sparklines:


Zuil reverted to basics:


Two books

Nathan from FlowingData announces a competition to win Tufte's classic book on visual representation of data.   There are still a few days left to participate.  While Tufte's more recent books have started to get repetitive, he has still published one of the most accessible books on this topic.

I also had the pleasure of reading Naomi Robbins' Creating More Effective Graphs.  She adopts a cookbook format providing hints on graphs in one, two and more dimensions, scales, visual clarity and so on.  Since she has already read Cleveland, Tufte, etc., she manages to put all that learning inside one cover.  The page design - with half of every page blank - is refreshingly easy on the eyes.  Inclusion of examples is generous.

Let's review her point of view on some of the topics we discuss frequently on Junk Charts:

Starting axis at zero: she thinks "all bar charts must include zero.  However, the answer is not as clear for line charts or other charts for which we judge positions along a common scale." (p.240)

Jittering: she does not provide a clear guideline but gave an example of a strip chart with jittered dots, commenting that "it gives a much better indication of the distributions than would a plot without jittering" (p.85) so I infer that she's generally in favor.
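
For readers who have not met jittering before, here is a minimal sketch of the idea in Python.  It is not taken from the book: the function name `jitter`, the width of 0.2 and the sample readings are all made up for illustration.  Each value keeps its true position on the measurement axis and gets a small random offset on the other axis, so that tied values no longer sit on top of one another.

```python
import random

def jitter(values, width=0.2, seed=0):
    """Pair each value with a small random offset for the non-data axis.

    Returns a list of (offset, value) tuples. The value itself is untouched;
    only the plotting position on the other axis is perturbed.
    """
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    return [(rng.uniform(-width, width), v) for v in values]

# Hypothetical readings with many ties (not data from the book)
readings = [3, 3, 3, 4, 4, 5]
points = jitter(readings)

for offset, value in points:
    print(f"value={value}  offset={offset:+.3f}")
```

Without the offsets, the three readings of 3 would be drawn as a single dot, which is the loss of distributional information the quote refers to.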

Parallel coordinates plot / profile plot: she provides an example of such a plot on p.141 and describes how to read such a plot.  Again, I infer she's in favor.

Trying too hard

In the course of business and governing, a lot of charts are generated.  An anonymous tipster pointed us to a set created by the "Communities and Local Government" division in the UK government.  Judging from the content, this division has responsibility for economic development in local neighborhoods.

Below are a pair of exhibits.  Truly they are trying too hard!  What we see is a hybrid scatter-bubble chart.  Between the jargon, the acronyms (LAD, LSOA), the boxed text, the multi-color circles, the colored axis labels and lack of title, the reader is plunged into a state of confusion.


The chart can be unraveled.  Each district was evaluated based on two measures of "gaps in worklessness".  The vertical axis compares each district to the national average; positive numbers indicate an above-average district relative to the nation.  The horizontal axis compares the most deprived 10% of neighborhoods within each district to the local average; positive numbers indicate the worst neighborhoods improving.

Thus, the policy goal would be to move all districts into the upper right quadrant.  The multi-color bubbles were designed to show us the state of the nation.  On the left chart, 41% of the districts (or population?) reside in the improving districts while 19% live in deteriorating areas.

The following strategies can help improve readability:

  • use English on the axis
  • relegate technical definitions to the legend
  • add succinct title to tell the story
  • use color on the data rather than on axis or data labels
  • use color to draw attention to the upper right quadrant
  • remove bubbles
  • define acronyms


Lunar eclipse

Todd B. sent me this pie chart, with a note: "Do the areas in the pie chart represent the numbers?"


The short answer is NO. 

It's also not so simple to figure out the areas of crescents.  The purple area looks tiny compared to the dark green region.  If shown this chart, we get the impression that  Microsoft's intention to absorb Yahoo! will not vastly expand the number of unique visitors to its properties because so many of their current users overlap.

The following is a bar chart representation of the same data.  The combined entity will have 31% more users than what Microsoft has right now.  Not a bad growth rate for a mature business!  The author of the original post calculated that Microsoft would in effect be paying about $1000 each to acquire these new users.
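
The arithmetic behind both figures is simple overlap accounting, sketched below in Python.  The function names and all the numbers are hypothetical, chosen only to make the calculation easy to follow; they are not the figures from the chart or from the original post.

```python
def merger_reach(a_users, b_users, overlap):
    """Unique audience after A absorbs B, and the growth relative to A alone.

    `overlap` is the number of users who already visit both A and B.
    """
    new_users = b_users - overlap    # B's visitors that A does not already have
    combined = a_users + new_users   # same as a_users + b_users - overlap
    growth = new_users / a_users     # fractional increase over A's current base
    return combined, new_users, growth

def cost_per_new_user(price, new_users):
    """Purchase price spread over the genuinely new users only."""
    return price / new_users

# Made-up figures, in millions of users (NOT the data in the chart)
combined, new_users, growth = merger_reach(a_users=100, b_users=80, overlap=50)
print(combined, new_users, f"{growth:.0%}")
```

The point of the second function is that the price gets divided by the new users only, not by the target's full audience, which is why a heavy overlap makes each acquired user look expensive.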

Perhaps the most important question is how one values a "unique visitor".  Has anyone seen any sophisticated analysis on this topic?


Chart cleanup

Anna E. submitted this great example from Yahoo! Green.  A well-meaning chart but stuffed with redundancy.

Much appears to be going on, and yet the entire chart contains only 15 data points: Boston's ranks in each of 15 categories.  The bar lengths convey the same information as the data labels.  The legend provides a catchy name for different levels of ranks (0-10 = "leader"; 10-20 = "advances"; etc.).  The colors merely reiterate the catchy titles.  Similarly, the colored squares repeat the information in the bars.

In the name of green, we cleaned up this chart:


As a standalone graph, the categories should be ordered by Boston's ranks.  Here, we assume that cross-referencing cities is needed so we leave the order unchanged.

Amazing baseballs

Reader Jonathan S. submitted this entry.

USA Today chartjunk:


Recycled junkart (his chart):


Jonathan noticed that the scales were off (more likely, they began with an axis that did not start at zero!  This is precisely why most graphs should start at zero).

As an aside, pitchers used to point to their (frequently untoned) physique as proof that steroids could not help; now we know better.


Don't believe what you see

Mankiw's blog linked to a press release by Congressman Jim Saxton, using CBO data to show "middle income tax burden at lowest level in decades".  The attached graph, as Junk Charts readers will immediately recognize, is classic chartjunk.  Every time the vertical axis does not start at zero, one suspects something is amiss.  And what's with the gridlines and data labels?

"Don't believe it? Check out the data source yourself."  I followed Mankiw's suggestion and was indeed surprised... but not by the great fortune of the "middle class".  The surprise was how the chart painted a dishonest picture of the CBO data.

The original chart plotted only the tax rate experienced by the middle 20% of the population.  The CBO provided data for all five quintiles; why not plot them all?  In this new chart (right), the "surprise" windfall to the middle 20% proved not to be anything special at all!  All five quintiles, especially the middle three, followed pretty much the same trend over time.  The effect of singling out the middle 20% is to strip away the context in which the data should be interpreted.

Further, what might be the result of the declining middle income tax burden?  The CBO data painted an unexpected picture.  Paradoxically, as the middle 20% see their tax rate decrease, they also earn a smaller share of the nation's after-tax income (black line at right).  At the same time, the top 1% saw their share of after-tax income double from about 8% to almost 16% (blue line).  The top 20% line is also upward-sloping although less pronounced.  So, the implication that the middle class have had it good is plainly wrong.

What is going on?  Two factors were at play and the Congressman presented only one side of the story (the tax rate).  What he omitted was that during this period, the nation's wealthy took home larger and larger shares of the pre-tax income.  This shift in pre-tax income more than offset any relative reduction in tax rate for the middle 20%.
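
To see how a falling tax rate and a falling share of after-tax income can coexist, here is a toy calculation in Python.  The two-group economy, the group names and every number in it are invented for illustration; none of them are CBO figures.

```python
def after_tax_shares(pretax_share, tax_rate):
    """Each group's share of total after-tax income.

    Both arguments are dicts keyed by group name: pretax_share holds shares
    of pre-tax income (summing to 1), tax_rate holds effective tax rates.
    """
    kept = {g: pretax_share[g] * (1 - tax_rate[g]) for g in pretax_share}
    total = sum(kept.values())
    return {g: kept[g] / total for g in kept}

# Hypothetical "before" period: middle group earns 20% pre-tax, taxed at 20%
before = after_tax_shares({"middle": 0.20, "rest": 0.80},
                          {"middle": 0.20, "rest": 0.25})

# Hypothetical "after" period: middle tax rate falls to 15%,
# but the group's pre-tax share also falls to 15%
after = after_tax_shares({"middle": 0.15, "rest": 0.85},
                         {"middle": 0.15, "rest": 0.25})

print(f"before: {before['middle']:.1%}   after: {after['middle']:.1%}")
```

In this made-up example the middle group's tax rate drops by five points and its after-tax share still shrinks, because the loss in pre-tax share outweighs the tax cut.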

This distortion can be traced back to the use of quintiles (or more generally, ranks).  We use them to cope with data having extreme distributions, but a by-product is losing information about how extreme the extreme values are.  As demonstrated here, the quintiles of old are really different from the quintiles of today because the underlying distribution has become much more extreme.
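
A small Python sketch makes the point about ranks concrete.  The two income lists below are fabricated for illustration (they are not CBO data), and `quintile_shares` is a name chosen here.  Everyone keeps the same rank in both lists, so the quintile membership is identical, yet the share of income held by each quintile changes drastically.

```python
def quintile_shares(incomes):
    """Share of total income held by each fifth of the population, poorest first."""
    ordered = sorted(incomes)
    n = len(ordered)
    total = sum(ordered)
    shares = []
    for k in range(5):
        # slice out the k-th fifth of the sorted list
        group = ordered[k * n // 5:(k + 1) * n // 5]
        shares.append(sum(group) / total)
    return shares

# Fabricated incomes: only the top earner differs between the two periods
then = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
now = [10, 20, 30, 40, 50, 60, 70, 80, 90, 1000]

for label, data in (("then", then), ("now", now)):
    print(label, [f"{s:.1%}" for s in quintile_shares(data)])
```

The middle quintile contains the same two people with the same incomes in both periods, but its share of the total falls because the top of the distribution has stretched.  A chart of quintile membership alone would show no change at all.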

Finally, another bit of mystery (to me) is how the middle 20% came to be considered "middle class".  Is there a widely accepted definition?

Reference: "CBO Data Show Middle Income Debt Burden At Lowest Level in Decades", Feb 21 2008.