
The trouble with two lines

It's never a good idea to put two scales on one chart, and this is another example of what not to do:

Ir_newspaper

The ugliest part of this chart is the duelling gridlines. Because neither axis starts at zero, it is difficult to assess whether the number of newspapers was declining at a faster pace than circulation was. Also, line charts would be better able to trace the evolution over time. The interspersed blue and red columns interfere with each other. Note that the designer lessened our pain by plotting every other year, thus halving the number of columns.

The junkart version also puts two data series on the same chart but on the same scale. Instead of plotting the raw data, we plot indices, with 2008 as 100. This reveals a pattern that was not apparent in the original chart. There appear to have been four periods of evolution: up till 1980, both the number of newspapers and total circulation were at a plateau; from 1980 to 1990, circulation stayed stable while the number of papers dropped drastically, indicating perhaps consolidation; then from 1990 to 2003, both series declined at roughly equal rates; and finally, from 2003, the bottom dropped out of circulation.


Redo_newspaper
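For readers who want to try the indexing trick on their own data, here is a minimal sketch. The function name `index_to_base` and the numbers are my own placeholders for illustration; they are not the Newspaper Association figures behind the chart.

```python
def index_to_base(series, base_year, base=100.0):
    """Rescale a {year: value} series so the base year equals `base`."""
    base_value = series[base_year]
    return {year: base * value / base_value for year, value in series.items()}


# Made-up placeholder values, NOT the data plotted above.
papers = {1980: 1500.0, 1990: 1200.0, 2008: 1000.0}
circulation = {1980: 60.0, 1990: 60.0, 2008: 48.0}

# Both series now share one scale, with 2008 pinned at 100.
papers_idx = index_to_base(papers, 2008)
circulation_idx = index_to_base(circulation, 2008)

for year in sorted(papers):
    print(year, papers_idx[year], circulation_idx[year])
```

Once both series are expressed relative to the same base year, they can sit on a single axis, and the slopes become directly comparable.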


See our previous discussion of dual axes here, here, and here


Reference: "Channel Shift: Online Circulars: the first step by retailers toward web-to-store harmony?", Internet Retailer (print), May 2009. Data from the Newspaper Association of America.


Bonus 1: Thinking about the column interspersing trick some more, I realize that it is, em, possible, em, that one series was plotted for odd years and the other series plotted for even years!

Bonus 2: Here is the requested scatter plot (still indexed):

Redo_newspaper2




Supplemental reading

What are other graphics blogs talking about recently?

Subway_sparklines2 Information Aesthetics highlighted the so-called New York City Subway sparklines.   (original site)  (Andrew also mentioned it.)

IA said "The general idea is that the history of subway ridership tells a story about the history of a neighborhood that is much richer than the overall trend."

Okay but what about these sparklines would clarify that history?  From what I can tell, this is a case of making the chart and then making sense of it.

The chart designer did make a memorable comment in his blog entry: "Hammer in hand, I of course saw this spreadsheet as a bucket of nails."  The hammer is a piece of software he created; the nails, the data of trips taken.



Wsj_stresstest Nathan at FlowingData gave a reluctant passing grade to this Wall Street Journal bubbles chart illustrating the recent U.S. bank "stress" test.

One should fight grade inflation with an iron fist.  (Hat tip to Dean Malkiel at Princeton.)  A simple profile chart would work nicely since the focus is primarily on ranks.  The bubbles, as usual, add nothing to the chart, especially where one can create any kind of dramatic effect by scaling them differently.


Envy_map Nathan also pointed to the maps of the seven sins, which garnered some national attention.  This set of maps is a great illustration of the weakness of maps for studying the spatial distribution of anything that is highly correlated with population distribution.  Do cows have envy too?  See related discussion at the Gelman blog.





Spinning multi-color 2

Here are two more versions of the greenhouse gas chart.

The first one is a Marimekko which many would consider to be appropriate for this type of data.  It is essentially a stacked bar chart where the width of the bar is scaled to the proportion of the type of gas.  Here's what one would be looking at:

Redo_greenhouse2


Marimekkos (also called Mosaic charts) share many of the problems of pie charts.  Note the need to use multiple colors, the difficulty in comparing the areas of the pieces (even worse than looking at sectors), and the difficulty in comparing across categories since the pieces float in irregular space (take for example the three pink pieces).  My rule is: avoid at all costs. (Well, like the pie chart, when the data is sufficiently simple, with very few pieces and with some outliers, these could be acceptable.)


Secondly, here is a recycled junkart chart, with all white space removed from the interior.  (Thanks to Derek for the suggestion.)

Redo_greenhouse3


Depending on what the purpose of the chart is, one can decide what is the base for the proportions.  My version preserves equity between the two dimensions.  Anything else will require the designer to make a choice.  If, for example, the base is 100% for each type of gas emitted, then the reader could not derive from the same chart the proportion of each source of emission (across all types of gases).
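To make the point about the base concrete, here is a small sketch with a two-by-two table. The gas and source labels and all the numbers are hypothetical, chosen only so the arithmetic is easy to follow; they are not the Global Warming Art figures.

```python
# Hypothetical emissions by (gas, source); placeholder values only.
emissions = {
    "gas A": {"source X": 30.0, "source Y": 10.0},
    "gas B": {"source X": 20.0, "source Y": 40.0},
}

grand_total = sum(v for row in emissions.values() for v in row.values())

# Base = grand total: both dimensions are treated equitably.
share_of_total = {
    gas: {src: v / grand_total for src, v in row.items()}
    for gas, row in emissions.items()
}

# Base = 100% within each gas: each row sums to 1.
share_within_gas = {
    gas: {src: v / sum(row.values()) for src, v in row.items()}
    for gas, row in emissions.items()
}


def sum_by_source(shares):
    """Add up the plotted shares for each source across all gases."""
    totals = {}
    for row in shares.values():
        for src, v in row.items():
            totals[src] = totals.get(src, 0.0) + v
    return totals


# With the grand-total base, summing recovers each source's overall share.
print(sum_by_source(share_of_total))
# With the per-gas base, the sums are not shares of anything meaningful.
print(sum_by_source(share_within_gas))
```

In this toy table each source accounts for half of all emissions, which the grand-total version recovers by simple addition. The per-gas version cannot, because the sizes of the gases have been normalized away.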



Spinning multi-color

The New York Times has a great pointer to the Global Warming Art website.  The author, Robert Rohde, wants to popularize environmental science by visualizing the data.  The site has many interesting charts and is well worth repeated visits.

These pie charts cry out for some re-dressing:

Greenhouse_Gas_by_Sector

The pie charts, the colors, the whole works.  Most troubling is that each pie has its own sorting scheme, and because the text labels were not reproduced in the smaller pies, the reader is sent scrambling around to find the right labels.

In addition, these pie charts, as with almost every other pie chart, fail the self-sufficiency test.  Without all the data printed next to each sector, the reader is simply unable to judge the size of each sector.

Further, the aggregate data (larger pie) may not be as relevant after realizing that the smaller pies show very different patterns.  The following junkart version tries to bring out this fact by treating both dimensions (type of greenhouse gas; source of emission) equitably.

Redo_greenshouse


While I picked on this particular chart, I must say I support Robert's effort and wish him luck in this very well-intentioned project.




Sore-thumb graphics

A particular genre of graphics is designed to induce awe: certain bits are allowed to stick out like a sore thumb.  Via reader Andre L., and an archive of US Army medical photos and illustrations:


Sorethumb_graph

This is a small multiples graph designed to display the somewhat seasonal pattern of deaths due to influenza over the years.  Basically, we see a U shape in almost every year; however, the height of the peak and the timing of the peak show quite a lot of variation.  Further, some years exhibit more of an L-shape than a U-shape.

But the attention grabber here is the massive peak that occurred between 1918 and 1919.  It was unusual in many ways: it was the second big peak during 1918, it occurred late in the year, and it elided with the next year's peak.  The designer allowed these two components to bleed into the other charts.

From the perspective of scale, readability, cleanliness, this bit sticks out like a sore thumb!  But one has to say it is effective.  

A log scale is often used to deal with data containing such outliers.  But while this makes neater charts, the impact of the orders-of-magnitude difference is lost on the reader, except in her imagination.
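A quick back-of-the-envelope sketch shows how much a log scale flattens such a peak. The two counts below are hypothetical round numbers, not values read off the Army chart, and I assume the log axis starts at 1.

```python
import math

# Hypothetical counts for illustration only.
typical_peak = 100.0
pandemic_peak = 10_000.0

# On a linear axis starting at zero, bar heights are proportional to the values.
linear_height_ratio = pandemic_peak / typical_peak

# On a log10 axis starting at 1, heights are proportional to log10(value).
log_height_ratio = math.log10(pandemic_peak) / math.log10(typical_peak)

print(f"linear axis: the big peak is {linear_height_ratio:.0f}x as tall")
print(f"log axis:    the big peak is {log_height_ratio:.0f}x as tall")
```

Under these assumptions a hundredfold difference in the data shrinks to a twofold difference in height on the page, which is why the reader is left to supply the drama herself.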


Animal racetrack

We introduced the racetrack chart before.  Via Zero Hedge, we find a version of it, perhaps a race for animals.  In a race for humans, they run in concentric circles; animals are not so tame, they may stray off the track, or just refuse to continue.

Zerohedgeintervention_arc

The designers certainly tried very hard to make the numbers palatable.  Indeed, given how much of our taxpayer funds are being thrown to the fire these days, any informed citizen ought to know how the money was being spent.  Their hard work, unfortunately, was not rewarded as the various constructs failed to improve our understanding of the data.

The three annotations on the right tell us that the arc width at the left indicates the allocated funds while the arc width at the right indicates the actual amounts spent as of the end of April.  In addition, the breakpoint on each arc in relation to the fan of lines indicates the date on which the funds were allocated.

In reality, things are a bit more complicated.  When all allocated funds have been spent, as is apparently the case with Fed funds for AIG, the arc has no breakpoint and thus the date of the allocation is missing.  Also, when the same use soaks up funds from multiple sources, the width on the right gets confusing: take for example FDIC funds for unlocking credits; it's unclear how the two arcs add up to 1.8 trillion.

Perhaps a flow chart might work well for this sort of data.


Reference: "Visual Representation of the Government Intervention Programs", Zero Hedge blog, April 8 2009.