Mar 22, 2008

Trying too hard

In the course of business and governing, a lot of charts are generated.  An anonymous tipster pointed us to a set created by the "Communities and Local Government" division in the UK government.  Judging from the content, this division has responsibility for economic development in local neighborhoods.

Below is a pair of exhibits.  Truly they are trying too hard!  What we see is a hybrid scatter-bubble chart.  Between the jargon, the acronyms (LAD, LSOA), the boxed text, the multi-color circles, the colored axis labels and the lack of a title, the reader is plunged into a state of confusion.

Uk_communities3

The chart can be unraveled.  Each district was evaluated based on two measures of "gaps in worklessness".  The vertical axis compares each district to the national average; positive numbers indicate an above-average district relative to the nation.  The horizontal axis compares the most deprived 10% of neighborhoods within each district to the local average; positive numbers indicate that the worst neighborhoods are improving.

Thus, the policy goal would be to move all districts into the upper right quadrant.  The multi-color bubbles were designed to show us the state of the nation.  On the left chart, 41% of the districts (or population?) reside in the improving districts while 19% live in deteriorating areas.
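
The reading of the chart described above can be sketched in a few lines of Python. This is only an illustration: the district names and gap values are made up, the function name `quadrant` is my own, and treating a gap of exactly zero as "not positive" is an assumption.

```python
from collections import Counter

def quadrant(gap_vs_local, gap_vs_national):
    """Name the quadrant from the signs of the two gap measures.

    gap_vs_local    -> horizontal axis (worst neighborhoods vs. the local average)
    gap_vs_national -> vertical axis (district vs. the national average)
    A value of exactly zero is treated as not positive (an assumption).
    """
    horizontal = "right" if gap_vs_local > 0 else "left"
    vertical = "upper" if gap_vs_national > 0 else "lower"
    return f"{vertical} {horizontal}"

# Hypothetical districts: (gap_vs_local, gap_vs_national)
districts = {
    "District A": (2.0, 1.5),
    "District B": (-1.0, 0.5),
    "District C": (1.0, -2.0),
    "District D": (-0.5, -1.0),
    "District E": (3.0, 2.0),
}

# Share of districts (in percent) that fall in each quadrant
counts = Counter(quadrant(x, y) for x, y in districts.values())
shares = {name: 100 * n / len(districts) for name, n in counts.items()}
```

The `shares` dictionary plays the role of the percentages printed inside the bubbles; whether those percentages count districts or population is exactly the ambiguity noted above.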

The following strategies can help improve readability:

  • use English on the axis
  • relegate technical definitions to the legend
  • add succinct title to tell the story
  • use color on the data rather than on axis or data labels
  • use color to draw attention to the upper right quadrant
  • remove bubbles
  • define acronyms

 

Feb 25, 2008

Playful and exploratory

I share reader Bernard L.'s enthusiasm for this very imaginative chart, courtesy of the graphics people at NYT.  The chart captures the ebb and flow of weekly movie receipts over the last two decades.
Nyt_films
The details that particularly interest me include:

  • The addition of area colors (on top of lines) serves to highlight box office successes; this really helps readers sort out the massive amount of data
  • Nicely spaced text (and dots) does not interfere with our reading of the chart
  • The hiding of text for less important films, plus taking advantage of interactivity to show their titles if the reader mouses over the respective areas

All of the above indicate a keen sense of foreground versus background.  Besides, the authors had the good sense to speak of inflation-adjusted box office sales; I'm tired of the movie industry proclaiming higher sales each year when ticket prices are rising, and the population is growing.

This is another chart where more data do not easily translate into better communication (see my guest post at Flowing Data).  While I like the playful nature of the interactive chart, it is left to the reader to discover the information buried in the data, such as the assertion in the header that Oscar-winning films typically take time to attain box-office success while many blockbusters do not Oscars make.

In this presentation, it is challenging to compare the total receipts of one film versus another (this requires comparing oddly shaped, partially obscured areas).  It is also hard to compare across years since the data are spread out over a lot of space.

There may really be two types of graphics: the one like the example here which is a dictionary and designed for exploration; and the other kind where the designer has selected a subset of the data to make a specific point.

Reference: "The ebb and flow of movies", New York Times, Feb 23 2008.

Jul 21, 2007

Exception to the rule

It's pretty hard to decree hard-and-fast rules for graphical design; every rule seems to admit its exception.  This reinforces Tufte's contribution as he has successfully organized the rules in his collection of books.

Dustin J sent in this chart from the Economist.  At first glance, it looks ugly and overly complex.

Econ_petrol

Dustin commented:

Steven Few says not to use stacked bar charts because you cannot compare individual values very easily and as a rule I avoid stacked bars with more than six or seven divisions. What do you think of this stacked bar--I think it is quite effective in telling the story.

On this blog, I have also re-done some stacked bar charts but this one is truly an exception to the rule.  The reason why this one works is that it's not about the individual components, it's showing that the US consumes more than all those countries combined. 

If only it had a proper caption!  The Economist is uncharacteristically detached here: "Petrol consumption per day", "Litres bn, 2003".  How about "Goliath v. Davids"?  "US v. the World"? "Dream Team USA"?

It'd help if they toned down the colors; also, by simply annotating the total litres for the US and the total for the other countries, they would have made a clearer point without using gridlines.  But these are minor glitches in an otherwise effective chart.

Source: Economist, July 2007.

Jun 19, 2007

Wizardry

An anonymous reader dropped a comment pointing us to Martin Wattenberg's gallery at Business Week.  Martin's work falls into the category of information visualization, which typically concerns cramming as much high-dimensional data as possible onto 2D or 3D displays, augmented heavily by colors, shapes, interactivity, superpositioning and other tricks.  Often pleasing to the eye, these graphics usually take time to warm up to.  Sites like Infosthetics and Visual Complexity cover them well.

Mw_baby

Martin is responsible for the baby names visualization, which tracks the popularity of names over the years.

Mv_treemap_2

Martin also created treemaps like this one.  Does this show relative stock performance better than other designs?

Jun 17, 2007

Foreground, background

Derek C. points us to this effort by a science journalist to use graphs to help "clarify the concept of climate change".  The graph on the left shows that actual greenhouse gas emissions have exceeded the level predicted by the most pessimistic climate models.  The 3D bar chart on the right examines which countries have increased emissions the most since 1990.

Warming

While the bar chart contains many of Tufte's "ducks" (not sorted by percent change, 3D, color, gridlines, sufficiency, etc.), it's the left chart that can be made more powerful.

Redo_warming2

The casual observer does not need to know which model led to which trajectory of predictions; the graph is vastly simplified, and the message much clearer in the junkart version.  (I only included the CDIAC data because I didn't locate the EIA numbers.)

The general point here is recognizing what is foreground, and what is background.  Aside from gridlines, data labels, axis labels and so on, some of the data usually constitute background material; often, as in this case, they are used to establish comparability.

One message I got out of this chart is that these climate models have done a good job!  (Now, I have no idea if part of the curve included the training period.  It is curious that the predictions were very narrowly contained in the early 1990s.)

Source: The Island of Doubt Blog, June 6, 2007.

Feb 27, 2007

Mean and median

In the comments of the last post on on-line weather forecasts, Hadley raised the evergreen statistical question of mean vs median.  In this context, median error is unaffected by particular days in which the forecaster makes extreme errors while mean error takes into account the magnitude of every forecasting error in the sample.

Which one to use depends on the situation.  Brandon, who did the original analysis, was motivated by planning a trip to an unfamiliar location.  In this case, he might be better served by lower mean error, which would imply few extremely bad forecasts.

On the other hand, if I am interested in my local weather, then I'd likely be less concerned about a few extremely bad forecasts, and more concerned that the forecast is on the money on most days.  Then perhaps the median error would come into play.
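
To make the distinction concrete, here is a small Python sketch using the standard `statistics` module. The two error series are invented for illustration (they are not the forecast data from the earlier post), and the names `steady` and `erratic` are mine.

```python
from statistics import mean, median

# Hypothetical absolute forecast errors (in degrees) over ten days
steady = [3, 3, 4, 3, 4, 3, 4, 3, 3, 4]       # never great, never terrible
erratic = [1, 1, 2, 1, 1, 2, 1, 1, 2, 30]     # usually on the money, one huge miss

def summarize(errors):
    """Return both summaries so the two forecasters can be compared."""
    return {"mean": mean(errors), "median": median(errors)}

steady_summary = summarize(steady)
erratic_summary = summarize(erratic)

# The traveler who fears a disastrous forecast looks at the mean;
# the local who wants accuracy on most days looks at the median.
better_by_mean = "steady" if steady_summary["mean"] < erratic_summary["mean"] else "erratic"
better_by_median = "steady" if steady_summary["median"] < erratic_summary["median"] else "erratic"
```

With these made-up numbers, the single 30-degree miss drags the erratic forecaster's mean above the steady one's, while its median stays lower, so the two summaries rank the forecasters in opposite order.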

Redoonlineweather2

It turns out it doesn't much matter for our weather forecast data.  In this new chart, I superimposed the mean error data (in black).  The scatter of points was exactly as it was for median error (in red).  (MSN had a particularly bad forecast for a low temperature one day, which pulled its location to the left.)

This shows further that the difference between CNN, Intellicast and The Weather Channel is negligible.

Feb 12, 2007

Horrid stuff

Ec_smoke

Small multiples can work wonders when data are replicated, as in this case.  The chart accompanied an Economist article on pollution levels in several European cities, as indicated by the concentration of nitrogen dioxide and particulates.

In the junkart version, I plotted the data series side by side, rather than one over the other.  Further, the cities were ordered by decreasing levels of NO2, which seemed to be the worse pollutant.  All gridlines were removed except the line at 30, which worked pretty well to separate out the highly polluted cities.

Redopollutant

An odd pattern has now surfaced.  Namely, there is some degree of negative correlation between the concentration of the two pollutants.  Environmental scientists may be able to tell us why.
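
For readers who want to check such a pattern numerically rather than by eye, a Pearson correlation is one option. The sketch below uses invented concentrations for five unnamed cities (not the Economist's figures), and the helper `pearson` is written out by hand so that nothing beyond the standard library is needed.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical concentrations for five cities, ordered by decreasing NO2
no2 = [60, 50, 45, 40, 30]
particulates = [20, 22, 30, 28, 35]

r = pearson(no2, particulates)  # negative when one pollutant falls as the other rises
```

A coefficient near -1 would indicate a strong negative relationship; with only a handful of cities, any such number should be read with caution.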


Reference: "The Big Smoke", Economist, Feb 3 2007.

Sep 12, 2006

Working with lines

Here's how a great idea can be made better.

Nyt_pharmacies

The unifying axis on the right hand side, described as "comparable percentage-change scales", is a great concept.  The data being plotted are the cumulative percent return for each stock from the start of 2006 to the day of publication.

Redo1apharmacies_1

If the three lines are superimposed, we can see the relative performance throughout the year.  Among these three stocks, Walgreens had clearly underperformed until recently.  Also, plotting weekly rather than daily returns reduces clutter.  The only gridline of importance is the 0% line, which is what is left.
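
The two transformations behind the redone chart, cumulative percent return from a common starting date and thinning daily data to roughly weekly, can be sketched as follows. The prices are hypothetical (not actual quotes for any of the three stocks), the function names are mine, and taking every fifth trading day is just one simple way to approximate weekly data.

```python
def cumulative_pct_return(prices):
    """Percent change of each price relative to the first price in the series."""
    base = prices[0]
    return [(p / base - 1.0) * 100.0 for p in prices]

def every_nth(values, n=5):
    """Keep every n-th observation (about one per week for daily trading data)."""
    return values[::n]

# Hypothetical daily closing prices for two stocks over eleven trading days
stock_a = [40.0, 40.4, 39.8, 41.0, 41.2, 42.0, 41.6, 42.4, 43.0, 42.8, 44.0]
stock_b = [25.0, 24.5, 24.0, 24.2, 23.8, 23.5, 24.0, 24.6, 25.0, 25.5, 26.0]

# Both series now start at 0% and share one scale, so they can be superimposed
returns_a = every_nth(cumulative_pct_return(stock_a))
returns_b = every_nth(cumulative_pct_return(stock_b))
```

Because every series starts at 0%, stocks trading at very different price levels can share a single axis, which is what makes the separate price axes unnecessary.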

In addition, the three other axes, depicting actual prices, are redundant; removing them significantly enhances readability. 

Some will insist that actual prices must be shown; the following includes key bits of data in a subtle way.

Redo2_pharmacy

Reference: "Drugstores are Looking More Like a Growth Story", New York Times, Sept 10 2006.

Sep 07, 2006

Rushing to judgement

Charting, since the great John Tukey spoke, has been recognized as a key subject of "exploratory" data analysis.  Starting with a battery of hypotheses, one can use charts to examine them, reject those not viable, and for the viable ones, search for the best perspective.

When the order is subverted, that is, when the conclusion is fixed before the charts are drawn, the result is often embarrassing.  The example cited here is perhaps one such result.

Nyt_knightridder

The header confidently announced: "since ... November 2005, most newspaper stocks have done poorly".

Of six stocks shown, McClatchy really did poorly; Gannett and NYT weren't much better; however, Tribune appeared to be on the upswing, Dow Jones was also stable, and Knight Ridder was up.

Moreover, in order to fully appreciate an "industry challenged", one needs to establish comparability by including the performance of an index, say the S&P or the Dow.  When this is done, one realizes that the whole group of stocks has underperformed the general market.  (The Dow Jones average hovered between 0% and 10% during this period.)
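
Establishing comparability against an index amounts to subtracting the index's return from each stock's return over the same period. Here is a minimal sketch with placeholder stock names and made-up percentages (not the figures from the Times chart); the function name `excess_returns` is my own.

```python
def excess_returns(stock_returns, index_return):
    """Return each stock's percent return minus the index's percent return."""
    return {name: r - index_return for name, r in stock_returns.items()}

# Hypothetical percent returns over the same period
stock_returns = {"Stock A": -20.0, "Stock B": 2.0, "Stock C": 12.0}
index_return = 5.0

excess = excess_returns(stock_returns, index_return)

# A stock can be up in absolute terms and still lag the market
laggards = [name for name, e in excess.items() if e < 0]
```

In this made-up case Stock B is up 2% yet trails the index by 3 points, which is why a stock that looks "stable" on its own chart may still be underperforming.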


Reference: "What-ifs of a Media Eclipse", New York Times, Aug 27, 2006
