
Where are the crimes?

The author of this data table and its readers are asking the same question, "Where are the crimes?", but for different reasons.

The author wanted to convey regional differences in crime growth, but as readers we cannot tell which part of the table to look at: every cell is given equal "weight".

Judging from this "profile plot", we can conclude that:

  • the Mid-West (blue line) experienced a crime spurt much worse than the national average (dots) in every category except forcible rape and murder
  • the West (red line), in general, had crime increases less severe than the national average
  • that said, the regional profiles are relatively similar, showing few meaningful regional differences (compared to other profile plots I've seen)
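The comparison we read off the profile plot, each region against the national average category by category, boils down to simple subtraction. Here is a minimal sketch in Python, assuming the data arrive as percentage changes keyed by region and crime category; the function names and every number in the example are hypothetical placeholders, not the MSNBC figures.

```python
# Sketch: compare each region's percentage change in crime with the
# national figure, category by category (what a profile plot shows visually).
# Input layout, names and all numbers below are hypothetical placeholders.

def profile_deviations(regional, national):
    """Return {region: {category: regional change - national change}}.

    regional: {region: {category: percent change}}
    national: {category: percent change}
    """
    return {
        region: {cat: changes[cat] - national[cat] for cat in national}
        for region, changes in regional.items()
    }


def categories_above_national(deviations, region):
    """Sorted list of categories where the region's increase exceeds the national one."""
    return sorted(cat for cat, gap in deviations[region].items() if gap > 0)


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    national = {"murder": 3.0, "robbery": 10.0}
    regional = {
        "Midwest": {"murder": 2.0, "robbery": 15.0},
        "West": {"murder": 1.0, "robbery": 6.0},
    }
    gaps = profile_deviations(regional, national)
    print(categories_above_national(gaps, "Midwest"))  # ['robbery']
    print(categories_above_national(gaps, "West"))     # []
```

A region whose list covers most categories is the one whose line sits above the dots almost everywhere, as the Mid-West does here.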

Reference: "Communities Grapple With Rise in Violence", MSNBC.com
Thanks to Maya for sending in the link.

Small and beautiful

The creator of this map understands that small is beautiful: simple concepts deserve simple charts.

As discussed in the NYT article, Allegiant's business model is small and beautiful: rather than focusing on popular routes between major cities like most startup airlines, Allegiant serves a web of routes going to just two destinations.

In this map, the two destinations are clearly labeled; all the originating cities are marked, with those serving both Las Vegas and Orlando highlighted.  Extra information is provided through shading of the states served and through the route lines (which roughly indicate distance or travel time).

This simple chart can be made simpler by removing the route lines.  Not much is lost by removing them.

Reference: "Flying Where Big Airlines Aren't", New York Times, Sept 21, 2006.


Readers may have noticed that I'm not a fan of the graphical aesthetics of the Economist.  (I love their subtle sarcasm, a way of saying something without saying it.  For example, the title of this chart is "where they are", which leaves us to read whatever meaning we like into the word "they".  As for their charts, I have taken issue with them on several occasions.)

This particular example uses one of their standard formats: stacked bars with an extra data series tacked on at the right, its boxed annotation calling attention to itself.  It's a case of too much apparatus for a simple task.

The chart's purpose is to show that the US and France have the largest Muslim populations in absolute numbers, while France is by far the top country by percentage.

Our junkart version is much cleaner.  Line segments spanning the low, mid and high estimates replace the stacked bars (which falsely imply that adding up the low and high estimates means something).  As usual, gridlines and axes are kept to a minimum.  Rather than jamming two ideas onto one chart, a separate chart should be produced if percentages matter more, this time ordered by decreasing percentage (see below).
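The data preparation behind such a range chart is short. A minimal sketch, assuming the estimates arrive as (low, mid, high) triples per country; `range_rows` is a hypothetical helper and no particular plotting library is assumed. Each country becomes one segment from low to high with a marker at mid, so nothing is stacked or summed.

```python
# Sketch: turn low/mid/high estimates into rows for a range
# (dot-and-segment) chart instead of stacked bars.
# The input layout and the helper name are assumptions.

def range_rows(estimates):
    """estimates: {label: (low, mid, high)} -> [(label, low, mid, high), ...]."""
    rows = []
    for label, (low, mid, high) in estimates.items():
        if not low <= mid <= high:
            raise ValueError(f"{label}: expected low <= mid <= high")
        # One row per label: a segment from low to high, marker at mid.
        # Nothing is stacked, so no meaningless sum is implied.
        rows.append((label, low, mid, high))
    # Largest mid estimate first, so the chart reads top-down in rank order.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows
```

If the percentage estimates come in the same triple form, passing them through the same helper yields the separate percentage chart, already ordered by decreasing percentage.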

The most crucial improvement is the fine print.  Perhaps extending their subtle sarcasm too far, the chart maker omitted the context needed to interpret the data: namely, that the low-mid-high range represents estimates from up to 5 different sources, each potentially using a different estimation method.  This partially explains the huge variance in estimates for the US (or does it?).

Also missing is a comment on why these particular 6 countries were selected.  Without it, the chart may give a misleading picture of "where they are" in the context of world population.

Reference: "Where They Are", Economist, June 2006.


Much data, zero info

The number-crunching college football fans at the Wall Street Journal wondered out loud:

One of the biggest developments in college football in recent years was the decision by Virginia Tech and Miami -- perennial top-20 teams -- to leave the Big East conference and join the Atlantic Coast Conference.  How much has that strengthened the ACC?

The data table on the right was ostensibly the answer.  Readers were drawn to the bolded numbers, the almost identical winning percentages of the ACC and the SEC (averaged over the last decade, as the text explained).

The question is a classic one of cause and effect: did the addition of two strong teams cause the ACC to become stronger?  Startlingly, the data cited are useless and the analysis irrelevant.

First, the difference in winning percentages between the ACC and the SEC is the wrong metric.  A more pertinent one would be, for example, the change in the ACC's winning percentage before and after the two teams joined.

Second, the observation period is badly chosen.  The ACC expansion occurred in 2004, so winning percentages averaged over 1995-2005 have zilch to say about its effect.
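A before/after comparison is easy to set up once the records are split at the expansion season. This is a minimal sketch, assuming hypothetical (season, wins, losses) records for the ACC's games against other conferences and ignoring ties; it is one simple way to frame the question, not a method the WSJ used.

```python
# Sketch: the ACC's winning percentage before vs. after a cutoff season.
# Records are hypothetical (season, wins, losses) tuples for games
# against other conferences; ties are ignored for simplicity.

def win_pct(records):
    wins = sum(w for _, w, _ in records)
    losses = sum(l for _, _, l in records)
    games = wins + losses
    if games == 0:
        raise ValueError("no games in this period")
    return wins / games


def before_after(records, change_season):
    """Return (win pct before change_season, win pct from change_season on)."""
    before = [r for r in records if r[0] < change_season]
    after = [r for r in records if r[0] >= change_season]
    return win_pct(before), win_pct(after)
```

With a 1995-2005 window and a 2004 cutoff, only two of the eleven seasons (about 18%) land in the "after" group, which is why a decade-long average says so little about the expansion.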

Third, a Web search reveals that the ACC underwent another major realignment in 2005, making it very difficult to isolate the effect of adding Virginia Tech and Miami in 2004.

Thus, the data table contains zero information for addressing the stated problem.  How to measure the effect properly seems to me a tall order, and a good discussion topic.

Besides the iffy statistics, it is also impossible to read this table.  The data in the lower-left triangle mirror those in the upper-right triangle, containing no new information.  Head-to-head conference comparisons seem to serve no purpose.  Actual win-loss numbers create clutter while adding no insight.  (Theoretically, the larger the number of contests between any two conferences, the more reliable the winning percentages.  Confidence intervals are a much better way to present such information, but even those would be overkill for our purpose.)
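To see how the number of contests drives reliability, here is a minimal sketch of one standard confidence interval for a proportion, the Wilson score interval. The choice of Wilson is mine, not the WSJ's, and it assumes independent games with a constant win probability; z = 1.96 corresponds to roughly 95% coverage.

```python
import math

# Sketch: Wilson score interval for a winning percentage.
# Assumes independent games with a constant win probability;
# z = 1.96 gives roughly 95% coverage.

def wilson_interval(wins, games, z=1.96):
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    denom = 1 + z**2 / games
    center = (p + z**2 / (2 * games)) / denom
    half_width = z * math.sqrt(p * (1 - p) / games + z**2 / (4 * games**2)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Same 60% record, different numbers of contests (hypothetical).
    print(wilson_interval(6, 10))    # wide: roughly (0.31, 0.83)
    print(wilson_interval(60, 100))  # much narrower
```

A 60% record over 10 games is compatible with anything from a weak to a dominant conference; over 100 games the range tightens considerably.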

Reference: "College Football's Power Struggle", Wall Street Journal, Sept 16-17, 2006.

Working with lines

Here's how a great idea can be made better.


The unifying axis on the right-hand side, described as "comparable percentage-change scales", is a great concept.  The data being plotted are the cumulative percentage returns for each stock from the start of 2006 to the day of publication.

If the three lines are superimposed, we can see the relative performance throughout the year.  Among these three stocks, Walgreens had clearly underperformed until recently.  Also, plotting weekly rather than daily returns reduces clutter.  The only gridline that matters is the 0% line, so it is the only one kept.
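Putting the stocks on a common percentage scale and thinning them to weekly points takes only a few lines. A minimal sketch, assuming each stock is a list of daily closing prices in date order with no gaps; treating every 5th trading day as "weekly" is a simplification of mine (holidays would shift it), and the function names are hypothetical.

```python
# Sketch: cumulative percent return from the first price in the series,
# plus a crude weekly thinning. Assumes `prices` is a list of daily closes
# in date order with no gaps; every 5th trading day stands in for "weekly".

def cumulative_pct_return(prices):
    base = prices[0]
    return [100.0 * (price / base - 1.0) for price in prices]


def weekly(series, step=5):
    """Keep every `step`-th point, always including the last one."""
    if not series:
        return []
    sampled = list(series[::step])
    if (len(series) - 1) % step != 0:
        sampled.append(series[-1])
    return sampled
```

Because every series starts at 0% on the first day, the three lines can share one axis, and the 0% line becomes the natural single gridline.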

In addition, the three other axes, depicting actual prices, are redundant; removing them significantly enhances readability. 

Some will insist that actual prices must be shown; for them, the following version includes key price data in a subtle way.


Reference: "Drugstores are Looking More Like a Growth Story", New York Times, Sept 10, 2006.

Rushing to judgement

Since the great John Tukey spoke, charting has been recognized as a key part of "exploratory" data analysis.  Starting with a battery of hypotheses, one can use charts to examine them, reject those that are not viable, and search for the best perspective on those that are.

When the order is subverted, that is, when the conclusion is fixed before any charts are drawn, the result is often embarrassing.  The example cited here may be one such case.

The header confidently announced: "since ... November 2005, most newspaper stocks have done poorly".

Of the six stocks shown, McClatchy really did do poorly, and Gannett and NYT weren't much better; however, Tribune appeared to be on the upswing, Dow Jones was stable, and Knight Ridder was up.

Moreover, to fully appreciate an "industry challenged", one needs to establish comparability by including the performance of an index, say the S&P or the Dow.  When this is done, one realizes that the whole group of stocks has underperformed the general market (the Dow Jones average hovered between 0% and 10% during this period).
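Comparing against an index amounts to subtracting the market's return from each stock's return over the same window. A minimal sketch, assuming hypothetical lists of closing prices and index levels aligned on the same dates; measuring the gap in percentage points is one simple choice among several.

```python
# Sketch: how far each stock's cumulative return is from a market index's
# over the same window, in percentage points. Inputs are hypothetical lists
# of closing prices / index levels aligned on the same dates.

def pct_change(series):
    return 100.0 * (series[-1] / series[0] - 1.0)


def excess_over_index(stocks, index_levels):
    """Return {ticker: stock % change minus index % change}."""
    market = pct_change(index_levels)
    return {ticker: pct_change(prices) - market for ticker, prices in stocks.items()}
```

A stock with a positive raw return can still show a negative excess, which is exactly the comparison a claim about a challenged industry needs.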

Reference: "What-ifs of a Media Eclipse", New York Times, Aug 27, 2006.