
Dropped, just like that

Frank W. sent in a timely reminder of the start-at-zero rule.  This ad from Quaker Oats pitches the impossible: the smart consumer will never believe that cholesterol levels can be dropped just like that!  According to Frank's measurement, the column heights plunged 77% from Week 1 to Week 4 in this chart.

In fact, if the vertical axis had started from 0, then the drop would more appropriately appear to be 5%.  Now, even that would have been a miracle, in my opinion.
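The arithmetic behind the start-at-zero rule can be sketched in a few lines of Python.  This is only an illustration: the Week 1 value of 100 is a made-up index (the ad's actual figures are not reproduced here), and the function names are hypothetical.  Taking the 77% and 5% figures above at face value, the sketch backs out where the vertical axis must have started.

```python
def apparent_drop(start, end, axis_min=0.0):
    """Fractional drop in drawn column height when the axis begins at axis_min."""
    return (start - end) / (start - axis_min)


def implied_axis_min(start, true_drop, shown_drop):
    """Axis minimum that makes a true_drop look like shown_drop on the page."""
    return start * (1 - true_drop / shown_drop)


# Hypothetical Week 1 value; only the ratios matter, not the units.
week1 = 100.0
# A true 5% decline by Week 4, as estimated in the post.
week4 = week1 * (1 - 0.05)

# Where must the axis start for that 5% decline to be drawn as a 77% plunge?
axis_min = implied_axis_min(week1, true_drop=0.05, shown_drop=0.77)

print("implied axis minimum:", round(axis_min, 1))
print("drop as drawn:", round(apparent_drop(week1, week4, axis_min), 2))
print("drop with axis at zero:", round(apparent_drop(week1, week4), 2))
```

Under these assumptions the axis would begin at roughly 93.5% of the Week 1 value, which is why a 5% change fills most of the column.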

Thus, I would like to know what a "point" of cholesterol is, and what they mean by a "representative" drop.  I suppose they are asking me to call that number.

My previous posts (with commentary from readers) about starting at zero can be found here and here.

Wading in waste

A poor graphic leaves readers wading in waste, in this case, the waste of time.  (Thanks to a tip from Dr. Bruce W.)

This very busy chart conveys a simple research finding: the density of bacteria increases with the prevalence of impervious surfaces.  As Bruce pointed out, underlying this chart are but six observations taken at selected tidal creeks, each observation being a (paired) measurement of bacteria count and prevalence of impervious surfaces.

A factory's worth of graphical elements was employed, including columns, pies, colors, data labels, legends and so on.  The result is utter confusion.  How is it that the tip of each column does not coincide with the center of each pie?  Do equal-sized pies imply equal surface areas?  What is the bacteria count at each location?

A scatter plot brings out the key correlation with minimal fuss.

Reference: "Wading in Waste", Scientific American, June 2006

New look

Junk Charts has adopted a new look.  There is now a categories section on the right.  I have vastly increased the category tags but please be patient as I slowly re-categorize old posts.  Happy browsing!  Let me know how you like the new look.

Flight of fancy


The venerable Wired magazine has surely gone too far with this flight of fancy!  Consider:

  • The zig-zagging lines streaming across the map
  • The redundant white dots, each of equal size, contradicting the black dots, with size proportional to prevalence
  • The inexplicable use of 00, 01, 02, ...
  • The use of a taller column for human cases, which, when tallied, amount to about 1/20 the number of bird cases
  • The inclusion of Australia (with zero cases) while excluding the Americas (also zero cases)
  • Ordering the countries neither by bird nor human cases but by convenience of placement on the map

As with a previous example, the map adds nothing to the data except for providing a lesson in geography.  We prefer a parallel bar chart, shown on the right.  Here, the continents are given different colors.  In an unusual move, I chose different scales for each side as I am more interested in the distribution among countries, rather than the relative prevalence of bird/human cases.

Reference: "Flight H5N1: Delayed", Wired Magazine, October 2006.

Poll numbers

The Political Arithmetik blog has great graphics pertaining to, surprise surprise, political matters.  I really like the ones portraying Presidential approval ratings. 

This chart plots all the different polls (grey dots) at once; the blue line is the estimated approval rate over time while the scatter of grey dots provides an estimate of the reliability of the blue line.

Different polls are different random samples of the population.  Random sampling is not fool-proof; any one sample has a chance, albeit small, to poorly represent the population.  That's why the dots add greatly to the chart.
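A small simulation makes the point about sampling concrete.  This is a sketch under stated assumptions, not a reconstruction of any actual poll: the "true" approval rate of 37%, the sample size of 1,000 and the count of 200 polls are all made-up numbers, and `simulate_polls` is a hypothetical helper.

```python
import math
import random
import statistics


def simulate_polls(true_rate, sample_size, n_polls, seed=0):
    """Draw n_polls independent random samples and return each sample's approval share."""
    rng = random.Random(seed)  # fixed seed so the illustration is repeatable
    results = []
    for _ in range(n_polls):
        approvals = sum(rng.random() < true_rate for _ in range(sample_size))
        results.append(approvals / sample_size)
    return results


TRUE_RATE = 0.37    # hypothetical population approval rate
SAMPLE_SIZE = 1000  # hypothetical number of respondents per poll

polls = simulate_polls(TRUE_RATE, SAMPLE_SIZE, n_polls=200)

# Textbook standard error of a sample proportion: sqrt(p * (1 - p) / n)
theoretical_se = math.sqrt(TRUE_RATE * (1 - TRUE_RATE) / SAMPLE_SIZE)

print(f"spread of simulated polls:  {statistics.stdev(polls):.4f}")
print(f"theoretical standard error: {theoretical_se:.4f}")
print(f"lowest and highest poll:    {min(polls):.3f}, {max(polls):.3f}")
```

Even with an unchanging population, the individual polls scatter around the true rate, and the occasional poll lands well away from it.  The grey dots in the chart play the same role as the spread printed here.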

Derek pointed me to a different chart, a simple dot plot that shows Bush's 2006 mid-term approval rate was the 2nd worst since 1946.  To paraphrase him, this is a scenario in which the chart does not add much because the underlying data is a simple ranked list.

He also suggested differentiating the 2nd term presidents from the one-termers. 

Shown below is another view of the data, emphasizing the time dimension.  The linked dots represent two-term presidents.  The gridlines delineate the minimum, average and maximum approval ratings over time.  Another line shows Bush's 2006 approval rating, which is the 2nd worst since 1946.

Lots of other great charts at this blog.  Check them out.


Calming the rip tide

Xan Gregg at Forth Go helpfully scraped the auto market share data off the NYT chart discussed here before.  He even created an improved chart based on histograms.

I have created another view of the data, using boxplots.  Tukey's boxplot is one of the most spectacular graphical inventions, as I have said before (see here, for example).  Its power is evident again for this data set.
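For readers unfamiliar with the construction, here is a minimal sketch of the summary that sits behind each box.  The share figures are invented for illustration and are not the scraped auto data; `boxplot_summary` is a hypothetical helper; and the quartiles come from Python's `statistics.quantiles`, which can differ slightly from Tukey's original hinges.

```python
import statistics


def boxplot_summary(values):
    """Return the numbers a box-and-whisker plot draws for one group."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Common convention: points beyond 1.5 * IQR from the box are outliers.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "whisker_low": min(inside),   # most extreme point still inside the fences
        "q1": q1,                     # left edge of the box
        "median": median,             # line at the center of the box
        "q3": q3,                     # right edge of the box
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical yearly market shares (percent) for a single automaker.
shares = [28.0, 27.5, 29.1, 26.0, 25.4, 24.8, 31.0, 30.2, 18.0]

summary = boxplot_summary(shares)
for name, value in summary.items():
    print(name, value)
```

A wide box signals shares that moved around a lot over the period, while a narrow one signals stability.  Drawing one box per automaker and per region puts many such summaries side by side.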

This chart is in fact two boxplots superimposed on the same surface.  I forgot to put on the legend: the green boxes represent U.S. market shares, and the blue boxes Europe shares.

The automakers are ordered by decreasing U.S. market shares (with apologies to European readers).

Lots of information can be immediately read off this chart:

  • The European market is much more fragmented than the U.S. market.
  • The Big 2 (GM, Ford) have had mixed fortunes over this period (as indicated by the large variance)
  • The Big 2 are competitive in Europe although they are definitely not dominant there
  • Several key players in Europe (Peugeot, Renault, Fiat, BMW) have negligible shares in the U.S.

Most importantly, there is little evidence that the U.S. market is "looking more like Europe".

One weakness of the above chart is the suppression of temporal information: there is no indication whether the recent shares are moving to the left or the right of the medians (center of each box). 

In the next chart, with the Europe data removed, I highlighted the data for the most recent 5 years in red.  I can make the general statement that there is a small movement towards less concentration and more parity in the U.S. market, but one has to conclude that the U.S. market shares in 2000-2006 look more similar to the U.S. market shares in 1990-1999 than to Europe market shares.


P.S. I added legends to the charts.

Finding dots

Erik W. alerted me to this CNN map that shows FBI statistics about safety of American cities.  As Erik pointed out, this is prototypical of chartjunk à la Tufte.  A lot of ink is used to depict 12 points of data (top 3 cities in safety, crime, improvement and decline).

Imagine the reader trying to find the 3rd most improved city.  She either has to find all the blue dots and then figure out which is #3; or she needs to find all the #3 dots and figure out which is blue.  As they say, it's "hard work".  In fact, finding the dots among the forest of large text is hard work by itself!

How would I re-make this chart?

  • Highlight only the states containing data (California, Michigan, Missouri, Ohio, Georgia, New Jersey, New York); gray out all other states and their boundaries
  • Separate the states from the cities; only write the State name once for each State; reduce the font size
  • Instead of dots, use numbers.  So the most dangerous city (St Louis) gets a red "1", Oakland gets a purple "3", etc.
  • Remove Mexico, Canada and water from the map

The map gives the false impression that crime is relevant only along the coasts and the lakes, when in fact, the map is just saying that most cities in the U.S. are located along the coasts and the lakes.  Using such a map to depict city-level statistics creates distortion because cities are not evenly distributed across America.

Beyond that, what is the point of this map?  Is it merely a geography class telling us where each city is located?  How is it better than a simple table listing the cities in order?   

Reference: "U.S. City Safety Rankings", CNN, 2006.