Apr 27, 2008

Running in the rain

Reader Eduardo is unhappy about the embellishments in this Nikeplus chart of miles run by day; "pretty but misleading," he wrote to tell us.  This is a clear case of more being less.

[Chart: Nikeplus miles run by day]


As a data graphic, it fails.  The reflections add nothing; perhaps Nike wants to remind all you super-dedicated Nano-wearing runners what it's like to run in mist or rain!  To quote Eduardo: "The bars start at -1! I guess it is motivation."  An extra mile for everyone.  The rounded corners make it harder to read the height of each bar.

[Chart: Mitchell Report counts by position, with bars starting at zero (top) and at 8 (bottom)]

Speaking of bar charts, I want to follow up on an exchange from March.  In that example, we claimed that bars not starting at zero misrepresent their relative lengths.  The chart showed counts of baseball players implicated in the Mitchell Report, by position.

The distortion arises because the same length is cut off every bar, regardless of its value.  As a result, the ratios between bar lengths change drastically.

For example, the ratio of pitchers (P) to third basemen (3B) is 31/9 = 3.4 in the top chart, but 23/1 = 23 in the bottom chart!
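The arithmetic is easy to check.  A minimal sketch: the function name `apparent_ratio` is ours, and the baseline of 8 is inferred from 31 - 8 = 23 and 9 - 8 = 1 rather than read off the original axis.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of two bars' drawn lengths when the value axis starts at `baseline`.

    Each bar is drawn from the baseline up to its value, so its visible
    length is value - baseline; the ratio of values survives only when baseline == 0.
    """
    return (a - baseline) / (b - baseline)


pitchers, third_basemen = 31, 9
print(round(apparent_ratio(pitchers, third_basemen), 1))    # 3.4  (axis starts at zero)
print(apparent_ratio(pitchers, third_basemen, baseline=8))  # 23.0 (axis starts at 8)
```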




Mar 28, 2008

Two books

Nathan from FlowingData is running a competition to win Tufte's classic book on the visual display of data.  There are still a few days left to enter.  Although Tufte's more recent books have grown repetitive, this one remains among the most accessible books on the topic.

I also had the pleasure of reading Naomi Robbins' Creating More Effective Graphs.  She adopts a cookbook format, offering tips on graphs in one, two and more dimensions, on scales, on visual clarity and so on.  Having digested Cleveland, Tufte and others, she manages to put all that learning between one set of covers.  The page design, with half of every page left blank, is refreshingly easy on the eyes.  Examples are generous.

Let's review her views on some of the topics we frequently discuss on Junk Charts:

Starting the axis at zero: she states that "all bar charts must include zero.  However, the answer is not as clear for line charts or other charts for which we judge positions along a common scale." (p.240)

Jittering: she does not provide a clear guideline but gives an example of a strip chart with jittered dots, commenting that "it gives a much better indication of the distributions than would a plot without jittering" (p.85), so I infer that she is generally in favor.
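Jittering itself is simple: add a small random offset along the axis that carries no data, so that tied values separate.  A minimal sketch, assuming a single vertical strip with uniform horizontal noise; the function name, the width and the data are our own choices, not Robbins'.

```python
import random


def jitter(values, width=0.2, seed=None):
    """Points for one vertical strip: y is the data value, x is uniform noise
    in [-width/2, width/2] around the strip's centre at x = 0, so tied values
    no longer hide behind one another."""
    rng = random.Random(seed)
    return [(rng.uniform(-width / 2, width / 2), v) for v in values]


data = [3, 3, 3, 4, 4, 7]  # hypothetical values with heavy ties
for x, y in jitter(data, width=0.3, seed=1):
    print(f"x={x:+.3f}  y={y}")
```

Only the horizontal position is randomized; the values themselves are never altered, which is why jittering is considered safe for strip charts.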

Parallel coordinates plot / profile plot: she provides an example on p.141 and explains how to read it.  Again, I infer she is in favor.
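For readers who have not met parallel coordinates: each variable gets its own vertical axis, and each observation becomes a polyline across those axes.  A common (though not the only) preparation step is to rescale every variable to [0, 1] so the axes share one height.  A minimal sketch with made-up data; the function name is hypothetical.

```python
def to_parallel_coordinates(rows, columns):
    """Rescale each column to [0, 1] so every variable shares one vertical axis.

    rows: list of dicts; columns: the variables to draw as parallel axes.
    Returns one list of scaled values (one polyline) per row.
    """
    lo = {c: min(r[c] for r in rows) for c in columns}
    hi = {c: max(r[c] for r in rows) for c in columns}

    def scale(c, v):
        span = hi[c] - lo[c]
        return 0.5 if span == 0 else (v - lo[c]) / span  # constant column -> mid-axis

    return [[scale(c, r[c]) for c in columns] for r in rows]


# Hypothetical data: three cars measured on three variables.
cars = [
    {"mpg": 18, "hp": 150, "weight": 3.5},
    {"mpg": 30, "hp": 90, "weight": 2.2},
    {"mpg": 24, "hp": 120, "weight": 2.9},
]
for line in to_parallel_coordinates(cars, ["mpg", "hp", "weight"]):
    print(line)
```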

Mar 17, 2008

Lunar eclipse

Todd B. sent me this pie chart, with a note: "Do the areas in the pie chart represent the numbers?"

[Chart: overlapping circles of Microsoft and Yahoo! unique visitors]

The short answer is NO. 

Nor is it easy to judge the areas of crescents.  The purple area looks tiny compared to the dark green region.  Shown this chart, readers would conclude that Microsoft's plan to absorb Yahoo! will not vastly expand the number of unique visitors to its properties, because so many of their current users overlap.
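To see why crescents are hard to read, consider what it takes to compute one.  A sketch, assuming the chart is two overlapping circles; the lens (intersection) formula is standard geometry, and the radii and centre distance in the example are hypothetical.

```python
import math


def _clamp(x):
    # Guard acos against tiny floating-point overshoots outside [-1, 1].
    return max(-1.0, min(1.0, x))


def lens_area(r1, r2, d):
    """Area shared by two circles with radii r1, r2 whose centres are d apart."""
    if d >= r1 + r2:
        return 0.0                          # circles do not touch
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2   # smaller circle lies inside the larger
    a1 = r1 ** 2 * math.acos(_clamp((d ** 2 + r1 ** 2 - r2 ** 2) / (2 * d * r1)))
    a2 = r2 ** 2 * math.acos(_clamp((d ** 2 + r2 ** 2 - r1 ** 2) / (2 * d * r2)))
    k = 0.5 * math.sqrt((-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2))
    return a1 + a2 - k


def crescent_area(r_self, r_other, d):
    """Visible crescent: the part of circle `r_self` not covered by the other circle."""
    return math.pi * r_self ** 2 - lens_area(r_self, r_other, d)


# Hypothetical geometry: two unit circles whose centres are one radius apart.
print(round(crescent_area(1.0, 1.0, 1.0), 3))  # 1.913
```

Nobody does this in their head, which is the problem: the reader cannot recover the numbers from the shapes.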



The following is a bar chart representation of the same data.

[Chart: bar chart of the same data]

The combined entity would have 31% more users than Microsoft has now.  Not a bad growth rate for a mature business!  The author of the original post calculated that Microsoft would in effect be paying about $1,000 for each new user.
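Both figures come from the same subtraction: overlapping visitors are counted once.  A minimal sketch; the function names are ours and the audience sizes in the example are hypothetical numbers chosen only to reproduce a 31% gain, not the actual traffic data.

```python
def union_growth(acquirer, target, overlap):
    """Fractional growth in unique visitors after combining two audiences.

    People who visit both properties are counted once, so the combined
    audience is acquirer + target - overlap.
    """
    combined = acquirer + target - overlap
    return combined / acquirer - 1


def cost_per_new_user(price, target, overlap):
    """Purchase price spread over the target's visitors the acquirer lacks."""
    return price / (target - overlap)


# Hypothetical audiences (millions) chosen to reproduce a 31% gain.
print(round(union_growth(100, 80, 49), 2))  # 0.31
```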

Perhaps the most important question is how one values a "unique visitor".  Has anyone seen any sophisticated analysis on this topic?


 

Mar 08, 2008

Chart cleanup

Anna E. submitted this great example from Yahoo! Green: a well-meaning chart, but stuffed with redundancy.
[Chart: Yahoo! Green rankings for Boston]

Much appears to be going on, yet the entire chart contains only 15 data points: Boston's rank in each of 15 categories.  The bar lengths convey the same information as the data labels.  The legend gives a catchy name to each band of ranks (0-10 = "leader"; 10-20 = "advances"; etc.).  The colors merely reiterate the catchy names.  Similarly, the colored squares repeat the information in the bars.

In the name of green, we cleaned up this chart:

[Chart: cleaned-up version of Boston's ranks]

For a standalone graph, the categories should be sorted by Boston's ranks.  Here, we assume readers need to cross-reference other cities, so we leave the order unchanged.


Mar 04, 2008

Amazing baseballs

Reader Jonathan S. submitted this entry.

USA Today chartjunk:

[Chart: USA Today drug report graphic]



Recycled junkart (his chart):

[Chart: Jonathan's redone version]

Jonathan noticed that the scales were off; more likely, the original axis did not start at zero!  This is precisely why most graphs should start at zero.

As an aside, pitchers used to point to their (frequently untoned) physique as proof that steroids could not help; now we know better.


 

Feb 10, 2008

Ordering and grouping

The Times reported that January retail sales generally disappointed, and consumers showed a preference for discount retailers over department stores.

[Chart: NYT graphic on January retail sales]


[Chart: retailers reordered by same-store sales change and grouped by type]

Taking the bar chart on the right, reordering it by change in same-store sales, and grouping companies by type of retailer, we can make the chart match the text more closely.  The divergent performance of discount retailers and department stores is readily visible.
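The reordering is a two-level sort: first by retailer type, then by performance within each type.  A minimal sketch; the store names, types and numbers are hypothetical placeholders, not the Times' data.

```python
from itertools import groupby

# Hypothetical rows: (retailer, type, change in same-store sales, %).
sales = [
    ("Store A", "department", -3.0),
    ("Store B", "discount", 2.5),
    ("Store C", "department", -1.0),
    ("Store D", "discount", 4.0),
]


def order_and_group(rows, type_order):
    """Group rows by retailer type (in the given order), then sort each group
    by same-store sales change, best first."""
    rank = {t: i for i, t in enumerate(type_order)}
    ordered = sorted(rows, key=lambda r: (rank[r[1]], -r[2]))
    # groupby only merges adjacent rows, which the sort above guarantees.
    return {t: [r[0] for r in grp] for t, grp in groupby(ordered, key=lambda r: r[1])}


print(order_and_group(sales, ["discount", "department"]))
# {'discount': ['Store D', 'Store B'], 'department': ['Store C', 'Store A']}
```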

Reference: "Weak January dashed retailers' gift-card hopes", Feb 8 2008.

 

Feb 03, 2008

Redundancy

Nick B., who occasionally writes about statistical graphics, found some classic chart junk from a Canadian report on the Afghan army.  Here's one example, together with the junkchart version.

[Chart: original (left) and junkchart version (right)]

Redundancy is an enemy of good graphics, and incongruous redundancy is worse.  Here, troop level is variously described as "total force size", "strength" and "army growth"; the chart on the right sticks to a single concept, army size.  The data labels ("47000 Strength"), the axis labels ("50000 Total Force Size"), and the gridlines all germinate from the five grand data points underlying the entire chart!

Another distorting feature is the use of time intervals of different lengths, drawn as if they were equal; in the chart on the right, we space them in proportion to elapsed time.
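Spacing points by elapsed time, rather than one slot per observation, is a one-line calculation.  A minimal sketch; the dates below are hypothetical, not the report's.

```python
from datetime import date


def time_positions(dates):
    """x-position of each observation, in years since the first one.

    Plotting at these positions, instead of at 0, 1, 2, ..., keeps a
    two-year gap visibly twice as wide as a one-year gap.
    """
    start = min(dates)
    return [(d - start).days / 365.25 for d in dates]


# Hypothetical observation dates with an uneven final gap.
obs = [date(2006, 1, 1), date(2007, 1, 1), date(2007, 12, 31), date(2010, 1, 1)]
print([round(x, 2) for x in time_positions(obs)])  # [0.0, 1.0, 2.0, 4.0]
```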

Ultimately, the key message should be growth in the army size, not the absolute number of troops.  The slopes of the line segments encode this information.  Alternatively, a data table can be rather powerful for simple data like this:

[Table: Afghan army size, data table version]

By what is called the "end state", there would be 70% more troops than in December 2007.
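The growth figure and the slopes are simple arithmetic.  A minimal sketch with hypothetical round numbers (not the report's data), indexing the starting level to 100: a 70% increase means the end value is 1.7 times the starting value, and dividing each change by its actual time gap is what keeps unequal intervals honest.

```python
def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100


def segment_slopes(times, values):
    """Change per unit of time for each line segment.

    Dividing by the actual gap (t1 - t0) is what makes slopes comparable
    when observations are unevenly spaced.
    """
    return [(v1 - v0) / (t1 - t0)
            for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:])]


# Hypothetical series: times in years, values indexed to 100 at the start.
times = [0, 1, 2, 4]
values = [100, 110, 130, 170]
print(pct_change(values[0], values[-1]))  # 70.0
print(segment_slopes(times, values))      # [10.0, 20.0, 20.0]
```

Plotted at equally spaced positions, the last step (+40) would look twice as steep as the one before it, even though the growth rate per year is unchanged.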

 


Jan 24, 2008

Oscar diseconomy

[Chart: Business Week's Oscar statue graphic]

Business Week dissected the beneficiaries of the Oscar show in this graphic.  Although it doesn't work well as a data graphic, it is more engaging for readers if thought of as a variant of the data table.

Let's have some fun with the Oscar statue.  First, putting a bar chart next to the statue confirms that the height of the segments (rather than the area) is in proportion to the dollar values (below left).

Tufte, Chambers and others have shown that our eyes react to areas, not heights.  So next, I estimated the area of each segment and redrew the segments with equal widths, so that their heights are proportional to area.  After squeezing the entire column back down to the height of the statue, the chart below (right) places the perceived proportions next to the true proportions, making the extent of the distortion visible.

[Chart: bar chart beside the statue (left); perceived vs. true proportions (right)]
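The redrawing step can be sketched in a few lines.  This assumes each statue segment can be approximated as a rectangle (its height times an average width), which is our simplification; the heights and widths below are hypothetical, not measurements of the Business Week graphic.

```python
def shares(values):
    total = sum(values)
    return [v / total for v in values]


def redraw_by_area(heights, widths, total_height):
    """Equal-width column whose segment heights are proportional to segment area.

    Each statue segment is approximated as a rectangle (height x average width);
    the result is rescaled so the column is exactly `total_height` tall.
    """
    areas = [h * w for h, w in zip(heights, widths)]
    return [total_height * s for s in shares(areas)]


# Hypothetical statue: two segments of equal height, the first three times as wide.
heights, widths = [1.0, 1.0], [3.0, 1.0]
print(shares(heights))                                    # true proportions: [0.5, 0.5]
print(shares([h * w for h, w in zip(heights, widths)]))   # perceived: [0.75, 0.25]
print(redraw_by_area(heights, widths, total_height=2.0))  # [1.5, 0.5]
```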

Reference: "News you need to know", Business Week, Jan 28 2008.

Dec 25, 2007

Doctoring charts

Reader Chris P. alerted us to a fascinating post on Errol Morris' blog, which presents, in graphical form, the results of a readers' poll related to an earlier post.  That earlier post deals with a pair of wartime photographs previously discussed by Susan Sontag and others.  Sontag believed the pair documented a before-and-after scene: it was alleged that the photojournalist moved some cannon balls from their natural position between takes.

Morris asked his readers in which order they thought the photos were taken ("on before off", "off before on", "undecided") and which factors they used to decide.  He presented the results in two formats: first frequencies in bar charts, then proportions in pie charts.  He preferred the pie charts.

[Chart: Morris' pie charts of decision factors]

Most here would share Chris' reaction: "Oh my.  What people do with Excel."

The biggest problem with these pie charts is the unreasonable baseline.  This is one of those polls that allow respondents to pick any number of factors, and the pie chart creator clearly used the 1,151 responses as the baseline rather than the 910 people who voted.  Consider these two statements:

  • 52% of respondents who decided "on before off" listed "sun shadow" as a decision factor
  • 30% of the decision factors submitted by respondents who decided "on before off" were "sun shadow"

It is hard to say what the second statement means.  It is as if a respondent who selects more than one factor gets more than one vote in the final tally.  Put differently, the 30% is meaningless unless one also knows how many decision factors each respondent selected, both on average and in distribution.  The 52% does not depend on such considerations.
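The two denominators are easy to compute side by side.  A minimal sketch; the ballots are hypothetical, and `factor_shares` is our own name, not anything from Morris' post.

```python
from collections import Counter


def factor_shares(ballots):
    """ballots: one set of selected factors per respondent (multi-select poll).

    per_respondent: fraction of respondents who listed each factor (like the 52%).
    per_mention:    fraction of all factor mentions (like the 30% in the pies).
    """
    counts = Counter(f for ballot in ballots for f in ballot)
    n_respondents = len(ballots)
    n_mentions = sum(counts.values())
    per_respondent = {f: c / n_respondents for f, c in counts.items()}
    per_mention = {f: c / n_mentions for f, c in counts.items()}
    return per_respondent, per_mention


# Hypothetical ballots from four respondents; the last one ticked nothing.
ballots = [
    {"sun shadow", "cannon balls"},
    {"sun shadow"},
    {"cannon balls", "rocks", "shelling"},
    set(),
]
per_respondent, per_mention = factor_shares(ballots)
print(per_respondent["sun shadow"])         # 0.5: half the respondents cited it
print(round(per_mention["sun shadow"], 3))  # 0.333: its share of all 6 mentions
```

If the third respondent had ticked two more factors, the 0.5 would not move, but the 0.333 would drop to 0.25, even though nobody's view of sun shadows changed.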

Combining the data in the bar charts and pie charts, one discovers that 469 of the 910 respondents could not decide which photo was taken first; moreover, these respondents cited 0.9 decision factors on average, whereas those who reached a decision cited 1.6.
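These figures hang together, as a quick check shows.  The counts and averages come from the text above; the consistency check itself, and the assumption that the averages were rounded to one decimal, are ours.

```python
undecided = 469
decided = 910 - undecided               # 441 respondents reached a verdict
avg_undecided, avg_decided = 0.9, 1.6   # factors cited per respondent, as rounded in the text
reported_mentions = 1151

implied = undecided * avg_undecided + decided * avg_decided
print(round(implied))                   # 1128

# Averages reported to one decimal could each be off by up to 0.05.
low = undecided * (avg_undecided - 0.05) + decided * (avg_decided - 0.05)
high = undecided * (avg_undecided + 0.05) + decided * (avg_decided + 0.05)
print(low <= reported_mentions <= high)  # True
```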


A simple chart of the key decision factors by type of respondent is shown below.

[Chart: decision factors by respondent group]

From this chart, one sees that the number and position of the cannon balls were crucial to at least 50% of those who came to a conclusion.  Sun shadows were much more important to those who decided "on before off", while those who decided "off before on" more often cited "character artistic", shelling and rocks.  Most other factors did not differentiate the three groups.

Source: "Not Your Mum's Apple Pie Chart", Errol Morris, Dec 18, 2007.


 

Oct 15, 2007

Sense of proportion

[I'm back from vacation.  I will post my reaction to the responses to the Gelman challenge, and I will soon work through the emails you have sent me.]

The NYT commented on a trend among marketers to shift their advertising spending from so-called "measured" media like print and TV to so-called "unmeasured" media like product placements, contests, etc. 
The following chart accompanied the article:

[Chart: NYT graphic of measured vs. unmeasured ad spending by company]


This construct is akin to a population pyramid; it's great for comparing two groups along one metric, say the age distributions of males and females.  Here, the two halves are not comparable groups but two different metrics.  The main metric, the proportion spent on unmeasured media, is not directly depicted: the reader must work out mentally how much of each bar the black part covers.  Also, the companies are sorted by unmeasured spending, which leaves the measured spending with a jagged profile and confuses matters.

As for the little white slits on the gray bars, they are admittedly cute, but they make it difficult to compare the breakdown among print, TV and other media across companies.

The following dot plot gives the two halves equal weight.

[Chart: dot plot of measured and unmeasured spending]

(Pink dots are measured, blue unmeasured.)  It's not a very interesting graphic, though: the sense of proportion is still missing.

I settled on a scatter plot relating the proportion spent on unmeasured media to total spending.  It appears that the largest advertisers spent the lowest proportion on unmeasured media, while the smallest (among the majors) spent the highest.  (The correlation is weak: a linear fit yields an R-squared of only 16%.)
[Chart: scatter plot of unmeasured share vs. total spending]
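Both quantities in the scatter plot are easy to reproduce.  A minimal sketch: the share is unmeasured spending over total spending, and R-squared is computed for an ordinary least-squares line (for a simple linear fit it equals the squared correlation).  The spending figures are hypothetical; the 16% above comes from the real data, which is not reproduced here.

```python
def unmeasured_share(measured, unmeasured):
    """Fraction of each advertiser's total spending that goes to unmeasured media."""
    return [u / (m + u) for m, u in zip(measured, unmeasured)]


def r_squared(xs, ys):
    """R-squared of the ordinary least-squares line y = a + b*x (= squared correlation)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy * sxy / (sxx * syy)


# Hypothetical spending by four advertisers (same units for both columns).
measured = [3.0, 2.5, 1.5, 1.0]
unmeasured = [1.0, 1.5, 0.5, 1.0]
totals = [m + u for m, u in zip(measured, unmeasured)]
shares = unmeasured_share(measured, unmeasured)
print(round(r_squared(totals, shares), 2))
```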

Source: "The New Advertising Outlet: Your Life", New York Times, Oct 14, 2007.