Mar 17, 2008

Lunar eclipse

Todd B. sent me this pie chart, with a note: "Do the areas in the pie chart represent the numbers?"

Overlapmsnyahoo

The short answer is NO. 

It's also not so simple to figure out the areas of crescents.  The purple area looks tiny compared to the dark green region.  If shown this chart, we get the impression that Microsoft's intention to absorb Yahoo! will not vastly expand the number of unique visitors to its properties because so many of the two companies' current users overlap.



The following is a bar chart representation of the same data.

Redo_overlap

The combined entity will have 31% more users than Microsoft has right now.  Not a bad growth rate for a mature business!  The author of the original post calculated that Microsoft would in effect be paying about $1000 for each of these new users.
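The growth figure is simple set arithmetic: the combined audience is the union of the two audiences, so the new users are Yahoo!'s visitors minus the overlap. Here is a minimal sketch in Python. The visitor counts are made-up placeholders, not the numbers behind the chart, and the function name is my own.

```python
def union_growth(a_users, b_users, overlap):
    """Return (combined audience, growth relative to A) when A absorbs B."""
    if overlap > min(a_users, b_users):
        raise ValueError("overlap cannot exceed either audience")
    new_users = b_users - overlap      # B's visitors that A does not already reach
    combined = a_users + new_users     # size of the union of the two audiences
    return combined, new_users / a_users


# Hypothetical counts in millions, chosen only to illustrate the arithmetic
combined, growth = union_growth(a_users=100.0, b_users=90.0, overlap=60.0)
print(f"combined: {combined:.0f}M, growth over A alone: {growth:.0%}")
```

Dividing a purchase price by `new_users` would give a cost per acquired user in the same spirit as the original post's estimate.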

Perhaps the most important question is how one values a "unique visitor".  Has anyone seen any sophisticated analysis on this topic?


 

Mar 08, 2008

Chart cleanup

Anna E. submitted this great example from Yahoo! Green.  A well-meaning chart but stuffed with redundancy.
Yahoo_bostongreen

Much appears to be going on and yet the entire chart contains only 15 data points: Boston's rank in each of 15 categories.  The bar lengths convey the same information as the data labels.  The legend provides a catchy name for different levels of ranks (0-10 = "leader"; 10-20 = "advances"; etc.).  The colors merely reiterate the catchy titles.  Similarly, the colored squares repeat the information in the bars.

In the name of green, we cleaned up this chart:

Redo_bostongreen

As a standalone graph, the categories should be ordered by Boston's ranks.  Here, we assume that cross-referencing cities is needed so we leave the order unchanged.


Mar 04, 2008

Amazing baseballs

Reader Jonathan S. submitted this entry.

USA Today chartjunk:

Usa_drugreport_2



Recycled junkart (his chart):

Redo_drugreport2

Jonathan noticed that the scales were off (more likely, they began with an axis that did not start at zero!  This is precisely why most graphs should start at zero).

As an aside, pitchers used to point to their (frequently untoned) physique as proof that steroids could not help; now we know better.


 

Feb 10, 2008

Ordering and grouping

The Times reported that January retail sales generally disappointed, and consumers showed a preference for discount retailers over department stores.

Nyt_retailjan


Redo_retailjan

Taking the bar chart on the right, re-ordering by change in same-store sales, and grouping companies by type of retailer, we can present the data to match the text more closely.  The divergent performance between discount retailers and department stores is readily visible.
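For readers who want to try the same re-ordering on their own data, here is a rough sketch in Python. The retailer names and percentages are invented for illustration and are not the figures from the Times chart. Ordering the groups by their average change is my own choice.

```python
from itertools import groupby

# Hypothetical same-store sales changes (percent); not the Times data
retailers = [
    ("Retailer A", "department store", -4.1),
    ("Retailer B", "discount", 2.3),
    ("Retailer C", "department store", -7.0),
    ("Retailer D", "discount", 0.5),
    ("Retailer E", "specialty", -1.2),
]


def group_and_order(rows):
    """Group rows by retailer type, then sort each group by sales change, best first."""
    by_type = sorted(rows, key=lambda r: r[1])  # groupby needs its input sorted by key
    groups = {
        kind: sorted(members, key=lambda r: r[2], reverse=True)
        for kind, members in groupby(by_type, key=lambda r: r[1])
    }
    # Order the groups themselves by their average change, best first
    return sorted(
        groups.items(),
        key=lambda kv: sum(r[2] for r in kv[1]) / len(kv[1]),
        reverse=True,
    )


for kind, members in group_and_order(retailers):
    print(kind)
    for name, _, change in members:
        print(f"  {name:<12}{change:+.1f}%")
```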

Reference: "Weak January dashed retailers' gift-card hopes", Feb 8 2008.

 

Feb 03, 2008

Redundancy

Nick B., who occasionally writes about statistical graphics, found some classic chart junk from a Canadian report on the Afghan army.  Here's one example, together with the junkchart version.

Redoafghan_2

Redundancy is an enemy of good graphics, and incongruous redundancy is worse.  Here, troop level is variously described as "total force size", "strength" and "army growth"; the chart on the right uses only the army concept.  The data labels ("47000 Strength"), the axis labels ("50000 Total Force Size"), and the gridlines all derive from the grand total of five data points underlying the entire chart!

Another distorting feature is the use of different-sized time intervals, which we space out appropriately on the right chart.

Ultimately, the key message should be growth in the army size, not the absolute number of troops.  The slopes of the line segments encode this information.  Alternatively, a data table can be rather powerful for simple data like this:

Redoafghan2

By what is called the "end state", there would be 70% more troops than there were as of December 2007.

 


Jan 24, 2008

Oscar diseconomy

Oscar

Business Week dissected the beneficiaries of the Oscar show, as shown on the right.  Although this doesn't work well as a data graphic, if thought of as a variant on the data table, it is more engaging for readers.

Let's have some fun with the Oscar statue.  First, putting a bar chart next to the statue confirms that the height of the segments (rather than the area) is in proportion to the dollar values (below left).

Tufte, Chambers and others have shown that our eyes react to the areas, not heights.  So next, I estimated the areas but stretched them out into segments of equal width.  Squeezing the entire column back down to the height of the statue, the following chart (below right) puts perceived proportions next to the true proportions, displaying visually the extent of distortion. 
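The comparison just described can be sketched in a few lines of Python. The segment heights and widths below are invented, since I only estimated the areas by eye. The point is just to put shares by height (the true proportions) next to shares by area (the perceived proportions).

```python
def shares(values):
    """Convert raw values into proportions of their total."""
    total = sum(values)
    return [v / total for v in values]


# Hypothetical segments of a statue-shaped column, top to bottom:
# (height, average width).  Heights are proportional to the data.
segments = [(10, 2.0), (20, 4.0), (30, 3.0), (40, 6.0)]

true_share = shares([h for h, _ in segments])
# Approximate each segment's area as height times average width
perceived_share = shares([h * w for h, w in segments])

for (h, w), t, p in zip(segments, true_share, perceived_share):
    print(f"height {h:>2}, width {w}: true {t:.1%}, perceived {p:.1%}, ratio {p / t:.2f}")
```

A ratio above 1 marks a segment that looks bigger than its data warrants; below 1, smaller.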

Redo_oscar

Reference: "News you need to know", Business Week, Jan 28 2008.

Dec 25, 2007

Doctoring charts

Reader Chris P. alerted us to a fascinating post from Errol Morris' blog, which presents results in graphical form from a readers' poll related to this other post.  This other post deals with a pair of photographs taken during wartime, previously discussed by Susan Sontag and others.  Sontag believed the pair documented a before-and-after setting: it was alleged that the photojournalist shifted some cannon balls from their natural position between takes. 

Morris polled his readers asking them in which order they thought the photos were taken ("on before off", "off before on", "undecided"), and which factors were used to make the decision.  He presented results in two formats, first plotting frequencies in bar charts and then plotting proportions in pie charts.  He preferred the pie chart construct.

Nyt_sontag

Most here would share Chris' reaction: "Oh my.  What people do with Excel."

The biggest problem with these pie charts is the unreasonable baseline.  This is one of those polls that allow respondents to pick any number of factors, and clearly the pie chart creator used the 1,151 responses as the baseline, as opposed to the 910 people who voted.  Consider these two statements:

  • 52% of respondents who decided "on before off" listed "sun shadow" as a decision factor
  • 30% of the decision factors submitted by respondents who decided "on before off" were "sun shadow"

It is tough to figure out what the second statement means.  It is as if a respondent who selects more than one factor gets more than one vote in the final tally.  To put it differently, the 30% is meaningless unless one also knows how many decision factors were selected by each respondent, on average and in distribution.  The 52% is independent of such considerations.
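To make the two baselines concrete, here is a small Python sketch using invented multi-select answers (not Morris's raw data, which I don't have). It computes a factor's share of respondents and its share of all selections, and shows that the two differ by the average number of factors per respondent.

```python
from collections import Counter

# Hypothetical multi-select answers: each inner list is one respondent's factors
answers = [
    ["sun shadow", "cannonballs"],
    ["sun shadow"],
    ["cannonballs", "rocks", "shelling"],
    ["cannonballs"],
]


def factor_shares(answers, factor):
    """Return (share of respondents citing factor, share of all selections that are factor)."""
    # set() guards against a respondent listing the same factor twice
    picks = Counter(f for chosen in answers for f in set(chosen))
    per_respondent = picks[factor] / len(answers)
    per_selection = picks[factor] / sum(picks.values())
    return per_respondent, per_selection


per_respondent, per_selection = factor_shares(answers, "sun shadow")
avg_factors = sum(len(set(a)) for a in answers) / len(answers)
print(f"share of respondents: {per_respondent:.0%}")
print(f"share of selections:  {per_selection:.0%}")
print(f"average factors per respondent: {avg_factors:.2f}")
```

By the same logic, if the 52% and the 30% above describe the same group of respondents, their ratio implies roughly 1.7 factors per respondent in that group.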

Combining the data given in the bar charts and pie charts, one discovers that 469 out of 910 respondents could not decide which photo was taken before the other; besides, these respondents on average expressed 0.9 opinions on the decision factors whereas the respondents who made a decision expressed 1.6 opinions.


A simple illustration of the key decision variables by type of respondent is shown below.

Redo_sontag_2

From this chart, one sees that the number and position of the cannon balls were crucial to at least 50% of those who came to a conclusion.  Sun shadow was much more important to those who decided "on before off" while those who decided "off before on" noticed character artistic, shelling and rocks.  Most other factors did not differentiate the three groups.

Source: "Not Your Mum's Apple Pie Chart", Errol Morris, Dec 18, 2007.


 

Oct 15, 2007

Sense of proportion

[I'm back from vacation.  Will provide my reaction to the responses to the Gelman challenge, and for those who have sent me email, I will work through them soon.]

The NYT commented on a trend among marketers to shift their advertising spending from so-called "measured" media like print and TV to so-called "unmeasured" media like product placements, contests, etc. 
The following chart accompanied the article:

Nyt_ads_2


This construct is akin to a population pyramid; it's great for comparing two groups along one metric, say age groups between males and females.  Here, the two halves aren't comparable groups but two different metrics.  The main metric, that is, the proportion of unmeasured, is not directly depicted: the reader must figure out mentally how much of each bar the black part covers.  Also, the companies are sorted by unmeasured media spending but this leaves the measured spending with a jagged profile, confusing matters.

As for the little white slits on the gray bars, they are admittedly cute but it is difficult to compare the detailed breakdown between print, TV and other media among companies.

The following dot plot gives the two halves equal weight.

Redoads1

(Pink dots are measured, blue unmeasured.)  It's not a very interesting graphic though.  The sense of proportion is still missing.

I settled on a scatter plot which relates the proportion spent on unmeasured to the total amount of spending.  It appears that the largest advertisers had the lowest proportional unmeasured spend while the smallest (among the majors) had the highest.  (It's only a weak correlation: a linear fit yields only 16% R-squared.)
Redoads2
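For anyone who wants to check an R-squared like this by hand, here is a bare-bones least-squares sketch in plain Python. The spending figures are placeholders I made up, not the numbers from the NYT chart, so the result will not match the 16% quoted above.

```python
def linear_fit_r2(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope
    a = mean_y - b * mean_x            # intercept
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot


# Hypothetical advertisers: total spend ($ billions) and unmeasured share of spend
total_spend = [1.0, 1.5, 2.0, 2.5, 3.0, 4.0]
unmeasured_share = [0.45, 0.20, 0.50, 0.25, 0.40, 0.22]

a, b, r2 = linear_fit_r2(total_spend, unmeasured_share)
print(f"slope {b:+.3f}, R-squared {r2:.0%}")
```

A negative slope with a low R-squared is the pattern described above: a downward tendency, but with most of the variation left unexplained.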

Source: "The New Advertising Outlet: Your Life", New York Times, Oct 14, 2007.

Sep 23, 2007

Buffer time

As this report from the Department of Transportation makes clear, congestion on our roadways causes travellers to add "buffer time" to their planned journeys.  So, for instance, one may have to allocate 32 minutes for a trip that would have taken 20 minutes in uncongested traffic if one would like to guarantee on-time arrival.  The 12 minutes would either become time spent sitting on the road or wasted time due to arriving too early.

Buffer time can be applied to graphs too.  Some graphs require readers to spend time fishing out the information.  The chart used to illustrate travel time belongs to this category. 
Dottraveltime_2

The clock analogy fails; in fact, it confuses matters as the hour hand just sits there serving no purpose.  The buffer time between staring and comprehending is too much!

Only four numbers underlie this chart: travel time when uncongested and buffer time to guarantee on-time arrival, for 1982 and 2001.  The following version gets to the point without fuss. 
Redotraveltime

It shows that the travel time increased significantly even under uncongested traffic; worse, the buffer time multiplied.

Reducing buffer time is always good but some buffer time may be inevitable.  In the traffic analogy, to eliminate all buffer time would mean lots of unused capacity.  In the context of graphs, more complicated charts would require more time; the key is whether the reader is rewarded for the time spent figuring out the chart.



Source: "Traffic Congestion and Reliability", Department of Transportation.

Sep 17, 2007

Structuring a chart

Nytmpg

This chart from the NYT was intended to show how the EPA has moved the bar on vehicle mileage ratings: 2008 estimates were lower than 2007 estimates across the board, regardless of manufacturer, model and city/highway.

The chart was built from one basic component, repeated for each model. 
Nytmpgsm_2

I like the discreet gridlines (the white ticks) which enable readers to count off the mileage ratings.

The data are rich: ratings were given along three dimensions (model, year of estimate and city/highway).  Readers can benefit from stronger guidance on where to look for the most pertinent information.  As the chart stands, it is merely a container for the data.  It fails our self-sufficiency test: all the data were printed on the chart, and the bars add little.

In the junkart version, I use knowledge of the data to structure the chart.  First, noting that sedans, hybrids and trucks/SUVs/minivans have different levels of mileage ratings, I clustered the models into three groups.  Second, the city and highway ratings were separated into two columns as I consider the between-model comparisons more important than city-highway comparisons. 
Redompg

The chart is a dot plot, with a vertical tick for 2007 estimates and a dot for 2008 estimates.  It's easy to see that all dots sit to the left of vertical ticks.

More subtly, we can also see that the hybrids appeared to have been penalized more.  Or perhaps, the higher the rating, the larger the downward adjustment...

Source: "Mileage Ratings Are Still Estimates, Though Closer to Reality", New York Times, Sept 16 2007.
