Mar 30, 2008

Small multiples re-imagineered

Nyt_disney

This chart gave me trouble.  I kept staring at it, staring.  Searching for the legend.  What could the several lines, in different colors, represent?  Take a look yourself.




Well, it turns out all three graphs were duplicates.  In each one, a different line was colored dark blue to highlight a particular amusement park.

I have not seen this tactic used before.  This is like a small multiples concept except that every chart contains the same data.  Is it better than having just one chart?

Reference: "Will Disney Keep Us Amused?", New York Times, Feb 10 2008.




PS. [4/6/2008]  Here are two alternative charts contributed by our readers.  See comments below.

Derek suggested using sparklines:

Redo_parks1

Zuil reverted to basics:

Redo_parks2

Mar 17, 2008

Lunar eclipse

Todd B. sent me this pie chart, with a note: "Do the areas in the pie chart represent the numbers?"

Overlapmsnyahoo

The short answer is NO. 

It's also not so simple to figure out the areas of crescents.  The purple area looks tiny compared to the dark green region.  If shown this chart, we get the impression that  Microsoft's intention to absorb Yahoo! will not vastly expand the number of unique visitors to its properties because so many of their current users overlap.



The following is a bar chart representation of the same data.

Redo_overlap

The combined entity will have 31% more users than Microsoft has right now.  Not a bad growth rate for a mature business!  The author of the original post calculated that Microsoft would in effect be paying about $1000 each to acquire these new users.
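A growth figure like the 31% comes from inclusion-exclusion: the merged audience is the sum of the two audiences minus the visitors they share.  Here is a minimal Python sketch of that arithmetic.  The visitor counts are hypothetical placeholders, not the numbers behind the chart, and the function names are mine.

```python
def combined_audience(a, b, overlap):
    """Unique visitors of the merged entity, by inclusion-exclusion."""
    if overlap < 0 or overlap > min(a, b):
        raise ValueError("overlap must be between 0 and the smaller audience")
    return a + b - overlap


def growth_over(base, combined):
    """Fractional growth of the combined audience relative to the base."""
    return (combined - base) / base


# Hypothetical unique-visitor counts, in millions (not the chart's data).
microsoft, yahoo, shared = 100.0, 90.0, 60.0

merged = combined_audience(microsoft, yahoo, shared)
print(f"combined: {merged:.0f}M, growth over Microsoft alone: "
      f"{growth_over(microsoft, merged):.0%}")
```

The larger the overlap, the smaller the growth, which is the point the crescents in the original chart make so hard to read.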

Perhaps the most important question is how one values a "unique visitor".  Has anyone seen any sophisticated analysis on this topic?


 

Feb 25, 2008

Playful and exploratory

I share reader Bernard L.'s enthusiasm for this very imaginative chart, courtesy of the graphics people at NYT.  The chart captures the ebb and flow of weekly movie receipts over the last two decades.
Nyt_films
The details that particularly interest me include:

  • The addition of area colors (on top of lines) serves to highlight box office successes; this really helps readers sort out the massive amount of data
  • Nicely spaced text (and dots) does not interfere with our reading of the chart
  • The hiding of text for less important films, plus taking advantage of interactivity to show their titles if the reader mouses over the respective areas

All of the above indicate a keen sense of foreground versus background.  Besides, the authors had the good sense to speak of inflation-adjusted box office sales; I'm tired of the movie industry proclaiming higher sales each year when ticket prices are rising, and the population is growing.

This is another chart where more data do not easily translate into better communication (see my guest post at Flowing Data).  While I like the playful nature of the interactive chart, it is left to the reader to discover the information buried in the data, such as the assertion in the header that Oscar-winning films typically take time to attain box-office success while many blockbusters do not Oscars make.

In this presentation, it is challenging to compare the total receipts of one film versus another (this requires comparing oddly shaped, partially obscured areas).  It is also hard to compare across years since the data is spread out over a lot of space.

There may really be two types of graphics: the one like the example here which is a dictionary and designed for exploration; and the other kind where the designer has selected a subset of the data to make a specific point.

Reference: "The ebb and flow of movies", New York Times, Feb 23 2008.

Feb 10, 2008

Ordering and grouping

The Times reported that January retail sales generally disappointed, and consumers showed a preference for discount retailers over department stores.

Nyt_retailjan


Redo_retailjan

Taking the bar chart on the right, re-ordering by change in same-store sales, and grouping companies by type of retailer, we can present the data to match the text more closely.  The divergent performance between discount retailers and department stores is readily visible.
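The re-ordering and grouping described above can be sketched in a few lines of Python.  The retailer names, types and percentages below are hypothetical placeholders, not the figures from the Times chart, and ordering the groups by their average change is my assumption.

```python
# Hypothetical rows: (retailer, type of retailer, % change in same-store sales).
# Names and numbers are placeholders, not the figures from the article.
ROWS = [
    ("Store A", "department", -7.1),
    ("Store B", "discount", 0.5),
    ("Store C", "department", -1.2),
    ("Store D", "discount", 2.3),
]


def group_and_order(rows):
    """Group rows by retailer type, sort each group by change (best first),
    and order the groups by their average change (best first)."""
    groups = {}
    for name, kind, change in rows:
        groups.setdefault(kind, []).append((name, change))
    for members in groups.values():
        members.sort(key=lambda member: member[1], reverse=True)

    def average_change(item):
        _, members = item
        return sum(change for _, change in members) / len(members)

    return sorted(groups.items(), key=average_change, reverse=True)


for kind, members in group_and_order(ROWS):
    print(kind)
    for name, change in members:
        print(f"  {name:10s} {change:+.1f}%")
```

Plotting the bars in this order puts each type of retailer in one block, so the gap between blocks is what the eye picks up first.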

Reference: "Weak January dashed retailers' gift-card hopes", Feb 8 2008.

 

Jan 24, 2008

Oscar diseconomy

Oscar

Business Week dissected the beneficiaries of the Oscar show as shown on the right.  Although this doesn't work well as a data graphic, if thought of as a variant on the data table, it is more engaging for readers.

Let's have some fun with the Oscar statue.  First, putting a bar chart next to the statue confirms that the height of the segments (rather than the area) is in proportion to the dollar values (below left).

Tufte, Chambers and others have shown that our eyes react to the areas, not heights.  So next, I estimated the areas but stretched them out into segments of equal width.  Squeezing the entire column back down to the height of the statue, the following chart (below right) puts perceived proportions next to the true proportions, displaying visually the extent of distortion. 
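For readers who want to try the same exercise, here is a rough Python sketch of the stretch-and-squeeze step.  It assumes each segment can be summarized by its height and an average width; the numbers are hypothetical, not my measurements of the statue.

```python
def shares(values):
    """Each value as a fraction of the total."""
    total = sum(values)
    if total <= 0:
        raise ValueError("values must sum to a positive number")
    return [v / total for v in values]


def equal_width_heights(segments, total_height):
    """Heights the segments would have in a column of constant width and
    the given total height, if height were proportional to original area.

    segments: list of (height, average_width) pairs."""
    area_shares = shares([h * w for h, w in segments])
    return [share * total_height for share in area_shares]


# Hypothetical (height, average width) per segment, top to bottom.
SEGMENTS = [(4.0, 1.0), (3.0, 2.0), (3.0, 3.0)]

statue_height = sum(h for h, _ in SEGMENTS)
true_heights = [h for h, _ in SEGMENTS]
perceived = equal_width_heights(SEGMENTS, statue_height)

for true, seen in zip(true_heights, perceived):
    print(f"true {true:.2f}  perceived {seen:.2f}  distortion {seen - true:+.2f}")
```

Wide segments (the base, in this made-up example) come out taller in the perceived column than in the true one, and narrow segments come out shorter.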

Redo_oscar

Reference: "News you need to know", Business Week, Jan 28 2008.

Jan 15, 2008

Water and wine

Marketers have always argued that price signals quality; this leads to the startling idea that one should just set a high price. 

If you don't believe it, note how Coca-Cola and Pepsi turned tap water into a premium-priced $1.7 billion market.  As we now know, Dasani and Aquafina are just bottled tap water.

Wine_tasting

Even if one could turn water into wine, researchers have now discovered that the same rule applies.  Unlike most scholarly articles, theirs actually included a well-made chart to illustrate the experiment.

Testers were given the same wine but told that it cost either $10 or $90, and their brain activity was measured.  The chart showed that those who thought it cost $90 (green line) had a much more favorable response to the wine than those who thought it cost $10 (blue).

A standard way to display this information is a data table that spells out every estimate and its standard error, plus some asterisk or bolding scheme to indicate statistical significance.  Visualization is far superior.

For more examples, see Gelman's paper or Kastellec and Leoni's paper.

Reference: "Study: $90 wine tastes better than the same wine at $10", News.com, Jan 14, 2008.

Nov 06, 2007

The eyeball test

This set of graphs was used by the NYT to discuss changes in U.S.  spending patterns over time.  For this post, I am focusing on the bottom left and bottom right graphs.  One shows spending on energy as a percent of GDP; the other, on "nonresidential structures" (aka, commercial buildings).

Nyt_spending

At first glance, spending on energy and that on commercial buildings look very similar in shape (see above or below left).  Alas, this "eyeball test" doesn't work very well with time series data.  Let's investigate further.

Redospend1_2

"Standardizing" the data (above right) tells us whether the swings are unusual or not in the history of the data.  So in the 1980s, commercial building spending spiked to more than three standard deviations above the historical average.  Generally speaking, a standardized value of 3 is taken to mean highly unusual.

Notice that the peaks of the left graph had equal heights but on the right graph, energy spending peaked only above two while commercial building spending rose above three.  This is because energy spending has been more volatile historically, so it takes larger jumps (or plunges) to count as "unusual" movements.  This information is hidden in the unstandardized version.
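Standardizing means subtracting the historical average from each value and dividing by the historical standard deviation.  Here is a minimal Python sketch; it assumes the population standard deviation (the sample version would work just as well), and both series are made up to show two equal peaks that standardize differently.

```python
from statistics import mean, pstdev


def standardize(series):
    """Convert each value to standard-deviation units from the series mean."""
    center = mean(series)
    spread = pstdev(series)  # assumption: population standard deviation
    if spread == 0:
        raise ValueError("a constant series cannot be standardized")
    return [(x - center) / spread for x in series]


# Hypothetical series with the same peak (6) but different volatility.
STEADY = [3, 3, 3, 3, 6, 3, 3, 3]
VOLATILE = [1, 5, 2, 4, 6, 1, 5, 2]

steady_peak = max(standardize(STEADY))
volatile_peak = max(standardize(VOLATILE))
print(f"steady series peak:   {steady_peak:.2f} standard units")
print(f"volatile series peak: {volatile_peak:.2f} standard units")
```

The same raw peak counts as more unusual in the steady series than in the volatile one, which is the effect described above for commercial building versus energy spending.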

Further, since we are concerned with long-term trends, let's take a look at five-year moving averages (below right): in other words, each time point is the average of the preceding five years' worth of data.
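A trailing moving average is simple to compute.  The sketch below assumes the five-year window ends at (and includes) the current year, so the first four years have no smoothed value; the input numbers are placeholders.

```python
def trailing_average(series, window=5):
    """Average of each point and the (window - 1) points before it.
    The output starts at the first point that has a full window."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(series[end - window + 1 : end + 1]) / window
        for end in range(window - 1, len(series))
    ]


# Hypothetical yearly values (not the spending data).
values = [1, 2, 3, 4, 5, 6]
print(trailing_average(values, window=5))
```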

Redospend2

The fluctuations have been smoothed out and the peaks are no longer as high.  Glancing at this chart, we may still conclude that the spending patterns are quite similar -- especially in the period prior to 1995.

But is that really the case?  Zooming in on the 1980s, we may mistakenly think the two lines are "close together" if our eyes read the horizontal distance and/or area between the curves, rather than focusing on the vertical distance.  The arrows on the bottom left chart depict this difference.  To make things clearer, the bottom right chart plots the vertical distances between the two lines.

Redospend3

Observe that the difference expanded to above 1 unit in the late 1980s.  A difference of one unit is very large in the standardized scale (of "unusualness") since 0 is business as usual and 3 is "highly unusual".
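The vertical distance is just the year-by-year difference between the two standardized series.  A short Python sketch, with hypothetical standardized values standing in for the real ones and a threshold of one unit taken from the discussion above:

```python
def vertical_gaps(series_a, series_b):
    """Point-by-point difference between two series on the same time axis."""
    if len(series_a) != len(series_b):
        raise ValueError("series must cover the same time points")
    return [a - b for a, b in zip(series_a, series_b)]


def years_with_large_gap(years, gaps, threshold=1.0):
    """Years in which the absolute gap exceeds the threshold."""
    return [year for year, gap in zip(years, gaps) if abs(gap) > threshold]


# Hypothetical standardized values (not read off the charts).
YEARS = [1985, 1986, 1987, 1988]
BUILDINGS = [2.4, 2.1, 1.8, 1.5]
ENERGY = [2.0, 1.2, 0.6, 0.3]

gaps = vertical_gaps(BUILDINGS, ENERGY)
print(years_with_large_gap(YEARS, gaps))
```

Plotting `gaps` against `YEARS` gives the kind of difference chart shown on the bottom right.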

Eyeballing the two time series would lead us to believe that the two series are similar but we run the risk of underestimating the differences as illustrated here.


Source: "Auto Sector's role Dwindles, and Spending Suffers", New York Times, Nov 3 2007.

Oct 17, 2007

Points of comparison

Econ_mortgage

In light of the current housing crisis, arising from mortgage defaults, I pulled this graphic from a Jan 2007 opinion piece that plotted historical default rates of mortgages.  Notice the high degree of stretching on the vertical axis that exaggerates the volatility: essentially, the annual delinquency rate ranged from 1.75% to 2.65% during the last six years or so.  One might be forgiven for thinking that a 2% default rate is quite acceptable.

Nyt_mortgage_2

Compare the above chart to the pair that showed up in the NYT in Oct 2007 (see right).  The default rates here are in the 10-20% range, very alarming indeed.

The two graphics illustrate a key issue of "aggregation" in statistical analysis.  The first graphic is super-aggregated: all types of mortgages of all ages are put together to calculate each year's default rate.  The second graphic homes in on subprime mortgages only.

More importantly, the second graphic presents data in "vintages".  Each line represents loans originated during a particular year (a "vintage").  This establishes comparability.  On the first chart, each point in time represents the default rate of mortgages averaged over all ages (some loans may be only a few months old; others may be 15 years old).  Since the default rate is much higher for very young mortgages than for older mortgages, such averaging hides crucial information.
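To see how averaging over ages can hide information, consider a toy Python example.  The default rates by loan age and the loan counts are hypothetical; the only thing taken from the discussion above is that young loans default at a higher rate than old ones.

```python
def pooled_rate(loans_by_age, rate_by_age):
    """Overall default rate when loans of all ages are averaged together.

    loans_by_age: {age_in_years: number_of_loans}
    rate_by_age:  {age_in_years: default_rate_at_that_age}"""
    total = sum(loans_by_age.values())
    if total == 0:
        raise ValueError("no loans to average over")
    defaults = sum(count * rate_by_age[age] for age, count in loans_by_age.items())
    return defaults / total


# Hypothetical: young loans default at 10%, seasoned loans at 2%.
RATE_BY_AGE = {0: 0.10, 5: 0.02}

# Same age-specific rates, different mix of young and old loans.
MOSTLY_OLD = {0: 100, 5: 900}
HALF_YOUNG = {0: 500, 5: 500}

print(f"mostly old loans: {pooled_rate(MOSTLY_OLD, RATE_BY_AGE):.1%}")
print(f"half young loans: {pooled_rate(HALF_YOUNG, RATE_BY_AGE):.1%}")
```

The pooled rate moves even though nothing changed at any given age, so a pooled chart cannot tell a shift in the age mix apart from a genuine deterioration.  Comparing vintages at the same age removes that ambiguity.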

Overall, the NYT graphic very effectively conveys the alarming trend of new mortgages performing much worse, especially those originated in 2007.

It can benefit from two slight edits: adding a few more years, and using vertical lines (the most critical comparisons are default rates for loans of a given age!).  Something like this...

Redo_mortgage


Sources: "As Defaults Rise, Washington Worries", New York Times, Oct 16 2007; "Mounting Mortgage Credit Problems", economy.com, Jan 23 2007.

Oct 15, 2007

Sense of proportion

[I'm back from vacation.  Will provide my reaction to the responses to the Gelman challenge, and for those who have sent me email, I will work through them soon.]

The NYT commented on a trend among marketers to shift their advertising spending from so-called "measured" media like print and TV to so-called "unmeasured" media like product placements, contests, etc. 
The following chart accompanied the article:

Nyt_ads_2


This construct is akin to a population pyramid; it's great for comparing two groups along one metric, say age groups between males and females.  Here, the two halves aren't comparable groups but two different metrics.  The main metric, that is, the proportion of unmeasured, is not directly depicted: the reader must figure out mentally how much of each bar the black part covers.  Also, the companies are sorted by unmeasured media spending but this leaves the measured spending with a jagged profile, confusing matters.

As for the little white slits on the gray bars, they are admittedly cute but it is difficult to compare the detailed breakdown between print, TV and other media among companies.

The following dot plot gives the two halves equal weight.

Redoads1

(Pink dots are measured, blue unmeasured.)  It's not a very interesting graphic though.  The sense of proportion is still missing.

I settled on a scatter plot which relates the proportion spent on unmeasured to the total amount of spending.  It appears that the largest advertisers had the lowest proportional unmeasured spend while the smallest (among the majors) had the highest.  (It's only a weak correlation: a linear fit yields only 16% R-squared.)
Redoads2
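For a straight-line fit with one predictor, R-squared is the square of the correlation between the two variables, so a 16% R-squared corresponds to a correlation of about 0.4 in absolute value.  A minimal Python sketch of the calculation follows; the data points are placeholders, not the advertisers' figures.

```python
from statistics import mean


def r_squared(x, y):
    """R-squared of the least-squares line of y on x (one predictor),
    computed as the squared correlation coefficient."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be equal-length with at least 2 points")
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    return sxy ** 2 / (sxx * syy)


# Hypothetical points: total spending vs. proportion unmeasured.
total_spend = [1, 2, 3, 4]
unmeasured_share = [1, 3, 2, 4]
print(f"R-squared: {r_squared(total_spend, unmeasured_share):.2f}")
```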

Source: "The New Advertising Outlet: Your Life", New York Times, Oct 14, 2007.

Sep 17, 2007

Structuring a chart

Nytmpg

This chart from the NYT was intended to show how the EPA has moved the bar on vehicle mileage ratings: 2008 estimates were lower than 2007 estimates across the board, regardless of manufacturer, model and city/highway.

The chart was built from one basic component, repeated for each model.

Nytmpgsm_2

I like the discreet gridlines (the white ticks), which enable readers to count off the mileage ratings.

The data is rich: ratings were given along three dimensions (model, year of estimate and city/highway).  Readers can benefit from stronger guidance on where to look for the most pertinent information.  As the chart stands, it is merely a container for the data.  It fails our self-sufficiency test: all the data were printed on the chart, and the bars add little.

In the junkart version, I use knowledge of the data to structure the chart.  First, noting that sedans, hybrids and trucks/SUVs/minivans have different levels of mileage ratings, I clustered the models into three groups.  Secondly, the city and highway ratings were separated into two columns as I consider the between-model comparisons more important than city-highway comparisons.

Redompg

The chart is a dot plot, with a vertical tick for 2007 estimates and a dot for 2008 estimates.  It's easy to see that all dots sit to the left of vertical ticks.

More subtly, we can also see that the hybrids appeared to have been penalized more.  Or perhaps, the higher the rating, the larger the downward adjustment...

Source: "Mileage Ratings Are Still Estimates, Though Closer to Reality", New York Times, Sept 16 2007.
