
Jittering lines

A reader alerted me to this NYT chart a few weeks back.  The chart plots daily changes in stock index prices (gray lines) and yearly changes (color blocks). 

Nyt_volatility

The blue blocks represent bad down years, but notice that the daily changes during many of those periods give no such impression.  In fact, the gray lines look about evenly balanced on both sides of 0, and yet the annual tallies swing from positive to negative quite frequently.  It is by no means true that one exceptional down day predicts a down year.

The problem arises from cramming too much data into too small a space.  We can't judge the density of the lines on paper and so can't judge whether there were more up lines than down lines.
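What the eye cannot do here, a few lines of code can: count the up and down days and compound the daily changes into an annual figure. The sketch below uses made-up daily returns, not the NYT data, and the function name is only illustrative.

```python
def summarize_year(daily_returns):
    """Return (up_days, down_days, annual_return) for daily fractional changes."""
    up_days = sum(1 for r in daily_returns if r > 0)
    down_days = sum(1 for r in daily_returns if r < 0)
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r  # compound each day's change
    return up_days, down_days, growth - 1.0


# Hypothetical year (made-up numbers): more up days than down days,
# but the down days are larger, so the year still ends lower.
made_up_year = [0.004] * 130 + [-0.006] * 120
up, down, annual = summarize_year(made_up_year)
print(up, down, round(annual, 3))
```

In this invented example the up lines outnumber the down lines, yet the year is negative, which is the kind of thing the crowded gray lines cannot show.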

This issue is similar to the question of whether jittering still works with large data sets.

Source: "The Pulse of Uncertainty", New York Times, Jan 4 2008.

 


The buzz

What other statistics and/or graphics blogs are chattering about:

1. Andrew Gelman asks: Does jittering suck?

2. Gelman reports progress on some (really cool) simple statistical methods that solve practical problems, such as scaling regression coefficients and using priors to deal with separable data in logistic regression; he also tells us which commonly used methods he does not like

3. Rindskopf's rules for statistical consulting (delivered at a mini-symposium organized by Gelman)

4. Information Aesthetics looks at travel-time maps (which we also discussed here)

5. EagerEyes considers the makeup of past U.S. presidents (see also older post here), and argues for "expressive visualization", or charts that opine






Oscar diseconomy

Business Week dissected the beneficiaries of the Oscar show, as shown on the right.  Although this doesn't work well as a data graphic, if thought of as a variant on the data table, it is more engaging for readers.

Let's have some fun with the Oscar statue.  First, putting a bar chart next to the statue confirms that the height of the segments (rather than the area) is in proportion to the dollar values (below left).

Tufte, Chambers and others have shown that our eyes react to the areas, not heights.  So next, I estimated the areas but stretched them out into segments of equal width.  Squeezing the entire column back down to the height of the statue, the following chart (below right) puts perceived proportions next to the true proportions, displaying visually the extent of distortion. 
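The comparison comes down to two sets of shares: one from segment heights, one from segment areas. Here is a rough sketch of that arithmetic, with hypothetical segment sizes (not measured from the Business Week graphic) and area approximated as height times average width.

```python
def shares(values):
    """Convert raw values into proportions of their total."""
    total = sum(values)
    return [v / total for v in values]


# Hypothetical statue segments, top to bottom: (height, average width).
# Units are arbitrary; these are not the real measurements.
segments = [(2.0, 1.0), (3.0, 2.0), (5.0, 1.5)]

heights = [h for h, _ in segments]
areas = [h * w for h, w in segments]  # rough area of each segment

height_shares = shares(heights)  # what the dollar values encode
area_shares = shares(areas)      # what the eye tends to react to

for hs, ar in zip(height_shares, area_shares):
    print(f"true {hs:.1%}  perceived {ar:.1%}")
```

The wider the statue is at a given segment, the more that segment's perceived share exceeds its true share.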

Redo_oscar

Reference: "News you need to know", Business Week, Jan 28 2008.


Football rankings 1.1

Long-time reader Jon sent in a different view of the QB data.  He uses a nifty tool in Excel to generate a parallel coordinates plot (also called profile plot) on which pairs of QBs can be highlighted and compared.

This chart exploits the foreground/background concept very nicely.  One way to deal with abundant data is to highlight only those bits that matter to the question at hand, and relegate the rest to the background.

The gray lines in the background provide context without grabbing undue attention. He also converted every metric to a scale between 0 and 1, similar to what we did with our version.
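One common way to put every metric on a 0-to-1 scale is min-max scaling. The post does not say which method Jon used, so treat this as an assumption; the numbers are invented for illustration.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a metric with no spread is placed at the midpoint.
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical completion percentages for three quarterbacks
completion_pct = [55.0, 60.0, 65.0]
scaled = min_max_scale(completion_pct)
print(scaled)
```

Once every metric is scaled this way, the axes of a parallel coordinates plot become comparable, so a line sitting low across the chart means below the pack on most metrics.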

The Eli Manning / Philip Rivers comparison shows that both QBs were below average on most of these metrics, with Manning near the bottom of each.





Football rankings 2

Nyt_nfloffense

The above chart is another one in the NYT series on the NFL playoffs.  It evaluates the mix of passing and rushing attempts by offense.  The convoluted way by which the caption strains to tell a story indicates trouble ahead:

Of the three playoff teams that threw the ball the most, two of them come from cities known for cold weather.  Conversely, of the three teams that ran the most, two of them play their home games in milder weather.

The implication is that teams from cold-weather cities are supposed to want to rush more, and vice versa.  And the data (total of six samples) pointed to the opposite.

This presentation suffers from low data-to-ink ratio:  too much ink is spilled over not much data.  The designer arbitrarily picks one of the two variables (passing attempts, rushing attempts) as the primary, sorting variable -- trace the orderly green diamonds on the right chart.  This makes it hard to see a pattern in the brown diamonds.  As usual, a scatter plot works much better with two data series.

In the junkart version, the raw numbers of attempts are converted into the proportion of attempts that were passing versus rushing.  This easy move immediately collapses the two dimensions into one.  Now, we have room to include an extra variable which matters: the average amount of snowfall in these cities.
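The conversion is a one-liner per team. A minimal sketch, with made-up team names and attempt counts rather than the NYT figures:

```python
def passing_share(pass_attempts, rush_attempts):
    """Fraction of all offensive attempts that were passes."""
    return pass_attempts / (pass_attempts + rush_attempts)


# Hypothetical teams: (passing attempts, rushing attempts)
teams = {"Team A": (600, 400), "Team B": (450, 450)}

pass_shares = {name: passing_share(p, r) for name, (p, r) in teams.items()}
for name, share in pass_shares.items():
    print(f"{name}: {share:.0%} passing")
```

The rushing share is just one minus the passing share, so a single number per team carries the whole pass/rush mix.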

So what does the data say about the relationship between propensity to pass and cold weather?  There appears to be very little relationship as the dots are all over the chart.  In particular, the teams playing in cities with the highest snowfall span the range of passing percents; similarly, those playing in lowest-snowfall cities also span the range of passing percents. 

The caption ignores all the blue dots, focusing only on the gray ones.  A more direct examination of the relationship reveals the folly of the so-called "not so conventional wisdom".

References: "NFL Offences Undergo a Thaw in Thinking", New York Times, Jan 5 2008; government snowfall statistics.


Water and wine

Marketers have always argued that price signals quality; this leads to the startling idea that one should just set a high price. 

If you don't believe it, note how Coca Cola and Pepsi turned tap water into a premium-priced $1.7 billion market.  As we now know, Dasani and Aquafina are just bottled tap water.

Even if one can turn water into wine, researchers have now discovered that the same rule applies.  Unlike most scholarly articles, theirs actually included a well-made chart to illustrate the experiment.

Testers were given the same wine but told that it cost either $10 or $90, and their brain activity was measured.  The chart showed that those who thought it cost $90 (green line) had a much better sensation of the wine than those who thought it cost $10 (blue).

A standard way to display this information is a data table that spells out every estimate and its standard error, plus some asterisk or bolding scheme to indicate statistical significance.  Visualization is far superior.

For more examples, see Gelman's paper or Kastellec and Leoni's paper.

Reference: "Study: $90 wine tastes better than the same wine at $10", News.com, Jan 14, 2008.


Football rankings 1

The Times' sports pages made wise use of graphics in a series of NFL articles recently.  Here is a rank plot (below left) comparing Jaguars quarterback David Garrard to seven other quarterbacks who started the weekend of January 5.

Nyt_garrard

Simple and effective, this chart does not fuss around in showing us where Garrard ranks relative to the others. 

The junkart revision (below right) plays with a different scale: the spacing between the tick marks represents proportional differences in the underlying metric.  This gives us a little more: for example, Garrard's second rank in completion percentage is less remarkable than first thought, as he essentially tied with the 3rd and 4th best while the top six were bunched between 60 and 65 percent.

But Garrard's touchdown-to-interception ratio stands out, as the next best quarterback attained only about half his ratio.  (Todd Collins, who had not thrown an interception until that time, was omitted; he also had started only four games.)


References: "Two Dreams (One Big, One Tiny) Come True", New York Times, Jan 4 2008; ESPN statistics.


Maps and dots 2

In response to Derek's comment, here is a bit more show and tell on chained dots, rows of dots, stacked bars and rows of bars.

Redo_brainmap2

Chained dots and stacked bars require a legend.  The rows of dots or bars permit a more efficient labeling.  Rows of bars do require a scale but this object is more scalable than rows of dots; imagine longer and longer rows of dots.  With bars, we just adjust the scale.

Chaining and stacking make it easier to compare total prizes as opposed to individual prizes.  So it depends on which comparisons we want to emphasize.  My interest in this map happens to be league-table style comparisons, and because these prizes (aside from the Nobel) are so specific to a field, knowing the distribution across fields is important too.

Chicago is indeed a curious omission; is it possible they are not counting Economics Nobels?



Maps and dots

Happy New Year

The cosmos of university ranking got more interesting recently with the advent of the "brain map" by Wired magazine.  This new league table counts the total number of winners of five prestigious international prizes (Nobel, Fields, Lasker, Turing, Gairdner) in the past 20 years (up to 2007); and the researcher found that almost all winners were affiliated with American institutions.
Wired_brainmap
As discussed before, the map is a difficult graphical object; it acts like a controlling boss.  In this brain map, the concentration of institutions in the North American land mass causes over-crowding, forcing the designer to insert guiding lines drawing our attention in myriad directions.  These lines scatter the data asunder, interfering with the primary activity of comparing universities.

The chain of dots object cannot stand by itself without an implicit structure (e.g. rows of 10).  This limitation was apparent in the hits and misses chart as well.  Sticking fat fingers on paper to count dots is frustrating.  Simple bars allow readers to compare relative strength with less effort.

Redo_brainmap_2

In the junkart version, we ditched the map construct completely, retaining only the east-west axis.  [For lack of space (and time), I omitted the US East Coast and Washington-St. Louis.]  With this small multiples presentation, one can better contrast institutions.

To help comprehend the row structure, I inserted thin strikes to indicate zero awards. A limitation of the ranking method is also exposed: UC-SF has a strong medical school and, not surprisingly, it has received a fair share of Nobel (medicine), Lasker and Gairdner prizes; but at other institutions, zero Lasker and Gairdner prizes could be due to a less competitive medical school or no medical school at all!


Reference: "Mapping Who's Winning the Most Prestigious Prizes in Science and Technology", Wired magazine, Nov 2007.