
Political theater

Jens, a long-time reader, tried to re-make the boring data tables used to report poll data.  Here is an example from USA Election Polls (left) and his enhanced version (right).


Like Jens, I find most tabular presentations of poll data underwhelming: too much data hides the useful information.  For example, the pollster and polling date data provide context for super-serious poll watchers to interpret the results; however, they are not presented in a way that actually helps readers.  Read further for versions that bring out this data much better.

Meanwhile, Jens' revision uses color and ordering to bring out the current state of affairs.  The addition of electoral votes allows us to understand the relative weight of each row, countering the weakness of the tabular format, that each row has the same height, implying erroneously that they have the same importance.

There are a number of good web-sites where this type of data is presented in attractive ways.

I have been a fan of Political Arithmetik, which made great use of the pollster and polling date data mentioned above.  Those data have been averaged to show the overall trend while the individual poll results are plotted as dots in the background.  The polling date data is embedded in the horizontal positions of the dots.  Even more impressively, the margins of error are presented.  Remarkably, this race has been a statistical tie for all these months, the 95% lower limit never quite making it above the zero level.
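To make the "statistical tie" idea concrete, here is a minimal sketch of a 95% interval for the lead in a single poll.  The shares and sample size are hypothetical, and the formula is the standard multinomial approximation for the difference of two shares from the same sample; it is not the trend-averaging method used on the site.

```python
import math

def lead_interval(p_a, p_b, n, z=1.96):
    """Approximate 95% interval for the lead (p_a - p_b) in one poll of n respondents."""
    lead = p_a - p_b
    # Variance of the difference of two shares drawn from the same sample
    se = math.sqrt((p_a + p_b - lead ** 2) / n)
    return lead - z * se, lead + z * se

# Hypothetical poll: 48% vs 45% among 1,000 respondents
low, high = lead_interval(0.48, 0.45, 1000)

# The race is a statistical tie when the interval for the lead includes zero
is_tie = low <= 0.0 <= high
print(f"lead interval: {low:+.3f} to {high:+.3f}, tie: {is_tie}")
```

A three-point lead in a poll of this size still leaves the lower limit below zero, which is the situation described above.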


Another great site is fivethirtyeight.com.  Below, they essentially turned Jens' enhanced table into a map.  The legend on the right perhaps represents what they call "East Coast bias"?  All of Nathan's graphs are very attractively produced; I just wish he'd put more labels on them (such as the differentials corresponding to shades of red and blue).


Bubbles of the same size

Frederic M. sent in this chart, together with his commentary.

He wrote:

Bubbles across rows have vastly different numbers but their circles are of identical size (or vice versa). It borders on the ridiculous that all bubbles of the US row have the same size... The question if teenage birth rates and teen sex are correlated cannot be eye-balled with this kind of display. The fact that you cannot compare across rows make this an instance of “chart junk”.

I add:

White spaces -- always dangerous.  Does lack of bubble imply no data or no abortions/sex?

Sorting -- this is what Howard Wainer called "Arizona first" with a twist (United States).

Loss aversion -- would U.S. readers be resentful if countries like Iceland are excluded?  A much reduced version comparing the U.S. to, say, Canada, the U.K., Japan and Germany may yield more information for the reader.

Sufficiency -- if all the data are printed as in a table, why do we need the bubbles?

Reference: "Let's Talk About Sex", New York Times, Sep 6 2008.


What I have been reading:

"Google Co-Founder Has Genetic Code Linked to Parkinson’s" (New York Times)

Studies show that his likelihood of contracting Parkinson’s disease in his lifetime may be 20 percent to 80 percent, Mr. Brin said.

Talk about useless statistics.  An interval that runs from 20 percent to 80 percent tells us almost nothing.

"How Wall Street Lied to Its Computers" (New York Times)

So where were the quants?

Risk manager must be the most miserable job ever.  When traders were raking in the millions, quants didn't get the credit (or the pay), according to Taleb, etc.  Now when the market is imploding, they get the blame?

"Competing Tax Plans: two perspectives" (Freakonomics blog)

Three ways to plot the distribution of tax cuts across income brackets.  I don't see why the first, and simplest, chart has a problem.  The two revisions use bar charts with varying-width bars, which put excessive focus on the number of people, in one case, and the base income, in the other case.  It is not easy to compare the areas of a tall, thin bar and a short, wide bar.  The income group labels also present a problem of "loss aversion": why not lose the precision? or just report the percentiles?

Loss aversion

Loss aversion manifests itself in chart-making, as it does in economics.  In chart-making, loss aversion can be defined as the tendency to avoid losing data at any cost.  Given a rich data set, designers often make the mistake of cramming as much data into the chart as possible.  This is taking Tufte's concept of maximizing the data-ink ratio to the extreme, and it often leads to awkward, muddled charts.

Gelman provided a great example of this recently.  See here

Every piece of data is given equal footing, which results in nothing standing out.  The reader gasps for air.

Here is a recent example from the New York Times, in which the designer showed admirable restraint.

The best evidence is the set of small multiples shown at the bottom.  These give the amount of phosphorus flowing into the lake annually since 1973, as measured from four locations.

The point is that the pollution has been most serious on the northern shores, especially in recent years.  Thus, the Florida plan focusing on the southern region is likely to make limited impact.

The choice of vertical lines is smart, as the typical time-series connected-line chart would jump up and down crazily.  A simple vertical axis marks the amounts, avoiding the temptation to print all the data.  The designer realizes it is the trend, rather than individual values, that is the issue.

Taken together, the three components tell a good story.  This is a well-executed effort.  The Times once again proves itself the leader in developing sophisticated graphics.

Reference: "Florida Deal for Everglades May Help Big Sugar", New York Times, Sep 13 2008.

As simple as possible but no simpler

In this political season, we are bombarded with soundbites.  For example, we keep hearing about the Red States versus the Blue States.   By coloring states, we endorse the notion that Red State people are conservatives and Blue State people are liberals.  Using sophisticated analyses of real data, Prof. Gelman tells us why this notion is wrong in his recently published book "Red State, Blue State, Rich State, Poor State".

The key chart is below (courtesy of Gelman). 


There is indeed a red-blue divide, but the gap is much wider among rich voters (right ends of lines) and the middle class (mid-points of lines) than among poor voters (left ends).  Poor voters are almost everywhere liberals on economic issues, and on social issues, they are moderates leaning conservative.  Rich voters, by contrast, are very polarized on both economic and social fronts.

This is one of those charts that express their messages without fuss.  A lot of data was analyzed but only the statistical conclusion is portrayed, much of the hard work hidden.  Contrast this with data-rich infographics.  In my view, this chart fits the description: as simple as possible but no simpler.

It is not that the common notion of a Red-Blue divide is entirely wrong; rather, the chart suggests that such a view is too simple.  This divide is strongly present among middle-class and rich voters but not so much among poor voters.  To be even more nuanced, for middle-class voters, the Red-Blue divide is manifested almost exclusively along the social dimension; middle-class voters are economic moderates everywhere, leaning liberal.

More technically, the above is summarized by saying that the interaction effect (between state residence and wealth) is significant and cannot be ignored.  Prof. Gelman is one of the strongest advocates of always including interaction terms in regression of social and economic data.  And here is an example of why.

The concept of interaction is tough to explain to the business audience, I have found.  In presenting one such chart recently, I found the audience confused by the lines.  Indeed, the lines, their slopes, etc. do not contain any information.  They merely serve as guides to how to read the chart.  In order to see the interactions, our eyes need to trace a path from one dot to another dot, literally tracing the lines on the chart.

Another way to discuss interactions: the common notion of a Red-Blue divide masks the reality that this divide is of varying importance across income groups.  A view that aggregates income groups is too simplistic.
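For readers who prefer numbers to lines, an interaction can be sketched as a "difference of differences".  The scores below are invented purely for illustration (they are not Gelman's estimates), and the names `scores` and `red_blue_gap` are my own; the point is only that the Red-Blue gap changes with income, and that a single pooled gap hides this.

```python
# Hypothetical average "conservatism" scores by state color and income group.
# These numbers are made up for illustration; they are not from the book.
scores = {
    "blue": {"poor": -0.4, "middle": -0.3, "rich": -0.2},
    "red": {"poor": -0.4, "middle": 0.0, "rich": 0.4},
}

def red_blue_gap(group):
    """Red-state score minus blue-state score within one income group."""
    return scores["red"][group] - scores["blue"][group]

gaps = {group: red_blue_gap(group) for group in ("poor", "middle", "rich")}

# With no interaction, the gap would be identical in every income group,
# so this difference of differences would be zero.
interaction = gaps["rich"] - gaps["poor"]

# Averaging over income groups (equal weights assumed) gives one number
# that describes none of the groups well.
pooled_gap = sum(gaps.values()) / len(gaps)

for group, gap in gaps.items():
    print(f"{group:>6}: gap = {gap:+.2f}")
print(f"interaction (rich gap minus poor gap) = {interaction:+.2f}")
print(f"pooled gap = {pooled_gap:+.2f}")
```

In this made-up example the pooled gap matches only the middle group; it overstates the divide among the poor and understates it among the rich.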

Reference: Red State, Blue State, Rich State, Poor State, by Andrew Gelman (2008).

Lining things up

Guess where I went for vacation (clue in the chart).

This long, narrow country is divided into 15 regions.  In the chart below, an uneven parade of 13 bubbles was used to present some sort of economic projections.  The capital of the country was singled out as the top of the table.


The unevenness has a side effect: the guiding lines are forced to have differing lengths and bewildering turns.  Further, because bubbles have no intrinsic scale, the designer must put all the data onto the map as well, thus failing our self-sufficiency test.

The following bar chart version respects the wide, thin space and yet delivers the data more clearly.  The top version displays all the data while the bottom one uses a simple axis.  The bottom chart is my preference since most readers are probably interested in approximate and relative comparisons, rather than exact projections.  (The map would be better off without colors.)


Reference: "Inversiones entre 2008 y 2012 llegaran a US$ 57 mil millones impulsadas por mineria y energia", El Mercurio, Aug 25 2008.