Jul 01, 2009

A shocking failure to communicate

So said a reader, Stephen B., of the following graphic (note: pdf) in the London Times concerning Andy Murray's recent tennis triumphs.


Lt_murray

How can we disagree?  Shocking?  Yes.  Failure?  Definitely.  Failing to communicate?  No doubt.


Lt_murray_a

Let's start with the five tennis balls at the bottom.  This part fails the self-sufficiency test.  It makes no difference whether the balls (bubbles) are the same size or different sizes.  Readers will look at the data and ignore the bubbles.

Amazingly, the caption said that "Murray has one of the best returns of serve in the game."  And yet, the graphic showed the five players who were better than Murray, and nobody worse!  For readers unfamiliar with tennis statistics, the graphic offers no reference points, such as averages or medians, to help them understand the data.


But that is only the beginning.

Take a look at these two donuts.

Lt_murray_b
(The color scheme, from light to dark, marks the first, second, third and fourth rounds of the tournament.)

So we're told: the 75% of first-serve points won in the fourth round was 25.6% of the sum of the percentages of first-serve points won from first to fourth rounds (75%+70%+71%+76%).  What does this mean?  Why should we care?
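To see why the slice is hard to interpret, here is the arithmetic the donut appears to encode, as a minimal Python sketch. It assumes each slice is one round's percentage divided by the sum of the four percentages listed above; the function name `donut_shares` is made up for illustration. Computed from the rounded figures quoted here, the 75% slice comes to about 25.7%, close to the printed 25.6% (presumably the underlying figures were not rounded). The second call uses a hypothetical change to one of the other values.

```python
def donut_shares(rates):
    """Return each rate as a percentage of the sum of all the rates.

    This mirrors what the donut slices appear to encode: one round's
    percentage divided by the sum of every round's percentage.
    """
    total = sum(rates)
    return [100 * rate / total for rate in rates]


# The four first-serve percentages quoted in the text (75%+70%+71%+76%).
quoted = [75, 70, 71, 76]
print(round(donut_shares(quoted)[0], 1))  # slice for the 75% figure: 25.7

# Hypothetical: lower one of the *other* values. The 75% is untouched,
# yet its slice grows, so the slice says little about that round itself.
altered = [75, 50, 71, 76]
print(round(donut_shares(altered)[0], 1))  # 27.6
```

The slice for the same 75% moves from 25.7 to 27.6 purely because a different round changed, which is one way of answering "What does this mean?": not much.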

The challenge with these two statistics is that they are correlated and have to be interpreted together.  A second serve is played only when the first serve misses; if the first serve goes in, there is no second serve on that point.  Here's one attempt at it, using statistics from the Soderling-Federer match.  It's clear that Federer was better on both serves.

Redo_murray
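Because each rate applies to a different subset of service points, the two combine through the share of first serves that land in. A minimal sketch, assuming the usual definitions (first-serve points won is counted over first serves in; second-serve points won is counted over all points that went to a second serve). The first-serve-in percentage is not shown in the graphic, and the two players below are hypothetical, not Murray, Soderling or Federer.

```python
def service_points_won(first_in, first_won, second_won):
    """Overall share of service points won.

    first_in:   fraction of points on which the first serve went in
    first_won:  fraction of those first-serve points that were won
    second_won: fraction of second-serve points won (all remaining points)
    """
    return first_in * first_won + (1 - first_in) * second_won


# Hypothetical players: A wins more first-serve points, B lands more first serves.
player_a = service_points_won(first_in=0.60, first_won=0.75, second_won=0.50)
player_b = service_points_won(first_in=0.70, first_won=0.72, second_won=0.50)
print(f"A: {player_a:.3f}  B: {player_b:.3f}")  # A: 0.650  B: 0.654
```

Player A has the better first-serve figure, yet B wins a larger share of service points overall, so neither rate can be judged in isolation.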


Reference: "Murray's march to the last eight", London Times.
 





May 12, 2009

Spinning multi-color

The New York Times has a great pointer to the Global Warming Art website.  Its author, Robert Rohde, wants to popularize environmental science by visualizing the data.  There are many interesting charts, and the site is well worth repeated visits.

These pie charts cry out for some re-dressing:

Greenhouse_Gas_by_Sector

The pie charts, the colors, the whole works.  Most troubling is that each pie has its own sorting scheme, and because the text labels were not reproduced in the smaller pies, the reader is sent scrambling around to find the right labels.
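The sorting problem has a mechanical fix: choose one category order, for instance by size in the aggregate pie, and apply it to every breakdown so each category always sits in the same position. A minimal Python sketch; the source names and shares are hypothetical placeholders, not the figures from the Global Warming Art chart.

```python
def ordered_breakdown(breakdown, order):
    """Return (label, share) pairs in a fixed order shared by every chart.

    Categories missing from a breakdown are shown as 0 rather than dropped,
    so every chart lists the same labels in the same positions.
    """
    return [(label, breakdown.get(label, 0)) for label in order]


# Hypothetical shares (percent) by emission source; not the chart's data.
aggregate = {"Source A": 30, "Source B": 15, "Source C": 20, "Source D": 35}
gas_x = {"Source A": 50, "Source B": 25, "Source C": 25}
gas_y = {"Source D": 60, "Source C": 10, "Source A": 5, "Source B": 25}

# Fix the order once, from the aggregate breakdown (largest first).
order = sorted(aggregate, key=aggregate.get, reverse=True)

for name, pie in [("aggregate", aggregate), ("gas X", gas_x), ("gas Y", gas_y)]:
    print(name, ordered_breakdown(pie, order))
```

With a shared order, the labels only need to be printed once, which also removes the scramble for labels on the smaller pies.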

In addition, these pie charts, as with almost every other pie chart, fail the self-sufficiency test.  Without all the data printed next to each sector, the reader is simply unable to judge the size of each sector.

Further, the aggregate data (larger pie) may be less relevant once we realize that the smaller pies show very different patterns.  The following junkart version tries to bring out this fact by treating both dimensions (type of greenhouse gas; source of emission) equitably.

Redo_greenshouse


While I picked on this particular chart, I must say I support Robert's effort and wish him luck in this very well-intentioned project.



Apr 27, 2009

Inspired by Tetris

What should we call this one?  A Tetris chart, perhaps.

Nyt_highschool

In particular, pay attention to the rightmost three pieces: while the shapes look completely different, the actual proportions range only from 6 to 8 percent.

The Tetris chart fails our self-sufficiency test. The only way to read it is to read the data labels.

Since the proportions add up to 100 percent, this multiple-choice question appears to allow only one answer, even though, as the article noted, there were two acceptable answers!  It would be useful to highlight those two choices separately.  We'd also want to see how the question was phrased.

Seen differently, the Tetris chart is a 4x25 matrix with each cell representing one hundredth of the respondents.
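The matrix view explains the odd shapes. A minimal Python sketch, assuming the cells are filled column by column (four cells down, then the next column) with one cell per percentage point; the actual fill order in the Times graphic may differ, and the shares below are hypothetical. A category that starts partway down a column spills into the next one, so blocks of nearly the same size come out with different outlines.

```python
ROWS, COLS = 4, 25  # 100 cells, one per percentage point


def tetris_grid(percentages, rows=ROWS, cols=COLS):
    """Fill a rows x cols grid with category indices, column by column."""
    if sum(percentages) != rows * cols:
        raise ValueError("percentages must add up to rows * cols")
    grid = [[None] * cols for _ in range(rows)]
    # One entry per cell: category i repeated percentages[i] times.
    cells = [i for i, pct in enumerate(percentages) for _ in range(pct)]
    for k, category in enumerate(cells):
        col, row = divmod(k, rows)  # walk down a column, then move right
        grid[row][col] = category
    return grid


# Hypothetical shares (percent); the last three are similar in size.
shares = [40, 25, 14, 7, 8, 6]
for row in tetris_grid(shares):
    print("".join("ABCDEF"[c] for c in row))
```

In the printout, the 7, 8 and 6 percent blocks (D, E, F) each take a different outline even though their sizes barely differ, which is exactly why the shapes mislead.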


Reference: "Name, Please?  High School Seniors Mostly Don't Know", New York Times, April 19 2009.


Mar 28, 2009

Knowing what one is doing

Jess B. sent in some entertainment.


Billshrink

In case the font is too small:

Billshrink2

There is a lot more here, including the author's note in the comments section.

Oct 28, 2008

The matter of bad choice

Right on the heels of the disastrous bubble chart comes another, courtesy of the NYT Magazine.  Bubble charts are okay for conveying a concept ("this is really big, and that is really tiny").  This chart asks readers to compare the sizes of the bubbles, which exposes the worst aspect of such graphs.

Poor scaling is the huge issue with bubble charts.  They are the prototype of what I call charts that are not "self-sufficient."  Without all the data printed, the chart has no scale and is thus useless (see below middle).  When all the data is printed (as in the original, below left), it is no better than a data table.

Pewpoll4

In the chart above right, we simulated what a bar or column chart offers, namely a scale.  For this chart, the convenient "tick marks" are at 10, 20, 34 and 41.  Unfortunately, this scaled version also fails to amuse.
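Part of the scaling problem is plain arithmetic. If bubble area is drawn in proportion to the data, as it should be, the radius grows only with the square root of the value, so large differences look modest. A minimal Python sketch, assuming circular bubbles scaled by area; the values 10 and 41 are borrowed from the tick marks mentioned above, and `max_radius` is an arbitrary drawing size.

```python
import math


def area_scaled_radius(value, max_value, max_radius=50.0):
    """Radius that makes a circle's area proportional to value."""
    return max_radius * math.sqrt(value / max_value)


small = area_scaled_radius(10, max_value=41)
large = area_scaled_radius(41, max_value=41)

print(f"value ratio:  {41 / 10:.2f}")               # how much bigger the data is
print(f"radius ratio: {large / small:.2f}")         # how much wider the bubble is
print(f"area ratio:   {(large / small) ** 2:.2f}")  # what the reader must judge
```

A value about four times as large yields a bubble only about twice as wide, so a reader comparing widths rather than areas will underestimate the gap, and the designer ends up printing the numbers.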

Note further that the data should have been presented in two sections: the party affiliation analysis and the gender analysis.  Also, it is customary to place "Independents" between "Republicans" and "Democrats" because they are middle-of-the-road.

Redo_pewpoll

A profile chart is an attractive way to show this data.  Here, we quickly learn a couple of things obscured in the bubble chart.

On the issue of abortion, Independents are much closer to Democrats than to Republicans.  Also, there is barely any difference between the genders; the only gap is in the strength of support among those who favor legalization.



Reference: "A matter of Choice", New York Times Magazine, Oct 19 2008.


PS. Based on RichmondTom's suggestion, here are the cumulative profile charts.

Redo_pewpoll2 
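A cumulative profile, as we read it, plots running totals across the ordered response categories instead of each category's own share, so each point shows the share giving that answer or a more permissive one. A minimal Python sketch; the shares are hypothetical, not the Pew figures, and the categories are assumed to be listed from most to least permissive.

```python
from itertools import accumulate


def cumulative_profile(shares):
    """Running totals of shares listed in a fixed, meaningful order."""
    return list(accumulate(shares))


# Hypothetical shares (percent) for four ordered answers, two groups.
group_1 = [20, 35, 28, 17]
group_2 = [15, 38, 30, 17]
print(cumulative_profile(group_1))  # [20, 55, 83, 100]
print(cumulative_profile(group_2))  # [15, 53, 83, 100]
```

Because every cumulative line ends at 100, differences between groups show up as vertical gaps at the intermediate points.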


Bernard L. suggested a "tornado" chart:

A matter of choice

Sep 20, 2008

Bubbles of the same size

Frederic M. sent in this chart, together with his commentary.

Nyt_teens

He wrote:

Bubbles across rows have vastly different numbers but their circles are of identical size (or vice versa). It borders on the ridiculous that all bubbles of the US row have the same size... The question if teenage birth rates and teen sex are correlated cannot be eye-balled with this kind of display. The fact that you cannot compare across rows make this an instance of “chart junk”.


I add:

White spaces -- always dangerous.  Does a missing bubble mean no data, or no abortions/sex?

Sorting -- this is what Howard Wainer called "Arizona first," with a twist: here it is the United States that comes first.

Loss aversion -- would U.S. readers be resentful if countries like Iceland were excluded?  A much reduced version comparing the U.S. to, say, Canada, the U.K., Japan and Germany may give the reader more information.

Sufficiency -- if all the data are printed as in a table, why do we need the bubbles?






Reference: "Let's Talk About Sex", New York Times, Sep 6 2008.

Sep 15, 2008

Loss aversion

Loss aversion manifests itself in chart-making, as it does in economics.  In chart-making, loss aversion can be defined as the tendency to avoid losing data at any cost.  Given a rich data set, designers often make the mistake of cramming as much data into the chart as possible.  This takes Tufte's concept of maximizing the data-ink ratio to the extreme, and it often leads to awkward, muddled charts.

Gelman provided a great example of this recently:

Olympicsviz
Every piece of data is given equal footing, which results in nothing standing out.  The reader gasps for air.


Here is a recent example from the New York Times, in which the designer showed admirable restraint.

Nyt_flasugar

The best evidence is the set of small multiples shown at the bottom.  These give the amount of phosphorus flowing into the lake each year since 1973, as measured at four locations.

The point is that the pollution has been most serious on the northern shores, especially in recent years.  Thus, the Florida plan, which focuses on the southern region, is likely to have limited impact.

The choice of vertical lines is smart, as the typical time-series connected-line chart would jump up and down crazily.  A simple vertical axis marks the amounts, avoiding the temptation to print all the data.  The designer realizes it is the trend, rather than individual values, that is the issue.

Taken together, the three components tell a good story.  This is a well-executed effort.  The Times once again proves itself the leader in developing sophisticated graphics.



Reference: "Florida Deal for Everglades May Help Big Sugar", New York Times, Sep 13 2008.

Sep 05, 2008

Lining things up

Guess where I went for vacation (clue in the chart).

This long, narrow country is divided into 15 regions.  In the chart below, an uneven parade of 13 bubbles presents investment projections for 2008 to 2012.  The capital was singled out at the top of the table.

Cbcprojections


The unevenness has a side effect: the guiding lines are forced to have differing lengths and bewildering turns.  Further, because bubbles have no intrinsic scale, the designer must put all the data onto the map as well, thus failing our self-sufficiency test.

The following bar chart version respects the wide, thin space and yet delivers the data more clearly.  The top version displays all the data, while the bottom one uses a simple axis.  I prefer the bottom chart, since most readers are probably interested in approximate and relative comparisons rather than exact projections.  (The map would be better off without colors.)

Redo_ch_map



Reference: "Inversiones entre 2008 y 2012 llegarán a US$ 57 mil millones impulsadas por minería y energía" [Investments between 2008 and 2012 will reach US$57 billion, driven by mining and energy], El Mercurio, Aug 25 2008.

Jul 21, 2008

Joining the fun

We hope this is an indication that the Guardian, the British paper with one of the best websites out there, is joining the fun.  It appears that they have quietly debuted an interactive graphics feature.  The first edition addressed the oil price crisis.

This time-series chart has much to commend it:

Guk_blackgold1


The use of inflation-adjusted figures seems obvious, but we rarely see them in the press.  Highlighting the peaks and providing annotations (on mouse-over) are excellent touches.  The gridlines and axis labels (especially on the year axis) are thankfully restrained.  We don't see the need for the unadjusted series (blue line), however.  The fact that the gap between the two lines widens the further back we go tells us little; it invites readers to read more into it than what it truly is: the time value of money.
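The widening gap follows directly from how the adjustment works: a past nominal price is multiplied by the ratio of the current price index to the index in its own year, and that ratio grows the further back one goes, as long as prices have been rising. A minimal Python sketch; the index values and prices are hypothetical, not the Guardian's series.

```python
def to_real(nominal, index_then, index_now):
    """Express a past price in today's money using a price index."""
    return nominal * index_now / index_then


# Hypothetical price index (2008 = 100) and nominal prices per barrel.
index = {1980: 40, 2000: 70, 2008: 100}
nominal = {1980: 35, 2000: 28, 2008: 100}

for year in sorted(index):
    multiplier = index[2008] / index[year]  # grows as we go back in time
    real = to_real(nominal[year], index[year], index[2008])
    print(year, nominal[year], real, f"x{multiplier:.2f}")
```

The multiplier depends only on the price index, not on oil, which is why the gap between the two series carries no information about the oil market itself.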

Later on, they used an oil barrel to illustrate the components of the retail oil price.  The height of the cylinder is indeed proportional to the data plotted.  If only they had colored the end of the cylinder gray instead of green!  As it stands, the green portion has about the same area as the red.


Guk_blackgold2


Reference: "Interactive: oil price", Guardian, July 14 2008.

Jul 18, 2008

Seth on bar charts

Seth followed up his post about graphics with a specific post about pie charts versus bar charts.  He prefers pie charts.  We happen to share his unhappiness with grouped bar charts.  Unfortunately, he compared a univariate pie chart (depicting point-in-time data) with a multivariate bar chart (illustrating time-series data).

Here we present a different example, derived from a NYT article on diabetes in America.  The original chart is a series of pie charts, one for each age group, and one for the aggregate data.

Redo_diabetes

The junkart version uses a bar chart.  Readers can compare the prevalence rates across age groups more precisely because it is easier to judge lengths than areas.  This has been shown experimentally by the likes of Cleveland.

Dirty trick, you might say, because the original chart actually prints the data in each pie.

Nyt_diabetes

So now there is no mistaking the data.  This raises a philosophical question: why bother graphing the data if the reader needs to read the data in order to understand the chart?  We call this the self-sufficiency test.  The graphical elements of a pie chart can't stand on their own.


Reference: "Diabetes - underrated, insidious, and deadly", New York Times, July 18 2008.
