Mar 22, 2008

Trying too hard

In the course of business and governing, a lot of charts are generated.  An anonymous tipster pointed us to a set created by the "Communities and Local Government" division in the UK government.  Judging from the content, this division has responsibility for economic development in local neighborhoods.

Below is a pair of exhibits.  Truly they are trying too hard!  What we see is a hybrid scatter-bubble chart.  Between the jargon, the acronyms (LAD, LSOA), the boxed text, the multi-color circles, the colored axis labels and the lack of a title, the reader is plunged into a state of confusion.

Uk_communities3

The chart can be unraveled.  Each district was evaluated based on two measures of "gaps in worklessness".  The vertical axis compares each district to the national average; positive numbers indicate an above-average district relative to the nation.  The horizontal axis compares the most deprived 10% of neighborhoods within each district to the local average; positive numbers indicate that the worst neighborhoods are improving.

Thus, the policy goal would be to move all districts into the upper right quadrant.  The multi-color bubbles were designed to show us the state of the nation.  On the left chart, 41% of the districts (or population?) reside in the improving districts while 19% live in deteriorating areas.

The following strategies can help improve readability:

Redo_communities3

  • use English on the axis
  • relegate technical definitions to the legend
  • add succinct title to tell the story
  • use color on the data rather than on axis or data labels
  • use color to draw attention to the upper right quadrant
  • remove bubbles
  • define acronyms

 

Aug 14, 2007

Mid-week entertainment: spots

Via Andrew, an amusing chart.

Earmarkspercap_2


At least they have the good sense not to label the smaller bubbles.  I can imagine a scatter plot with the amount of earmarks against the population or GDP of each state.

Aug 08, 2007

On the bubble

Nyt_candmins

A couple of you noticed this table of bubbles in the Times, and asked what I think of it.  Dustin J suggested that this could be considered a decent application of bubble charts.  I agree, with some reservations.

The data set is the best thing about this chart.  The riches that lie beneath!  Many questions can be addressed, including:

  • Which Presidential candidates are getting the most face time?
  • Are candidates seen equally often across the stations?
  • Are there differences between network and cable stations in terms of total face time?  In terms of individual face time?
  • Are there Democratic/Republican leanings by station?  by type of station?

The intrepid can even build a regression out of it.

The bubble chart contains answers to all those questions but nothing jumps out. Okay, it's easy to see the station that gives each candidate the most face time.  Anything else takes moderate to considerable effort.  Here's the junkart version.

Redocandmins_2

The list of things done to the data is long:

  • Candidates are grouped together by party
  • Candidates within each party are arranged in order of decreasing maximum face time
  • Stations are arranged by increasing total face time; this order happens to retain the network vs cable divide
  • A heat map construct is used instead of bubbles: the legend is missing but there are four shades for each color: darkest = top 10%; medium = 50th to 90th percentile; light = below the 50th percentile, excepting zeroes; white = no face time.  In raw numbers, 90th percentile = 81 minutes, 50th percentile = 19 minutes.
  • The only data shown are the totals by candidate and totals by station.
  • On the right margin are little bar charts that show the distribution of network/cable for each candidate.
  • On the bottom margin are little column charts showing the distribution of party affiliation by station.
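
For those who want to try this at home, here is a minimal sketch of the ordering and shading steps listed above.  The candidate names, station names and minutes are invented for illustration (they are not the Times data), the function names are my own, and the treatment of values that sit exactly on a cutoff is an assumption.  Grouping by party is left out to keep it short.

```python
# Hypothetical face-time minutes; invented numbers, NOT the New York Times data.
minutes = {
    "Candidate A": {"Net1": 5, "Net2": 0, "Cable1": 90, "Cable2": 40},
    "Candidate B": {"Net1": 12, "Net2": 8, "Cable1": 30, "Cable2": 20},
    "Candidate C": {"Net1": 0, "Net2": 3, "Cable1": 15, "Cable2": 85},
}

# Cutoffs quoted in the post: 50th percentile = 19 minutes, 90th = 81 minutes.
P50, P90 = 19, 81


def shade(value, p50=P50, p90=P90):
    """Map minutes to one of four shades.

    Assumption: a value exactly on a cutoff goes into the darker bin.
    """
    if value == 0:
        return "white"   # no face time
    if value < p50:
        return "light"   # bottom half, excluding zeroes
    if value < p90:
        return "medium"  # between the 50th and 90th percentiles
    return "dark"        # top 10%


def order_stations(table):
    """Stations sorted by increasing total face time."""
    stations = list(next(iter(table.values())))
    totals = {s: sum(row[s] for row in table.values()) for s in stations}
    return sorted(stations, key=totals.get)


def order_candidates(table):
    """Candidates sorted by decreasing maximum face time on any station."""
    return sorted(table, key=lambda c: max(table[c].values()), reverse=True)


stations = order_stations(minutes)
candidates = order_candidates(minutes)
grid = [[shade(minutes[c][s]) for s in stations] for c in candidates]

for name, row in zip(candidates, grid):
    print(f"{name:12}", row)
```

With real data, the two cutoffs would be computed from the non-zero cells rather than typed in.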

A few observations follow:

  • Cable stations gave much more face time to the candidates in general.  Fox, no surprise, gives Republicans 85% of its time while all the others were roughly equal.
  • The more mainstream the candidate, the more balanced was the time spent on networks versus cable.  John McCain (R), Hillary Clinton (D) and John Edwards (D) had the highest proportion of network time.
  • More time is not necessarily good: McCain was the clear winner in face time, yet his campaign is struggling.

Source: "Tracking Face Time", New York Times, August 1, 2007.

Apr 20, 2007

Embedding logic

Bernard L. (from France) submitted this bubble chart for consideration.  It accompanied an NYT article claiming the absence of evidence of election fraud.  (Of course, as is well-known, absence of evidence is not the same as evidence of absence.  Here, I'm purely interested in data presentation.)

As a seasoned consultant, Bernard asked if a Marimekko chart would be superior.

Nyt_convictions_2

This is one ambitious chart.  Ignoring the bubbles (which are more nuisance than anything), we are asked to interpret data at three different levels of aggregation in one go.

First, there were 95 cases classified into five indictment types.  Second, these cases resulted in either convictions or acquittals/dismissals.  Third, among the cases ending in convictions (the highlighted area), we were shown the occupations of those convicted.

By flattening three levels into one table, some key information is obscured.  For example, how many cases resulted in conviction?  The reader has to compute either 95-25 or 26+31+10+3.  What percent of civil rights violation convictions involved party/campaign workers?  It's not 2/3 = 67% (bottom row) but rather 2/2 = 100%.
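
The arithmetic the reader is forced to do can be spelled out in a few lines.  This is only a sketch using the figures quoted above; the variable names are mine, and reading 25 as the acquittals/dismissals and the bottom row as 3 cases with 2 convictions is my interpretation of the chart.

```python
# Figures quoted in the post; the labels attached to them are my reading.
total_cases = 95
not_convicted = 25                    # assumed: acquittals/dismissals
conviction_counts = [26, 31, 10, 3]   # the four conviction figures on the chart

# Two routes to the same total number of convictions.
convictions_by_subtraction = total_cases - not_convicted
convictions_by_addition = sum(conviction_counts)


def percent(part, whole):
    """Share of `part` in `whole`, rounded to a whole percent."""
    return round(100 * part / whole)


# Civil rights row (assumed): 3 cases, 2 convictions, both party/campaign workers.
civil_rights_cases = 3
civil_rights_convictions = 2
convicted_party_workers = 2

# Wrong denominator: all cases in the row, including those not convicted.
misleading = percent(convicted_party_workers, civil_rights_cases)
# Right denominator: convictions only, since occupations are shown only for those.
correct = percent(convicted_party_workers, civil_rights_convictions)

print(convictions_by_subtraction, convictions_by_addition)
print(misleading, correct)
```

The point is the choice of denominator: occupations are reported only for convictions, so the base has to be convictions, not cases.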

The following junkart brings out the logic that is embedded in the complicated bubble-table.  While there is a lot on the page, the text labels plus the flow directions allow readers to absorb the data one level at a time.

Redo_convictions2

I have not attempted the Marimekko as I am not a fan of such charts.  You're welcome to try.

Source: "In 5-Year Effort, Scant Evidence of Voter Fraud", New York Times, April 2007.

PS. I will be working through the backlog of reader submissions.  Thanks for your patience.  Keep them coming!

 

Remark (Apr 25 2007): Thanks to readers for keeping me honest (see comments below).  The conviction rates shown previously were indeed the inverse.  I have now fixed them.

Mar 21, 2007

Dot com bubbles

Web_dotcombubbles

Thanks to Dustin J for the pointer as well as the title of this post.  Dotcom bubbles is the most appropriate name for this overblown chart (featured as the "chart of the day" here).

The chart has no title or axis labels so only the diligent reader will figure out that the data consist of the acquisition values of several high-profile Internet companies in the past three years.

There is less data than there seems.  Both the heights and the areas of the bubbles indicate the same thing, the deal values.  If we are supposed to see a trend, we are not finding it.

Most of these deals are not directly comparable anyway.  Webex and Ironport are infrastructure type companies with real business models.  Skype is a phone service.  Ask Jeeves is not a leader in its own space. Myspace and YouTube are traffic sites.

Reference: "Chart of the Day: Web deals", Valleywag, Mar 15 2007.

Feb 21, 2007

Bubbles of death

Thanks to Dustin J for bringing this stupendous chart to our attention.  I have to admit I have trouble understanding it.  The red curve appears to be part of a gigantic circle confirming that all lives do end on this earth.  How it is connected to the rest of the chart I am unable to discern.  In addition, the trajectory of the bubbles, the overlaps between bubbles, and the separation between bubbles all may or may not carry meaning.

Odds_dying_1

Reference: "What are the odds of dying?", National Safety Council.

Apr 04, 2006

Bubbles, troubles

On March 23, NYT served up a double dose of bubble trouble in the business pages.

Nytbubbletrouble1 Nytbubbletrouble2

For the record:

Both these displays contain very little data and perhaps the only way to read their intention is to see them as decorated data tables, in other words, as objets d'art rather than data displays.  The cutoffs and overlaps warn us against gleaning anything from the size of these bubbles.

Reference: "Who Will Work the Farms?" and "G.M. and Auto Union Reach Deal to Cut Work Force", New York Times, March 23 2006.

Jan 30, 2006

Nuke this bubble chart

Nytnukes

This unfortunate chartjunk appeared in NYT Magazine this weekend.  Once again, bubbles prove to degrade, not enhance, our ability to interpret the data.

How to explain the overlapping circles?  The solid versus empty bubbles?  Those with numbers inside, and to the left or to the right?  Those bubbles showing a precise number and those that show a range?  Pakistan ranking below India?

The chart fails our self-sufficiency test: the chart does not lose any power if we remove all the bubbles because every piece of data has been printed on it.

A two-sided dot chart may be appropriate here, shown next.  The relative scale of Russia and U.S. warheads to those of other nuclear powers is starkly revealed.

Redonukes_1

Dec 15, 2005

Where bubbles lead

Redonytmag

This chart reminds us, yet again, of the issue with bubble charts.  I have deliberately blocked out some of the data.  If I blocked everything out, there would be no reference point to estimate the size of any bubble.  Even with the unblocked data, it is not easy to estimate the blocked data.

Take a guess before you click to reveal the answer.
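
If you would rather compute than guess, here is a rough sketch of what the eye is being asked to do.  It assumes the designer scaled bubble area (not diameter) to the value, which the chart itself does not tell us; the function names and the numbers are hypothetical.

```python
import math


def estimate_value(ref_value, ref_diameter, diameter):
    """Estimate a hidden value from a labeled reference bubble.

    Assumes bubble AREA is proportional to the value, so the value
    grows with the square of the diameter.
    """
    return ref_value * (diameter / ref_diameter) ** 2


def diameter_for(value, ref_value, ref_diameter):
    """Diameter needed to draw `value` under the same area assumption."""
    return ref_diameter * math.sqrt(value / ref_value)


# Hypothetical reference: a bubble labeled 100 and drawn 10 units wide.
hidden_if_area_scaled = estimate_value(100, 10, 20)

# If the designer scaled the diameter instead, the same bubble means this:
hidden_if_diameter_scaled = 100 * (20 / 10)

print(hidden_if_area_scaled, hidden_if_diameter_scaled)
```

A bubble twice as wide stands for four times the value under one convention and twice the value under the other, and without a labeled reference bubble neither estimate is possible at all.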


Reference: New York Times Magazine, Dec 11, 2005.

Nov 30, 2005

Review: Gapminder 3

The next chapters of Gapminder take the scatter plot of income and child mortality further.

V. Income and health of countries

Hdr15gdphealth_2

Concepts used: standard deviation, measures of location and of dispersion

Highlight: this chapter is an amazing illustration of why it is dangerous to look at only averages but not dispersion.  The screen shot on the left shows that Mauritius is nothing like the rest of Africa in terms of income or of child survival rate.

Alert readers will notice that Gapminder has switched the y-axis from child mortality to child survival, which is significantly easier to grasp even though the data is the same.  (Did they read Hadley's comment?)

Food for thought: 1) The labeling of the log scale for child survival rate may confuse some.  2) The population size dimension as rendered in bubbles interferes with our understanding of the correlation between GDP per capita and child survival while adding little if any value.

Presumably, population size is shown so that the reader can observe the correlation between population and GDP per capita, and that between population and child survival.  Readers can judge for themselves whether the bubble chart is effective in presenting such correlations (see charts below).

Hdr15gdphealthbii
Hdr15gdphealthbi

3) The log-log scale can easily mislead us in judging the magnitude of dispersion.  Even though the countries in the OECD (aquamarine bubble) look relatively less dispersed, in reality, this may not be so because small distances on the right side of the page must be translated conceptually to large distances (to reverse the log scale).
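
A quick sketch makes the point about log scales concrete.  The GDP-per-capita figures below are hypothetical round numbers, not Gapminder data, and the helper name is mine.

```python
import math


def log_gap(a, b):
    """Distance between two values as it appears on a log10 axis."""
    return abs(math.log10(b) - math.log10(a))


# Hypothetical GDP per capita pairs: one at the poor end, one at the rich end.
poor_pair = (500, 1_000)
rich_pair = (20_000, 40_000)

for low, high in (poor_pair, rich_pair):
    print(f"{low:>6} to {high:>6}: "
          f"log gap {log_gap(low, high):.3f}, raw gap {high - low}")
```

Both pairs sit the same distance apart on the page because each is a doubling, yet the raw gap at the rich end is forty times the gap at the poor end.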


VI. Same Income, Different Health

Concepts used: scatter plot

Hdr16sameinc

Highlight: This chapter is a tour de force in explaining how to read scatter plots.  Besides, it proves how animation can significantly improve instruction.  The screen shot on the left is but one example.

VII. Development directions
VIII.  Differences within countries

We discussed development paths last time.  Chapter 8 drills down further into distributions within countries; its only disappointment is the lack of data, especially for OECD countries (wanting to hide social inequality?)


Conclusion
This is an all-around fantastic effort to bring color to the voluminous data in the Human Development Report.  Many important statistical concepts are included and carefully explained (histograms, means and dispersion, different levels of analysis, scatter plots, etc.). In some cases, the choice of graphical construct exposes its limitation.  What's more, the producers apparently are open to feedback; I have detected some improvements already and a Chapter 9 has appeared after I completed my review.  Here are my reviews of earlier chapters.

Chapter 1-3
Chapter 4 (touching on 7)
