Jan 15, 2006

Can good charts be entertaining?

A response to Jack's comments on the Economist charts.

Junk Charts pleads guilty to the charge that this blog's attitude is seriously serious, except on rare occasions.  That is because we believe data analysis to be a serious subject.  That said, we do wonder how entertainment value can co-exist with data integrity; and thus far, we have not found the happy medium.

Tufte's favorite chart of Napoleon's Russian campaign is one example of an entertaining and informative chart.  For anyone who knows or follows the Bumps Race, the Bumps chart is highly expressive.  We believe that entertainment can be a by-product of graph-making but deliberately seeking it is folly.

Case in point: the palm-tree hedge-fund plot Jack thought to be funny.

At the very least, a designer who adds entertainment must be careful not to distort the data.  Even minor chartjunk can insidiously ruin an otherwise competent chart, as happened here.

Getting rid of the chartjunk, we would revert to a standard time-series chart on a rectangular grid.  The palm-tree axis, being curved, is a curious little feature.  Its presence means that the rectangular-grid interpretation no longer applies!  When reading the data for 2000, for instance, one must trace a curved line upwards, not the usual vertical line.

The chart on the right illustrates this.  If the designer switches to a curved grid, then the trend line must be transformed from the black line to the red line.  (This may remind some of Jacobian transformations in multi-variable calculus.)  The error in the Economist chart is akin to showing the black trend line on the red grid.

Also, when the designer focuses on beautifying the chart, she may become careless.  For instance, why on earth should the vertical axis start at negative $25 billion in assets?  One would think that hedge funds with negative assets do not, and cannot, exist.  Perhaps it's truly "far from expected" in the Caymans!

I encourage other readers to comment if they have ideas as to how to integrate entertainment into data graphics.

Jan 11, 2006

Feast for the eyes?

Readers of this site will know that the otherwise venerable British business magazine, the Economist, could use some help with its data graphics.  I have commented on two of its obsessions in particular: the awful donut chart and the appending of an additional data series to a line chart.

Readers familiar with the USA Today newspaper will know about their one-a-day graphic on the lower left corner of the front page.  I have avoided commenting on them because they usually violate every rule in the book.  Here are two from the pile:

Usa12

It is with some sadness that I must report that the Economist has joined the race to the bottom.  Its recent publication called "The World in 2006" contains a score of exasperating, over-adorned graphics of the USA Today variety.  Consider these specimens (thanks to Patrick for alerting me).

Eu1wider



Fn2noguts


Ld1

Ir22026

Dec 03, 2005

Saving indices?

Looking at a time-series chart like this, I can't help but think "save thee from the index".  Indexing to some arbitrary value is a very common tactic in graphs but, in many cases, its value is dubious.

For this example, arbitrarily setting alcohol consumption in 1985 to 100 in each country transforms the data series from absolute litres per person to growth in litres per person relative to 1985.  After the transformation, the data on absolute litres disappear, which is why the graphic designer adds the "actual per head in 2003" data, almost as an afterthought.

This added data fails my self-sufficiency test: removing the printed data takes away our ability to know which country consumed the most alcohol in any year.  It also induces confusion as we now have an unordered list of numbers right next to the vertical axis labels.
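To make the loss of absolute levels concrete, here is a minimal sketch of indexing in Python.  The function name `index_to_base` and the litre figures are made up for illustration; they are not the Economist's or the OECD's numbers.

```python
def index_to_base(series, base_year, base_value=100.0):
    """Rescale a {year: value} series so that base_year equals base_value."""
    base = series[base_year]
    return {year: base_value * value / base for year, value in series.items()}

# Hypothetical litres per person (illustration only, not real data).
heavy = {1985: 16.0, 2003: 12.0}   # drinks more, but declining
light = {1985: 8.0, 2003: 10.0}    # drinks less, but rising

heavy_idx = index_to_base(heavy, 1985)
light_idx = index_to_base(light, 1985)

print(heavy_idx)  # {1985: 100.0, 2003: 75.0}
print(light_idx)  # {1985: 100.0, 2003: 125.0}
```

On the indexed scale the light-drinking country ends up above the heavy-drinking one (125 versus 75), even though it still consumes less in absolute terms (10 versus 12 litres).  The indexed series alone cannot tell us who drinks the most.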

Untangling the transformation (indexing) is equivalent to shifting the four lines up and down while keeping the growth trends as seen in the shape of the lines.  See the junkchart version on the right.  Note the use of my favorite construct, the Bumps chart.  (See here for another example.)

Since I am using a different data source, I have only decade by decade data except for 2000-3.  But I have data back to 1970.  Now, we perceive both absolute and relative information at once.  For example, while the French have reduced their drinking significantly, they are still the biggest consumers among the four groups.  And we know this for 2003 as well as every other year plotted.  Most importantly, we take in more information after discarding the index!

Reference: "Bar room brawl", Economist, Nov 2005; OECD Health Data 2005.

Nov 29, 2005

When not to use bars

Hope everyone had a great Thanksgiving.  This weekend, I came across two examples of poorly-executed bar charts, both from the Economist.  (More on bar charts here, here, here, here.)

In both cases, an additional symbol (a range, a dot) was superimposed on the bar chart, which is an act both obfuscating and ugly.  It is made painfully clear that each bar contains only one piece of data, completely indicated by its top edge; in other words, one can replace any bar with just its top edge, which is what I have done in each case.

Redospindoctors_2

Redomusttryharder

In the first example, the baseline estimates of people living with HIV show up more clearly.  (I'm not sure why upper and lower estimates are included for years past as they should have official counts.)

In the second example, the focus on the gap between official and actual retirement ages is restored and emphasized.

It would not be proper to sign off without revisiting the start-at-zero rule (start here or here).  In both the above charts, I have chosen not to start at zero.  I assume that the point of these charts is to illustrate recent changes in the depicted variables (Andrew will want to see longer time series, I'm sure.)  If I start these charts at zero, I run into difficulty deciding the separation of the tick labels: in order to capture the differences which are squeezed into a small range (due to the narrow date range), I'd have to use a lot of ticks, most of which are useless outside the range of the data!

Reference: "Spin Doctors" and "Must Try Harder", Economist, Nov 26, 2005.

Nov 01, 2005

Wrong variable and omitted record

The rise of robots elicited an uninspired, robotic graphical response from the Economist, reprinted by Mahalanobis.

 

 

 

A first fix, shown on the left below, puts the two data series in a scatter plot.  One would be mistaken to read a meaningful linear relationship between 2004 installations and 2004 stock into this plot.  Countries differ enormously in the number of robots deployed (Japan has over 300,000 while many other countries have fewer than 1,000), so both variables largely reflect the size of each country's robot population and the comparison is meaningless.

Robotsecon

A second fix substitutes the 2004 growth rate for absolute number of installations.  It is now clear that the growth rate is not much associated with the size of the installed base, contradicting the perceived linear relationship from before.  (Note that the x-axis is plotted on a log scale.)  The European countries are shown in red, most of whom have grown their stock of robots at a higher rate than Japan.
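As a rough sketch of this second fix, the snippet below converts installations into a growth rate and puts the stock on a log scale.  The definition of growth used here (installations divided by start-of-year stock, with retired robots ignored) is my assumption, and the two countries are hypothetical.

```python
import math

def growth_rate(installations, end_stock):
    """Installations as a share of the stock at the start of the year.

    Assumption: start-of-year stock = end-of-year stock - installations,
    i.e. robots retired during the year are ignored.
    """
    start_stock = end_stock - installations
    return installations / start_stock

# Hypothetical countries: (installations in the year, end-of-year stock).
countries = {
    "Large": (30_000, 330_000),
    "Small": (200, 1_000),
}

for name, (installed, stock) in countries.items():
    rate = growth_rate(installed, stock)
    print(f"{name}: log10(stock) = {math.log10(stock):.2f}, growth = {rate:.0%}")
```

In raw counts the large country dwarfs the small one on both axes; expressed as growth, the small country is the one expanding faster.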

In order to highlight the Europe/Japan comparison, one can plot the European average, rather than individual countries.  The message is less murky because the graph is less busy.  The following set illustrates this.  What really stands out from these graphs is China (& Taiwan), not Europe.  Incidentally, China was omitted from the Economist chart, which is a rather mischievous deletion, though an understandable one, since China's data is hidden when the original data series of installations versus stock is used (green text on the left chart).

Reference: Economist; United Nations Economic Commission for Europe

I'm writing this from a different computer while I'm travelling and I'm having trouble with the tools at my disposal.  Apologies for some glitches in the charts.

Sep 28, 2005

Which is the bigger mess?

The Economist spoke of a "perfect mess".  Was that the Bundestag or the small-half-donut-inside-a-big-half-donut mess?
Presumably, it was something about the distribution of seats that so disturbed our esteemed editors, but the reader is unlikely to empathize based on this chart.  One would be mistaken to think the size of the Bundestag doubled from 2002 to 2005; in fact, it grew by only 10 seats, to 613, in 2005.

In the chart on the right, the gains and losses in seats by each party are made front and center.  This is a good chart if seat changes are the key message to be conveyed.  Lying hidden in this presentation is the fact that the two largest parties have continued to dominate despite losing some seats to the small parties.

Again, I turn to the Bumps chart.  Here, both the relative sizes of each party and the gains/losses are clearly depicted.
 
It is also easy to read off the rankings by number of seats in 2002 and 2005.  For instance, the Christian Democrats overtook the Social Democrats in 2005.  The Left Party enjoyed a spectacular rise while the Greens suffered, becoming the smallest party even though they lost only four seats.  The total number of seats is also included without clutter.

At long last, the so-called mess is evident in the criss-crossing lines, indicating that the ranking of the parties was turned topsy-turvy in the recent election.

 
Reference: "A system in crisis, a country adrift", Economist, Sept 22 2005.
Thanks to Annette for help with this post.

Sep 26, 2005

Who works the hardest? II

Time-series data is tricky to deal with: indeed, an alarm bell should ring every time we see a graph containing few sampled times (in the working hours chart, we were shown two: 1990 and 2004).  What happened during the intervening 14 years, one wonders?  Regarding the outlier for Japan in 1990, one asks whether it is an outlier in both time and country or just in time.

Here is a multi-period Bumps chart:
Redohours3
At first sight, the choice of two sampled times has not distorted the picture much.  The general trend (downwards, and clustering) remains intact.

The blue segments are periods of increasing work hours; the red, decreasing hours.

This reveals the temporal pattern.  Between 1998 and 2002, working hours decreased in most countries (except Spain and Belgium).  However, for 6 out of 15 countries, 1994-8 represented a period of increasing hours.  These six included the U.S. and five European countries.

Besides, the steepest declines were achieved by Japan, Portugal and the Netherlands in 1990-4, and by France in 1998-2002.  The steepest increase was by Sweden in 1990-4.

Half the countries experienced at least one 4-year period in which working hours increased.
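The coloring rule for the segments is simple enough to sketch in a few lines of Python.  The function name `color_segments`, the grey fallback for a flat segment, and the hours below are all my own illustrative choices, not the OECD data.

```python
def color_segments(years, hours):
    """Label each segment between consecutive sampled years.

    Blue marks increasing hours and red marks decreasing hours, as in the
    chart; grey for "no change" is an added assumption.
    """
    segments = []
    for i in range(len(years) - 1):
        change = hours[i + 1] - hours[i]
        if change > 0:
            color = "blue"
        elif change < 0:
            color = "red"
        else:
            color = "grey"
        segments.append((f"{years[i]}-{years[i + 1]}", color))
    return segments

# Hypothetical annual hours for one country (illustration only).
years = [1990, 1994, 1998, 2002]
hours = [1800, 1780, 1790, 1750]

segments = color_segments(years, hours)
print(segments)
# [('1990-1994', 'red'), ('1994-1998', 'blue'), ('1998-2002', 'red')]

# Did this country have at least one period of increasing hours?
had_increase = any(color == "blue" for _, color in segments)
print(had_increase)  # True
```

A two-point comparison of 1990 and 2002 would show only the overall decline for this made-up country; the blue segment in the middle is exactly what the extra sampled years reveal.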

Reference: Data from OECD Statistics Portal

Sep 25, 2005

Who works the hardest? I

Regarding the paired bar chart (right), the Economist made these observations:

  • The average number of hours worked each year has been falling in most rich countries over the past 15 years.

  • In 2004, the Japanese worked 12% fewer hours than in 1990.

  • Most Europeans have also been working less: 10% fewer hours in France and 6% fewer in Germany.

  • Americans and New Zealanders toiled the most in 2004, while the average Dutch worker put in around 25% fewer hours than his American counterpart.

This sort of paired bar chart is frequently used in cross-tabulations of data (here, cross-tabulating country by year).  I find charts like this one confusing.  For example, the last point comparing the U.S. and New Zealand requires visual comparison of two bars which are separated by 25 irrelevant bars.  Moreover, because the countries are arranged in order of increasing 2004 working hours, it is difficult to verify the statement about European countries.

The junkchart version below is modeled after the Bumps chart (here, here and here), except that the vertical axis is measured on a continuous scale rather than as ranks.  The relevant countries are color-coded to help the visual comparisons: U.S. (blue), Japan (orange), Europe (green), others (grey).

From this chart, we further note that despite a general downward trend, there was variation between the European countries.  While most countries reduced working hours slightly, at least two European countries appeared to have increased hours.  Further, excluding Japan, the bunch of lines drifted down by less than 10% over 14 years.

One secret of this chart is that it takes advantage of both vertical axes, plotting two data series measured on the same scale.  Thus, this chart does not emphasize 2004 hours at the expense of 1990 hours.

Based on this chart and 1990 hours, we can find three clusters of countries: those working over 1700 hours (NZ, U.S., Australia, Spain, Canada, Portugal, U.K.); those working between 1500 and 1700 hours (Italy, Sweden, Belgium, France, Germany); and those working fewer than 1500 (Norway, Netherlands).  Japan could be considered an outlier in 1990 but it re-joined the hardest-working cluster by 2004.  The lines within each cluster have drifted down slightly from 1990 to 2004 but no member country strayed from its cluster (though Germany may be the first to do so in the next few years).  The chart on the right makes this cluster structure clear.
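For readers who want to reproduce the grouping, here is a small sketch that applies the two cut-offs (1500 and 1700 hours) to a set of 1990 figures.  The country names and hours are placeholders rather than the OECD values, and how to treat a value sitting exactly on a boundary is my own choice.

```python
def cluster_1990(hours):
    """Assign a cluster label from annual hours worked in 1990.

    Cut-offs of 1500 and 1700 come from the post; placing exact boundary
    values in the middle cluster is an assumption.
    """
    if hours > 1700:
        return "over 1700"
    if hours >= 1500:
        return "1500 to 1700"
    return "under 1500"

# Placeholder data (illustration only, not the OECD figures).
sample = {"Country A": 1820, "Country B": 1610, "Country C": 1430}

clusters = {}
for name, h in sample.items():
    clusters.setdefault(cluster_1990(h), []).append(name)

print(clusters)
# {'over 1700': ['Country A'], '1500 to 1700': ['Country B'], 'under 1500': ['Country C']}
```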

Based on previous commentary, some of you will want to see more data than just the two years.  In the next post, I'll take a look at what further insights could be gained if we had more data.

Reference: "Working Hours", Economist, Sept 22, 2005.

Aug 27, 2005

Relative relative indices

Mahalanobis and I have been communicating over a graph that first appeared in the Economist, reproduced below.  The graph purportedly supports their front-page story on "Germany's surprising economy".  He helped clear up my initial confusion over what was being plotted.

The asterisk in the chart was explained as "relative to euro-area average".  My confusion is captured thus: when Germany's value decreased from 100 to 95 by mid-2002, was that a 5-point change relative to Q1 1999, or relative to the Euro average?

This double indexing confounds sources of change, potentially misleading readers.  The 5-point decline represented a 5% decrease in Germany's relative cost.  Even if Germany's labor cost stayed flat, this 5% decrease can be entirely due to other Euro countries performing worse.  In fact, Germany's curve would still show a decline when its unit labor costs are rising, so long as other Euro countries suffer worse setbacks.
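A toy calculation shows how the double index can fall while the underlying cost rises.  The function `relative_index` and the cost figures are hypothetical; I am assuming the index is the ratio of Germany's cost to the Euro average, rebased so that the first period equals 100.

```python
def relative_index(country, average, base_period=0):
    """Ratio of country to average, rebased so base_period equals 100.

    Assumption: this is how the "relative to euro-area average" index
    is constructed; the source chart does not spell it out.
    """
    ratios = [c / a for c, a in zip(country, average)]
    return [100.0 * r / ratios[base_period] for r in ratios]

# Hypothetical unit labor costs (illustration only).
germany = [50.0, 57.0]    # rises 14%
euro_avg = [40.0, 48.0]   # rises 20%

print(relative_index(germany, euro_avg))  # [100.0, 95.0]
```

Germany's cost goes up by 14%, yet the index drops from 100 to 95 because the average goes up by more.  The index also hides the fact that, in this made-up example, Germany's cost is above the average in both periods.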

Below, my junkchart version (on the right) is superior because it separates the two effects, showing both the growth/decline in unit labor costs within each country as well as the labor costs relative to the Euro average in each time period.  The left-side chart uses a double index akin to the Economist version; the only trick I used was removing the second index (so that only the Euro average is 100 in period 1).

Redo_eu_labor_1

The story Mahalanobis and I concocted is now clearly seen: Germany has historically had higher labor costs than France and the EU but quickly narrowed the gap from period 1 to 3.

Jul 31, 2005

Donuts: Still inedible

Concerning my post on the Economist's use of donut charts, Phillip on Blogcritics raised two issues worth further study.

First, are pie and donut charts "natural" for representing percentages?  In my opinion, if only one set of percentages is involved, then either a table of numbers or a "decile chart" works much better.

The "decile chart" would look better if I had used 10 human figures, one for each 10% of the population.

 


This "decile chart" addresses the visual estimation issue Phillip brought up.  Because each dot / human figure represents 10%, even if the percentages are not annotated, the reader can gauge them visually.  Not so for pie slices: no one will be able to tell a 15% slice from a 10% slice.
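A plain-text version of the idea can be sketched as follows.  The function `decile_row`, the choice of symbols, the rounding to the nearest 10%, and the shares are all assumptions made for illustration.

```python
def decile_row(label, percent, filled="#", empty="."):
    """Render one row of a decile chart: one symbol per 10% of the total.

    Assumption: percentages are rounded (half up) to the nearest 10%.
    """
    n = int(percent / 10 + 0.5)
    n = max(0, min(10, n))
    return f"{label:<10}{filled * n}{empty * (10 - n)}"

# Hypothetical shares (illustration only).
for label, pct in [("Group A", 70), ("Group B", 20), ("Group C", 10)]:
    print(decile_row(label, pct))
# Group A   #######...
# Group B   ##........
# Group C   #.........
```

Even without any printed numbers, counting symbols gives the reader each share to the nearest 10%.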

Besides, the point of the original graphic was to compare percentages.  The message is that the white population is expected to decline while the Hispanic, Black and Asian populations would increase.  When two pies (or donuts) are used, the reader is tasked with differentiating a 67% slice from a 58% slice situated an inch apart.  That, I submit, is a tall order.  By contrast, the growth rate is explicitly coded into the gradient of the lines in my junkchart: the steeper the line, the higher the growth (or decay).

Second, what if growth rates are chaotic and lines criss-cross each other?  This presents no problem at all:
Redocrisscrosslines
The first line chart shows two segments increasing at the same rate and one segment declining fast.  The second chart shows two segments dropping at different rates and one segment skyrocketing.  Because the growth rate is explicitly plotted, the reader has no problem picking it up.

The astute reader will note this chart looks like the marvellous Bumps chart.

Phillip also noticed another feature of the Economist chart that escaped me: the two donuts were sized in proportion to the total populations in 2004 and 2030 respectively.  Ouch!  Now, the area of a slice depends on both angle and radius, making it nigh impossible to compare slices across the two donuts.
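To see how the two effects get tangled, consider the area of a donut slice, which is its share of the ring times the ring's area.  In the sketch below the 67% and 58% shares are the ones discussed above, but the radii are invented; I do not know the dimensions of the Economist's donuts.

```python
import math

def slice_area(share, outer_radius, inner_radius=0.0):
    """Area of a pie or donut slice covering `share` (0 to 1) of the ring."""
    return share * math.pi * (outer_radius**2 - inner_radius**2)

# Hypothetical radii: the second donut is drawn 20% wider than the first.
first = slice_area(0.67, outer_radius=1.0, inner_radius=0.5)
second = slice_area(0.58, outer_radius=1.2, inner_radius=0.6)

print(f"67% slice of the small donut: {first:.2f}")
print(f"58% slice of the large donut: {second:.2f}")
print(second > first)  # True
```

With these radii, the slice with the smaller share occupies the larger area, so a reader comparing the two slices by size would get the direction of change wrong.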

Thanks Phillip for pushing my thoughts on this.
