
Rip tide

As if a rip tide has torn through, this chart drowned the data in the depths of colors, scales and graffiti.

Scales - every chart has its own scale, rendering it impossible to read across charts.

Colors - every brand has its own color.  This feature is redundant since the data labels already serve the purpose of linking the two columns of charts.

Compression - it is impossible to judge the growth or decline of individual companies, especially since only the current market share is provided.

If anyone has access to the data, please send them over so we can remake this chart.  Or just send in your charts and I'll put them up here.

Reference: "Now Playing in Europe: The Future of Detroit", New York Times, Oct 28 2006.

Tracking tigers


This chart is fantastic work from Amanda Cox and Joe Ward at NYT.  It tracked the baseball Tigers' season, showing how they peaked in early August (with a 10-game lead) and limped into the playoffs, five days after losing the division title.  That slide, beginning in mid-September, set them back 4.5 months.  (It would help to label the line marking 5 games behind the leader.)

The shading to show which team(s) were chasing them is a stroke of genius.

Further, the dot plots on the right very cleanly bring out their advantage in pitching.  The hitting numbers are mixed.

The following chart is for the Cardinals:



Reference: "World Series Preview", New York Times, Oct 21 2006.

The elusive catchup

Thanks to Michael S. for sending in this chart from the economists at IMF (via this blog).

At its heart, this is a scatter plot that displays the correlation between a country's development stage (indicated by its PPP GDP) and the importance of the industrial sector to its economy.

On top of that, the chart adds a third dimension of time by linking the dots together with lines.  The lines trace the evolution in each country or set of countries.  Some countries (mostly developed nations) have a clear trend; others exhibit choppy curves which imply fluctuating economic conditions.

We have created this type of chart when discussing the fabulous Gapminder site.

The shading in the chart is supposed to draw attention to an inflection point around $15,000 per capita GDP, wherefrom the industrial sector starts to decline in importance.

In my view, that conclusion is forced because Korea is the only curve displayed on the chart that bridged the $15,000 divide.  Thus, one can say there exists only one data point supporting this hypothesis.

However, one aspect of this chart jumps out at us, which is the chasm between developed and developing countries, right at the $15,000 divide.  On the right side, the rich get richer in a relatively steady fashion.  On the left side, the poor remain poor.  These nascent economies suffer from a great deal of volatility.  What's worse, the slopes are much sharper on the left than on the right, meaning that the gains in GDP are much smaller on the left of the divide.  Even more troubling are the cases of Brazil and Mexico, which seem to have endured a decline in the industrial sector without much gain in GDP.

The only bright spot is Korea.  (And China is the outlier.)


Racetrack entertainment

A warm welcome to readers of Science.  (Junk Charts is selected as "Best of the Web" this week.  Also thanks to Mitchell for the nice write-up.)

Racetrack graphs were a novelty item here some time ago.  They made an appearance in the October issue of Wired Magazine, known for its design.  We have already discussed information distortion in such charts.

This chart fails the self-sufficiency test, forcing readers to read and interpret the data labels, and to ignore the racetrack construct.

Graphical elements applied as cosmetics?  Charts sacrificing data integrity for entertainment?  This takes us back to our previous discussion: can good charts be entertaining?  Now flipped over: can entertaining charts be good?

Reference: "Good, Green Livin'", Wired Magazine, 10/2006.

Arming the competition

At the TCS blog, Tim Worstall attacked a chart comparing global levels of income inequity, originally published by the Economic Policy Institute.  His post is here.  Tim claimed that this chart proved precisely the opposite of what the EPI intended it to show, that is, that the chart showed that "the poor in America have exactly the same standard of living as the poor in Finland (and Sweden)", two countries which he derided as "redistributionist paradises".  From this, Tim concluded that the U.S. is doing enough for the poor.

Stephen C., who sent in this chart, was very confused by the length of the bars: left of the divider, the larger the income index, the shorter the bar; right of the divider, the larger the income index, the longer the bar.

For the EPI, this is a case of arming the competition.  Echoing Robert's comment from yesterday, this is one chart that opines but should have murmured.

The chart is a very convoluted way to study the idea of income inequality.  The first bar states that the 90th percentile income in Finland is 1.11 times the median U.S. income, after adjusting for PPP.  Notice the simultaneous change in percentile and country, which complicates our understanding of the difference.

The median income is perhaps the simplest (not most informative) measure of income equality.  In the EPI chart, the edges of each bar describe the 10th and 90th percentile income in a country.  We know only that 80% of the population lies within each bar, but nothing about how incomes are distributed inside it.

In the revised chart, I plotted another popular measure of income equality, the ratio of 90th percentile to 10th percentile (since the data is readily available from the EPI chart).  It's clear that inequality is highest in the English-speaking Western world where the top earners get 4-6 times more than the bottom earners.

This income ratio is computed for each country, and can be used to compare across countries without resorting to another index. 
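To see why the ratio needs no further indexing, here is a minimal Python sketch.  The function name `ratio_90_10` and the two sets of index values are my own assumptions for illustration; they are not the EPI figures.  Both percentiles are expressed relative to the same base (the U.S. median in the EPI chart), so the base cancels in the division.

```python
def ratio_90_10(p90_index: float, p10_index: float) -> float:
    """Ratio of 90th to 10th percentile income within one country.

    Both inputs are indices against the same base (e.g. U.S. median
    income), so the base cancels out when we divide.
    """
    if p10_index <= 0:
        raise ValueError("p10_index must be positive")
    return p90_index / p10_index


# HYPOTHETICAL index values for illustration only (not the EPI data):
# (90th percentile index, 10th percentile index)
countries = {
    "Country A": (1.10, 0.40),
    "Country B": (2.00, 0.40),
}

ratios = {name: ratio_90_10(p90, p10) for name, (p90, p10) in countries.items()}

# List countries from most to least unequal by this measure
for name, r in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {r:.2f}")
```

In this made-up example the two countries have the same 10th percentile index, yet very different ratios, which is the kind of difference that reading only the bottom edge of each bar would miss.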

Reference: "America: More Like Sweden Than You Think", TCS Daily, Aug 26 2006.

Graphical equity 3

Zuil provides an alternative rendering of the Sankey diagram / flow chart.  This one is surely superior, being easier to understand while capturing more information than the previous example.

Ultimately, however, this type of chart will please specialists more than the general reader.

It is designed to be purely descriptive, which explains the absolute equality given to each flow, as indicated by the choice of unique colors and/or patterns for each.

As a data graphic, it can be improved if the designer has a point to make.  In that situation, only the relevant flows need be highlighted while all others stay in the background.

As it stands, this chart murmurs but does not opine.

Reference: "U.S. Energy Flow - 2002", Energy & Environment Directorate, Lawrence Livermore National Laboratory.

Higher, higher and higher


How high can it go?  This chart, sent in by Michael McCracken and attributed to Yale economist Shiller of "Irrational Exuberance" fame, very effectively poses this question.  The "hockey stick" on the right side of the chart really hits us like a gigantic question mark.

When we have good data, or are looking at the data from the right angle, the charting task is that much easier.

Michael especially likes background shading to highlight specific periods.  I'm a bit perplexed by the "World War I" label as that period does not appear remarkable to my eyes; it is also the only shaded region that is not a boom period.

The text explains the need to remove "new construction" in order to study housing as an "investment over time".  As an outsider to the real estate industry, I find this definition arbitrary.  The 2001 data presumably would include the sales price of any house that was constructed from 2000 and back.  Why exclude only current-year construction?  Could a sale of a one-year-old property be considered "investment" and not "speculation"?

Reference: New York Times, Aug 26 2006.

For love of Color

Derek C. pointed us to this piece of chartjunk on Wikipedia.  This chart compares the mass of solar system objects, relative to the Earth's mass.

Derek's comment:

The bars are inappropriate, as their length is proportional to the
logarithm of the ratio of the masses of the object and the Earth. Also
the multiple colours are distracting.
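A small Python sketch of Derek's first point.  The masses are rounded, commonly quoted values in Earth masses, and `log_bar_length` is a name I made up for illustration.  When bar length equals the logarithm of the mass ratio, comparing lengths tells us almost nothing about comparing masses, and the Earth itself gets a bar of zero length.

```python
import math

# Approximate masses in units of Earth's mass (rounded values)
masses = {"Sun": 333_000.0, "Jupiter": 318.0, "Earth": 1.0, "Mars": 0.107}


def log_bar_length(mass_in_earths: float) -> float:
    """Bar length when the bar encodes log10(mass / Earth mass)."""
    return math.log10(mass_in_earths)


lengths = {name: log_bar_length(m) for name, m in masses.items()}

# How the Sun compares with Jupiter: in mass, and in bar length
mass_ratio = masses["Sun"] / masses["Jupiter"]
length_ratio = lengths["Sun"] / lengths["Jupiter"]

print(f"mass ratio Sun/Jupiter:       {mass_ratio:.0f}")
print(f"bar-length ratio Sun/Jupiter: {length_ratio:.1f}")
print(f"Earth bar length: {lengths['Earth']:.1f}")
print(f"Mars bar length:  {lengths['Mars']:.2f}")
```

The Sun is roughly a thousand times as massive as Jupiter, but its bar is only about twice as long.  Anything lighter than the Earth gets a negative length, so the bar has no meaningful starting point; a dot on a log axis avoids both problems.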

I'm also mystified by the first bar called "Solar System".  It seems to convey the idea that the Solar System is much larger than the Earth;  combined with the second bar ("Sun"), it tells us that every object but the Sun pales into insignificance.  If this is true, then the Solar System needs to be labelled differently as it is not a "solar system object".

Derek sent in a much improved chart:


His version is much cleaner.  The axis labels, properly oriented, are much easier to read.  The use of color is admirably restrained: I suspect that he is as baffled as I about the asterisks (now blue dots) in the original chart. I'd retain the vertical line through the Earth (relative mass = 1) to help anchor the chart.

But a job well done!  He should send it in to the powers that be at Wikipedia.

Graphical equity 2

Based on my last post, Zuil and Lope engaged in a lively conversation about "flow charts", apparently also called "Sankey charts" in some circles.  Here is an example Zuil found at the EIA site:

Zuil commented that

Though often difficult to draw, Sankey diagrams are IMHO unbeatable to represent any type of lossless flow (energy, money, fluids, etc).

I mostly agree: flow charts are great at tracing flows, and it's easy to figure out proportional sources and uses from this example.  Moreover, as Lope suggested, it's fun (to read).

But... the data content of this chart is lower than that of the network graph or the Marimekko.  Imagine removing all the lines (arcs) in the network graph: that is what the flow chart includes.  It achieves more readability by simplification.

Graphical equity 1

I've been slow checking my email lately: several of you have pointed me to interesting charts; I will work through them over the next week or so.  This post is inspired by John S. who forwarded two charts, illustrating where the U.S. gets its energy and how the U.S. uses its energy.


The first visualization, created by the Energy Information Administration, emphasizes the physical connections between energy sources and energy use sectors.  This construct is known as a "network graph", and is widely used by engineers; the ovals/rectangles are called "nodes", the lines "arcs".  It functions well as a map visualizing physical relationships but it fails as a vessel for data.  Problems are multiple:

  • The web of arcs is messy and gets worse with more nodes
  • Here, each node has either an input or an output but not both, keeping it simple.  If a node is allowed to take both input and output (the so-called transhipment node), then the graph gets messier
  • Arcs converging at a node leave little space for data labels

Next, the Skeptical Optimist blog recast the data onto a construct known as "Marimekko" to management consultants.  Deconstructed, these are column charts, such that the width of each column represents the relative size of each energy source.

This one does a fairly effective job showing most of our transportation needs are met with oil, our electricity needs are met with coal, our energy sources are roughly split between oil, gas and coal, and so on.

One weakness of Marimekko is "inequity": by its origin as a column chart, it elevates one variable over the other.  What's the relative size of energy used by the industrial sector (blue)?  That's not a question easily answered by this chart.  Even when the column segments are adjoining, as in the case of electricity use (yellow), it is very taxing to size up the yellow area relative to the total area.
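The inequity shows up in the arithmetic.  Below is a rough Python sketch of how a Marimekko is laid out, using a HYPOTHETICAL table of energy use (the numbers, and the names `marimekko_layout` and `sector_share`, are mine and not from the Skeptical Optimist chart).  The share of each source can be read directly as a column width, while the share of a sector has to be assembled from pieces scattered across every column.

```python
# HYPOTHETICAL energy use by source (columns) and sector (segments).
# Units are arbitrary; these are not the actual U.S. figures.
table = {
    "Oil":  {"Transport": 30.0, "Industrial": 10.0},
    "Coal": {"Electricity": 20.0, "Industrial": 5.0},
    "Gas":  {"Industrial": 15.0, "Residential": 20.0},
}


def marimekko_layout(data):
    """Column width = source's share of the grand total;
    segment height = sector's share within that source."""
    grand_total = sum(sum(uses.values()) for uses in data.values())
    layout = {}
    for source, uses in data.items():
        column_total = sum(uses.values())
        layout[source] = {
            "width": column_total / grand_total,
            "heights": {s: v / column_total for s, v in uses.items()},
        }
    return layout


def sector_share(layout, sector):
    """Total area of one sector: sum of width x height over all columns."""
    return sum(
        col["width"] * col["heights"].get(sector, 0.0)
        for col in layout.values()
    )


layout = marimekko_layout(table)
for source, col in layout.items():
    print(f"{source}: width {col['width']:.2f}")  # read straight off the chart
print(f"Industrial share: {sector_share(layout, 'Industrial'):.2f}")
```

The source shares fall out of the layout for free.  The industrial share requires the reader to multiply and add three rectangles mentally, which is exactly the taxing exercise described above.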

So it is that we seek a graph that treats the two variables (source, use sector) equitably.  More later.

Update: Jon posted a response here, and points to a tutorial for creating Marimekko type charts.