New but is it better?

Conventionally, the bracket in a sports tournament is presented like this (link):


In the Euro 2012 that's happening right now, the group stage is followed by the knockout stage (quarter-, semi- and final).

The knockout stage is pretty straightforward. The group stage presents some challenges because it's difficult to show the chronology and the team standings at the same time.


The official site of Euro 2012 has an innovative "Tournament Map" that is an attempt to improve upon the traditional design. (link)


I have mixed feelings about this presentation. It's easier to get a sense of how each team performed chronologically over the course of the competition. But then, I can't figure out what day the winner of a quarterfinal would play in the semifinal.

Ron Paul confuses the charts

Andrew Sullivan (link) re-printed this grouped column chart showing the result of a Washington Post-ABC poll on how voters say they would react to Ron Paul running as an independent candidate in next year's U.S. presidential election.


One aspect of this chart bothers me: depending on one's familiarity with election politics, one needs to read carefully both the titles at the bottom of the chart and the legend, and possibly also the title of the chart (or know that the Republican wears red and the Democrat blue), in order to orient oneself. You can experiment by blocking out one or two of these three items.

Here's the same chart with a small number of fixes. Printing the legend onto the bars themselves makes the data more readable. This change necessitates flipping the columns over to horizontal bars. There are pros and cons to using a stacked chart versus a grouped chart.


Neither of these charts answers the burning question in the reader's mind, which is likely to be: from whom would Paul take his votes? The key message from above is that the insertion of Paul is projected to make the identity of the Republican candidate irrelevant. The following flow chart emphasizes the shift in votes as opposed to the vote totals.


It appears that the Others/Undecided voters who can still swing the election do not consider Ron Paul as a desirable alternative. Most of Ron Paul's supporters would come from voters who would have cast their votes for the Republican or Democratic candidate (by a ratio of 3 Republican votes to 1 Democratic vote if Romney is running, or 3 to 2 if Gingrich is running).

Lost in complexity

Felix Salmon (link) and others linked to this BBC News graphic about European debt recently.



At first sight, the use of arrows inside a ring, enhanced by an interactive filter by country, seems to be an inspired idea.

Then, I started clicking. Here is the German view.


According to the paragraph beneath the headline, the arrows show how much money is owed by each country to banks in other nations. So, it appears that German banks have borrowed about equal amounts from France and Italy, and also good amounts from the U.S., the U.K. and Japan. And German banks would be affected if these debtors were to default.

Now, take a look at the right column where BBC tells us "The biggest European economy is exposed to Greek, Irish and Portuguese, but mostly, Spanish debt." Say what?

Much more important than appearance, the designer must ensure that the data and conclusions make sense. Here, the chart doesn't support the discussion.


See also my previous post about Europe's debt.



Reading the landscape

Here are some posts I find worth reading on other graphics blogs:

Nick has done wonderful work on the evolution of the rail industry in the U.S., with a flow chart showing how mergers have produced the four giants of today, as well as a set of small-multiples maps showing how they split up the country.

A lovely feature of the flow chart is the use of red lines to let readers see at a glance that Union Pacific is the only rail company that has lasted the entire 4 decades, while the other 3 giants came into being within the last 20 years.

On the maps, notice a slight inconsistency between the left and right columns: on the right side, both maps have the same set of anchor cities, which act as "axes" to help readers compare the maps; on the left side, the sets of anchor cities are not identical. It would also be interesting to see a version with all four route maps superimposed and differentiated by color. That may bring out the competitive structure better.


Georgette has a nice post summarizing issues with picking colors when producing charts. Her blog is called Moved by Metrics.


Meanwhile, Martin finds a shockingly poor pie chart here.


There was a time when you'd find the kind of heatmaps featured here by Nathan as wallpaper in my office. It's a great visualization tool for exploring temporal patterns in large data sets. However, I'd never even think of putting these in a presentation. It's a starting point, not an end-point, of an analysis project. Some things are wonderful for consumption only in private!






False promises of equality and structure

(Here's something especially for those like me who are stuck in their homes in the Northeast USA this weekend.)

A few readers weren't impressed by Nielsen's presentation of the smartphone marketplace:


This chart type is very popular, both among business consultants and statisticians. Consultants call them "marimekko charts" while statisticians call them "mosaic charts". It's got multiple names because it has been reinvented multiple times. I have nightmares from having to produce this sort of chart in PowerPoint by hand (deconstructing and reconstructing column charts), and I have written before about my dislike of them (see here, and here).


Supporters point to two advantages of this type of chart:

  • Equality: it puts the two dimensions of the marketplace -- operating system/software, and producer/brand -- on equal footing. As an added bonus, the areas of the rectangles are meaningful: they correspond to the relative market shares.
  • Structure: the chart often reveals interesting aspects of the structure of the data. For instance, here it shows that certain smartphones have "closed" systems where the OS and producer form a one-to-one relationship while some producers like HTC make phones with different operating systems.

A little thought exposes these as false promises.

The two dimensions are, in fact, not equal. Look at the one contiguous column for Apple versus two separate sections for HTC. In order to know the market share of HTC, the reader needs to do additions... in his/her head. While this is not so hard when HTC appears only twice, your reader would not be amused if HTC appears seven times on the same mosaic. It is a limitation of this chart type that one cannot get the column sections to be of one piece without destroying the one-piece structure of the row sections.
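The mental arithmetic can be made concrete with a small sketch. The shares below are made up for illustration (they are not Nielsen's figures), and the function name is mine. One dimension can be read off as a single piece; the other has to be summed over scattered pieces.

```python
from collections import defaultdict

# Hypothetical (OS, producer) market shares in percent. NOT the Nielsen data.
shares = {
    ("iOS", "Apple"): 28,
    ("Android", "HTC"): 15,
    ("Android", "Samsung"): 10,
    ("Windows", "HTC"): 6,
    ("BlackBerry OS", "RIM"): 20,
}

def totals(shares, axis):
    """Sum the shares along one dimension (0 = OS, 1 = producer)."""
    out = defaultdict(int)
    for key, share in shares.items():
        out[key[axis]] += share
    return dict(out)

by_os = totals(shares, 0)
by_producer = totals(shares, 1)

# Count how many separate rectangles each producer is split into.
pieces = defaultdict(int)
for (_, producer) in shares:
    pieces[producer] += 1

print(by_producer["Apple"], pieces["Apple"])  # one rectangle, nothing to add
print(by_producer["HTC"], pieces["HTC"])      # two rectangles the reader must sum
```

With seven HTC pieces instead of two, the same loop is trivial for a computer but not for a reader looking at a mosaic.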

In addition, I don't think it is easy to compare the areas of fat rectangles versus narrow rectangles, or squares versus long strips, etc. On consulting style charts, you almost always find the entire data set printed, which is to say, this chart is rendered not self-sufficient. On statistical charts, you typically find axis labels; this is not much better because of the difficulty in estimating relative areas.

The extent to which one can learn the structure of the data is restricted by our ability to estimate and sum areas.


In the junkart version, I use a flow chart. Special attention is paid to expressing as clearly as possible the structure of the marketplace, thus the separate sections for the "open" versus "closed" systems, as well as the many-to-many relationships among the "open"-system players.


The thickness of the flows is proportional to the market shares. I added a few data points to anchor the scale. The two dimensions of the data are treated symmetrically.

There is also no need to startle readers with a kaleidoscope of colors so typical of marimekkos.






The cross-hairs of religions

Long-time reader Nick B. found this attractive flow chart.


The chart was produced by the Internet Monk blog. The data was culled from this report (PDF) by the Pew Forum.

The cross-hairs trumpet excitement but the reader is left without much. One could tell that the unaffiliated proportion (red) has more than doubled, mostly at the expense of Catholics (green); that most religions retain the vast majority of their faithful (at least by internal proportions); and that people of one or another faith  move to one or another faith.

Yet none of these high-level insights requires a chart that contains data on movement between each pair of religions.

One smart thing about this chart is the inclusion of "unaffiliated / no religion", which completes the picture; otherwise, some previously faithful people would drop off the chart (literally).

The other smart thing is its self-sufficiency: none of the data is printed on the chart, and I doubt readers miss them.


Here, I attempt an alternative, which is a variant of the Web of Debt chart discussed here.


Note the economy of colors, lines, etc. I have chosen to use the number of people with a particular childhood faith as the base for all the percentages; other bases can be selected. For example, the unaffiliated has grown by 144% of the childhood base, with about half of that growth coming from previous Protestants; meanwhile, an exodus of Catholics has occurred. (PS. the data for other faiths being incomplete in the aforementioned report, I made up some of the data so as to finish the chart.)
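For readers who want to see how a childhood base works as the denominator, here is a minimal sketch. All counts are invented for illustration (they come neither from the Pew report nor from my chart), and the function name is hypothetical.

```python
def pct_of_base(count: float, base: float) -> float:
    """Express a count as a percentage of the chosen base."""
    return 100.0 * count / base

# Invented counts for one group, e.g. the unaffiliated.
childhood_base = 200                          # raised in this group
inflows = {"Protestant": 90, "Catholic": 60}  # joined later, by childhood faith
outflow = 30                                  # raised in this group but left

net_change = sum(inflows.values()) - outflow

# Net growth and one inflow, both measured against the same childhood base.
print(pct_of_base(net_change, childhood_base))             # 60.0
print(pct_of_base(inflows["Protestant"], childhood_base))  # 45.0
```

Choosing a different base (say, the current size of the group) changes every percentage, which is why the base needs to be stated on the chart.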

If the line thickness is made proportional to the percentages, that would eliminate the need to have all those numbers on the chart.

Untangling Europe's debt web

A number of blogs have hailed this NYT diagram/chart/infographic as "nice". The accompanying article is here.


Whether this is nice or not depends on what message you want to convey with this graphic. If it is entanglement, then yes, the graphic reveals the complexity very well. If one wants to understand the debt situation in Europe, then no, this chart doesn't make it clear at all.

From the perspective of someone wanting to dissect the debt web, an enhanced data table is hard to beat.


The first section looks at the interdependency between the five troubled countries, collectively known as PIIGS on Wall Street. The additional debt owed to Britain, Germany and France is shown below. Notice that the original chart does not treat these three countries the same way as PIIGS: we do not know what the values are of the arrows pointing from these three into PIIGS.


Expressed in per-capita terms, Ireland stands out as the worst of the bunch while the citizens of the other four countries are bearing roughly equal amounts of debt per person.


I tried to come up with something more "fun", as below:

Here, I opted to use a small multiples chart to split the countries. In so doing, I accepted redundancy in search of clarity. Each amount is plotted twice: once as a borrowing (red line) and once as a lending (black arrow).

It is immediately clear why Greece is the most urgent issue.

Perhaps the chart type is not as important as the transformations I did to the data:

1) All amounts shown are "net" amounts between any pair of countries. In the original data, there are two arrows between each pair. For example, Italy owes Ireland $46 billion but Ireland owes Italy $18 billion; this means Italy owes Ireland $28 billion net.

2) All amounts are expressed per capita. Since the populations of these countries vary from 4.5 million (Ireland) to 60 million (Italy), the total debts cannot and should not be compared directly with each other.

3) Not shown here but I also expressed the net amount lent/borrowed per dollar of GDP. This is another metric that makes sense. The nominal GDP of these countries ranges from $0.2 trillion to $2 trillion. The PPP GDP has a similar range.

4) One item I did not fix is the currency. Given the fluctuation in exchange rate between the Euro and US$, I think it may be better to express all the numbers in Euros.
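The first two transformations can be sketched in a few lines of Python. This is an illustration, not the code behind the chart: the function names are mine, the Italy-Ireland amounts are the ones quoted in item 1 (in whatever unit they are stated there), the Irish population is the rounded figure from item 2, and the net debt used for the per-capita step is a made-up number.

```python
def net_owed(a_owes_b: float, b_owes_a: float) -> float:
    """Net amount A owes B; a negative result means B is the net borrower."""
    return a_owes_b - b_owes_a

def per_capita(amount: float, population: float) -> float:
    """Spread an amount evenly over a country's population."""
    return amount / population

# Item 1: collapse the two arrows between Italy and Ireland into one.
italy_owes_ireland_net = net_owed(46, 18)
print(italy_owes_ireland_net)  # 28

# Item 2: per-capita conversion.
# HYPOTHETICAL net debt in dollars, chosen only to make the arithmetic easy.
hypothetical_net_debt = 9_000_000_000
ireland_population = 4_500_000  # rounded figure from item 2
print(per_capita(hypothetical_net_debt, ireland_population))  # 2000.0
```

The per-GDP version in item 3 is the same division with GDP in place of population.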

A next step would be to include Britain, Germany and France.

Reference: "In and Out of Each Other's European Wallets", New York Times, April 30, 2010.

PS. Reader JF pointed out an inconsistency in the numbers on the chart. I revised the chart to fix this issue. In the current chart, one can read the information as: the average Portuguese owes Spain $5,453, owes Italy $141, while having lent $903 to Greece and $1,561 to Ireland. Each chart can be interpreted from the perspective of the average citizen in that particular country. (For details, see the comments below.)

Leave good alone

In Cousin misfit, we looked at a problematic area chart in which the areas on the chart contain no useful information. The lines in a line chart should carry some meaning, and so too should areas in an area chart.


The Wall Street Journal recently printed something that looked like a cross between a column chart, an area chart, and a flow chart.  Whatever it is, the areas of the pieces do not match the data.

The data describes how the TV market is split between the top 5 brands (comprising over 50% of the total unit sales) and all other brands -- basically the six numbers printed on the chart.

The graphical construct can be broken up into three parts: a stacked column (on the left), a stacked column with gaps (on the right), and some connecting areas (which are parallelograms).

The last two parts are unnecessary, and in particular, the parallelograms distort the total areas.

It can be baffling to the reader why the left column is shorter than the right column when both show the identical data.

At first, I thought this was some kind of flow chart illustrating the change in market share over time but that's not the case.

What's wrong with the standard stacked column?

Reference: "Samsung Edges out TV Rivals", Wall Street Journal, Feb 17 2010.


Here are some of my favorite links from other places:

A spatial journey illustrating a very long scale, created by the Genetic Science Learning Center (here)

Long scales are very difficult to deal with in charts; I have never been satisfied with log scales since they address the designer's challenge of trying to fit everything onto one page, but do not deal with the reader's need to compare the elements accurately.

Not sure how this helps but perhaps some of you will figure it out

Tommi left a comment about this conceptual chart by xkcd, which has been making the rounds. Fits into our Light Entertainment category.

Says there is no optimal chart type.  A type that works very well for one data set may get hopelessly cluttered for another, similar data set.

From fellow bloggers (especially Jorge), a whole series of views of the U.S. unemployment figures by state over time. Alternatives that are much more interesting to look at than the typical line chart. Jorge even found something in Excel that looks good.