Right on the heels of the disastrous bubble chart comes another, courtesy of the NYT Magazine. Bubble charts are okay for conceptual messages ("this is really big, and that is really tiny"). This chart, however, wants readers to compare the sizes of the bubbles, which highlights the worst part of such graphs.
Poor scaling is the big issue with bubble charts. They are the prototype of what I call charts that are not "self-sufficient". Without printing all the data, the chart is unscaled, and thus useless (see below middle). When all the data is printed (as in the original, below left), it is no better than a data table.
In the above right chart, we mimicked a bar or column chart by providing a scale. For this chart, the convenient "tick marks" are at 10, 20, 34 and 41. Unfortunately, this scaled version also fails to amuse.
Note further that the data should have been presented in two sections: the party affiliation analysis and the gender analysis. Also, it is customary to place "Independents" between "Republicans" and "Democrats" because they are middle-of-the-road.
A profile chart is an attractive way to show this data. Here, we quickly learn a couple of things obscured in the bubble chart.
On the issue of abortion, Independents are much closer to Democrats than to Republicans. Also, there is barely any difference between the genders; the only gap is in the strength of support among those who want to legalize.
Reference: "A matter of Choice", New York Times Magazine, Oct 19 2008.
PS. Based on RichmondTom's suggestion, here are the cumulative profile charts.
Frederic M. sent in this chart, together with his commentary.
Bubbles across rows have vastly different numbers but their circles are of identical size (or vice versa). It borders on the ridiculous that all bubbles of the US row have the same size... The question of whether teenage birth rates and teen sex are correlated cannot be eyeballed with this kind of display. The fact that you cannot compare across rows makes this an instance of “chart junk”.
White spaces -- always dangerous. Does lack of bubble imply no data or no abortions/sex?
Sorting -- this is what Howard Wainer called "Arizona first", with a twist (here it is the United States).
Loss aversion -- would U.S. readers be resentful if countries like Iceland were excluded? A much-reduced version comparing the U.S. to, say, Canada, the U.K., Japan and Germany may be more informative for the reader.
Sufficiency -- if all the data are printed as in a table, why do we need the bubbles?
Reference: "Let's Talk About Sex", New York Times, Sep 6 2008.
Guess where I went for vacation (clue in the chart).
This long, narrow country is divided into 15 regions. In the chart below, an uneven parade of 13 bubbles was used to present some sort of economic projections. The capital of the country was singled out at the top of the table.
The unevenness has a side effect: the guiding lines are forced to take differing lengths and bewildering turns. Further, because bubbles have no intrinsic scale, the designer must put all the data onto the map as well, thus failing our self-sufficiency test.
The following bar chart version respects the wide, thin space and yet delivers the data more clearly. The top version displays all the data while the bottom one uses a simple axis. The bottom chart is my preference since most readers are probably interested in approximate and relative comparisons, rather than exact projections. (The map would be better off without colors.)
Reference: "Inversiones entre 2008 y 2012 llegaran a US$ 57 mil millones impulsadas por mineria y energia" ("Investments between 2008 and 2012 will reach US$57 billion, driven by mining and energy"), El Mercurio, Aug 25 2008.
Andrew N., a reader from Australia, wasn't too impressed with the way National Nine News presents the Olympic medal table on its home page. To the extent that we want to venture beyond the typical tabular presentation, this bar chart is in fact quite appropriate. Let me explain.
Let's take a tour around the world. It's the battle of the data tables.
The Boston Globe's is the cleanest of the bunch. I especially like the way they place the USA count at the top; however, their use of country codes is inferior to spelling out country names, as done in all of the other examples. The New York Times is the only one to use color to set apart gold, silver and bronze, which lets readers easily assess the two dominant metrics, total golds and total medals. A small touch but very nice.
The biggest design issue here is the existence of two different metrics. In any tabular presentation, the countries can be ranked by only one metric, so the designer must make a choice. The American papers rank by total medals; the French paper by total golds; the two Canadian ones shown here are split. The American papers also leave the ranking implicit, while the others provide an explicit numerical rank. Le Monde and the Globe and Mail provide ranks that are consistent with the ordering of countries, both by total golds. The Star, by contrast, wants it both ways: the order reflects total medals while the "POS" column shows the rank by total golds. This extra column does help readers who prefer ranking by golds, but the primacy of the other ranking has not been overcome.
So what about National Nine News? I have not been a fan of stacked bar charts but, surprisingly, this is a great application. Stacked bars have the disadvantage that the stacked segments don't share the same base, which makes their lengths hard to compare. Here, though, our two metrics are total medals and total golds, so readers should be drawn to compare the total lengths and the lengths of the first segments (golds), both of which do start from the common baseline. Those wanting to compare silvers and bronzes must make a greater effort, but they will be in the minority.
What can be improved are the distracting data labels, especially the gold circles. Instead, one should provide a scale, or use symbols such as one circle per medal. (See this old post.) Here is a version with a scale:
One cannot end this post without mentioning the attempt by NYT editors to insert levity into these proceedings with first a cartogram and then a bubble chart.
The New York Times continued to push the envelope by printing super-complicated data graphics (while the Economist regrettably seems to have picked the USA Today route... more on that in a future post). The following graphic was used to illustrate the relationship between CEO compensation and the stock performance of their companies.
The two dotplot lookalikes depicted the percent change in CEO pay and the change in the company's stock price, in both cases from 2006 to 2007. The size of the dots indicates the relative value of the CEO's pay. The gray dots depict "similarly sized" companies for comparability.
In this post, I will focus on the comparison between change in pay and change in stock price for a given CEO. In particular, the calibration of the axis/scale is problematic. The scale is automatically determined by an algorithm; as one switches from one CEO to another, the graphs take on different ranges, use different axis labels, and the zero-percent points shift.
This means that the two charts have different scales. In this example, each tick mark advances 6% in the top chart but 12% in the bottom chart.
Because the scales differ and the zero points do not line up, the distance between zero and the orange dot cannot be compared across the two charts: the 2.5x longer distance in the top chart actually represented roughly the same percentage change as in the bottom chart (31% versus 28%).
In order to respect the grid-lines (white lines), the tick marks fall onto stray percentages (24%, 36%, 48%, etc.). That's unfortunate.
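The stray labels come from letting the grid dictate the tick step. A common alternative (we don't know what software the NYT used, so this is an assumption) is the "nice numbers" heuristic described by Paul Heckbert in Graphics Gems, which restricts the tick step to 1, 2 or 5 times a power of ten. Here is a minimal Python sketch; the function names are hypothetical.

```python
import math


def nice_number(x: float, round_result: bool) -> float:
    """Return a 'nice' number (1, 2, 5 or 10 times a power of ten) close to x > 0."""
    exponent = math.floor(math.log10(x))
    fraction = x / 10 ** exponent  # mantissa, roughly in [1, 10)
    if round_result:
        # Round the mantissa to the nearest nice value.
        thresholds = [(1.5, 1), (3, 2), (7, 5)]
        nice = next((n for limit, n in thresholds if fraction < limit), 10)
    else:
        # Take the smallest nice mantissa that is >= the actual one.
        thresholds = [(1, 1), (2, 2), (5, 5)]
        nice = next((n for limit, n in thresholds if fraction <= limit), 10)
    return nice * 10 ** exponent


def nice_ticks(lo: float, hi: float, max_ticks: int = 6) -> list:
    """Tick positions covering [lo, hi] with a nice step (hypothetical helper)."""
    if hi <= lo:
        raise ValueError("hi must exceed lo")
    span = nice_number(hi - lo, round_result=False)
    step = nice_number(span / (max_ticks - 1), round_result=True)
    start = math.floor(lo / step) * step
    stop = math.ceil(hi / step) * step
    count = round((stop - start) / step)
    # round() trims floating-point noise such as 0.30000000000000004
    return [round(start + i * step, 10) for i in range(count + 1)]


print(nice_ticks(0, 60))   # [0, 20, 40, 60]
print(nice_ticks(-5, 31))  # [-10, 0, 10, 20, 30, 40]
```

Under this rule a 0 to 48% range gets ticks every 10% instead of labels at 24%, 36% and 48%, and zero always lands on a tick whenever the range straddles it.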
What's the culprit? This chart is "bound to extremes". In other words, the range of the depicted data is used to determine the plot area. The bottom chart had zero on the left edge because all the stocks depicted rose between 2006 and 2007. It is often better to use domain knowledge to determine the plot area. Extreme values should be omitted if they don't add to the message. Oftentimes, by leaving extreme values in the picture, we squash the rest of the data.
This is also why programs like Excel do a poor job of picking a scale.
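One way to act on this advice is to fix a single range for both panels instead of letting each panel's extremes set its own. The Python sketch below makes our own assumptions: `common_axis` is a hypothetical helper, and every number other than 31% and 28% (the two values quoted above) is made up. It either takes a range chosen from domain knowledge or pools the data and forces zero into view, and it returns the values that fall outside so they can be flagged rather than allowed to squash the rest.

```python
def common_axis(panels, domain=None):
    """Choose one axis range shared by several panels.

    panels -- one list of percent changes per panel
    domain -- optional (lo, hi) fixed in advance from domain knowledge
    Returns (lo, hi, outliers); outliers fall outside the range and can be
    annotated separately instead of stretching every panel.
    """
    pooled = [v for panel in panels for v in panel]
    if domain is None:
        # Fallback: pooled data range, always widened to include zero
        lo, hi = min(min(pooled), 0), max(max(pooled), 0)
    else:
        lo, hi = domain
    outliers = [v for v in pooled if v < lo or v > hi]
    return lo, hi, outliers


# Pay panel and stock panel; 31 and 28 come from the example, the rest are hypothetical.
print(common_axis([[31, -12], [28, 5]]))                     # (-12, 31, [])
print(common_axis([[31, 240], [28, 5]], domain=(-50, 100)))  # (-50, 100, [240])
```

Because both panels share one range, equal distances from zero mean equal percentage changes, which is exactly what the CEO chart lacked.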
As an aside, the use of bubbles is almost always troubling. Bubbles do not have a scale, so the only information we get is relative size. However, we cannot estimate areas accurately, so we get even the relative size wrong. Sometimes even the chart designer gets stumped. In the chart for Steve Jobs, you would think his bubble (total compensation of $1) would be dwarfed by all the other bubbles, as in the WSJ chart we showed the other day. Not so.
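Relative size also hinges on a convention the reader cannot see: does the value map to the bubble's radius or to its area? A common recommendation (stated here as an assumption, not as a description of how the NYT or WSJ built their charts) is to make area proportional to value, so the radius grows with the square root. A minimal Python sketch; `bubble_radius` is a hypothetical helper and the $10 million comparison package is made up.

```python
import math


def bubble_radius(value, max_value, max_radius):
    """Radius that makes circle AREA, not radius, proportional to value."""
    return max_radius * math.sqrt(value / max_value)


# A value four times larger gets a radius only twice as large...
print(bubble_radius(100, 100, 10), bubble_radius(25, 100, 10))  # 10.0 5.0

# ...whereas scaling the radius directly would inflate the area ratio to 16:1.
print((100 / 25) ** 2)  # 16.0

# A $1 package next to a hypothetical $10 million one should be nearly invisible.
print(round(bubble_radius(1, 10_000_000, 50), 4))  # 0.0158
```

Either way, the reader sees diameters that differ far less (or far more) than the underlying values, which is why a $1 bubble that is not dwarfed signals a sizing mistake.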
In the course of business and governing, a lot of charts are generated. An anonymous tipster pointed us to a set created by the "Communities and Local Government" division in the UK government. Judging from the content, this division has responsibility for economic development in local neighborhoods.
Below is a pair of exhibits. Truly, they are trying too hard! What we see is a hybrid scatter-bubble chart. Between the jargon, the acronyms (LAD, LSOA), the boxed text, the multi-color circles, the colored axis labels and the lack of a title, the reader is plunged into a state of confusion.
The chart can be unraveled. Each district was evaluated based on two measures of "gaps in worklessness". The vertical axis compares each district to the national average; positive numbers indicate an above-average district relative to the nation. The horizontal axis compares the most deprived 10% of neighborhoods within each district to the local average; positive numbers indicate that the worst neighborhoods are improving.
Thus, the policy goal would be to move all districts into the upper right quadrant. The multi-color bubbles were designed to show us the state of the nation. On the left chart, 41% of the districts (or is it the population?) fall in improving areas, while 19% fall in deteriorating ones.
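To see how such quadrant percentages can be computed, here is a minimal Python sketch. The district values and the optional population weights are hypothetical, and `quadrant_shares` is our own helper. Because the chart leaves open whether the 41% and 19% count districts or people, the function supports both.

```python
from collections import defaultdict


def quadrant_shares(districts, weights=None):
    """Share of districts (or of a weight such as population) in each quadrant.

    districts -- (national_gap, local_gap) pairs; positive national_gap means
                 above the national average, positive local_gap means the worst
                 neighborhoods are improving (signs as described above)
    weights   -- optional per-district weights, e.g. population (hypothetical)
    Values of exactly zero are assigned to the lower/left side.
    """
    if weights is None:
        weights = [1] * len(districts)
    totals = defaultdict(float)
    for (national_gap, local_gap), w in zip(districts, weights):
        vertical = "upper" if national_gap > 0 else "lower"
        horizontal = "right" if local_gap > 0 else "left"
        totals[f"{vertical}-{horizontal}"] += w
    grand_total = sum(weights)
    return {quadrant: t / grand_total for quadrant, t in totals.items()}


# Four made-up districts, counted equally
print(quadrant_shares([(1.0, 0.5), (2.0, 3.0), (-1.0, 1.0), (-0.5, -2.0)]))
# {'upper-right': 0.5, 'lower-right': 0.25, 'lower-left': 0.25}
```

Passing population figures as `weights` changes the answer whenever district sizes differ, which is why the chart should say which base it uses.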
The following strategies can help improve readability:
use plain English on the axes
relegate technical definitions to the legend
add succinct title to tell the story
use color on the data rather than on axis or data labels
use color to draw attention to the upper right quadrant