The bubble chart is one of the most hopeless data graphics ever invented. It is sometimes useful for conceptual charts but trying to express data with it is a lost cause.
The Wall Street Journal used a bubble chart to show the trend in whistle-blower lawsuits in the U.S. The original chart looks like this:
Focus on the top part of the chart. Now apply the self-sufficiency test (link), as follows:
First, cover up the data labels. You'll notice that no information is conveyed by the bubbles in and of themselves.
Second, give yourself a hint. The size of the first bubble corresponds to 363 suits. What does that tell you about the second bubble? Unfortunately, the answer is still nothing.
Third, give yourself two hints. The second bubble from the left has size 311. Now try to estimate the size of the rightmost bubble given those two pieces of data. This exercise is still extremely taxing.
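To see why even two hints help so little, here is a minimal sketch. It assumes the bubbles are drawn with area proportional to the number of suits; the chart does not say whether the designer scaled by area or by diameter, and the function name is my own.

```python
import math


def diameter_ratio(value_a: float, value_b: float) -> float:
    """Diameter of bubble b relative to bubble a, ASSUMING area is proportional to value."""
    return math.sqrt(value_b / value_a)


# The two counts quoted in the hints above.
first, second = 363, 311

value_ratio = second / first
diam_ratio = diameter_ratio(first, second)

print(f"value ratio:    {value_ratio:.3f}")
print(f"diameter ratio: {diam_ratio:.3f}")
```

Under that assumption, a drop of about 14% in lawsuits shows up as a bubble that is only about 7% narrower, which is why eyeballing the rightmost bubble from the first two is so taxing.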
Thus, the conclusion about bubble charts is:
That is to say, it fails the self-sufficiency test (link). The chart cannot exist without the data labels. The graphical elements do not provide any additional value.
Augustine F. (@acfou) was not amused by a set of charts made by the Bureau of Labor Statistics, via Business Insider (link). Here's one of them:
The article's message is that the book, periodical and music stores industry has shrunk drastically (over 50%) in the last 10 years, but unless you spend time studying the chart, you're not likely to get this picture.
The bubbles are going right and up, which usually is indicative of an increasing trend. What is tripping us up is the employment level occupying the horizontal axis rather than the expected time dimension. The only real way to see the plunge in employment is to focus on the horizontal axis, and to notice the deepening color of the bubbles.
The chart is actually a scatter plot of number of firms versus number of employees. The slope of the line gives us the number of firms per employee, which is also unexpected since the usual metric is its reciprocal, the number of employees per company. However, since the slope is essentially constant, highlighting this number is pointless. While the industry is collapsing, the average workforce of the surviving firms has remained more or less the same.
I added a cone to the chart to visualize the narrow range in which the employees per firm varied during the past decade.
As if that were not confusing enough, the reciprocal of the slope is encoded in the size of the bubbles on the chart, which requires a legend to explain. All of this means that readers' attention is directed to the average workforce metric instead of the drop in employment.
The following indexed chart shows that the number of employees and the number of firms dropped in step during the ten years. Both dropped about 55% during the decade. This just confirms that the average employees-per-firm metric is not meaningful.
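For readers who want to build an indexed chart like this, here is a rough sketch of the arithmetic. The numbers are made up for illustration (they are not the BLS data), and the helper name is hypothetical.

```python
def index_to_base(series: list[float], base: float = 100.0) -> list[float]:
    """Rescale a series so that its first value equals `base`."""
    first = series[0]
    return [base * x / first for x in series]


# HYPOTHETICAL values for illustration only; NOT the BLS figures.
employees = [200_000, 170_000, 130_000, 90_000]
firms = [20_000, 17_200, 13_100, 9_000]

emp_idx = index_to_base(employees)
firm_idx = index_to_base(firms)

# Employees per firm: the reciprocal of the slope in the scatter plot.
per_firm = [e / f for e, f in zip(employees, firms)]

for e, f, r in zip(emp_idx, firm_idx, per_firm):
    print(f"employees index {e:6.1f}   firms index {f:6.1f}   employees per firm {r:5.2f}")
```

When both indexed series fall together, as in this toy data, the ratio of the two barely moves, which is the sense in which the employees-per-firm metric adds little to the story.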
If you follow the link to the BLS analysis, you'll find some other interesting data, namely the "internet publishing" industry. Does it make sense to talk about the drastic decline in traditional publishing without talking about the rise of the "substitute" industry? The chart below shows that the new jobs created in Internet publishing filled almost all of the hole left in the traditional publishing industry. The decline from 2009 on may not be specific to the industry; it could just be the Great Recession. (As defined, I don't think the two industry sectors are exactly what I'm looking for, but it's close enough.)
I enjoy looking at the New York Times' summation of National Convention speeches via visualization. (link)
It's a disguised word cloud combined with a bubble chart with a little bar chart thrown in for good measure.
The size of the bubble is the total number of mentions of particular words or phrases. So the bubbles tell us the importance of specific concepts aggregated across the two parties.
It's the split within each bubble that represents the relative emphasis by party. Helpfully, the bubbles are sorted from left to right with the most Democratic words on the left. This splitting uses a bar chart paradigm: it is the diameter of the bubble that is partitioned, not its area.
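To get a feel for how far apart the diameter split and the area split can be, here is a small sketch. It assumes the bubble is cut by a straight vertical line at a given fraction of the diameter; the function name is mine, and the formula is the standard area of a circular segment.

```python
import math


def area_share(diameter_share: float) -> float:
    """Fraction of a circle's area to the left of a vertical cut that
    leaves `diameter_share` (0..1) of the diameter on the left."""
    if not 0.0 <= diameter_share <= 1.0:
        raise ValueError("diameter_share must be between 0 and 1")
    # Position of the cut on a unit circle centered at 0, from -1 to 1.
    d = 2.0 * diameter_share - 1.0
    # Integral of 2*sqrt(1 - x^2) from -1 to d.
    left_area = d * math.sqrt(max(0.0, 1.0 - d * d)) + math.asin(d) + math.pi / 2.0
    return left_area / math.pi


for share in (0.10, 0.25, 0.50):
    print(f"diameter share {share:.2f} -> area share {area_share(share):.3f}")
```

Under this assumption, a party given a quarter of the diameter gets just under a fifth of the area, so a reader who judges by area will understate the smaller party's share.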
I wanted to see this as a straight-out word cloud. In the following, I use the red-blue-purple color gradient to indicate the Republican-Democratic bias, and the size of the words to indicate the number of mentions.
This word cloud is created using the Wordle tool, advanced options. My colleague John helped me pick the colors. (By the way, I don't like the insertion of small words within large letters, like what happened here inside the O in Obama.)
Also, I'd line the colors up so that the red words are on one side, blue on the other and purple in the middle. I'd need a different tool to be able to exercise this type of control.
This WSJ graphic gives me a reason to talk about the self-sufficiency test: go ahead and block out the data labels on the chart. You are left with concentric circles and no way to learn anything from them, neither the absolute dollar values nor the relative ones. In other words, the only way to read this chart is to look at the data labels.
The online article does not include the graphic. It's an article talking about Neil Armstrong's death. Here's the same data using bar charts:
The chart would be much improved if a longer time series were included, giving us values for each year. It's pretty clear that this data is subject to sudden jumps (e.g. Armstrong's death), so picking arbitrary years will likely cause us to miss important events.
Circles are also subject to various types of optical illusion. Before you use bubble plots, give the following a look:
Can we judge the size of circles in relation to other circles? (credit)
Can we judge the relative distance between circles? (credit)
Can we judge the relative sizes of circles within circles? (credit)
An email lay in my inbox with the tantalizing subject line: "How to Create Good Infographics Quickly and Cheaply?" It's a half-spam from one of the marketing sites that I signed up for a long time ago. I clicked on the link, which led me to a landing page which required yet another click to get to the real thing (link). (Now, you wonder why marketers keep putting things in your inbox!)
The article was surprisingly sane. The author, Carrie Hill, suggests that the first thing to do is to ask "who cares?" This is the top corner of my Trifecta Checkup, asking what's the point of the chart. Some of us not so secretly hope that the answer to "who cares?" is no one.
Carrie then lists a number of resources for creating infographics "quickly and cheaply".
Easel.ly caught my eye. This website offers templates for creating infographics. You want time-series data depicted as a long, hard road ahead, you have this on the right.
You want several sections of multi-colored bubble charts, you have this theme:
In total, they have 15 ready-made templates that you can use to make infographics. I assume paid customers will have more.
infogr.am is another site with similar capabilities, and apparently for those with some data in hand.
Based on this evidence, the avalanche of infographics is not about to pass. In fact, we are going to see the same styles over and over. It's like looking at someone's PowerPoint presentation and realizing that they are using the "Advantage" theme (one of the less ugly themes loaded by default). In the same way, we will have a long, winding road of civil rights, and a long, winding road of Argentina's economy, and a long, winding road of Moore's Law, etc.
But I have long been an advocate of drag-and-drop style interfaces for producing statistical charts. So I hope the vendors out there learn from these websites and make their products ten times better, so that it is as "quick and cheap" to make nice statistical charts as it is to make infographics.
Reader Sushil B. offers this chart from Business Week on hedge fund returns. (link)
Unmoored bubbles, slanted text, positive and negative returns undifferentiated, bubble within bubble, paired data scattered apart, and it's not even that attractive.
Here is a Bumps-chart style version of this data:
The author never explained how the five funds were chosen, so it's hard to know what the point of the chart is. It appears that Harbinger Capital Partners had a similar experience to Paulson. In addition, given the potentially huge gyrations from year to year, it's very odd that we are not shown the annual returns between 2007 and 2011; we can't rule out that some of the three other funds suffered a particularly bad year in between the end points shown here.
Ryan McCarthy linked to a post by Ruchir Sharma running on Ezra Klein's blog analyzing global billionaires.
It has an accompanying chart, which fails our self-sufficiency test. That test involves erasing raw data from a chart, and figuring out how much information the graphical elements themselves convey.
The primary metric used by Sharma is the billionaires' total net worth as a percentage of the country's GDP. This metric is embedded in double concentric circles. Unfortunately, without mental gymnastics, readers can't tell what the proportion is. This means we must look at the raw data, which is supplied as a column on the right of the graphic. If readers are taking the information from the column of raw data, then why draw a chart?
The actual data is revealed on the left. Don't tell anyone you read it here but pie charts would work well with this dataset. You might complain that there is a conceptual problem - that if we sum up the net worth of everyone in a country, it would not equal GDP. I think the sum doesn't work - economists can chime in about this. Sharma seems to imply that the total would sum to 1. Anyone's net worth is accumulated over a number of years in which the GDP is fluctuating, while the total GDP is given for a specific end of quarter of some year, so does it make sense to divide one by the other?
Also, the fact that some people may have negative net worth creates problems with the pie-chart format and it's not much better in a concentric-circle format either.
A maddening decision puts the United States, which is the biggest circle, at the bottom of the chart. Notice that the countries are sorted from larger billionaires' share to smaller. The U.S. belongs to the top 5 nations with the worst inequality by this metric and yet a cheeky little bookmark sends us to the bottom of the list together with the more-equal nations.
Not only is the location of the U.S. privileged, the location of the text, the number of decimal places given in the net worth amount, and the presence of the GDP value all set the U.S. apart from the other countries plotted.
The most interesting piece of information is waiting to be reconstructed. In Malaysia, nine citizens own as much as 18.3% of the country's GDP. In Mexico, 11 people own 10.9% of the country's GDP.
To make the number even more telling, we have to incorporate the population size. For Malaysia it is 28 million. This means that the top 0.000032% of the population owns 18.3%. In the case of perfect equality, this proportion would own 0.000032%. We can say the inequality index is 570,000. In Mexico, the index is 1.1 million. So in fact, the concentration of wealth at the time is worse in Mexico than in Malaysia. For reference, the U.S. comes in at 78,000.
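The arithmetic behind this index can be sketched as follows. The Malaysia inputs are the ones quoted above; the Mexican population of roughly 112 million is my assumption, since the text does not state it, and the function name is hypothetical.

```python
def concentration_index(n_people: int, population: float, share_of_gdp: float) -> float:
    """Billionaires' net worth as a share of GDP, divided by the share of the
    population they represent. Equals 1 under perfect equality."""
    population_share = n_people / population
    return share_of_gdp / population_share


# Malaysia: 9 billionaires, 28 million people, 18.3% of GDP (figures from the text).
malaysia = concentration_index(9, 28e6, 0.183)

# Mexico: 11 billionaires, 10.9% of GDP (from the text).
# ASSUMPTION: population of about 112 million, not given in the text.
mexico = concentration_index(11, 112e6, 0.109)

print(f"Malaysia: {malaysia:,.0f}")
print(f"Mexico:   {mexico:,.0f}")
```

With these inputs the results round to the 570,000 and 1.1 million quoted above. The U.S. figure is not reproduced here because its inputs are not given in the text.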
Of course, the use of billionaires as a filtering device to determine who to count or not is completely arbitrary. In measuring wealth inequality, one should look at what proportion of the population controls 50% of the wealth, for example.
There is no explanation for the choice of countries. The U.S. is the only developed nation in the entire chart.
Reader Joe DiNoto sent me to the following National Post (Canada) chart via Twitter, complaining about the circles. (The full chart is found here.)
This chart is supposed to show that the students in Quebec are wrong to go on strike against a roughly 10% increase in tuition fees because the cost of education in Quebec is dwarfed by that in other provinces. This particular message is visible by virtue of the small amount of space occupied by the Quebec "flower" relative to other provinces.
However, to convey that message would require only a chart of the average tuition of the seven provinces. The dataset here contains a lot more information than just the average: it has the tuition by major. But does the general pattern of relative tuitions apply to individual majors? This chart type (a disguised bubble chart) does the reader few favors. (At least the designer managed to keep each "petal" at the same angle; otherwise it would make our lives even harder.)
In order to bring out the tuition by major comparison, the following set of dot plots helps:
The purple dots are Quebec tuitions. The gray dots are the remaining provinces. We find that Quebec is at the bottom of the cost scale for every major. We also learn that the variance of tuition for dentistry, medicine, and law is very high. Surprisingly, the business degree is rather cheap - maybe the demand for it up north is lower?
Where would this chart fall in my "return on effort matrix"? It is an extremely high-effort chart; I got tired trying to figure out what all those dimensions mean.
Is it a high-reward or a low-reward chart? It depends on why you're reading the chart. For most readers, I suspect it's low-reward.
In my view, the best charts are high-reward, low-effort. I'd emphasize that by effort, I mean effort by the reader. In general, the effort by the chart designer is inversely proportional to that by the reader.
In some special cases, high-effort charts may have high reward justifying the destruction of some brain cells.