This WSJ graphic caught my eye. The accompanying article is here.
The article (judging from the sub-header) makes two separate points: one about the total amount of money raised in IPOs in a year, and the other about the change in market value of those newly-public companies one year from the IPO date.
The first metric is shown by the size of the bubbles while the second metric is displayed as distances from the horizontal axis. (The second metric is further embedded, in a simplified, binary manner, in the colors of the bubbles.)
The designer has decided that the second metric - performance after IPO - is more important. Therefore, it is much easier for readers to know how each annual cohort of IPOs has performed. The use of color to map to the second metric (and not the first) also helps to emphasize the second metric.
There are details on this chart that I admire. The general tidiness of it. The restraint on the gridlines, especially the horizontal ones. The spatial balance. The annotation.
And ah, turning those bubbles into lollipops. Yummy! Those dotted lines allow readers to find the center of each bubble, which is where the values of the second metric lie. Frequently, these bubble charts are presented without those guiding lines, and it is often hard to find the circles' anchors.
That leaves one inexplicable decision - why did they place two vertical gridlines in the middle of two arbitrary years?
When this post appears, I will be on my way to Seattle. Maybe I will meet some of you there. You can still register here.
I held onto this tip from a reader for a while. I think it came from Twitter:
The Economist found a fun topic but what's up with the axis not starting at zero?
The height x weight gimmick seems cool but on second thought, weight is not the same as girth so it doesn't make much sense!
In the re-design, I use bubbles to indicate weight and vertical location to indicate height. The data aren't as interesting as one might think. All the actors pretty much stayed true to the comic-book ideal, with Adam West being the closest. I also changed the order of the actors.
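For anyone who wants to try the bubble encoding, here is a minimal sketch of how the bubbles could be sized. It assumes the usual convention that a bubble's area, not its radius, is proportional to the value. The function name `bubble_radius` and the `max_radius` parameter are hypothetical, and the numbers in the usage lines are made up, not the actors' data.

```python
import math


def bubble_radius(value, max_value, max_radius=40.0):
    """Return a radius whose circle area is proportional to `value`.

    The largest value gets `max_radius`; every other bubble is scaled by the
    square root of its share, because area grows with the radius squared.
    """
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value must be > 0")
    return max_radius * math.sqrt(value / max_value)


# Made-up values for illustration only.
full = bubble_radius(100, 100)     # largest bubble
quarter = bubble_radius(25, 100)   # a quarter of the value, half the radius
```

Scaling the radius directly would make a bubble with four times the value look sixteen times as large, which is why the square root is there.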
I left out the Lego, as it creates a design challenge that does not justify the effort.
Robert Kosara posted a response to my previous post.
He raises an important issue in data visualization - the need to aggregate data, and not plot raw data. I have no objection to that point.
What I showed in my original post were two extremes. The bubble chart is high drama at the expense of data integrity. Readers cannot learn any of the following from that chart:
the shape of the growth and subsequent decline of the flu epidemic
the beginning and ending date of the epidemic
the peak of the epidemic*
* The peak can be inferred from the data label, although there appears to be at least one other circle of approximately equal size, which isn't labeled.
The column chart is low drama but high data integrity. To retain some dramatic element, I encoded the data redundantly in the color scale. I also emulated the original chart in labeling specific spikes.
The designer then simply has to choose a position between these two extremes. This will involve some smoothing or aggregation of the data. Robert showed a column chart that has weekly aggregates, and in his view, his version is closer to the bubble chart.
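The weekly aggregation can be sketched in a few lines. This is only an illustration, assuming the daily data come as a mapping from date to count and that weeks start on Monday. The function name `weekly_totals` is hypothetical, and the counts are made up, not the flu data.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_totals(daily_counts):
    """Sum daily counts into weekly totals, keyed by the Monday of each week."""
    totals = defaultdict(int)
    for day, count in daily_counts.items():
        # weekday() is 0 for Monday, so this steps back to the start of the week.
        week_start = day - timedelta(days=day.weekday())
        totals[week_start] += count
    return dict(sorted(totals.items()))


# Made-up daily counts for two weeks, starting on a Monday.
counts = [1, 0, 2, 5, 3, 0, 1, 4, 8, 6, 9, 7, 2, 3]
daily = {date(2024, 1, 1) + timedelta(days=i): c for i, c in enumerate(counts)}
weekly = weekly_totals(daily)
```

Fourteen noisy daily values collapse into two weekly columns, which keeps the shape of the rise and fall while hiding the day-to-day jitter.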
Robert's version indeed strikes a balance between drama and data integrity, and I am in favor of it. Here is the idea (I am responsible for the added color).
Where I depart from Robert is how one reads a column chart such as the one I posted:
Robert thinks that readers will perceive each individual line separately, and in so doing, "details hide the story". When I look at a chart like this, I am drawn to the envelope of the columns. The lighter colors are chosen for the smaller spikes to push them into the background. The problem may be those data labels identifying specific spikes; they are a holdover from the original chart, and I actually don't know why those specific dates are labeled.
In summary, the key takeaway is, as Robert puts it:
the point of this [dataset] is really not about individual days, it’s about the grand totals and the speed with which the outbreak happened.
We both agree that the weekly version is the best among these. I don't see how the reader can figure out grand totals and speed with which the outbreak happened by staring at those dramatic but overlapping bubbles.
On the sister blog, I wrote about Hans Rosling’s recent presentation in New York (link). I noted that Rosling has apparently simplified his visual palette.
Rosling is best known as the developer of the Gapminder tool, used to visualize global social statistics data collected by national statistical agencies. I wrote favorably about this tool in a series of posts (link). Gapminder made the moving bubble chart popular, although it is not the only graphical form the tool offers.
These animated bubble charts also made Rosling a YouTube star (see here).
In last week’s presentation, Rosling showed only one moving bubble chart. The rest of his graphics were noticeably simpler, something that anyone can produce in Excel or PowerPoint. Here is one example:
I’m particularly impressed by a simple sequence of charts in which Rosling explains the demographic changes the world is expecting to see in the next 50 to 100 years.
This is an enhanced area chart. Each slice of area is subdivided into stick figures so that an axis for population counts becomes unnecessary.
Instead, the reader sees two useful dimensions: region of the world, and age group.
How the population ages as it grows is the feature story and the effect of aging is ingeniously portrayed as layers. This becomes apparent as Rosling lets time roll forward, and the layers literally walk off the page. (Unfortunately, I couldn't capture each step fast enough.)
(This photo courtesy of Daniel Vadnais.)
When Rosling showed the 2085 projection, we found that the entire rectangle had filled up, so the world population has definitely grown, by roughly 30 percent. The growth comes from the adult layers filling up; the total number of children has not changed. This is one of the key insights from recent demographic data. The first photo above shows something remarkable: the fertility rate in Asian countries has already plunged to about the same level as that of developed countries.
This set of charts is unusually effective. It represents another level of simplification in visual means. At the same time, the message is sharpened.
As I reported the other day (link), Rosling does not believe modern tools have improved data analysis. This talk, which relied on simple tools, is a good demonstration of his point.
What makes this work is that the picture of the running back serves a purpose here, in organizing the data. Contrast this with the airplane from Consumer Reports (link), which did a poor job of providing structure. The alternative of a bar chart is clearly inferior and much less engaging.
I went ahead and experimented with it:
I fixed the self-sufficiency issue, always present when using bubble charts. In this case, I don't think it matters whether the readers know the exact number of injuries so I removed all of the data from the chart.
Here are three temptations that I did not implement:
Not including the legend
Not including the text labels, which are rendered redundant by the brilliant idea of using the running guy