This WSJ graphic caught my eye. The accompanying article is here.
The article (judging from the sub-header) makes two separate points: one about the total amount of money raised in IPOs in a year, and the other about the change in market value of those newly-public companies one year from the IPO date.
The first metric is shown by the size of the bubbles while the second metric is displayed as distances from the horizontal axis. (The second metric is further embedded, in a simplified, binary manner, in the colors of the bubbles.)
The designer has decided that the second metric - performance after IPO - is the more important one. Therefore, it is much easier for readers to see how each annual cohort of IPOs has performed. The use of color to map to the second metric (and not the first) also helps to emphasize the second metric.
There are details on this chart that I admire. The general tidiness of it. The restraint on the gridlines, especially the horizontal ones. The spatial balance. The annotation.
And ah, turning those bubbles into lollipops. Yummy! Those dotted lines allow readers to find the center of each bubble, which is where the values of the second metric lie. Frequently, these bubble charts are presented without those guiding lines, and it is often hard to find the circles' anchors.
That leaves one inexplicable decision - why did they place two vertical gridlines in the middle of two arbitrary years?
Your first impression is to interpret the graphic as a bar chart. But it really is a bar within a bar: the crux of the matter - gender balance - is embedded in individual bars.
Instead of pie charts or stacked bar charts, we see stacked columns within each bar.
I see what the designer is attempting to accomplish. The first message is the sharp decline in gender equality at higher job titles. The next message is the sharp drop in the frequency of higher job titles.
This chart is a variant of the "Marimekko" chart (beloved by management consultants), also called the mosaic chart. The only difference is how the distribution of jobs in the work force is coded.
The Marimekko is easier to understand:
A key advantage of this version is to be found in the thin columns.
Here is another way to visualize this data, drawing attention to the gender gap.
In the other versions, the reader must do subtractions to figure out the size of the gaps.
The following chart traces the flow of funds into AI (artificial intelligence) startups.
I found it on this webpage and it is attributed to the Financial Times.
Here, I apply the self-sufficiency test to show that the semicircles are playing no role in the visualization. When the numbers are removed, readers cannot understand the data at all. So the visual elements are toothless.
Actually, it's worse. The data got encoded in the diameters of the semicircles, but not the areas. Thus, anyone courageously computing the ratios of the areas finds their effort frustrated.
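To make the diameter-versus-area point concrete, here is a minimal sketch. The two amounts are hypothetical (they are not read off the chart), and the function names are mine. When a value sets the diameter, the areas differ by the square of the data ratio, so a threefold difference in the data turns into a roughly ninefold difference in area.

```python
import math


def semicircle_area(diameter):
    """Area of a semicircle with the given diameter."""
    radius = diameter / 2
    return math.pi * radius ** 2 / 2


def area_ratio_when_diameter_encodes(value_a, value_b):
    """Ratio of areas a reader would compute if values set the diameters."""
    return semicircle_area(value_a) / semicircle_area(value_b)


def diameter_for_value(value):
    """Diameter that makes the semicircle's area equal to the value."""
    # area = pi * d**2 / 8, so d = sqrt(8 * area / pi)
    return math.sqrt(8 * value / math.pi)


# Hypothetical amounts (not read off the chart): one is 3x the other.
small, large = 1.0, 3.0

# Diameter encoding: areas differ by the square of the data ratio.
print(area_ratio_when_diameter_encodes(large, small))

# Area encoding: areas differ by the data ratio itself.
print(semicircle_area(diameter_for_value(large))
      / semicircle_area(diameter_for_value(small)))
```

If the areas are meant to carry the data, the diameters have to scale with the square root of the values, which is what diameter_for_value does.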
Here is a different view that preserves the layout:
The two data series in the original chart show the current round of funding and the total funds raised. In the junkcharts version, I decided to compare the new funds versus the previously-raised funds so that the total area represents the total funds raised.
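The re-expression described above is a single subtraction. A small sketch, with made-up figures and a helper name of my own choosing:

```python
def split_total_funds(total_raised, current_round):
    """Split total funds into (previously raised, new funds).

    The two parts add back up to the total, so stacking them makes the
    combined area stand for the total funds raised.
    """
    if current_round > total_raised:
        raise ValueError("current round cannot exceed total funds raised")
    return total_raised - current_round, current_round


# Hypothetical startup: 10 raised in total, of which 4 is the current round.
previous, new = split_total_funds(10.0, 4.0)
print(previous, new)
```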
The title of the article is "Fiscal Constraints Await the Next President." The key message is that "the next president looks to inherit a particularly dismal set of fiscal circumstances." Josh Zumbrun, who tipped Jimmy about this chart on Twitter, said that it is worth spending time on.
I like the concept of the chart, which juxtaposes the economic condition that faced each president at inauguration, and how his performance measured against expectation, as represented by CBO predictions.
The top portion of the graphic did require significant time to digest:
A glance at the sidebar informs me that there are two scenarios being depicted, the CBO projections and the actual deficit-to-GDP ratios. Then I got confused on several fronts.
One can of course blame the reader (me) for mis-reading the chart but I think dataviz faces a "the reader is always right" situation -- although there can be multiple types of readers for a given graphic so maybe it should say "the readers are always right."
I kept lapsing into thinking that the bold lines (in red and blue) are actual values while the gray line/area represents the predictions. That's because in most financial charts, the actual numbers are in the foreground and the predictions act as background reference materials. But in this rendering, it's the opposite.
For a while, a battle was raging in my head. There are a few clues that the bold red/blue lines cannot represent actual values. For one thing, I don't recall Reagan as a surplus miracle worker. Also, some of the time periods overlap, and one assumes that the CBO issued one projection only at a given time. The Obama line also confused me as the headline led me to expect an ugly deficit but the blue line is rather shallow.
Then, I got even more confused by the units on the vertical axis. According to the sidebar, the metric is deficit-to-GDP ratio. The majority of the lines live in the negative territory. Does the negative of the negative imply positive? Could the sharp upward turn of the Reagan line indicate massive deficit spending? Or maybe the axis should be relabelled surplus-to-GDP ratio?
As I proceeded to re-create this graphic, I noticed that some of the tick marks are misaligned. There are various inconsistencies related to the start of each projection, the duration of the projection, the matching between the boxes and the lines, etc. So the data in my version is just roughly accurate.
To me, these data provide a primary reference to how presidents perform on the surplus/deficit compared to expectations as established by the CBO projections.
I decided to only plot the actual surplus/deficit ratios for the duration of each president's tenure. The start of each projection line is the year in which the projection is made (as per the original). We can see the huge gap in every case. Either the CBO analysts are very bad at projections, or the presidents didn't do what they promised during the elections.
Someone at the Wall Street Journal noticed that Denver's transit agency has outspent other top transit agencies, after accounting for number of rides -- and by a huge margin.
But the accompanying graphic conspires against the journalist.
For one thing, Denver is at the bottom of the page. Denver's two bars do not stand out in any way. New York's transit system dwarfs everyone else in both number of rides and total capital expenses and funding. And the division into local, state, and federal sources of funds is on the page, absorbing readers' mindspace for unknown reasons.
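The journalist's point rests on a normalized metric, spending per ride, rather than the raw totals that the chart plots. A rough sketch of that calculation follows; the agency names and numbers are placeholders I invented, not figures from the article. Sorting by the ratio puts the outlier at the top instead of the bottom.

```python
def rank_by_spend_per_ride(agencies):
    """Return (name, spend per ride) pairs, highest ratio first.

    agencies maps a name to a (capital_spend, rides) tuple.
    """
    ratios = {
        name: spend / rides
        for name, (spend, rides) in agencies.items()
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Placeholder data, invented for illustration only.
hypothetical_agencies = {
    "Agency A": (9000, 3000),  # biggest totals, middling ratio
    "Agency B": (500, 100),    # small totals, highest ratio
    "Agency C": (800, 400),
}

for name, ratio in rank_by_spend_per_ride(hypothetical_agencies):
    print(name, ratio)
```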
My friend Tonny M. sent me a tip to two pretty nice charts depicting the state of U.S. healthcare spending (link).
The first shows the U.S. as an outlier:
This chart is a replica of the Lane Kenworthy chart, with some added details, that I have praised here before. This chart remains one of the most impactful charts I have seen. The added time-series details allow us to see a divergence from about 1980.
The second chart shows the inequity of healthcare spending among Americans. The top 10% of spenders consume about 6.5 times as much as the average while the bottom 16% do not spend anything at all.
This chart form is standard for depicting imbalance in scientific publications. But the general public finds this chart difficult to interpret, mostly because both axes operate on a cumulative scale. Further, encoding inequity in the bend of the curve is not particularly intuitive.
So I tried out some other possibilities. Both alternatives are based on incremental, not cumulative, metrics. I take the spend of the individual ten groups (deciles) and work with those dollars. Also, I provide a reference point, which is the level of spend of each decile if the spend were to be distributed evenly among all ten groups.
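The calculation behind both alternatives can be sketched as follows. The decile figures are hypothetical, chosen only to be skewed in roughly the way described above; they are not the data behind the chart.

```python
def excess_spend_by_decile(decile_spend):
    """Spend of each group minus its share under an even split.

    Positive values are "excess" spend, negative values "deficient" spend.
    """
    even_share = sum(decile_spend) / len(decile_spend)
    return [spend - even_share for spend in decile_spend]


# Hypothetical spend per decile, lowest spenders first (sums to 100).
hypothetical_spend = [0, 0, 1, 1, 2, 3, 4, 6, 18, 65]

excess = excess_spend_by_decile(hypothetical_spend)
print(excess)
```

Because the reference level is the even split, the excesses and deficits always cancel out, so the chart shows how the same total is redistributed across the ten groups.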
The first alternative depicts the "excess" or "deficient" spend as column segments.
The second alternative shows the level of excess or deficient spending as slopes of lines. I am aiming for a bit more drama here.
Now, the interpretation of this chart is not simple. Since illness is not evenly spread out within the population, this distribution might just be the normal state of affairs. Nevertheless, this pattern can also result from the top spenders purchasing very expensive experimental treatments with little chance of success, for example.
This Economist chart has a great concept but I find it difficult to find the story: (link)
I am a fan of color-coding the text as they have done here so that part is good.
The journalist has this neat idea of comparing those who are apathetic ("don't care about whether Britain is in or out") and those who are passionate ("strongly prefer" that Britain is either in or out).
The chosen format suffers because of graphical inequity: the countries are sorted by decreasing apathy, which means it is challenging to figure out the degree of passion.
This chosen order is unrelated to the question at hand. One possible way of interpreting the chart is to compare individual countries against the European average. The journalist also recognizes this, and has highlighted the Euro average.
The problem is that there are two different averages and no good way to decide whether a particular country is above or below average.
Here is my version of the chart:
The biggest change is to create the new metric: how many people say they really care about Brexit/Bremain for every person who says they don't care. In Britain, over four people really care for each one who doesn't while in Slovenia, you can find fewer than half a person who really cares for each one who doesn't.
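Here is a minimal sketch of that metric. I am assuming that "really care" combines the two "strongly prefer" groups (in and out); the percentages are invented to mimic the two extremes described above and are not the survey's actual numbers.

```python
def care_ratio(strongly_in, strongly_out, dont_care):
    """People who strongly prefer either outcome per person who doesn't care."""
    if dont_care <= 0:
        raise ValueError("need a positive share of respondents who don't care")
    return (strongly_in + strongly_out) / dont_care


# Invented percentages: (strongly in, strongly out, don't care).
hypothetical_survey = {
    "Country X": (40, 40, 20),  # passionate: 4 who care per 1 who doesn't
    "Country Y": (10, 5, 40),   # apathetic: fewer than half a person per 1
}

# One number per country means one ordering and one average to compare to.
ranked = sorted(
    ((name, care_ratio(*shares)) for name, shares in hypothetical_survey.items()),
    key=lambda item: item[1],
    reverse=True,
)
print(ranked)
```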
Catching a dose of Alberto Cairo the other day. He has a good post about various Brexit/Bremain maps.
The story started with an editor of The Spectator, who went on Twitter to make the claim that the map on the right is better than someone else's map on the left:
There are two levels at which we should discuss these maps: the scaling of the data, and the mapping of colors.
The raw data are percentages based on counts of voters so the scale is decimal. In general, we discretize the decimal data in order to improve comprehension. Discretizing means we lose granularity. This is often a good thing. The binary map on the left takes the discretization to its logical extreme. Every district is classified as either Brexit (> 50% in favor) or Bremain (> 50% opposed). The map on the right uses six total groups (so three subgroups of Brexit and three subgroups of Bremain).
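The two levels of discretization can be sketched like this. The cut points for the six groups are my assumption (equal five-point steps around 50%); I do not know the actual breaks used in The Spectator's map.

```python
from bisect import bisect_right

# Assumed cut points for the six-group map, in percent voting for Brexit.
HYPOTHETICAL_EDGES = [40, 45, 50, 55, 60]


def binary_class(leave_share):
    """Two groups: which side won more than 50% of the vote."""
    if leave_share > 50:
        return "Brexit"
    if leave_share < 50:
        return "Bremain"
    return "tie"


def six_group_bin(leave_share, edges=HYPOTHETICAL_EDGES):
    """Six groups: 0-2 are Bremain shades, 3-5 are Brexit shades.

    An exact 50% falls into group 3 here, which is an arbitrary choice.
    """
    return bisect_right(edges, leave_share)


# Two districts that the binary map paints the same color.
print(binary_class(51.0), six_group_bin(51.0))
print(binary_class(70.0), six_group_bin(70.0))
```

The binary map treats a 51% district and a 70% district identically, while the six-group version separates them at the cost of a busier legend.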
Then we deal with mapping of numbers to colors. The difference between these two maps is the use of hues versus shades. The binary map uses two hues, which is probably most people's choice since we are representing two poles. The map on the right uses multiple shades of one hue. Alternatively, Alberto favors a "diverging" color scheme in which we use three shades of two hues.
The editor of The Spectator claims that his map is more "true to the data." In my view, his statement applies in these two senses: the higher granularity in the scaling, and also, the fact that there is only one data series ("share of vote for Brexit") and therefore only one color.
The second point relates to polarity of the scale. I wrote about this issue before - related to a satisfaction survey designed (not too well) by SurveyMonkey, one of the major online survey software services. In that case, I suggested that they use a bipolar instead of unipolar scale. I'd rather describe my mood as somewhat dissatisfied instead of a little bit satisfied.
I agree with Alberto here in favor of bipolarity. It's quite natural to underline the Brexit/Bremain divide.
Given what I just said, why complain about the binary map?
We agree with the editor that higher granularity improves comprehension. We just don't agree on how to add granularity. Alberto tells his readers he likes the New York Times version:
This is substantively the same map as The Spectator's, except for 8 groups instead of 6, and two hues instead of one.
Curiously enough, I gave basically the same advice to the Times regarding their maps showing U.S. Presidential primary results. I noted that their use of two hues with no shades in the Democratic race obscures the fact that none of the Democratic primaries was a winner-take-all contest. Adding shading based on delegate votes would make the map more "truthful."
That said, I don't believe that the two improvements by the Times are sufficient. Notice that the Brexit referendum is one-person, one-vote. Thus, all of the maps above have a built-in distortion as the sizes of the regions are based on (distorted) map areas, rather than populations. For instance, the area around London is heavily Bremain but appears very small on this map.
The Guardian has a cartogram (again, courtesy of Alberto's post) which addresses this problem. Note that there is a price to pay: the shape of Great Britain is barely recognizable. But the outsized influence of London is properly acknowledged.
This one has two hues and four shades. For me, it is most "truthful" because the sizes of the colored regions are properly mapped to the vote proportions.