All these charts lament the high prices charged by U.S. hospitals


A former student asked me about this chart from the New York Times that highlights much higher prices of hospital procedures in the U.S. relative to a comparison group of seven countries.

The dot plot is clearly thought through. It is not a default chart that pops out of software.

Based on its design, we surmise that the designer has the following intentions:

  1. The names of the medical procedures are printed to be read, thus the long text is placed horizontally.

  2. The actual price is not as important as the relative price, expressed as an index with the U.S. price at 100%. These reference values are printed in glaring red, unignorable.

  3. That said, the actual price is still of secondary importance, so the values are provided as a supplement to the row labels. Getting to the actual prices in the comparison countries requires further effort, and a calculator.

  4. The primary comparison is between the U.S. and the rest of the world (or the group of seven countries included). It is less important to distinguish specific countries in the comparison group, and thus the non-U.S. dots are given pastels that take some effort to differentiate.

  5. Probably due to reader feedback, the font size is subject to a minimum so that some labels are split into two lines to prevent the text from dominating the plotting region.


In the Trifecta Checkup view of the world, there is no single best design. The best design depends on the intended message and what’s in the available data.

To illustrate this, I will present a few variants of the above design and discuss how these alternative designs reflect the designer's intentions.

Note that in all my charts, I expressed the relative price in terms of discounts, which is the mirror image of premiums. Instead of saying Country A's price is 80% of the U.S. price, I prefer to say Country A's price is a 20% saving (or discount) off the U.S. price.

First up is the following chart that emphasizes countries instead of hospital procedures:


This chart encourages readers to draw conclusions such as "Hospital prices are 60-80 percent cheaper in Holland relative to the U.S." But it is more taxing to compare the cost of a specific procedure across countries.

The indexing strategy already creates a barrier to understanding relative costs of a specific procedure. For example, the value for angioplasty in Australia is about 55% and in Switzerland, about 75%. The difference 75%-55% is meaningless because both numbers are relative savings from the U.S. baseline. Comparing Australia and Switzerland requires a ratio (0.75/0.55 = 1.36): Australia's prices are 36% above Swiss prices, or alternatively, Swiss prices are a 26% discount off Australia's prices.
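The conversions in the last few paragraphs are easy to get wrong, because a premium and a discount between the same two prices are not mirror images of each other. Here is a minimal Python sketch; the function names are made up for illustration, and the only inputs are the 80% example and the 1.36 ratio from the text.

```python
def discount_from_index(index):
    # A price index (fraction of the U.S. price) expressed as a discount off the U.S. price.
    return 1.0 - index

def premium_and_discount(ratio):
    """Given ratio = price_A / price_B, return (A's premium over B, B's discount off A)."""
    premium = ratio - 1.0           # how far A sits above B
    discount = 1.0 - 1.0 / ratio    # how far B sits below A
    return premium, discount

# An index of 80% of the U.S. price is a 20% discount.
print(round(discount_from_index(0.80), 2))   # 0.2

# A ratio of 1.36: a 36% premium in one direction is only a ~26% discount in the other.
premium, discount = premium_and_discount(1.36)
print(round(premium, 2), round(discount, 2))  # 0.36 0.26
```

The asymmetry is why the 36% and 26% in the text describe the same gap.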

The following design takes it even further, excluding details of individual procedures:


For some readers, less is more. It’s even easier to get a rough estimate of how much cheaper prices are in the comparison countries because, except for two “outliers”, the chart does not display individual values.

The widths of these bars reveal that in some countries, the amount of savings depends on the specific procedures.

The bar design releases the designer from a horizontal orientation. The country labels are shorter and can be placed at the bottom in a vertical design:


It's not that one design is obviously superior to the others. Each version does some things better. A good designer recognizes the strengths and weaknesses of each design, and selects one to fulfil his/her intentions.


P.S. [1/3/20] Corrected a computation, explained in Ken's comment.

Crazy rich Asians inspire some rich graphics

On the occasion of the hit movie Crazy Rich Asians, the New York Times did a very nice report on Asian immigration in the U.S.

The first two graphics will be of great interest to those who have attended my free dataviz seminar (coming to Lyon, France in October, by the way. Register here.), as it deals with a related issue.

The first chart shows an income gap widening between 1970 and 2016.


This uses a two-lines design in a small-multiples setting. The distance between the two lines is labeled the "income gap". The clear story here is that the income gap is widening over time across the board, but especially rapidly among Asians, followed by whites.

The second graphic is a bumps chart (slopegraph) that compares the endpoints of 1970 and 2016, but using an "income ratio" metric, that is to say, the ratio of the 90th-percentile income to the 10th-percentile income.


Asians are still a key story on this chart, as income inequality has ballooned from 6.1 to 10.7. That is where the similarity ends.

Notice how whites now appear at the bottom of the list while blacks show up as the second "worst" in terms of income inequality. Even though the underlying data are the same, what can be seen in the bumps chart is hidden in the two-lines design!

In short, the reason is that the scale of the two-lines design is such that the small numbers are squashed. The bottom 10 percent did see an increase in income over time but because those increases pale in comparison to the large incomes, they do not show up.
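To see how the gap and the ratio can disagree, here is a small Python sketch with invented numbers (not the NYT data). A group whose bottom incomes are small can have a modest gap yet the larger ratio.

```python
# Hypothetical 10th- and 90th-percentile household incomes (in $1,000s)
# for two made-up groups; these are NOT the NYT figures.
groups = {
    "Group A": {"p10": 20, "p90": 200},
    "Group B": {"p10": 50, "p90": 300},
}

def income_gap(p10, p90):
    # The two-lines chart: vertical distance between the top and bottom lines.
    return p90 - p10

def income_ratio(p10, p90):
    # The bumps chart: 90th-percentile income divided by 10th-percentile income.
    return p90 / p10

for name, g in groups.items():
    print(name, income_gap(**g), income_ratio(**g))
# Group A 180 10.0
# Group B 250 6.0

most_unequal_by_gap = max(groups, key=lambda name: income_gap(**groups[name]))
most_unequal_by_ratio = max(groups, key=lambda name: income_ratio(**groups[name]))
print(most_unequal_by_gap, most_unequal_by_ratio)  # Group B Group A
```

The ranking flips depending on the metric, which is the same effect that moves whites and blacks between the two NYT charts.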

What else does not show up in the two-lines design? Notice that in 1970, the income ratio for blacks was 9.1, way above other racial groups.

Kudos to the NYT team for realizing that the two-lines design provides an incomplete, potentially misleading picture.


The third chart in the series is a marvellous scatter plot (with one small snafu, which I'll get to).


What are all the things one can learn from this chart?

  • There is, as expected, a strong correlation between having college degrees and earning higher salaries.
  • The Asian immigrant population is diverse, from the perspectives of both education attainment and median household income.
  • The largest source countries are China, India and the Philippines, followed by Korea and Vietnam.
  • The Indian immigrants are on average professionals with college degrees and high salaries, and form an outlier group among the subgroups.

Through careful design decisions, those points are clearly conveyed.

Here's the snafu. The designer forgot to say which year is being depicted. I suspect it is 2016.

Dating the data is very important here because of the following excerpt from the article:

Asian immigrants make up a less monolithic group than they once did. In 1970, Asian immigrants came mostly from East Asia, but South Asian immigrants are fueling the growth that makes Asian-Americans the fastest-expanding group in the country.

This means that a key driver of the rapid increase in income inequality among Asian-Americans is the shift in composition of the ethnicities. More and more South Asian (most of whom are Indians) arrivals push up the education attainment and household income of the average Asian-American. Not only are Indians becoming more numerous, but they are also richer.

An alternative design is to show two bubbles per ethnicity (one for 1970, one for 2016). To reduce clutter, the smaller ethnicities can be aggregated into Other or South Asian Other. This chart may help explain the driver behind the jump in income inequality.






Getting into the head of the chart designer

When I look at this chart (from Business Insider), I try to understand the decisions made by its designer - which things are important to her/him, and which things are less important.


The chart shows average salaries in the top 2 percent of income earners. The data are split by gender and by state.

First, I notice that the designer chooses to use the map form. This decision suggests that the spatial pattern of top incomes is of top interest to the designer because she/he is willing to accept the map's constraints - namely, the designer loses control of the x and y dimensions, as well as the area and shape of the data containers. For the U.S. state map, there is no elegant solution to the large number of small states problem in the Northeast.

Second, I notice the color choice. The designer provides actual values on the visualization but also groups all state-average incomes into five categories. It's not clear how she/he determines the boundaries of these income brackets. There are many more dark blue states than there are light blue states in the map for men. Because women's incomes are everywhere lower than men's, the map at the bottom fits all states into two large buckets, plus Connecticut. Women's incomes are lower than men's, but there is no need to break the data down by state to convey this message.

Third, the use of two maps indicates that the designer does not care much about gender comparisons within each state. These comparisons are difficult to accomplish on the chart: one must involuntarily bob one's head up and down to make the comparisons. The head bobbing isn't even enough; you must then pull out your calculator and compute the ratio of the women's average to the men's average. If the designer wants to highlight state-level comparisons, she/he could have plotted the gender ratio on a single map, like this:



So far, I infer that the key questions are (a) the gender gap in aggregate, (b) the variability of incomes within each gender, or the spatial clustering, and (c) the gender gap within each state.

(a) is better conveyed in more aggregate form. Goal (b) is defeated by the lack of clear clustering. (c) is not helped by the top-bottom split.

In making the above chart, I discovered a pattern: women fare better in the smaller states like Montana, Iowa, North & South Dakota. Meanwhile, the disparity in New York is of the same degree as in Oklahoma and Wyoming.


 This chart tells readers a bit more about the underlying data, without having to print the entire dataset on the page.




Political winds and hair styling

Washington Post (link) and New York Times (link) published dueling charts last week, showing the swing-swang of the political winds in the U.S. Of course, you know that the pendulum has shifted riotously rightward towards Republican red in this election.

The Post focused its graphic on the urban / not urban division within the country:


Over Twitter, Lazaro Gamio told me they are calling these troll-hair charts. You certainly can see the imagery of hair blowing with the wind. In small counties (right), the wind is strongly to the right. In urban counties (left), the straight hair style has been in vogue since 2008. The numbers at the bottom of the chart drive home the story.

Previously, I discussed the Two Americas map by the NY Times, which covers a similar subject. The Times version emphasizes the geography, and is a snapshot while the Post graphic reveals longer trends.

Meanwhile, the Times published its version of a hair chart.


This particular graphic highlights the movement among the swing states. (Time moves bottom to top in this chart.) These states shifted left for Obama and marched right for Trump.

The two sets of charts have many similarities. They both use curvy lines (hair) as the main aesthetic feature. The left-right dimension is the anchor of both charts, and sways to the left or right are important tropes. In both presentations, the charts provide visual aid, and are nicely embedded within the story. Neither is intended as exploratory graphics.

But the designers diverged on many decisions, mostly in the D(ata) or V(isual) corner of the Trifecta framework.


The Times chart is at the state level while the Post uses county-level data.

The Times plots absolute values while the Post focuses on relative values (cumulative swing from the 2004 position). In the Times version, the reader can see the popular vote margin for any state in any election. The middle vertical line is keyed to the electoral vote (plurality of the popular vote in most states). It is easy to find the crossover states and times.

The Post's designer did some data transformations. Everything is indexed to 2004. Each number in the chart is the county's current leaning relative to 2004. Thus, left of vertical means said county has shifted more blue compared to 2004. The numbers are cumulative moving top to bottom. If a county is 10% left of center in the 2016 election, this effect may have come about this year, or 4 years ago, or 8 years ago, or some combination of the above. Again, left of center does not mean the county voted Democratic in that election. So, the chart must be read with some care.

One complaint about anchoring the data is the arbitrary choice of the starting year. Indeed, the Times chart goes back to 2000, another arbitrary choice. But clearly, the two teams were aiming to address slightly different variations of the key question.

There is a design advantage to anchoring the data. The Times chart is noticeably more entangled than the Post chart, with much more criss-crossing. This is particularly glaring given that the Times chart contains far fewer lines than the Post chart, due to state versus county.

Anchoring the data to a starting year has the effect of combing one's unruly hair. Mathematically, they are just shifting the lines so that they start at the same location, without altering the curvature. Of course, this is double-edged: after the re-centering, the left-blue / right-red interpretation no longer holds.
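Anchoring is a one-line transformation: subtract each line's starting value. The Python sketch below uses made-up margins for a single hypothetical county (not Post data) to show that the election-to-election swings survive intact while the sign of each point can change.

```python
# Hypothetical Democratic-minus-Republican margins (percentage points) for one
# county in 2004, 2008, 2012 and 2016; negative values mean the Republican won.
margin = [-10.0, -2.0, -5.0, -20.0]

def anchor_to_first(series):
    """Shift a series so that it starts at zero, as the Post did with 2004."""
    base = series[0]
    return [value - base for value in series]

def step_changes(series):
    # Election-to-election swings; these define the shape ("curvature") of a line.
    return [b - a for a, b in zip(series, series[1:])]

anchored = anchor_to_first(margin)
print(anchored)                 # [0.0, 8.0, 5.0, -10.0]
print(step_changes(margin))     # [8.0, -3.0, -15.0]
print(step_changes(anchored))   # [8.0, -3.0, -15.0]
```

In this made-up county, the anchored 2008 value sits left of center even though the Republican still carried it, which is exactly the reading trap described above.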

On the Times chart, they used a different coping strategy. Each version of their charts has a filter: they highlight the set of lines to demonstrate different vignettes: the swing states moved slightly to the right, the Republican states marched right, and the Democratic states also moved right. Without these filters, the readers would be winking at the Times's bad-hair day.


Another decision worth noting: the direction of time. The Post's choice of top to bottom seems more natural to me than the Times's reverse order but I am guessing some of you may have different inclinations.

Finally, what about the thickness of the lines? The Post encoded population (voter) size while the Times used electoral votes. This decision is partly driven by the choice of state versus county level data.

One can consider electoral votes as a kind of log transformation. The effect of electorizing the popular vote is to pull the extreme values to the center. This significantly simplifies the designer's life. To wit, in the Post chart (shown below), they have to apply a filter to highlight key counties, and you notice that those lines are so thick that all the other counties become barely visible.



Denver outspends everyone on this

Someone at the Wall Street Journal noticed that Denver's transit agency has outspent other top transit agencies, after accounting for number of rides -- and by a huge margin.

But the accompanying graphic conspires against the journalist.


For one thing, Denver is at the bottom of the page. Denver's two bars do not stand out in any way. New York's transit system dwarfs everyone else in both number of rides and total capital expenses and funding. And the division into local, state, and federal sources of funds is on the page, absorbing readers' mindspace for unknown reasons.

But Denver is an outlier, as can be seen here:



What if the RNC assigned seating randomly

The punditry has spoken: the most important data question at the Republican Convention is where different states are located. Here is the FiveThirtyEight take on the matter:


They crunched some numbers and argue that Trump's margin of victory in the state primaries is the best indicator of how close to the front that state's delegation is situated.

Others have put this type of information on a map:


The scatter plot with the added "trendline" is often misleading. Your eyes are drawn to the line, and distracted from the points that are far away from the line. In fact, the R-squared of the regression line is only about 20%. This is quite obvious from the distribution of green shades in the map below.


So, I wanted to investigate how robust this regression line is. The way statisticians address this question is as follows: imagine that the seating has been assigned completely at random. How likely is it that the actual seating plan would have arisen from random assignment?

Take the seating assignments from the scatter plot. Then randomly shuffle the assignment to create simulated random seating plans. We keep the same slots: for example, four states were given #1 positions in the actual arrangement, so in every simulation, four states get #1 positions; it's just that which four states get them is decided by flipping coins.

I generated one hundred simulated seating plans at a time. For each plan, I created the scatter plot of seating position versus Trump margin (mirror image of the FiveThirtyEight chart), and fitted a regression line. The following shows the slopes of the first 200 simulations:


The more negative the slope, the more power Trump margin has in explaining the seating arrangement.

Notice that even though all these plans are created at random, the magnitude of the slopes ranges widely. In fact, one randomly created plan sits right below the actual RNC plan shown in red. So, it is possible, but very unlikely, that the RNC plan was drawn up at random.

Another view of this phenomenon is the histogram of the slopes:


This again shows that the actual seating plan is very unlikely to be produced by a random number generator. (I plotted 500 simulations here.)

In statistics, we measure rarity by "standard errors". The actual plan is almost but not quite three standard errors away from the average random plan. A rule of thumb is that 3 standard errors or more is rare. (This corresponds to over 99% confidence.)
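In a simulation like this, the "standard error" can be taken as the standard deviation of the slopes across the random plans. A minimal Python sketch, assuming the simulated slopes are roughly normally distributed (an assumption, not something verified here):

```python
import statistics
from statistics import NormalDist

def standard_errors_away(actual, simulated):
    """Distance of the actual slope from the mean random-plan slope,
    in units of the random slopes' standard deviation."""
    center = statistics.mean(simulated)
    spread = statistics.stdev(simulated)
    return (actual - center) / spread

# Share of a normal distribution lying within +/- 3 standard errors of the center.
within_three = NormalDist().cdf(3) - NormalDist().cdf(-3)
print(round(within_three, 4))  # 0.9973
```

The 0.9973 figure is the source of the "over 99%" rule of thumb.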


PS. Does anyone have the data corresponding to the original scatter plot? There are other things I want to do with the data but I'd need to find (a) the seating position by state and (b) the primary results nicely set in a spreadsheet.

Nice title but dubious message

I like to use declarative titles for charts. This chart below, found in an investment magazine published by Charles Schwab, wants to tell us that emerging markets "perform differently."


That is a nice concise message. Now, what does the chart say?

Readers have to jump through some hoops. First, the axes are flipped from their normal posture. Time typically is shown running horizontally. And market returns which range widely from positive to negative values are frequently displayed vertically. But not here.

Second, this chart treats all three categories of equity returns (domestic, international developed markets, international emerging markets) equally, even though the title draws attention to emerging markets. In fact, emerging markets is placed last in the legend. Try blocking the top section and just staring at the grouped bar chart: the emerging markets do not jump out.

Third, we are asking ourselves what the designer/analyst means by "performing differently." The most obvious difference is the blue spike corresponding to the 79% return in 2009. But in many other years, the blue bar is not obviously different.

One way to interpret "perform differently" is that the emerging market returns exhibit low correlation with the returns in either domestic or international-developed markets. (Such a finding would be helpful to investors looking for diversification.) The scatter plot can be used to examine correlations.


The pattern is surprising. The chart on the left shows that emerging market returns are highly correlated in a linear way with international developed-market returns. The chart on the right shows that domestic returns are less correlated with emerging market returns, but the correlation is still pretty strong.

There were two unusual years, one (2009) in which emerging markets did quite a bit better and another (2013) in which emerging markets did quite a bit worse.

These observations imply that the data do not really support the title of the original chart.


Misguided warheads in the classroom

Alberto Cairo just gave a wonderful talk to my workshop, in which he complains about the state of dataviz teaching. So, it's quite opportune that reader Maja Z. sent in a couple of examples from a recent course on data visualization for academics. She was surprised to see these held out as examples of good work. I'll discuss one chart today, and the other one some other day.


The original is from a Korean newspaper.

The instructor for the course praised this chart for this principle: "always try to find a graphic that relates to your subject, like the bullets here representing military spending, and use it in the chart."

Students who take my class learn the opposite lesson: I like to say imagery often backfires. I do like charts with imagery that makes the data come alive, but more often than not, the designer falls in love with the imagery and lets the data down.

This chart presumably shows the top 10 military spenders in the world by total amount spent in 2013. You'd think that the Chinese spent a bit more than half what the Americans did. But the data labels say $640 billion vs $188 billion, only about 30%. Next, the Russian spend is 46% of the Chinese according to the data, etc. So, is this really a data visualization or just some pictures with numbers printed next to them?

It's possible that the data is encoded in the surface areas or the volumes of these warheads but in reality, this is a glorified column chart, so most readers will respond to the heights of the columns.

Perhaps the shadows are there to demonstrate shadow spending.
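The chart doesn't say how the warheads were sized. Assuming each warhead is a scaled copy of the same shape, this Python sketch computes the height ratio of China's warhead to America's under three possible encodings, using only the $640 billion and $188 billion labels.

```python
import math

us, china = 640, 188          # $ billions, from the chart's data labels
value_ratio = china / us      # about 0.29

# For scaled copies of one shape, the height ratio depends on which property carries the data.
height_if_length = value_ratio            # height itself encodes the value
height_if_area = math.sqrt(value_ratio)   # surface area encodes the value
height_if_volume = value_ratio ** (1 / 3) # volume encodes the value

print(round(height_if_length, 2), round(height_if_area, 2), round(height_if_volume, 2))
# 0.29 0.54 0.66
```

Under the area assumption, China's warhead would be about 54% as tall as America's, consistent with (though not proof of) the "a bit more than half" impression.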


The designer seems to appreciate that total spending is not necessarily a great metric. Spending as a proportion of GDP is provided as a secondary metric. I'm not so sure what to make of this though: should we expect richer nations to need/want to spend more building bombs and such? It just doesn't seem very logical to me.

Instead, a more meaningful metric might be military spending per capita. Controlling for population seems somewhat logical; the more people you have to protect, the more money you have to spend.

In the end, I made this scatter plot that tries to have it both ways:


(The percentages are of GDP.)

Here, we can see that Saudi Arabia and the U.S. are particularly aggressive spenders, spending over $2000 per person per year. Their two dots are way above the average line (for the top 10 spenders). At the richer end of the scale, American spending is way above the international average. On the other hand, Japan and Germany both spend significantly less than would be predicted by their GDP per capita levels.

Of note, readers more easily relate to the per-capita numbers than to the aggregate figures in the original chart. They learn, for instance, that Saudi Arabia's GDP was $27,000 per head, of which $2,500 went to arming itself.
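The two metrics in the scatter plot are simple ratios. A minimal Python sketch using the Saudi per-head figures quoted above; the helper names are illustrative only.

```python
def per_capita(total, population):
    # Aggregate spending divided by the number of people.
    return total / population

def share_of_gdp(spend_per_head, gdp_per_head):
    # Population cancels, so per-head figures give the same share as totals.
    return spend_per_head / gdp_per_head

# Saudi Arabia, using the per-head figures quoted above.
print(round(share_of_gdp(2_500, 27_000), 3))  # 0.093
```

Because population cancels, the percentage-of-GDP labels can be read off the per-capita axes: roughly 9% for Saudi Arabia.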




People are happier in some parts of the country as Labor Day nears

An anonymous reader sent in a Type V critique of the following map of July unemployment rates by state. The map was published by the Bureau of Labor Statistics (BLS), and used in a recent article in Vox.


Matt @ Vox took the BLS's bait, and singled out Mississippi as the worst in the nation. Our reader-contributor is none too pleased with this conclusion.

He noted that the red state stands out only because of the high "out of sample" top range of the legend. Three out of the seven colors are not found on the map at all! This is kind of like the white space problem when doing a line plot with large values and an axis starting at zero (for example, here), but the opposite. All the states are compressed into four colors, three of which are shades of orange.
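The compression is easy to reproduce. The BLS legend breaks are not given here, so this Python sketch assumes seven equal-width color classes and invents unemployment rates for ten states. The only real figure is the 13.1% top of the range, which the reader traces to Puerto Rico below.

```python
def bin_counts(values, low, high, n_bins):
    """Count values falling into n_bins equal-width color classes spanning [low, high]."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        index = min(int((v - low) / width), n_bins - 1)  # v == high goes in the top class
        counts[index] += 1
    return counts

# Made-up unemployment rates (%) for ten states; not the BLS data.
states = [3.5, 4.2, 4.8, 5.1, 5.6, 6.0, 6.4, 7.0, 7.7, 8.0]

# Legend stretched to cover an outlier at 13.1% that is not on the map.
print(bin_counts(states, 2.0, 13.1, 7))                 # [1, 3, 3, 3, 0, 0, 0]
# Legend fitted to the values actually shown on the map.
print(bin_counts(states, min(states), max(states), 7))  # [1, 1, 2, 2, 1, 1, 2]
```

Stretching the legend to an unmapped value leaves three classes empty and crowds every state into four.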

The reader investigated, and reported back:

The top end of the legend seems to be set by Puerto Rico's 13.1%. Puerto Rico is omitted from the Vox map as well as from the BLS publication (link to PDF).

Mississippi only has the bare minimum, 8.0%, to qualify for the red color. Georgia is a 7.8; Michigan, Nevada, and Rhode Island are all 7.7.
Of the 50 states plus DC, 24 are in the 6-8% band and 21 are in the 4-6% band, with the remaining 5 under 4%.
None of the above is obvious when looking at the map.
In the Trifecta Checkup, this is a Type V chart. The data is accurate. The question being asked is clear but the visual construction is problematic.
[I'm seizing back the mike.] While the map is often not the best choice for showing geographic data, something we frequently cover on this blog, in this particular case, there is a strong regional pattern. Of course, with the compressed choice of colors, this regional pattern is not easily observed in the original.
The following small-multiples set of maps makes clear the regional pattern.

Happy Labor Day!


Law of small numbers, in action

Loyal reader John M. expressed dismay over Twitter about 538's excessive use of bubble charts. Here's the picture that pushed John over the edge:


The associated article is here.

The question on the table is motivated by the extraordinary performance of a young baseball player Mike Trout. The early success can be interpreted either as evidence of future potential or as evidence of a future drought. As an analogy, someone wins a lottery. You can argue that the odds are so low that winning again is impossible. Or you can argue that winning once indicates that this person is "lucky" and lucky people might win again.

The chart shows the proportion of players who performed even better after the initial success, given the age at which they first broke out. One way to read this chart is to mentally replace the bubbles with dots (or columns), and then interpret the size of each bubble as the precision of the corresponding probability estimate. The legend says number of players, which is the sample size; the sample size governs the error bar associated with that particular number.

This bubble chart is no different from others: it is impossible to judge the relative sizes of bubbles. Even though the legend provides us two reference points (a nice enough idea on its own), it is still impossible to know, for example, what proportion of players did better later in life when they first peaked at age 24. The bubble for age 23 looks like it's exactly five players but I still cannot figure out how many players the adjacent bubble represents.

The designer should have just replaced each bubble with an error bar, and the chart is instantly more readable. (I have another version of this at the end of the post.)
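The article does not say which error bar to use. One common choice for a proportion from a small sample is the Wilson score interval, sketched here in Python (my choice, not 538's). The same 50% estimate is shown backed by different numbers of players.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for the proportion successes / n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# The same 50% estimate backed by very different numbers of players.
for k, n in [(1, 2), (2, 4), (3, 6), (50, 100)]:
    low, high = wilson_interval(k, n)
    print(f"{k}/{n} players: {low:.2f} to {high:.2f}")
```

With two or four players the interval covers most of the range from 0% to 100%, which a bubble's area cannot convey.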

The rest of the design elements are clean and well-done, particularly use of notes to point out interesting aspects of the data.


From a Trifecta Checkup perspective, I am uncertain about the nature of the data used to investigate the interesting question posed above.

Readers should note that the concepts of "early success" and "later success" are not universally defined. The author here selects two proxies. Reaching an early peak is equated to "batters first posting 15+ WAR over two seasons". Next, reversion to the mean is defined as not having a better two-year span subsequent to the aforementioned early peak.

Why two seasons? Why WAR and not a different metric? Why 15 as the cutoff? These are all design decisions made while working with the data.

One can make reasonable arguments to justify these choices. A bigger head-scratcher relates to the horizontal axis, which identifies the first time a player reaches his "early peak," as defined above. The way the above chart is set up, it is almost preordained to exhibit a negative slope. The older the player is when he reaches the first peak, the fewer years left in his playing career to try to emulate or surpass that feat.

This last point is nicely illustrated in the next chart of the article:


 This chart is excellent on many levels. It's not clear, though, whether it says anything other than aging.
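A toy model shows how aging alone can produce the negative slope. In this Python sketch, every parameter is hypothetical: all players retire after the season at age 35, and each later non-overlapping two-season span has the same 20% chance of beating the breakout, regardless of age. Talent never changes; only the number of remaining chances does.

```python
def chance_to_surpass(breakout_age, retirement_age=35, p_beat=0.2):
    """Toy model: probability that at least one later two-season span beats the breakout.

    Hypothetical assumptions: the breakout covers ages breakout_age and
    breakout_age + 1, the last season is played at retirement_age, and each
    later non-overlapping two-season span independently beats the breakout
    with probability p_beat, whatever the player's age.
    """
    seasons_left = max(retirement_age - (breakout_age + 1), 0)
    spans_left = seasons_left // 2
    return 1 - (1 - p_beat) ** spans_left

for age in range(22, 31, 2):
    print(age, round(chance_to_surpass(age), 2))
```

The printed probabilities fall steadily with breakout age, even though nothing about the players' ability differs.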


Near the end of the post, the author rightfully pointed out that "there’s not really enough data to demonstrate this effect". Going back to the first chart, it appears that no single bubble contains a double-digit count of players. So every sample size is between one and, say, seven. We should be wary of conclusions based on so little data.

It's always fun to find examples of the Law of Small Numbers, courtesy of Kahneman & Tversky.


Here is a sketch of how I might re-make the first chart (I made up data; see the note below).


While making this chart, I realized another issue with the original bubble chart. When the proportion of players improving on their early peak is zero percent, the number of players who did not make it is hidden. In the revised chart, this data is clearly seen (look at age 22).

Note: I wonder if I totally missed the point of the original chart.... I actually had trouble eyeballing the data so I ended up making up numbers. The bubble at age 22 looks like it should stand for 5 players and yet it sits at precisely 50%, which would map to 2.5 players. If I assume the 22 bubble to be 4 players, then I don't know what the 26 bubble is. If it is 4 players also, then the minimum non-zero proportion should have been 1/4, but the bubble clearly lies below 25%. If it is 3 players, the minimum non-zero proportion is 1/3, which should be at 33%.
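The bookkeeping in this note can be automated. In the Python sketch below, the limit of seven players follows the earlier guess that no bubble holds more than about seven; the 0.01 tolerance is an arbitrary choice. With n players, the only possible proportions are k/n.

```python
def feasible_sizes(proportion, max_n=7, tol=0.01):
    """Sample sizes n (1..max_n) for which some k / n is within tol of proportion."""
    return [
        n for n in range(1, max_n + 1)
        if any(abs(k / n - proportion) <= tol for k in range(n + 1))
    ]

def smallest_nonzero_share(n):
    # With n players, the smallest non-zero proportion is one player out of n.
    return 1 / n

print(feasible_sizes(0.50))  # [2, 4, 6]
print([round(smallest_nonzero_share(n), 2) for n in (3, 4, 5)])  # [0.33, 0.25, 0.2]
```

A bubble sitting exactly at 50% rules out any odd number of players, so it cannot represent 5.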