Out of line

This simple chart showing life expectancies in 10 countries raises one's eyebrows.

Lifeexpectancy_indiatv

The first curiosity is the deliberate placement of Pakistan behind India and China. Every nation is sorted from lowest to highest, except for Pakistan. Is the reason politics? I have no idea. If you have an explanation, please leave a comment.

***
This graphic is an example of data visualization that does not actually show the data.

The positions of the flags do not in fact encode the data! For example, the Indian flag is closer to the Chinese flag than to the Pakistani flag even though the gap between India and China (7) is more than double the gap between India and Pakistan (3).

Here is what it looks like if the gaps encode the data. With this selection of countries, Pakistan and India are separated from the rest. 

Junkcharts_redo_indiatvlifeexpectancy

In the original chart, readers must read the data labels to understand it, and resist interpreting the visual elements.

I removed the flag poles because they have the unintended consequence of establishing a zero level (where the cartoon characters stand) but the positions of the flags don't reflect a start-at-zero posture.

***

Returning to our first topic for a second. If the message of the chart is to single out Pakistan, it actually works! If all other countries are sorted by value, with Pakistan inserted out of order, it draws our attention.

In a conventional layout, Pakistan is shoved to the left side in the bottom corner. See below:

Junkcharts_redo_indiatvlifeexpectancy_2

 

 


The line-angle illusion

In a recent presentation, Prof. Matthias Schonlau explained his "hammock plot." I wrote about it here. During the talk, he used the hammock plot to illustrate an optical illusion found in plots that require users to compare angled lines, known as the line-angle illusion. (Others prefer the name "sine" illusion.)

Here is a simple demonstration of the line-angle illusion, extracted from this paper.

Vanderplas_hofmann_sineillusion

Think of the two sine curves as time series, and we're comparing the differences between them. This requires us to assess the trend in the vertical distances between the two lines. Weirdly, we perceive the vertical lines on the above chart to have varying lengths, even though they are all the same length.

***
The link here contains an example of how the line-angle illusion can lead to misreading of trends on line charts:

Sineillusion_twolines

Is there a bigger difference in revenue at Time 1 than at Time 2? Many of us will think so, but on careful inspection, I think all of us can agree that the difference at Time 2 is in fact larger.

***

Much of the interest in a hammock plot lies in the links between the vertical blocks, and this is where the line-angle illusion can distort our perception. Studies have shown that humans tend to read not the vertical gaps but the angular gaps. Again, this issue is illustrated in the paper mentioned earlier:

Vanderplas_hofmann_sineillusion_distances
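To make the geometry concrete, here is a minimal Python sketch. It works under the assumption that the eye judges the width of the band at right angles to the curve instead of vertically. For a band of constant vertical height between y = sin(x) and a vertically shifted copy, the right-angle width is approximately the vertical gap divided by sqrt(1 + slope^2). The function name and the choice of sin(x) are purely illustrative and do not come from the paper.

```python
import math

def vertical_and_orthogonal_gap(x, gap=1.0):
    """Band between y = sin(x) and y = sin(x) + gap.

    Returns (vertical gap, approximate width at right angles to the curve).
    The orthogonal width is a first-order (local tangent) approximation.
    """
    slope = math.cos(x)  # derivative of sin(x)
    orthogonal = gap / math.sqrt(1.0 + slope ** 2)
    return gap, orthogonal

# The vertical gap never changes, but the orthogonal width does.
for x in (0.0, math.pi / 4, math.pi / 2):
    v, o = vertical_and_orthogonal_gap(x)
    print(f"x={x:.2f}  vertical={v:.3f}  orthogonal={o:.3f}")
```

The orthogonal width shrinks where the curve is steepest (about 0.71 at x = 0) and equals the vertical gap where the curve is flat, which is consistent with equal-length segments looking unequal.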

Matthias explained that their implementation of the hammock plot uses a strategy to counteract this line-angle illusion.

I take this to mean they distort the data in such a way that after readers apply the line-angle illusion, the resulting view would correctly convey the trend. A kind of double-negative strategy. The paper linked above offers one such counter-illusion strategy.
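As a rough sketch of what such a double-negative adjustment could look like (this is an assumption for illustration, not necessarily what the hammock plot implementation or the paper does), one could stretch each vertical gap by sqrt(1 + slope^2) so that the width measured at right angles to the curve comes out constant. Both function names are hypothetical.

```python
import math

def corrected_gap(slope, gap=1.0):
    """Vertical gap to draw so the orthogonal width equals `gap`."""
    return gap * math.sqrt(1.0 + slope ** 2)

def orthogonal_width(vertical, slope):
    """Approximate width at right angles to a line with the given slope."""
    return vertical / math.sqrt(1.0 + slope ** 2)

# Using y = sin(x) as the example curve, whose slope is cos(x).
for x in (0.0, math.pi / 4, math.pi / 2):
    slope = math.cos(x)
    drawn = corrected_gap(slope)
    print(f"x={x:.2f}  drawn vertical={drawn:.3f}  "
          f"orthogonal={orthogonal_width(drawn, slope):.3f}")
```

The drawn vertical gaps are no longer equal, which is exactly the deliberate distortion at issue: the plotted lengths stop matching the data so that the perceived widths do.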

I imagine this is a bit controversial as we are introducing deliberate distortion to counteract an expected perceptual illusion.

I'm not aware of any software that offers built-in functions to perform this type of illusion-busting adjustment. Do you know of any?

 

P.S. [5-30-2025] Andrew Gelman has some comments on this topic on his blog. He said:

But to get closer to what Kaiser is asking: the analogy I’ve given is, suppose you’re building a wooden chair but using boards that are warped. In this case, the right thing to do is to incorporate the warp into the design, i.e. cut some pieces shorter than others and at different angles, etc., so that they fit together as is, rather than trying to go all rectilinear and then glue/nail everything together. The trouble with the latter strategy is that the wood will exert pressure on the joints and eventually the chair will break or distort itself in some way.


Hammock plots

Prof. Matthias Schonlau gave a presentation about "hammock plots" in New York recently.

Here is an example of a hammock plot that shows the progression of different rounds of voting during the 1903 papal conclave. (The images were taken at the event and are thus a little askew.)

Hammockplot_conclave

The chart shows how Cardinal Sarto beat the early favorite Rampolla during later rounds of voting. The chart traces the movement of votes from one round to the next. The Vatican destroys voting records, but apparently, records were unexpectedly retained for this particular conclave.

The dataset has several features that bring out the strengths of such a plot.

There is a fixed number of votes, and a fixed number of candidates. At each stage, the votes are distributed across the subset of candidates. From stage to stage, the support levels for each candidate shift. The chart brings out the evolution of the vote.

From the "marginals", i.e. the stacked columns shown at each time point, we learn the relative strengths of the candidates, as they evolve from vote to vote.

The links between the column blocks display the evolution of support from one vote to the next. We can see which candidate received more votes, as well as where the additional votes came from (or, to whom some voters have drifted).

The data are neatly arranged in successive stages, resulting in discrete time steps.

Because the total number of votes is fixed, the relative sizes of the marginals are nicely constrained.

The chart is made much more readable because of binning. Only the top three candidates are shown individually with all the others combined into a single category. This chart would have been quite a mess if it showed, say, 10 candidates.

How precisely we can show the stage-to-stage movement depends on how the data records were kept. If we have the votes for each person in each round, then it should be simple to execute the above! If we only have the marginals (the vote distribution by candidate) at each round, then we are forced to make some assumptions about which voters switched their votes. We'd likely have to rule out unlikely scenarios, such as that in which all of the previous voters for candidate X switched to other candidates while another set of voters switched their votes to candidate X.
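Here is a minimal Python sketch of the "simple" case, in which we have each person's vote in each round. The ballots below are made up for illustration and are not the actual conclave records. Only the candidate names come from the chart, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical ballots: one tuple per voter, one entry per round.
ballots = [
    ("Rampolla", "Rampolla"),
    ("Rampolla", "Sarto"),
    ("Sarto", "Sarto"),
    ("Other", "Sarto"),
    ("Other", "Other"),
]

def marginals(ballots, round_index):
    """Vote count per candidate in one round (the stacked columns)."""
    return Counter(b[round_index] for b in ballots)

def transitions(ballots, from_round, to_round):
    """Count of voters moving from one candidate to another (the links)."""
    return Counter((b[from_round], b[to_round]) for b in ballots)

print(marginals(ballots, 0))
print(marginals(ballots, 1))
print(transitions(ballots, 0, 1))
```

With voter-level records, both the columns and the links fall out of simple counting. With only the marginals, the transition counts are not uniquely determined, which is why assumptions about who switched become necessary.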

***

Matthias also showed examples of hammock plots applied to different types of datasets.

The following chart displays data from course evaluations. Unlike the conclave example, the variables tied to questions on the survey are neither ordered nor sequential. Therefore, there is no natural sorting available for the vertical axes.

Hammockplot_evals

Time is a highly useful organizing element for this type of chart. Without such an organizing element, the designer has to choose an order manually.

The vertical axes correspond to specific questions on the course evaluation. Students are aggregated into groups based on the "profile" of grades given for the whole set of questions. It's quite easy to see that opinions are most aligned on the "workload" question while most of the scores are skewed high.

Missing values are handled by plotting them as a new category at the bottom of each vertical axis.

This example is similar to the conclave example in that each survey response is categorical, one of five values (plus missing). Matthias also showed examples of hammock plots in which some or all of the variables are numeric data.

***

Some of you will see some resemblance between the hammock plot and various similar charts, such as the profile chart, the alluvial chart, the parallel coordinates chart, and Sankey diagrams. Matthias discussed all those as well.

Matthias has a book out called "Applied Statistical Learning" (link).

Also, there is a Python package for the hammock plot on github.


Scrambled egg

Let's take a look at the central message this chart is aiming to convey: "U.S. egg prices hit a 10-year high in 2025 after avian flu killed 30 million egg-laying birds." (The original is found on Visual Capitalist.)

Visualcapitalist_eggs

_trifectacheckup_image

Using the Trifecta Checkup framework (link), we ask how the data are aligned with this question. What do the data say?

The data give the average egg prices in 41 countries, sorted from highest to lowest, and arranged in a clockwise manner starting from the top.

The dataset does not address the question posed by the central message.

  • With no history, it cannot show that U.S. egg prices are at a 10-year high.
  • With no explanatory variables, it cannot say why egg prices have increased in 2025.
  • Without context, it cannot address the avian flu.
  • The U.S. does not even stand out.
  • It also does not show the extreme magnitude of the recent increase in egg price in the U.S.

Because of this mismatch, the graphic fails to deliver the intended message.

Notably, the dataset introduces the country dimension, which is unrelated to the central message, but nevertheless interesting. Yet the question of interest isn't the point-in-time comparison. I'd like to know if egg price inflation is a global trend, or an American exclusive. At some point, the inflation will flatten out, although the price of eggs would probably not return to the pre-inflation level. An international comparison across time would bring this insight out clearly.

***

Before ending, we'll make a quick stop at the Visual corner of the Trifecta Checkup. Since the designer uses an ellipse to represent the egg, the bars sticking out of the ellipse are somewhat distorted. Do the bar lengths encode the data accurately?

I looked at Brazil vs Italy. The price in Italy ($3.97) is basically twice that in Brazil ($1.99), but the BRA bar is only 40% as long as the ITA bar.
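The Brazil vs Italy check can be written as a small calculation. This is a sketch only: the two prices and the 40% length ratio are the figures quoted above, and the function name is made up for illustration.

```python
def length_distortion(price_a, price_b, observed_length_ratio):
    """Compare the bar-length ratio a/b with the price ratio a/b.

    Returns (price_ratio, observed_length_ratio / price_ratio).
    A second value of 1.0 means the bar lengths are proportional to the data.
    """
    price_ratio = price_a / price_b
    return price_ratio, observed_length_ratio / price_ratio

# Brazil ($1.99) vs Italy ($3.97); the BRA bar measured at 40% of the ITA bar.
price_ratio, distortion = length_distortion(1.99, 3.97, 0.40)
print(f"price ratio {price_ratio:.2f}, distortion {distortion:.2f}")
```

If the bars encoded the data faithfully, the second number would be close to 1. Here the Brazil bar is drawn noticeably shorter than its price warrants.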

Italy and Belgium, shown side by side, have the same egg price to the second decimal place. The bar lengths are not the same.

This observation suggests that the chart fails my self-sufficiency test. If the entire dataset were not printed on the chart, the reader could not interpret the bars.