Visual design is hard, brought to you by NYC subway

This poster showed up in a NY subway train recently.


Visual design is hard!

What is the message? The intention is, of course, to say Rootine is better than others. (That's the Q corner, if you're following the Trifecta Checkup.)

What is the visual telling us (V corner)? It says Rootine is yellow while Others are purple. What do these color mean? There is no legend to help decipher it. And yellow-purple doesn't have a canonical interpretation (unlike say, red-green). In theory, purple can be better than yellow.

The other mystery is the black dot on the fifth item. (This is the NYC subway so the poster could have been vandalized.) It could mean "diet + lifestyle analyzed" is a unique feature of Rootine, not available on any other platform. That implies purple to mean available but not as effective, which significantly lessnes the impact of the chart.


Finally, let's imagine the data that may exist to support this chart.

The aggregation of all competitors to "Others" imposes a major challenge. If yellow means yes, and purple means no, we'd expect few if any purple dots because across all competitors, there is a good chance that at least one of them has a particular feature.

Next, I'm dubious about the claim of "precision dosed, unique to you". I'm imagining they are selling some kind of medicine or health food, which can be "dosed". Predictive modelers like to market their models as "personalized," unique to each person but such a thing is impractical. Before you start using their products, they have no data on you, or your response to those products. How could the recommendation be "precision dosed, unique to you"?

Even if you've used the product for a while, it will be tough to achieve a good level of optimality with so little data. In fact, given that your past data are used to generate actions intended to improve your health - that is to say, to cause the future data to diverge from the past data, how do you know that any change you observe next period is caused by the actions you took? The pre-post difference is both affected by temporal shifts and the actions you've taken. If the next period's metric improves, you may want to believe that the actions worked. If the next period's metric declines, are you willing to conclude that the actions you took backfired?

"Formulas improve with you". This makes me more worried than relieved.


Problems like these can be solved by showing our work to others. Sometimes, we're too immersed in our own world we don't see we have left off key information.



How does the U.K. vote in the U.N.?

Through my twitter feed, I found my way to this chart, made by jamie_bio.


This is produced using R code even though it looks like a slide.

The underlying dataset concerns votes at the United Nations on various topics. Someone has already classified these topics. Jamie looked at voting blocs, specifically, countries whose votes agree most often or least often with the U.K.

If you look at his Github, this is one in a series of works he produced to hone his dataviz skills. Ultimately, I think this effort can benefit from some re-thinking. However, I also appreciate the work he has put into this.

Let's start with the things I enjoyed.

Given the dataset, I imagine the first visual one might come up with is a heatmap that shows countries in rows and topics in columns. That would work ok, as any standard chart form would but it would be a data dump that doesn't tell a story. There are almost 200 countries in the entire dataset. The countries can only be ordered in one way so if it's ordered for All Votes, it's not ordered for any of the other columns.

What Jamie attempts here is story-telling. The design leads the reader through a narrative. We start by reading the how-to-read-this box on the top left. This tells us that he's using a lunar eclipse metaphor. A full circle in blue indicates 0% agreement while a full circle in white indicates 100% agreement. The five circles signal that he's binning the agreement percentages into five discrete buckets, which helps simplify our understanding of the data.

Then, our eyes go to the circle of circles, labelled "All votes". This is roughly split in half, with the left side showing mostly blue and the right showing mostly white. That's because he's extracting the top 5 and bottom 5 countries, measured by their vote alignment with the U.K. The countries names are clearly labelled.

Next, we see the votes broken up by topics. I'm assuming not all topics are covered but six key topics are highlighted on the right half of the page.

What I appreciate about this effort is the thought process behind how to deliver a message to the audience. Selecting a specific subset that addresses a specific question. Thinning the materials in a way that doesn't throw the kitchen sink at the reader. Concocting the circular layout that presents a pleasing way of consuming the data.


Now, let me talk about the things that need more work.

I'm not convinced that he got his message across. What is the visual telling us? Half of the cricle are aligned with the U.K. while half aren't so the U.K. sits on the fence on every issue? But this isn't the message. It's a bit of a mirage because the designer picked out the top 5 and bottom 5 countries. The top 5 are surely going to be voting almost 100% with the U.K. while the bottom 5 are surely going to be disagreeing with the U.K. a lot.

I did a quick sketch to understand the whole distribution:


This is not intended as a show-and-tell graphic, just a useful way of exploring the dataset. You can see that Arms Race/Disarmament and Economic Development are "average" issues that have the same form as the "All issues" line. There are a small number of countries that are extremely aligned with the UK, and then about 50 countries that are aligned over 50% of the time, then the other 150 countries are within the 30 to 50% aligned. On human rights, there is less alignment. On Palestine, there is more alignment.

What the above chart shows is that the top 5 and bottom 5 countries both represent thin slithers of this distribution, which is why in the circular diagrams, there is little differentiation. The two subgroups are very far apart but within each subgroup, there is almost no variation.

Another issue is the lunar eclipse metaphor. It's hard to wrap my head around a full white circle indicating 100% agreement while a full blue circle shows 0% agreement.

In the diagrams for individual topics, the two-letter acronyms for countries are used instead of the country names. A decoder needs to be provided, or just print the full names.







Simple charts are the hardest to do right

The CDC website has a variety of data graphics about many topics, one of which is U.S. vaccinations. I was looking for information about Covid-19 data broken down by age groups, and that's when I landed on these charts (link).


The left panel shows people with at least one dose, and the right panel shows those who are "fully vaccinated." This simple chart takes an unreasonable amount of time to comprehend.


The analyst introduces three metrics, all of which are described as "percentages". Upon reflection, they are proportions of the people in specific age ranges.

Readers are thus invited to compare these proportions. It's not clear, however, which comparisons are intended. The first item listed in the legend states "Percent among Persons who completed all recommended doses in last 14 days". For most readers, including me, this introduces an unexpected concept. The 14 days here do not refer to the (in)famous 14-day case-counting window but literally the most recent two weeks relative to when the chart was produced.

It would have been clearer if the concept of Proportions were introduced in the chart title or axis title, while the color legend explains the concept of the base population. From the lighter shade to the darker shade (of red and blue) to the gray color, the base population shifts from "Among Those Who Completed/Initiated Vaccinations Within Last 14 Days" to "Among Those Who Completed/Initiated Vaccinations Any Time" to "Among the U.S. Population (regardless of vaccination status)".

Also, a reverse order helps our comprehension. Each subsequent category is a subset of the one above. First, the whole population, then those who are fully vaccinated, and finally those who recently completed vaccinations.

The next hurdle concerns the Q corner of our Trifecta Checkup. The design leaves few hints as to what question(s) its creator intended to address. The age distribution of the U.S. population is useless unless it is compared to something.

One apparently informative comparison is the age distribution of those fully vaccinated versus the age distribution of all Americans. This is revealed by comparing the lengths of the dark blue bar and the gray bar. But is this comparison informative? It's telling me that people aged 50 to 64 account for ~25% of those who are fully vaccinated, and ~20% of all Americans. Because proportions necessarily add to 100%, this implies that other age groups have been less vaccinated. Duh! Isn't that the result of an age-based vaccination prioritization? During the first week of the vaccination campaign, one might expect close to 100% of all vaccinations to be in the highest age group while it was 0% for the other age groups.

This is a chart in search of a question. The 25% vs 20% comparison does not assist readers in making a judgement. Does this mean the vaccination campaign is working as expected, worse than expected or better than expected? The problem is the wrong baseline. The designer of this chart implies that the expected proportions should conform to the overall age distribution - but that clearly stands in the way of CDC's initial prioritization of higher-risk age groups.


In my version of the chart, I illustrate the proportion of people in each age group who have been fully vaccinated.


Among those fully vaccinated, some did it within the most recent two weeks:



Elsewhere on the CDC site, one learns that on these charts, "fully vaccinated" means one shot of J&J or 2 shots of Pfizer or Moderna, without dealing with the 14-day window or other complications. Why do we think different definitions are used in different analyses? Story-first thinking, as I have explained here. When it comes to telling the story about vaccinations, the story is about the number of shots in arms. They want as big a number as possible, and abandon any criterion that decreases the count. When it comes to reporting on vaccine effectiveness, they want as small a number of cases as possible.






What metaphors give, they take away

Aleks pointed me to the following graphic making the rounds on Twitter:


It's being passed around as an example of great dataviz.

The entire attraction rests on a risque metaphor. The designer is illustrating a claim that Covid-19 causes erectile dysfunction in men.

That's a well-formed question so in using the Trifecta Checkup, that's a pass on the Q corner.

What about the visual metaphor? I advise people to think twice before using metaphors because these devices can give as they can take. This example is no exception. Some readers may pay attention to the orientation but other readers may focus on the size.

I pulled out the tape measure. Here's what I found.


The angle is accurate on the first chart but the diameter has been exaggerated relative to the other. The angle is slightly magnified in the bottom chart which has a smaller circumference.


Let's look at the Data to round out our analysis. They come from a study from Italy (link), utilizing survey responses. There were 25 male respondents in the survey who self-reported having had Covid-19. Seven of these submitted answers to a set of five questions that were "suggestive of erectile dysfunction". (This isn't as arbitrary as it sounds - apparently it is an internationally accepted way of conducting reseach.) Seven out of 25 is 28 percent. Because the sample size is small, the 95% confidence range is 10% to 46%.

The researchers then used the propensity scoring method to find 3 matches per each infected person. Each match is a survey respondent who did not self-report having had Covid-19. See this post about a real-world vaccine study to learn more about propensity scoring. Among the 75 non-infected men, 7 were judged to have ED. The 95% range is 3% to 16%.

The difference between the two subgroups is quite large. The paper also includes other research that investigates the mechanisms that can explain the observed correlation. Nevertheless, the two proportions depicted in the chart have wide error bars around them.

I have always had a question about analysis using this type of survey data (including my own work). How do they know that ED follows infection rather than precedes it? One of the inviolable rules of causation is that the effect follows the cause. If it's a series of surveys, the sequencing may be measurable but a single survey presents challenges. 

The headline of the dataviz is "Get your vaccines". This comes from a "story time" moment in the paper. On page 1, under Discussion and conclusion, they inserted the sentence "Universal vaccination against COVID-19 and the personal protective equipment could possibly have the added benefit of preventing sexual dysfunctions." Nothing in the research actually supports this claim. The only time the word "vaccine" appears in the entire paper is on that first page.

"Story time" is the moment in a scientific paper when the researchers - after lulling readers to sleep over some interesting data - roll out statements that are not supported by the data presented before.


The graph succeeds in catching people's attention. The visual metaphor works in one sense but not in a different sense.


P.S. [8/6/2021] One final note for those who do care about the science: the internet survey not surprisingly has a youth bias. The median age of 25 infected people was 39, maxing out at 45 while the median of the 75 not infected was 42, maxing out at 49.

One of the most frequently produced maps is also one of the worst

Summer is here, many Americans are putting the pandemic in their rear-view mirrors, and gas prices are soaring. Business Insider told the story using this map:


What do we want to learn about gas prices this summer?

Which region has the highest / lowest prices?

How much higher / lower than the national average are the regional prices?

How much has prices risen, compared to last year, or compared to the last few weeks?


How much work did you have to do to get answers to those questions from the above map?

Unfortunately, this type of map continues to dominate the popular press. It merely delivers a geography lesson and not much else. Its dominant feature tells readers how to classify the 50 states into regions. Its color encodes no data.

Not surprisingly, this map fails the self-sufficiency test (link). The entire dataset is printed on the map, and if those numbers were removed, we would be left with a map of the regions of the U.S. The graphical elements of the chart are not doing much work.


In the following chart, I used the map as a color legend. Also, an additional plot shows each region's price level against the national average.


One can certainly ditch the map altogether, which makes having seven colors unnecessary. To address other questions, just stack on other charts, for example, showing the price increase versus last year.


_trifectacheckup_imageFrom a Trifecta Checkup perspective, we find that the trouble starts with the Q corner. There are several important questions not addressed by the graphic. In the D corner, no context is provided to interpret the data. Are these prices abnormal? How do they compare to the national average or to a year ago? In the V corner, the chart takes too much effort to comprehend a basic fact, such as which region has the highest average price.

For more on the Trifecta Checkup, see this guide.


Start at zero improves this chart but only slightly

The following chart was forwarded to me recently:


It's a good illustration of why the "start at zero" rule exists for column charts. The poor Indian lady looks extremely short in this women's club. Is the average Indian woman really half as tall as the average South African woman? (Surely not!)

Junkcharts_redo_womenheight_columnThe problem is only superficially fixed by starting the vertical axis at zero. Doing so highlights the fact that the difference in average heights is but a fraction of the average heights themselves. The intra-country differences are squashed in such a representation - which works against the primary goal of the data visualization itself.

Recall the Trifecta Checkup. At the top of the trifecta is the Question. The designer obviously wants to focus our attention on the difference of the averages. A column chart showing average heights fails the job!

This "proper" column chart sends the message that the difference in average heights is noise, unworthy of our attention. But this is a bad take of the underlying data. The range of average heights across countries isn't that wide, by virtue of large population sizes.

According to Wikipedia, they range from 4 feet 10.5 to 5 feet 6 (I'm ignoring several entries in the table based on non representative small samples.) How do we know that the difference of 2 inches between averages of South Africa and India is actually a sizable difference? The Wikipedia table has the average heights for most of the world's countries. There are perhaps 200 values. These values are sprinkled inside the range of about 8 inches top to bottom. If we divide the full range into 10 equal bins, that's roughly 0.8 inches per bin. So if we have two numbers that are 2 inches apart, they almost span 2 bins. If the data were evenly distributed, that's a huge shift.

(In reality, the data should be normally distributed, bell-shaped, with much more at the center than on the edges. That makes a difference of 2 inches even more significant if these are normal values near the center but less significant if these are extreme values on the tails. Stats students should be able to articulate why we are sure the data are normally distributed without having to plot the data.)


The original chart has further problems.

Another source of distortion comes from the scaling of the stick figures. The aspect ratio is being preserved, which means the area is being scaled. Given that the heights are scaled as per the data, the data are encoded twice, the second time in the widths. This means that the sizes of these figures grow at the rate of the square of the heights. (Contrast this with the scaling discussed in my earlier post this week which preserves the relative areas.)

At the end of that last post, I discuss why adding colors to a chart when the colors do not encode any data is a distraction to the reader. And this average height chart is an example.

From the Data corner of the Trifecta Checkup, I'm intrigued by the choice of countries. Why is Scotland highlighted instead of the U.K.? Why Latvia? According to Wikipedia, the Latvia estimate is based on a 1% sample of only 19 year olds.

Some of the data appear to be incorrect (or the designer used a different data source). Wikipedia lists the average height of Latvian women as 5 ft 6.5 while the chart shows 5 ft 5 in. Peru's average height of females is listed as 4 ft 11.5 and of males as 5 ft 4.5. The chart shows 5 ft 4 in.


Lest we think only amateurs make this type of chart, here is an example of a similar chart in a scientific research journal:


(link to original)

I have seen many versions of the above column charts with error bars, and the vertical axes not starting at zero. In every case, the heights (and areas) of these columns do not scale with the underlying data.


I tried a variant of the stem-and-leaf plot:


The scale is chosen to reflect the full range of average heights given in Wikipedia. The chart works better with more countries to fill out the distribution. It shows India is on the short end of the scale but not quite the lowest. (As mentioned above, Peru actually should be placed close to the lower edge.)


Finding the hidden information behind nice-looking charts

This chart from Business Insider caught my attention recently. (link)


There are various things they did which I like. The use of color to draw a distinction between the top 3 lines and the line at the bottom - which tells the story that the bottom 50% has been left far behind. Lines being labelled directly is another nice touch. I usually like legends that sit atop the chart; in this case, I'd have just written the income groups into the line labels.

Take a closer look at the legend text, and you'd notice they struggled with describing the income percentiles.


This is a common problem with this type of data. The top and bottom categories are easy, as it's most natural to say "top x%" and "bottom y%". By doing so, we establish two scales, one running from the top, and the other counting from the bottom - and it's a head scratcher which scale to use for the middle categories.

The designer decided to lose the "top" and "bottom" descriptors, and went with "50-90%" and "90-99%". Effectively, these follow the "bottom" scale. "50-90%" is the bottom 50 to 90 percent, which corresponds to the top 10 to 50 percent. "90-99%" is the bottom 90-99%, which corresponds to the top 1 to 10%. On this chart, since we're lumping the top three income groups, I'd go with "top 1-10%" and "top 10-50%".


The Business Insider chart is easy to mis-read. It appears that the second group from the top is the most well-off, and the wealth of the top group is almost 20 times that of the bottom group. Both of those statements are false. What's confusing us is that each line represents very different numbers of people. The yellow line is 50% of the population while the "top 1%" line is 1% of the population. To see what's really going on, I look at a chart showing per-capita wealth. (Just divide the data of the yellow line by 50, etc.)


For this chart, I switched to a relative scale, using the per-capita wealth of the Bottom 50% as the reference level (100). Also, I applied a 4-period moving average to smooth the line. The data actually show that the top 1% holds much more wealth per capita than all other income segments. Around 2011, the gap between the top 1% and the rest was at its widest - the average person in the top 1% is about 3,000 times wealthier than someone in the bottom 50%.

This chart raises another question. What caused the sharp rise in the late 2000s and the subsequent decline? By 2020, the gap between the top and bottom groups is still double the size of the gap from 20 years ago. We'd need additional analyses and charts to answer this question.


If you are familiar with our Trifecta Checkup, the Business Insider chart is a Type D chart. The problem with it is in how the data was analyzed.

Same data + same chart form = same story. Maybe.

We love charts that tell stories.

Some people believe that if they situate the data in the right chart form, the stories reveal themselves.

Some people believe for a given dataset, there exists a best chart form that brings out the story.

An implication of these beliefs is that the story is immutable, given the dataset and the chart form.

If you use the Trifecta Checkup, you already know I don't subscribe to those ideas. That's why the Trifecta has three legs, the third is the question - which is related to the message or the story.


I came across the following chart by Statista, illustrating the growth in Covid-19 cases from the start of the pandemic to this month. The underlying data are collected by WHO and cover the entire globe. The data are grouped by regions.


The story of this chart appears to be that the world moves in lock step, with each region behaving more or less the same.

If you visit the WHO site, they show a similar chart:


On this chart, the regions at the bottom of the graph (esp. Southeast Asia in purple) clearly do not follow the same time patterns as Americas (orange) or Europe (green).

What we're witnessing is: same data, same chart form, different stories.

This is a feature, not a bug, of the stacked area chart. The story is driven largely by the order in which the pieces are stacked. In the Statista chart, the largest pieces are placed at the bottom while for WHO, the order is exactly reversed.

(There are minor differences which do not affect my argument. The WHO chart omits the "Other" category which accounts for very little. Also, the Statista chart shows the smoothed data using 7-day averaging.)

In this example, the order chosen by WHO preserves the story while the order chosen by Statista wipes it out.


What might be the underlying question of someone who makes this graph? Perhaps it is to identify the relative prevalence of Covid-19 in different regions at different stages of the pandemic.

Emphasis on the word "relative". Instead of plotting absolute number of cases, I consider plotting relative number of cases, that is to say, the proportion of cases in each region at given times.

This leads to a stacked area percentage chart.


In this side-by-side view, you see that this form is not affected by flipping the order of the regions. Both charts say the same thing: that there were two waves in Europe and the Americas that dwarfed all other regions.



Circular areas offer misleading cues of their underlying data

John M. pointed me on Twitter to this chart about the progress of U.S.'s vaccination campaign:


This looks like a White House production, retweeted by WHO. John is unhappy about this nested bubble format, which I'll come back to later.

Let's zoom in on what matters:


An even bigger problem with this chart is the Q corner in our Trifecta Checkup. What is the question they are trying to address? It would appear to be the proportion of population that has "already received [one or more doses of] vaccine". And the big words tell us the answer is 8 percent.

_junkcharts_trifectacheckupBut is that really the question? Check out the dark blue circle. It is labeled "population that has already received vaccine" and thus we infer this bubble represents 8 percent. Now look at the outer bubble. Its annotation is "new population that received vaccine since January 27, 2021". The only interpretation that makes sense is that 8 percent  is not the most current number. If that is the case, why would the headline highlight an older statistic, and not the most up-to-date one?

Perhaps the real question is how fast is the progress in vaccination. Perhaps it took weeks to get to the dark circle and then days to get beyond. In order to improve this data visualization, we must first decide what the question really is.


Now let's get to those nested bubbles. The bubble chart is a format that is not "sufficient," by which I mean the visual by itself does not convey the data without the help of aids such as labels. Try to answer the following questions:


In my view, if your answer to the last question is anything more than 5 seconds, the dataviz has failed. A successful data visualization should not make readers solve puzzles.

The first two questions depict the confusing nature of concentric circle diagrams. The first data point is coded to the inner circle. Where is the second data point? Is it encoded to the outer circle, or just the outer ring?

In either case, human brains are not trained to compare circular areas. For question 1, the outer circle is 70% larger than the smaller circle. For question 2, the ring is 70% of the area of the dark blue circle. If you're thinking those numbers seem unreasonable, I can tell you that was my first reaction too! So I made the following to convince myself that the calculation was correct:


Circular areas offer misleading visual cues, and should be used sparingly.

[P.S. 2/10/2021. In the next post, I sketch out an alternative dataviz for this dataset.]

Illustrating differential growth rates

Reader Mirko was concerned about a video published in Germany that shows why the new coronavirus variant is dangerous. He helpfully provided a summary of the transcript:

The South African and the British mutations of the SARS-COV-2 virus are spreading faster than the original virus. On average, one infected person infects more people than before. Researchers believe the new variant is 50 to 70 % more transmissible.

Here are two key moments in the video:


This seems to be saying the original virus (left side) replicates 3 times inside the infected person while the new variant (right side) replicates 19 times. So we have a roughly 6-fold jump in viral replication.


Later in the video, it appears that every replicate of the old virus finds a new victim while the 19 replicates of the new variant land on 13 new people, meaning 6 replicates didn't find a host.

As Mirko pointed out, the visual appears to have run away from the data. (In our Trifecta Checkup, we have a problem with the arrow between the D and the V corners. What the visual is saying is not aligned with what the data are saying.)


It turns out that the scientists have been very confusing when talking about the infectiousness of this new variant. The most quoted line is that the British variant is "50 to 70 percent more transmissible". At first, I thought this is a comment on the famous "R number". Since the R number around December was roughly 1 in the U.K, the new variant might bring the R number up to 1.7.

However, that is not the case. From this article, it appears that being 5o to 70 percent more transmissible means R goes up from 1 to 1.4. R is interpreted as the average number of people infected by one infected person.

Mirko wonders if there is a better way to illustrate this. I'm sure there are many better ways. Here's one I whipped up:


The left side is for the 40% higher R number. Both sides start at the center with 10 infected people. At each time step, if R=1 (right side), each of the 10 people infects 10 others, so the total infections increase by 10 per time step. It's immediately obvious that a 40% higher R is very serious indeed. Starting with 10 infected people, in 10 steps, the total number of infections is almost 1,000, almost 10 times higher than when R is 1.

The lines of the graphs simulate the transmission chains. These are "average" transmission chains since R is an average number.


P.S. [1/29/2021: Added the missing link to the article in which it is reported that 50-70 percent more transmissible implies R increasing by 40%.]