## Getting to first before going to second

##### Dec 28, 2021

Happy holidays to all my readers! A special shutout to those who've been around for over 15 years.

***

The following enhanced data table appeared in Significance magazine (August 2021) under an article titled "Winning an election, not a popularity contest" (link, paywalled)

It's surprising hard to read and there are many reasons contributing to this.

First is the antiquated style guide of academic journals, in which they turn legends into text, and insert the text into a caption. This is one of the worst journalistic practices that continue to be followed.

The table shows 50 states plus District of Columbia. The authors are interested in the extreme case in which a hypothetical U.S. presidential candidate wins the electoral college with the lowest possible popular vote margin. If you've been following U.S. presidential politics, you'd know that the electoral college effectively deflates the value of big-city votes so that the electoral vote margin can be a lot larger than the popular vote margin.

The two sub-tables show two different scenarios: Scenario A is a configuration computed by NPR in one of their reports. Scenario B is a configuration created by the authors (Leinwand, et. al.).

The table cells are given one of four colors: green = needed in the winning configuration; white = not needed; yellow = state needed in Scenario B but not in Scenario A; grey = state needed in Scenario A but not in Scenario B.

***

The second problem is that the above description of the color legend is not quite correct. Green, it turns out, is only correctly explained for Scenario A. Green for Scenario B encodes those states that are needed for the candidate to win the electoral college in Scenario B minus those states that are needed in Scenario B but not in Scenario A (shown in yellow). There is a similar problem with interpreting the white color in the table for Scenario B.

To fix this problem, start with the Q corner of the Trifecta Checkup.

The designer wants to convey an interlocking pair of insights: the winning configuration of states for each of the two scenarios; and the difference between those two configurations.

The problem with the current design is that it elevates the second insight over the first. However, the second insight is a derivative of the first so it's hard to get to the second spot without reaching the first.

The following revision addresses this problem:

[12/30/2021: Replaced chart and corrected the blue arrow for NJ.]

## Graphing highly structured data

##### Dec 08, 2021

The following sankey diagram appeared in my Linkedin feed the other day, and I agree with the poster that this is an excellent example.

It's an unusual use of a flow chart to show the P&L (profit and loss) statement of a business. It makes sense since these are flows of money. The graph explains how Spotify makes money - or how little profit it claims to have earned on over 2.5 billion of revenues.

What makes this chart work so well?

The first thing to notice is how they handled negative flows (costs). They turned the negative numbers into positive numbers, and encoded the signs of the numbers as colors. This doesn't come as naturally as one might think. The raw data are financial tables with revenues shown as positive numbers and costs shown as negative numbers, perhaps in parentheses. Like this:

Now, some readers are sure to have an issue with using the red-green color scheme. I suppose gray-red can be a substitute.

The second smart decision is to pare down the details. There are only four cost categories shown in the entire chart. The cost of revenue represents more than two-thirds of all revenues, and we know nothing about sub-categories of this cost.

The third feature is where the Spotify logo is placed. This directs our attention to the middle of the diagram. This is important because typically on a sankey diagram you read from left to right. Here, the starting point is really the column labeled "total Spotify revenue". The first column just splits the total revenue between subscription revenue and advertising revenue.

Putting the labels of the last column inside the flows improves readability as well.

On the whole, a job well done.

***

Sankey diagrams have limitations. The charts need to be simple enough to work their magic.

It's difficult to add a time element to the above chart, for example. The next question a business analyst might want to ask is how the revenue/cost/profit structure at Spotify have changed over time.

Another question a business analyst might ask is the revenue/cost/profit structure of premium vs ad-supported users. We have a third of the answer - the revenue split. Depending on relative usage, and content preference, the mix of royalties is likely not to replicate the revenue split.

Yet another business analyst might be interested in comparing Spotify's business model to a competitor. It's also not simple to handle this on a sankey diagram.

***

I searched for alternative charts, and when you look at what's out there, you appreciate the sankey version more.

Here is a waterfall chart, which is quite popular:

Here is a stacked column chart, rooted at zero:

Of course, someone has to make a pie chart - in this case, two pie charts:

## How does the U.K. vote in the U.N.?

##### Dec 01, 2021

Through my twitter feed, I found my way to this chart, made by jamie_bio.

This is produced using R code even though it looks like a slide.

The underlying dataset concerns votes at the United Nations on various topics. Someone has already classified these topics. Jamie looked at voting blocs, specifically, countries whose votes agree most often or least often with the U.K.

If you look at his Github, this is one in a series of works he produced to hone his dataviz skills. Ultimately, I think this effort can benefit from some re-thinking. However, I also appreciate the work he has put into this.

Given the dataset, I imagine the first visual one might come up with is a heatmap that shows countries in rows and topics in columns. That would work ok, as any standard chart form would but it would be a data dump that doesn't tell a story. There are almost 200 countries in the entire dataset. The countries can only be ordered in one way so if it's ordered for All Votes, it's not ordered for any of the other columns.

What Jamie attempts here is story-telling. The design leads the reader through a narrative. We start by reading the how-to-read-this box on the top left. This tells us that he's using a lunar eclipse metaphor. A full circle in blue indicates 0% agreement while a full circle in white indicates 100% agreement. The five circles signal that he's binning the agreement percentages into five discrete buckets, which helps simplify our understanding of the data.

Then, our eyes go to the circle of circles, labelled "All votes". This is roughly split in half, with the left side showing mostly blue and the right showing mostly white. That's because he's extracting the top 5 and bottom 5 countries, measured by their vote alignment with the U.K. The countries names are clearly labelled.

Next, we see the votes broken up by topics. I'm assuming not all topics are covered but six key topics are highlighted on the right half of the page.

What I appreciate about this effort is the thought process behind how to deliver a message to the audience. Selecting a specific subset that addresses a specific question. Thinning the materials in a way that doesn't throw the kitchen sink at the reader. Concocting the circular layout that presents a pleasing way of consuming the data.

***

Now, let me talk about the things that need more work.

I'm not convinced that he got his message across. What is the visual telling us? Half of the cricle are aligned with the U.K. while half aren't so the U.K. sits on the fence on every issue? But this isn't the message. It's a bit of a mirage because the designer picked out the top 5 and bottom 5 countries. The top 5 are surely going to be voting almost 100% with the U.K. while the bottom 5 are surely going to be disagreeing with the U.K. a lot.

I did a quick sketch to understand the whole distribution:

This is not intended as a show-and-tell graphic, just a useful way of exploring the dataset. You can see that Arms Race/Disarmament and Economic Development are "average" issues that have the same form as the "All issues" line. There are a small number of countries that are extremely aligned with the UK, and then about 50 countries that are aligned over 50% of the time, then the other 150 countries are within the 30 to 50% aligned. On human rights, there is less alignment. On Palestine, there is more alignment.

What the above chart shows is that the top 5 and bottom 5 countries both represent thin slithers of this distribution, which is why in the circular diagrams, there is little differentiation. The two subgroups are very far apart but within each subgroup, there is almost no variation.

Another issue is the lunar eclipse metaphor. It's hard to wrap my head around a full white circle indicating 100% agreement while a full blue circle shows 0% agreement.

In the diagrams for individual topics, the two-letter acronyms for countries are used instead of the country names. A decoder needs to be provided, or just print the full names.