Who trades with Sweden

It's great that the UN is publishing dataviz but it can do better than this effort:


Certain problems are obvious. The country names turned sideways. The meaningless use of color. The inexplicable sequencing of the country/region.

Some problems are subtler. "Area, nes" - upon research - is a custom term used by UN Trade Statistics, meaning "not elsewhere specified".

The gridlines are debatable. Their function is to help readers figure out the data values if they care. The design omitted the top and bottom gridlines, which makes it hard to judge the values for USA (dark blue), Netherlands (orange), and Germany (gray).

See here, where I added the top gridline.


Now, we can see this value is around 3.6, just over the halfway point between gridlines.


A central feature of trading statistics is "balance". The following chart makes it clear that the positive numbers outweigh the negative numbers in the above chart.


At the time I made the chart, I wasn't sure how to interpret the gap of 1.3%. Looking at the chart again, I think it's saying Sweden has a trade surplus equal to that amount.

A German obstacle course

Tagesschau_originalA twitter user sent me this chart from Germany.

It came with a translation:

"Explanation: The chart says how many car drivers plan to purchase a new state-sponsored ticket for public transport. And of those who do, how many plan to use their car less often."

Because visual language should be universal, we shouldn't be deterred by not knowing German.

The structure of the data can be readily understood: we expect three values that add up to 100% from the pie chart. The largest category accounts for 58% of the data, followed by the blue category (40%). The last and smallest category therefore has 2% of the data.

The blue category is of the most interest, and the designer breaks that up into four sub-groups, three of which are roughly similarly popular.

The puzzle is the identities of these categories.

The sub-categories are directly labeled so these are easy for German speakers. From a handy online translator, these labels mean "definitely", "probably", "rather not", "definitely not". Well, that's not too helpful when we don't know what the survey question is.

According to our correspondent, the question should be "of those who plan to buy the new ticket, how many plan to use their car less often?"

I suppose the question is found above the column chart under the car icon. The translator dutifully outputs "Thus rarer (i.e. less) car use". There is no visual cue to let readers know we are supposed to read the right hand side as a single column. In fact, for this reader, I was reading horizontally from top to bottom.

Now, the two icons on the left and the middle of the top row should map to not buying and buying the ticket. The check mark and cross convey that message. But... what do these icons map to on the chart below? We get no clue.

In fact, the will-buy ticket group is the 40% blue category while the will-not group is the 58% light gray category.

What about the dark gray thin sector? Well, one needs to read the fine print. The footnote says "I don't know/ no response".

Since this group is small and uninformative, it's fine to push it into the footnote. However, the choice of a dark color, and placing it at the 12-o'clock angle of the pie chart run counter to de-emphasizing this category!

Another twitter user visually depicts the journey we take to understand this chart:


The structure of the data is revealed better with something like this:


The chart doesn't need this many colors but why not? It's summer.





Variance is a friend of dataviz

Seven years ago, I wrote a post about "invariance" in data visualization, which is something we should avoid (link). Yesterday, Business Insider published the following chart in an article about rising gas prices (link):


The map shows the average prices at the pump in seven regions of the United States. 

This chart is succeeded by the following map:


This second map shows the change in average gas prices in the same seven regions.

This design is invariant to the data! While the data change, the visualization looks identical. That's because the data are not encoded to any visual element - they are just printed as labels.


What does Elon Musk do every day?

The Wall Street Journal published a fun little piece about tweets by Elon Musk (link).

Here is an overview of every tweet he sent since he started using Twitter more than a decade ago.

Apparently, he sent at least one tweet almost every day for the last four years. In addition, his tweets appear at all hours of the day. (Presumably, he is not the only one tweeting from his account.)

He doesn't just spend time writing tweets; he also reads other people's tweets. WSJ finds that up to 80% of his tweets include mentions of other users.



One problem with "big data" analytics is that they often don't answer interesting questions. Twitter is already one of the companies that put more of their data out there, but still, analysts are missing some of the most important variables.

We know that Musk has 93 million followers. We already know from recent news that a large proportion of such users may be spam/fake. It is frequently assumed in twitter analysis that any tweet he makes reaches 93 million accounts. That's actually far from correct. Twitter uses algorithms to decide what posts show up in each user's feed so we have no idea how many of the 93 million accounts are in fact exposed to any of Musk's tweets.

Further, not every user reads everything on their Twitter feed. I don't even check it every day. Because Twitter operates as a 'firehose" with ever-changing content as users send out short messages at all hours, what one sees depends on when one reads. If Musk tweets in the morning, the users who log on in the afternoon won't see it.

Let's say an analyst wants to learn how impactful Musk's tweets are. That's pretty difficult when one can't figure out which of the 93 million followers were shown these tweets, and who read them. The typical data used to measure response are retweets and likes. Those are convenient metrics because they are available. They are very limited in what they measure. There are lots of users who don't like or retweet at all.


The available data do make for some fun charts. This one gave me a big smile:


Between writing tweets, reading tweets, and ROTFL, every hour of almost every day, Musk finds time to run his several companies. That's impressive.