What does Elon Musk do every day?

The Wall Street Journal published a fun little piece about tweets by Elon Musk (link).

Here is an overview of every tweet he sent since he started using Twitter more than a decade ago.

Wsj_musk_tweets_alldaylong2
Apparently, he sent at least one tweet almost every day for the last four years. In addition, his tweets appear at all hours of the day. (Presumably, he is not the only one tweeting from his account.)

He doesn't just spend time writing tweets; he also reads other people's tweets. WSJ finds that up to 80% of his tweets include mentions of other users.

Wsj_musk_tweets_mentionsothers7

***

One problem with "big data" analytics is that they often don't answer interesting questions. Twitter is already one of the companies that put more of their data out there, but still, analysts are missing some of the most important variables.

We know that Musk has 93 million followers. We already know from recent news that a large proportion of such users may be spam/fake. It is frequently assumed in Twitter analysis that any tweet he makes reaches 93 million accounts. That's actually far from correct. Twitter uses algorithms to decide what posts show up in each user's feed, so we have no idea how many of the 93 million accounts are in fact exposed to any of Musk's tweets.

Further, not every user reads everything on their Twitter feed. I don't even check it every day. Because Twitter operates as a "firehose" with ever-changing content as users send out short messages at all hours, what one sees depends on when one reads. If Musk tweets in the morning, the users who log on in the afternoon may well not see it.

Let's say an analyst wants to learn how impactful Musk's tweets are. That's pretty difficult when one can't figure out which of the 93 million followers were shown these tweets, and who read them. The typical data used to measure response are retweets and likes. Those metrics are convenient because they are available, but they are very limited in what they measure. There are lots of users who don't like or retweet at all.

***

The available data do make for some fun charts. This one gave me a big smile:

Wsj_musk_tweets_emojis9

Between writing tweets, reading tweets, and ROTFL, every hour of almost every day, Musk finds time to run his several companies. That's impressive.

 


Selecting the right analysis plan is the first step to good dataviz

It's a new term, and my friend Ray Vella shared some student projects from his NYU class on infographics. There's always something to learn from these projects.

The starting point is a chart published in the Economist a few years ago.

Economist_richgetricher

This is a challenging chart to read. To save you the time, the following key points are pertinent:

a) income inequality is measured by the disparity between regional averages

b) the incomes are given in a double index, a relative measure. For each country and year combination, the average national GDP is set to 100. For example, a value of 150 for the richest region of Spain in 2015 means that region has an average income that is 50% higher than Spain's national average in that year.

The original chart - as well as most of the student work - is based on a specific analysis plan. The difference in the index values between the richest and poorest regions is used as a measure of the degree of income inequality, and the change in the difference in the index values over time, as a measure of change in the degree of income inequality over time. That's as big a mouthful as the bag of words sounds.

This analysis plan can be summarized as:

1) all incomes -> relative indices, at each region-year combination
2) inequality = rich - poor region gap, at each country-year combination
3) inequality over time = inequality in 2015 - inequality in 2000, for each country
4) country difference = inequality in country A - inequality in country B, for each year
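The four steps above can be sketched in a few lines of Python. This is only an illustration: the countries "A" and "B" and all income figures are made up (they are not the Economist's data), and the names `incomes`, `to_index`, `gap`, `change_over_time` and `country_diff` are hypothetical.

```python
# Hypothetical average incomes, NOT the Economist's data.
# Key: (country, year); value: national average plus richest/poorest region averages.
incomes = {
    ("A", 2000): {"national": 20000, "richest": 28000, "poorest": 14000},
    ("A", 2015): {"national": 25000, "richest": 40000, "poorest": 15000},
    ("B", 2000): {"national": 30000, "richest": 39000, "poorest": 24000},
    ("B", 2015): {"national": 36000, "richest": 45000, "poorest": 27000},
}


def to_index(value, national):
    """Express a regional income relative to the national average (= 100)."""
    return 100 * value / national


# Step 1: convert incomes to relative indices for each region and year.
index = {
    key: {
        region: to_index(row[region], row["national"])
        for region in ("richest", "poorest")
    }
    for key, row in incomes.items()
}

# Step 2: inequality = gap between richest and poorest index values.
gap = {key: idx["richest"] - idx["poorest"] for key, idx in index.items()}

# Step 3: change in inequality over time, for each country.
change_over_time = {c: gap[(c, 2015)] - gap[(c, 2000)] for c in ("A", "B")}

# Step 4: difference in inequality between countries, for each year.
country_diff = {y: gap[("A", y)] - gap[("B", y)] for y in (2000, 2015)}

print(gap)
print(change_over_time)
print(country_diff)
```

Every quantity after step 1 is a difference of index values, and steps 3 and 4 are differences of those differences. That layering is part of what makes the original chart hard to read.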

***

One student, J. Harrington, looks at the data through an alternative lens that brings clarity to the underlying data. Harrington starts with change in income within the richest regions (then the poorest regions), so that a worsening income inequality should imply that the richest region is growing incomes at a faster clip than the poorest region.

This alternative analysis plan can be summarized as:
1) change in income over time for richest regions for each country
2) change in income over time for poorest regions for each country
3) inequality = change in income over time: rich - poor, for each country
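Here is a minimal sketch of this alternative plan in Python. It assumes that "change in income" means percent growth between 2000 and 2015, which is one reasonable reading rather than necessarily the student's exact calculation. The countries and income figures are made up, and the names `incomes`, `pct_change`, `growth` and `inequality` are hypothetical.

```python
# Hypothetical average incomes (2000, 2015) for the richest and poorest
# region of two made-up countries. These are NOT the Economist's data.
incomes = {
    "A": {"richest": (30000, 42000), "poorest": (15000, 16500)},
    "B": {"richest": (40000, 46000), "poorest": (25000, 30000)},
}


def pct_change(start, end):
    """Percent growth from start to end (assumed definition of 'change')."""
    return 100 * (end - start) / start


# Steps 1 and 2: change in income over time for the richest and poorest regions.
growth = {
    country: {region: pct_change(*values) for region, values in regions.items()}
    for country, regions in incomes.items()
}

# Step 3: inequality = growth in richest region minus growth in poorest region.
# A positive value means the rich region pulled further ahead.
inequality = {c: g["richest"] - g["poorest"] for c, g in growth.items()}

print(growth)
print(inequality)
```

With these invented numbers, country A's richest region grows 30 percentage points faster than its poorest region, while in country B the poorest region grows faster. The reader compares two growth rates per country instead of decoding differences of index values.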

The restructuring of the analysis plan makes a big difference!

Here is one way to show this alternative analysis:

Junkcharts_kfung_sixeurocountries_gdppercapita

The underlying data have not changed but the reader's experience is transformed.


Deficient deficit depiction

A Twitter user alerted me to this chart put out by the Biden administration trumpeting a reduction in the budget deficit from 2020 to 2021:

Omb_deficitreduction

This column chart embodies a form that is popular in many presentations, including in scientific journals. It's deficient in so many ways it's a marvel how it continues to live.

There are just two numbers: -3132 and -2772 (in billions of dollars). Their difference is $360 billion, which is just over 10 percent of the earlier number. It's not clear what any data graphic can add.

Indeed, the chart does not do much. It obscures the actual data. What is the budget deficit in 2020? Readers must look at the axis labels, and judge that it's about a quarter of the way between 3,000 and 3,500. Five hundred quartered is 125. So it's roughly $3.125 trillion. Similarly, the 2021 number is slightly above the halfway point between 2,500 and 3,000.

These numbers are upside down. Taller columns are bad! Shortening the columns is good. It's all counterintuitive.

Column charts encode data in the heights of the columns. The designer apparently wants readers to believe the deficit has been cut by about a third.

As usual, this deception is achieved by cutting the column chart off at its knees. Removing equal sections of each column destroys the proportionality of the heights.

Why hold back? Here's a version of the chart showing the deficit was cut by half:

Junkcharts_redo_ombbudgetdeficit

The relative percent reduction depends on where the baseline is placed. The only defensible baseline is the zero baseline. That's the only setting under which the relative percent reduction is accurately represented visually.
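To see how the baseline drives the visual impression, here is a small Python sketch using the two deficit numbers from the chart (3132 and 2772, in billions). The function `apparent_reduction` is a hypothetical helper. The baselines of 2000 and 2412 are my assumptions, chosen to produce reductions of roughly a third and exactly a half; they are not read off either chart.

```python
def apparent_reduction(before, after, baseline=0.0):
    """Fraction by which the column visibly shrinks when the axis starts at `baseline`.

    Column heights are measured from the baseline, so the visible height of
    each column is (value - baseline).
    """
    if baseline >= min(before, after):
        raise ValueError("baseline must lie below both values")
    return (before - after) / (before - baseline)


deficit_2020 = 3132  # billions of dollars, from the chart
deficit_2021 = 2772  # billions of dollars, from the chart

# Baselines other than zero are assumptions made for illustration.
for baseline in (0, 2000, 2412):
    share = apparent_reduction(deficit_2020, deficit_2021, baseline)
    print(f"baseline {baseline:>5}: columns shrink by {share:.1%}")
```

The $360 billion difference never changes. Only the denominator, the visible height of the 2020 column, shrinks as the baseline moves up. With a zero baseline the columns shrink by about 11.5 percent, which is the true relative reduction.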

***

This same problem presents itself subtly in Covid-19 vaccine studies. I explain in this post, which I rate as one of my best Covid-19 posts. Check it out!