Organizing time-stamped data
Feb 06, 2025
In a previous post, I looked at the Economist chart about Elon Musk's tweeting compulsion. It's a chart that contains lots of data, with every tweet included, but one can't tell the number or frequency of tweets.
In today's post, I'll walk through a couple of sketches of other charts. I was able to find a dataset on GitHub that does not cover the same period of time, but it's good enough for illustration purposes.
As discussed previously, I took cues from the Economist chart, in particular that the hours of the day should be divided up into four equal-width periods. One thing Musk is known for is tweeting at any hour of the day.
This is a small-multiples arrangement of column charts. Each column chart represents the tweets that were posted during a six-hour window, across all days in the dataset. A column covers half a year of tweets. We note that there were more tweets in the afternoon hours as he started tweeting more. In the first half of 2022, he sent roughly 750 tweets between 7 pm and midnight.
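For readers who want to reproduce the binning, here is a minimal sketch in Python. It assumes the tweet timestamps are already in local time and that the four windows start at midnight; the sample timestamps and the names `WINDOWS` and `bucket` are made up for illustration and are not from the dataset or my actual code.

```python
from collections import Counter
from datetime import datetime

# Hypothetical local-time tweet timestamps (illustration only, not real data).
tweets = [
    datetime(2022, 1, 15, 2, 30),
    datetime(2022, 3, 2, 13, 5),
    datetime(2022, 3, 2, 19, 45),
    datetime(2022, 8, 9, 23, 10),
]

# Four equal-width, six-hour windows; starting at midnight is an assumption.
WINDOWS = ["00:00-05:59", "06:00-11:59", "12:00-17:59", "18:00-23:59"]


def bucket(ts):
    """Return (six-hour window label, half-year label) for one timestamp."""
    window = WINDOWS[ts.hour // 6]
    half = f"{ts.year}-H{1 if ts.month <= 6 else 2}"
    return window, half


# One count per (window, half-year) pair, i.e. one column in one small multiple.
counts = Counter(bucket(ts) for ts in tweets)
```

Each key in `counts` corresponds to one column: the window picks the panel, and the half-year picks the position along the horizontal axis.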
***
In this next sketch, I used small multiples of line charts. Each line chart represents tweets posted during a six-hour window, as before. Instead of counting tweets per half-year, here I "smoothed" the daily tweet count, so that each number is an average daily tweet count, computed over a rolling time window.
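The smoothing idea can be sketched in a few lines of Python. This is a generic trailing average, not necessarily the function behind my chart; the window length of 3 and the daily counts are invented for illustration, and the series is assumed to have one entry per calendar day.

```python
from collections import deque


def rolling_mean(values, window):
    """Trailing mean over the last `window` values (fewer at the start)."""
    recent = deque(maxlen=window)  # automatically drops the oldest value
    smoothed = []
    for value in values:
        recent.append(value)
        smoothed.append(sum(recent) / len(recent))
    return smoothed


# Hypothetical daily tweet counts for one six-hour window, one entry per day.
daily_counts = [0, 4, 2, 0, 6, 3, 1]
smoothed = rolling_mean(daily_counts, window=3)
```

A longer window gives a smoother line but reacts more slowly to a sudden burst of tweets.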
***
Finally, let's cover a few details only people who make charts would care about. The time of day variable only makes sense if all times are expressed as "local time", i.e. the time at the location where Musk was tweeting from. This knowledge is not necessary to make a chart but it is essential to make the chart interpretable. A statement like Musk tweets a lot around midnight assumes that it was midnight where he was when he sent each tweet.
Since we don't have his travel schedule, we will definitely be wrong. In my charts, I assumed he was in the Pacific time zone, and never tweeted from anywhere outside that time zone.
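In code, that assumption amounts to a single conversion. The sketch below assumes the raw timestamps are stored in UTC, which I have not verified for this dataset, and uses Python's standard `zoneinfo` module; the function name `to_pacific` and the sample timestamps are made up.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumption: every tweet was sent from the Pacific time zone.
PACIFIC = ZoneInfo("America/Los_Angeles")


def to_pacific(ts_utc):
    """Convert a timezone-aware UTC timestamp to Pacific local time."""
    return ts_utc.astimezone(PACIFIC)


# Hypothetical timestamps, one in summer (daylight time), one in winter.
summer = to_pacific(datetime(2022, 7, 4, 6, 30, tzinfo=timezone.utc))
winter = to_pacific(datetime(2022, 1, 10, 6, 30, tzinfo=timezone.utc))
```

Note that the same UTC clock time lands on a different local hour in summer and winter because of daylight saving time, which is why a fixed offset would not do.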
(Food for thought: the server that posts tweets certainly had the record of the time and time zone for each tweet. Typically, databases store these time stamps standardized to one time zone - call it Greenwich Mean Time. If you have all time stamps expressed in GMT, is it now possible to make a statement about midnight tweeting? Does standardizing to one time zone solve this problem?)
In addition, I suspect that there may be problems with the function used to compute those rolling sums and averages, so take the actual numbers on those sketches with a grain of salt. Specifically, it's hard to tell on any of these charts, but Musk did not tweet every single day, so there are lots of holes in the time series.
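One way the holes could cause trouble: if a rolling window counts rows rather than calendar days, and days without tweets have no row, then a "seven-row" window may span far more than a week. A possible remedy, sketched here in Python with made-up dates and a hypothetical helper name, is to fill in the missing days with zero counts before computing any rolling statistic. I have not confirmed that this is what went wrong in my charts.

```python
from collections import Counter
from datetime import date, timedelta


def daily_counts_with_zeros(tweet_dates):
    """Return (day, count) pairs for every calendar day from the first
    to the last tweet date, using 0 for days with no tweets."""
    counts = Counter(tweet_dates)
    start, end = min(counts), max(counts)
    n_days = (end - start).days + 1
    days = [start + timedelta(days=i) for i in range(n_days)]
    # A Counter returns 0 for keys it has never seen.
    return [(day, counts[day]) for day in days]


# Hypothetical tweet dates with a two-day hole (Jan 2 and Jan 3).
tweet_dates = [date(2022, 1, 1), date(2022, 1, 1), date(2022, 1, 4)]
filled = daily_counts_with_zeros(tweet_dates)
```

With the zeros in place, each row stands for exactly one day, so a row-based window and a time-based window agree.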