Why some dataviz fail

Maxim Lisnic's recent post should delight my readers (link). Thanks Alek for the tip. Maxim argues that charts "deceive" not merely by using visual tricks but by a variety of other non-visual means.

This is also the reasoning behind my Trifecta Checkup framework which looks at a data visualization project holistically. There are lots of charts that are well designed and constructed but fail for other reasons. So I am in agreement with Maxim.

He analyzed "10,000 Twitter posts with data visualizations about COVID-19", and found that 84% are "misleading" while only 11% of the 84% "violate common design guidelines". I presume he created some kind of computer program to evaluate these 10,000 charts, and he compiled some fixed set of guidelines that are regarded as "common" practice.


Let's review Maxim's examples in the context of the Trifecta Checkup.


The first chart shows Covid cases in the U.S. in July and August of 2021 (presumably the time when the chart was published) compared to a year ago (prior to the vaccination campaign).


Maxim calls this cherry-picking. He's right - and this is a pet peeve of mine, even with all the peer-reviewed scientific research. In my paper on problems with observational studies (link), my coauthors and I call for a new way forward: researchers should put their model calculations up on a website which is updated as new data arrive, so that we can be sure that the conclusions they published apply generally to all periods of time, not just the time window chosen for the publication.

Looking at the pair of line charts, readers can quickly discover its purpose, so it does well on the Q(uestion) corner of the Trifecta. The cherry-picking relates to the link between the Question and the Data, showing that this chart suffers from subpar analysis.

In addition, I find that the chart also misleads visually - the two vertical scales are completely different: the scale on the left chart spans about 60,000 cases while on the right, it's double the amount.

Thus, I'd call this a Type DV chart, offering opportunities to improve in two of the three corners.


The second chart cited by Maxim plots a time series of all-cause mortality rates (per 100,000 people) from 1999 to 2020 as columns.

The designer does a good job drawing our attention to one part of the data - that the average increase in all-cause mortality rate in 2020 over the previous five years was 15%. I also like the use of a different color for the pandemic year.

Then, the designer lost the plot. Instead of drawing a conclusion based on the highlighted part of the data, s/he pushed a story that the 2020 rate was about the same as the 2003 rate. If that was the main message, then instead of computing a 15% increase relative to the past five years, s/he should have shown how the 2003 and 2020 levels are the same!

On a closer look, there is a dashed teal line on the chart but the red line and text completely dominate our attention.

This chart is also Type DV. The intention of the designer is clear: the question is to put the jump in all-cause mortality rate in a historical context. The problem lies again with subpar analysis. In fact, if we take the two insights from the data, they both show how serious a problem Covid was at the time.

When the rate returned to the level of 2003, we have effectively gave up all the gains made over 17 years in a few months.

Besides, a jump in 15% from year to year is highly significant if we look at all other year-to-year changes shown on the chart.


The next section concerns a common misuse of charts to suggest causality when the data could only indicate correlation (and where the causal interpretation appears to be dubious). I may write a separate post about this vast topic in the future. Today, I just want to point out that this problem is acute with any Covid-19 research, including official ones.


I find the fourth section of Maxim's post to be less convincing. In the following example, the tweet includes two charts, one showing proportion of people vaccinated, and the other showing the case rate, in Iceland and Nigeria.


This data visualization is poor even on the V(isual) corner. The first chart includes lots of countries that are irrelevant to the comparison. It includes the unnecessary detail of fully versus partially vaccinated, unnecessary because the two countries selected are at two ends of the scale. The color coding is off sync between the two charts.

Maxim's critique is:

The user fails to account, however, for the fact that Iceland had a much higher testing rate—roughly 200 times as high at the time of posting—making it unreasonable to compare the two countries.

And the section is titled "Issues with Data Validity". It's really not that simple.

First, while the differential testing rate is one factor that should be considered, this factor alone does not account for the entire gap. Second, this issue alone does not disqualify the data. Third, if testing rate differences should be used to invalidate this set of data, then all of the analyses put out by official sources lauding the success of vaccination should also be thrown out since there are vast differences in testing rates across all countries (and also across different time periods for the same country).

One typical workaround for differential testing rate is to look at deaths rather than cases. For the period of time plotted on the case curve, Nigeria's cumulative death per million is about 1/8th that of Iceland. The real problem is again in the Data analysis, and it is about how to interpret this data casually.

This example is yet another Type DV chart. I'd classify it under problems with "Casual Inference". "Data Validity" is definitely a real concern; I just don't find this example convincing.


The next section, titled "Failure to account for statistical nuance," is a strange one. The example is a chart that the CDC puts out showing the emergence of cases in a specific county, with cases classified by vaccination status. The chart shows that the vast majority of cases were found in people who were fully vaccinated. The person who tweeted concluded that vaccinated people are the "superspreaders". Maxim's objection to this interpretation is that most cases are in the fully vaccinated because most people are fully vaccinated.

I don't think it's right to criticize the original tweeter in this case. If by superspreader, we mean people who are infected and out there spreading the virus to others through contacts, then what the data say is exactly that most such people are fully vaccinated. In fact, one should be very surprised if the opposite were true.

Indeed, this insight has major public health implications. If the vaccine is indeed 90% effective at stopping cases, we should not be seeing that level of cases. And if the vaccine is only moderately effective, then we may not be able to achieve "herd immunity" status, as was the plan originally.

I'd be reluctant to complain about this specific data visualization. It seems that the data allow different interpretations - some of which are contradictory but all of which are needed to draw a measured conclusion.

The last section on "misrepresentation of scientific results" could use a better example. I certainly agree with the message: that people have confirmation bias. I have been calling this "story-first thinking": people with a set story visualize only the data that support their preconception.

However, the example given is not that. The example shows a tweet that contains a chart from a scientific paper that apparently concludes that hydroxychloroquine helps treat Covid-19. Maxim adds this study was subsequently retracted. If the tweet was sent prior to the retraction, then I don't think we can grumble about someone citing a peer reviewed study published in Lancet.


Overall, I like Maxim's message. In some cases, I think there are better examples.



Visual story-telling: do you know or do you think?

One of the most important data questions of all time is: do you know? or do you think?

And one of the easiest traps to fall into is: I think, therefore I know.


Visual story-telling can be great but it can also mislead. Deception sometimes happens when readers are nudged to "fill in the blanks" with stuff they think they know, but they don't.

A Twitter reader asked me to look at the map in this Los Angeles Times (paywall) opinion column.


The column promptly announces its premise:

Years of widening economic inequality, compounded by the pandemic and political storm and stress, have given Americans the impression that the country is on the wrong track. Now there’s empirical data to show just how far the country has run off the rails: Life expectancies have been falling.

The writer creates the expectation that he will reveal evidence in the form of data to show that life expectancies have been driven down by economic inequality, pandemic, and politics. Does he succeed?


The map portrays average life expectancy (at birth) for some mysterious, presumably very recent, year for every county in the United States. From the color legend, we learn that the bottom-to-top range is about 20 years. There is a clear spatial pattern, with the worst results in the south (excepting south Florida).

The choice of colors is telling. Red and blue on a U.S. map has heavy baggage, as they signify the two main political parties in the country. Given that the author believes politics to be a key driver of health outcomes, the usage of red and blue here is deliberate. Throughout the article, the columnist connects the lower life expectancies in southern states to its politics.

For example, he said "these geographical disparities aren't artifacts of pure geography or demographics; they're the consequences of policy decisions at the state level... Of the 20 states with the worst life expectancies, eight are among the 12 that have not implemented Medicaid expansion under the Affordable Care Act..."

Casual readers may fall into a trap here. There is nothing on the map itself that draws the connection between politics and life expectancies; the idea is evoked purely through the red-blue color scheme. So, as readers, we are filling in the blanks with our own politics.

What could have been done instead? Let's look at the life expectancy map side by side with the map of the U.S. 2020 Presidential election.


Because of how close recent elections have been, we may think the political map has a nice balance of red and blue but it isn't. The Democrats' votes are heavily concentrated in densely-populated cities so most of the Presidential election map is red. When placed next to each other, it's obvious that politics don't explain the variance in life expectancy well. The Midwest is deep red and yet they have above average life expectancies. I have circled out various regions that contradict the claim that Republican politics drove life expectancies down.

It's not sufficient to point to the South, in which Republican votes and life expectancy are indeed inversely correlated. A good theory has to explain most of the country.


The columnist also suggests that poverty is the cause of low life expectancy. That too cannot be gleaned from the published map. Again, readers are nudged to use their wild imagination to fill in the blank.

Data come to the rescue. Here is a side-by-side comparison of the map of life expectancies and the map of median incomes.


A similar conundrum. While the story feels right in the South, it fails to explain the northwest, Florida, and various other parts of the country. Take a look again at the circled areas. Lower income brackets are also sometimes associated with high life expectancies.


The author supplies a third cause of lower life expectancies: Covid-19 response. Because Covid-19 was the "most obvious and convenient" explanation for the loss of life expectancy during the pandemic, this theory suggests that the red areas on the life expectancy map should correspond to the regions most ravaged by Covid-19.

Let's see the data.


The map on the right shows the number of confirmed cases until June 2021. As before, the correlation holds somewhat in the South but there are notable exceptions, e.g. the Midwest. We also have states with low Covid-19 cases but below-average life expectancy.


What caused the decline of life expectancy in the U.S. - which began before the pandemic, and has continued beyond - is highly complex, beyond what a single map or a pair of maps or a few pairs of maps could convey. Showing a red-blue map presents a trap for readers to fall into, in which they start thinking, without knowing.


Parsons Student Projects

I had the pleasure of attending the final presentations of this year's graduates from Parsons's MS in Data Visualization program. You can see the projects here.


A few of the projects caught my eye.

A project called "Authentic Food in NYC" explores where to find "authentic" cuisine in New York restaurants. The project is notable for plowing through millions of Yelp reviews, and organizing the information within. Reviews mentioning "authentic" or "original" were extracted.

During the live presentation, the student clicked on Authentic Chinese, and the name that popped up was Nom Wah Tea Parlor, which serves dim sum in Chinatown that often has lines out the door.


Curiously, the ranking is created from raw counts of authentic reviews, which favors restaurants with more reviews, such as restaurants that have been operating for a longer time. It's unclear what rule is used to transfer authenticity from reviews to restaurants: does a single review mentioning "authentic" qualify a restaurant as "authentic", or some proportion of reviews?

Later, we see a visualization of the key words found inside "authentic" reviews for each cuisine. Below are words for Chinese and Italian cuisines:


These are word clouds with a twist. Instead of encoding the word counts in the font sizes, she places each word inside a bubble, and uses bubble sizes to indicate relative frequency.

Curiously, almost all the words displayed come from menu items. There isn't any subjective words to be found. Algorithms that extract keywords frequently fail in the sense that they surface the most obvious, uninteresting facts. Take the word cloud for Taiwanese restaurants as an example:


The overwhelming keyword found among reviews of Taiwanese restaurants is... "taiwanese". The next most important word is "taiwan". Among the remaining words, "886" is the name of a specific restaurant, "bento" is usually associated with Japanese cuisine, and everything else is a menu item.

Getting this right is time-consuming, and understandably not a requirement for a typical data visualization course.

The most interesting insight is found in this data table.


It appears that few reviewers care about authenticity when they go to French, Italian, and Japanese restaurants but the people who dine at various Asian restaurants, German restaurants, and Eastern European restaurants want "authentic" food. The student concludes: "since most Yelp reviewers are Americans, their pursuit of authenticity creates its own trap: Food authenticity becomes an americanized view of what non-American food is."

This hits home hard because I know what authentic dim sum is, and Nom Wah Tea Parlor it ain't. Let me check out what Yelpers are saying about Nom Wah:

  1. Everything was so authentic and delicious - and cheap!!!
  2. Your best bet is to go around the corner and find something more authentic.
  3. Their dumplings are amazing everything is very authentic and tasty!
  4. The food was delicious and so authentic, and the staff were helpful and efficient.
  5. Overall, this place has good authentic dim sum but it could be better.
  6. Not an authentic experience at all.
  7. this dim sum establishment is totally authentic
  8. The onions, bean sprouts and scallion did taste very authentic and appreciated that.
  9. I would skip this and try another spot less hyped and more authentic.
  10. I would have to take my parents here the next time I visit NYC because this is authentic dim sum.

These are the most recent ten reviews containing the word "authentic". Seven out of ten really do mean authentic, the other three are false friends. Text mining is tough business! The student removed "not authentic" which helps. As seen from above, "more authentic" may be negative, and there may be words between "not" and "authentic". Also, think "not inauthentic", "people say it's authentic, and it's not", etc.

One thing I learned from this project is that "authentic" may be a synonym for "I like it" when these diners enjoy the food at an ethnic restaurant. I'm most curious about what inauthentic onions, bean sprouts and scallion taste like.

I love the concept and execution of this project. Nice job!


Another project I like is about tourism in Venezuela. The back story is significant. Since a dictatorship took over the country, the government stopped reporting tourism statistics. It's known that tourism collapsed, and that it may be gradually coming back in recent years.

This student does not have access to ready-made datasets. But she imaginatively found data to pursue this story. Specifically, she mentioned grabbing flight schedules into the country from the outside.

The flow chart is a great way to explore this data:


A map gives a different perspective:


I'm glad to hear the student recite some of the limitations of the data. It's easy to look at these visuals and assume that the data are entirely reliable. They aren't. We don't know that what proportion of the people traveling on those flights are tourists, how full those planes are, or the nationalities of those on board. The fact that a flight originated from Panama does not mean that everyone on board is Panamanian.


The third project is interesting in its uniqueness. This student wants to highlight the effect of lead in paint on children's health. She used the weight of lead marbles to symbolize the impact of lead paint. She made a dress with two big pockets to hold these marbles.

Scherer_parsons_dataviz_leaddress sm

It's not your standard visualization. One can quibble that dividing the marbles into two pockets doesn't serve a visualziation purpose, and so on. But at the end, it's a memorable performance.

Finding the story in complex datasets

In CT Mirror's feature about Connecticut, which I wrote about in the previous post, there is one graphic that did not rise to the same level as the others.


This section deals with graduation rates of the state's high school districts. The above chart focuses on exactly five districts. The line charts are organized in a stack. No year labels are provided. The time window is 11 years from 2010 to 2021. The column of numbers show the difference in graduation rates over the entire time window.

The five lines look basically the same, if we ignore what looks to be noisy year-to-year fluctuations. This is due to the weird aspect ratio imposed by stacking.

Why are those five districts chosen? Upon investigation, we learn that these are the five districts with the biggest improvement in graduation rates during the 11-year time window.

The same five schools also had some of the lowest graduation rates at the start of the analysis window (2010). This must be so because if a school graduated 90% of its class in 2010, it would be mathematically impossible for it to attain a 35% percent point improvement! This is a dissatisfactory feature of the dataviz.


In preparing an alternative version, I start by imagining how readers might want to utilize a visualization of this dataset. I assume that the readers may have certain school(s) they are particularly invested in, and want to see its/their graduation performance over these 11 years.

How does having the entire dataset help? For one thing, it provides context. What kind of context is relevant? As discussed above, it's futile to compare a school at the top of the ranking to one that is near the bottom. So I created groups of schools. Each school is compared to other schools that had comparable graduation rates at the start of the analysis period.

Amistad School District, which takes pole position in the original dataviz, graduated only 58% of its pupils in 2010 but vastly improved its graduation rate by 35% over the decade. In the chart below (left panel), I plotted all of the schools that had graduation rates between 50 and 74% in 2010. The chart shows that while Amistad is a standout, almost all schools in this group experienced steady improvements. (Whether this phenomenon represents true improvement, or just grade inflation, we can't tell from this dataset alone.)


The right panel shows the group of schools with the next higher level of graduation rates in 2010. This group of schools too increased their graduation rates almost always. The rate of improvement in this group is lower than in the previous group of schools.

The next set of charts show school districts that already achieved excellent graduation rates (over 85%) by 2010. The most interesting group of schools consists of those with 85-89% rates in 2010. Their performance in 2021 is the most unpredictable of all the school groups. The majority of districts did even better while others regressed.


Overall, there is less variability than I'd expect in the top two school groups. They generally appeared to have been able to raise or maintain their already-high graduation rates. (Note that the scale of each chart is different, and many of the lines in the second set of charts are moving within a few percentages.)

One more note about the charts: The trend lines are "smoothed" to focus on the trends rather than the year to year variability. Because of smoothing, there is some awkward-looking imprecision e.g. the end-to-end differences read from the curves versus the observed differences in the data. These discrepancies can easily be fixed if these charts were to be published.

Thoughts on Daniel's fix for dual-axes charts

I've taken a little time to ponder Daniel Z's proposed "fix" for dual-axes charts (link). The example he used is this:


In that long post, Daniel explained why he preferred to mix a line with columns, rather than using the more common dual lines construction: to prevent readers from falsely attributing meaning to crisscrossing lines. There are many issues with dual-axes charts, which I won't repeat in this post; one of their most dissatisfying features is the lack of connection between the two vertical scales, and thus, it's pretty easy to manufacture an image of correlation when it doesn't exist. As shown in this old post, one can expand or restrict one of the vertical axes and shift the line up and down to "match" the other vertical axis.

Daniel's proposed fix retains the dual axes, and he even restores the dual lines construction.


How is this chart different from the typical dual-axes chart, like the first graph in this post?

Recall that the problem with using two axes is that the designer could squeeze, expand or shift one of the axes in any number of ways to manufacture many realities. What Daniel effectively did here is selecting one specific way to transform the "New Customers" axis (shown in gray).

His idea is to run a simple linear regression between the two time series. Think of fitting a "trendline" in Excel between Revenues and New Customers. Then, use the resulting regression equation to compute an "estimated" revenues based on the New Customers series. The coefficients of this regression equation then determines the degree of squeezing/expansion and shifting applied to the New Customers axis.

The main advantage of this "fix" is to eliminate the freedom to manufacture multiple realities. There is exactly one way to transform the New Customers axis.

The chart itself takes a bit of time to get used to. The actual values plotted in the gray line are "estimated revenues" from the regression model, thus the blue axis values on the left apply to the gray line as well. The gray axis shows the respective customer values. Because we performed a linear fit, each value of estimated revenues correspond to a particular customer value. The gray line is thus a squeezed/expanded/shifted replica of the New Customers line (shown in orange in the first graph). The gray line can then be interpreted on two connected scales, and both the blue and gray labels are relevant.


What are we staring at?

The blue line shows the observed revenues while the gray line displays the estimated revenues (predicted by the regression line). Thus, the vertical gaps between the two lines are the "residuals" of the regression model, i.e. the estimation errors. If you have studied Statistics 101, you may remember that the residuals are the components that make up the R-squared, which measures the quality of fit of the regression model. R-squared is the square of r, which stands for the correlation between Customers and the observed revenues. Thus the higher the (linear) correlation between the two time series, the higher the R-squared, the better the regression fit, the smaller the gaps between the two lines.


There is some value to this chart, although it'd be challenging to explain to someone who has not taken Statistics 101.

While I like that this linear regression approach is "principled", I wonder why this transformation should be preferred to all others. I don't have an answer to this question yet.


Daniel's fix reminds me of a different, but very common, chart.


This chart shows actual vs forecasted inflation rates. This chart has two lines but only needs one axis since both lines represent inflation rates in the same range.

We can think of the "estimated revenues" line above as forecasted or expected revenues, based on the actual number of new customers. In particular, this forecast is based on a specific model: one that assumes that revenues is linearly related to the number of new customers. The "residuals" are forecasting errors.

In this sense, I think Daniel's solution amounts to rephrasing the question of the chart from "how closely are revenues and new customers correlated?" to "given the trend in new customers, are we over- or under-performing on revenues?"

Instead of using the dual-axes chart with two different scales, I'd prefer to answer the question by showing this expected vs actual revenues chart with one scale.

This does not eliminate the question about the "principle" behind the estimated revenues, but it makes clear that the challenge is to justify why revenues is a linear function of new customers, and no other variables.

Unlike the dual-axes chart, the actual vs forecasted chart is independent of the forecasting method. One can produce forecasted revenues based on a complicated function of new customers, existing customers, and any other factors. A different model just changes the shape of the forecasted revenues line. We still have two comparable lines on one scale.






Getting simple charts right

Ian K. submitted this chart on Twitter:


The chart comes from a video embedded in this report (link) about Chicago cops leaving their jobs.

Let's start with the basics. This is an example of a simple line chart illustrating a time series of five observations. The vertical axis starts at 10,000 instead of 0. With this choice, the designer wants to focus on the point-to-point change in values, rather than its relation to the initial value.

Every graph has add-ons that assist cognition. On this chart, we have axis labels, gridlines and data labels. Every add-on increases reading time so we should be sparing.

First consider the gridlines. In the following chart, I conduct a self-sufficiency test by removing the data labels from the chart:


You can see that the last three values present no problems. The first two, especially the first value, are hard to read - because the top gridline is missing! The next chart restores the bounding gridline, so you can see the difference that one small detail can make:



Next, let's compare the following versions of the chart. The left one contains data labels without gridlines and axis labels. The right one has the gridlines and axis labels but no data labels.


The left chart prints the entire dataset onto the chart. The reader in essence is reading the raw data. That appears to be the intention of the chart designer as the data labels are in large size, placed inside shiny white boxes. The level of the boxes determines the reader's perception as those catch more of our attention than the dots that actually represent the data.

The right chart highlights the dots and the lines between them. The gridlines are way too thick and heavy so as to distract rather than abet. This chart presumes that the reader isn't that interested in the precise numbers as she is in the trend.


As Ian pointed out, one of the biggest problems with this chart is the appearance of even time intervals when all except one of the date values are January. This seemingly innocent detail destroys the chart. The line segments of the chart encodes the pre-post change in the staffing numbers. For most of the line segments, the metric is year-on-year change but the last two line segments on the right show something else: a 19-month change, followed by a 5-month change.

I did the following analysis to understand how big of a staffing problem CPD faces.

First I restored the January 2022 time value, while shifting the Aug 2022 value to its rightful place on the time axis. Next, I added the dashed brown line, which represents a linear extension of the trend seen between January 2020-2021, before the sudden dip. We don't know what the true January 2022 value is but the projected value based on past trend is around 12,200. By August, the projected value is around 11,923, about 300 above the actual value of 11,611. By January 2023, the projected value is almost exactly the same as the actual value.

This linear trending analysis is likely too simplistic but it offers a baseline to start thinking about what the story is. The long-term trend is still down but the apparent dip in 2022 may not be meaningful.



Longest life, shortest length

Racetrack charts refuse to die. For old time's sake, here is a blog post from 2005 in which I explain why they don't make good dataviz.

Our latest example comes from Visual Capitalist (link), which publishes a fair share of nice dataviz. In this infographics, they feature a racetrack chart, just because the topic is the lifespan of cars.


The whole infographic has four parts, each a racetrack chart. I'll focus on the first racetrack chart (shown above), which deals with the product category of sedans and hatchbacks.

The first thing I noticed is the reference value of 100,000 miles, which is described as the expected lifespan of a typical car made in the 1970s. This is of dubious value since the top of the page informs us the current relevant reference value is 200,000 miles, which is unlabeled. We surmise that 200,000 miles is indicated by the end of the grey sections of the racetrack. (This is eventually confirmed in the next racettrack chart for SUVs in the second sectiotn of the infographic.)

Now let's zoom in on the brown section of the track. Each of the four sections illustrates the same datum = 100,000 miles and yet they exhibit different lengths. From this, we learn that the data are not encoded in the lengths of these tracks -- but rather the data are to be found in the angle sustained at the centre of the concentric circles. The problem with racetrack charts is that readers are drawn to the lengths of the tracks rather than the angles at the center, which are not explicitly represented.

The Avalon model has the longest life span on this chart, and yet it is shown as the shortest curve.


The most baffling part of this chart is not the visual but the analysis methodology.

I quote:

iSeeCars analyzed over 2M used cars on the road between Jan. and Oct. 2022. Rankings are based on the mileage that the top 1% of cars within each model obtained.

According to this blurb, the 245,710 miles number for Avalon is the average mileage found in the top 1% of Avalons within the iSeeCars sample of 2M used cars.

The word "lifespan" strikes me as incorporating a date of death, and yet nothing in the above text indicates that any of the sampled cars are at end of life. The cars they really need are not found in their sample at all.

I suppose taking the top 1% is meant to exclude younger cars but why 1%? Also, this sample completely misses the cars that prematurely died, e.g. the cars that failed after 100,000 miles but before 200,000 miles. This filtering also ensures that newer models are excluded from the sample.

_trifectacheckup_imageIn the Trifecta Checkup, this qualifies as Type DV. The dataset does not answer the question of concern while the visual form distorts the data.

The blue mist

The New York Times printed several charts about Twitter "blue checks," and they aren't one of their best efforts (link).

Blue checks used to be credentials given to legitimate accounts, typically associated with media outlets, celebrities, brands, professors, etc. They are free but must be approved by Twitter. Since Elon Musk acquired Twitter, he turned blue checks into a revenue generator. Yet another subscription service (but you're buying "freedom"!). Anyone can get a blue check for US$8 per month.

[The charts shown here are scanned from the printed edition.]


The first chart is a scatter plot showing the day of joining Twitter and the total number of followers the account has as of early November, 2022. Those are very strange things to pair up on a scatter plot but I get it: the designer could only work with the data that can be pulled down from Twitter's API.

What's wrong with the data? It would seem the interesting question is whether blue checks are associated with number of followers. The chart shows only Twitter Blue users so there is nothing to compare to. The day of joining Twitter is not the day of becoming "Twitter Blue", almost surely not for any user (Nevetheless, the former is not a standard data element released by Twitter). The chart has a built-in time bias since the longer an account exists, one would assume the higher the number of followers (assuming all else equal). Some kind of follower rate (e.g. number of followers per year of existence) might be more informative.

Still, it's hard to know what the chart is saying. That most Blue accounts have fewer than 5,000 followers? I also suspect that they chopped off the top of the chart (outliers) and forgot to mention it. Surely, some of the celebrity accounts have way over 150,000 followers. Another sign that the top of the chart was removed is that an expected funnel effect is not seen. Given the follower count is cumulative from the day of registration, we'd expect the accounts that started in the last few months should have markedly lower counts than those created years ago. (This is even more true if there is a survivorship bias - less successful accounts are more likely to be deleted over time.)

The designer arbitrarily labelled six specific accounts ("Crypto influencer", "HBO fan", etc.) but this feature risks sending readers the wrong message. There might be one HBO fan account that quickly grew to 150,000 followers in just a few months but does the data label suggest to readers that HBO fan accounts as a group tend to quickly attain high number of followers?


The second chart, which is an inset of the first, attempts to quantify the effect of the Musk acquisition on the number of "registrations and subscriptions". In the first chart, the story was described as "Elon Musk buys Twitter sparking waves of new users who later sign up for Twitter Blue".


The second chart confuses me. I was trying to figure out what is counted in the vertical axis. This was before I noticed the inset in the first chart, easy to miss as it is tucked into the lower right corner. I had presumed that the axis would be the same as in the first chart since there weren't any specific labels. In that case, I am looking at accounts with 0 to 500 followers, pretty inconsequential accounts. Then, the chart title uses the words "registrations and subscriptions." If the blue dots on this chart also refer to blue-check accounts as in the first chart, then I fail to see how this chart conveys any information about registrations (wbich presumably would include free accounts). As before, new accounts that aren't blue checks won't appear.

Further, to the extent that this chart shows a surge in subscriptions, we are restricted to accounts with fewer than 500 followers, and it's really unclear what proportion of total subscribers is depicted. Nor is it possible to estimate the magnitude of this surge.

Besides, I'm seeing similar densities of the dots across the entire time window between October 2021 and 2022. Perhaps the entire surge is hidden behind the black lines indicating the specific days when Musk announced and completed the acquisition, respectively. If the surge is hiding behind the black vertical lines, then this design manages to block the precise spots readers are supposed to notice.

Here is where we can use the self-sufficiency test. Imagine the same chart without the text. What story would you have learned from the graphical elements themselves? Not much, in my view.


The third chart isn't more insightful. This chart purportedly shows suspended accounts, only among blue-check accounts.


From what I could gather (and what I know about Twitter's API), the chart shows any Twitter Blue account that got suspended at any time. For example, all the black open circles occurring prior to October 27, 2022 represent suspensions by the previous management, and presumably have nothing to do with Elon Musk, or his decision to turn blue checks into a subscription product.

There appears to be a cluster of suspensions since Musk took over. I am not sure what that means. Certainly, it says he's not about "total freedom". Most of these suspended accounts have fewer than 50 followers, and only been around for a few weeks. And as before, I'm not sure why the analyst decided to focus on accounts with fewer than 500 followers.

What could have been? Given the number of suspended accounts are relatively small, an interesting analysis would be to form clusters of suspended accounts, and report on the change in what types of accounts got suspended before and after the change of management.


The online article (link) is longer, filling in some details missing from the printed edition.

There is one view that shows the larger accounts:


While more complete, this view isn't very helpful as the biggest accounts are located in the sparsest area of the chart. The data labels again pick out strange accounts like those of adult film stars and an Arabic news site. It's not clear if the designer is trying to tell us that most of Twitter Blue accounts belong to those categories.

See here for commentary on other New York Times graphics.





Energy efficiency deserves visual efficiency

Long-time contributor Aleksander B. found a good one, in the World Energy Outlook Report, published by IEA (International Energy Agency).


The use of balloons is unusual, although after five minutes, I decided I must do some research to have any hope of understanding this data visualization.

A lot is going on. Below, I trace my own journey through this chart.

The text on the top left explains that the chart concerns emissions and temperature change. The first set of balloons (the grey ones) includes helpful annotations. The left-right position of the balloons indicates time points, in 10-year intervals except for the first.

The trapezoid that sits below the four balloons is more mysterious. It's labelled "median temperature rise in 2100". I debate two possibilities: (a) this trapezoid may serve as the fifth balloon, extending the time series from 2050 to 2100. This interpretation raises a couple of questions: why does the symbol change from balloon to trapezoid? why is the left-right time scale broken? (b) this trapezoid may represent something unrelated to the balloons. This interpretation also raises questions: its position on the horizontal axis still breaks the time series; and  if the new variable is "median temperature rise", then what determines its location on the chart?

That last question is answered if I move my glance all the way to the right edge of the chart where there are vertical axis labels. This axis is untitled but the labels shown in degree Celsius units are appropriate for "median temperature rise".

Turning to the balloons, I wonder what the scale is for the encoded emissions data. This is also puzzling because only a few balloons wear data labels, and a scale is nowhere to be found.


The gridlines suggests that the vertical location of the balloons is meaningful. Tracing those gridlines to the right edge leads me back to the Celsius scale, which seems unrelated to emissions. The amount of emissions is probably encoded in the sizes of the balloons although none of these four balloons have any data labels so I'm rather flustered. My attention shifts to the colored balloons, a few of which are labelled. This confirms that the size of the balloons indeed measures the amount of emissions. Nevertheless, it is still impossible to gauge the change in emissions for the 10-year periods.

The colored balloons rising above, way above, the gridlines is an indication that the gridlines may lack a relationship with the balloons. But in some charts, the designer may deliberately use this device to draw attention to outlier values.

Next, I attempt to divine the informational content of the balloon strings. Presumably, the chart is concerned with drawing the correlation between emissions and temperature rise. Here I'm also stumped.

I start to look at the colored balloons. I've figured out that the amount of emissions is shown by the balloon size but I am still unclear about the elevation of the balloons. The vertical locations of these balloons change over time, hinting that they are data-driven. Yet, there is no axis, gridline, or data label that provides a key to its meaning.

Now I focus my attention on the trapezoids. I notice the labels "NZE", "APS", etc. The red section says "Pre-Paris Agreement" which would indicate these sections denote periods of time. However, I also understand the left-right positions of same-color balloons to indicate time progression. I'm completely lost. Understanding these labels is crucial to understanding the color scheme. Clearly, I have to read the report itself to decipher these acronyms.

The research reveals that NZE means "net zero emissions", which is a forecasting scenario - an utterly unrealistic one - in which every country is assumed to fulfil fully its obligations, a sort of best-case scenario but an unattainable optimum. APS and STEPS embed different assumptions about the level of effort countries would spend on reducing emissions and tackling global warming.

At this stage, I come upon another discovery. The grey section is missing any acronym labels. It's actually the legend of the chart. The balloon sizes, elevations, and left-right positions in the grey section are all arbitrary, and do not represent any real data! Surprisingly, this legend does not contain any numbers so it does not satisfy one of the traditional functions of a legend, which is to provide a scale.

There is still one final itch. Take a look at the green section:


What is this, hmm, caret symbol? It's labeled "Net Zero". Based on what I have been able to learn so far, I associate "net zero" to no "emissions" (this suggests they are talking about net emissions not gross emissions). For some reason, I also want to associate it with zero temperature rise. But this is not to be. The "net zero" line pins the balloon strings to a level of roughly 2.5 Celsius rise in temperature.

Wait, that's a misreading of the chart because the projected net temperature increase is found inside the trapezoid, meaning at "net zero", the scientists expect an increase in 1.5 degrees Celsius. If I accept this, I come face to face with the problem raised above: what is the meaning of the vertical positioning of the balloons? There must be a reason why the balloon strings are pinned at 2.5 degrees. I just have no idea why.

I'm also stealthily presuming that the top and bottom edges of the trapezoids represent confidence intervals around the median temperature rise values. The height of each trapezoid appears identical so I'm not sure.

I have just learned something else about this chart. The green "caret" must have been conceived as a fully deflated balloon since it represents the value zero. Its existence exposes two limitations imposed by the chosen visual design. Bubbles/circles should not be used when the value of zero holds significance. Besides, the use of balloon strings to indicate four discrete time points breaks down when there is a scenario which involves only three buoyant balloons.


The underlying dataset has five values (four emissions, one temperature rise) for four forecasting scenarios. It's taken a lot more time to explain the data visualization than to just show readers those 20 numbers. That's not good!

I'm sure the designer did not set out to confuse. I think what happened might be that the design wasn't shown to potential readers for feedback. Perhaps they were shown only to insiders who bring their domain knowledge. Insiders most likely would not have as much difficulty with reading this chart as did I.

This is an important lesson for using data visualization as a means of communications to the public. It's easy for specialists to assume knowledge that readers won't have.

For the IEA chart, here is a list of things not found explicitly on the chart that readers have to know in order to understand it.

  • Readers have to know about the various forecasting scenarios, and their acronyms (APS, NZE, etc.). This allows them to interpret the colors and section titles on the chart, and to decide whether the grey section is missing a scenario label, or is a legend.
  • Since the legend does not contain any scale information, neither for the balloon sizes nor for the temperatures, readers have to figure out the scales on their own. For temperature, they first learn from the legend that the temperature rise information is encoded in the trapezoid, then find the vertical axis on the right edge, notice that this axis has degree Celsius units, and recognize that the Celsius scale is appropriate for measuring median temperature rise.
  • For the balloon size scale, readers must resist the distracting gridlines around the grey balloons in the legend, notice the several data labels attached to the colored balloons, and accept that the designer has opted not to provide a proper size scale.

Finally, I still have several unresolved questions:

  • The horizontal axis may have no meaning at all, or it may only have meaning for emissions data but not for temperature
  • The vertical positioning of balloons probably has significance, or maybe it doesn't
  • The height of the trapezoids probably has significance, or maybe it doesn't



Finding the right context to interpret household energy data

Bloomberg_energybillBloomberg's recent article on surging UK household energy costs, projected over this winter, contains data about which I have long been intrigued: how much energy does different household items consume?

A twitter follower alerted me to this chart, and she found it informative.

If the goal is to pick out the appliances and estimate the cost of running them, the chart serves its purpose. Because the entire set of data is printed, a data table would have done equally well.

I learned that the mobile phone costs almost nothing to charge: 1 pence for six hours of charging, which is deemed a "single use" which seems double what a full charge requires. The games console costs 14 pence for a "single use" of two hours. That might be an underestimate of how much time gamers spend gaming each day.


Understanding the design of the chart needs a bit more effort. Each appliance is measured by two metrics: the number of hours considered to be "single use", and a currency value.

It took me a while to figure out how to interpret these currency values. Each cost is associated with a single use, and the duration of a single use increases as we move down the list of appliances. Since the designer assumes a fixed cost of electicity (shown in the footnote as 34p per kWh), at first, it seems like the costs should just increase from top to bottom. That's not the case, though.

Something else is driving these numbers behind the scene, namely, the intensity of energy use by appliance. The wifi router listed at the bottom is turned on 24 hours a day, and the daily cost of running it is just 6p. Meanwhile, running the fridge and freezer the whole day costs 41p. Thus, the fridge&freezer consumes electricity at a rate that is almost 7 times higher than the router.

The chart uses a split axis, which artificially reduces the gap between 8 hours and 24 hours. Here is another look at the bottom of the chart:



Let's examine the choice of "single use" as a common basis for comparing appliances. Consider this:

  • Continuous appliances (wifi router, refrigerator, etc.) are denoted as 24 hours, so a daily time window is also implied
  • Repeated-use appliances (e.g. coffee maker, kettle) may be run multiple times a day
  • Infrequent use appliances may be used less than once a day

I prefer standardizing to a "per day" metric. If I use the microwave three times a day, the daily cost is 3 x 3p = 9 p, which is more than I'd spend on the wifi router, run 24 hours. On the other hand, I use the washing machine once a week, so the frequency is 1/7, and the effective daily cost is 1/7 x 36 p = 5p, notably lower than using the microwave.

The choice of metric has key implications on the appearance of the chart. The bubble size encodes the relative energy costs. The biggest bubbles are in the heating category, which is no surprise. The next largest bubbles are tumble dryer, dishwasher, and electric oven. These are generally not used every day so the "per day" calculation would push them lower in rank.


Another noteworthy feature of the Bloomberg chart is the split legend. The colors divide appliances into five groups based on usage category (e.g. cleaning, food, utility). Instead of the usual color legend printed on a corner or side of the chart, the designer spreads the category labels around the chart. Each label is shown the first time a specific usage category appears on the chart. There is a presumption that the reader scans from top to bottom, which is probably true on average.

I like this arrangement as it delivers information to the reader when it's needed.