Swarmed by ants
Apr 03, 2025
Andrew discussed the following chart in a recent blog post:
Alert! A swarm of ants has marched onto a bubble chart.
The overlapping long text labels dominate the chart. Their length encodes the length of the country names, which has nothing to do with the data.
We're waiting - hoping - for the ants to march off the page.
***
Andrew's blog post is about something else: the use of log scales. The chart above is a log-log plot; both axes use log scales.
Andrew's correspondent doesn't like log scales. Andrew does.
One problem we encounter in practice with log scales is that people without a science background can't read them. Andrew's correspondent said as much, while also misinterpreting the log-log chart. He says the log-log chart "visually creates a much stronger correlation than there actually is".
But that's not what happened. It's more appropriate to say that the log transformations allow us to see the correlation that exists. The correlation is not linear, which is why the usual scatter plot does not reveal it.
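A minimal Python sketch of that point, assuming noise-free synthetic data that follow the power law used later in this post (Y = 0.003 * X^2.5, with X spread from 10 to 100,000). These values are invented for illustration and are not the data in Andrew's chart. Pearson's correlation on the raw values falls short of 1 only because the relationship is curved; on the logged values the points are exactly linear. Spearman's rank correlation, which is unaffected by any increasing transformation such as a log, comes out the same before and after, so the log scale has not manufactured any association.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: measures the strength of a *linear* association."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank positions 1..n (assumes no ties, which holds for this synthetic data)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical data: X from 10 to 100,000, evenly spaced on a log scale.
xs = [10 ** (1 + 4 * k / 20) for k in range(21)]
ys = [0.003 * x ** 2.5 for x in xs]
log_xs = [math.log10(x) for x in xs]
log_ys = [math.log10(y) for y in ys]

r_raw = pearson(xs, ys)
r_log = pearson(log_xs, log_ys)
print(f"Pearson, raw scales:  {r_raw:.3f}")
print(f"Pearson, log scales:  {r_log:.3f}")
print(f"Spearman, raw scales: {spearman(xs, ys):.3f}")
print(f"Spearman, log scales: {spearman(log_xs, log_ys):.3f}")
```

Real data would add scatter around the curve, but the logic is the same: the log transform straightens the pattern rather than strengthening it.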
Nevertheless, I agree with the correspondent on avoiding log scales in data displays because most readers don't understand them.
***
Consider the following pair of plots.
The underlying data follow the pattern Y = 0.003 * X^2.5, but for what we're talking about, the specific pattern doesn't matter so long as X and Y have a "power" relationship.
The left plot directly shows the relationship between X and Y using regular scales. Readers see that Y is running away from X. The slope of the line increases as X increases, so Y grows faster than X. The relationship is curved, and a curve is hard to describe succinctly in words.
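To put numbers on "running away", here is a small sketch assuming the same curve, Y = 0.003 * X^2.5 (the helper names `y_of` and `slope` are hypothetical). The slope of the curve is its derivative, dY/dX = 2.5 * 0.003 * X^1.5 = 0.0075 * X^1.5, which itself grows with X. The sketch also shows that doubling X always multiplies Y by the same factor, 2^2.5 ≈ 5.66, a compact way of stating a power relationship.

```python
def y_of(x):
    """Hypothetical power law from this post's example: Y = 0.003 * X^2.5."""
    return 0.003 * x ** 2.5

def slope(x):
    """Local slope of the curve, dY/dX = 2.5 * 0.003 * X^1.5 = 0.0075 * X^1.5."""
    return 0.0075 * x ** 1.5

for x in [100, 200, 400, 800]:
    # Doubling X always multiplies Y by the same factor, 2**2.5 (about 5.66).
    ratio = y_of(2 * x) / y_of(x)
    print(f"X={x:4d}  Y={y_of(x):12,.1f}  slope={slope(x):8.2f}  Y(2X)/Y(X)={ratio:.3f}")
```

The slope column keeps rising while the ratio column stays put, which is exactly the steepening curve seen on regular scales.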
The right plot visually shows a linear relationship, but it's not really between X and Y. It's between log(X) and log(Y). Note that log(Y) = log(0.003*X^2.5) = log(0.003) + 2.5*log(X), which is a straight line in log(X) with slope 2.5 and intercept log(0.003). The gap between gridlines now represents a 10-fold jump in value (of X or of Y). The relationship is linear only when both X and Y are on log scales; on regular scales, it's a power relationship, not a linear one.
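The algebra can be checked numerically. The sketch below, assuming the same hypothetical noise-free data and base-10 logs (the base that matches gridlines spaced at 10-fold jumps), fits an ordinary least-squares line to (log10 X, log10 Y). It recovers the slope 2.5 and the intercept log10(0.003) ≈ -2.52. The helper `fit_line` is illustrative, not from any library.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical data following Y = 0.003 * X^2.5.
xs = [10, 30, 100, 300, 1_000, 3_000, 10_000]
ys = [0.003 * x ** 2.5 for x in xs]

# The straight line lives in log-log coordinates, not in (X, Y).
slope, intercept = fit_line([math.log10(x) for x in xs], [math.log10(y) for y in ys])
print(f"slope        = {slope:.4f}")      # matches the exponent 2.5
print(f"intercept    = {intercept:.4f}")  # matches log10(0.003)
print(f"log10(0.003) = {math.log10(0.003):.4f}")
# One 10-fold step in X moves Y by a factor of 10**slope.
print(f"10-fold jump in X -> {10 ** slope:.0f}-fold jump in Y")
```

The last line makes the gridline reading concrete: one gridline step along X corresponds to 2.5 gridline steps along Y, about a 316-fold jump.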
The practice of printing axis labels in the original scale, rather than log scale, adds to the confusion. On the right plot, the points labeled 5,000 and 50,000 are not actually plotted at 5,000 and 50,000; they are plotted at log(5,000) and log(50,000), and it is these logged values that fall on the line. The reason for this confusing practice is that humans have trouble understanding data in log scale. For example, if $50,000 is the GDP per capita for some country, then log($50,000) ≈ 4.7 (base 10), which can't be interpreted as an amount of money.
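The sketch below mimics how a log axis places its labels, assuming base-10 logs; `tick_position` is a hypothetical helper, not a real plotting-library function. Each tick is drawn at log10(label) but printed with the original value, so the labels 5,000 and 50,000 end up exactly one unit apart, and $50,000 sits at a position of about 4.7.

```python
import math

def tick_position(value):
    """Position (in log units) at which a base-10 log axis draws the tick labeled `value`."""
    return math.log10(value)

# Ticks are labeled in original units but placed at log10(label).
for label in [500, 5_000, 50_000]:
    print(f"label {label:>6,} is drawn at position {tick_position(label):.3f}")

# The logged value is what lines up; on its own it has no dollar meaning.
print(f"log10(50,000) = {tick_position(50_000):.2f}")
```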
Whether we are talking about the gaps between gridlines or about specific points on the line, what readers see on the log-log chart is only part of the story. For the log-log chart to work, readers must also recognize that equal gaps between gridlines signify equal ratios in the data, not equal differences, and that the linear relationship is between the logs of the axis labels, not the labels themselves.
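A short sketch of the gridline point, assuming base-10 gridlines at 1, 10, 100, up to 100,000: on the page the gridlines are evenly spaced, one log unit apart, yet the differences between consecutive values grow tenfold each time. Only the ratio stays constant.

```python
import math

gridlines = [10 ** k for k in range(6)]  # 1, 10, 100, ..., 100,000

for lo, hi in zip(gridlines, gridlines[1:]):
    gap_on_page = math.log10(hi) - math.log10(lo)  # always 1 log unit
    print(f"{lo:>7,} -> {hi:>7,}: gap on page = {gap_on_page:.0f}, "
          f"difference = {hi - lo:>7,}, ratio = {hi // lo}")
```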
The regular X-Y plot can be read directly from what is on the page, while the log-log plot requires the reader to transcend the visual representation and enter an abstract realm.