On the interpretability of log-scaled charts
May 05, 2025
A previous post featured the following chart showing stock returns over time:
Unbeknownst to readers, the chart plots one thing but labels it something else.
The designer of the chart explains how to read the chart in a separate note, which I included in my previous post (link). It's a crucial piece of information. Before reading his explanation, I didn't realize the sleight of hand: he made a chart with one time series, then substituted the y-axis labels with another set of values.
As I explored this design choice further, I realized that it has been widely adopted in a common chart form, without fanfare. I'll get to it in due course.
***
Let's start our journey with as simple a chart as possible. Here is a line chart showing constant growth in the revenues of a small business:
For all the charts in this post, the horizontal axis depicts time (x = 0, 1, 2, ...). To simplify further, I describe discrete time steps although nothing changes if time is treated as continuous.
The vertical scale is in dollars, the original units. It's conventional to modify the scale to units of thousands of dollars, like this:
No controversy arises if we treat these two charts as identical. Here I put them onto the same plot, using dual axes, emphasizing the one-to-one correspondence between the two scales.
We can do the same thing for two time series that are linearly related. The following chart shows constant growth in temperature using both Celsius and Fahrenheit scales:
Here is the chart displaying only the Fahrenheit axis:
This chart admits two interpretations: (A) it is a chart constructed using F values directly and (B) it is a chart created using C values, after which the axis labels were replaced by F values. Interpretation B implements the sleight of hand of the log-returns plot. The issue I'm wrestling with in this post is the utility of interpretation B.
Before we move to our next stop, let's stipulate that if we are exposed to that Fahrenheit-scaled chart, either interpretation can apply; readers can't tell them apart.
***
Next, we look at the following line chart:
Notice the vertical axis uses a log10 scale. We know it's a log scale because the equally-spaced tickmarks represent different jumps in value: the first jump is from 1 to 10, the next jump is from 10, not to 20, but to 100.
Just like before, I make a dual-axes version of the chart, putting the log Y values on the left axis, and the original Y values on the right axis.
By convention, we often print the original values as the axis labels of a log chart. Can you recognize that sleight of hand? We make the chart using the log values, after which we replace the log value labels with the original value labels. We adopt this graphical trick because humans don't think in log units, so the log value labels are less "interpretable".
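Here is a minimal sketch of that label substitution, using only the Python standard library. The data values (1, 10, 100, 1000) are read off the tick labels described above, and the names `original_values` and `ticks` are my own, not part of any charting library.

```python
import math

# Values read off the tick labels in the example: Y grows tenfold per tick.
original_values = [1, 10, 100, 1000]

# The line is drawn using the log10 values, so these are the plotted heights.
plotted_heights = [math.log10(y) for y in original_values]

# The substitution: each tick sits at a log height but is printed
# with the original value as its label.
ticks = [(height, str(y)) for height, y in zip(plotted_heights, original_values)]

for height, label in ticks:
    print(f"tick at height {height:.0f} carries the label {label}")
```

The heights are equally spaced, while the labels printed next to them are not.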
As with the temperature chart, we will attempt to interpret the chart two ways. I've already covered interpretation B. For interpretation A, we regard the line chart as a straightforward plot of the values shown on the right axis (i.e., the original values). Alas, this viewpoint fails for the log chart.
If the original data are plotted directly, the chart should look like this:
It's not a straight line but a curve.
What have I just shown? That, after using the sleight of hand, we cannot interpret the chart as if it were directly plotting the data expressed in the original scale.
To nail down this idea, we ask a basic question of any chart showing trendlines. What's the rate of change of Y?
Using the transformed log scale (left axis), we find that the rate of change is 1 unit per unit time. Using the original scale, the rate of change from t=1 to t=2 is (100-10)/1 = 90 units per unit time; from t=2 to t=3, it is (1000-100)/1 = 900 units per unit time. Even though the rate of change varies by time step, the log chart using original value labels gives the misleading impression that the rate of change is constant over time (hence the straight line). The decision to substitute the log value labels backfires!
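The arithmetic can be checked with a short sketch. It assumes the example values Y = 10, 100, 1000 at t = 1, 2, 3; the helper `rates_of_change` is a hypothetical name introduced for illustration.

```python
import math

# Values from the example: Y = 10, 100, 1000 at t = 1, 2, 3.
times = [1, 2, 3]
y = [10, 100, 1000]


def rates_of_change(values, times):
    """Change in value per unit time between consecutive time steps."""
    return [
        (values[i + 1] - values[i]) / (times[i + 1] - times[i])
        for i in range(len(values) - 1)
    ]


# Rate of change as read from the log10 scale (left axis).
log_rates = rates_of_change([math.log10(v) for v in y], times)

# Rate of change as read from the original scale (right axis).
original_rates = rates_of_change(y, times)

print("log scale:     ", log_rates)
print("original scale:", original_rates)
```

The log-scale rates are the same at every step, while the original-scale rates come out to 90 and then 900.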
This is one reason why I use log charts sparingly. (I do like them a lot for exploratory analyses, but I avoid using them as presentation graphics.) This issue of interpretation is why I dislike the sleight of hand used to produce those log stock returns charts, even if the designer offers a note of explanation.
Do we gain or lose "interpretability" when we substitute those axis labels?
***
Let's re-examine the dual-axes temperature chart, building on what we just learned.
The above chart suggests that whichever scale (axis) is chosen, we get the same line, with the same steepness. Thus, the rate of change is the same regardless of scale. This turns out to be an illusion.
Using the left axis, the slope of the line is 10 degrees Celsius per unit time. Using the right axis, the slope is 18 degrees Fahrenheit per unit time. 18 F per unit time is numerically different from 10 C per unit time; thus, the slopes are not really the same! The rate of change of the temperature is given algebraically by the slope, and visually by the steepness of the line. Since two different slopes result in the same line steepness, the visualization conveys a lie.
The situation here is a bit better than that in the log chart. Here, in either scale, the rate of change is constant over time. Differentiating the temperature conversion formula, we find that the slope of the Fahrenheit line is always 9/5 times the slope of the Celsius line. So a rate of 10 Celsius per unit time corresponds to 18 Fahrenheit per unit time.
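A quick numerical check of the 9/5 factor, as a sketch. The starting temperature of 0 Celsius is an assumption made for illustration (only the slope of 10 Celsius per unit time comes from the example), and `celsius_to_fahrenheit` is a hypothetical helper.

```python
def celsius_to_fahrenheit(c):
    """Standard conversion: F = (9/5) * C + 32."""
    return c * 9 / 5 + 32


# Assumed series: starts at 0 C (illustrative) and rises 10 C per time step.
celsius = [0, 10, 20, 30]
fahrenheit = [celsius_to_fahrenheit(c) for c in celsius]

# Slope per unit time, measured in each scale.
c_slopes = [b - a for a, b in zip(celsius, celsius[1:])]
f_slopes = [b - a for a, b in zip(fahrenheit, fahrenheit[1:])]

print("Celsius slopes:   ", c_slopes)
print("Fahrenheit slopes:", f_slopes)
print("ratios:           ", [f / c for f, c in zip(f_slopes, c_slopes)])
```

Every ratio is 9/5 = 1.8, whatever the starting temperature, because the constant 32 cancels when taking differences.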
What if the chart is presented with only the Fahrenheit axis labels although it is built using Celsius data? Since readers only see the F labels, the observed slope is in Fahrenheit units. Meanwhile, the chart creator uses Celsius units. This discrepancy is harmless for the temperature chart but it is egregious for the log chart. The underlying reason is the nonlinearity of the log transform: the slope of log Y vs time is not a constant multiple of the slope of Y vs time; the conversion factor between the two depends on the value of Y.
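The contrast between the two transforms can be written out by differentiating each one with respect to time. This is a sketch of the standard calculus; the symbol L for the log10 value is my own shorthand.

```latex
\[
F = \tfrac{9}{5}\,C + 32
\quad\Longrightarrow\quad
\frac{dF}{dt} = \frac{9}{5}\,\frac{dC}{dt}
\]
\[
L = \log_{10} Y
\quad\Longrightarrow\quad
\frac{dL}{dt} = \frac{1}{Y \ln 10}\,\frac{dY}{dt}
\]
```

In the first line, the factor 9/5 is a constant, so a straight line in one scale is a straight line in the other. In the second, the factor 1/(Y ln 10) shrinks as Y grows, so a constant slope in L requires dY/dt to grow in proportion to Y.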
***
The log chart is a sacred cow of scientists, a symbol of our sophistication. Is it as potent as we think? In particular, when we put original data values on the log chart, are we making it more interpretable, or less?
P.S. I want to tie this discussion back to my Trifecta Checkup framework. The design decision to substitute those axis labels is an example of an act that moves the visual (V) away from the data (D). If the log units are printed, the visual makes sense; when the original units are dropped in, the visual no longer conveys features of the data: the reader must ignore what the eyes are seeing, and focus instead on the brain's perspective.
I generally avoid log axes, with the exception of cases where we might expect the change in value to be proportional rather than linear. If we think that a 10% change is of equal relevance regardless of the value, then that's a good case for a log axis.
That would apply with the stock market change plot. Presumably not all investments occur in mid-2010. So what we'd want to know is the timescale over which change happens relative to whenever the investment occurred. That can be shown well with a log axis, but not with a linear one.
I agree it was a little duplicitous to hide the log transformation here though - I would put log-spaced tick marks on the graph to make it unambiguous.
Posted by: Bretwood Higman | May 05, 2025 at 06:49 PM
As I see it, there are at least two reasons for using Log 'paper'.
1) In the case of an exponential growth/decay situation, a Straight line shows that the growth/decay is what the theory predicts, and any other line or points suggest noise and perhaps a modification of the theory.
2) If there are two data sets that represent different bases, but theoretically would be exponential, then it is more visible to use logs so that if the lines are parallel, they show the same exponential growth, but with different bases.
If I were doing the LOG graphs I'd want graphs to also show, on the Y axis, and perhaps in the graph, the 1,2,3,4,5,6,7,8,9 marks so that when a user glances at the graph, they are IMMEDIATELY struck by this compression, and thus by the use of LOGS.
Posted by: Mike Liveright | May 07, 2025 at 02:10 PM
With the temperature graph are you saying the change in the numerical slope matters? It's linear with respect to temperature regardless. The actual temperature, which the numbers are meant to represent, is changing the same amount regardless. The numerical slope is just a consequence of measurement changes here and doesn't impact the interpretation of temperature changes as long as one knows each scale and the relationship to temperature.
Posted by: John | May 07, 2025 at 06:35 PM
John: the key phrase of yours is "as long as one knows the scale". The sleight of hand involves making the chart using one scale and then hiding that scale, and informing the reader of a different scale. With the linear transform, the negative effect is not evident but with the log transform, it cannot be ignored. After this examination, I'd even think twice before doing this sleight of hand on linear transformations. I feel it's not a good practice.
Posted by: Kaiser | May 07, 2025 at 06:53 PM
Kaiser: One would hope that the only reason to switch the temperature scale would be because the alternate scale was more familiar to the consumer of the data. I agree an arbitrary switch to portray a different numerical effect (higher or lower slope) would be inappropriate.
Posted by: John | May 08, 2025 at 10:10 AM