## If you blink, you'd miss this axis trick

##### Jan 31, 2023

When I set out to write this post, I was intending to make a quick point about the following chart found in the current issue of Harvard Magazine (link):

This chart concerns the "tectonic shift" of undergraduates to STEM majors at the expense of humanities in the last 10 years.

I like the chart. The dot plot is great for showing this data. They placed the long text horizontally. The use of color is crucial, allowing us to visually separate the STEM majors from the humanities majors.

My intended post is to suggest dividing the chart into four horizontal slices, each showing one of the general fields. It's a small change that makes the chart even more readable. (It has the added benefit of not needing a legend box.)

***

Then, the axis announced itself.

I was baffled, then disgusted.

Here is a magnified view of the axis:

It's not a linear scale, as one would have expected. What kind of transformation did they use? It's baffling.

Notice the following features of this transformed scale:

- It can't be a log scale because many of the growth values are negative.
- The interval for 0%-25% is longer than for 25%-50%. The interval for 0%-50% is also longer than for 50%-100%. On the positive side, the larger values are pulled in and the smaller values are pushed out.
- The interval for -20%-0% is the same length as that for 0%-25%. So, the transformation is not symmetric around 0

I have no idea what transformation was applied. I took the growth values, measured the locations of the dots, and asked Excel to fit a polynomial function, and it gave me a quadratic fit, R square > 99%.

This formula fits the values within the range extremely well. I hope this isn't the actual transformation. That would be disgusting. Regardless, they ought to have advised readers of their unusual scale.

***

Without having the fitted formula, there is no way to retrieve the actual growth values except for those that happen to fall on the vertical gridlines. Using the inverse of the quadratic formula, I deduced what the actual values were. The hardest one is for Computer Science, since the dot sits to the right of the last gridline. I checked that value against IPEDS data.

The growth values are not extreme, falling between -50% and 125%. There is nothing to be gained by transforming the scale.

The following chart undoes the transformation, and groups the majors by field as indicated above:

***

Yesterday, I published a version of this post at Andrew's blog. Several readers there figured out that the scale is the log of the relative ratio of the number of degrees granted. In the above notation, it is log10(100%+x), where x is the percent change in number of degrees between 2011 and 2021.

Here is a side-by-side view of the two scales:

The chart on the right spreads the negative growth values further apart while slightly compressing the large positive values. I still don't think there is much benefit to transforming this set of data.

P.S. [1/31/2023]

(1) A reader on Andrew's blog asked what's wrong with using the log relative ratio scale. What's wrong is exactly what this post is about. For any non-linear scale, the reader can't make out the values between gridlines. In the original chart, there are four points that exist between 0% and 25%. What values are those? That chart is even harder because now that we know what the transform is, we'd need to first think in terms of relative ratios, so 1.25 instead of 25%, then think in terms of log, if we want to know what those values are.

(2) The log scale used for change values is often said to have the advantage that equal distances on either side represent counterbalancing values. For example, (1.5) * (0.66) = (3/2)* (2/3) = 1. But this is a very specific scenario that doesn't actually apply to our dataset. Consider these scenarios:

History: # degrees went from 1000 to 666 i.e. Relative ratio = 2/3

Psychology: # degrees went from 2000 to 3000 i.e. Relative ratio = 3/2

The # of History degrees dropped by 334 while the number of Psychology degrees grew by 1000 (Psychology I think is the more popular major)

History: # degrees went from 1000 to 666 i.e. Relative ratio = 2/3

Psychology: from 1000 to 1500, i.e. Relative ratio = 3/2

The # of History degrees dropped by 334 while # of Psychology degrees grew by 500

(Assume same starting values)

History: # degrees went from 1000 to 666 i.e. Relative ratio = 2/3

Psychology: from 666 to 666*3/2 = 999 i.e. Relative ratio = 3/2

The # of History degrees dropped by 334 while # of Psychology degrees grew by 333

(Assume Psychology's starting value to be History's ending value)

Psychology: # degrees went from 1000 to 1500 i.e. Relative ratio = 3/2

History: # degrees went from 1500 to 1000 i.e. Relative ratio = 2/3

The # of Psychology degrees grew by 500 while the # of History degrees dropped by 500

(Assume History's starting value to be Psychology's ending value)

The color required bit makes the graph hard to use for the color blind. All that was needed was to use like, pink boxes, yellow circles, blue squares and *boom* it's fixed.

Posted by: c t | Jan 31, 2023 at 12:30 PM

Some additional data would help interpret this chart:

1. Are all undergraduate degrees that were awarded fit into these majors?

2. What is the magnitude of awards in each major?

3. Do faculty sizes match the changes?

In other words, what are the relative growth magnitudes from these popularities?

Posted by: Richard Krablin | Feb 02, 2023 at 12:48 PM