Color bomb

I found a snapshot of the following leaderboard (link) in a newsletter in my inbox.

Openrouter_leaderboard_stackedcolumns

This chart ranks different AIs (foundational models) by token usage (which is the unit by which AI companies charge users).

It's a standard stacked column chart, with data aggregated by week. The colors represent different foundational models.

In the original webpage, there is a table printed below the chart, listing the top 20 model names, ordered from most to least tokens used.

Openrouter_leaderboard_table

Certain AI models have come and gone (e.g. the yellow and blue ones at the bottom of the chart in the first half). The model in pink has been the front runner through all weeks.

Total usage has been rising, although it might be flattening, which is the point made by the newsletter publisher.

***

A curiosity is the gray shaded section on the far right - it represents the projected total token usage for the days that have not yet passed during the current week. This is one of those additions that I like to see more often. If the developer had chosen to plot the raw data and nothing more, then they would have made the same chart except for the gray section. On that chart, the last column should not be compared to any other column as it is the only one that encodes a partial week.

This added gray section addresses the specific question: whether the total token usage for the current week is on pace with prior weeks, or faster or slower. (The accuracy of the projection is a different matter, which I won't discuss.)
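
For concreteness, the simplest version of such a projection is a pro-rata scaling of the partial-week total. This is my own sketch - the site does not disclose its actual methodology:

```python
# Pro-rata projection of a partial week's total (hypothetical method;
# the projection actually used by the chart is not disclosed).
def project_week_total(partial_total: float, days_elapsed: int, days_in_week: int = 7) -> float:
    """Scale the running total up to a full-week estimate."""
    if not 0 < days_elapsed <= days_in_week:
        raise ValueError("days_elapsed must be between 1 and days_in_week")
    return partial_total * days_in_week / days_elapsed

# e.g. 300 (billion tokens) over the first 4 days projects to 525 for the week;
# the gray section would then be the 225 not yet observed
projected = project_week_total(300, days_elapsed=4)
```

Under this assumption, the height of the gray section is simply the projected total minus the observed total.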

This added gray section leaves another set of questions unanswered. The chart suggests that the total token usage is expected to exceed the values for the prior few weeks, at the time it was frozen. We naturally want to know which models are contributing to this projected growth (and which aren't). The current design cannot address this issue because the projected additional usage is aggregated, and not available at the model level.

While it "tops up" the weekly total usage using a projected value, the chart does not show how many days are remaining. That's an important piece of information for interpreting the projection.

***

Now, we come to the good part, for those of us who love details.

A major weakness of these stacked column charts is of course the dizzying set of colors required, one for each model. Some of the shades are so similar that it's hard to tell whether colors have been repeated. Are these two different blues or the same blue?

Openrouter_leaderboard_blues

Besides, the visualization software has a built-in feature that "softens" a color when it is clicked on. This feature introduces unpleasant surprises because the softened shade might already have been used for another category.

Openrouter_aimodels_ranking_mutedcolors

It appears that the series is running sideways (following the superimposed gray line) when in fact the first section is a softened red associated with the series that went higher (following the white line).

It's near impossible to work with so many colors. If you extract the underlying data, you find that they show 10 values per day across 24 weeks. Because the AI companies are busy launching new models, the dataset contains 40 unique model names, which implies they needed 40 different shades on this one chart. (Double that to 80 shades if we add the on-click variations.)
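
The arithmetic of the color burden is easy to reproduce. With records keyed by (week, model) - the values below are invented - the number of shades needed is the count of distinct model names over the whole period:

```python
# Count the distinct categories that each demand their own color
# (invented records for illustration).
records = [
    ("W01", "model-a"), ("W01", "model-b"),
    ("W02", "model-b"), ("W02", "model-c"),
    ("W03", "model-c"), ("W03", "model-d"),
]
models = {model for _, model in records}
shades_needed = len(models)              # one base shade per model ever charted
with_click_variants = 2 * shades_needed  # plus a softened shade for each
```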

***

I hope some of you have noticed something else. Earlier, I mentioned the model in pink as the most popular AI model but if you take a closer look, this pink section actually represents a mostly useless catch-all category called "Others," that presumably aggregates the token usages of a range of less popular models. In this design, the Others category is catching an undeserved amount of attention.

It's unclear how the models are ordered within each column. The developer did not group together different generations of models by the same developer. Anthropic Claude has many entries: Sonnet 4 [green], Sonnet 3.5 [blue], Sonnet 3.5 (self-moderated) [yellow], Sonnet 3.7 (thinking) [pink], Sonnet 3.7 [violet], Sonnet 3.7 (self-moderated) [cyan], etc. The same for OpenAI, Google, etc.

This graphical decision may reflect how users of large language models evaluate performance. Perhaps at this time, there is no brand loyalty, or lock-in effect, and users see all these different models as direct substitutes. Therefore, our attention is focused on the larger number of individual models, rather than the smaller set of AI developers.

***

Before ending the post, I must point out that the publisher of this set of rankings offers a platform that allows users to switch between models. They are visualizing their internal data. This means the dataset only describes what customers of Openrouter.ai do on this platform. There should be no expectation that this company's user base is representative of all users of LLMs.


Students demonstrate how analytics underlie strong dataviz

In today's post, I'm delighted to feature work by several students of Ray Vella's data visualization class at NYU. They have been asked to improve the following Economist chart entitled "The Rich Get Richer".

Economist_richgetricher

In my guest lecture to the class, I emphasized the importance of upfront analytics when constructing data visualizations.

One of the key messages is to pay attention to definitions. How does the Economist define "rich" and "poor"? (It's not what you think.) Instead of using percentiles (e.g. top 1% of the income distribution), they define "rich" as people living in the richest region by average GDP, and "poor" as people living in the poorest region by average GDP. Thus, the "gap" between the rich and the poor is measured by the difference in GDP between the average persons in those two regions.

I don't like this metric at all but we'll just have to accept that that's the data available for the class assignment.

***

Shulin Huang's work is notable in how she clarifies the underlying algebra.

Shulin_rvella_economist_richpoorgap

The middle section classifies the countries into two groups, those with widening vs narrowing gaps. The side panels show the two components of the gap change: the gap change is the change in the richest region minus the change in the poorest region.

If we take the U.S. as an example, the gap increased by 1976 units. This is because the richest region gained 1777 while the poorest region lost 199. Germany had a very different experience: the richest region regressed by 2215 while the poorest region improved by 424, leading to the gap narrowing by 2638.
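
The underlying algebra can be sketched in a couple of lines. A positive result means the gap widened; the Germany figure lands within a unit of the quoted 2638, presumably due to rounding in the source data:

```python
# Gap change = change in richest region minus change in poorest region.
def gap_change(richest_change: float, poorest_change: float) -> float:
    return richest_change - poorest_change

us = gap_change(richest_change=1777, poorest_change=-199)       # +1976: widened
germany = gap_change(richest_change=-2215, poorest_change=424)  # about -2639: narrowed
```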

Note how important it is to keep the order of the countries fixed across all three panels. I'm not sure how she decided the order of these countries, which is a small oversight in an otherwise excellent effort.

Shulin's text is very thoughtful throughout. The chart title clearly states "rich regions" rather than "the rich". Take a look at the bottom of the side panels. The label "national AVG" shows that the zero level is the national average. Then, the label "regions pulled further ahead" perfectly captures the positive direction.

Compared to the original, this chart is much more easily understood. The secret is the clarity of thought, the deep understanding of the nature of the data.

***

Michael Unger focuses his work on elucidating the indexing strategy employed by the Economist. In the original, each value of regional average GDP is indexed to the national average of the relevant year. A number like 150 means the region has an average GDP for the given year that is 50% higher than the national average. It's tough to explain how such indices work.

Michael's revision goes back to the raw data. He presents them in two panels: on the left, the absolute change over time in average GDP is shown for each richest/poorest region, while on the right, the relative change is shown.

Mungar_rvella_economist_richpoorgap

(Some of the country labels are incorrect. I'll replace with a corrected version when I receive one.)

Presenting both sides is not redundant. In France, for example, the richest region improved by 17K while the poorest region went up by not quite 6K. But 6K on a much lower base represents a much higher proportional jump as the right side shows.

***

Related to Michael's work, but even simpler, is Debbie Hsieh's effort.

Debbiehsieh_rayvella_economist_richpoorgap

Debbie reduces the entire exercise to one message - the relative change over time in average GDP between the richest and poorest region in each country. In this simplest presentation, if both columns point up, then both the richest and the poorest region increased their average GDP; if both point down, then both regions suffered GDP drops.

If the GDP increased in the richest region while it decreased in the poorest region, then the gap widened by the most. This is represented by the blue column pointing up and the red column pointing down.

In some countries (e.g. Sweden), the poorest region (orange) got worse while the richest region (blue) improved slightly. In Italy and Spain, both the best and worst regions gained in average GDPs although the richest region attained a greater relative gain.

While Debbie's chart is simpler, it hides something that Michael's work shows more clearly. If both the richest and poorest regions increased GDP by the same percentage amount, the average person in the richest region actually experienced a higher absolute increase because the base of the percentage is higher.
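
The point is simple arithmetic: equal percentage growth on unequal bases yields unequal absolute gains. With invented per-capita GDP figures:

```python
# Equal relative gains, unequal absolute gains (invented numbers).
rich_base, poor_base = 60_000, 20_000  # average GDP of richest/poorest region
growth = 0.10                          # suppose both regions grow 10%

rich_gain = rich_base * growth         # 6000
poor_gain = poor_base * growth         # 2000
widening = rich_gain - poor_gain       # the absolute gap still widens by 4000
```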

***

The numbers across these charts aren't necessarily well aligned. That's actually one of the challenges of this dataset. There are many ways to process the data, and small differences in how each student handles the data lead to differences in the derived values, resulting in differences in the visual effects.


Decluttering charts

Enrico posted about the following chart, addressing the current assault on scientific research funding, and he's worried that poor communications skills are hurting the cause.

Bertini_tiretracks

He's right. You need half an hour to figure out what's going on here.

Let me write down what I have learned so far.

The designer only cares about eight research areas - all within the IT field - listed across the bottom.

Paired with each named research area are those bolded blue labels that run across the top (but not quite). I think they represent the crowning achievement within each field but I'm just guessing here.

It appears that each field experiences a sequence of development stages. Typically, universities get things going, then industry R&D teams enter the game, and eventually, products appear in the market. The orange, blue and black lines show this progression. The black line morphs into green, and may even expand in thickness - indicating progressive market adoption and growth.

For example, the first field from the left, digital communications, is shown to have begun in 1965 at universities. Then in the early 1980s, industry started investing in this area. It was not until the 1990s that products became available, and not until the mid 2000s that the market exceeded $10 billion.

Even now, I haven't resolved all its mysteries. The difference between a solid black line and a dotted black line is not explained. Further, it appears possible to bypass $1 billion and hit $10 billion right away.

***

Next, we must decipher the strange web of gray little arrows.

It appears that the arrows can go from orange to blue, blue to orange, blue to black, and orange to black. Under digital communications, I don't see black or green leading back to blue or orange. However, under computer architecture, I see green to orange; under parallel & distributed systems, I see green to blue. I don't see any black to orange or black to blue, so black is a kind of trapping state (things go in but don't come out). Sometimes it's better to say which directions are not possible - in this case, I think every direction is possible except that nothing comes out of black.

It remains unclear what sort of entity each arrow depicts. Each arrow has a specific start and end time. I'm guessing it has to do with a specific research item. Taking the bottom-most arrow for digital communications, I suppose something began in academia in 1980 and then attracted industry investment around 1982. An arrow that points backwards from industry to academia indicates that universities picked up new research ideas from industry. Digital communications arrows tend to be short, suggesting that it takes only a few years to bring a product to market.

To add to this mess, some arrows cross research areas. These are shown as curved arrows, rather than straight arrows. For these curved arrows, the "slope" of the arrow no longer holds any meaning.

The set of gray arrows is trying too hard. They are overstuffed with purposes. On the one hand, the arrows between research areas portray the synergies between different fields. On the other hand, the arrows within each research area show the development trajectories of anonymized research items. And the arrows going back and forth between the orange and blue bars show the interplay between universities and industry research groups.

***

Lastly, we look at those gray text labels at the very top of the page. That's a grab-bag of corporate names (Motorola, Intel, ...) and product names (iPhone, iRobot, ...). Some companies span several research areas. I'm amused and impressed that apparently a linear sequence can be found for the eight research areas such that every single company has investments in only contiguous areas, precluding the need to "leapfrog" certain research areas!

Actually, no, that's wrong. I do notice Nvidia and HP appearing twice. But why is Google not part of digital communications next to iPhone?

Given that no universities are listed, the company and product labels are related to only the blue, black or green lines below. It might be only related to black and/or green. I'm not sure.

***

So far, I've expended energy only to tease out the structure of the underlying dataset. I haven't actually learned anything about the data!

***

The designer has to make some decisions because the different potential questions that the dataset can address impose conflicting graphical requirements.

If the goal is to surface a general development process that repeats for every research area, then the chart should highlight commonality, rather than difference. By contrast, if one's objective is to illustrate how certain research areas have experiences unique to themselves, one should choose a graphical form that brings out the differences.

If the focus is on larger research areas, then the relevant key dates are really the front ends of each vertical line; nothing else matters. By contrast, if one wants to show individual research items, then many more dates become pertinent.

A linear arrangement of the research areas will not perform if one's goal is to uncover connections between research areas. By contrast, if one attempts to minimize crossovers in a network design, it would be impossible to keep all elements belonging to each research area in close proximity.

A layering approach that involves multiple charts to tell the whole story may be the solution. See for example Gelman's post on ladder of abstraction.


The line-angle illusion

In a recent presentation, Prof. Matthias Schonlau explained his "hammock plot," which I wrote about here. During the talk, he used the hammock plot to illustrate an optical illusion found in plots that require readers to compare angled lines, known as the line-angle illusion. (Others prefer the name "sine illusion.")

Here is a simple demonstration of the line-angle illusion, extracted from this paper.

Vanderplas_hofmann_sineillusion

Think of the two sine curves as time series, and we're comparing the differences between them. This requires us to assess trend in the vertical distances between the two lines. Weirdly, we perceive the vertical lines on the above chart to have varying lengths, even though they have equal lengths.

***
The link here contains an example of how the line-angle illusion can lead to misreading of trends on line charts:

Sineillusion_twolines

Is there a bigger difference in revenue at Time 1 than Time 2? Many of us will think so but on careful judgment, I think all of us can agree that the difference at Time 2 is in fact larger.

***

Much of the interest in a hammock plot lies in the links between the vertical blocks, and this is where the line-angle illusion can distort our perception. Studies have shown that humans tend to read not the vertical gaps but the angular gaps. Again, this issue is illustrated in the first-mentioned paper:
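
The geometry is easy to check: when two parallel lines are steep, the shortest (perpendicular) distance between them - which is what the eye tends to latch onto - is far smaller than the vertical distance. A minimal sketch, assuming both lines share the same slope locally:

```python
import math

def perpendicular_gap(vertical_gap: float, slope: float) -> float:
    """Shortest distance between two parallel lines that are separated
    vertically by vertical_gap and share the given slope."""
    return vertical_gap * math.cos(math.atan(slope))

flat = perpendicular_gap(10, slope=0)   # 10.0: on flat lines the eye reads the true gap
steep = perpendicular_gap(10, slope=5)  # ~1.96: the same vertical gap looks much smaller
```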

Vanderplas_hofmann_sineillusion_distances

Matthias explained that their implementation of the hammock plot uses a strategy to counteract this line-angle illusion.

I take this to mean they distort the data in such a way that, after readers apply the line-angle illusion, the resulting view correctly conveys the trend. A kind of double-negative strategy. The paper linked above offers one such counter-illusion strategy.

I imagine this is a bit controversial as we are introducing deliberate distortion to counteract an expected perceptual illusion.

I'm not aware of any software that offers built-in functions that perform this type of illusion-busting adjustments. Do you know any?

 

P.S. [5-30-2025] Andrew Gelman has some comments on this topic on his blog. He said:

But to get closer to what Kaiser is asking: the analogy I’ve given is, suppose you’re building a wooden chair but using boards that are warped. In this case, the right thing to do is to incorporate the warp into the design, i.e. cut some pieces shorter than others and at different angles, etc., so that they fit together as is, rather than trying to go all rectilinear and then glue/nail everything together. The trouble with the latter strategy is that the wood will exert pressure on the joints and eventually the chair will break or distort itself in some way.


Hammock plots

Prof. Matthias Schonlau gave a presentation about "hammock plots" in New York recently.

Here is an example of a hammock plot that shows the progression of different rounds of voting during the 1903 papal conclave. (These photos were taken at the event and are thus a little askew.)

Hammockplot_conclave

The chart shows how Cardinal Sarto beat the early favorite Rampolla in later rounds of voting. The chart traces the movement of votes from one round to the next. The Vatican destroys voting records, but apparently, records were unexpectedly retained for this particular conclave.

The dataset has several features that bring out the strengths of such a plot.

There is a fixed number of votes, and a fixed number of candidates. At each stage, the votes are distributed across a subset of candidates. From stage to stage, the support levels for each candidate shift. The chart brings out the evolution of the vote.

From the "marginals", i.e. the stacked columns shown at each time point, we learn the relative strengths of the candidates, as they evolve from vote to vote.

The links between the column blocks display the evolution of support from one vote to the next. We can see which candidate received more votes, as well as where the additional votes came from (or, to whom some voters have drifted).

The data are neatly arranged in successive stages, resulting in discrete time steps.

Because the total number of votes is fixed, the relative sizes of the marginals are nicely constrained.

The chart is made much more readable because of binning. Only the top three candidates are shown individually with all the others combined into a single category. This chart would have been quite a mess if it showed, say, 10 candidates.

How precisely we can show the intra-stage movement depends on how the data records were kept. If we have the votes for each person in each round, then it should be simple to execute the above. If we only have the marginals (the vote distribution by candidate) at each round, then we are forced to make some assumptions about which voters switched their votes. We'd likely have to rule out unlikely scenarios, such as the one in which all of candidate X's previous voters switched to other candidates while a different set of voters switched their votes to candidate X.
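
With only the marginals, the most we can compute directly is each candidate's net change between rounds; attributing those nets to individual switchers is where the assumptions come in. A sketch with invented vote counts:

```python
# Net vote changes between two rounds, computed from marginals alone
# (vote counts are invented).
round1 = {"Rampolla": 29, "Sarto": 21, "Gotti": 9, "Others": 3}
round2 = {"Rampolla": 24, "Sarto": 27, "Gotti": 8, "Others": 3}

net = {name: round2[name] - round1[name] for name in round1}

# the nets must balance because the total number of votes is fixed,
# but many different flow patterns are consistent with them: the 5 votes
# Rampolla lost need not all have gone to Sarto
assert sum(net.values()) == 0
```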

***

Matthias also showed examples of hammock plots applied to different types of datasets.

The following chart displays data from course evaluations. Unlike the conclave example, the variables tied to questions on the survey are neither ordered nor sequential. Therefore, there is no natural sorting available for the vertical axes.

Hammockplot_evals

Time is a highly useful organizing element for this type of chart. Without such an organizing element, the designer must manually customize an order.

The vertical axes correspond to specific questions on the course evaluation. Students are aggregated into groups based on the "profile" of grades given for the whole set of questions. It's quite easy to see that opinions are most aligned on the "workload" question while most of the scores are skewed high.

Missing values are handled by plotting them as a new category at the bottom of each vertical axis.

This example is similar to the conclave example in that each survey response is categorical, one of five values (plus missing). Matthias also showed examples of hammock plots in which some or all of the variables are numeric data.

***

Some of you will notice the resemblance between the hammock plot and various similar charts, such as the profile chart, the alluvial chart, the parallel coordinates chart, and the Sankey diagram. Matthias discussed all of those as well.

Matthias has a book out called "Applied Statistical Learning" (link).

Also, there is a Python package for the hammock plot on GitHub.


On the interpretability of log-scaled charts

A previous post featured the following chart showing stock returns over time:

Gelman_overnightreturns_tsla

Unbeknownst to readers, the chart plots one thing but labels it something else.

The designer of the chart explains how to read the chart in a separate note, which I included in my previous post (link). It's a crucial piece of information. Before reading his explanation, I didn't realize the sleight of hand: he made a chart with one time series, then substituted the y-axis labels with another set of values.

As I explored this design choice further, I realized that it has been widely adopted in a common chart form, without fanfare. I'll get to it in due course.

***

Let's start our journey with as simple a chart as possible. Here is a line chart showing constant growth in the revenues of a small business:

Junkcharts_dollarchart_origvalues

For all the charts in this post, the horizontal axis depicts time (x = 0, 1, 2, ...). To simplify further, I describe discrete time steps although nothing changes if time is treated as continuous.

The vertical scale is in dollars, the original units. It's conventional to modify the scale to units of thousands of dollars, like this:

Junkcharts_dollarchart_thousands

No controversy arises if we treat these two charts as identical. Here I put them onto the same plot, using dual axes, emphasizing the one-to-one correspondence between the two scales.

Junkcharts_dollarchart_dualaxes

We can do the same thing for two time series that are linearly related. The following chart shows constant growth in temperature using both Celsius and Fahrenheit scales:

Junkcharts_tempchart_dualaxes

Here is the chart displaying only the Fahrenheit axis:

Junkcharts_tempchart_fahrenheit

This chart admits two interpretations: (A) it is a chart constructed using F values directly and (B) it is a chart created using C values, after which the axis labels were replaced by F values. Interpretation B implements the sleight of hand of the log-returns plot. The issue I'm wrestling with in this post is the utility of interpretation B.

Before we move to our next stop, let's stipulate that if we are exposed to that Fahrenheit-scaled chart, either interpretation can apply; readers can't tell them apart.

***

Next, we look at the following line chart:

Junkcharts_trendchart_y

Notice the vertical axis uses a log10 scale. We know it's a log scale because the equally-spaced tickmarks represent different jumps in value: the first jump is from 1 to 10, the next jump is from 10, not to 20, but to 100.

Just like before, I make a dual-axes version of the chart, putting the log Y values on the left axis, and the original Y values on the right axis.

Junkcharts_trendchart_dualaxes

By convention, we often print the original values as the axis labels of a log chart. Can you recognize that sleight of hand? We make the chart using the log values, after which we replace the log value labels with the original value labels. We adopt this graphical trick because humans don't think in log units, thus the log value labels are less "interpretable".

As with the temperature chart, we will attempt to interpret the chart two ways. I've already covered interpretation B. For interpretation A, we regard the line chart as a straightforward plot of the values shown on the right axis (i.e., the original values). Alas, this viewpoint fails for the log chart.

If the original data are plotted directly, the chart should look like this:

Junkcharts_trendchart_y_origvalues

It's not a straight line but a curve.

What have I just shown? That, after using the sleight of hand, we cannot interpret the chart as if it were directly plotting the data expressed in the original scale.

To nail down this idea, we ask a basic question of any chart showing trendlines. What's the rate of change of Y?

Using the transformed log scale (left axis), we find that the rate of change is 1 unit per unit time. Using the original scale, the rate of change from t=1 to t=2 is (100-10)/1 = 90 units per unit time; from t=2 to t=3, it is (1000-100)/1 = 900 units per unit time. Even though the rate of change varies by time step, the log chart using original value labels sends the misleading picture that the rate of change is constant over time (thus a straight line). The decision to substitute the log value labels backfires!
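
Spelling out the arithmetic with y = 10^t: the slope is constant on the log scale but grows tenfold per time step in the original units.

```python
import math

ts = [1, 2, 3]
ys = [10 ** t for t in ts]  # 10, 100, 1000

# rate of change per unit time, in log units and in original units
log_rates = [math.log10(ys[i + 1]) - math.log10(ys[i]) for i in range(2)]
raw_rates = [ys[i + 1] - ys[i] for i in range(2)]

# constant (1 per unit time) on the log scale...
assert all(abs(r - 1.0) < 1e-9 for r in log_rates)
# ...but 90 then 900 per unit time in the original units
assert raw_rates == [90, 900]
```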

This is one reason why I use log charts sparingly. (I do like them a lot for exploratory analyses, but I avoid using them as presentation graphics.) This issue of interpretation is why I dislike the sleight of hand used to produce those log stock returns charts, even if the designer offers a note of explanation.

Do we gain or lose "interpretability" when we substitute those axis labels?

***

Let's re-examine the dual-axes temperature chart, building on what we just learned.

Junkcharts_tempchart_dualaxes

The above chart suggests that whichever scale (axis) is chosen, we get the same line, with the same steepness. Thus, the rate of change is the same regardless of scale. This turns out to be an illusion.

Using the left axis, the slope of the line is 10 degrees Celsius per unit time. Using the right axis, the slope is 18 degrees Fahrenheit per unit time. 18 F is different from 10 C, thus, the slopes are not really the same! The rate of change of the temperature is given algebraically by the slope, and visually by the steepness of the line. Since two different slopes result in the same line steepness, the visualization conveys a lie.

This situation is a bit better than that of the log chart. Here, in either scale, the rate of change is constant over time. Differentiating the temperature conversion formula, we find that the slope of the Fahrenheit line is always 9/5 times the slope of the Celsius line. So a rate of 10 Celsius per unit time corresponds to 18 Fahrenheit per unit time.
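
The 9/5 factor follows from the conversion formula F = (9/5)C + 32: the additive 32 drops out of any change, leaving the multiplicative 9/5. A quick check:

```python
def c_to_f(c: float) -> float:
    return 9 / 5 * c + 32

celsius = [0, 10, 20, 30]                  # constant slope of 10 C per unit time
fahrenheit = [c_to_f(c) for c in celsius]  # 32, 50, 68, 86

c_slope = celsius[1] - celsius[0]          # 10
f_slope = fahrenheit[1] - fahrenheit[0]    # 18 = 9/5 * 10
```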

What if the chart is presented with only the Fahrenheit axis labels although it is built using Celsius data? Since readers only see the F labels, the observed slope is in Fahrenheit units. Meanwhile, the chart creator uses Celsius units. This discrepancy is harmless for the temperature chart but it is egregious for the log chart. The underlying reason is the nonlinearity of the log transform - the slope of log Y vs time is not proportional to the slope of Y vs time; in fact, it depends on the value of Y.  

***

The log chart is a sacred cow of scientists, a symbol of our sophistication. But is it as potent as we think? In particular, when we put original data values on the log chart, are we making it more interpretable, or less?

 

P.S. I want to tie this discussion back to my Trifecta Checkup framework. The design decision to substitute those axis labels is an example of an act that moves the visual (V) away from the data (D). If the log units were printed, the visual makes sense; when the original units were dropped in, the visual no longer conveys features of the data - the reader must ignore what the eyes are seeing, and focus instead on the brain's perspective.


Logging a sleight of hand

Andrew puts up an interesting chart submitted by one of his readers (link):

Gelman_overnightreturns_tsla

Bruce Knuteson, who created this chart, is pursuing a theory that something fishy is going on in the stock markets overnight (i.e. between the close of one day and the open of the next day). He split the price data into two interleaving parts: the blue line represents returns overnight and the green line represents returns intraday (from the open of one day to the close of the same day). In this example involving Tesla's stock, the overnight "return" is an eye-popping 36,850% while the intraday "return" is -46%.

This is an example of an average masking interesting details in the data. One typically looks at the entire sequence of values at once, while this analysis breaks it up into two subsequences. I'll write more about the data analysis at a later point. This post will be purely about the visualization.

***

It turns out that while the chart looks like a standard time series, it isn't. Bruce wrote out the following essential explanation:

Gelman_overnightreturns

The chart can't be interpreted without first reading this note.

The left chart (a) is the standard time-series chart we're thinking about. It plots the relative cumulative percentage change in the value of the investment over time. Imagine one buys $1 of Apple stock on day 1. The chart shows the cumulative return on day X, expressed as a percent relative to the initial investment amount. As mentioned above, the data series was split into two: the intraday return series (green) is dwarfed by the overnight return series (blue), and is barely visible hugging the horizontal axis.
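
The split itself is mechanical once daily open and close prices are in hand; compounding each subsequence separately gives the two lines. A sketch with invented prices (not Knuteson's actual data):

```python
# Split daily prices into overnight (previous close -> open) and intraday
# (open -> close) return streams, then compound each separately.
days = [  # (open, close), invented prices
    (100.0, 98.0),
    (101.0, 99.0),
    (103.0, 100.0),
]

overnight, intraday = 1.0, 1.0
prev_close = None
for open_px, close_px in days:
    if prev_close is not None:
        overnight *= open_px / prev_close  # gap from yesterday's close to today's open
    intraday *= close_px / open_px         # move during trading hours
    prev_close = close_px

# in this toy series, every intraday move is down yet every overnight gap is up;
# the two factors multiply back to the full-period return
```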

Almost without thinking, a graphics designer applies a log transform to the vertical axis. This has the effect of "taming" the extreme values in the blue line. This is the key design change in the middle chart (b). The other change is to switch back to absolute values. The day 1 number is now $1 so the day X number shows the cumulative value of the investment on day X if one started with $1 on day 1.

There's a reason why I emphasized the log transform over the switch to absolute values. That's because the relationship between absolute and relative values here is a linear one. If y(t) is the absolute cumulative value of $1 at time t, then the percent change r(t) = 100(y(t) -1). (Note that y(0) = 1 by definition.)  The shape of the middle chart is primarily conditioned by the log transform.

In the right chart (c), which is the design that Bruce features in all his work, the visual elements of chart (b) are retained while he replaced the vertical axis labels with those from chart (a). In other words, the lines show the cumulative absolute values while the labels show the relative cumulative percent returns.

I left this note on Gelman's blog (corrected a mislabeling of the chart indices):

I'm interested in the sleight of hand related to the plots, also tying this back to the recent post about log scales. In plot (b) [middle of the panel], he transformed the data to show the cumulative value of the investment assuming one puts $1 in the stock on day 1. He applied a log scale on the vertical axis. This is fine. Then in plot (c), he retained the chart but changed the vertical axis labels so that instead of the absolute value of the investment, he shows percent changes relative to the initial value.

Why didn't he just plot the relative percent changes? Let y(t) be the absolute values; then the percent change r(t) = 100*(y(t) − 1) is a simple linear transformation of y(t). This is where the log transform creates problems! The y(t) series is guaranteed to be positive, since hitting y(t) = 0 means the entire investment is lost. However, the r(t) series can take negative values and can cross zero many times. Thus, log r(t) is inoperable. The problem is using the log transform on data that are not always positive, and the sleight of hand does not fix it!

Just pick any day on which the absolute return fell below $1, e.g. the last day of the plot, on which the value of the investment was down to $0.80. In the middle plot (b), the position depicted is ln(0.8) = −0.22; note that the plot is on a log scale, so what is labeled $1 is really at ln(1) = 0. If we instead tried to plot the relative percent changes, the day 1 number would be ln(0), which is undefined, while the last number would be ln(−20%), which is also undefined.
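The failure mode is easy to reproduce. A minimal sketch with a made-up value (using `math.log`):

```python
import math

y_last = 0.80                         # hypothetical: $1 investment worth $0.80 on the last day
r_last = round(100 * (y_last - 1))    # -20 percent, i.e. a 20% loss

# The absolute series stays positive, so its log is well defined:
print(round(math.log(y_last), 2))     # -0.22, the position plotted on the log scale

# The relative series starts at zero and can go negative, so its log is not:
for r in (0.0, r_last):
    try:
        math.log(r)
    except ValueError as err:
        print(f"log({r}) fails: {err}")
```

Both `log(0)` and `log(-20)` raise a domain error, which is exactly why the relative series cannot be drawn on a log axis.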

This is another example of something uncomfortable about using log scales, which I pointed out in this post. It's this idea that when we make log plots, we can freely substitute axis labels that are not directly proportional to the actual values. It's plotting one thing and labeling it as another. The labels are then disconnected from the visual encoding, which defeats the goal of visualizing data.

 


The message left the visual

The following chart showed up in Princeton Alumni Weekly, in a report about China's population:

Sciam_chinapop_19802020

This chart was one of several that appeared in a related Scientific American article.

The story itself is not surprising. As China develops, its birth rate declines while its death rate also falls; thus, the population ages. The same story has played out in all advanced economies.

***

From a Trifecta Checkup perspective, this chart suffers from several problems.

The text annotation on the top right suggests what message the authors intended to deliver. Pointing to the group of people aged between 30 and 59 in 2020, they remarked that this large cohort would likely cause "a crisis" when they age. There would be fewer youngsters to support them.

Unfortunately, the data and visual elements of the chart do not align with this message. Instead of looking forward in time, the chart compares the 2020 population pyramid with that from 1980, looking back 40 years. The chart shows an insight from the data, just not the right one.

A major feature of a population pyramid is the split by gender. The trouble is that gender isn't part of the story here.

In terms of age groups, the chart treats each subgroup "fairly". As a result, the reader isn't shown which of the 22 subgroups to focus on. There are really 44 subgroups if we count each gender separately, and 88 subgroups if we include the year split.

***

The following redesign traces the "crisis" subgroup (those who were 30-59 in 2020) both backwards and forwards.

Junkcharts_redo_chinapopulationpyramids

The gender split has been removed; here, the columns show the total population. Color is used to focus attention to one cohort as it moves through time.

Notice I switched up the sample times: I pulled the population data for 1990 and 2060 (from this website). The original design used the population data from 1980 instead of 1990, but this choice is at odds with the message: people who were 30 in 2020 were not yet born in 1980! They first showed up in the 1990 dataset.

At the other end of the "crisis" cohort, the oldest (59 years old in 2020) would be deceased by 2100, as 59 + 80 = 139. Even the youngest (30 in 2020) would be 110 by 2100, so almost everyone in the pink section of the 2020 chart would have fallen off the right side of the chart by 2100.
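The cohort arithmetic behind these choices is simple enough to spell out (a trivial sketch):

```python
def birth_years(age_low, age_high, census_year):
    """Earliest and latest birth years for an age band observed in a census year."""
    return census_year - age_high, census_year - age_low

# The "crisis" cohort: aged 30-59 in 2020, hence born between 1961 and 1990.
print(birth_years(30, 59, 2020))   # (1961, 1990)

# In 1980 the youngest members (born 1990) did not yet exist;
# the full cohort first appears in the 1990 data (aged 0-29).

# By 2100 the cohort would be aged 110-139, i.e. off the chart entirely.
ages_2100 = [30 + (2100 - 2020), 59 + (2100 - 2020)]
print(ages_2100)                   # [110, 139]
```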

These design decisions insert a gap between the visual and the message.

 

 


Aligning the visual and the message

Today's post is about work by Diane Barnhart, a product manager at Bloomberg who is taking Ray Vella's infographics class at NYU. The class is given a chart from the Economist, as well as data on GDP per capita at the regional level in selected countries. The students are asked to produce a data visualization that explores the change in income inequality (as indicated by GDP per capita).

Here is Diane's work:

Diane Barnhart_Rich Get Richer

In this chart, the key measure is the GDP per capita of different regions in Germany relative to the national average. Hamburg, for example, had a GDP per capita 80% above the national average in 2000, while Leipzig's was 30% below. (This metric is a bit of a head-scratcher, and forms the basis of the Economist chart.)

***

Diane made several insightful design choices.

The key insight of this graph is also one of the easiest to see: the narrowing of the range of values. In 2000, the top value is about 90% while the bottom is under −40%, a range of 130 percentage points. By 2020, the range has narrowed to 90 points, with values falling between 60% and −30%. In other words, the gap between rich and poor regions in Germany shrank over these two decades.
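The arithmetic, with the extremes read off the chart:

```python
# Approximate extremes read off the chart (percent above/below the national average).
top_2000, bottom_2000 = 90, -40
top_2020, bottom_2020 = 60, -30

range_2000 = top_2000 - bottom_2000
range_2020 = top_2020 - bottom_2020

print(range_2000, range_2020)   # 130 90 (percentage points)
```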

The chosen chart form makes this message come alive.

Diane divided the regions into three groups, mapped to the black, red and yellow of the German flag. Black is for regions with GDP per capita above the national average; red for regions below the average but within 25% of it; yellow for regions with GDP per capita more than 25% below the average.

Instead of applying color to individual lines that trace the GDP metric over time for each region, she divided the area between the lines into three bands and painted them. This requires defining the boundary lines between the colored areas over time. I gathered that she classified the regions using the latest (2020) GDP data and then traced each region's trend line back in time. Other definitions are also possible.
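One way to implement this classification rule, sketched with made-up numbers (the region names and values are illustrative, and the exact boundary of the red band is my assumption, chosen to be consistent with the three groups described):

```python
# Hypothetical relative GDP per capita (% above/below the national average).
regions = {
    "Hamburg": {2000: 80, 2020: 60},
    "Leipzig": {2000: -30, 2020: -20},
    "Chemnitz": {2000: -45, 2020: -28},
}

def classify(rel_gdp):
    """Flag color based on the latest (2020) value, per the bands described."""
    if rel_gdp > 0:
        return "black"     # above the national average
    elif rel_gdp >= -25:
        return "red"       # below average, but within 25% of it
    else:
        return "yellow"    # more than 25% below the average

groups = {name: classify(vals[2020]) for name, vals in regions.items()}
print(groups)   # {'Hamburg': 'black', 'Leipzig': 'red', 'Chemnitz': 'yellow'}
```

Classifying on the latest year and tracing backwards, as Diane did, keeps each region in one band for the whole chart; classifying year by year would instead let regions switch bands over time.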

The two-column data table on the right provides details not found in the visualization. It is nicely enhanced with colors, which augment the information in the main chart rather than repeat it.

All in all, this is a delightful project, and worthy of a top grade!


Anti-encoding

Howie H., sometime contributor to our blog, found this chart in a doctor's office:

WhenToExpectAReturnCall_sm

Howie writes:

Among the multitude of data visualization sins here, I think the worst is that the chart *anti*-encodes the data; the longest wait time has the shortest arc!

While I waited I thought about a redesign.  Obviously a simple bar chart would work.  A properly encoded radial bar could work, or small multiple pie charts.  But I think the design brief here probably calls for a bit of responsible data art, as this is supposed to be an eye-catching poster.

I came up with a sort of bar superimposed on a calendar for reference.  To quickly draft the design it was easier to do small multiples, but maybe all three arrows could be placed on a two-week grid and the labels could be inside the arrows, or something like that.  It’s a very rough draft but I think it points toward a win-win of encoding the actual data while retaining the eye-catching poster-ness that I’m guessing was a design goal.

Here is his sketch:

JunkCharts-redo_howardh_WhenToExpectAReturnCall redesign sm

***

I found a couple of interesting ideas from Howie's re-design.

First, he tried to embody the concept of a week's wait by visual reference to a weekly calendar.

Second, in the third section, he wanted readers to experience "hardship" by making them wrap their eyes around to a second row.

He wanted the chart to be both accurate and eye-catching.

It's a nice attempt that will improve as he fiddles more with it.

***

Based on Howie's ideas, I came up with two sketches myself.

In the first sketch, instead of the arrows, I put numbers into the cells.

Junkcharts_redo_whentoexpectareturncall_1

In the second sketch, I emphasized eye-catching while sacrificing accuracy. It uses a spiral imagery, and I think it does a good job showing the extra pain of a week-long wait. Each trip around the circle represents 24 hours.

Junkcharts_redo_whentoexpectacall_2

The wait time is actually encoded in the angle traversed, rather than the length of the spiral. I call this creation less than accurate because most readers will assume the spiral's length encodes the wait time, and thus misread the data.
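For the record, the spiral's encoding can be written down: wait time maps to angle traversed, at one full turn per 24 hours. A sketch of the mapping (the specific wait times are hypothetical; the poster's actual values may differ):

```python
import math

HOURS_PER_TURN = 24   # one trip around the circle represents 24 hours

def angle_traversed(wait_hours):
    """Total angle in radians swept out by a wait of the given length."""
    return 2 * math.pi * wait_hours / HOURS_PER_TURN

# Hypothetical wait times: one day, three days, one week.
for label, hours in [("1 day", 24), ("3 days", 72), ("1 week", 168)]:
    turns = angle_traversed(hours) / (2 * math.pi)
    print(f"{label}: {turns:.0f} turn(s) around the spiral")
```

Because the spiral's radius grows with each turn, arc length grows faster than angle, which is precisely the mismatch that invites misreading.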

Which one(s) do you like?