« September 2021 | Main | November 2021 »

Surging gas prices

A reader finds this chart hard to parse:


The chart shows the trend in gas prices in New York in the past two years.

This is a case in which the simple line chart works very well.


I added annotations as the reasons behind the decline and rise in prices are reasonably clear. 

One should be careful when formatting dates. The legend of the original chart looks like this:


In the U.S., dates typically use a M/D/Y format. The above dates are ambiguous. "Aug 19" can be August 19th or August, xx19.

Asymmetry and orientation

An author in Significance claims that a single season of Premier League football without live spectators is enough to prove that the so-called home field advantage is really a live-spectator advantage.

The following chart depicts the data going back many seasons:


I find this bar chart challenging.

It plots the ratio of home wins to away wins using an odds scale, which is not intuitive. The odds scale (probability of success divided by probability of failure) runs from 0 to positive infinity, with 1 being a special value indicating equal odds. But all the values for which away wins exceed home wins are squeezed into the interval between 0 and 1 while the values for which home wins exceed away wins are laid out between 1 and infinity. So it's an inherently asymmetric graphic for a symmetric formula.

The section labeled "more away wins than home wins" are filled with red bars for all those seasons with positive home field advantage while the most recent season, the outlier, has a shorter bar in that section than the rest.

Here's an alternative view:


I have incorporated dual axes here - but both axes are different only by scaling. There are 380 games in a Premier League season so the percentage scale is just a re-expression of the counts.



The gift of small edits and subtraction

While making the chart on fertility rates (link), I came across a problem that pops up quite often, and is  ignored by most software programs.

Here is an earlier version of the chart I later discarded:


Compare this to the version I published in the blog post:


Aside from adding the chart title, there is one major change. I removed the empty plots from the grid. This is a visualization trick that should be called adding by subtracting. The empty scaffolding on the first chart increases our cognitive load without yielding any benefit. The whitespace brings out the message that only countries in Asia and Africa have fertility rates above 5.0. 

This is a small edit. But small edits accumulate and deliver a big impact. Bear this in mind the next time you make a chart.



(1) You'd have to use a lower-level coding language to execute this small edit. Most software programs are quite rigid when it comes to making small-multiples (facet) charts.

(2) If there is a next iteration, I'd reverse the Asia and Oceania rows.


Visualizing fertility rates around the globe

The following chart dropped on my Twitter feed.


It's an ambitious chart that tries to do a lot. The underlying data set contains fertility rate data from over 200 countries over 20 years.

The basic chart form is a column chart that is curled up into a ball. The column chart is given colors that map to continents. All countries are grouped into five continents. The column chart can only take a single data series, so the 2019 fertility rate is chosen.

Beyond this basic setup, the designer embellishes the chart with a trove of information. Here's a close up:


The first number is the 2019 fertility rate, which means all the data encoded into the columns are also printed on the chart itself. Then, the flag of each country forms the next ring. Then, the name of the country. Finally, in brackets, the percent change in fertility rate between 2000 and 2019.

That is not all. Some contextual information are injected in those arrows that connect the columns to the data labels. A green arrow indicates that the fertility rate is trending lower - which is the case in most countries around the world. Once in a while, a purple arrow pops up. In the above excerpt, Seychelles gets a purple arrow because this island nation has increased the fertility rate from 2000 to 2019.

Also hiding in the background are several dashed rings. I think only the one that partially overlaps with the column chart contains any information - the other rings are inserted for an artistic reason. To decipher this dashed ring, we must look at the inset in the top left corner. We learn that the value of 2.1 children per woman is known as the replacement fertility rate. So it's also possible to assess whether each country is above or below the replacement fertility rate threshold.


[I'm presuming that this replacement threshold is about the births necessary to avoid a population decline. If that's the case, then comparing each country's fertility rate to a global fertility rate threshold is too simplistic because fertility is only one of several key factors driving a country's population growth. A more sophisticated model should generate country-level thresholds.]


Data graphics serve many functions. This chart works well as an embellished data table. It does take some time to find a specific country because the columns have been sorted by decreasing 2019 fertility rate but once we locate the column, all the other data fields are clearly laid out.

As a generator of data insights, this chart is less effective. The main insight I obtained from it is a rough ranking of continents, with African countries predominantly having higher fertility rates, followed by Asia and Oceania, then Americas, and finally, Europe which has the lowest fertility rates. If this is the key message, a standard choropleth map brings it out more directly.


Here is a small-multiples rendering of the fertility dataset. I chose 1999 values instead of 2000 to make a complete two-decade view.


The columns represent a grouping of countries based on their 1999 fertility rates. The left column contains countries with the lowest number of births per woman, and the fertility rate increases left to right - both within an individual plot and in the grid.

If you're wondering, the hidden vertical axis sorts the countries by their 1999 rank. The lighter colors are 1999 values while the darker colors are 2019 values. For most countries the dots are shifting left over the 20 years. There are some exceptions. I have labeled several of these exceptions (e.g. Kazakhstan and Mongolia), and rendered them in italic.