The Periodic Table, a challenge in information organization

Reader Chris P. points me to this article about the design of the Periodic Table. I then learned that 2019 is the “International Year of the Periodic Table,” according to the United Nations.

Here is the canonical design of the Periodic Table that science students are familiar with.


(Source: Wikipedia.)

The Periodic Table is an exercise of information organization and display. It's about adding structure to over 100 elements, so as to enhance comprehension and lookup. The canonical tabular design has columns and rows. The columns (Groups) impose a primary classification; the rows (Periods) provide a secondary classification. The elements also follow an aggregate order, which is traced by reading from top left to bottom right. The row structure makes clear the "periodicity" of the elements: the "period" of recurrence is not constant, tending to increase with the heavier elements at the bottom.

As with most complex datasets, these elements defy simple organization, due to a curse of dimensionality. The general goal is to put the similar elements closer together. Similarity can be defined in an infinite number of ways, such as chemical, physical or statistical properties. The canonical design, usually attributed to Russian chemist Mendeleev, attained its status because the community accepted his organizing principles, that is, his definitions of similarity (subsequently modified).


Of interest, there is a list of unsettled issues. According to Wikipedia, the most common arguments concern:

  • Hydrogen: typically shown as a member of Group 1 (first column), some argue that it doesn’t belong there since it is a gas not a metal. It is sometimes placed in Group 17 (halogens), where it forms a nice “triad” with fluorine and chlorine. Other designers just float hydrogen up top.
  • Helium: typically shown as a member of Group 18 (rightmost column), the  halogens noble gases, it may also be placed in Group 2.
  • Mercury: usually found in Group 12, some argue that it is not a metal like cadmium and zinc.
  • Group 3: other than the first two elements , there are various voices about how to place the other elements in Group 3. In particular, the pairs of lanthanum / actinium and lutetium / lawrencium are sometimes shown in the main table, sometimes shown in the ‘f-orbital’ sub-table usually placed below the main table.


Over the years, there have been numerous attempts to re-design the Periodic table. Some of these are featured in the article that Chris sent me (link).

I checked how these alternative designs deal with those unsettled issues. The short answer is they don't settle the issues.

Wide Table (Janet)

The key change is to remove the separation between the main table and the f-orbital (pink) section shown below, as a "footnote". This change clarifies the periodicity of the elements, especially the elongating periods as one moves down the table. This form is also called "long step".


As a tradeoff, this table requires more space and has an awkward aspect ratio.

In this version of the wide table, the designer chooses to stack lutetium / lawrencium in Group 3 as part of the main table. Other versions place lanthanum / actinium in Group 3 as part of the main table. There are even versions that leave Group 3 with two elements.

Hydrogen, helium and mercury retain their conventional positions.


Spiral Design (Hyde)

There are many attempts at spiral designs. Here is one I found on this tumblr:


The spiral leverages the correspondence between periodic and circular. It is visually more pleasing than a tabular arrangement. But there is a tradeoff. Because of the increasing "diameter" from inner to outer rings, the inner elements are visually constrained compared to the outer ones.

In these spiral diagrams, the designer solves the aspect-ratio problem by creating local loops, sometimes called peninsulas. This is analogous to the footnote table solution, and visually distorts the longer periodicity of the heavier elements.

For Hyde's diagram, hydrogen is floated, helium is assigned to Group 2, and mercury stays in Group 12.



I also found this design on the same tumblr, but unattributed. It may have come from Life magazine.


It's a variant of the spiral. Instead of peninsulas, the designer squeezes the f-orbital section under Group 3, so this is analogous to the wide table solution.

The circular diagrams convey the sense of periodic return but the wide table displays the magnitudes more clearly.

This designer places hydrogen in group 18 forming a triad with fluorine and chlorine. Helium is in Group 17 and mercury in the usual Group 12 .


Cartogram (Sheehan)

This version is different.


The designer chooses a statistical property (abundance) as the primary organizing principle. The key insight is that the lighter elements in the top few rows are generally more abundant - thus more important in a sense. The cartogram reveals a key weakness of the spiral diagrams that draw the reader's attention to the outer (heavier) elements.

Because of the distorted shapes, the cartogram form obscures much of the other data. In terms of the unsettled issues, hydrogen and helium are placed in Groups 1 and 2. Mercury is in Group 12. Group 3 is squeezed inside the main table rather than shown below.



The centerpiece of the article Chris sent me is a network graph.


This is a complete redesign, de-emphasizing the periodicity. It's a result of radically changing the definition of similarity between elements. One barrier when introducing entirely new displays is the tendency of readers to expect the familiar.


I found the following articles useful when researching this post:

The Conversation

Royal Chemistry Society


GDPR: justice for data visualization

Reader LG found the following chart, tweeted by @EU_Justice.


This chart is a part of a larger infographic, which is found here.

The following points out a few issues with this effort:


The time axis is quite embarrassing. The first six months or so are squeezed into less than half the axis while the distance between Nov and Dec is not the same as that between Dec and Jan. So the slope of each line segment is what the designer wants it to be!

The straight edges of the area chart imply that there were only three data points, with straight lines drawn between each measurement. Sadly, the month labels are not aligned to the data on the line.

Lastly, the dots between May and November intended to facilitate reading this chart backfire. There are 6 dots dividing the May-Nov segment when there should only be five.


The chart looks like this when not distorted:



The French takes back cinema but can you see it?

I like independent cinema, and here are three French films that come to mind as I write this post: Delicatessen, The Class (Entre les murs), and 8 Women (8 femmes). 

The French people are taking back cinema. Even though they purchased more tickets to U.S. movies than French movies, the gap has been narrowing in the last two decades. How do I know? It's the subject of this infographic


How do I know? That's not easy to say, given how complicated this infographic is. Here is a zoomed-in view of the top of the chart:



You've got the slice of orange, which doubles as the imagery of a film roll. The chart uses five legend items to explain the two layers of data. The solid donut chart presents the mix of ticket sales by country of origin, comparing U.S. movies, French movies, and "others". Then, there are two thin arcs showing the mix of movies by country of origin. 

The donut chart has an usual feature. Typically, the data are coded in the angles at the donut's center. Here, the data are coded twice: once at the center, and again in the width of the ring. This is a self-defeating feature because it draws even more attention to the area of the donut slices except that the areas are highly distorted. If the ratios of the areas are accurate when all three pieces have the same width, then varying those widths causes the ratios to shift from the correct ones!

The best thing about this chart is found in the little blue star, which adds context to the statistics. The 61% number is unusually high, which demands an explanation. The designer tells us it's due to the popularity of The Lion King.


The one donut is for the year 1994. The infographic actually shows an entire time series from 1994 to 2014.

The design is most unusual. The years 1994, 1999, 2004, 2009, 2014 receive special attention. The in-between years are split into two pairs, shrunk, and placed alternately to the right and left of the highlighted years. So your eyes are asked to zig-zag down the page in order to understand the trend. 

To see the change of U.S. movie ticket sales over time, you have to estimate the sizes of the red-orange donut slices from one pie chart to another. 

Here is an alternative visual design that brings out the two messages in this data: that French movie-goers are increasingly preferring French movies, and that U.S. movies no longer account for the majority of ticket sales.


A long-term linear trend exists for both U.S. and French ticket sales. The "outlier" values are highlighted and explained by the blockbuster that drove them.



1. You can register for the free seminar in Lyon here. To register for live streaming, go here.
2. Thanks Carla Paquet at JMP for help translating from French.

The tech world in which everyone is below average

Laura pointed me to an infographic about tech worker salaries in major tech hubs (link).

What's wrong with this map?


The box "Global average" is doubly false. It is not global, and it is not the average!

The only non-American cities included in this survey are Toronto, Paris and London.

The only city with average salary above the "Global average" is San Francisco Bay Area. Since the Bay Area does not outweigh all other cities combined in the number of tech workers, it is impossible to get an average of $135,000.


Here is the second chart.

What's wrong with these lines?


This chart frustrates the reader's expectations. The reader interprets it as a simple line chart, based on three strong hints:

  • time along the horizontal axis
  • data labels show dollar units
  • lines linking time

Each line seems to show the trend of average tech worker salary, in dollar units.

However, that isn't the designer's intention. Let's zoom in on Chicago and Denver:


The number $112,000 (Denver) sits below the number $107,000 (Chicago). It appears that each chart has its own scale. But that's not the case either.

For a small-multiples setup, we expect all charts should use the same scale. Even though the data labels are absolute dollar amounts, the vertical axis is on a relative scale (percent change). To make things even more complicated, the percent change is computed relative to the minimum of the three annual values, no matter which year it occurs.


That's why $106,000 (Chicago) is at the same level as $112,000 (Denver). Those are the minimum values in the respective time series. As shown above, these line charts are easier to understand if the axis is displayed in its true units of percent change.

The choice of using the minimum value as the reference level interferes with comparing one city to the next. For Chicago, the line chart tells us 2015 is about 2 percent above 2016 while 2017 is 6 percent above. For Denver, the line chart tells us that 2016 is about 2 percent above the 2015 and 2017 values. Now what's the message again?

Here I index all lines to the earliest year.


In a Trifecta Checkup analysis (link), I'd be suspicious of the data. Did tech salaries in London really drop by 15-20 percent in the last three years?



Diverging paths for rich and poor, infographically

Ray Vella (link) asked me to comment on a chart about regional wealth distribution, which I wrote about here. He also asked students in his NYU infographics class to create their own versions.

This effort caught my eye:


This work is creative, and I like the concept of using two staircases to illustrate the diverging fortunes of the two groups. This is worlds away from the original Economist chart.

The infographic does have a serious problem. In one of my dataviz talks, I talk about three qualifications of work called "data visualization." The first qualification is that the data visualization has to display the data. This is an example of an infographic that is invariant to the data.

Is it possible to salvage the concept? I tried. Here is an idea:


I abandoned the time axis so the data plotted are only for 2015, and the countries are shown horizontally from most to least equal. I'm sure there are ways to do it even better.

Infographics can be done while respecting the data. Ray is one of the designers who appreciate this. And thanks Ray for letting me blog about this.




Egregious chart brings back bad memories

My friend Alberto Cairo said it best: if you see bullshit, say "bullshit!"

He was very incensed by this egregious "infographic": (link to his post)


Emily Schuch provided a re-visualization:


The new version provides a much richer story of how Planned Parenthood has shifted priorities over the last few years.

It also exposed what the AUL (American United for Life) organization distorted the story.

The designer extracted only two of the lines, thus readers do not see that the category of services that has really replaced the loss of cancer screening was STI/STD testing and treatment. This is a bit ironic given the other story that has circulated this week - the big jump in STD among Americans (link).

Then, the designer placed the two lines on dual axes, which is a dead giveaway that something awful lies beneath.

Further, this designer dumped the data from intervening years, and drew a straight line from the first to the last year. The straight arrow misleads by pretending that there has been a linear trend, and that it would go on forever.

But the masterstroke is in the treatment of the axes. Let's look at the axes, one at a time:

The horizontal axis: Let me recap. The designer dumped all but the starting and ending years, and drew a straight line between the endpoints. While the data are no longer there, the axis labels are retained. So, our attention is drawn to an area of the chart that is void of data.

The vertical axes: Let me recap. The designer has two series of data with the same units (number of people served) and decided to plot each series on a different scale with dual axes. But readers are not supposed to notice the scales, so they do not show up on the chart.

To summarize, where there are no data, we have a set of functionless labels; where labels are needed to differentiate the scales, we have no axes.


This is a tried-and-true tactic employed by propagandists. The egregious chart brings back some bad memories.

Here is a long-ago post on dual axes.

Here is Thomas Friedman's use of the same trick.

Bewildering baseball math

Over Twitter, someone asked me about this chart:


It's called the MLB pipeline. The text at the top helpfully tells us what the chart is about: how the playoff teams in baseball are built. That's the good part.

It then took me half a day to understand what is going on below. There are four ways for a player to be on a team: homegrown, trades and free agents, wherein homegrown includes drafted players or international players.

Each row is a type of player. You can look up which teams have exactly X players of a specific type. It gets harder if you want to know how many players team Y has of a given type. It is even harder if you don't know the logos of every team (e.g. Toronto Blue Jays).

Some fishy business is going on with the threesomes and foursomes. Here is the red threesome:


Didn't know baseball employs half a player. The green section has a different way to play threesomes:


The blue section takes inspiration from both and shows us a foursome:


I was stuck literally in the middle for quite a while:


Eventually, I realized that this is a summary of the first two sections on the page. I still don't understand why there is no gap between 11 and 14 but then the 14 and 15 arrows are twice as large as 9, 10 and 11 even though every arrow contains exactly one team.


The biggest problem in the above chart is the hidden base: each team's roster has a total of 25 players.

Here is a different view of the data:


With this chart, I want to emphasize two points: first, addressing the most interesting question of which team(s) emphasize which particular player acquisition tactic; second, providing the proper reference level to interpret the data.

Regarding the vertical, reference lines: take the top left chart about players arriving through trade. If every team equally emphasizes this tactic, then each team should have the same number of traded players on the 25-person roster. This would mean every team has approximately 11 traded players. This is clearly not the case. Several teams, especially Cubs and Blue Jays, utilized trades more often than teams like Mets and Royals.



Shortchanging and subverting the message

Reader Michael N. calls this an "unusual" marketing bar chart--because the designer distorted the data in a way that weakens, rather than strengthen, the story!


The infographic is pitching savings if the family switches to Republic. The savings is about 70% off and yet the height of the $40 bar is more than 50% of the $150 bar. 

The entire infographic is a case of misplaced emphases. (Click here to see original.)

Ranked by size of font from largest to smallest, this poster gives us the following information:

Average cellphone bill for a family of four

Penetration rate of Wifi

Price comparison between average plan and Republic Wireless plan

Median national household income

Cellphone bill growth versus inflation rate

Cost of wireless data split from the bill

Growth of global Wifi hotspots

Actual amount of wireless data used by cellphone users


The intended message is families are paying for a lot of unused wireless data, and Republic Wireless has a Wifi solution to save you 70% of their bill.