Comparing chance of death of coronavirus and flu

The COVID-19 charts are proving one thing. When the topic of a dataviz is timely and impactful, readers will study the graphics and ask questions. I've been sent some of these charts lately, and will be featuring them here.

A former student saw this chart from Business Insider (link) and didn't like it.

Businesinsider_coronavirus_flu_compare

My initial reaction was generally positive. It's clear the chart addresses a comparison of death rates between the flu and COVID-19, an important current question. The side-by-side panel is effective at allowing such a comparison. The column charts look decent, and there aren't excessive gridlines.

Sure, one sees a few simple design fixes, like removing the vertical axis altogether (since every value in the dataset is already printed on the chart). I'd also un-slant the age labels.

***

I'd like to discuss some subtler improvements.

A primary challenge is dealing with the different definitions of age groups across the two datasets. While the side-by-side column charts prompt readers to go left-right, right-left in comparing death rates, it's not easy to identify which column to compare to which. This is not fixable in the datasets because the organizations that compile them define their own age groups.

Also, I prefer to superimpose the death rates on the same chart, using something like a dot plot rather than a column chart. This makes the comparison even easier.

Here is a revised visualization:

Redo_businessinsider_covid19fatalitybyage

The contents of this chart raise several challenges for public health officials. Clearly, hospital resources should be preferentially offered to older patients. But young people could be spreading the virus in the community.

Caution is advised: the COVID-19 data suffer from many types of inaccuracies, as outlined here.


Gazing at petals

Reader Murphy pointed me to the following infographic developed by Altmetric to explain their analytics of citations of journal papers. These metrics are "alternative" in that they arise from non-academic media sources, such as news outlets, blogs, Twitter, and Reddit.

The key graphic is the petal diagram with a number in the middle.

Altmetric_tetanus

I have a hard time thinking of this object as “data visualization”. Data visualization should visualize the data. Here, the connection between the data and the visual design is tenuous.

There are eight petals arranged around the circle. The legend below the diagram maps the color of each petal to a source of data. Red, for example, represents mentions in news outlets, and green represents mentions in videos.

Each petal is the same size, even though the counts given below differ. So, the petals function as a duplicate legend.

The order of the colors around the circle does not match their order in the table below, for a mysterious reason.

Then comes another puzzle. The bluish-gray petal appears three times in the diagram. This color is mapped to tweets. Does the number of petals represent the much higher counts of tweets compared to other mentions?

To confirm, I pulled up the graphic for a different paper.

Altmetric_worldwidedeclineofentomofauna

Here, each petal has a different color. Eight petals, eight colors. The count of tweets is still much larger than the frequencies of the other sources. So, the rule of construction appears to be one petal for each relevant data source, and if the total number of data sources falls below eight, then let Twitter claim all the unclaimed petals.

A third sample paper confirms this rule:

Altmetric_dnananodevices

None of the places we were hoping to find data – size of petals, color of petals, number of petals – actually contain any data. Anything the reader wants to learn can be directly read. The “score” that reflects the aggregate “importance” of the corresponding paper is found at the center of the circle. The legend provides the raw data.
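To make the inferred construction rule concrete, here is a minimal Python sketch of it. It is an assumption reverse-engineered from the three examples above, not Altmetric's published logic: the diagram always has eight petals, every source with at least one mention gets one petal, and Twitter absorbs any leftover petals. The function name and source labels are hypothetical. Notice that the counts matter only through whether they are zero, which is why the petals carry no data.

```python
from collections import Counter

N_PETALS = 8  # assumption: every diagram observed so far has eight petals


def assign_petals(mention_counts: dict[str, int]) -> Counter:
    """Return how many petals each data source receives (inferred rule)."""
    sources = [s for s, n in mention_counts.items() if n > 0]
    if len(sources) > N_PETALS:
        # Not observed in the examples, so we make no guess.
        raise ValueError("rule not observed for more than eight sources")
    petals = Counter({s: 1 for s in sources})  # one petal per active source
    petals["twitter"] += N_PETALS - len(sources)  # Twitter takes the rest
    return petals


# Hypothetical counts: six active sources, so Twitter ends up with 3 petals.
print(assign_petals({"news": 12, "blogs": 3, "twitter": 250,
                     "facebook": 2, "reddit": 1, "video": 1}))
```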

***

Some years ago, one of my NYU students worked on a project relating to paper citations. He eventually presented the work at a conference. I featured it previously.

Michaelbales_citationimpact

Notice how the visual design provides context for interpretation – by placing each paper/researcher among its peers, and by using a relative scale (percentiles).
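The percentile scale is what turns a raw citation count into something interpretable. Below is a minimal sketch of one common percentile-rank definition (share of peers below, with ties counted as half). Whether the student's project used exactly this definition is an assumption, and the peer counts are hypothetical.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(value: float, peers: list[float]) -> float:
    """Percent of peers scoring below `value`, counting ties as half."""
    if not peers:
        raise ValueError("need at least one peer")
    ordered = sorted(peers)
    below = bisect_left(ordered, value)          # peers strictly below value
    ties = bisect_right(ordered, value) - below  # peers equal to value
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical citation counts for two fields: the same raw count of 40
# sits high among one set of peers and low among the other.
field_a = [5, 10, 20, 40, 80]
field_b = [40, 60, 90, 150, 300]
print(percentile_rank(40, field_a))  # 70.0
print(percentile_rank(40, field_b))  # 10.0
```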

***

I’m ignoring the D corner of the Trifecta Checkup in this post. For any visualization to be meaningful, the data must be meaningful. The type of counting used by Altmetric treats every tweet, every mention, etc. as a tally, making everything worth the same. A mention on CNN counts as much as a mention by a pseudonymous redditor. A pan is the same as a rave. Let’s not forget the fake data menace (link), which affects all performance metrics.


How to read this cost-benefit chart, and why it is so confusing

Long-time reader Antonio R. found today's chart hard to follow, and he isn't alone. It took two of us multiple emails and some Web searching before we think we "got it".

Ar_submit_Fig-3-2-The-policy-cost-curve-525

 

Antonio first encountered the chart in a book review (link) of Hal Harvey et al., Designing Climate Solutions. It addresses the general topic of costs and benefits of various programs to abate CO2 emissions. The reviewer praised the "wealth of graphics [in the book] which present complex information in visually effective formats." He presented the above chart as evidence, and described its function as:

policy-makers can focus on the areas which make the most difference in emissions, while also being mindful of the cost issues that can be so important in getting political buy-in.

(This description is much more informative than the original chart title, which states "The policy cost curve shows the cost-effectiveness and emission reduction potential of different policies.")

Spend a little time with the chart now before you read the discussion below.

Warning: this is a long read but well worth it.

 

***

 

If your experience is anything like ours, scraps of information flew at you from different parts of the chart, and you had a hard time piecing together a story.

What are the reasons why this data graphic is so confusing?

Everyone recognizes that this is a column chart. For a column chart, we interpret the heights of the columns so we look first at the vertical axis. The axis title informs us that the height represents "cost effectiveness" measured in dollars per million metric tons of CO2. In a cost-benefit sense, that appears to mean the cost to society of obtaining the benefit of reducing CO2 by a given amount.

That's how far I went before hitting the first roadblock.

For environmental policies, opponents frequently object to the high price of implementation. For example, we can't have higher fuel efficiency in cars because it would raise the price of gasoline too much. Asking about cost-effectiveness makes sense: a cost-benefit trade-off analysis encapsulates the something-for-something principle. What doesn't fit is that the vertical scale sinks far into negative territory. The chart depicts the majority of the emissions abatement programs as having negative cost effectiveness.

What does it mean to be negatively cost-effective? Does it mean society saves money (makes a profit) while also reducing CO2 emissions? Wouldn't those policies - more than half of the programs shown - be slam dunks? Who can object to programs that improve the environment at no cost?

I tabled that thought, and proceeded to the horizontal axis.

I noticed that this isn't a standard column chart, in which the width of the columns is fixed and uneventful. Here, the widths of the columns vary.

***

In the meantime, my eyes were distracted by the constellation of text labels. The viewing area of this column chart is occupied - at least 50% - by text. These labels tell me that each column represents a program to reduce CO2 emissions.

The dominance of text labels is a feature of this design. For a conventional column chart, the labels are situated below each column. Since the width does not usually carry any data, we tend to keep the columns narrow - Tufte, ever the minimalist, has even advocated reducing columns to vertical lines. That leaves insufficient room for long labels. Have you noticed that government programs carry long titles? It's tough to capture even the outline of a program with fewer than three big words, e.g. "Renewable Portfolio Standard" (what?).

The design solution here is to let the column labels run horizontally. So the graphical element for each program is a vertical column coupled with a horizontal label that invades the territories of the next few programs. Like this:

Redo_fueleconomystandardscars

The horror of this design constraint is fully realized in the following chart, a similar design produced for the state of Oregon (lifted from the Plan Washington webpage listed as a resource below):

Figure 2 oregon greenhouse

In a re-design, horizontal labeling should be a priority.

 

***

Realizing that I had been distracted by the text labels, I went back to the horizontal axis.

This is where I encountered the next roadblock.

The axis title says "Average Annual Emissions Abatement" measured in millions of metric tons. The unit matches the second part of the vertical scale, which is comforting. But how does one reconcile the widths of columns with a continuous scale? I was expecting each program to have a projected annual abatement benefit, and those would fall as dots on a line, like this:

Redo_abatement_benefit_dotplot

Instead, we have line segments sitting on a line, like this:

Redo_abatement_benefit_bars_end2end_annuallabel

Think of these bars as the bottom edges of the columns. These line segments can be better compared to each other if structured as a bar chart:

Redo_abatement_benefit_bars

Instead, the design arranges these lines end-to-end.

To unravel this mystery, we go back to the objective of the chart, as announced by the book reviewer. Here it is again:

policy-makers can focus on the areas which make the most difference in emissions, while also being mindful of the cost issues that can be so important in getting political buy-in.

The chart is primarily a decision-making tool for policy-makers who are evaluating programs. Each program has a cost and also a benefit. The cost is shown on the vertical axis and the benefit on the horizontal axis. The decision-maker will select some subset of these programs based on the cost-benefit analysis. That subset of programs will have a projected total benefit (CO2 abatement) and a projected total cost.

By stacking the line segments end to end on top of the horizontal axis, the chart designer elevates the task of computing the total benefits of a subset of programs, relative to the task of learning the benefits of any individual program. Thus, the horizontal axis is better labeled "Cumulative annual emissions abatement".
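The end-to-end layout is simply a running sum of the individual benefits. A minimal Python sketch, with hypothetical abatement figures (not read from the chart), shows how each column's left and right edges on a cumulative axis are derived; the right edges are the values a "Cumulative annual emissions abatement" axis would label.

```python
from itertools import accumulate


def cumulative_edges(abatements: list[float]) -> list[tuple[float, float]]:
    """Left/right x-positions of each column when widths are laid end to end."""
    right = list(accumulate(abatements))  # running total = right edge
    left = [0.0] + right[:-1]             # each column starts where the last ended
    return list(zip(left, right))


# Hypothetical annual abatement of three programs, in chart order.
print(cumulative_edges([50, 120, 90]))
# [(0.0, 50), (50, 170), (170, 260)]
```

Recovering one program's width from this axis means subtracting its left edge from its right edge, which is exactly the step the chart leaves to the reader.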

 

Look at that axis again. Imagine you are required to learn the specific benefit of program titled "Fuel Economy Standards: Cars & SUVs".  

Redo_abatement_benefit_bars_end2end_cumlabel

This is impossible to do without pulling out a ruler and a calculator. What the axis labels do tell us is that if all the programs to the left of Fuel Economy Standards: Cars & SUVs were adopted, the cumulative benefits would be 285 million metric tons of CO2 per year. And if Fuel Economy Standards: Cars & SUVs were also implemented, the cumulative benefits would rise to 375 million metric tons.
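Put as a formula (the notation is introduced here for illustration): if $C_k$ is the cumulative abatement at the right edge of the $k$-th column, the benefit of that program alone is the difference between its two edge values. Assuming 285 and 375 are exactly the left and right edges of Fuel Economy Standards: Cars & SUVs:

```latex
b_k = C_k - C_{k-1}, \qquad
b_{\text{Cars \& SUVs}} = 375 - 285 = 90 \ \text{million metric tons of CO}_2 \text{ per year}
```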

***

At long last, we have arrived at a reasonable interpretation of the cost-benefit chart.

Policy-makers are considering throwing their support behind specific programs aimed at abating CO2 emissions. Different organizations have come up with different ways to achieve this goal. This goal may even have specific benchmarks; the government may have committed to an international agreement, for example, to reduce emissions by some set amount by 2030. Each candidate abatement program is evaluated on both cost and benefit dimensions. Benefit is given by the amount of CO2 abated. Cost is measured as a "marginal cost," the amount of dollars required to achieve each million metric ton of abatement.

This "marginal abatement cost curve" aids decision-making. It lines up the programs from the most cost-effective to the least cost-effective. The decision-maker is presumed to prefer a more cost-effective program to a less cost-effective one. The chart answers the following question: for a given subset of programs (selected contiguously from left to right), how much CO2 is abated in total? The answer is read off the horizontal axis.

***

There are still more limitations of the chart design.

  • We can't directly read off the cumulative cost of the selected subset of programs because the vertical axis is not cumulative. The cumulative cost turns out to be the total area of all the columns that correspond to the selected programs. (Area is height x width, which is cost per benefit multiplied by benefit, which leaves us with the cost.) Unfortunately, it takes rulers and calculators to compute this total area.

  • We have presumed that policy-makers will make the go/no-go decision based on cost effectiveness alone. This point of view has already been contradicted. Remember the mystery around negatively cost-effective programs - their existence shows that some programs are stalled even when they reduce emissions in addition to making money!

  • Since many, if not most, programs have negative cost-effectiveness (by the way they measured it), I'd flip the metric over and call it profitability (or return on investment). Doing so removes another barrier to our understanding. With the current cost-effectiveness metric, policy-makers are selecting the "negative" programs before the "positive" programs. It makes more sense to select the "positive" programs before the "negative" ones!
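The area argument in the first bullet can be sketched in a few lines of Python. This is a minimal illustration with hypothetical programs and numbers: programs are sorted left to right by cost effectiveness (most negative first, as on the chart), and each column's area, height times width, is added to a running total of cost. Negative areas are net savings.

```python
from dataclasses import dataclass
from itertools import accumulate


@dataclass
class Program:
    name: str
    cost_per_unit: float  # column height: dollars per unit abated (negative = saves money)
    abatement: float      # column width: annual abatement


def cost_curve(programs: list[Program]) -> list[tuple[str, float, float]]:
    """Sort by cost effectiveness; return (name, cumulative abatement, cumulative cost)."""
    ordered = sorted(programs, key=lambda p: p.cost_per_unit)
    cum_abatement = accumulate(p.abatement for p in ordered)
    # Each column's area (height x width) is that program's total cost.
    cum_cost = accumulate(p.cost_per_unit * p.abatement for p in ordered)
    return [(p.name, a, c) for p, a, c in zip(ordered, cum_abatement, cum_cost)]


# Hypothetical programs, not taken from the chart.
for row in cost_curve([Program("A", -50, 20), Program("B", 10, 30), Program("C", -20, 10)]):
    print(row)
```

With these made-up numbers the running cost first falls (the money-saving programs) and then rises, which is the pattern the flipped "profitability" framing would make easier to read.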

***

In a Trifecta Checkup (guide), I rate this chart Type V. The chart has a great purpose, and the design reveals a keen sense of the decision-making process. It's not a data dump for sure. In addition, an impressive amount of data gathering and analysis - and synthesis - went into preparing the two data series required to construct the chart. (Sure, for something so subjective and speculative, the analysis methodology will inevitably be challenged by wonks.) Those two data series are reasonable measures for the stated purpose of the chart.

The chart form, though, has various shortcomings, as shown here.  

***

In our email exchange, Antonio and I found the Plan Washington website useful. This is where we learned that this chart is called the marginal abatement cost curve.

Also, the consulting firm McKinsey is responsible for popularizing this chart form. They have published this long report that explains even more of the analysis behind constructing this chart, for those who want further details.


Choosing between individuals and aggregates

Friend/reader Thomas B. alerted me to this paper that describes some of the key chart forms used by cancer researchers.

It strikes me that many of the "new" charts plot granular data at the individual level. This heatmap of gene expression shows one column per patient:

Jnci_genemap

This so-called swimmer plot shows one bar per patient:

Jnci_swimlanes

This spider plot shows the progression of individual patients over time. Key events are marked with symbols.

Jnci_spaghetti

These chart forms are distinguished from other ones that plot aggregated statistics: statistical averages, medians, subgroup averages, and so on.

One obvious limitation of such charts is their lack of scalability. More patients, a more variable metric, and more spread in the timing of trends all add to the messiness.

I am left wondering what Question is being addressed by these plots. If we are concerned about treatment of an individual patient, then showing each line by itself would be clearer. If we are interested in the average trends of patients, then a chart that plots the overall average, or subgroup averages, would be more accurate. If the interpretation of the individual's trend requires comparing with similar patients, then showing that individual's line against the subgroup average would be preferred.

When shown these charts of individual lines, readers are tempted to play the statistician - without using appropriate tools! Readers draw aggregate conclusions, performing the aggregation in their heads.

The authors of the paper note: "Spider plots only provide good visual qualitative assessment but do not allow for formal statistical inference." I agree with the second part. The first part is a fallacy - if the visual qualitative assessment is good enough, then no formal inference is necessary! The same argument is often made when people say they don't need advanced analysis because their simple analysis is "directionally accurate". When is something "directionally inaccurate"? How would one know?

Reference: Chia, Gedye, et al., "Current and Evolving Methods to Visualize Biological Data in Cancer Research", JNCI, 2016, 108(8). (link)

***

Meteorologists, whom I featured in the previous post, also have their own spider-like chart for hurricanes. They call it a spaghetti map:

Dorian_spaghetti

Compare this to the "cone of uncertainty" map that was featured in the prior post:

AL052019_5day_cone_with_line_and_wind

These two charts build upon the same dataset. The cone map, as we discussed, shows the range of probable paths of the storm center, based on all simulations of all acceptable models for projection. The spaghetti map shows selected individual simulations. Each line is the most likely trajectory of the storm center as predicted by a single simulation from a single model.

The problem is that each predictive model type has its own historical accuracy (known as "skill"), and so the lines deserve different weights. Further, it's not immediately clear whether all possible lines are drawn, so any reader who draws conclusions about, say, the envelope containing x percent of these lines is likely to be fooled. Eyeballing the "cone" that contains x percent of the lines is not trivial either. We naturally drift toward aggregate statistical conclusions without the benefit of appropriate tools.

Plots of individuals should be used to address the specific problem of assessing individuals.


As Dorian confounds meteorologists, we keep our minds clear on hurricane graphics, and discover correlation as our friend

As Hurricane Dorian threatens the southeastern coast of the U.S., forecasters are fretting about the lack of consensus among the various models used to predict the storm’s trajectory. How to display the uncertainty of these models has been a controversial issue in the visualization community for some time.

Let’s start by reviewing a visual design that has captured meteorologists in recent years, something known as the cone map.

Charley_oldconemap

If asked to explain this map, most of us would read the line through the middle of the cone as the center of the storm, the “cone” as the areas near the storm center that are affected, and the warmer colors (red, orange) as indicating higher levels of impact. [Note: This is the design for this type of map circa the 2000s.]

The above interpretation is complete, and feasible. Nevertheless, the data used to make the map are forward-looking, not historical. It is still possible to stick to the same interpretation by substituting historical measurement of impact with its projection. As such, the “warmer” regions are projected to suffer worse damage from the storm than the “cooler” regions (yellow).

After I restore the text that was removed from the map (see below), you may notice the color legend, which discloses that the colors on the map encode probabilities, not storm intensity. The text further explains that the chart shows the most probable path of the center of the storm, while the coloring shows the probability that the storm center will reach specific areas.

Charley_oldconemap

***

When reading a data graphic, we rarely first look for text about how to read the chart. In the case of the cone map, those who didn’t seek out the instructions may form one of these misunderstandings:

  1. For someone living in the yellow-shaded areas, the map does not say that the impact of the storm is projected to be lighter; it’s that the center of the storm has a lower chance of passing right through. If, however, the storm does pay a visit, the intensity of the winds will reach hurricane grade.
  2. For someone living outside the cone, the map does not say that the storm will definitely bypass you; it’s that the chance of a direct hit is below the threshold needed to show up on the cone map. The threshold is set to achieve 66% accuracy: the actual paths of storms are expected to stay inside the cone two out of three times.
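To make the two-out-of-three idea concrete, here is a minimal sketch of one way such a coverage threshold could be calibrated: take a set of past forecast errors (distance between the predicted and actual storm center) and pick the smallest radius that contains two-thirds of them. This illustrates the coverage idea only; it is not presented as the National Hurricane Center's actual procedure, and the error values and function name are hypothetical.

```python
import math
from fractions import Fraction


def coverage_radius(errors_km: list[float], coverage=Fraction(2, 3)) -> float:
    """Smallest radius that contains at least `coverage` of the past errors."""
    if not errors_km:
        raise ValueError("need at least one past error")
    ordered = sorted(errors_km)
    k = math.ceil(coverage * len(ordered))  # how many errors must fit inside
    return ordered[k - 1]


# Hypothetical distances (km) between forecast and actual storm centers.
past_errors = [10, 50, 20, 40, 30, 60]
radius = coverage_radius(past_errors)
inside = sum(e <= radius for e in past_errors)
print(radius, inside, len(past_errors))  # 40 4 6
```

By construction, a third of the past storms fell outside the radius, which is why being outside the cone is not a guarantee of safety.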

Adding to the confusion, other designers have produced cone maps in which color is encoding projections of wind speeds. Here is the one for Dorian.

AL052019_wind_probs_64_F120

This map displays essentially what we thought the first cone map was showing.

One way to differentiate the two maps is to roll time forward, and imagine what the maps should look like after the storm has passed through. In the wind-speed map (shown below), we will see a cone of damage, with warmer colors indicating regions that experienced stronger winds.

Projectedactualwinds_irma

In the storm-center map (below), we should see a single curve, showing the exact trajectory of the center of the storm. In other words, the cone of uncertainty dissipates over time, just like the storm itself.

Projectedactualstormcenter_irma

 

After scientists learned that readers were misinterpreting the cone maps, they started to issue warnings, and also re-designed the cone map. The cone map now comes with a black-box health warning right up top. Also, in the storm-center cone map, color is no longer used. The National Hurricane Center even made a YouTube video pointing out the dos and don'ts of using the cone map.

AL052019_5day_cone_with_line_and_wind

***

The conclusion drawn from misreading the cone map isn’t as devastating as it’s made out to be. This is because the two issues are correlated. Since wind speeds are likely to be stronger nearer to the center of the storm, if one lives in a region that has a low chance of being a direct hit, then that region is also likely to experience lower average wind speeds than those nearer to the projected center of the storm’s path.

Alberto Cairo has written often about these maps, and in his upcoming book, How Charts Lie, there is a nice section addressing his work with colleagues at the University of Miami on improving public understanding of these hurricane graphics. I highly recommended Cairo’s book here.

P.S. [9/5/2019] Alberto also put out a post about the hurricane cone map.

 

 

 


Too much of a good thing

Several of us discussed this data visualization over Twitter last week. The dataviz by Aero Data Lab is called “A Bird’s Eye View of Pharmaceutical Research and Development”. There is a separate discussion on STAT News.

Here is the top section of the chart:

Aerodatalab_research_top

We faced a number of hurdles in understanding this chart as there is so much going on. The size of the shapes is perhaps the first thing readers notice, followed by where the shapes are located along the horizontal (time) axis. After that, readers may see the color of the shapes, and finally, the different shapes (circles, triangles,...).

It would help to have a legend explaining the sizes, shapes and colors; instead, these are explained only within the text. The size encodes the number of test subjects in the clinical trials. The color encodes pharmaceutical companies, of which the graphic focuses on 10 major ones. Circles represent completed trials, crosses inside circles represent terminated trials, triangles represent trials that are still active and recruiting, and squares represent other statuses.

The vertical axis presents another challenge. It shows the disease conditions being investigated. As a lay-person, I cannot comprehend the logic of the order. With over 800 conditions, it became impossible to find a particular condition. The search function on my browser skipped over the entire graphic. I believe the order is based on some established taxonomy.

***

In creating the alternative shown below, I stayed close to the original intent of the dataviz, retaining all the dimensions of the dataset. Instead of the fancy dot plot, I used an enhanced data table. The encoding methods reflect what I’d like my readers to notice first. The color shading reflects the size of each clinical trial. The pharmaceutical companies are represented by their first initials. The status of the trial is shown by a dot, a cross or a square.

Here is a sketch of this concept showing just the top 10 rows.

Redo_aero_pharmard

Certain conditions attracted much more investment. Certain pharmas are placing bets on cures for certain conditions. For example, Novartis is heavily into research on "Meningitis, meningococcal" while GSK has spent quite a bit on researching "bacterial infections."


It's hot even in Alaska

A Twitter user pointed to the following chart, which shows that Alaska has experienced extreme heat this summer, with the July statewide average temperature shattering the previous record:

Alaskaheat

This column chart is clear in its primary message: the red column shows that the average temperature this year is quite a bit higher than the next highest temperature, recorded in July 2004. The error bar is useful for statistically literate readers; the uncertainty is (presumably) due to measurement errors. (If a similar error bar were drawn for the July 2004 column, the two bars would probably overlap a bit.)

The chart violates one of the rules of making column charts - the vertical axis is truncated at 53F, thus the heights or areas of the columns shouldn't be compared. This violation was recently nominated by two dataviz bloggers when asked about "bad charts" (see here).

Now look at the horizontal axis. These are the years of the top 20 temperature records, ordered from highest to lowest. The months are almost always July except for the year 2004 when all three summer months entered the top 20. I find it hard to make sense of these dates when they are jumping around.

In the following version, I plotted the 20 temperatures on a chronological axis. Color is used to divide the 20 data points into four groups. The chart is meant to be read top to bottom. 

Redo_junkcharts_alaska_heat

 


Tightening the bond between the message and the visual: hello stats-cats

The editors of ASA's Amstat News certainly got my attention, in a recent article on school counselling. A research team asked two questions. The first was HOW ARE YOU FELINE?

Stats and cats. The pun got my attention and presumably also made others stop and wonder. The second question was HOW DO YOU REMEMBER FEELING while you were taking a college statistics course? Well, it's hard to imagine the average response to that question would be positive.

What also drew me to the article was this pair of charts:

Counselors_Figure1small

Surely, ASA can do better. (I'm happy to volunteer my time!)

Rotate the chart, clean up the colors, remove the decimals, put the chart titles up top, etc.

***

The above remedies fall into the V corner of my Trifecta Checkup.

Trifectacheckup_junkcharts_image

The key to fixing this chart is to tighten the bond between the message and the visual. This means strengthening the green link between the Q and V corners.

This much became clear after reading the article. The following paragraphs are central to the research (bolding is mine):

Responses indicated the majority of school counselors recalled experiences of studying statistics in college that they described with words associated with more unpleasant affect (i.e., alarm, anger, distress, fear, misery, gloom, depression, sadness, and tiredness; n = 93; 66%). By contrast, a majority of counselors reported same-day (i.e., current) emotions that appeared to be associated with more pleasant affect (i.e., pleasure, happiness, excitement, astonishment, sleepiness, satisfaction, and calm; n = 123; 88%).

Both recalled emotive experiences and current emotional states appeared approximately balanced on dimensions of arousal: recalled experiences associated with lower arousal (i.e., pleasure, misery, gloom, depression, sadness, tiredness, sleepiness, satisfaction, and calm, n = 65, 46%); recalled experiences associated with higher arousal (i.e., happiness, excitement, astonishment, alarm, anger, distress, fear, n = 70, 50%); current emotions associated with lower arousal (n = 60, 43%); current experiences associated with higher arousal (i.e., n = 79, 56%).

These paragraphs convey two crucial pieces of information: the structure of the analysis, and its insights.

The two survey questions measure two states of experiences, described as current versus recalled. Then the individual affects (of which there were 16 plus an option of "other") are scored on two dimensions, pleasure and arousal. Each affect maps to high or low pleasure, and separately to high or low arousal.
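The scoring scheme can be written down as a lookup table. In this minimal Python sketch the pleasure and arousal codes are transcribed from the two quoted paragraphs, while the function names and the sample responses are hypothetical. The "other" option is rejected, because the quoted text does not say how it was coded.

```python
# Coding of the 16 affects, transcribed from the article's quoted paragraphs.
LOW_PLEASURE = {"alarm", "anger", "distress", "fear", "misery", "gloom",
                "depression", "sadness", "tiredness"}
LOW_AROUSAL = {"pleasure", "misery", "gloom", "depression", "sadness",
               "tiredness", "sleepiness", "satisfaction", "calm"}
AFFECTS = LOW_PLEASURE | LOW_AROUSAL | {"happiness", "excitement", "astonishment"}


def code_affect(affect: str) -> tuple[str, str]:
    """Return (pleasure, arousal) levels, each 'low' or 'high'."""
    if affect not in AFFECTS:
        raise ValueError(f"uncoded affect: {affect}")
    pleasure = "low" if affect in LOW_PLEASURE else "high"
    arousal = "low" if affect in LOW_AROUSAL else "high"
    return pleasure, arousal


def dimension_shares(responses: list[str]) -> dict[str, float]:
    """Share of responses scored high on each dimension."""
    coded = [code_affect(a) for a in responses]
    n = len(coded)
    return {
        "high_pleasure": sum(p == "high" for p, _ in coded) / n,
        "high_arousal": sum(a == "high" for _, a in coded) / n,
    }


print(code_affect("distress"))  # ('low', 'high')
print(dimension_shares(["calm", "fear", "happiness", "gloom"]))  # hypothetical responses
```

Running the recalled and current responses separately through `dimension_shares` yields the two pairs of aggregates that the research insight compares.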

The research insight is that current experience was noticeably higher than recalled experience on the pleasure dimension but both experiences were similar on the arousal dimension.

Any visualization of this research must bring out this insight.

***

Here is an attempt to illustrate those paragraphs:

Redo_junkcharts_amstat_feline

The primary conclusion can be read from the four simple pie charts in the middle of the page. The color scheme shines light on which affects are coded as high or low for each dimension. For example, "distressed" is scored as showing low pleasure and high arousal.

A successful data visualization for this situation has to bring out the conclusion drawn at the aggregated level, while explaining the connection between individual affects and their aggregates.


Inspiration from a waterfall of pie charts: illustrating hierarchies

Reader Antonio R. forwarded a tweet about the following "waterfall of pie charts" to me:

Water-stats-pie21

Maarten Lamberts loved these charts (source: here).

I am immediately attracted to the visual thinking behind this chart. The data are presented in a hierarchy with three levels. The levels are nested in the sense that the pieces in each pie chart add up to 100%. From the first level to the second, the category of freshwater is sub-divided into three parts. From the second level to the third, the "others" subgroup under freshwater is sub-divided into five further categories.

The designer faces a twofold challenge: presenting the proportions at each level, and integrating the three levels into one graphic. The second challenge is harder to master.

The solution here is quite ingenious. A waterfall/waterdrop metaphor is used to link each layer to the one below. It visually conveys the hierarchical structure.

***

There remains one small problem: a confusion between the part and the whole. The link between levels should be that one part of the upper level becomes the whole of the lower level. Because of the color scheme, it appears that the part above does not account for the entirety of the pie below. For example, water in lakes is plotted on both the second and third layers while water in soil suddenly enters the diagram at the third level even though it should be part of the "drop" from the second layer.
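The part-to-whole rule can be checked mechanically: within each level the shares must add up to 1, and a slice's share of all water is the product of the shares along its path. A minimal Python sketch follows. The tree layout, the function name, and every number in it are hypothetical, chosen to mirror the structure of the waterfall, not its data.

```python
import math

# Hypothetical shares, NOT the chart's data. A tuple marks a part that is
# broken down further: (share within parent, {child: share within this part}).
HYPOTHETICAL_WATER = {
    "saltwater": 0.97,
    "freshwater": (0.03, {
        "ice": 0.70,
        "groundwater": 0.29,
        "others": (0.01, {"lakes": 0.5, "soil": 0.3, "rivers": 0.2}),
    }),
}


def flatten(tree: dict, scale: float = 1.0, path: str = "") -> dict[str, float]:
    """Convert within-parent shares into shares of the top-level whole."""
    total = sum(v[0] if isinstance(v, tuple) else v for v in tree.values())
    if not math.isclose(total, 1.0):
        raise ValueError(f"shares under {path or 'top level'!r} sum to {total}, not 1")
    out = {}
    for label, value in tree.items():
        here = f"{path}/{label}" if path else label
        if isinstance(value, tuple):
            share, children = value  # this part becomes the next level's whole
            out.update(flatten(children, scale * share, here))
        else:
            out[here] = scale * value
    return out


for name, share in flatten(HYPOTHETICAL_WATER).items():
    print(f"{name}: {share:.5f}")
```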

***

I started playing around with various related forms. I like the concept of linking the layers and want to retain it. Here is one graphic inspired by the waterfall pies from above:

Redo_waterfall_pies

 


The Periodic Table, a challenge in information organization

Reader Chris P. points me to this article about the design of the Periodic Table. I then learned that 2019 is the “International Year of the Periodic Table,” according to the United Nations.

Here is the canonical design of the Periodic Table that science students are familiar with.

Wiki-Simple_Periodic_Table_Chart-en.svg

(Source: Wikipedia.)

The Periodic Table is an exercise in information organization and display. It adds structure to over 100 elements so as to aid comprehension and lookup. The canonical tabular design has columns and rows. The columns (Groups) impose a primary classification; the rows (Periods) provide a secondary classification. The elements also follow an overall order, by atomic number, which is traced by reading from top left to bottom right. The row structure makes clear the "periodicity" of the elements: the "period" of recurrence is not constant but tends to increase for the heavier elements at the bottom.
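The uneven period lengths can be made concrete with a short Python sketch. It assumes the standard seven-period table through oganesson (atomic number 118); `PERIOD_ENDS`, `period_lengths` and `period_of` are hypothetical names used for illustration.

```python
# Atomic number of the last element in each of the seven periods
# (standard table through oganesson, Z = 118).
PERIOD_ENDS = [2, 10, 18, 36, 54, 86, 118]

def period_lengths():
    """Number of elements in each period, i.e. the 'period' of recurrence."""
    starts = [0] + PERIOD_ENDS[:-1]
    return [end - start for start, end in zip(starts, PERIOD_ENDS)]

def period_of(z):
    """Row (Period) of the element with atomic number z.

    Elements are ordered by atomic number, so the row is the first period
    whose last element has an atomic number >= z.
    """
    if not 1 <= z <= PERIOD_ENDS[-1]:
        raise ValueError(f"atomic number out of range: {z}")
    for period, end in enumerate(PERIOD_ENDS, start=1):
        if z <= end:
            return period

print(period_lengths())  # [2, 8, 8, 18, 18, 32, 32]
print(period_of(26))     # 4 (iron)
```

The list returned by `period_lengths()` shows the period of recurrence growing from 2 to 32 as one moves down the table.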

As with most complex datasets, these elements defy simple organization, due to a curse of dimensionality. The general goal is to put similar elements closer together. Similarity can be defined in countless ways, using chemical, physical or statistical properties. The canonical design, usually attributed to the Russian chemist Dmitri Mendeleev, attained its status because the community accepted his organizing principles, that is, his definitions of similarity (subsequently modified).

***

Interestingly, several issues remain unsettled. According to Wikipedia, the most common arguments concern:

  • Hydrogen: typically shown as a member of Group 1 (first column). Some argue that it doesn’t belong there since it is a gas, not a metal. It is sometimes placed in Group 17 (halogens), where it forms a nice “triad” with fluorine and chlorine. Other designers just float hydrogen up top.
  • Helium: typically shown as a member of Group 18 (rightmost column), the noble gases. It may also be placed in Group 2.
  • Mercury: usually found in Group 12. Some argue that it is not a metal like cadmium and zinc.
  • Group 3: beyond the first two elements (scandium and yttrium), opinions differ on how to place the other elements in Group 3. In particular, the pairs lanthanum / actinium and lutetium / lawrencium are sometimes shown in the main table and sometimes in the ‘f-orbital’ sub-table usually placed below the main table.

***

Over the years, there have been numerous attempts to redesign the Periodic Table. Some of these are featured in the article that Chris sent me (link).

I checked how these alternative designs deal with those unsettled issues. The short answer is they don't settle the issues.

Wide Table (Janet)

The key change is to remove the separation between the main table and the f-orbital (pink) section, which the canonical design places below as a "footnote". This change clarifies the periodicity of the elements, especially the elongating periods as one moves down the table. This form is also called "long step".

Mg32190402_long_conventional

As a tradeoff, this table requires more space and has an awkward aspect ratio.

In this version of the wide table, the designer chooses to stack lutetium / lawrencium in Group 3 as part of the main table. Other versions place lanthanum / actinium in Group 3 as part of the main table. There are even versions that leave Group 3 with two elements.

Hydrogen, helium and mercury retain their conventional positions.
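The wide table's layout rule, including the Group 3 choice, can be sketched in Python as a mapping from atomic number to one of 32 columns. This is an illustration under stated assumptions, not any designer's published method: the function names are hypothetical, hydrogen and helium sit in their conventional positions, and the `group3` parameter switches between putting lutetium / lawrencium (`"Lu"`) or lanthanum / actinium (`"La"`) in Group 3 of the main table.

```python
# Atomic number of the last element in each of the seven periods.
PERIOD_ENDS = [2, 10, 18, 36, 54, 86, 118]

def wide_column(z, group3="Lu"):
    """Column (1-32) of element z in a 32-column wide table (illustrative rule).

    group3="Lu": lutetium / lawrencium sit in Group 3 of the main table.
    group3="La": lanthanum / actinium sit there instead.
    Hydrogen and helium keep their conventional positions (Groups 1 and 18).
    """
    if group3 not in ("Lu", "La"):
        raise ValueError("group3 must be 'Lu' or 'La'")
    if not 1 <= z <= PERIOD_ENDS[-1]:
        raise ValueError(f"atomic number out of range: {z}")
    period = next(p for p, end in enumerate(PERIOD_ENDS, start=1) if z <= end)
    start = PERIOD_ENDS[period - 2] + 1 if period > 1 else 1
    length = PERIOD_ENDS[period - 1] - start + 1
    k = z - start  # 0-based position within the period

    if period == 1:
        return 1 if k == 0 else 32     # H above Group 1, He above Group 18
    if k < 2:
        return k + 1                   # s-block: Groups 1-2
    if k >= length - 6:
        return 32 - (length - 1 - k)   # p-block: last six columns, Groups 13-18
    if length == 18:
        return 17 + (k - 2)            # periods 4-5: d-block only, Groups 3-12
    # Periods 6-7: 14 f-orbital elements, then 10 d-block elements.
    if group3 == "Lu":
        return k + 1                   # La..Yb -> columns 3-16, Lu..Hg -> 17-26
    if k == 2:
        return 17                      # La (or Ac) moves into Group 3
    return k if k <= 16 else k + 1     # Ce..Lu -> columns 3-16, Hf..Hg -> 18-26

def group_of_column(col):
    """IUPAC group (1-18) for a wide-table column; None for the f-orbital block."""
    if col <= 2:
        return col
    if col <= 16:
        return None
    return col - 14

for z, name in [(1, "H"), (2, "He"), (26, "Fe"), (57, "La"), (71, "Lu"), (80, "Hg")]:
    print(name, group_of_column(wide_column(z)), group_of_column(wide_column(z, group3="La")))
```

With `group3="Lu"`, lutetium (71) lands in Group 3 and lanthanum (57) in the f-orbital block; with `group3="La"` the two swap, and every other element stays in the same column.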

 

Spiral Design (Hyde)

There are many attempts at spiral designs. Here is one I found on this tumblr:

Hyde_periodictable

The spiral leverages the correspondence between periodicity and circularity. It is visually more pleasing than a tabular arrangement, but there is a tradeoff. Because the rings grow in diameter from the inside out, the inner elements are squeezed into less space than the outer ones.

In these spiral diagrams, the designer solves the aspect-ratio problem by creating local loops, sometimes called peninsulas. This is analogous to the footnote table solution, and visually distorts the longer periodicity of the heavier elements.

For Hyde's diagram, hydrogen is floated, helium is assigned to Group 2, and mercury stays in Group 12.

 

Racetrack

I also found this design on the same tumblr, but unattributed. It may have come from Life magazine.

Tumblr_n3tbz5rIKk1s3r80lo3_1280

It's a variant of the spiral. Instead of peninsulas, the designer squeezes the f-orbital section under Group 3, so this is analogous to the wide table solution.

The circular diagrams convey the sense of periodic return, but the wide table displays the magnitudes more clearly.

This designer places hydrogen in Group 17, forming a triad with fluorine and chlorine. Helium is in Group 17 and mercury in the usual Group 12.

 

Cartogram (Sheehan)

This version is different.

Elements_relative_abundance

The designer chooses a statistical property (abundance) as the primary organizing principle. The key insight is that the lighter elements in the top few rows are generally more abundant, and thus more important in a sense. The cartogram reveals a key weakness of the spiral diagrams, which draw the reader's attention to the outer (heavier) elements.

Because of the distorted shapes, the cartogram form obscures much of the other data. In terms of the unsettled issues, hydrogen and helium are placed in Groups 1 and 2. Mercury is in Group 12. Group 3 is squeezed inside the main table rather than shown below.
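Sizing cells by abundance follows the usual cartogram rule that area, not side length, should be proportional to the value. Here is a minimal Python sketch, assuming square cells and non-negative values; the function name `cell_sides` and the toy numbers are hypothetical, and I don't know how Sheehan actually scaled the cells (abundances span many orders of magnitude, so a log transform is a common alternative).

```python
import math

def cell_sides(values, max_side=1.0):
    """Side length of each square cell so that cell *area* is proportional to value.

    The largest value gets a side of `max_side`; every other side scales with
    the square root of its value, because area grows with the square of the side.
    Values are assumed to be non-negative.
    """
    largest = max(values.values())
    if largest <= 0:
        raise ValueError("need at least one positive value")
    return {name: max_side * math.sqrt(v / largest) for name, v in values.items()}

# Toy values, not real abundances: "a" is four times "b",
# so its area is 4x and its side 2x.
print(cell_sides({"a": 100.0, "b": 25.0}))  # {'a': 1.0, 'b': 0.5}
```

Scaling side lengths directly by value would exaggerate the abundant elements, since a cell twice as wide covers four times the area.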

 

Network

The centerpiece of the article Chris sent me is a network graph.

Periodic-bonds_1024

This is a complete redesign, de-emphasizing the periodicity. It's a result of radically changing the definition of similarity between elements. One barrier when introducing entirely new displays is the tendency of readers to expect the familiar.

***

I found the following articles useful when researching this post:

The Conversation

Royal Society of Chemistry