Superb tile map offering multiple avenues for exploration

Here's a beauty by WSJ Graphics:

Wsj_powerproduction

The article is here.

This data graphic illustrates the power of the visual medium. The underlying dataset is complex: power production by type of source by state by month by year. That's more than 90,000 numbers. They all reside on this graphic.

Readers amazingly make sense of all these numbers without much effort.

It starts with the summary chart on top.

Wsj_powerproduction_us_summary

The designer made decisions. The data are presented in relative terms, as proportion of total power production. Only the first and last years are labeled, thus drawing our attention to the long-term trend. The order of the color blocks is carefully selected so that the cleaner sources are listed at the top and the dirtier sources at the bottom. The order of the legend labels mirrors the color blocks in the area chart.

It takes only a few seconds to learn that U.S. power production has largely shifted away from coal with most of it substituted by natural gas. Other than wind, the green sources of power have not gained much ground during these years - in a relative sense.

This summary chart serves as a reading guide for the rest of the chart, which is a tile map of all fifty states. Embedded in the tile map is a small-multiples arrangement.

***

The map offers multiple avenues for exploration.

Some readers may look at specific states. For example, California.

Wsj_powerproduction_california

Currently, about half of the power production in California come from natural gas. Notably, there is no coal at all in any of these years. In addition to wind, solar energy has also gained. All of these insights come without the need for any labels or gridlines!

Wsj_powerproduction_westernstatesBrowsing around California, readers find different patterns in other Western states like Oregon and Washington.

Hydroelectric energy is the dominant source in those two states, with wind gradually taking share.

At this point, readers realize that the summary chart up top hides remarkable state-level variations.

***

There are other paths through the map.

Some readers may scan the whole map, seeking patterns that pop out.

One such pattern is the cluster of states that use coal. In most of these states, the proportion of coal has declined.

Yet another path exists for those interested in specific sources of power.

For example, the trend in nuclear power usage is easily followed by tracking the purple. South Carolina, Illinois and New Hampshire are three states that rely on nuclear for more than half of its power.

Wsj_powerproduction_vermontI wonder what happened in Vermont about 8 years ago.

The chart says they renounced nuclear energy. Here is some history. This one-time event caused a disruption in the time series, unique on the entire map.

***

This work is wonderful. Enjoy it!


To explain or to eliminate, that is the question

Today, I take a look at another project from Ray Vella's class at NYU.

Rich Get Richer Assigment 2 top

(The above image is a honeypot for "smart" algorithms that don't know how to handle image dimensions which don't fit their shadow "requirement". Human beings should proceed to the full image below.)

As explained in this post, the students visualized data about regional average incomes in a selection of countries. It turns out that remarkable differences persist in regional income disparity between countries, almost all of which are more advanced economies.

Rich Get Richer Assigment 2 Danielle Curran_1

The graphic is by Danielle Curran.

I noticed two smart decisions.

First, she came up with a different main metric for gauging regional disparity, landing on a metric that is simple to grasp.

Based on hints given on the chart, I surmised that Danielle computed the change in per-capita income in the richest and poorest regions separately for each country between 2000 and 2015. These regional income growth values are expressed in currency, not indiced. Then, she computed the ratio of these growth rates, for each country. The end result is a simple metric for each country that describes how fast income has been growing in the richest region relative to the poorest region.

One of the challenges of this dataset is the complex indexing scheme (discussed here). Carlos' solution keeps the indices but uses design to facilitate comparisons. Danielle avoids the indices altogether.

The reader is relieved of the need to make comparisons, and so can focus on differences in magnitude. We see clearly that regional disparity is by far the highest in the U.K.

***

The second smart decision Danielle made is organizing the countries into clusters. She took advantage of the horizontal axis which does not encode any data. The branching structure places different clusters of countries along the axis, making it simple to navigate. The locations of these clusters are cleverly aligned to the map below.

***

Danielle's effort is stronger on communications while Carlos' effort provides more information. The key is to understand who your readers are. What proportion of your readers would want to know the values for each country, each region and each year?

***

A couple of suggestions

a) The reference line should be set at 1, not 0, for a ratio scale. The value of 1 happens when the richest region and the poorest region have identical per-capita incomes.

b) The vertical scale should be fixed.


Check your presumptions while you're reading this chart about Israel's vaccination campaign

On July 30, Israel began administering third doses of mRNA vaccines to targeted groups of people. This decision was controversial since there is no science to support it. The policymakers do have educated guesses by experts based on best-available information. By science, I mean actual evidence. Since no one has previously been given three shots, there can be no data on which anyone can root such a decision. Nevertheless, the pandemic does not always give us time to collect relevant data, and so speculative analysis has found its calling.

Dvir Aran, at Technion, has been diligently tracking the situation in Israel on his Twitter. Ten days after July 30, he posted the following chart, which immediately led many commentators to bounce out of their seats crowning the third shot as a magic bullet. Notably, Dvir himself did not endorse such a claim. (See here to learn how other hasty conclusions by experts have fared.)

When you look at Dvir's chart, what do we see?

Dvir_aran_chart

Possibly one of the following two things, depending on what concern you have in your head.

1) The red line sits far above the other two lines, showing that unvaccinated people are much more likely to get infected.

2) The blue line diverges from the green line almost immediately after the 3rd shots started getting into arms, showing that the 3rd shot is super effective.

If you take another moment to look, you might start asking questions, as many in Twitter world did. Dvir was startlingly efficient at answering these queries.

A) Does the green line represent people with 2 or 3 doses, or is it strictly 2 doses? Aron asked this question and got the answer (the former):

AronBrand_israelcases_twoorthreedoses

It's time to check our presumptions. When you read that chart, did you presume it's exactly 2 doses or did you presume it's 2 or 3 doses? Or did you immediately spot the ambiguity? As I said in this article, graphs attain efficiency at communication because the designer leverages unspoken rules - the chart conveys certain information without explicitly placing it on the chart. But this can backfire. In this case, I presumed the three lines to display three non-overlapping groups of people, and thus the green line indicates those with 2 doses but not 3. That presumption led me to misinterpret what's on the chart.

B) What is the denominator of the case rates? Is it literal - by that I mean, all unvaccinated people for the red line, and all people with 3 doses for the blue line? Or is the denominator the population of Israel, the same number for all three lines? Lukas asked this question, and got the answer (the former).

Lukas_denominator

C) Since third shots are recommended for 60 year olds and over who were vaccinated at least 5 months ago, and most unvaccinated Israelis are below 60, this answer opens the possibility that the lines compare apples and oranges. Joe. S. asked about this, and received an answer (all lines display only 60 year olds and over.)

Joescholar_basepopulationquestion

Jason P. asked, and learned that the 5-month-out criterion is immaterial since 90% of the vaccinated have already reached that time point.

JasonPogue_5monthsout

D) We have even more presumptions. Like me, did you presume that the red line represents the "unvaccinated," meaning people who have not had any vaccine shots? If so, we may both be wrong about this. It has become the norm by vaccine researchers to lump "partially vaccinated" people with "unvaccinated", and call this combined group "unvaccinated". Here is an excerpt from a recent report from Public Health Ontario (link to PDF), which clearly states this unintuitive counting rule:

Ontario_case_definition

Notice that in this definition, someone who got infected within 14 days of the first shot is classified as an "unvaccinated" case and not a "partially vaccinated case".

In the following tweet, Dvir gave a hint of what he plotted:

Dvir_group_definition

In a previous analysis, he averaged the rates of people with 0 doses and 1 dose, which is equivalent to combining them and calling them unvaccinated. It's unclear to me what he did to the 1-dose subgroup in our featured chart - did it just vanish from the chart? (How people and cases are classified into these groups is a major factor in all vaccine effectiveness calculations - a topic I covered here. Unfortunately, most published reports do a poor job explaining what the analysts did).

E) Did you presume that all three lines are equally important? That's far from true. Since Israel is the world champion in vaccination, the bulk of the 60+ population form the green line. I asked Dvir and he responded that only 7.5%, or roughly 100K are unvaccinated.

DvirAran_proportionofunvaccinated

That means 1.2 million people are part of the green line, 12 times higher. There are roughly 50 cases per day among unvaccinated, and 370 daily cases among those with 2 or 3 doses. In other words, vaccinated people account for almost 90% of all cases.

Yes, this is inevitable when over 90% of the age group have been vaccinated (but it is predictable on the first day someone blasted everywhere that real-world VE is proved by the fact that almost all new cases were in the unvaccinated.)

If your job is to minimize infections, you should be spending most of your time thinking about the 370 cases among vaccinated than the 50 cases among unvaccinated. If you halve the case rate, that would be a difference of 185 cases vs 25. In Israel, the vaccination campaign has already succeeded; it's time to look forward, which is exactly why they are re-focusing on the already vaccinated.

***

If what you worry about most is the effectiveness of the original two-dose regimen, Dvir's chart raises a puzzle. Ignore the blue line, and remember that the green line already includes everybody represented by the blue line.

In the following chart, I removed the blue line, and added reference lines in dashed purple that correspond to 25%, 50% and 75% vaccine effectiveness. The data plotted on this chart are unadjusted case rates. A 75% effective vaccine cuts case rate by three quarters.

Junkcharts_dviraran_israel_threeshotschart

This chart shows the 2-dose mRNA vaccine was nowhere near 90% effective. (As regular readers know, I don't endorse this simplistic calculation and have outlined the problems here, but this style of calculation keeps getting published and passed around. Those who use it to claim real-world studies confirm prior clinical trial outcomes can either (a) insist on using it and retract their earlier conclusions, or (b) admit that such a calculation was, and is, a bad take.)

Also observe how the vaccinated (green) line is moving away from the unvaccinated (red) line. The vaccine apparently is becoming more effective, which runs counter to the trend used by the Israeli government to justify third doses. This improvement also precedes the start of the third-shot campaign. When the analytical method is bad, it generates all sorts of spurious findings.

***

As Dvir said, it is premature to comment on the third doses based on 10 days of data. For one thing, the vaccine developers insist that their vaccines must be given 14 days to work. In a typical calculation, all of the cases in the blue line fall outside the case-counting window. The effective number of cases that would be attributed to the 3-dose group right now is zero, and the vaccine effectiveness using the standard methodology is 100%, even better than shown in the chart.

There is an alternative interpretation of this graph. Statisticians call this the selection effect. On July 30, the blue line split out of the green: some people were selected to receive the 3rd dose - this includes an official selection (the government makes certain subgroups eligible) as well as a self-selection (within the eligible subgroup, certain people decide to get the 3rd shot earlier.) If those who are less exposed to the virus, or more risk averse, get the shots first, then all that is happening may be that we have split off a high VE subgroup from the green line. Even if the third shot were useless, the selection effect itself could explain the gap.

Statistics is about grays. It's not either-or. It's usually some of each. If you feel like Groundhog Day, you're getting the picture. When they rolled out two doses, we lived through an optimistic period in which most experts rejoiced about 90-100% real-world effectiveness, and then as more people get vaccinated, the effect washed away. The selection effect gradually disappears when vaccination becomes widespread. Are we starting a new cycle of hope and despair? We'll find out soon enough.


Did the pandemic drive mass migration?

The Wall Street Journal ran this nice compact piece about migration patterns during the pandemic in the U.S. (link to article)

Wsj_migration

I'd look at the chart on the right first. It shows the greatest net flow of people out of the Northeast to the South. This sankey diagram is nicely done. The designer shows restraint in not printing the entire dataset on the chart. If a reader really cares about the net migration from one region to a specific other region, it's easy to estimate the number even though it's not printed.

The maps succinctly provide readers the definition of the regions.

To keep things in perspective, we are talking around 100,000 when the death toll of Covid-19 is nearing 600,000. Some people have moved but almost everyone else haven't.

***

The chart on the left breaks down the data in a different way - by urbanicity. This is a variant of the stacked column chart. It is a chart form that fits the particular instance of the dataset. It works only because in every month of the last three years, there was a net outflow from "large metro cores". Thus, the entire series for large metro cores can be pointed downwards.

The fact that this design is sensitive to the dataset is revealed in the footnote, which said that the May 2018 data for "small/medium metro" was omitted from the chart. Why didn't they plot that number?

It's the one datum that sticks out like a sore thumb. It's the only negative number in the entire dataset that is not associated with "large metro cores". I suppose they could have inserted a tiny medium green slither in the bottom half of that chart for May 2018. I don't think it hurts the interpretation of the chart. Maybe the designer thinks it might draw unnecessary attention to one data point that really doesn't warrant it.

***

See my collection of posts about Wall Street Journal graphics.


The time has arrived for cumulative charts

Long-time reader Scott S. asked me about this Washington Post chart that shows the disappearance of pediatric flu deaths in the U.S. this season:

Washingtonpost_pediatricfludeaths

The dataset behind this chart is highly favorable to the designer, because the signal in the data is so strong. This is a good chart. The key point is shown clearly right at the top, with an informative title. Gridlines are very restrained. I'd draw attention to the horizontal axis. The master stroke here is omitting the week labels, which are likely confusing to all but the people familiar with this dataset.

Scott suggested using a line chart. I agree. And especially if we plot cumulative counts, rather than weekly deaths. Here's a quick sketch of such a chart:

Junkcharts_redo_wppedflu_panel

(On second thought, I'd remove the week numbers from the horizontal axis, and just go with the month labels. The Washington Post designer is right in realizing that those week numbers are meaningless to most readers.)

The vaccine trials have brought this cumulative count chart form to the mainstream. For anyone who have seen the vaccine efficacy charts, the interpretation of the panel of line charts should come naturally.

Instead of four plots, I prefer one plot with four superimposed lines. Like this:

Junkcharts_redo_wppeddeaths_superpose2

 

 

 


Locating the political center

I mentioned the September special edition of Bloomberg Businessweek on the election in this prior post. Today, I'm featuring another data visualization from the magazine.

Bloomberg_politicalcenter_print_sm

***

Here are the rightmost two charts.

Bloomberg_politicalcenter_rightside Time runs from top to bottom, spanning four decades.

Each chart covers a political issue. These two charts concern abortion and marijuana.

The marijuana question (far right) has only two answers, legalize or don't legalize. The underlying data measure the proportions of people agreeing to each point of view. Roughly three-quarters of the population disagreed with legalization in 1980 while two-thirds agree with it in 2020.

Notice that there are no horizontal axis labels. This is a great editorial decision. Only coarse trends are of interest here. It's not hard to figure out the relative proportions. Adding labels would just clutter up the display.

By contrast, the abortion question has three answer choices. The middle option is "Sometimes," which is represented by a white color, with a dot pattern. This is an issue on which public opinion in aggregate has barely shifted over time.

The charts are organized in a small-multiples format. It's likely that readers are consuming each chart individually.

***

What about the dashed line that splits each chart in half? Why is it there?

The vertical line assists our perception of the proportions. Think of it as a single gridline.

In fact, this line is underplayed. The headline of the article is "tracking the political center." Where is the center?

Until now, we've paid attention to the boundaries between the differently colored areas. But those boundaries do not locate the political center!

The vertical dashed line is the political center; it represents the view of the median American. In 1980, the line sat inside the gray section, meaning the median American opposed legalizing marijuana. But the prevalent view was losing support over time and by 2010, there wer more Americans wanting to legalize marijuana than not. This is when the vertical line crossed into the green zone.

The following charts draw attention to the middle line, instead of the color boundaries:

Junkcharts_redo_bloombergpoliticalcenterrightsideOn these charts, as you glance down the middle line, you can see that for abortion, the political center has never exited the middle category while for marijuana, the median American didn't want to legalize it until an inflection point was reached around 2010.

I highlight these inflection points with yellow dots.

***

The effect on readers is entirely changed. The original charts draw attention to the areas first while the new charts pull your eyes to the vertical line.

 


Election visuals: three views of FiveThirtyEight's probabilistic forecasts

As anyone who is familiar with Nate Silver's forecasting of U.S. presidential elections knows, he runs a simulation that explores the space of possible scenarios. The polls that provide a baseline forecast make certain assumptions, such as who's a likely voter. Nate's model unshackles these assumptions from the polling data, exploring how the outcomes vary as these assumptions shift.

In the most recent simulation, his computer explores 40,000 scenarios, each of which predicts a split of the electoral vote, from which the winner of the election can be determined. The model's outcome is usually summarized by a winning probability, which is just the proportion of scenarios under which one candidate wins.

This type of forecasting was responsible for the infamous meltdown in 2016 when most of these models - Nate's being an exception - issued extremely confident predictions that Hillary Clinton wins with 95% or higher probability. Essentially, the probability distribution collapses to a point. This is analogous to an extremely narrow confidence band, indicating almost zero uncertainty about the event. It was as if almost all of the 40,000 scenarios predicted Clinton to be the winner.

The 538 data team has come up with various ways of visualizing the outputs of the model (link). The entire post is worth reading. Here, I'll highlight the most scientific, and direct visual representation, which is the third display.

538_pdf_pair

We start by looking at the bottom of the two charts, showing the predicted electoral votes won  by Democratic challenger Joe Biden, in each of the 40,000 scenarios. Our attention is directed to the thick line that gives the relative chance of Biden's electoral-vote tally. This line is a smoothed summary of the columns in the background, which show the number of times the simulation produces each electoral-vote count.

The highlighted, right side of the chart recounts scenarios in which Biden becomes President, that is to say, he wins more than 270 electoral votes (out of 538, doh). The faded, left side represents scenarios in which Biden is defeated and Trump wins a second term.

The reason I focused on the bottom chart is that the top chart is merely a mirror image of this one. Just reflect the bottom chart around the vertical axis of 270 electoral votes, change the color scheme to red, and swap annotations related to Trump and Biden, and you get the other chart. This is because the narrative has excluded third-party and write-in candidates, leaving us with a zero-sum situation.

Alternatively, one can jam both charts into one, while supplying extra labels, like this:

Redo_junkcharts_538forecastpdf_1

I prefer the denser single chart because my mind wanders away searching for extra meaning when chart elements are mirrored.

One advantage of the mirrored presentation is that the probability profiles of the potential Trump or Biden wins can be directly compared. We learn that Trump's winning margins are smaller, rarely above 150, and never above 250.

This comparison is made easier by flipping left side of the chart onto the right side:

Redo_junkcharts_538forecastpdf_2

Those are three different visualizations using the same chart form. I'd have to run a poll to figure out which is the best. What's your opinion?


How many details to include in a chart

This graphic by Bloomberg provides the context for understanding the severity of the Atlantic storm season. (link)

Bloomberg_2020storms_vertical

At this point of the season, 2020 appears to be one of the most severe in history.

I was momentarily fascinated by a feature of modern browser-based data visualization: the death of the aspect ratio. When the browser window is stretched sufficiently wide, the chart above is transformed to this look:

Bloomberg_2020storms_horizontal

The chart designer has lost control of the aspect ratio.

***

This Bloomberg chart is an example of the spaghetti-style plots that convey variability by displaying individual units of data (here, storm years). The envelope of the growth curves gives the range of historical counts while the density of curves roughly offers some sense of the most likely counts at different points of the season.

But these spaghetti-style plots are not precise at conveying the variability because the density is hard to gauge. That's where aggregating the individual units helps.

The following chart does not show individual storm years. It shows the counts for the median season at selected points in time, and also a band of variability (for example, you'd include say 90 or 95% of the seasons).

Redo_bloomberg_2020storms

I don't have the raw data so the aggregating is done by eyeballing the spaghetti.

I prefer this presentation even though it does not plot every single data point one has in the dataset.

 

 


On data volume, reliability, uncertainty and confidence bands

This chart from the Economist caught my eye because of the unusual use of color-coded hexagonal tiles.

Economist_lifequalitywealth1

The basic design of the chart is easy to grasp: It relates people's "happiness" to national wealth. The thick black line shows that the average citizen of wealthier countries tends to rate their current life situation better.

For readers alert to graphical details, things can get a little confusing. The horizontal "wealth" axis is shown in log scale, which means that the data on the right side of the chart have been compressed while the data on the left side of the chart have been stretched out. In other words, the curve in linear scale is much flatter than depicted.

Redo_economistlifesatisfaction_linear

One thing you might notice is how poor the fit of the line is at both ends. Singapore and Afghanistan are clearly not explained by the fitted line. (That said, the line is based on many more dots than those eight we can see.) Moreover, because countries are widely spread out on the high end of the wealth axis, the fit is not impressive. Log scales tend to give a false impression of the tightness of fit, as I explained before when discussing coronavirus case curves.

***

The hexagonal tiles replace the more typical dot scatter or contour shading. The raw data consist of results from polls conducted in different countries in different years. For each poll, the analyst computes the average life satisfaction score for that country in that year. From national statistics, the analyst pulls out that country's GDP per capita in that year. Thus, each data point is a dot on the canvass. A few data points are shown as black dots. Those are for eight highlighted countries for the year 2018.

The black line is fitted to the underlying dot scatter and summarizes the correlation between average wealth and average life satisfaction. Instead of showing the scatter, this Economist design aggregates nearby dots into hexagons. The deepest red hexagon, sandwiched between Finland and the US, contains about 60-70 dots, according to the color legend.

These details are tough to take in. It's not clear which dots have been collected into that hexagon: are they all Finland or the U.S. in various years, or do they include other countries? Each country is represented by multiple dots, one for each poll year. It's also not clear how much variation there exists within a country across years.

***

The hexagonal tiles presumably serve the same role as a dot scatter or contour shading. They convey the amount of data supporting the fitted curve along its trajectory. More data confers more reliability.

For this chart, the hexagonal tiles do not add any value. The deepest red regions are those closest to the black line so nothing is actually lost by showing just the line and not the tiles.

Redo_economistlifesatisfaction_nohex

Using the line chart obviates the need for readers to figure out the hexagons, the polls, the aggregation, and the inevitable unanswered questions.

***

An alternative concept is to show the "confidence band" or "error bar" around the black line. These bars display the uncertainty of the data. The wider the band, the less certain the analyst is of the estimate. Typically, the band expands near the edges where we have less data.

Here is conceptually what we should see (I don't have the underlying dataset so can't compute the confidence band precisely)

Redo_economistlifesatisfaction_confband

The confidence band picture is the mirror image of the hexagonal tiles. Where the poll density is high, the confidence band narrows, and where poll density is low, the band expands.

A simple way to interpret the confidence band is to find the country's wealth on the horizontal axis, and look at the range of life satisfaction rating for that value of wealth. Now pick any number between the range, and imagine that you've just conducted a survey and computed the average rating. That number you picked is a possible survey result, and thus a valid value. (For those who know some probability, you should pick a number not at random within the range but in accordance with a Bell curve, meaning picking a number closer to the fitted line with much higher probability than a number at either edge.)

Visualizing data involves a series of choices. For this dataset, one such choice is displaying data density or uncertainty or neither.


Visualizing black unemployment in the U.S.

In a prior post, I explained how the aggregate unemployment rate paints a misleading picture of the employment situation in the United States. Even though the U3 unemployment rate in 2019 has returned to the lowest level we have seen in decades, the aggregate statistic hides some concerning trends. There is an alarming rise in the proportion of people considered "not in labor force" by the Bureau of Labor Statistics - these forgotten people are not counted as "employable": when a worker drops out of the labor force, the unemployment rate ironically improves.

In that post, I looked at the difference between men and women. This post will examine the racial divide, whites and blacks.

I did not anticipate how many obstacles I'd encounter. It's hard to locate a specific data series, and it's harder to know whether the lack of search results indicates the non-existence of the data, or the incompetence of the search engine. Race-related data tend not to be offered in as much granularity. I was only able to find quarterly data for the racial analysis while I had monthly data for the gender analysis. Also, I only have data from 2000, instead of 1990.

***

As before, I looked at the official unemployment rate first, this time presented by race. Because whites form the majority of the labor force, the overall unemployment rate (not shown) is roughly the same as that for whites, just pulled up slightly toward the line for blacks.

Jc_unemploybyrace

The racial divide is clear as day. Throughout the past two decades, black Americans are much more likely to be unemployed, and worse during recessions.

The above chart determines the color encoding for all the other graphics. Notice that the best employment situations occurred on either end of this period, right before the dotcom bust in 2000, and in 2019 before the Covid-19 pandemic. As explained before, despite the headline unemployment rate being the same in those years, the employment situation was not the same.

***

Here is the scatter plot for white Americans:

Jc_unemploybyrace_scatter_whites

Even though both ends of the trajectory are marked with the same shade of blue, indicating almost identical (low) rates of unemployment, we find that the trajectory has failed to return to its starting point after veering off course during the recession of the early 2010s. While the proportion of part-time workers (counted as employed) returned to 17.5% in 2019, as in 2000, about 15 percent more whites are now excluded from the unemployment rate calculation.

The experience of black Americans appears different:

Jc_unemploybyrace_scatter_blacks

During the first decade, the proportion of black Americans dropping out of the labor force accelerated while among those considered employed, the proportion holding part-time jobs kept increasing. As the U.S. recovered from the Great Recession, we've seen a boomerang pattern. By 2019, the situation was halfway back to 2000. The last available datum for the first quarter of 2020 is before Covid-19; it actually showed a halt of the boomerang.

If the pattern we saw in the prior post holds for the Covid-19 world, we would see a marked spike in the out-of-labor-force statistic, coupled with a drop in part-time employment. It appeared that employers were eliminating part-time workers first.

***

One reader asked about placing both patterns on the same chart. Here is an example of this:

Jc_unemploybyrace_scatter_both

This graphic turns out okay because the two strings of dots fit tightly into the grid while not overlapping. There is a lot going on here; I prefer a multi-step story than throwing everything on the wall.

There is one insight that this chart provides that is not easily observed in two separate plots. Over the two decades, the racial gap has narrowed in these two statistics. Both groups have traveled to the top right corner, which is the worst corner to reside -- where more people are classified as not employable, and more of the employed are part-time workers.

The biggest challenge with making this combined scatter plot is properly controlling the color. I want the color to represent the overall unemployment rate, which is a third data series. I don't want the line for blacks to be all red, and the line for whites to be all blue, just because black Americans face a tough labor market always. The color scheme here facilitates cross-referencing time between the two dot strings.