Graphing the economic crisis of coronavirus 2

Last week, I discussed Ray's chart that compares the S&P 500 performance in this crisis against previous crises.

A reminder:

Tcb_stockmarketindices_fourcrises

Another useful feature is the halo around the right edge of the COVID-19 line. This device directs our eyes to where he wants us to look.

In the same series, he made the following for The Conference Board (link):

TCB-COVID-19-impact-oil-prices-640

Two things I learned from this chart:

The oil market takes a much longer time to recover after crises, compared to the S&P. None of these lines reached above 100 in the first 150 days (5 months).

As with the S&P, the current crisis most resembles the 2008 Great Recession in severity, and the price collapse in oil is currently quite a bit worse than in 2008.

***
The drop of oil is going to be contentious. This is a drop too many for a Tufte purist. It might as well symbolize a tear shed.

The presence of the icon tells me these lines depict the oil market without having to read text. And I approve.


Graphing the economic crisis of Covid-19

My friend Ray Vella at The Conference Board has a few charts up on their coronavirus website. TCB is a trusted advisor and consultant to large businesses and thus is a good place to learn how the business community is thinking about this crisis.

I particularly like the following chart:

Tcb_stockmarketindices_fourcrises

This puts the turmoil in the stock market in perspective. We are roughly tracking the decline of the Great Recession of the late 2000s. It's interesting that 9/11 caused very mild gyrations in the S&P index compared to any of the other events. 

The chart uses an index with value 100 at Day 0. Day 0 is defined by the trigger event for each crisis. About three weeks into the current crisis, the S&P has lost over 30% of its value.
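For readers who want to reproduce this kind of indexing, here is a minimal sketch. The closing levels are made up for illustration and are not actual S&P 500 data; `rebase_to_100` is a hypothetical helper name.

```python
def rebase_to_100(prices):
    """Rescale a price series so that the first value (Day 0) equals 100."""
    base = prices[0]
    return [100 * p / base for p in prices]

# Hypothetical closing levels, Day 0 first (NOT actual S&P 500 data).
closes = [3000.0, 2850.0, 2400.0, 2070.0]

indexed = rebase_to_100(closes)

# Percentage of value lost since Day 0, read directly off the index.
decline_pct = 100 - indexed[-1]

print(indexed)
print(decline_pct)
```

Once every crisis is rebased this way, a reading of 69 on any line means a 31% loss since that crisis's trigger event, which is what makes the lines comparable.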

The device of a gray background for the bottom half of the chart is surprisingly effective.

***

Here is a chart showing the impact of the Covid-19 crisis on different sectors.

Tcb-COVID-19-manual-services-1170

So the full-service restaurant industry is a huge employer. Restaurants employ 7-8 times more people than airlines. Airlines employ about the same number of people as "beverage bars" (which I suppose is the same as "bars", which apparently is different from "drinking places"). Bars employ 7 times more people than "Cafeterias, etc.".

The chart describes where the jobs are, and which sectors they believe will be most impacted. It's not clear yet how deeply these will be impacted. Being in NYC, the complete shutdown is going to impact 100% of these jobs in certain sectors like bars, restaurants and coffee shops.


Food coma and self-sufficiency in dataviz

The Hustle wrote a strong analysis of the business of buffets. If you've read my analysis of Groupon's business model in Numbersense (link), you'll find some similarities. A key is to not think of every customer as an average customer; there are segments of customers who behave differently, and creating a proper mix of different types of customers is the management's challenge. I will make further comments on the statistics in a future post on the sister blog.

At Junk Charts, we'll focus on visualizing and communicating data. The article in The Hustle comes with the following dataviz:

Hustle_buffetcost

This dataviz fails my self-sufficiency test. Recall: self-sufficiency is a basic requirement of visualizing data - that the graphical elements should be sufficient to convey the gist of the data. Otherwise, there is no point in augmenting the data with graphical elements.

The self-sufficiency test is to remove the dataset from the dataviz, and ask whether the graphic can stand on its own. So here:

Redo_hustlebuffetcost_selfsufficiency

The entire set of ingredient costs appears on the original graphic. When these numbers are removed, the reader gets the wrong message - that the cost is equally split between these five ingredients.

This chart reminds me of the pizza chart that everyone thought was a pie chart except its designer! I wrote about it here. Food coma is a thing.

The original chart may be regarded as an illustration rather than data visualization. If so, it's just a few steps from becoming a dataviz. Like this:

Redo_hustlebuffetcost

P.S. A preview of what I'll be talking about at the sister blog. The above diagram illustrates the average case - for the average buffet diner. Underneath these costs is an assumption about the relative amounts of each food that is eaten. But eaten by whom?

Also, if you have Numbersense (link), the chapter on measuring the inflation rate is relevant here. Any inflation metric must assume a basket of goods, but then the goods within the basket have to be weighted by the amount of expenditure. It's much harder to get the ratio of expenditures correct compared to getting price data.
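To make the weighting point concrete, here is a rough sketch with an invented three-item basket. The item names, price changes and expenditures are all hypothetical, and `weighted_inflation` is just an illustrative helper, not an official index formula.

```python
# Hypothetical basket: price change (%) and annual expenditure per item.
# All numbers are invented for illustration.
basket = {
    "food": {"price_change_pct": 4.0, "expenditure": 3000.0},
    "housing": {"price_change_pct": 2.0, "expenditure": 6000.0},
    "transport": {"price_change_pct": -1.0, "expenditure": 1000.0},
}

def weighted_inflation(basket):
    """Average the price changes, weighting each item by its expenditure."""
    total = sum(item["expenditure"] for item in basket.values())
    weighted_sum = sum(
        item["price_change_pct"] * item["expenditure"] for item in basket.values()
    )
    return weighted_sum / total

def unweighted_inflation(basket):
    """Naive average that treats every item as equally important."""
    changes = [item["price_change_pct"] for item in basket.values()]
    return sum(changes) / len(changes)

weighted = weighted_inflation(basket)
unweighted = unweighted_inflation(basket)
print(weighted, unweighted)
```

The price changes are identical in both calculations; only the expenditure weights differ, and they move the answer. The same logic applies to the buffet: the cost per diner depends on how much of each food is eaten, not just on the unit costs.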

 

 


Bubble charts, ratios and proportionality

A recent article in the Wall Street Journal about a challenger to the dominant weedkiller, Roundup, contains a nice selection of graphics. (Dicamba is the up-and-comer.)

Wsj_roundup_img1


The change in usage of three brands of weedkillers is rendered as small multiples of choropleth maps. This graphic displays geographical and time changes simultaneously.

The staircase chart shows weeds have become resistant to Roundup over time. This is considered a weakness in the Roundup business.

***

In this post, my focus is on the chart at the bottom, which shows complaints about Dicamba by state in 2019. This is a bubble chart, with the bubbles sorted along the horizontal axis by the acreage of farmland by state.

Wsj_roundup_img2

Below left is a more standard version of such a chart, in which the bubbles are allowed to overlap. (I only included the bubbles that were labeled in the original chart).

Redo_roundupwsj0

The WSJ’s twist is to use the vertical spacing to avoid overlapping bubbles. The vertical axis serves a design prerogative and does not encode data.

I’m going to stick with the more traditional overlapping bubbles here – I’m getting to a different matter.

***

The question being addressed by this chart is: which states have the most serious Dicamba problem, as revealed by the frequency of complaints? The designer recognizes that the amount of farmland matters. One should expect the more acres, the more complaints.

Let's consider computing directly the number of complaints per million acres.

The resulting chart (shown below right) – while retaining the design – gives a wholly different feeling. Arkansas now owns the largest bubble even though it has the least acreage among the included states. The huge Illinois bubble is still large but is no longer a loner.

Redo_dicambacomplaints1
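The computation behind the rate chart is a simple ratio. Here is a minimal sketch using made-up numbers (these are not the WSJ figures, and the state names are placeholders) that shows how the ranking can flip once acreage is taken into account.

```python
# Made-up figures for illustration only; NOT the WSJ data.
states = {
    "State A": {"complaints": 200, "acres_millions": 5.0},
    "State B": {"complaints": 300, "acres_millions": 15.0},
    "State C": {"complaints": 700, "acres_millions": 25.0},
}

def complaints_per_million_acres(complaints, acres_millions):
    """Relative metric: complaints normalized by farmland acreage."""
    return complaints / acres_millions

rates = {
    name: complaints_per_million_acres(d["complaints"], d["acres_millions"])
    for name, d in states.items()
}

# Rank by raw complaint count versus by complaints per million acres.
by_count = sorted(states, key=lambda s: states[s]["complaints"], reverse=True)
by_rate = sorted(rates, key=rates.get, reverse=True)

print(by_count)
print(by_rate)
```

In this invented example, the state with the fewest complaints and the least farmland ends up with the highest rate, the same kind of reversal that Arkansas shows in the chart.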

Now return to the original design for a moment (the chart on the left). In theory, this should work in the following manner: if complaints grow purely as a function of acreage, then the bubbles should grow proportionally from left to right. The trouble is that proportional areas are not as easily detected as proportional lengths.

The pair of charts below depict made-up data in which all states have 30 complaints for each million acres of farmland. It’s not intuitive that the bubbles on the left chart are growing proportionally.

Redo_dicambacomplaints2

Now if you look at the right chart, which shows the relative metric of complaints per million acres, it’s impossible not to notice that all bubbles are the same size.
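A small sketch shows why the proportional growth is hard to see on the left chart. It assumes bubble area is proportional to the number of complaints (the usual convention for bubble charts); the acreages are hypothetical, and the rate of 30 complaints per million acres is the made-up figure from above.

```python
import math

RATE = 30  # complaints per million acres, the made-up rate used above

def bubble_radius(value, scale=1.0):
    """Radius of a bubble whose AREA is proportional to value."""
    return scale * math.sqrt(value / math.pi)

# Hypothetical acreages (millions of acres), each double the previous one.
acres = [5, 10, 20, 40]
complaints = [RATE * a for a in acres]
radii = [bubble_radius(c) for c in complaints]

# Each doubling of acreage doubles the area but not the radius.
radius_ratios = [radii[i + 1] / radii[i] for i in range(len(radii) - 1)]
print(radius_ratios)
```

Each doubling of complaints widens the bubble by only about 41 percent (a factor of the square root of 2), so the eye, which tends to judge widths, underestimates the growth. Dividing by acreage instead returns the same value, 30, for every state.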


Taking small steps to bring out the message

Happy new year! Good luck and best wishes!

***

We'll start 2020 with something lighter. On a recent flight, I saw a chart in The Economist that shows the proportion of operating income derived from overseas markets by major grocery chains - the headline said that some of these chains are withdrawing from international markets.

Econ_internationalgroceries_sm

The designer used one color for each grocery chain, and two shades within each color. The legend describes the shades as "total" and "of which: overseas". As with all stacked bar charts, it's a bit confusing where to find the data. The "total" is actually the entire bar, not just the darker shaded part. The darker shaded part is better labeled "home market" as shown below:

Redo_econgroceriesintl_1

The designer's instinct to bring out the importance of international markets to each company's income is well placed. A second small edit helps: plot the international income amounts first, so they line up with the vertical zero axis. Like this:

Redo_econgroceriesintl_2

This is essentially the same chart. The order of international and home market is reversed. I also reversed the shading, so that the international share of income is displayed darker. This shading draws the readers' attention to the key message of the chart.

A stacked bar chart of the absolute dollar amounts is not ideal for showing proportions, because each bar is a different length. Sometimes, plotting relative values summing to 100% for each company may work better.

As it stands, the chart above calls attention to a different message: that Walmart dwarfs the other three global chains. Just the international income of Walmart is larger than the total income of Costco.
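The relative-value version mentioned above takes one line of arithmetic per company. Here is a sketch with hypothetical income figures in arbitrary units (not The Economist's data; the chain names are placeholders):

```python
# Hypothetical operating income by market, in arbitrary units.
# These are NOT the figures from The Economist's chart.
income = {
    "Chain A": {"international": 6.0, "home": 18.0},
    "Chain B": {"international": 1.0, "home": 3.0},
    "Chain C": {"international": 2.0, "home": 2.0},
}

def to_shares(parts):
    """Convert absolute amounts into percentages that sum to 100."""
    total = sum(parts.values())
    return {market: 100 * amount / total for market, amount in parts.items()}

shares = {chain: to_shares(parts) for chain, parts in income.items()}

for chain, s in shares.items():
    print(chain, s)
```

In this invented example, Chain A is six times the size of Chain B, yet both earn a quarter of their income overseas. The absolute chart emphasizes the size gap; the 100% chart emphasizes the equal shares.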

***

Please comment below or write me directly if you have ideas for this blog as we enter a new decade. What do you want to see more of? less of?


Revisiting global car sales

We looked at the following chart in the previous blog. The data concern the growth rates of car sales in different regions of the world over time.

Cnbc zh global car sales

Here is a different visualization of the same data.

Redo_cnbc_globalcarsales

Well, it's not quite the same data. I divided the global average growth rate by four to yield an approximation of the true global average. (The reason for this is explained in the other day's post.)

The chart emphasizes how each region was helping or hurting the global growth. It also features the trend in growth within each region.

 


This Excel chart looks standard but gets everything wrong

The following CNBC chart (link) shows the trend of global car sales by region (or so we think).

Cnbc zh global car sales

This type of chart is quite common in finance/business circles, and has the fingerprint of Excel. After examining it, I nominate it for the Hall of Shame.

***

The chart has three major components vying for our attention: (1) the stacked columns, (2) the yellow line, and (3) the big red dashed arrow.

The easiest to interpret is the yellow line, which is labeled "Total" in the legend. It displays the annual growth rate of car sales around the globe. The data consist of annual percentage changes in car sales, so the slope of the yellow line represents a change of change, which is not particularly useful.

The big red arrow is making the point that the projected decline in global car sales in 2019 will return the world to the slowdown of 2008-9 after almost a decade of growth.

The stacked columns appear to provide a breakdown of the global growth rate by region. Look carefully, and you'll soon learn that the visual form has hopelessly mangled the data.

Cnbc_globalcarsales_2006

What is the growth rate for Chinese car sales in 2006? Is it 2.5%, the top edge of China's part of the column? Between 1.5% and 2.5%, the extent of China's section? The answer is neither. Because of the stacking, China's growth rate is actually the height of the relevant section, that is to say, 1 percent. So the labels on the vertical axis are not directly useful for learning regional growth rates for most sections of the chart.

Can we read the vertical axis as global growth rate? That's not proper either. The different markets are not equal in size so growth rates cannot be aggregated by simple summing - they must be weighted by relative size.

The negative growth rates present another problem. Even if we agree to sum growth rates ignoring relative market sizes, we still can't get directly to the global growth rate. We would have to take the total of the positive rates and subtract the total of the negative rates.  

***

At this point, you may begin to question everything you thought you knew about this chart. Remember the yellow line, which we thought measures the global growth rate. Take a look at the 2006 column again.

The global growth rate is depicted as 2 percent. And yet every region experienced growth rates below 2 percent! No matter how you aggregate the regions, it's not possible for the world average to be larger than the value of every single region.

For 2006, the regional growth rates are: China, 1%; Rest of the World, 1%; Western Europe, 0.1%; United States, -0.25%. A simple sum of those four rates yields 1.85%, or roughly 2%, which is what the yellow line shows.

But this number must be divided by four. If we give the four regions equal weight, each is worth a quarter of the total. So the overall average is the sum of each growth rate weighted by 1/4, which is about 0.5%. [In reality, the weight of each region should be scaled to reflect its market size.]
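Here is the arithmetic spelled out. The four growth rates are the 2006 values read off the chart as listed above; the market shares in the last step are hypothetical, included only to show what a size-weighted average would look like.

```python
# 2006 regional growth rates (%), as read off the chart above.
rates = {
    "China": 1.0,
    "Rest of the World": 1.0,
    "Western Europe": 0.1,
    "United States": -0.25,
}

# What the yellow line plots: the plain sum of the regional rates.
simple_sum = sum(rates.values())

# Equal weighting: each region counts for a quarter.
equal_weight_avg = simple_sum / len(rates)

# HYPOTHETICAL market shares (must sum to 1); not taken from the chart.
shares = {
    "China": 0.10,
    "Rest of the World": 0.35,
    "Western Europe": 0.25,
    "United States": 0.30,
}
weighted_avg = sum(rates[region] * shares[region] for region in rates)

print(simple_sum, equal_weight_avg, weighted_avg)
```

The sum comes to 1.85, which rounds to the 2% on the yellow line, and the equal-weight average is about 0.46. Any proper average, whatever the weights, must land between the smallest and largest regional rates; the plain sum does not, which is the giveaway.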

***

tl;dr: The stacked column chart with a line overlay not only fails to communicate the contents of the car sales data but also leads to misinterpretation.

I discussed several serious problems of this chart form: 

  • stacking the columns makes it hard to learn the regional data

  • the trend by region takes a super effort to decipher

  • column stacking promotes reading meaning into the height of the column but the total height is meaningless (because of the negative section) while the net height (positive minus negative) also misleads due to presumptive equal weighting

  • the yellow line shows the sum of the regional data, which is four times the global growth rate that it purports to represent

 

***

PS. [12/4/2019: New post up with a different visualization.]


This chart tells you how rich is rich - if you can read it

Via twitter, John B. sent me the following YouGov chart (link) that he finds difficult to read:

Yougov_whoisrich

The title is clear enough: the higher your income, the higher you set the bar.

When one then moves from the title to the chart, one gets misdirected. The horizontal axis shows pound values, so one expects the axis to map to "the higher your income". But it doesn't. Those pound values are the "cutoff" values - the line between "rich" and "not rich". Even after one realizes this detail, the axis presents further challenges: the cutoff values are arbitrary numbers such as "45,001" sterling; and these continuous numbers are treated as discrete categories, with irregular intervals between each category.

There is some very interesting and hard-to-obtain data sitting behind this chart but the visual form suppresses it. The best way to understand this dataset is to first think about each income group. Say, people who make between 20 and 30 thousand sterling a year. Roughly 10% of these people think "rich" starts at 25,000. Forty percent of this income group think "rich" starts at 40,000.

For each income group, we have data of the form: Z percent think "rich" starts at X. I put all of these data points into a heatmap, like this:

Redo_junkcharts_yougovuk_whoisrich

Technical note: in order to restore the horizontal axis to a continuous scale, you can take the discrete data from the original chart, then fit a smoothed curve through those points, and finally compute the interpolated values for any income level using the smoothing model.
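As a rough illustration of the interpolation step, the sketch below swaps the smoothing model for plain piecewise-linear interpolation, which is cruder but shows the idea. The first two points are the 20-30 thousand group's values mentioned above; the third point is hypothetical, and `interpolate` is an illustrative helper.

```python
def interpolate(points, x):
    """Piecewise-linear interpolation through (x, y) points sorted by x."""
    if not points[0][0] <= x <= points[-1][0]:
        raise ValueError("x is outside the range of the data")
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# (cutoff in pounds, percent of the income group choosing that cutoff).
# First two points come from the text above; the third is HYPOTHETICAL.
points = [(25_000, 10.0), (40_000, 40.0), (60_000, 70.0)]

# Estimated percentages at cutoffs that were not on the original axis.
at_30k = interpolate(points, 30_000)
at_50k = interpolate(points, 50_000)
print(at_30k, at_50k)
```

A smoothing model would produce a curve without kinks at the observed points, but the workflow is the same: fit through the discrete cutoffs, then read off values on an evenly spaced axis.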

***

There are some concerns about the survey design. It's hard to get enough samples for higher-income people. This is probably why the highest income segment starts at 50,000. But notice that 50,000 is around the level that lower-income people consider "rich". So, this survey is primarily about how low-income people perceive "rich" people.

The curve for the highest income group is much straighter and smoother than the other lines - that's because it's really the average of a number of curves (for each 10,000 sterling segment).

 

P.S. The YouGov tweet that publicized the small-multiples chart shown above links to a page that no longer contains the chart. They may have replaced it due to feedback.

 

 


Marketers want millennials to know they're millennials

When I posted about the lack of a standard definition of "millennials", Dean Eckles tweeted about the arbitrary division of age into generational categories. His view is further reinforced by the following chart, courtesy of PewResearch by way of MarketingCharts.com.

PewResearch-Generational-Identification-Sept2015

Pew asked people what generation they belong to. The number of people who fail to place themselves in the right category is remarkable. One way to interpret this finding is that these are marketing categories created by the marketing profession. We learned in my other post that even people who use the term millennial do not have a consensus definition of it. Perhaps the 8 percent of "millennials" who identify as "boomers" are handing in a protest vote!

The chart is best read row by row - the use of stacked bar charts provides a clue. Forty percent of millennials identified as millennials, which leaves sixty percent identifying as some other generation (with about 5 percent indicating "other" responses). 

While this chart is not pretty, and may confuse some readers, it actually shows a healthy degree of analytical thinking. Arranging for the row-first interpretation is a good start. The designer also realizes the importance of the diagonal entries - what proportion of each generation self-identify as a member of that generation. Dotted borders are deployed to draw eyes to the diagonal.

***

The design doesn't do full justice to the analytical intelligence. Despite the use of the bar chart form, readers may be tempted to read column by column due to the color scheme. The chart doesn't have an easy column-by-column interpretation.

It's not obvious which axis has the true category and which, the self-identified category. The designer adds a hint in the sub-title to counteract this problem.

Finally, the dotted borders are no match for the differential colors. So a key message of the chart is buried.

Here is a revised chart, using a grouped bar chart format:

Redo_junkcharts_millennial_id

***

In a Trifecta checkup (link), the original chart is a Type V chart. It addresses a popular, pertinent question, and it shows mature analytical thinking but the visual design does not do full justice to the data story.

 

 


Does this chart tell the sordid tale of TI's decline?

The Hustle has an interesting article on the demise of the TI calculator, which is popular in business circles. The article uses this bar chart:

Hustle_ti_calculator_chart

From a Trifecta Checkup perspective, this is a Type DV chart. (See this guide to the Trifecta Checkup.)

The chart addresses a nice question: is the TI graphing calculator a victim of new technologies?

The visual design is marred by the use of the calculator images. The images add nothing to our understanding and create potential for confusion. Here is a version without the images for comparison.

Redo_junkcharts_hustlet1calc

The gridlines are placed to reveal the steepness of the decline. The sales in 2019 will likely be half those of 2014.

What about the Data? This would have been straightforward if the revenues shown were sales of the TI calculator. But according to the subtitle, the data include a whole lot more than calculators - it's the "other revenues" category in the financial reports of Texas Instruments, which markets the TI calculator.

It requires a leap of faith to believe this data. It is entirely possible that TI calculator sales increased while total "other revenues" decreased! Equally, the decline of the TI calculator could be more drastic than shown here. We simply don't have enough data to say for sure.

 

P.S. [10/3/2019] Fixed TI.