Wayward legend takes sides in a chart of two sides, plus data woes

Reader Chris P. submitted the following graph, found on Axios:

Axios_newstopics

From a Trifecta Checkup perspective, the chart has a clear question: are consumers getting what they wanted to read in the news they are reading?

Nevertheless, the chart is a visual mess, and the underlying data analytics fail to convince. So, it’s a Type DV chart. (See this overview of the Trifecta Checkup for the taxonomy.)

***

The designer did something tricky with the axis but the trick went off the rails. The underlying data consist of two sets of ranks, one for news people consumed and the other for news people wanted covered. With 14 topics included in the study, the two data series contain the same values, 1 to 14. The trick is to collapse both axes onto one. The trouble is that the same value occurs twice, and the reader must differentiate the plot symbols (triangle or circle) to figure out which is which.

It does not help that the lines look like arrows suggesting movement. Without first reading the text, readers may assume that topics change in rank between two periods of time. Some topics moved right, increasing in importance while others shifted left.

The design wisely separated the 14 topics into three logical groups. The blue group comprises news topics for which “want covered” ranking exceeds the “read” ranking. The orange group has the opposite disposition such that the data for “read” sit to the right side of the data for “want covered”. Unfortunately, the legend up top does more harm than good: it literally takes sides!

***

Here, I've put the data onto a scatter plot:

Redo_junkcharts_aiosnewstopics_1

The two sets of ranks are basically uncorrelated, as the regression line is almost flat, with “R-squared” of 0.02.
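For a simple linear fit, R-squared is just the squared Pearson correlation, so a near-flat line and a value near zero go together. Here is a minimal sketch of that calculation. The ranks are made up for illustration and are not the Axios data, and the names `r_squared`, `read_rank` and `want_rank` are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation, which equals R-squared for a simple linear fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Hypothetical ranks for 14 topics (NOT the Axios data)
read_rank = list(range(1, 15))
want_rank = [8, 3, 12, 1, 14, 6, 10, 2, 13, 5, 9, 4, 11, 7]

print(round(r_squared(read_rank, want_rank), 3))
```

Two rankings that agree perfectly, or that are exact reverses of each other, would both give a value of 1.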

The analyst tried to "rescue" the data in the following way. Draw the 45-degree line, and color the points above the diagonal blue, and those below the diagonal orange. Color the points on the line gray. Then, write stories about those three subgroups.
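The grouping rule is mechanical enough to write down. This is a minimal sketch, assuming each topic is an (x, y) pair of ranks on the scatter plot. The topic names and values are hypothetical, and which ranking sits on which axis is an assumption of the sketch.

```python
def diagonal_group(x_rank, y_rank):
    """Label a point by its position relative to the 45-degree line y = x."""
    if y_rank > x_rank:
        return "blue"    # above the diagonal
    if y_rank < x_rank:
        return "orange"  # below the diagonal
    return "gray"        # exactly on the diagonal

# Hypothetical (x, y) rank pairs, NOT the Axios data
topics = {"topic A": (2, 9), "topic B": (11, 4), "topic C": (6, 6)}
groups = {name: diagonal_group(x, y) for name, (x, y) in topics.items()}
print(groups)
```

Any set of points can be split this way, whether or not the two variables are related.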

Redo_junkcharts_aiosnewstopics_2

Further, the ranking of what was read came from Parse.ly, which appears to be surveillance data (“traffic analytics”) while the ranking of what people want covered came from an Axios/SurveyMonkey poll. As far as I could tell, there was no attempt to establish that the two populations are compatible and comparable.


Clearing a forest of labels

This chart by the Financial Times has a strong message, and I like a lot about it:

Ft-europe-growth

The countries are by and large aligned along a diagonal, with the poorer countries growing strongly between 2007 and 2019 while the richer countries suffered negative growth.

A small issue with the chart is the thick forest of text - redundant text. The sub-title, the axis titles, the quadrant labels, and the left-right-half labels all repeat the same things. In the following chart, I simplify the text:

Redo_fteuropegrowth_text

Typically, I don't put axis titles as a sub-header (or, header of the graphic) but as this may be part of the FT style, I respected it.


A data graphic that solves a consumer problem

Saw this great little sign at Ippudo, the ramen shop, the other day:

Ippudo_board

It's a great example of highly effective data visualization. The names on the board are sake brands. 

The menu (a version of a data table) is the conventional way of displaying this information.

The Question

Customers are selecting a sake. They don't have a favorite, or don't recognize many of these brands. They know a bit about their preferences: I like full-bodied, or I want the dry one. 

The Data

On a menu, the key data are missing. So the first order of business is to find data on full- and light-bodied, and dry and sweet. The pricing data are omitted, possibly because they clutter up the design, or because the shop doesn't want customers to focus on price - or both.

The Visual

The design uses a scatter plot. The customer finds the right quadrant, thus narrowing the choices to three or four brands. Then, the positions on the two axes allow the customer to drill down further.

This user experience is leaps and bounds above scanning a list of names, and asking someone who may or may not be an expert.

Back to the Data

The success of the design depends crucially on selecting the right data. Baked into the scatter plot is the assumption that the designer knows the two factors most influential to the customer's decision. Technically, this is a "variable selection" problem: of all factors determining the brand choice, which two are the most important? 

Think about the downside of selecting the wrong factors. In that case, the scatter plot makes it harder to choose a sake than the menu does.

 


Visual Exploration of Unemployment Data

The charts on unemployment data I put up last week are best viewed as a collection. 

I have put them up on the (still in beta) JMP Public website. You can find the project here.

Screen Shot 2019-01-20 at 1.47.59 PM

I believe that if you make an account, you can grab the underlying dataset.

 


Men and women faced different experiences in the labor market

Last week, I showed how an aggregate statistic, the unemployment rate, masked some unusual trends in the labor market in the U.S. Although the unemployment rate in 2018 was equal to, and even a little below, that in 2000, the peak of the last tech boom, there are now significantly more people "not in the labor force," and these people are not counted in the unemployment rate statistic.

That analysis focused on two factors that are not visible in the unemployment rate aggregate: the proportion of people considered not in the labor force, and the proportion of employees who have part-time positions. The analysis itself masks a difference across genders.

It turns out that men and women had very different experiences in the labor market.

For men, things have looked progressively worse with each recession and recovery since 1990. After each recovery, more men exit the labor force, and more men become part-timers. The Great Recession, however, hit men even worse than previous recessions, as seen below:

Jc_unemployment_rate_explained_men

For women, it's a story of impressive gains in the 1990s, and a sad reversal since 2008.

Jc_unemployment_rate_explained_women

P.S. See here for Part 1 of this series. In particular, the color scheme is explained there. Also, the entire collection can be viewed here.


What to make of the historically low unemployment rate

One of the amazing economic stories of the moment is the unemployment rate, which at around 4% has returned to the level last reached during the peak of the tech boom in 2000. The story is much more complex than it seems.

I devoted a chapter of Numbersense (link) to explain how the government computes unemployment rates. The most important thing to realize is that an unemployment rate of 4 percent does NOT mean that four out of 100 people in the U.S. are unemployed, and 96 out of 100 are employed.

It doesn't even mean that four out of 100 people of working age are unemployed, and 96 out of 100 of working age are employed.

What it means is that, of the people the government decides are "employable", 96 out of 100 are employed. Officially, this employability is known as being "in labor force." There are many ways to be disqualified from the labor force; one example is if the government decides that the person is not looking for a job.

On the flip side, who the government counts as "employed" also matters! Part-timers are considered employed. They are counted just like full-time employees in the unemployment metric. Part-time, according to the government, is one to 34 hours worked during the week the survey is administered.

***

So two factors can affect the unemployment rate a lot - the proportion of the population considered "not in labor force" (thus not counted at all); and the proportion of those considered employed who are part-timers. (Those are two disjoint groups.)
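To make the arithmetic concrete, here is a minimal sketch with hypothetical head counts (in millions, not BLS figures). The function names are my own. The two scenarios share the same 4 percent unemployment rate even though far fewer people hold full-time jobs in the second one.

```python
def unemployment_rate(full_time, part_time, unemployed):
    """Unemployed as a share of the labor force (employed plus unemployed)."""
    labor_force = full_time + part_time + unemployed
    return unemployed / labor_force

def full_time_share(full_time, part_time, unemployed, not_in_labor_force):
    """Full-time workers as a share of the whole population considered."""
    population = full_time + part_time + unemployed + not_in_labor_force
    return full_time / population

# Hypothetical head counts in millions (NOT actual BLS data)
then = dict(full_time=80, part_time=16, unemployed=4, not_in_labor_force=50)
now = dict(full_time=54, part_time=18, unemployed=3, not_in_labor_force=75)

for label, p in (("then", then), ("now", now)):
    rate = unemployment_rate(p["full_time"], p["part_time"], p["unemployed"])
    share = full_time_share(**p)
    print(f"{label}: unemployment rate {rate:.1%}, full-time share {share:.1%}")
```

Note that `not_in_labor_force` never enters the unemployment rate at all, and part-timers count toward the denominator exactly as full-timers do.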

The following chart then shows that despite the unemployment rate looking great, the U.S. labor market in 2018 looks nothing like what it looked like from 1990 to 2008.

Jc_unemployment_rate_explained

Technical notes: all the data are seasonally adjusted by the Bureau of Labor Statistics. I used a spline to smooth the data first - the top chart shows the smoothed version of the unemployment rates. Smoothing removes month-to-month sharp edges from the second chart. The color scale is based on standardized values of the smoothed data.
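One common reading of "standardized values" is z-scores: subtract the mean and divide by the standard deviation. The sketch below assumes that reading and uses made-up numbers rather than the BLS series. The spline smoothing step is not reproduced here.

```python
from statistics import mean, pstdev

def standardize(values):
    """Convert values to z-scores: (value - mean) / population standard deviation."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]

# Hypothetical smoothed rates in percent (NOT the BLS series)
smoothed = [4.0, 4.2, 4.9, 6.1, 8.7, 9.6, 8.3, 6.5, 5.0, 4.1]
z_scores = standardize(smoothed)
print([round(z, 2) for z in z_scores])
```

Standardizing puts series with different units and ranges on a common scale, which is what lets a single color scale be shared across them.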

 

P.S. Part 2 of this series explores the different experiences of male and female workers. Also, the entire collection can be viewed here.


Message-first visualization

Sneaky Pete via Twitter sent me the following chart, asking for guidance:

Sneakypete_twitter

This is a pretty standard dataset, frequently used in industry. It shows a breakdown of a company's profit by business unit, here classified by "state". The profit projection for the next year is measured in both absolute dollar terms and year-on-year growth.

Since those two metrics have completely different scales, in both magnitude and unit, it is common to use dual axes. In the case of the Economist, they don't use dual axes; they usually just print the second data series in its own column.

***

I first recommended looking at the scatter plot to see if there are any bivariate patterns. In this case, the scatter does not provide much insight.

From there, I looked at the data again, and ended up with the following pair of bumps charts (slopegraphs):

Redo_jc_sneakypete

A key principle I used is message-first. That is to say, the designer should figure out what message s/he wants to convey via the visualization, and then design the visualization to convey that message.

A second key observation is that the business units are divided into two groups, the two large states (A and F) and the small states (B to E). This is a Pareto principle that very often applies to real-world businesses, i.e. a small number of entities contribute most of the revenues (or profits). It is very likely that these businesses are structured to serve the large and small states differently, and so the separation onto two charts mirrors the internal structure.

Then, within each chart, there is a message. For the large states, it looks like state F is projected to overtake state A next year. That is a big deal because we're talking about the largest unit in the entire company.

For the small states, the standout is state B, decidedly more rosy than the other three small states with similar projected growth rates.

Note also I chose to highlight the actual dollar profits, letting the growth rates be implied in the slopes. Usually, executives are much more concerned about hitting a dollar value than a growth rate target. But that, of course, depends on your management's preference.

 


Crazy rich Asians inspire some rich graphics

On the occasion of the hit movie Crazy Rich Asians, the New York Times did a very nice report on Asian immigration in the U.S.

The first two graphics will be of great interest to those who have attended my free dataviz seminar (coming to Lyon, France in October, by the way. Register here.), as they deal with a related issue.

The first chart shows an income gap widening between 1970 and 2016.

Nyt_crazyrichasians_incomegap1

This uses a two-lines design in a small-multiples setting. The distance between the two lines is labeled the "income gap". The clear story here is that the income gap is widening over time across the board, but especially rapidly among Asians, and then followed by whites.

The second graphic is a bumps chart (slopegraph) that compares the endpoints of 1970 and 2016, but using an "income ratio" metric, that is to say, the ratio of the 90th-percentile income to the 10th-percentile income.

Nyt_crazyrichasians_incomeratio2

Asians are still a key story on this chart, as income inequality has ballooned from 6.1 to 10.7. That is where the similarity ends.

Notice how whites now appear at the bottom of the list while blacks show up as the second "worst" in terms of income inequality. Even though the underlying data are the same, what can be seen in the bumps chart is hidden in the two-lines design!

In short, the reason is that the scale of the two-lines design is such that the small numbers are squashed. The bottom 10 percent did see an increase in income over time but because those increases pale in comparison to the large incomes, they do not show up.

What else does not show up in the two-lines design? Notice that in 1970, the income ratio for blacks was 9.1, way above other racial groups.

Kudos to the NYT team for realizing that the two-lines design provides an incomplete, potentially misleading picture.
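The disagreement between the two metrics comes down to simple arithmetic. Here is a minimal sketch with hypothetical incomes (not the NYT data) in which the dollar gap widens while the 90/10 ratio falls, because the bottom incomes grow faster in relative terms.

```python
def income_gap(p10, p90):
    """Dollar distance between the 90th- and 10th-percentile incomes."""
    return p90 - p10

def income_ratio(p10, p90):
    """Ratio of the 90th-percentile income to the 10th-percentile income."""
    return p90 / p10

# Hypothetical percentile incomes in dollars (NOT the NYT data)
group = {
    1970: {"p10": 5_000, "p90": 45_000},
    2016: {"p10": 8_000, "p90": 60_000},
}

for year, inc in group.items():
    print(year, income_gap(**inc), round(income_ratio(**inc), 1))
```

On a dollar axis that has to reach 60,000, the move from 5,000 to 8,000 at the bottom is barely visible, yet it is a 60 percent increase and is what pulls the ratio down.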

***

The third chart in the series is a marvellous scatter plot (with one small snafu, which I'll get to).

Nyt_crazyrichasians_byethnicity

What are all the things one can learn from this chart?

  • There is, as expected, a strong correlation between having college degrees and earning higher salaries.
  • The Asian immigrant population is diverse, from the perspectives of both education attainment and median household income.
  • The largest source countries are China, India and the Philippines, followed by Korea and Vietnam.
  • The Indian immigrants are on average professionals with college degrees and high salaries, and form an outlier group among the subgroups.

Through careful design decisions, those points are clearly conveyed.

Here's the snafu. The designer forgot to say which year is being depicted. I suspect it is 2016.

Dating the data is very important here because of the following excerpt from the article:

Asian immigrants make up a less monolithic group than they once did. In 1970, Asian immigrants came mostly from East Asia, but South Asian immigrants are fueling the growth that makes Asian-Americans the fastest-expanding group in the country.

This means that a key driver of the rapid increase in income inequality among Asian-Americans is the shift in composition of the ethnicities. More and more South Asian arrivals (most of whom are Indians) push up the education attainment and household income of the average Asian-American. Not only are Indians becoming more numerous, but they are also richer.

An alternative design is to show two bubbles per ethnicity (one for 1970, one for 2016). To reduce clutter, the smaller ethnicities can be aggregated into Other or South Asian Other. This chart may help explain the driver behind the jump in income inequality.


Two thousand five hundred ways to say the same thing

Wallethub published a credit card debt study, which includes the following map:

Wallethub_creditcardpaydownbyCity

Let's describe what's going on here.

The map plots cities (N = 2,562) in the U.S. Each city is represented by a bubble. The color of the bubble ranges from purple to green, encoding the percentile ranking based on the amount of credit card debt that was paid down by consumers. Purple represents 1st percentile, the lowest amount of paydown while green represents 99th percentile, the highest amount of paydown.

The bubble size is encoding exactly the same data, apparently in a coarser gradation. The more purple the color, the smaller the bubble. The more green the color, the larger the bubble.

***

The design decisions are baffling.

Purple is more noticeable than the green, but signifies the less important cities, with the lesser paydowns.

With over 2,500 bubbles crowding onto the map, over-plotting is inevitable. The purple bubbles are printed last, dominating the attention but those are the least important cities (1st percentile). The green bubbles, despite being larger, lie underneath the smaller, purple bubbles.

What might be the message of this chart? Our best guess is: the map explores the regional variation in the paydown rate of credit card debt.

The analyst provides all the data beneath the map. 

Wallethub_paydownbyCity_data

From this table, we learn that the ranking is not based on total amount of debt paydown, but the amount of paydown per household in each city (last column). That makes sense.

Shouldn't it be ranked by the paydown rate instead of the per-household number? Dividing the "Total Credit Card Paydown by City" by "Total Credit Card Debt Q1 2018" should yield the paydown rate. Surprise! This formula yields a column entirely consisting of 4.16%.

What does this mean? They applied the national paydown rate of 4.16% to every one of 2,562 cities in the country. If they had plotted the paydown rate, every city would attain the same color. To create "variability," they plotted the per-household debt paydown amount. Said differently, the color scale encodes not credit card paydown as asserted but amount of credit card debt per household by city.

Here is a scatter plot of the credit card amount against the paydown amount.

Redo_creditcardpaydown_scatter

A perfect alignment!
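The alignment follows from arithmetic. If every city's paydown is the same fixed fraction of its debt, then paydown per household is that same fraction of debt per household. Here is a minimal sketch with hypothetical city figures (not the Wallethub table), assuming the paydown column was built by applying the 4.16% national rate to each city's debt.

```python
NATIONAL_RATE = 0.0416  # the national paydown rate recovered from the table

# Hypothetical cities: (total credit card debt in dollars, number of households)
cities = {
    "City X": (900_000_000, 100_000),
    "City Y": (240_000_000, 40_000),
    "City Z": (50_000_000, 4_000),
}

rows = {}
for name, (debt, households) in cities.items():
    paydown = NATIONAL_RATE * debt  # assumed construction of the paydown column
    rows[name] = {
        "rate": paydown / debt,
        "debt_per_household": debt / households,
        "paydown_per_household": paydown / households,
    }

for name, row in rows.items():
    print(name, f"{row['rate']:.2%}",
          round(row["debt_per_household"]),
          round(row["paydown_per_household"]))
```

The rate column is constant, so the only variation left to plot is debt per household, rescaled by 0.0416.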

This credit card debt paydown map is an example of a QDV chart, in which there isn't a clear question, there is almost no data, and the visual contains several flaws. (See our Trifecta checkup guide.) We are presented 2,562 ways of saying the same thing: 4.16%.

 

P.S. [6/22/2018] Added scatter plot, and cleaned up some language.


Foodies say, add dataviz spice please

This Buzzfeed article proves that foodies love their food served with dataviz (tip: Chris P.). Menus are an undertapped resource when it comes to data visualization.

There are several examples worth discussing.

Buzzfeed-venn-menu

Venn diagrams are not easy to read, people.

Plus they are hard to construct well... note the asymmetric areas.

Here is one without circles:

Jc_redo_vennmenu_1

Then, I pared it down to its essence:

Jc_redo_vennmenu_2

***

This beer map is pretty great:

Buzzfeed-beer-menu

Some of its virtues:

  • The spacious layout utilizing two dimensions, instead of a one-dimensional list of dense text
  • Ordering using two dimensions relevant to the decision problem (assuming those two dimensions are the most important for their clients)
  • Unconventional, attention-grabbing
  • More equitable: different readers will read the chart in different orders. I'll hypothesize that they will end up with a more even distribution of drink orders than with a list in which everyone reads top to bottom

Potential problems:

  • Not enough space to explain the drinks. Don't the clients want to know what's in them?
  • I wonder how they measured the degree of "classic"-ness.

***

This next menu contains an error:

Buzzfeed-coffee-menu

When the drink comes in one size, only one price is listed. If it comes in two sizes, two prices should be listed.

Is the cafe owner shading Americans as not good at math?