Re-thinking a standard business chart of stock purchases and sales

Here is a typical business chart.

Cetera_amd_chart

A possible story here: institutional investors are generally buying AMD stock, except in Q3 2018.

Let's give this chart a three-step treatment.

STEP 1: The Basics

Remove the data labels, which stand sideways awkwardly, and are redundant given the axis labels. If the audience includes people who want to take the underlying data, then supply a separate data table. It's easier to copy and paste from, and doing so removes clutter from the visual.

The value axis is probably created by an algorithm - hard to imagine someone deliberately placing axis labels  $262 million apart.

The gridlines are optional.

Redo_amdinstitution_1

STEP 2: Intermediate

Simplify and re-organize the time axis labels; show the quarter and year structure. The years need not repeat.

Align the vocabulary on the chart. The legend mentions "inflows and outflows" while the chart title uses the words "buying and selling". Inflows is buying; outflows is selling.

Redo_amdinstitution_2

STEP 3: Advanced

This type of data presents an interesting design challenge. Arguably the most important metric is the net purchases (or the net flow), i.e. inflows minus outflows. And yet, the chart form leaves this element in the gaps, visually.

The outflows are numerically opposite to inflows. The sign of the flow is encoded in the color scheme. An outflow still points upwards. This isn't a criticism, but rather a limitation of the chart form. If the red bars are made to point downwards to indicate negative flow, then the "net flow" is almost impossible to visually compute!

Putting the columns side by side allows the reader to visually compute the gap, but it is hard to visually compare gaps from quarter to quarter because each gap is hanging off a different baseline.

The following graphic solves this issue by focusing the chart on the net flows. The buying and selling are still plotted but are deliberately pushed to the side:

Redo_amd_1

The structure of the data is such that the gray and pink sections are "symmetric" around the brown columns. A purist may consider removing one of these columns. In other words:

Redo_amd_2

Here, the gray columns represent gross purchases while the brown columns display net purchases. The reader is then asked to infer the gross selling, which is the difference between the two column heights.

We are almost back to the original chart, except that the net buying is brought to the foreground while the gross selling is pushed to the background.

 


Watching a valiant effort to rescue the pie chart

Today we return to the basics. In a twitter exchange with Dean E., I found the following pie chart in an Atlantic article about who's buying San Francisco real estate:

Atlantic_sfrealestatepie

The pie chart is great at one thing, showing how workers in the software industry accounted for half of the real estate purchases. (Dean and I both want to see more details of the analysis as we have many questions about the underlying data. In this post, I ignore these questions.)

After that, if we want to learn anything else from the pie chart, we have to read the data labels. This calls for one of my key recommendations: make your charts sufficient. The principle of self-sufficiency is that the visual elements of the data graphic should by themselves say something about the data. The test of self-sufficiency is executed by removing the data printed on the chart so that one can assess how much work the visual elements are performing. If the visual elements require data labels to work, then the data graphic is effectively a lookup table.

This is the same pie chart, minus the data:

Redo_atlanticsfrealestate_sufficiency

Almost all pie charts with a large number of slices are packed with data labels. Think of the labeling as a corrective action to fix the shortcoming of the form.

Here is a bar chart showing the same data:

Junkcharts_redo_atlanticsfrealestatebar

***

Let's look at all the efforts made to overcome the lack of self-sufficiency.

Here is a zoom-in on the left side of the chart:

Redo_atlanticsfrealestate_labeling_1

Data labels are necessary to help readers perceive the sizes of the slices. But as the slices are getting smaller, the labels are getting too dense, so the guiding lines are being stretched.

Eventually, the designer gave up on labeling every slice. You can see that some slices are missing labels:

Redo_atlanticsfrealestate_labeling_3

The designer also had to give up on sequencing the slices by the data. For example, hardware with a value of 2.4% should be placed between Education and Law. It is shifted to the top left side to make the labeling easier.

Redo_atlanticsfrealestate_labeling_2

Fitting all the data labels to the slices becomes the singular task at hand.

 


Visual Exploration of Unemployment Data

The charts on unemployment data I put up last week are best viewed as a collection. 

I have put them up on the (still in beta) JMP Public website. You can find the project here

Screen Shot 2019-01-20 at 1.47.59 PM

I believe that if you make an account, you can grab the underlying dataset.

 


What to make of the historically low unemployment rate

One of the amazing economic stories of the moment is the unemployment rate, which at around 4% has returned to the level last reached during the peak of the tech boom in 2000. The story is much more complex than it seems.

I devoted a chapter of Numbersense (link) to explain how the government computes unemployment rates. The most important thing to realize is that an unemployment rate of 4 percent does NOT mean that four out of 100 people in the U.S. are unemployed, and 96 out of 100 are employed.

It doesn't even mean that four out of 100 people of working age are unemployed, and 96 out of 100 of working age are employed.

What it means is of the people that the government decides are "employable", 96 out of 100 are employed. Officially, this employability is known as "in labor force." There are many ways to be disqualified from the labor force; one example is if the government decides that the person is not looking for a job.

On the flip side, who the government counts as "employed" also matters! Part-timers are considered employed. They are counted just like a full-time employee in the unemployment metric. Part-time, according to the government, is one to 34 hours worked during the week the survey is administered.

***

So two factors can affect the unemployment rate a lot - the proportion of the population considered "not in labor force" (thus not counted at all); and the proportion of those considered employed who are part-timers. (Those are two disjoint groups.)

The following chart then shows that despite the unemployment rate looking great, the U.S. labor market in 2018 looks nothing like what it looked like from 1990 to 2008.

Jc_unemployment_rate_explained

Technical notes: all the data are seasonally adjusted by the Bureau of Labor Statistics. I used a spline to smooth the data first - the top chart shows the smoothed version of the unemployment rates. Smoothing removes month-to-month sharp edges from the second chart. The color scale is based on standardized values of the smoothed data.

 

P.S. See Part 2 of this series explores the different experiences of male and female workers. Also, the entire collection can be viewed here.


NYT hits the trifecta with this market correction chart

Yesterday, in the front page of the Business section, the New York Times published a pair of charts that perfectly captures the story of the ongoing turbulence in the stock market.

Here is the first chart:

Nyt_marketcorrection_1

Most market observers are very concerned about the S&P entering "correction" territory, which the industry arbitrarily defines as a drop of 10% or more from a peak. This corresponds to the shortest line on the above chart.

The chart promotes a longer-term reflection on the recent turbulence, using two reference points: the index has returned to the level even with that at the start of 2018, and about 16 percent higher since the beginning of 2017.

This is all done tastefully in a clear, understandable graphic.

Then, in a bit of a rhetorical flourish, the bottom of the page makes another point:

Myt_marketcorrection2

When viewed back to a 10-year period, this chart shows that the S&P has exploded by 300% since 2009.

A connection is made between the two charts via the color of the lines, plus the simple, effective annotation "Chart above".

The second chart adds even more context, through vertical bands indicating previous corrections (drops of at least 10%). These moments are connected to the first graphic via the beige color. The extra material conveys the message that the market has survived multiple corrections during this long bull period.

Together, the pair of charts addresses a pressing current issue, and presents a direct, insightful answer in a simple, effective visual design, so it hits the Trifecta!

***

There are a couple of interesting challenges related to connecting plots within a multiple-plot framework.

While the beige color connects the concept of "market correction" in the top and bottom charts, it can also be a source of confusion. The orientation and the visual interpretation of those bands differ. The first chart uses one horizontal band while the chart below shows multiple vertical bands. In the first chart, the horizontal band refers to a definition of correction while in the second chart, the vertical bands indicate experienced corrections.

Is there a solution in which the bands have the same orientation and same meaning?

***

These graphs solve a visual problem concerning the visualization of growth over time. Growth rates are anchored to some starting time. A ten-percent reduction means nothing unless you are told ten-percent of what.

Using different starting times as reference points, one gets different values of growth rates. With highly variable series of data like stock prices, picking starting times even a day apart can lead to vastly different growth rates.

The designer here picked several obvious reference times, and superimposes multiple lines on the same plotting canvass. Instead of having four lines on one chart, we have three lines on one, and four lines on the other. This limits the number of messages per chart, which speeds up cognition.

The first chart depicts this visual challenge well. Look at the start of 2018. This second line appears as if you can just reset the start point to 0, and drag the remaining portion of the line down. The part of the top line (to the right of Jan 2018) looks just like the second line that starts at Jan 2018.

Jc_marketcorrection1

However, a closer look reveals that the shape may be the same but the magnitude isn't. There is a subtle re-scaling in addition to the re-set to zero.

The same thing happens at the starting moment of the third line. You can't just drag the portion of the first or second line down - there is also a needed re-scaling.


The merry-go-round of investment bankers

Here is the start of my blog post about the chart I teased the other day:

Businessinsider_ibankers

 

Today's post deals with the following chart, which appeared recently at Business Insider (hat tip: my sister).

It's immediately obvious that this chart requires a heroic effort to decipher. The question shown in the chart title "How many senior investment bankers left their firms?" is the easiest to answer, as the designer places the number of exits in the central circle of each plot relating to a top-tier investment bank (aka "featured bank"). Note that the visual design plays no role in delivering the message, as readers just scan the data from those circles.

Anyone persistent enough to explore the rest of the chart will eventually discover these features...

***

The entire post including an alternative view of the dataset is a guest blog at the JMP Blog here. This is a situation in which plotting everything will make an unreadable chart, and the designer has to think hard about what s/he is really trying to accomplish.


Message-first visualization

Sneaky Pete via Twitter sent me the following chart, asking for guidance:

Sneakypete_twitter

This is a pretty standard dataset, frequently used in industry. It shows a breakdown of a company's profit by business unit, here classified by "state". The profit projection for the next year is measured on both absolute dollar terms and year-on-year growth.

Since those two metrics have completely different scales, in both magnitude and unit, it is common to use dual axes. In the case of the Economist, they don't use dual axes; they usually just print the second data series in its own column.

***

I first recommended looking at the scatter plot to see if there are any bivariate patterns. In this case, not much insights are provided via the scatter.

From there, I looked at the data again, and ended up with the following pair of bumps charts (slopegraphs):

Redo_jc_sneakypete

A key principle I used is message-first. That is to say, the designer should figure out what message s/he wants to convey via the visualization, and then design the visualization to convey that message.

A second key observation is that the business units are divided into two groups, the two large states (A and F) and the small states (B to E). This is a Pareto principle that very often applies to real-world businesses, i.e. a small number of entities contribute most of the revenues (or profits). It is very likely that these businesses are structured to serve the large and small states differently, and so the separation onto two charts mirrors the internal structure.

Then, within each chart, there is a message. For the large states, it looks like state F is projected to overtake state A next year. That is a big deal because we're talking about the largest unit in the entire company.

For the small states, the standout is state B, decidedly more rosy than the other three small states with similar projected growth rates.

Note also I chose to highlight the actual dollar profits, letting the growth rates be implied in the slopes. Usually, executives are much more concerned about hitting a dollar value than a growth rate target. But that, of course, depends on your management's preference.

 


Plotted performance guaranteed not to predict future performance

On my flight back from Lyon, I picked up a French magazine, and found the following chart:

French interest rates chart small

A quick visit to Bing Translate tells me that this chart illustrates the rates of return of different types of investments. The headline supposedly says "Only the risk pays". In many investment brochures, after presenting some glaringly optimistic projections of future returns, the vendor legally protects itself by proclaiming "Past performance does not guarantee future performance."

For this chart, an appropriate warning is PLOTTED PERFORMANCE GUARANTEED NOT TO PREDICT THE FUTURE!

***

Two unusual decisions set this chart apart:

1. The tree ring imagery, which codes the data in the widths of concentric rings around a common core

2. The placement of larger numbers toward the middle, and smaller numbers in the periphery.

When a reader takes in the visual design of this chart, what is s/he drawn to?

The designer evidently hopes the reader will focus on comparing the widths of the rings (A), while ignoring the areas or the circumferences. I think it is more likely that the reader will see one of the following:

(B) the relative areas of the tree rings

(C) the areas of the full circles bounded by the circumferences

(D) the lengths of the outer rings

(E) the lengths of the inner rings

(F) the lengths of the "middle" rings (defined as the average of the outer and inner rings)

Here is a visualization of six ways to "see" what is on the French rates of return chart:

Redo_jc_frenchinterestrates_1

Recall the Trifecta Checkup (link). This is an example where "What does the visual say" and "What does the data say" may be at variance. In case (A), if the reader is seeing the ring widths, then those two aspects are in sync. In every other case, the two aspects are disconcordant. 

The level of distortion is visualized in the following chart:

Redo_jc_frenchinterestrates_2

Here, I normalized everything to the size of the SCPI data. The true data is presented by the ring width column, represented by the vertical stripes on the left. If the comparisons are not distorted, the other symbols should stay close to the vertical stripes. One notices there is always distortion in cases (B)-(F). This is primarily due to the placement of the large numbers near the center and the small numbers near the edge. In other words, the radius is inversely proportional to the data!

 The amount of distortion for most cases ranges from 2 to 6 times. 

While the "ring area" (B) version is least distorted on average, it is perhaps the worst of the six representations. The level of distortion is not a regular function of the size of the data. The "sicav monetaries" (smallest data) is the least distorted while the data of medium value are the most distorted.

***

To improve this chart, take a hint from the headline. Someone recognizes that there is a tradeoff between risk and return. The data series shown, which is an annualized return, only paints the return part of the relationship. 

 

 

 


The downside of discouraging pie charts

It's no secret most dataviz experts do not like pie charts.

Our disdain for pie charts causes people to look for alternatives.

Sometimes, the alternative is worse. Witness:

Schwab_bloombergaggregatebondindex

This chart comes from the Spring 2018 issue of On Investing, the magazine for Charles Schwab customers.

It's not a pie chart.

Redo_jc_bondindex

I'm forced to say the pie chart is preferred.

The original chart fails the self-sufficiency test. Here is the 2007 chart with the data removed.

Bloombergbondindex_sufficiency

It's very hard to figure out how large are those pieces, so any reader trying to understand this chart will resort to reading the data, which means the visual representation does no work!

Or, you can use a dot plot.

Redo_jc_bondindex2

This version emphasizes the change over time.

 


Two thousand five hundred ways to say the same thing

Wallethub published a credit card debt study, which includes the following map:

Wallethub_creditcardpaydownbyCity

Let's describe what's going on here.

The map plots cities (N = 2,562) in the U.S. Each city is represented by a bubble. The color of the bubble ranges from purple to green, encoding the percentile ranking based on the amount of credit card debt that was paid down by consumers. Purple represents 1st percentile, the lowest amount of paydown while green represents 99th percentile, the highest amount of paydown.

The bubble size is encoding exactly the same data, apparently in a coarser gradation. The more purple the color, the smaller the bubble. The more green the color, the larger the bubble.

***

The design decisions are baffling.

Purple is more noticeable than the green, but signifies the less important cities, with the lesser paydowns.

With over 2,500 bubbles crowding onto the map, over-plotting is inevitable. The purple bubbles are printed last, dominating the attention but those are the least important cities (1st percentile). The green bubbles, despite being larger, lie underneath the smaller, purple bubbles.

What might be the message of this chart? Our best guess is: the map explores the regional variation in the paydown rate of credit card debt.

The analyst provides all the data beneath the map. 

Wallethub_paydownbyCity_data

From this table, we learn that the ranking is not based on total amount of debt paydown, but the amount of paydown per household in each city (last column). That makes sense.

Shouldn't it be ranked by the paydown rate instead of the per-household number? Divide the "Total Credit Card Paydown by City" by "Total Credit Card Debt Q1 2018" should yield the paydown rate. Surprise! This formula yields a column entirely consisting of 4.16%.

What does this mean? They applied the national paydown rate of 4.16% to every one of 2,562 cities in the country. If they had plotted the paydown rate, every city would attain the same color. To create "variability," they plotted the per-household debt paydown amount. Said differently, the color scale encodes not credit card paydown as asserted but amount of credit card debt per household by city.

Here is a scatter plot of the credit card amount against the paydown amount.

Redo_creditcardpaydown_scatter

A perfect alignment!

This credit card debt paydown map is an example of a QDV chart, in which there isn't a clear question, there is almost no data, and the visual contains several flaws. (See our Trifecta checkup guide.) We are presented 2,562 ways of saying the same thing: 4.16%.

 

P.S. [6/22/2018] Added scatter plot, and cleaned up some language.