NYT hits the trifecta with this market correction chart

Yesterday, on the front page of the Business section, the New York Times published a pair of charts that perfectly captures the story of the ongoing turbulence in the stock market.

Here is the first chart:

Nyt_marketcorrection_1

Most market observers are very concerned about the S&P entering "correction" territory, which the industry arbitrarily defines as a drop of 10% or more from a peak. This corresponds to the shortest line on the above chart.

The chart promotes a longer-term reflection on the recent turbulence, using two reference points: the index has returned to about the level at the start of 2018, and it is about 16 percent higher than at the beginning of 2017.

This is all done tastefully in a clear, understandable graphic.

Then, in a bit of a rhetorical flourish, the bottom of the page makes another point:

Myt_marketcorrection2

Viewed over a 10-year period, this chart shows that the S&P has exploded by 300% since 2009.

A connection is made between the two charts via the color of the lines, plus the simple, effective annotation "Chart above".

The second chart adds even more context, through vertical bands indicating previous corrections (drops of at least 10%). These moments are connected to the first graphic via the beige color. The extra material conveys the message that the market has survived multiple corrections during this long bull period.

Together, the pair of charts addresses a pressing current issue, and presents a direct, insightful answer in a simple, effective visual design, so it hits the Trifecta!

***

There are a couple of interesting challenges related to connecting plots within a multiple-plot framework.

While the beige color connects the concept of "market correction" in the top and bottom charts, it can also be a source of confusion. The orientation and the visual interpretation of those bands differ. The first chart uses one horizontal band while the chart below shows multiple vertical bands. In the first chart, the horizontal band refers to a definition of correction while in the second chart, the vertical bands indicate experienced corrections.

Is there a solution in which the bands have the same orientation and same meaning?

***

These graphs solve a visual problem concerning the visualization of growth over time. Growth rates are anchored to some starting time. A ten-percent reduction means nothing unless you are told ten-percent of what.

Using different starting times as reference points, one gets different values of growth rates. With highly variable series of data like stock prices, picking starting times even a day apart can lead to vastly different growth rates.

The designer here picked several obvious reference times, and superimposed multiple lines on the same plotting canvas. Instead of having four lines on one chart, we have three lines on one, and four lines on the other. This limits the number of messages per chart, which speeds up cognition.

The first chart depicts this visual challenge well. Look at the start of 2018. This second line appears as if you can just reset the start point to 0, and drag the remaining portion of the line down. The part of the top line (to the right of Jan 2018) looks just like the second line that starts at Jan 2018.

Jc_marketcorrection1

However, a closer look reveals that the shape may be the same but the magnitude isn't. There is a subtle re-scaling in addition to the re-set to zero.

The same thing happens at the starting moment of the third line. You can't just drag the portion of the first or second line down - there is also a needed re-scaling.
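
To see why dragging the line down is not enough, here is a minimal sketch. The index levels are made up for illustration and are not actual S&P values. Growth measured from a later start equals the earlier-based growth shifted down and then divided by one plus the growth already accumulated at the new start.

```python
# Hypothetical index levels, for illustration only (not actual S&P data).
levels = {"2017-01": 100.0, "2018-01": 120.0, "2018-06": 126.0, "2018-10": 114.0}

def growth_since(levels, start):
    """Percent growth of each point at or after `start`, relative to the level at `start`."""
    base = levels[start]
    return {t: 100.0 * (v / base - 1.0) for t, v in levels.items() if t >= start}

from_2017 = growth_since(levels, "2017-01")
from_2018 = growth_since(levels, "2018-01")

# Naive "drag down": subtract the growth already accumulated at the new start.
shift = from_2017["2018-01"]
dragged = {t: g - shift for t, g in from_2017.items() if t >= "2018-01"}

# Correct re-basing: shift, then re-scale by 1 / (1 + growth at the new start).
rescaled = {t: g / (1.0 + shift / 100.0) for t, g in dragged.items()}

for t in from_2018:
    print(t, round(dragged[t], 2), round(from_2018[t], 2), round(rescaled[t], 2))
```

With these made-up levels, the dragged line shows 6% where the line that starts in 2018 shows 5%: same shape, different magnitude.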


The merry-go-round of investment bankers

Here is the start of my blog post about the chart I teased the other day:

Businessinsider_ibankers

 

Today's post deals with the following chart, which appeared recently at Business Insider (hat tip: my sister).

It's immediately obvious that this chart requires a heroic effort to decipher. The question shown in the chart title "How many senior investment bankers left their firms?" is the easiest to answer, as the designer places the number of exits in the central circle of each plot relating to a top-tier investment bank (aka "featured bank"). Note that the visual design plays no role in delivering the message, as readers just scan the data from those circles.

Anyone persistent enough to explore the rest of the chart will eventually discover these features...

***

The entire post including an alternative view of the dataset is a guest blog at the JMP Blog here. This is a situation in which plotting everything will make an unreadable chart, and the designer has to think hard about what s/he is really trying to accomplish.


Big Macs in Switzerland are amazing, according to my friend

Bigmac_ch

Note for those in or near Zurich: I'm giving a Keynote Speech tomorrow morning at the Swiss Statistics Meeting (link). Here is the abstract:

The best and the worst of data visualization share something in common: these graphics provoke emotions. In this talk, I connect the emotional response of readers of data graphics to the design choices made by their creators. Using a plethora of examples, collected over a dozen years of writing online dataviz criticism, I discuss how some design choices generate negative emotions such as confusion and disbelief while other choices elicit positive feelings including pleasure and eureka. Important design choices include how much data to show; which data to highlight, hide or smudge; what research question to address; whether to introduce imagery, or playfulness; and so on. Examples extend from graphics in print, to online interactive graphics, to visual experiences in society.

***

The Big Mac index seems to never want to go away. Here is the latest graphic from the Economist, saying what it says:

Econ_bigmacindex

The index never made much sense to me. I'm in Switzerland, and everything here is expensive. My friend, who is a U.S. transplant, seems to have adopted McDonald's as his main eating-out venue. Online reviews indicate that the quality of the burger served in Switzerland is much better than the same thing in the States. So, part of the price differential can be explained by quality. The index also confounds several other issues, such as local inflation and exchange rates.

Now, on to the data visualization, which is primarily an exercise in rolling one's eyeballs. In order to understand the red and blue line segments, our eyes have to hop over the price bubbles to the top of the page. Then, in order to understand the vertical axis labels, unconventionally placed on the right side, our eyes have to zoom over to the left of the page, and search for the line below the header of the graph. Next, if we want to know about a particular country, our eyes must turn sideways and scan from bottom up.

Here is a different take on the same data:

Redo_jc_econbigmac2018

I transformed the data as I don't find it compelling to learn that Russian Big Macs cost 60% less than American Big Macs. Instead, on my chart, the reader learns that the price paid for a U.S. Big Mac will buy him/her almost 2 and a half Big Macs in Russia.

The arrows pointing left indicate that in most countries, the values of their currencies are declining relative to the dollar from 2017 to 2018 (at least by the Big Mac Index point of view). The only exception is Turkey, where in 2018, one can buy more Big Macs for the price of one U.S. Big Mac than in 2017.

The decimal differences are immaterial so I have grouped the countries by half Big Macs.
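
The transformation is simple arithmetic, sketched below. The percentage inputs other than the 60% figure are hypothetical, and rounding to the nearest half Big Mac is only one possible grouping rule, assumed here for illustration.

```python
def big_macs_per_us_price(pct_vs_us):
    """Local Big Macs bought with the dollar price of one U.S. Big Mac.

    pct_vs_us is the local price relative to the U.S. price, in percent;
    -60 means the local burger costs 60% less than the American one.
    """
    return 1.0 / (1.0 + pct_vs_us / 100.0)

def nearest_half(n):
    """Group a value to the nearest half Big Mac (assumed grouping rule)."""
    return round(n * 2) / 2

# Hypothetical inputs, for illustration only.
examples = {"60% cheaper": -60, "59% cheaper": -59, "10% cheaper": -10, "25% dearer": 25}

for label, pct in examples.items():
    n = big_macs_per_us_price(pct)
    print(f"{label}: {n:.2f} Big Macs, grouped at {nearest_half(n)}")
```

A burger that costs 60% less buys 1 / 0.4 = 2.5 burgers for the same money, which is why "almost 2 and a half" corresponds to a discount of just under 60%.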

This example demonstrates yet again that to make good data visualization, one has to describe an interesting question, make appropriate transformations of the data, and then choose the right visual form. I describe this framework as the Trifecta - a guide to it is here.

(P.S. I noticed that Bitly just decided unilaterally to deactivate my customized Bitly link that was configured years and years ago, when it switched design (?). So I had to re-create the custom link. I have never grasped why "unreliability" is a feature of the offering by most Tech companies.)


Two good charts can use better titles

NPR has this chart, which I like:

Npr_votersgunpolicy

It's a small-multiples display of bumps charts. Nice, clear labels. No unnecessary things like axis labels. Intuitive organization by Major Factor, Minor Factor, and Not a Factor.

Above all, the data convey a strong, surprising message - despite many high-profile gun violence incidents this year, some Democratic voters are actually much less likely to see guns as a "major factor" in deciding their vote!

Of course, the overall importance of gun policy is down but the story of the chart is really about the collapse on the Democratic side, in a matter of two months.

The one missing thing about this chart is a nice, informative title: In two months, gun policy went from a major to a minor issue for some Democratic voters.

***

I am impressed by this Financial Times effort:

Ft_millennialunemploy

The key here is the analysis. Most lazy analyses compare millennials to other generations at their current ages, but this analyst looked at each generation at the same age range of 18 to 33 (i.e. controlling for age).

Again, the data convey a strong message - millennials have significantly higher un(der)employment than previous generations at their age range. Similar to the NPR chart above, the overall story is not nearly as interesting as the specific story - it is the pink area ("not in labour force") that is driving this trend.

Specifically, the millennial unemployment rate is high because the proportion of people classified as "not in labour force" doubled in 2014, compared to all previous generations depicted here. I really like this chart because it lays waste to a prevailing theory spread around by reputable economists - that somehow after the Great Recession, demographic trends are causing the explosion in people classified as "not in labour force". These people are nobodies when it comes to computing the unemployment rate. They literally do not count! There is simply no reason why someone who just graduated from college should not be in the labour force by choice. (Dean Baker has a discussion of the theory that people not wanting to work is a long term trend.)
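
A minimal sketch of why these people "do not count", using the standard definition of the unemployment rate (unemployed divided by labour force). The two cohorts of 100 people are hypothetical and are not taken from the FT chart.

```python
def unemployment_rate(employed, unemployed, not_in_labour_force):
    """Unemployed as a share of the labour force.

    not_in_labour_force is accepted but deliberately ignored: people outside
    the labour force appear in neither the numerator nor the denominator.
    """
    labour_force = employed + unemployed
    return unemployed / labour_force

def non_employment_share(employed, unemployed, not_in_labour_force):
    """Everyone without a job as a share of the whole cohort."""
    total = employed + unemployed + not_in_labour_force
    return (unemployed + not_in_labour_force) / total

# Hypothetical cohorts of 100 people each, for illustration only.
earlier = dict(employed=80, unemployed=8, not_in_labour_force=12)
later = dict(employed=68, unemployed=8, not_in_labour_force=24)

for name, cohort in [("earlier", earlier), ("later", later)]:
    print(name,
          f"unemployment rate {unemployment_rate(**cohort):.1%},",
          f"without a job {non_employment_share(**cohort):.1%}")
```

In this made-up case, doubling the "not in labour force" group raises the share of people without a job from 20% to 32%, while the conventional unemployment rate moves only from about 9% to about 11%.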

The legend would be better placed to the right of the columns, rather than the top.

Again, this chart would benefit from a stronger headline: BLS Finds Millennials are twice as likely as previous generations to have dropped out of the labour force.


Hog wild about dot maps

Reader Chris P. sent me this chart.

This was meant to be "light entertainment." See the Twitter discussion below.

9gag_hogsmap

***

Let's think a bit about the dot map as a data graphic.

Dot maps are one dimensional. The dot's location is used to indicate the latitude and longitude and therefore the x,y coordinates cannot encode any other data. If we have basically a black/white chart, as in this hog map, the dot can only encode binary data (yes/no).

The legend says "each dot represents 5,000 hogs." Think about how that statement applies to these scenarios:

  • Do you expect to see something different between the dot representing 4,200 and the one showing 4,900?
  • Do you expect to see something different between the dot representing 400 and 4,000?
  • Do you expect to see something different between the location with 4,800 hogs and 9,600 hogs?


Based on the legend, the designer would need two dots to represent 10,000 hogs. But those two dots pertain to the same location. Sometimes, "jitter" is added, and the two dots are placed side by side. However, with the scale of the map of the U.S., and the dots representing seemingly small neighborhoods, jitter creates more confusion than anything. Also, what about 3, 4, 5, ... dots in the same location?
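
The questions above come down to a rounding rule that the legend does not state. Here is a small sketch comparing three candidate rules. Which rule the map's designer used is unknown, so all three are assumptions.

```python
import math

HOGS_PER_DOT = 5000  # from the map legend

# Three plausible ways to turn a hog count into a number of dots.
RULES = {
    "round down": lambda hogs: hogs // HOGS_PER_DOT,
    "round to nearest": lambda hogs: math.floor(hogs / HOGS_PER_DOT + 0.5),
    "round up": lambda hogs: math.ceil(hogs / HOGS_PER_DOT),
}

def dots_for(hogs):
    """Number of dots drawn at one location under each candidate rule."""
    return {name: rule(hogs) for name, rule in RULES.items()}

for hogs in [400, 4000, 4200, 4900, 4800, 9600, 10000]:
    print(hogs, dots_for(hogs))
```

Under rounding up, 400 and 4,900 hogs both get one dot. Under rounding down, neither gets a dot at all. Under any of the three rules, 4,200 and 4,900 look identical.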

 9gag_hogmap_inset

Looking at the details above, are the dots jittered or do they represent neighboring locations?

Sometimes, colors are used to encode data on a dot map. But each dot can only contain one color, so it typically shows only the top category in each location.

Dot maps are very limited. Think before you use them.

 


A gem among the snowpack of Olympics data journalism

It's not often I come across a piece of data journalism that pleases me so much. The "Happy 700" article by the Washington Post is amazing.

Wpost_happy700_map2

 

When data journalism and dataviz are done right, the designers have made good decisions. Here are some of the key elements that make this article work:

(1) Unique

The topic is timely but timeliness heightens both the demand and supply of articles, which means only the unique and relevant pieces get the readers' attention.

(2) Fun

The tone is light-hearted. It's a fun read. A little bit informative - when they describe the towns that few have heard of. The notion is slightly silly but the reader won't care.

(3) Data

It's always a challenge to make data come alive, and these authors succeeded. Most of the data work involves finding, collecting and processing the data. There isn't any sophisticated analysis. But it is a powerful demonstration that complex analysis is not always necessary.

(4) Organization

The structure of the data is three criteria (elevation, population, and terrain) by cities. A typical way of showing such data might be an annotated table, or a Bumps-type chart, grouped columns, and so on. All these formats try to stuff the entire dataset onto one chart. The designers chose to highlight one variable at a time, cumulatively, on three separate maps. This presentation fits perfectly with the flow of the writing. 

(5) Details

The execution involves some smart choices. I am a big fan of legend/axis labels that are informative. For example, note that the legend doesn't say "Elevation in Meters":

Wpost_happy700_legend

The color scheme across all three maps shows a keen awareness of background/foreground concerns. 


Verging on trust

I’m not quite done with that Verge survey on social media popularity. Last time, I discussed one of the stacked bar charts about how much users like or dislike specific brands such as Facebook and Twitter. Today, I look at the very first chart in the article.

Verge_socialmediavbank

This chart supposedly says users trust Amazon the most among those technology brands, just about the same level as customers trust their bank.

The problems of this chart jump out if we place it side by side with the chart I discussed last time.

Verge_twocharts

The chart on the right has six categories while the one on the left has only five categories. The missing category is “somewhat distrust.” It’s not likely that the "trust" question has asymmetric choices of “greatly distrust”, “somewhat distrust”, “neither trust nor distrust”, and “greatly trust” so I suspect the omission is unintended.

On the "trust" chart (left), the “no opinion/don’t use” category is painted yellow while on the right chart, it is colored gray. The yellow bars represent different things on each chart.

Also inconsistent is the order of the bars. Chart designers should be aware that readers develop certain expectations when going from one chart to the next.

The greatest mystery concerns the lengths of the yellow bars on the left side. I suspect that the brand labels have been wrongly applied. If we believe these labels, then almost 50% of users have “no opinion or don’t use” a bank. That seems highly unlikely. Further, the more popular services such as Amazon and Google apparently have almost double the proportion of users who have “no opinion or don’t use” versus Twitter.

I'd guess that the yellow actually stands for "greatly trust" and the "no opinion/Don't use" has been inadvertently dropped from the chart so part of the legend is incorrect.

 


Using a bardot chart for survey data

Aleks J. wasn't amused by the graphs included in Verge's report about user attitudes toward the major Web brands such as Google, Facebook, and Twitter.

Let's use this one as an example:

Verge_survey_fb

Survey respondents are asked to rate how much they like or dislike the products and services from each of six companies, on a five-point scale. There is a sixth category for "No opinion/Don't use."

In making this set of charts, the designer uses six different colors for the six categories. This means he/she thinks of these categories as discrete so that the difference between categories carries no meaning. In a bipolar, five-point scale, it is more common to pick two extreme colors and then use shades to indicate the degree of liking or disliking. The middle category can be shown in a neutral color to express the neutrality of opinion.

The color choice baffles me. The two most prominent colors, gray and dark blue, correspond to two minor categories (no opinion and neutral) while the most important category - "greatly like" - is painted the modest yellow, paling away.

Verge sees the popularity of Facebook as the key message, which explains its top position among the six brands. However, readers familiar with the stacked bar chart form are likely looking to make sense of the order, and are left frustrated.

***

In revising this chart, I introduce a second level of grouping. The six categories fit into three color groups: red for dislike, gray for no opinion/neutral, and orange for like. The like and dislike groups are plotted at the left and right ends of the chart while the two less informative categories are lumped toward the middle.

Redo_vergesurveyfb_1

I take great pleasure in dumping the legend box.

***

Now, when a five-point scale is used, many analysts like to analyze the Top 2, or Bottom 2 boxes. The choice of colors in the above chart facilitates this analysis. Adding some subtle dots makes it even better!

Redo_vergesurveyfb_2

Because this chart is a superposition of a stacked bar chart and a dot plot, I am calling this a bardot chart.

Also notice that the brands are re-ordered by Top 2 box popularity.
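
For readers unfamiliar with box scores, here is a minimal sketch of the Top 2 and Bottom 2 calculation and the re-ordering. The brands and percentages are hypothetical, and the category labels other than "greatly like" and "no opinion/don't use" are my guesses at the survey wording.

```python
# Category order on the assumed five-point scale, plus the sixth category.
CATEGORIES = ["greatly dislike", "somewhat dislike", "neutral",
              "somewhat like", "greatly like", "no opinion/don't use"]

# Hypothetical shares in percent, for illustration only (each row sums to 100).
survey = {
    "Brand A": [5, 10, 20, 30, 25, 10],
    "Brand B": [10, 15, 20, 25, 10, 20],
    "Brand C": [3, 7, 15, 35, 35, 5],
}

def box_scores(shares):
    """Return (Top 2 box, Bottom 2 box) for one brand."""
    by_name = dict(zip(CATEGORIES, shares))
    top2 = by_name["somewhat like"] + by_name["greatly like"]
    bottom2 = by_name["greatly dislike"] + by_name["somewhat dislike"]
    return top2, bottom2

# Re-order the brands by Top 2 box popularity, highest first.
ranked = sorted(survey, key=lambda brand: box_scores(survey[brand])[0], reverse=True)

for brand in ranked:
    top2, bottom2 = box_scores(survey[brand])
    print(f"{brand}: Top 2 = {top2}%, Bottom 2 = {bottom2}%")
```

The dots on the chart mark these two sums, which is why plotting the like and dislike groups at opposite ends of the bar makes them easy to read off.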

 

 


Details, details, details: giving Zillow a pie treatment

Delinquent_homes_chart
This chart (shown right), published by Zillow in a report on housing in 2012, looks quite standard, apparently avoiding the worst of Excel defaults.

In real estate, it’s all about location. In dataviz, it’s all about details.

What are some details that caught my eye on this chart?

Readers have to get over the hurdle that “negative equity” is the same as “underwater homes.” This is not readily understood unless one reads the surrounding text. For example, the first row for the U.S. average proclaims that 31% of U.S. homes are “underwater” and among these underwater homes, 10% of the mortgages are delinquent. The former is concerned with the valuation of the property while the latter deals with payments or lack thereof.

According to the legend, the blue segments stand for the proportions of underwater homes in different metro areas but it’s not quite true – the blue part represents underwater but not delinquent mortgages while the red and blue combined represents all underwater mortgages. This is a common problem in stacked bar charts.

The metro areas are in alphabetical order by city, which means an opportunity is missed to help readers discern patterns. Patterns related to city-name alphabets are not of interest to most (except certain econometrics journal editors). Try arranging by region, or by decreasing level of negative equity, or some other meaningful variable.

The designer tried to do something clever with the horizontal axis labels and I don't think it succeeds. To see what is going on, read the note below the chart. The trick is to let readers look at the number of underwater and delinquent mortgages in two ways, as a proportion of underwater mortgages (through the white data labels) and as a proportion of all mortgages (through the axis labels). That's a mess, sorry to say.

Finally, I would like the horizontal axis to extend to 100% because underlying the proportions shown in blue and on the horizontal axis is the population of all mortgages.

***

Perhaps a shock to many readers: the task of showing underwater delinquent mortgages simultaneously as a proportion of underwater mortgages and as a proportion of all mortgages is solved using ... pie charts.

I just created a couple of examples here:

Redo_zillowunderwater

The deep orange sector can be compared to the entire circle, or to the larger orange sector. Readers usually don't have a problem with pies with only three slices.
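
The arithmetic behind the three slices can be sketched as follows, using the U.S. average row quoted earlier (31% underwater, of which 10% are delinquent). I am assuming both percentages share the same base of all mortgaged homes.

```python
underwater = 0.31                    # share of all mortgages that are underwater
delinquent_among_underwater = 0.10   # share of underwater mortgages that are delinquent

# The three slices of the pie, each as a share of all mortgages.
slices = {
    "not underwater": 1.0 - underwater,
    "underwater, not delinquent": underwater * (1.0 - delinquent_among_underwater),
    "underwater and delinquent": underwater * delinquent_among_underwater,
}

# Deep orange sector compared to the entire circle.
share_of_all = slices["underwater and delinquent"]
# Deep orange sector compared to the larger (all underwater) sector.
share_of_underwater = share_of_all / underwater

for name, share in slices.items():
    print(f"{name}: {share:.1%}")
print(f"delinquent share of all mortgages: {share_of_all:.1%}")
print(f"delinquent share of underwater mortgages: {share_of_underwater:.1%}")
```

The same slice answers both questions: about 3% of all mortgages, and 10% of the underwater ones.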


Visualizing electoral college politics: exercise in displaying relationships between variables

Reader Berry B. sent in a tip quite some months ago that I just pulled out of my inbox. He really liked the Washington Post's visualization of the electoral college in the Presidential election. (link)

One of the strengths of this project is the analysis that went on behind the visualization. The authors point out that there are three variables at play: the population of each state, the votes cast by state, and the number of electoral votes by state. A side-by-side comparison of the two tile maps gives a perspective of the story:

Wp_electoralcollege_maps

The under/over representation of electoral votes is much less pronounced if we take into account the propensity to vote. With three metrics at play, there is quite a bit going on. On these maps, orange and blue are used to indicate the direction of difference. Then the shade of the color codes the degree of difference, which was classified into severe versus slight (but only for one direction). Finally, solid squares are used for the comparison with population, and square outlines are for comparison with votes cast.

Pick Florida (FL) for example. On the left side, we have a solid, dark orange square while on the right, we have a square outline in dark orange. From that, we are asked to match the dark orange with the dark orange and to contrast the solid versus the outline. It works to some extent but the required effort seems more than desirable.

***

I'd like to make it easier for readers to see the interplay between all three metrics.

In the following effort, I ditch the map aesthetic, and focus on three transformed measures: share of population, share of popular vote, and share of electoral vote. The share of popular vote is a re-interpretation of what Washington Post calls "votes cast".
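
The transformation into shares is sketched below. The three "states" and all their counts are hypothetical, chosen only to show how the same electoral vote allocation can look under-represented against one baseline and over-represented against another.

```python
def to_shares(counts):
    """Convert raw counts by state into shares of the national total."""
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()}

# Hypothetical counts, for illustration only (not actual election data).
population = {"Large": 60, "Swing": 30, "Small": 10}
votes_cast = {"Large": 50, "Swing": 40, "Small": 10}
electoral_votes = {"Large": 55, "Swing": 30, "Small": 15}

pop_share = to_shares(population)
vote_share = to_shares(votes_cast)
ev_share = to_shares(electoral_votes)

# Gap in percentage points: positive means more electoral weight than the baseline.
gaps = {
    state: {
        "vs_population": 100 * (ev_share[state] - pop_share[state]),
        "vs_votes": 100 * (ev_share[state] - vote_share[state]),
    }
    for state in population
}

for state, gap in gaps.items():
    print(state, {k: round(v, 1) for k, v in gap.items()})
```

In this made-up example, the large state is five points short against population but five points ahead against votes cast, while the swing state matches its population share yet falls ten points short of its share of the popular vote.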

The information is best presented by grouping states that behaved similarly. The two most interesting subgroups are the large states like Texas and California where the residents loudly complained that their voice was suppressed by the electoral vote allocation but in fact, the allocated electoral votes were not far from their share of the popular vote! By contrast, Floridians had a more legitimate reason to gripe since their share of the popular vote much exceeded their share of the electoral vote. This pattern also persisted throughout the battleground states.

Redo_wp_electoralcollege

The hardest part of this design is making the legend:

Redo_wp_electoralcollege_legend