Pay levels in the U.S.

The Wall Street Journal published a graphic showing the median pay levels at "most" public companies in the U.S. here.

Wsj_mediancompanypay

People who attended my dataviz seminar might recognize the similarity with the graphic showing internet download speeds by different broadband technologies. It's a clean, clear way of showing multiple comparisons on the same chart.

You can see the distribution of pay levels of companies within each industry grouping, and the vertical lines showing the sector medians allow comparison across sectors. The median pay levels are quite similar, with the energy sector leaning higher and the consumer sector leaning lower.

The consumer sector is extremely heavy on the low side of the pay range. Companies like Universal, Abercrombie, Skechers, Mattel, Gap, etc. all pay at least half their employees less than $6,000. The data are sourced to MyLogIQ. I have no knowledge of how reliable or valid the data are. It's curious to me that Dunkin Brands showed a median of $110K while Starbucks showed $13K.

Wsj_medianpay_dunkinstarbucks

***

I like the interactive features.

The window control lets the user zoom in to different parts of the pay range. This is necessary because of the extremely high salaries. The control doubles as a presentation of the overall distribution of median salaries.

The text box can be used to add data labels to specific companies.

***

See previous discussion of WSJ Graphics.

 


The ebb and flow of an effective dataviz showing the rise and fall of GE

Wsj_ebbflowGE_800

A WSJ chart caught my eye the other day – I spotted someone looking at it in a coffee shop, and immediately got a hold of a copy. The chart plots the ebb and flow of GE’s revenues from the 1980s to the present.

What grabbed my attention? The less-used chart form, and the appealing but not too gaudy color scheme.

The chart presents a highly digestible view of the structure of GE’s revenues. We learn about GE’s major divisions, as well as how certain segments split from or merged with others over time. Major acquisitions and divestitures are also depicted; if these events are the main focus, the designer should find ways to make these moments stand out more.

An interesting design decision concerns the sequence of the divisions. One possible order is by increasing or decreasing importance, typically indicated by proportional revenues. This is complicated by the changing nature of the business over the decades. So financial services went from nothing to the largest division by far to almost disappearing.

The sequencing need not be data-driven; it can be design-constrained. The merging and splitting of business units are conveyed via linking arrows. Longer arrows are unsightly, and meshes of arrows are confusing.

On this chart, the long arrow pointing from the orange to the gray around 2004 feels out of place. What if the financial services block is moved to the right of the consumer block? That will significantly shorten the long arrow. It won’t create other entanglements as the media block is completely disjoint and there are no other arrows tying financial services to another division.

 

***


To improve readability, the bars are spaced out horizontally. The addition of whitespace distorts the proportionality. So, in 2001, the annotation states that financial services (orange) accounted for “about half of the revenues,” which is directly contradicted by the visual perception – readers find the orange bar to be clearly shorter than the total length of the other bars. This is a serious deficiency of the chart form, but this chart nonetheless conveys the "ebb and flow" very well.


Making people jump over hoops

Take a look at the following chart, and guess what message the designer wants to convey:

Wsj_brokercensus

This chart accompanied an article in the Wall Street Journal about Wells Fargo losing brokers due to the fake account scandal, and using bonuses to lure them back. Like you, my first response to the chart was that little has changed from 2015 to 2017.

The intention of the whitespace inserted to split the four columns into two pairs is a bit mysterious. It's not obvious that UBS and Merrill are different from Wells Fargo and Morgan Stanley. This device might have been used to overcome the difficulty of reading four columns side by side.

The additional challenge of this dataset is the outlier values for UBS, which elongate the range of the vertical axis, squeezing together the values of the other three banks.

In this first alternative version, I play around with irregular gridlines.

Jc_redo_wsjbrokercensus1

Grouped column charts are not great at conveying changes over time, as they cause our eyes to literally jump over hoops. In the second version, I use a bumps chart to compactly highlight the trends. I also zoom in on the quarterly growth rates.

Jc_redo_wsjbrokercensus2

The rounded interpolation removes the sharp angles from the typical bumps chart (aka slopegraph) but it does add patterns that might not be there. This type of interpolation however respects the values at the "knots" (here, the quarterly values) while a smoother may move those points. On balance, I like this treatment.
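The difference between an interpolation that respects the knots and a smoother that may move them can be checked numerically. The sketch below is only an illustration: the quarterly values are hypothetical (not the broker counts from the chart), and the Catmull-Rom spline and 3-point moving average are stand-ins I chose, not necessarily the methods behind the charts above.

```python
# Hypothetical quarterly values (NOT the WSJ broker data), used only to
# contrast an interpolating curve with a smoother.
knots = [10.0, 12.0, 9.0, 11.0, 15.0]


def catmull_rom(values, t):
    """Evaluate a Catmull-Rom spline through `values` at position t.

    Integer t = i corresponds to values[i]; t must lie in [0, len(values) - 1].
    """
    n = len(values)
    i = min(int(t), n - 2)  # index of the segment that contains t
    u = t - i               # local position within the segment, 0..1
    # Neighbouring points, repeating the end points at the boundaries.
    p0 = values[max(i - 1, 0)]
    p1 = values[i]
    p2 = values[i + 1]
    p3 = values[min(i + 2, n - 1)]
    return 0.5 * (
        2 * p1
        + (-p0 + p2) * u
        + (2 * p0 - 5 * p1 + 4 * p2 - p3) * u ** 2
        + (-p0 + 3 * p1 - 3 * p2 + p3) * u ** 3
    )


def moving_average(values):
    """3-point centered moving average; end points use the two available values."""
    out = []
    for i in range(len(values)):
        window = values[max(i - 1, 0): i + 2]
        out.append(sum(window) / len(window))
    return out


# The spline reproduces every knot; the smoother generally does not.
at_knots = [catmull_rom(knots, i) for i in range(len(knots))]
smoothed = moving_average(knots)
print(at_knots)
print(smoothed)
```

The spline still bends between the knots, which is where the "patterns that might not be there" come from; the smoother trades that for moving the quarterly values themselves.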

 

PS. [6/2/2017] Given the commentary below, I am including the straight version of the chart, so you can compare. The straight-line version is more precise. One aspect of this chart form I dislike is the sharp angles. When there are more lines, it gets very entangled.

Jc_redo_wsjbrokercensus3


Lines that delight, lines that blight

This WSJ graphic caught my eye. The accompanying article is here.

Wsj_ipo_dealdrought_full

The article (judging from the sub-header) makes two separate points, one about the total amount of money raised in IPOs in a year, and the other about the change in market value of those newly-public companies one year from the IPO date.

The first metric is shown by the size of the bubbles while the second metric is displayed as distances from the horizontal axis. (The second metric is further embedded, in a simplified, binary manner, in the colors of the bubbles.)

The designer has decided that the second metric - performance after IPO - is more important. Therefore, it is much easier for readers to know how each annual cohort of IPOs has performed. The use of color to map to the second metric (and not the first) also helps to emphasize the second metric.

There are details on this chart that I admire. The general tidiness of it. The restraint on the gridlines, especially the horizontal ones. The spatial balance. The annotation.

And ah, turning those bubbles into lollipops. Yummy! Those dotted lines allow readers to find the center of each bubble, which is where the values of the second metric lie. Frequently, these bubble charts are presented without those guiding lines, and it is often hard to find the circles' anchors.

That leaves one inexplicable decision - why did they place two vertical gridlines in the middle of two arbitrary years?


Story within story, bar within bar

This Wall Street Journal offering caught my eye.

Wsj_gender_workforce_sm

It's the unusual way of displaying proportions.

Your first impression is to interpret the graphic as a bar chart. But it really is a bar within a bar: the crux of the matter - gender balance - is embedded in individual bars.

Instead of pie charts or stacked bar charts, we see stacked columns within each bar.

I see what the designer is attempting to accomplish. The first message is the sharp decline in gender equality at higher job titles. The next message is the sharp drop in the frequency of higher job titles.

This chart is a variant of the "Marimekko" chart (beloved by management consultants), also called the mosaic chart. The only difference is how the distribution of jobs in the work force is coded.

The Marimekko is easier to understand:

Redo_wsjgenderworkforce_mekko2

A key advantage of this version is to be found in the thin columns.

Here is another way to visualize this data, drawing attention to the gender gap.

Redo_wsjgenderworkforce_lines

In the other versions, the reader must do subtractions to figure out the size of the gaps.


Here are the cool graphics from the election

There was some very nice graphics work published during the last few days of the U.S. presidential election. Let me tell you why I like the following four charts.

FiveThirtyEight's snake chart

Snake-1106pm

This chart definitely hits the Trifecta. It is narrowly focused on the pivotal questions of election night: which candidate is leading? if current projections hold, which candidate would win? how large is the margin of victory?

The chart is symmetric so that the two sides have equal length. One can therefore immediately tell which side is in the lead by looking at the middle. With a little more effort, one can also read from the chart which side has more electoral votes based only on the called states: this would be by comparing the white parts of each snake. (This is made difficult by the top-bottom mirroring. That is an unfortunate design decision - I would have preferred to not have the top-bottom reversal.)

The length of each segment maps to the number of electoral votes for the particular state, and the shade of color reflects the size of the advantage.

In a great illustration of less is more, by aggregating all called states into a single white segment, and not presenting the individual results, the 538 team has delivered a phenomenal chart that is refreshing, informative, and functional.

Compare with a more typical map:

Electoral-map

New York Times's snake chart

Snakes must be the season's gourmet meat because the New York Times also got inspired by those reptiles, delivering a set of snake charts (link). Here's one illustrating how different demographic segments picked winners in the last four elections.

 

Nytimes_partysupport_by_income

They also made a judicious decision by highlighting the key facts and hiding the secondary ones. Each line connects four points of data but only the beginning and end of each line are labeled, inviting readers to first and foremost compare what happened in 2004 with what happened in 2016. The middle two elections were Obama wins.

This particular chart may prove significant for decades to come. It illustrates that the two parties may be arriving at a cross-over point. The Democrats are driving the lower income classes out of their party while the upper income classes are jumping over to blue.

While the chart's main purpose is to display the changes within each income segment, it does allow readers to address a secondary question. By focusing only on the 2004 endpoints, one can see the almost linear relationship between support and income level. Then focusing on the 2016 endpoints, one can also see an almost linear relationship but this is much steeper, meaning the spread is much narrower compared to the situation in 2004. I don't think this means income matters a lot less - I just think this may be the first step in an ongoing demographic shift.

This chart is both fun and easy to read, packing quite a bit of information into a small space.

 

Washington Post's Nation of Peaks

The Post prints a map that shows, by county, where the votes were and how the two Parties built their support. (Link to original)

Wpost_map_peaks

The height represents the number of voters and the width represents the margin of victory. Landslide victories are shown with bolded triangles. In the online version, they chose to turn the map sideways.

I particularly like the narratives about specific places.

This is an entertaining visual that draws you in to explore.

 

Andrew Gelman's Insight

If you want quantitative insights, it's a good idea to check out Andrew Gelman's blog.

This example is a plain statistical graphic but it says something important:

Gelman_twopercent

There is a lot of noise about how the polls were all wrong, the entire polling industry will die, etc.

This chart shows that the polls were reasonably accurate about Trump's vote share in most Democratic states. In the Republican states, these polls consistently under-estimated Trump's advantage. You see the line of red states starting to bend away from the diagonal.

If the total error is about 2%, as stated in the caption of the chart, then the average error in the red states must have been about 4%.
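The arithmetic behind that statement can be spelled out. This is a back-of-envelope sketch under assumptions that are mine, not taken from Gelman's chart: the error in the Democratic states is treated as roughly zero, and red and blue states are given equal weight in the overall average.

```python
def implied_red_error(total_error, blue_error=0.0, red_share=0.5):
    """Solve total = red_share * red + (1 - red_share) * blue for red.

    blue_error = 0.0 and red_share = 0.5 are illustrative assumptions,
    not figures read off the chart.
    """
    return (total_error - (1 - red_share) * blue_error) / red_share


# Overall error of about 2 points, as stated in the chart's caption.
print(implied_red_error(2.0))  # 4.0
```

If the red states make up less than half of the weight, or the blue-state error is not quite zero, the implied red-state error shifts accordingly.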

This basic chart advances our understanding of what happened on election night, and why the result was considered a "shock."

 

 


An example of focusing the chart on a message

Via Jimmy Atkinson on Twitter, I am alerted to this chart from the Wall Street Journal.

Wsj_fiscalconstraints

The title of the article is "Fiscal Constraints Await the Next President." The key message is that "the next president looks to inherit a particularly dismal set of fiscal circumstances." Josh Zumbrun, who tipped Jimmy about this chart on Twitter, said that it is worth spending time on.

I like the concept of the chart, which juxtaposes the economic condition that faced each president at inauguration, and how his performance measured against expectation, as represented by CBO predictions.

The top portion of the graphic did require significant time to digest:

Wsj_fiscalconstraints_top

A glance at the sidebar informs me that there are two scenarios being depicted, the CBO projections and the actual deficit-to-GDP ratios. Then I got confused on several fronts.

One can of course blame the reader (me) for mis-reading the chart but I think dataviz faces a "the reader is always right" situation -- although there can be multiple types of readers for a given graphic so maybe it should say "the readers are always right."

I kept lapsing into thinking that the bold lines (in red and blue) are actual values while the gray line/area represents the predictions. That's because in most financial charts, the actual numbers are in the foreground and the predictions act as background reference materials. But in this rendering, it's the opposite.

For a while, a battle was raging in my head. There are a few clues that the bold red/blue lines cannot represent actual values. For one thing, I don't recall Reagan as a surplus miracle worker. Also, some of the time periods overlap, and one assumes that the CBO issued one projection only at a given time. The Obama line also confused me as the headline led me to expect an ugly deficit but the blue line is rather shallow.

Then, I got even more confused by the units on the vertical axis. According to the sidebar, the metric is deficit-to-GDP ratio. The majority of the lines live in negative territory. Does the negative of the negative imply positive? Could the sharp upward turn of the Reagan line indicate massive deficit spending? Or maybe the axis should be relabelled surplus-to-GDP ratio?

***

As I proceeded to re-create this graphic, I noticed that some of the tick marks are misaligned. There are various inconsistencies related to the start of each projection, the duration of the projection, the matching between the boxes and the lines, etc. So the data in my version is just roughly accurate.

To me, these data provide a primary reference to how presidents perform on the surplus/deficit compared to expectations as established by the CBO projections.

Redo_wsj_deficitratios

I decided to only plot the actual surplus/deficit ratios for the duration of each president's tenure. The start of each projection line is the year in which the projection is made (as per the original). We can see the huge gap in every case. Either the CBO analysts are very bad at projections, or the presidents didn't do what they promised during the elections.

 

 

 


Denver outspends everyone on this

Someone at the Wall Street Journal noticed that Denver's transit agency has outspent other top transit agencies, after accounting for number of rides -- and by a huge margin.

But the accompanying graphic conspires against the journalist.

Wsj_denverRail

For one thing, Denver is at the bottom of the page. Denver's two bars do not stand out in any way. New York's transit system dwarfs everyone else in both number of rides and total capital expenses and funding. And the division into local, state, and federal sources of funds is on the page, absorbing readers' mindspace for unknown reasons.

But Denver is an outlier, as can be seen here:

Redo_transit2

 


Super-informative ping-pong graphic

Via Twitter, Mike W. asked me to comment on this WSJ article about ping pong tables. According to the article, ping pong table sales track venture-capital deal flow:

Wsj_pingpongsales

This chart is super-informative. I learned a lot from this chart, including:

  • Very few VC-funded startups play ping pong, since the highlighted reference lines show 1000 deals and only 150 tables (!)
  • The one San Jose store interviewed for the article is the epicenter of ping-pong table sales, therefore they can use it as a proxy for all stores and all parts of the country
  • The San Jose store only does business with VC startups, which is why they attribute all ping-pong tables sold to these companies
  • Startups purchase ping-pong tables in the same quarter as their VC deals, which is why they focus only on within-quarter comparisons
  • Silicon Valley startups only source their office equipment from Silicon Valley retailers
  • VC deal flow has no seasonality
  • Ping-pong table sales have no seasonality either
  • It is possible to predict the past (VC deals made) by gathering data about the future (ping-pong tables sold)

Further, the chart proves that one can draw conclusions from a single observation. Here is what the same chart looks like after taking out the 2016 Q1 data point:

Redo_pingpongsales2

This revised chart is also quite informative. I learned:

  • At the same level of ping-pong-table sales (roughly 150 tables), the number of VC deals ranged from 920 to 1020, about one-third of the vertical range shown in the original chart
  • At the same level of VC deals (roughly 1000 deals), the number of ping-pong tables sold ranged from 150 to 230, about half of the horizontal range of the original chart

The many quotes in the WSJ article also tell us that people in Silicon Valley are no more data-driven than people in other parts of the country.


The surprising impact of mixing chart forms

At first glance, this Wall Street Journal chart seems unlikely to impress as it breaks a number of "rules of thumb" frequently espoused by dataviz experts. The inconsistency of mixing a line chart and a dot plot. The overplotting of dots. The ten colors...

Wsj_oilpredict_Feb16

However, I actually like this effort. The discontinuity of chart forms nicely aligns with the split between the actual price movements on the left side and the projections on the right side.

The designer also meticulously placed the axis labels with monthly labels for actual price movements and quarterly labels for projections.

Even the ten colors are surprisingly manageable. I am not sure we need to label all those banks; maybe just the ones at the extremes. If we clear out some of these labels, we can make room for a median line.

***

How good are these oil price predictions? It is striking that every bank shown is predicting that oil prices have hit a bottom, and will start recovering in the next few quarters. Contrast this with the left side of the chart, where the line is basically just tumbling down.

Step back six months earlier, to September 2015. The same chart looks like this:

  Wsj_oilpredict_sept15

 Again, these analysts were calling a bottom in prices and predicting a steady rise over the next quarters.

The track record of these oil predictions is poor:

Wsj_oilpredict_sep15_evaluated

The median analyst predicted oil prices to reach $50 by Q1 of 2016. Instead, prices fell to $30.
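To put a number on the miss, here is a quick calculation using only the two round figures quoted above ($50 forecast, $30 outcome). The function name is mine, and expressing the miss relative to the realized price is one convention among several.

```python
def forecast_miss(forecast, actual):
    """Return the absolute miss and the miss as a percent of the actual value."""
    miss = forecast - actual
    return miss, 100 * miss / actual


# Figures quoted in the text: median forecast of $50 for Q1 2016, actual about $30.
miss, pct_of_actual = forecast_miss(50.0, 30.0)
print(f"Overshoot: ${miss:.0f}, or {pct_of_actual:.0f}% above the realized price")
```

Measured against the forecast instead of the outcome, the same $20 gap is 40%.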

Given this track record, it's shocking that these predictions are considered newsworthy. One wonders how these predictions are generated, and how the analysts justified ignoring the prevailing trend.