The tech world in which everyone is below average

Laura pointed me to an infographic about tech worker salaries in major tech hubs (link).

What's wrong with this map?

Entrepreneur_techsalaries_map

The box "Global average" is doubly false. It is not global, and it is not the average!

The only non-American cities included in this survey are Toronto, Paris and London.

The only city with an average salary above the "Global average" is the San Francisco Bay Area. Since the Bay Area does not outweigh all other cities combined in the number of tech workers, it is impossible to get an average of $135,000.
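The averaging argument can be checked with a little arithmetic. Here is a minimal sketch, assuming made-up numbers: the Bay Area salary and the combined average of the other cities are hypothetical placeholders, not figures from the survey. Only the $135,000 "Global average" comes from the map.

```python
def required_share(target_avg, top_avg, rest_avg):
    """Share of all workers the top city must hold for the overall
    weighted average to reach target_avg.

    Solves  w * top_avg + (1 - w) * rest_avg = target_avg  for w.
    """
    if top_avg <= rest_avg:
        raise ValueError("top_avg must exceed rest_avg")
    return (target_avg - rest_avg) / (top_avg - rest_avg)


GLOBAL_AVG = 135_000    # the "Global average" printed on the map
BAY_AREA_AVG = 142_000  # hypothetical, for illustration only
OTHERS_AVG = 110_000    # hypothetical combined average of all other cities

share = required_share(GLOBAL_AVG, BAY_AREA_AVG, OTHERS_AVG)
print(f"Bay Area would need {share:.0%} of all tech workers")
```

Under these assumed numbers, the Bay Area would need to hold most of the tech workers in the survey for the overall average to land at $135,000.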

***

Here is the second chart.

What's wrong with these lines?

Entrepreneur_techsalaries_lines

This chart frustrates the reader's expectations. The reader interprets it as a simple line chart, based on three strong hints:

  • time along the horizontal axis
  • data labels show dollar units
  • lines linking time

Each line seems to show the trend of average tech worker salary, in dollar units.

However, that isn't the designer's intention. Let's zoom in on Chicago and Denver:

Entrepreneur_techsalaries_lines2

The number $112,000 (Denver) sits below the number $107,000 (Chicago). It appears that each chart has its own scale. But that's not the case either.

For a small-multiples setup, we expect all charts to use the same scale. Even though the data labels are absolute dollar amounts, the vertical axis is on a relative scale (percent change). To make things even more complicated, the percent change is computed relative to the minimum of the three annual values, no matter in which year it occurs.

Redo_entrepreneurtechsalarieslines2

That's why $106,000 (Chicago) is at the same level as $112,000 (Denver). Those are the minimum values in the respective time series. As shown above, these line charts are easier to understand if the axis is displayed in its true units of percent change.

The choice of using the minimum value as the reference level interferes with comparing one city to the next. For Chicago, the line chart tells us 2015 is about 2 percent above 2016 while 2017 is 6 percent above. For Denver, the line chart tells us that 2016 is about 2 percent above the 2015 and 2017 values. Now what's the message again?

Here I index all lines to the earliest year.

  Redo_junkcharts_entrepreneurtechsalaries_lines
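To make the two choices of reference level concrete, here is a rough sketch. The salary figures are hypothetical (in thousands of dollars, one value per year), and the function names are mine, not anything from the original chart.

```python
def pct_change_from(values, base):
    """Percent change of each value relative to a chosen base value."""
    return [round((v - base) / base * 100, 1) for v in values]


def index_to_minimum(values):
    """Reference level = smallest value in the series (the original chart's choice)."""
    return pct_change_from(values, min(values))


def index_to_first(values):
    """Reference level = earliest year (the choice used in the redo)."""
    return pct_change_from(values, values[0])


# Hypothetical salaries in $ thousands for 2015, 2016, 2017 (illustration only).
city_a = [108, 106, 112]
city_b = [112, 114, 112]

for name, series in [("city_a", city_a), ("city_b", city_b)]:
    print(name, "vs minimum:", index_to_minimum(series))
    print(name, "vs first year:", index_to_first(series))
```

With the minimum as the base, the zero point lands in a different year for each city. With the earliest year as the base, every line starts at zero and the cities can be compared directly.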

In a Trifecta Checkup analysis (link), I'd be suspicious of the data. Did tech salaries in London really drop by 15-20 percent in the last three years?

 

 


Diverging paths for rich and poor, infographically

Ray Vella (link) asked me to comment on a chart about regional wealth distribution, which I wrote about here. He also asked students in his NYU infographics class to create their own versions.

This effort caught my eye:

Nyu_redo_richpoor

This work is creative, and I like the concept of using two staircases to illustrate the diverging fortunes of the two groups. This is worlds away from the original Economist chart.

The infographic does have a serious problem. In one of my dataviz talks, I talk about three qualifications for work to be called "data visualization." The first qualification is that the data visualization has to display the data. This is an example of an infographic that is invariant to the data: the staircases would look the same whatever the numbers were.

Is it possible to salvage the concept? I tried. Here is an idea:

Redo_econ_richpoor_infog2

I abandoned the time axis so the data plotted are only for 2015, and the countries are shown horizontally from most to least equal. I'm sure there are ways to do it even better.

Infographics can be done while respecting the data. Ray is one of the designers who appreciate this. And thanks Ray for letting me blog about this.

 

 

 


Egregious chart brings back bad memories

My friend Alberto Cairo said it best: if you see bullshit, say "bullshit!"

He was very incensed by this egregious "infographic": (link to his post)

Aul_vs_pp

Emily Schuch provided a re-visualization:

Emilyschuch_pp

The new version provides a much richer story of how Planned Parenthood has shifted priorities over the last few years.

It also exposes how the AUL (Americans United for Life) organization distorted the story.

The designer extracted only two of the lines, so readers do not see that the category of services that has really replaced the loss of cancer screening is STI/STD testing and treatment. This is a bit ironic given the other story that has circulated this week - the big jump in STDs among Americans (link).

Then, the designer placed the two lines on dual axes, which is a dead giveaway that something awful lies beneath.

Further, this designer dumped the data from intervening years, and drew a straight line from the first to the last year. The straight arrow misleads by pretending that there has been a linear trend, and that it would go on forever.

But the masterstroke is in the treatment of the axes. Let's look at the axes, one at a time:

The horizontal axis: Let me recap. The designer dumped all but the starting and ending years, and drew a straight line between the endpoints. While the data are no longer there, the axis labels are retained. So, our attention is drawn to an area of the chart that is void of data.

The vertical axes: Let me recap. The designer has two series of data with the same units (number of people served) and decided to plot each series on a different scale with dual axes. But readers are not supposed to notice the scales, so they do not show up on the chart.

To summarize, where there are no data, we have a set of functionless labels; where labels are needed to differentiate the scales, we have no axes.

***

This is a tried-and-true tactic employed by propagandists. The egregious chart brings back some bad memories.

Here is a long-ago post on dual axes.

Here is Thomas Friedman's use of the same trick.


Bewildering baseball math

Over Twitter, someone asked me about this chart:

Mlb_pipeline

It's called the MLB pipeline. The text at the top helpfully tells us what the chart is about: how the playoff teams in baseball are built. That's the good part.

It then took me half a day to understand what is going on below. There are four ways for a player to be on a team: trades, free agency, and two homegrown routes (drafted players and international players).

Each row is a type of player. You can look up which teams have exactly X players of a specific type. It gets harder if you want to know how many players team Y has of a given type. It is even harder if you don't know the logos of every team (e.g. Toronto Blue Jays).

Some fishy business is going on with the threesomes and foursomes. Here is the red threesome:

Mlb_threesome1

Didn't know baseball employs half a player. The green section has a different way to play threesomes:

Mlb_threesome2

The blue section takes inspiration from both and shows us a foursome:

Mlb_foursome

I was stuck literally in the middle for quite a while:

Mlb_middlesection

Eventually, I realized that this is a summary of the first two sections on the page. I still don't understand why there is no gap between 11 and 14, nor why the 14 and 15 arrows are twice as large as those for 9, 10 and 11, even though every arrow contains exactly one team.

***

The biggest problem in the above chart is the hidden base: each team's roster has a total of 25 players.

Here is a different view of the data:

Redo_mlb_pipeline

With this chart, I want to emphasize two points: first, addressing the most interesting question of which team(s) emphasize which particular player acquisition tactic; second, providing the proper reference level to interpret the data.

Regarding the vertical reference lines: take the top left chart about players arriving through trade. If every team equally emphasizes this tactic, then each team should have the same number of traded players on the 25-person roster. This would mean every team has approximately 11 traded players. This is clearly not the case. Several teams, especially the Cubs and Blue Jays, utilized trades more often than teams like the Mets and Royals.
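The reference level is just the count each team would have if all teams leaned on the tactic equally. Here is a small sketch of that calculation; the team names and counts are hypothetical stand-ins, chosen only so that the reference works out to 11.

```python
def equal_emphasis_reference(counts_by_team):
    """Count each team would have if all teams used the tactic equally:
    the total across teams divided by the number of teams."""
    return sum(counts_by_team.values()) / len(counts_by_team)


def deviations(counts_by_team):
    """Difference between each team's count and the equal-emphasis reference."""
    ref = equal_emphasis_reference(counts_by_team)
    return {team: count - ref for team, count in counts_by_team.items()}


# Hypothetical counts of traded players on 25-person rosters (illustration only).
traded = {"Team A": 15, "Team B": 14, "Team C": 8, "Team D": 7}

print(equal_emphasis_reference(traded))
print(deviations(traded))
```

Teams with positive deviations lean on trades more than the equal-emphasis level, and teams with negative deviations lean on them less.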

 

 


Shortchanging and subverting the message

Reader Michael N. calls this an "unusual" marketing bar chart--because the designer distorted the data in a way that weakens, rather than strengthens, the story!

Republic_Wireless_excerpt

The infographic is pitching the savings a family gets by switching to Republic. The savings come to about 70%, and yet the height of the $40 bar is more than 50% of the $150 bar.
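The arithmetic is quick to check. A minimal sketch, assuming the bars are meant to start at zero; the two dollar amounts are the ones printed on the infographic.

```python
def bar_height_ratio(small, large):
    """Height of the smaller bar as a fraction of the larger, on a zero-based scale."""
    return small / large


AVERAGE_BILL = 150   # dollars, from the infographic
REPUBLIC_BILL = 40   # dollars, from the infographic

ratio = bar_height_ratio(REPUBLIC_BILL, AVERAGE_BILL)
savings = 1 - ratio
print(f"$40 bar should be {ratio:.0%} as tall as the $150 bar")
print(f"savings: {savings:.0%}")
```

An honest bar would be roughly a quarter of the height of its neighbor, which makes the pitch look better, not worse.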

The entire infographic is a case of misplaced emphases. (Click here to see original.)

Ranked by size of font from largest to smallest, this poster gives us the following information:

Average cellphone bill for a family of four

Penetration rate of Wifi

Price comparison between average plan and Republic Wireless plan

Median national household income

Cellphone bill growth versus inflation rate

Cost of wireless data split from the bill

Growth of global Wifi hotspots

Actual amount of wireless data used by cellphone users

 

The intended message is that families are paying for a lot of unused wireless data, and Republic Wireless has a Wifi solution to save them 70% of their bill.

 


Why you need a second pair of eyes

Reader Aaron K. submitted an infographic advertising the upcoming New England Auto Show to be held in Boston (link).

As Aaron pointed out, there are plenty of elementary errors contained in one page. I don't think the designer did these things consciously. I believe in having someone else glance at your work before you publish it. Or take a walk around the house and look at your own work after clearing your head.

In the following diagram, the graphical elements (stick figures) are coding the data labels, rather than the data!

Neas_agegroup

Helping readers figure out which one is male and which one is female seems, hmm, unnecessary.

Neas_gender

Placing the above two charts side by side has the effect of suggesting that only male attendees were asked about their age.

Neas_gender_agegroup

Look again: is the proportion of attendees over 18 equal to 4%, 96% or 100%?

Neas_over18

***

This map irritates me.

Neas_geo

Is it because they could have enlarged the frame just a little so as not to have to expel little Rhode Island from New England? Is it because not having the right frame size caused two numbers to sit outside New England when only one should? Is it because having two numbers outside the boundary tempted the designer to single out Rhode Island for the purpose of labeling? Is it because no other state is labeled besides Rhode Island?

Or is it because the land area is vastly disproportional to the data being displayed? Is it because the map construct is a geography lesson and nothing more (something I wrote about years ago)? Is it because the geography lesson is incomplete since only one state is labeled?

***

According to the text at the bottom, this part of the country is proud of "it's (sic) academia" and has hundreds of thousands of college students, who somehow "contribute $4.8 billion+ to the city's economy," which tells me they are super-productive in the classrooms.


Sheep tramples sense

Merry Christmas, readers.

***

A Twitter follower pointed me to this visual:

Wtf_sheep

I have yet to understand why the vertical axis of the top chart keeps changing scales over time. The white dot labelled "Peak 1982" (70 million) is barely above the other white dot for "2007" (38 million). This chart hides a clear trend: the population of sheep in New Zealand has plunged by 45% over 25 years.
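For the record, here is the calculation behind that figure, as a quick sketch. The two sheep counts are the ones labelled on the chart; the uniform-scale comparison assumes a vertical axis that starts at zero.

```python
def pct_change(old, new):
    """Percent change from old to new."""
    return (new - old) / old * 100


SHEEP_1982 = 70  # millions, the "Peak 1982" label
SHEEP_2007 = 38  # millions, the "2007" label

change = pct_change(SHEEP_1982, SHEEP_2007)
height_fraction = SHEEP_2007 / SHEEP_1982

print(f"change 1982 to 2007: {change:.1f}%")
print(f"on a uniform zero-based scale, the 2007 dot sits at {height_fraction:.0%} of the peak height")
```

On an honest scale the 2007 dot would sit at little more than half the height of the peak, not "barely" below it.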

To address the question of sheep versus human, one should plot the ratio of sheep-to-human directly. In this case, the designer probably faced a problem: because of the plunging population of sheep, the ratio has plunged steeply in 25 years. To make a point that "people are outnumbered more than 9 to 1", the designer didn't want to show a plunging trend. (Could this be the reason why the human population in 1982 was not printed?)

This is a case of too many details. Instead of manipulating the scale to distort the data, one can simply show the current ratio, or the average ratio in the last five years.

***

As the reader scans to the bottom set of charts, a cognitive wedge is encountered, as the curved scale of the New Zealand chart gives way to the normal uniform scale. These smaller charts are no less confusing, however.

  Australia_iceland_sheep

The two lines on these two charts appear almost the same and yet, the Australian chart (on the left) shows a ratio of 4 to 1 while the Icelandic chart (on the right) shows a ratio of 1.5 to 1. Makes you wonder if each one of the small multiples has a dual axis.

Again, I'm not convinced that the time series adds anything to the message.