Five steps to let the young ones shine

Knife stabbings are in the news in the U.K. and the Economist has a quartet of charts to illustrate what's going on.

Economist_20190309_WOC479

I'm going to focus on the chart on the bottom right. This shows the trend in hospital admissions due to stabbings in England from 2000 to 2018. The three lines show all ages, and two specific age groups: under 16 and 16-18.

The first edit I made was to spell out all years in four digits. For this chart, numbers like 15 and 18 can be confused with ages.

Redo_econ_ukknives_1

The next edit corrects an error in the subtitle. The reference year is not 2010 as those three lines don't cross 100. It appears that the reference year is 2000. Another reason to use four-digit years on the horizontal axis is to be consistent with the subtitle.

Redo_econ_ukknives_2

The next edit removes the black dot, which draws attention to itself. The chart, though, is not about the year 2000, which carries the least information since all data have been forced to 100.

Redo_econ_ukknives_3

The next edit makes the vertical axis easier to interpret. The indices 150 and 200 are much better stated as +50% and +100%. The red line can be labeled "at 2000 level". One can even remove the subtitle 2000=100 if desired.

Redo_econ_ukknives_4
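The relabeling of the vertical axis can be sketched in a few lines of Python. This is only an illustration, not the code behind the chart: the function name `index_to_pct_label` and its parameters are hypothetical, and it assumes the index is based at 100 in the reference year.

```python
def index_to_pct_label(index_value, base=100.0, base_label="at 2000 level"):
    """Turn an index value (reference year = base) into a percent-change label."""
    # An index of 150 on a base of 100 means a 50% increase over the reference year.
    change = (index_value - base) / base * 100
    if change == 0:
        # The reference line itself reads better as words than as "+0%".
        return base_label
    return f"{change:+.0f}%"


# Relabel the tick marks of the vertical axis.
ticks = [100, 150, 200]
labels = [index_to_pct_label(t) for t in ticks]
print(labels)
```

Values below the reference level get a minus sign automatically, so the same helper would work for a series that dips under 100.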

Finally, I surmise the message the designer wants to get across is the above-average jump in hospital admissions among children under 16 and those aged 16 to 18. Therefore, the "All" line exists to provide context. Thus, I made it a dashed line, pushing it to the background.

Redo_econ_ukknives_5



Big Macs in Switzerland are amazing, according to my friend

Bigmac_ch

Note for those in or near Zurich: I'm giving a Keynote Speech tomorrow morning at the Swiss Statistics Meeting (link). Here is the abstract:

The best and the worst of data visualization share something in common: these graphics provoke emotions. In this talk, I connect the emotional response of readers of data graphics to the design choices made by their creators. Using a plethora of examples, collected over a dozen years of writing online dataviz criticism, I discuss how some design choices generate negative emotions such as confusion and disbelief while other choices elicit positive feelings including pleasure and eureka. Important design choices include how much data to show; which data to highlight, hide or smudge; what research question to address; whether to introduce imagery, or playfulness; and so on. Examples extend from graphics in print, to online interactive graphics, to visual experiences in society.

***

The Big Mac index seems to never want to go away. Here is the latest graphic from the Economist, saying what it says:

Econ_bigmacindex

The index never made much sense to me. I'm in Switzerland, and everything here is expensive. My friend, who is a U.S. transplant, seems to have adopted McDonald's as his main eating-out venue. Online reviews indicate that the quality of the burger served in Switzerland is much better than the same thing in the States. So, part of the price differential can be explained by quality. The index also confounds several other issues, such as local inflation and exchange rates.

Now, on to the data visualization, which is primarily an exercise in rolling one's eyeballs. In order to understand the red and blue line segments, our eyes have to hop over the price bubbles to the top of the page. Then, in order to understand the vertical axis labels, unconventionally placed on the right side, our eyes have to zoom over to the left of the page, and search for the line below the header of the graph. Next, if we want to know about a particular country, our eyes must turn sideways and scan from bottom up.

Here is a different take on the same data:

Redo_jc_econbigmac2018

I transformed the data as I don't find it compelling to learn that Russian Big Macs are 60% cheaper than American Big Macs. Instead, on my chart, the reader learns that the price paid for a U.S. Big Mac will buy him/her almost two and a half Big Macs in Russia.

The arrows pointing left indicate that in most countries, the values of their currencies are declining relative to the dollar from 2017 to 2018 (at least from the Big Mac Index point of view). The only exception is Turkey, where in 2018 one can buy more Big Macs for the price paid for one U.S. Big Mac than in 2017.

The decimal differences are immaterial so I have grouped the countries by half Big Macs.
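The transformation and the grouping can be sketched as follows. This is a minimal illustration under stated assumptions, not the code behind the chart: the function names are hypothetical, the input is assumed to be the local price expressed as a percent difference from the U.S. price, and "grouping by half Big Macs" is assumed to mean rounding to the nearest 0.5.

```python
import math


def big_macs_per_us_price(pct_vs_us):
    """Number of local Big Macs that the price of one U.S. Big Mac will buy.

    pct_vs_us is the local price relative to the U.S. price, in percent
    (for example, -60 means the local burger is 60% cheaper).
    """
    relative_price = 1 + pct_vs_us / 100
    if relative_price <= 0:
        raise ValueError("local price must be positive")
    return 1 / relative_price


def half_big_mac_group(n_big_macs):
    """Round to the nearest half Big Mac (assumed grouping rule)."""
    # floor(x + 0.5) rounds halves up, avoiding Python's round-half-to-even.
    return math.floor(n_big_macs * 2 + 0.5) / 2


# A burger that is 60% cheaper than in the U.S. (the round figure in the text)
# means one U.S. price buys 1 / 0.4 local burgers.
n = big_macs_per_us_price(-60)
print(n, half_big_mac_group(n))
```

The reciprocal is what turns "60% less" into "about two and a half Big Macs": a relative price of 0.4 means each dollar-equivalent stretches 1/0.4 times as far.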

This example demonstrates yet again that, to make good data visualization, one has to describe an interesting question, make appropriate transformations of the data, and then choose the right visual form. I describe this framework as the Trifecta - a guide to it is here.

(P.S. I noticed that Bitly just decided unilaterally to deactivate my customized Bitly link that was configured years and years ago, when it switched design (?). So I had to re-create the custom link. I have never grasped why "unreliability" is a feature of the offering by most Tech companies.)


Is the chart answering your question? Excavating the excremental growth map

Economist_excrement_growth

San Franciscans are fed up with excremental growth. Understandably.

Here is how the Economist sees it - geographically speaking.

***

In the Trifecta Checkup analysis, one of the questions to ask is "What does the visual say?" with respect to the question being asked.

The question is how much the problem of human waste in SF grew from 2011 to 2017.

What does the visual say?

The number of complaints about human waste has increased from 2011 to 2014 to 2017.

The areas where there are complaints about human waste expanded.

The worst areas are around downtown, and that has not changed during this period of time.

***

Now, what does the visual not say?

Let's make a list:

  • How many complaints are there in total in any year?
  • How many complaints are there in each neighborhood in any year?
  • What's the growth rate in number of complaints, absolute or relative?
  • What proportion of complaints are found in the worst neighborhoods?
  • What proportion of the area is covered by the green dots on each map?
  • What's the growth in terms of proportion of areas covered by the green dots?
  • Does the density of green dots reflect density of human waste or density of human beings?
  • Does no green dot indicate no complaints or below the threshold of the color scale?

There's more:

  • Is the growth in complaints a result of more reporting or more human waste?
  • Is each complainant unique? Or do some people complain multiple times?
  • Does each piece of human waste lead to one and only one complaint? In other words, what is the relationship between the count of complaints and the count of human waste?
  • Is it easy to distinguish between human waste and animal waste?

And more:

  • Are all complaints about human waste valid? Does anyone verify complaints?
  • Are the plotted locations describing where the human waste is or where the complaint was made?
  • Can all complaints be treated identically as a count of one?
  • What is the per-capita rate of complaints?

In other words, the set of maps provides almost no information about the excrement problem in San Francisco.

After you finish working, go back and ask what the visual is saying about the question you're trying to address!

 

As a reference, I found this map of the population density in San Francisco (link):

SFO_Population_Density

 


Digital revolution in China: two visual takes

The following map accompanied an article in the Economist about China's drive to create a "digital silkroad," roughly defined as making a Silicon Valley. 

Economist_digitalsilkroad

The two variables plotted are the wealth of each province (measured by GDP per capita) and the level of Internet penetration. The designer made the following choices:

  • GDP per capita is presented with less precision than Internet penetration. The former is grouped into five large categories while the latter is given as a percentage to one decimal place.
  • The visual design favors GDP per capita, which is encoded as the shade of color of each province. The Internet penetration data appear to have been added on as an afterthought.

If we apply the self-sufficiency test (i.e. by removing the printed data from the chart), it's immediately clear that the visual elements convey zero information about Internet penetration. This is a serious problem for a chart about the "digital silkroad"!

***

If those two variables are chosen, it would seem appropriate to convey to readers the correlation between the two variables. The following sketch is focused on surfacing the correlation.

Redo_jc_china_digitalsilkroad2

(Click on the image to see it in full.) Here is the top of the graphic:

Redo_jc_china_digitalskilkroad_detail

The individual maps are not strictly necessary. Just placing provincial names onto the grid is enough, because the regional pattern isn't salient here.

The Internet penetration data were grouped into five categories as well, putting them on an equal footing with GDP per capita.

 


Diverging paths for rich and poor, infographically

Ray Vella (link) asked me to comment on a chart about regional wealth distribution, which I wrote about here. He also asked students in his NYU infographics class to create their own versions.

This effort caught my eye:

Nyu_redo_richpoor

This work is creative, and I like the concept of using two staircases to illustrate the diverging fortunes of the two groups. This is worlds away from the original Economist chart.

The infographic does have a serious problem. In one of my dataviz talks, I talk about three qualifications of work called "data visualization." The first qualification is that the data visualization has to display the data. This is an example of an infographic that is invariant to the data.

Is it possible to salvage the concept? I tried. Here is an idea:

Redo_econ_richpoor_infog2

I abandoned the time axis so the data plotted are only for 2015, and the countries are shown horizontally from most to least equal. I'm sure there are ways to do it even better.

Infographics can be done while respecting the data. Ray is one of the designers who appreciate this. And thanks, Ray, for letting me blog about this.



Fifty-nine intersections supporting forty dots of data

My friend Ray V. asked how this chart can be improved:

Econ_rv_therichgetsricher

Let's try to read this chart. The Economist is always the best at writing headlines, and this one is simple and to the point: the rich get richer. This is about inequality but not just inequality - the growth in inequality over time.

Each country has four dots, divided into two pairs. From the legend, we learn that the line represents the gap between the rich and the poor. But what is rich and what is poor? Looking at the sub-header, we learn that the population is divided by domicile, and the per-capita GDP of the poorest and richest regions is drawn. This is an indirect metric, and may or may not be good, depending on how many regions a country is divided into, the dispersion of incomes within each region, the distribution of population between regions, and so on.

Now, looking at the axis labels, it's pretty clear that the data depicted are not in dollars (or any currency), despite the reference to GDP in the sub-header. The numbers represent indices, relative to the national average GDP per head. For many of the countries, the poorest region produces about half the per-capita GDP of the richest region.

Back to the original question. A growing inequality would be represented by a longer line below a shorter line within each country. That is true in some of these countries. The exceptions are Sweden, Japan and South Korea.

***

It doesn't jump out that the key task requires comparing the lengths of the two lines. Another issue is the outdated convention of breaking up a line (Britain) when the line is of extreme length - particularly unwise given that the length of the line encodes the key metric in the chart.

Further, it has a low data-ink ratio, à la Tufte. The gridlines, reference lines, and data lines weave together in a complex pattern, creating 59 intersections in a chart that contains only 36 numbers.

***

I decided to compute a simpler metric - the ratio of rich to poor. For example, in the UK, the richest area produced about 20 times as much GDP per capita as the poorest one in 2015. That is easier to understand than an index relative to the average region.
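Because both regional figures are indexed to the same national average, the average cancels out when one is divided by the other. A minimal sketch, with a hypothetical function name and made-up index values that are not taken from the Economist chart:

```python
def rich_poor_ratio(richest_index, poorest_index):
    """Ratio of the richest region's GDP per head to the poorest region's.

    Both inputs are indices relative to the same national average, so the
    national average drops out of the division.
    """
    if poorest_index <= 0:
        raise ValueError("poorest_index must be positive")
    return richest_index / poorest_index


# Made-up illustration: richest region at 200 (twice the national average),
# poorest region at 50 (half the national average).
print(rich_poor_ratio(200, 50))
```

Applying the same function to the two years on the chart gives one pair of ratios per country, which is the quantity the redesigned chart needs.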

I had fun making the following chart, although many standard forms like the Bumps chart (i.e. slopegraph) or paired columns and so on also work.

Redo_econ_jc_richgetricher

This chart is influenced by Ed Tufte, who spent a good number of pages in his first book advocating stripping even the standard column chart to its bare essence. The chart also acknowledges the power of design to draw attention.


PS. Sorry I counted incorrectly. The chart has 36 dots not 40. 


Confuse, confuses, confused, confusing

Via Twitter, @Stoltzmaniac sent me this chart, from the Economist (link to article):

Econ_vehicles

There is simply too much going on on the right side of the chart. The designer seems not to be able to decide which metric is more important, the cumulative growth rate of vehicles in use from 2005 to 2014, or the vehicles per 1,000 people in 2014. So both sets of numbers are placed on the chart, regrettably in close proximity.

In the meantime, the other components of the chart, such as the gridlines and the red line indicating 2005 = 100, are only relevant to the cumulative vehicle growth metric. Perhaps noticing the imbalance, the designer then paints the other data series in rainbow-colored boxes, and prints the label for this data series in a big white box. This decision tilts the chart towards the vehicles-per-capita metric, as our eyes now cannot help but stare at the white box.

***

There are really three trends: the growth in population, the growth in vehicles, and the resultant growth in vehicles per capita. They can all be accommodated in a small-multiples setting, as follows:

Jc_econ_vehicles2

There are some curious angular trends revealed here. The German population somehow dipped into negative territory around 2007-8 but since then has turned around. Nigeria's vehicle growth declined sharply after 2006 so that the density of vehicles has stabilized.
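The third trend follows arithmetically from the first two: when vehicles and population are both indexed to the same base year, the vehicles-per-capita index is the vehicle index divided by the population index. A minimal sketch, with a hypothetical function name and made-up index values rather than the data behind the chart:

```python
def per_capita_index(vehicle_index, population_index, base=100.0):
    """Vehicles-per-capita index derived from two indexed series.

    Both inputs are lists of index values on the same base year
    (base year = `base`), one value per year.
    """
    if len(vehicle_index) != len(population_index):
        raise ValueError("series must have the same length")
    # (V_t / P_t) / (V_0 / P_0) equals (V_t / V_0) / (P_t / P_0).
    return [v / p * base for v, p in zip(vehicle_index, population_index)]


# Made-up illustration: vehicles grow 50% while population grows 20%.
vehicles = [100, 150]
population = [100, 120]
print(per_capita_index(vehicles, population))
```

If vehicle growth slows to the pace of population growth, the derived series flattens, which is the kind of stabilizing pattern described for Nigeria.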

 


If Clinton and Trump go to dinner, do they sit face to face, or side by side?

One of my students tipped me to an August article in the Economist, published when last the media proclaimed Donald Trump's campaign in deep water. The headline said "Donald Trump's Media Advantage Falters."

Who would have known, judging from the chart that accompanies the article?

Economist_20160820_woc352_1

There is something very confusing about the red line, showing "Trump August 2015 = 1." The data are disaggregated by media channel, and yet the index is hitched to the total of all channels. It is also impossible to figure out how Clinton is doing relative to Trump in each channel.

Here is a small-multiples rendering that highlights the key comparisons:

Redo_economist_earnedmedia1b

Alternatively, one can plot the Clinton advantage versus Trump in each channel, like this:

Redo_economist_earnedmedia2b

One sees that Clinton has caught up in the last month (July 2016), primarily through more coverage by "online news."

Imagine Mr. Trump and Mrs. Clinton dining at a restaurant. Are they seated side by side (Economist) or face to face (junkcharts)?


Graphical inequity ruins the chart

This Economist chart has a great concept but I find it difficult to find the story: (link)

Economist_brexit

I am a fan of color-coding the text as they have done here, so that part is good.

The journalist has this neat idea of comparing those who are apathetic ("don't care about whether Britain is in or out") and those who are passionate ("strongly prefer" that Britain is either in or out).

The chosen format suffers because of graphical inequity: the countries are sorted by decreasing apathy, which means it is challenging to figure out the degree of passion.

This chosen order is unrelated to the question at hand. One possible way of interpreting the chart is to compare individual countries against the European average. The journalist also recognizes this, and highlighted the Euro average.

The problem is that there are two different averages and no good way to decide whether a particular country is above or below average.

Here is my version of the chart:

Redo_econ_brexit2

The biggest change is to create the new metric: how many people say they really care about Brexit/Bremain for every person who says they don't care. In Britain, over four people really care for each one who doesn't, while in Slovenia, you can find fewer than half a person who really cares for each one who doesn't.
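The metric is a simple ratio of two survey percentages, which also gives a single number to sort the countries by. A minimal sketch, with hypothetical names and made-up percentages that are not the survey figures from the Economist chart:

```python
def care_ratio(strongly_prefer_pct, dont_care_pct):
    """People who strongly prefer an outcome per person who doesn't care."""
    if dont_care_pct <= 0:
        raise ValueError("dont_care_pct must be positive")
    return strongly_prefer_pct / dont_care_pct


# Made-up survey shares: (strongly prefer in or out, don't care), in percent.
survey = {
    "Country A": (60, 15),
    "Country B": (20, 50),
}

# One ratio per country, sorted from most to least passionate.
ratios = {name: care_ratio(s, d) for name, (s, d) in survey.items()}
ranked = sorted(ratios.items(), key=lambda item: item[1], reverse=True)
for name, ratio in ranked:
    print(f"{name}: {ratio:.1f} who care per person who doesn't")
```

A ratio above 1 means the passionate outnumber the apathetic, so the two separate averages on the original chart collapse into a single reference line at 1.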