
A startling chart about income inequality, with interpretative difficulties

Reader Robbi B. submitted the following chart posted to Twitter by Branko Milanovic:


The chart took a little time to figure out. This isn't a bad chart. Robbi wondered if there are alternative ways to plot this information.

The U.S. population is divided into percentiles across the horizontal axis, presumably based on the income distribution in some year (I'm guessing 2007, the start of the recession). For each percentile of people, the real per capita growth (decline) in disposable income is computed for two periods: the blue line shows the decline during the recession (2007-2010) and the orange shows the growth (in some cases further decline) during the recovery (2010-2013).

This chart draws attention to the two tails of the distribution, namely, the bottom 10 percent and the top 5 percent. At one level, these two groups (excepting the bottom 2%) experienced the best of the recovery. But then, they also suffered the worst declines during the recession.


Here is one possible view of the same data, in a format with which I have been experimenting recently. You might call this a Bumps panel or a slopegraph panel.


The slopes draw attention to the relative magnitude of the declines and the subsequent recoveries. (I thinned the middle 80% substantially because there isn't much going on in that part of the dataset.) If I had more time, I would have chosen a different color instead of grayscale for those lines.

I ignored any questions I have about the underlying data. How is disposable income defined and measured? Does it carry the same meaning across the entire spectrum of the income distribution? And so on. (Milanovic points to the Survey of Consumer Finances as the source.)


One reason for the reading difficulty is the absence of a reference point. It's unclear how to judge the orange line. Two answers are suggestive (but problematic). One is the zero line: which segments of the population experienced a recovery and which didn't? Another is the mirror image of the blue line: how much of what one lost during the recession did one recover by 2013 (roughly speaking)?

Both of these easy interpretations worry me because they carry an assumption of equal guilt (blue line) and/or equal spoils (orange line). It is very possible that the unwarranted risk-taking or fraud was not evenly spread out amongst the percentiles, and if so, it is impossible to judge whether the distribution exhibited in the blue line was "fair". It is then also impossible to know if the distribution contained in the orange line was "fair". Indeed, if the orange line mirrored the blue line, then all segments recovered similarly what they lost, which would only make sense if all segments were equally culpable in the recession.
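A side note on the "mirror image" reading: even a perfect mirror image would not mean a group got back what it lost, because the recovery percentage applies to a smaller base. Here is a minimal sketch of that arithmetic. It assumes the two percentages compound for the same group of people, which may not hold for percentile groups whose membership changes, and the figures are hypothetical rather than read off the chart.

```python
def net_change(decline_pct: float, growth_pct: float) -> float:
    """Cumulative percent change after a decline followed by a growth period."""
    return ((1 + decline_pct / 100) * (1 + growth_pct / 100) - 1) * 100


def breakeven_growth(decline_pct: float) -> float:
    """Percent growth needed to get back to the level before the decline."""
    return (1 / (1 + decline_pct / 100) - 1) * 100


# Hypothetical figures for illustration only (not taken from the chart).
print(round(net_change(-10, 10), 2))    # -1.0: still 1% below the starting level
print(round(breakeven_growth(-10), 2))  # 11.11: growth needed to break even
```

So a group that lost 10% and then gained 10% is still slightly below where it started; the gap widens as the initial decline gets larger.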

An unsuccessful adaptation of a classic

Found this chart in Hemispheres magazine on board a United flight:


A quick self-sufficiency test reveals the biggest shortcoming of this visual presentation.


What would you guess is the difference in areas between the two white-ish sectors (pointing at 9 o'clock and 2 o'clock)? The actual numbers are 18.3% and 12.5%. So roughly, if one takes the 2-o'clock sector (right), halve it and add it back to itself, one should obtain the area of the 9-o'clock sector (left). Clearly, the piece on the left is much too big.

The following chart shows the index of exaggeration increasing with the value of the data. (For example, the highest value of 18.3% is about 8 times the lowest value of 2.3% but the ratio of the areas depicted is ~500 times.)


The distortion is larger than usual because the designer encodes the data twice, once in the angle of the sector, and again in the radius. Both quantities contribute to the area of a sector, which grows in proportion to the angle and to the square of the radius.
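The size of the exaggeration can be sketched in a few lines. This assumes, as the chart appears to do, that both the angle and the radius are directly proportional to the data value; the function name and its flags are made up for illustration. Since the area of a sector is (1/2) x radius squared x angle, the area then scales with the cube of the data.

```python
def area_ratio(a: float, b: float,
               encode_angle: bool = True,
               encode_radius: bool = True) -> float:
    """Ratio of sector areas for data values a and b.

    Sector area = 0.5 * r**2 * theta, so the angle contributes one power
    of the data ratio and the radius contributes two.
    Assumes angle and/or radius are directly proportional to the data.
    """
    exponent = (1 if encode_angle else 0) + (2 if encode_radius else 0)
    return (a / b) ** exponent


high, low = 18.3, 2.3              # highest and lowest values quoted above
print(round(high / low, 1))        # 8.0: ratio of the data
print(round(area_ratio(high, low)))  # 504: ratio of the areas drawn

# The two white-ish sectors from the self-sufficiency test
print(round(area_ratio(18.3, 12.5), 1))  # 3.1, versus a data ratio of about 1.5
```

Setting encode_angle=False gives the case where only the radius carries the data, in which the area ratio is the square of the data ratio rather than the cube.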

Readers must look at the data in order to read this chart properly; therefore, the visual elements are not self-sufficient. Further, if readers relied on the relative sizes of the sectors, they would misread the data massively.


The designer was probably inspired by the Nightingale rose diagram (link to Wikipedia):


In the original, Nightingale does not encode data into the angles. The circle is divided evenly into 12 pieces to display the 12 months of the year. (She might have accounted for months having 28 to 31 days; it's hard to tell by inspection.) The data is encoded once along the radial axes.

Another difference between the two charts is the ordering of the data. In Nightingale's version, the order is logically determined by the passing of time. In the Hemispheres chart, the order is chosen based on taste. A more natural order would be by the proportion of employment but I think the resulting chart would look like a snail's shell, or worse. I must say a more balanced "rose diagram" looks nicer but it forces my eyes to jump around to answer a simple question such as which are the top three employment sectors in San Francisco.

Fixing the visual versus fixing the story

It's great for me when my friend Alberto Cairo lends a helping hand (link). Here is the original chart showing deaths in African and Middle Eastern countries due to recent unrest:


This is Cairo's redesign:


There is no doubt the new version brings out the data more clearly. I like the cropping of the continent. I'd color-code the countries using the same legend as above.

I'm troubled by the concept of the original chart. I struggle to find any interesting correlation of deaths, whether with time, with government reaction, or with geography. Of the three, I think geography is the most correlated so a good design should bring that out. (Of course, geographical bias is expected and thus rather boring.)

If the intention of the chart is to answer the question of what factors affect deaths, then the wrong variables are being utilized.

So, as regards the Trifecta Checkup, Cairo solved the V problem while the D problem remains.


Minimalism as a form of abuse

With each succeeding year, I get more and more frustrated with "minimalist" designs that have little respect for users.

This Christmas, I received a portable cellphone charger as a gift. A thoughtful gift. I had heard of these devices but had never touched one. Until a few weeks ago (when I wrote this post).

This is the packaging.


The Phunkee Juice Box is a square cylinder. It has no buttons, and no obvious signals. The only other thing I found in the box was a multi-headed wire. This is as minimal as you can get. Even the brand's name is taped on, as if to say "You don't even have to advertise our name if you don't like it".

I needed to get some power into this battery first. I was in a computer lab with many power outlets but the cord in the box had no plugs. I looked for instructions. This is the back cover:


So how do I use this thing? There's a note at the bottom: Please see detailed instructions inside.


Amusingly, there wasn't anything inside the box that resembled instructions (see the first photo).


Perhaps I could connect the device to one of the lab computers and power it up that way. Instinctively, I inserted the USB connector into the device. Then I realized none of the three remaining connectors could fit into the computer.


The device has two sockets, so I reversed the wire.


Now the USB connector went into the desktop computer while the mini-USB plug went into the Juice Box.


A red light appeared around the neck of the device. It was a persistent light, not blinking, not changing colors. There was only one light on the Juice Box so how much charge did it have?


Then I started having doubts. Was I sure power was flowing from the computer to the Juice Box? Couldn't power be moving from the Juice Box to the computer? What I think caused this confusion was the reversing of the wire. The USB connector was first inserted into the device, then flipped over to the computer. Cords are typically uni-directional but this one might be bi-directional.

An hour later, I didn't see any change. The red light was still on. Someone told me I should use my iPhone plug and connect the Juice Box directly to the socket on the wall. This device made me feel dumb.

Again, the red light came on, and again no other signal was forthcoming. Eventually, after three hours or so, the light turned blue. Finally, I learned that the light turns from red to blue on a full charge. I still have no idea how much charge is in the device at any time.

I left the fully charged device on my desk. One day later, my phone was out of power and I connected the Juice Box the only way I could: the mini-USB plug into the phone, the USB plug into the Juice Box. I had reversed the direction of the cord again. Presumably power was flowing from the battery into my phone. I wasn't sure since the one and only light was completely extinguished. (PS. Turned out no power was moving across. Perhaps the device was defective. Perhaps the power dissipated during those 24 hours of idleness.)

You know I will get to visualization eventually. The current trend of hiding labels and text is irritating. The new interface of Google Maps is more confusing to use than the previous interface, not least because of de-cluttering and replacing text with symbols. To read many of today's graphics, stumbling readers must hover over or click on the chart surface--these interactions add nothing to the experience.

Minimalism is taking away unnecessary things. It isn't taking away everything. Please stop torturing users.

Numbersense, in Chinese and Japanese

This is a cross-post on my two blogs.

The new year brings news that my second book, Numbersense: How to Use Big Data to Your Advantage, has been translated into Chinese (simplified) and Japanese. Here are the book covers:


In Chinese, the title reads: "Say No to Fake Big Data". Captures the sentiment of the book pretty well, I must say.


I have no idea what the Japanese title means. Perhaps a reader can help me out here.


The Japanese version is available here or here.

The Chinese version is here.

The English version is here.

There are no easy charts

Every chart, even if the dataset is small, deserves care. Long-time reader zbicyclist submits the following, which illustrates this point well.


The following comments are by zbicyclist:

This is from http://win.niddk.nih.gov/statistics/ -- from the National Institute of Diabetes and Digestive and Kidney Diseases, part of the U.S. National Institutes of Health.
The pie chart is terrible in a pedestrian way – a bar chart could be so much clearer, or even a table. You have to do too much work to match up the colors, numbers and labels on the pie chart.

To the right of the pie is a bar chart, but a bar chart in which the categories are nested – extreme obesity is part of obesity, extreme obesity and obesity are part of overweight or obesity. If we want to do something like this, there should be 3 charts (e.g. space on the x axis indicating a break). The normal expectation for a bar graph is that the categories are mutually exclusive. This problem is repeated in the Race/Ethnicity graph just below these.


Now, some comments by me.

Another issue with the design is inconsistency. The same color scheme is used in both charts but to denote different concepts.


Put yourself at the moment when you have just understood the chart on the left side. You have figured out that obesity is deep green while extreme obesity is light green. Now you shift your attention to the column chart. You expect the light green columns to indicate extreme obesity, and the deep green, obesity. And yet, the light and dark greens represent a male-female split.

Here is a stacked column chart showing that females are more likely than males to be either extremely obese or not overweight. In other words, the female distribution has "fatter tails".


I learned the most upsetting thing about this chart when re-making it: the listed percentages on the pie chart added up to 106 percent.
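Both problems, the nested categories and the shares that do not add up to 100 percent, are easy to check before drawing anything. Below is a rough sketch of such a check. The function names and the input numbers are hypothetical and are not the NIDDK figures; "not overweight" is treated as everyone outside the "overweight or obesity" group.

```python
def nested_to_exclusive(overweight_or_obese: float,
                        obese: float,
                        extremely_obese: float) -> dict:
    """Turn nested (cumulative) percentages into mutually exclusive shares."""
    return {
        "not overweight": 100.0 - overweight_or_obese,
        "overweight, not obese": overweight_or_obese - obese,
        "obese, not extremely obese": obese - extremely_obese,
        "extremely obese": extremely_obese,
    }


def sums_to_100(shares, tol: float = 0.5) -> bool:
    """True if the shares add up to 100 percent, within a rounding tolerance."""
    return abs(sum(shares) - 100.0) <= tol


# HYPOTHETICAL inputs for illustration only.
shares = nested_to_exclusive(70.0, 35.0, 6.0)
print(shares)
print(sums_to_100(shares.values()))   # True: exclusive shares cover everyone once
print(sums_to_100([40.0, 30.0, 36.0]))  # False: these add up to 106
```

Shares built this way can be stacked in a single column per group, since each person is counted exactly once.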