
To map or not to map

The New York Times shows the following set of maps to illustrate State policies relating to illegal immigrants. (link to article)



This is a great classroom exercise. The question is: to map or not to map. What are other possible displays and how do they compare?


Designers fuss over little details and so should you

Those who attended my dataviz talks have seen a version of the following chart that showed up yesterday in the New York Times (link):


This chart shows the fluctuation in Arctic sea ice cover over time.

The dataset is a simple time series but contains a bit of complexity. There are several ways to display this data that help readers understand the complex structure. This particular chart should be read at two levels: there is a seasonal pattern that is illustrated by the dotted curve, and then there are annual fluctuations around that average seasonal pattern. Each year's curve is off from the average in one way or another.

The 2015 line (black) is hugging the bottom of the envelope of curves, which means the ice cover is at a historic low.

Meanwhile the lines for 2010-2014 (blue) all trace near the bottom of the historic collection of curves.


There are several nice touches on this graphic, such as the ample annotation describing interesting features of the data, the smart use of foreground/background to make comparisons, and the use of countries and states (note the vertical axis labels) to bring alive the measure of coverage.

Check out my previous post about this data set.

Also, this post talks about finding real-life anchors to help readers judge size data.

My collection of posts about New York Times graphics.


PS. As Mike S. pointed out to me on Twitter, the measure is "ice cover", not ice volume so I edited the wording above. The language here is tricky because we don't usually talk about the "cover" of a country or state so I am using "coverage". The term "surface area" also makes more sense for describing ice than a country.

Bumps chart goes mainstream

It’s a happy day when one of my favorite chart types, the Bumps chart, makes it to the Wall Street Journal, and the front page no less! (Link to article)

This chart shows the ground shifting in global auto production in the next five years, with Mexico and India gaining in rank over Germany and South Korea.


The criss-crossing of lines is key to reading these charts. A crossing ("bump") necessarily means one entity has surpassed the other entity in absolute terms, even though we are looking at the relative rank.
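The link between a rank crossing and the underlying absolute values can be checked with a short sketch. The production figures and country labels below are made up for illustration and are not taken from the Journal's chart; the function names are likewise hypothetical.

```python
def ranks(values):
    """Return {name: rank}, where rank 1 goes to the largest value."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}


def crossings(before, after):
    """List pairs (a, b) where a ranked below b before and above b after."""
    r0, r1 = ranks(before), ranks(after)
    return [
        (a, b)
        for a in before
        for b in before
        if r0[a] > r0[b] and r1[a] < r1[b]
    ]


# Hypothetical production figures (millions of vehicles), for illustration only.
year_start = {"A": 5.0, "B": 4.0, "C": 3.0}
year_end = {"A": 5.5, "B": 4.1, "C": 4.6}

for a, b in crossings(year_start, year_end):
    # A crossing in rank implies the absolute values have swapped order.
    assert year_start[a] < year_start[b] and year_end[a] > year_end[b]
    print(f"{a} overtakes {b}")
```

Note that country B also grew in this made-up example; it was bumped only because C grew faster, which is the kind of detail a rank-only chart hides.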

Of course, there is no Swiss Army Knife of charts. This graphic provides no clue as to the share of world production. It's quite possible that the first few countries account for the majority of the world's production, so that the rank shifts toward the bottom of the chart are relatively inconsequential. Wikipedia says that the top player (China) produces a quarter of the world's vehicles, and twice as many as the next biggest producer. Any country ranked below 4 accounts for less than 5 percent of global volume.


I made a few minor edits in this version below. For example, it's unclear why both 2014 and 2015 are depicted since there were no rank shifts and also the 2015 data is a projection. (I don't have any problem with the two red lines even though I didn't carry over the color scheme.)


Shortchanging and subverting the message

Reader Michael N. calls this an "unusual" marketing bar chart--because the designer distorted the data in a way that weakens, rather than strengthens, the story!


The infographic is pitching savings if the family switches to Republic. The savings are about 70%, and yet the height of the $40 bar is more than 50% of the height of the $150 bar.
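The arithmetic is easy to verify. This small sketch uses the two dollar amounts quoted above and assumes a bar chart whose axis starts at zero; the function name is just for illustration.

```python
def bar_height_ratio(small, large):
    """Height of the smaller bar as a fraction of the larger, on a zero-based axis."""
    return small / large


average_bill = 150  # dollars, as printed on the infographic
republic_bill = 40  # dollars, as printed on the infographic

savings = 1 - republic_bill / average_bill
ratio = bar_height_ratio(republic_bill, average_bill)

print(f"savings: {savings:.0%}")                 # savings: 73%
print(f"proportional bar height: {ratio:.0%}")   # proportional bar height: 27%
```

Drawn to scale, the $40 bar would be roughly a quarter of the height of the $150 bar, about half as tall as the designer made it.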

The entire infographic is a case of misplaced emphases. (Click here to see original.)

Ranked by size of font from largest to smallest, this poster gives us the following information:

Average cellphone bill for a family of four

Penetration rate of Wifi

Price comparison between average plan and Republic Wireless plan

Median national household income

Cellphone bill growth versus inflation rate

Cost of wireless data split from the bill

Growth of global Wifi hotspots

Actual amount of wireless data used by cellphone users


The intended message is that families are paying for a lot of unused wireless data, and Republic Wireless has a Wifi solution that saves them 70% of their bill.


Observing Rosling’s Current Visual Style

On the sister blog, I wrote about Hans Rosling’s recent presentation in New York (link). I noted that Rosling has apparently simplified his visual palette.

Rosling is best known as the developer of the Gapminder tool, used to visualize global social statistics data collected by national statistical agencies. I wrote favorably about this tool in a series of posts (link). Gapminder made popular the moving bubble chart, although it is not the only graphical form the tool offers.


These animated bubble charts also made Rosling a YouTube star. (See here.)


In last week’s presentation, Rosling only showed one moving bubble chart. The rest of his graphics were noticeably simpler, the kind that anyone can produce in Excel or PowerPoint. Here is one example:


I’m particularly impressed by a simple sequence of charts in which Rosling explains the demographic changes the world is expecting to see in the next 50 to 100 years.


This is an enhanced area chart. Each slice of area is subdivided into stick figures so that an axis for population counts becomes unnecessary.

Instead, the reader sees two useful dimensions: region of the world, and age group.

How the population ages as it grows is the feature story and the effect of aging is ingeniously portrayed as layers. This becomes apparent as Rosling lets time roll forward, and the layers literally walk off the page. (Unfortunately, I couldn't capture each step fast enough.)


 (This photo courtesy of Daniel Vadnais.)

When Rosling showed the 2085 projection, we found that the entire rectangle had filled up, so the world population has definitely grown, roughly by 30 percent. The growth happens through the filling up of the adult layers; the total number of children has not changed. This is one of the key insights from recent demographic data. The first photo above shows something remarkable: the fertility rate in Asian countries has already plunged to about the same level as that of developed countries.


This set of charts is unusually effective. It represents another level of simplification in visual means. At the same time, the message is sharpened.

As I reported the other day (link), Rosling does not believe modern tools have improved data analysis. This talk, which used only simple tools, is a good demonstration of his point.

Tricky boy William

Last week, I was quite bothered by this chart I produced using the Baby Name Voyager tool.


According to this chart, William has drastically declined in popularity over time. The name was 7 times more popular back in the 1880s compared to the 2010s. And yet, when I hovered over the chart, the rank of William in 2013 was 3. Apparently, William was the 3rd most popular boy name in 2013.

I wrote the nice people at the website and asked if there might be a data quality issue, and their response was:

The data in our Name Voyager tool is correct. While it may be puzzling, there are definitely less Williams in the recent years than there were in the past (1880s). Although the name is still widely popular, there are plenty of other baby names that parents are using. In the past, there were a limited amount of names that parents would choose, therefore more children had the same name.

What bothered me was that the rate has declined drastically while the number of births was increasing. So, I was expecting William to drop in rank as well. But their explanation makes a lot of sense: if there is a much wider spread of names in recent times, the rank could indeed remain top. It was very nice of them to respond.
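A toy example shows how this can happen. The counts below are invented for illustration and are not the Name Voyager data; the function name is hypothetical.

```python
def share_and_rank(counts, name):
    """Return (share of all births, rank) for a name; rank 1 is the most common."""
    total = sum(counts.values())
    # Rank = 1 + number of names with strictly more births.
    rank = 1 + sum(1 for c in counts.values() if c > counts[name])
    return counts[name] / total, rank


# Hypothetical counts, for illustration only.
# Early era: a handful of names share most of the births.
early = {"John": 90, "William": 80, "James": 50, "George": 30}

# Recent era: a few leading names plus a long tail of other names.
recent = {"Noah": 19, "Liam": 18, "William": 17}
recent.update({f"name_{i}": 10 for i in range(95)})

early_share, early_rank = share_and_rank(early, "William")
recent_share, recent_rank = share_and_rank(recent, "William")

print(f"early:  share {early_share:.1%}, rank {early_rank}")
print(f"recent: share {recent_share:.1%}, rank {recent_rank}")
```

In this made-up setup William's share collapses while his rank barely moves, because every other name's share has shrunk as well.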


There are three ways to present this data series, as shown below. One can show the raw counts of William babies (orange line). One can show the popularity against total births (what Baby Name Wizard shows, blue line). One can show the rank of William relative to all other male baby names (green line). Consider how different these three lines look!


The rate metric (per million births) adjusts for growth in total births. But the blue line is difficult to interpret in the face of the orange line. In the period 1900 to 1950, the actual number of William babies went up but the blue line came down. The rank is also tough to interpret, especially in the 1970-2000 period when it took a dive, a trend not visible in either the raw counts or the adjusted counts.

Adding to the difficulty is the use of the per-million metric. In the following chart, I show three different scales for popularity: per million, per 100,000, and per 100 (i.e. proportion). The raw count is shown up top.


All three blue lines are essentially the same but how readers interpret the scales is quite another matter. The per-million births metric is the worst of the lot. The chart shows values in the 20,000-25,000 range in the 1910s but the actual number of William babies was below 20,000 for a number of years. Switching to per-100K helps but in this case, using the standard proportion (the bottom chart) is more natural.
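To see why the per-million scale invites confusion, here is a minimal sketch with made-up numbers (not the actual birth data) in which total births are below one million, so the per-million rate ends up larger than the raw count it describes.

```python
def rate(count, total_births, per=1_000_000):
    """Scale a raw count to a rate per `per` births."""
    return count * per / total_births


# Hypothetical year, for illustration only: 18,000 Williams among 800,000 boys.
williams, boys = 18_000, 800_000

per_million = rate(williams, boys)            # 22500.0
per_100k = rate(williams, boys, per=100_000)  # 2250.0
percent = rate(williams, boys, per=100)       # 2.25

# The per-million figure exceeds the raw count whenever births < 1 million.
print(per_million > williams)  # True
```

The three rates differ only by a constant factor, which is why the lines have the same shape; only the percent version reads naturally as "about 2 in every 100 boys".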


The following scatter plot shows the strange relationship between the rate of births and the rank over time for William babies.


Up to the 1990s, there is an intuitive relationship: as the proportion of Williams among male babies declined, so did the rank of William. Then in the 1990s and beyond, the relationship flipped. The proportion of Williams among male babies continued to drop but the rank of William actually recovered!



Inverting the axis for goodness sake

Last week, the Wall Street Journal inverted the vertical axis in one of their charts. The last time someone did this, a huge uproar ensued. (The Florida gun deaths chart discussed here.) This time, the act appeared to have caused barely a ruffle. Perhaps it's because the designer placed a text box on the chart to alert us that the axis is inverted. (See the original on the left.)


When I saw the chart, I tweeted that there is a better way to deal with this. Instead of inverting the axis, one can invert the currency ratio, as shown on the right. Each hryvnia is worth progressively fewer U.S. cents over the last two years. It is clear that the hryvnia is weakening, without having to annotate this.

(PS. I pulled a weekly dataset so the numbers don't completely match up for the Wednesday in question. Also, the steepness of the curve is due to the ratio inversion.)
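The ratio inversion itself is a one-line transformation. The sketch below assumes the original series is quoted as hryvnia per U.S. dollar; the exchange rates are invented for illustration and are not the weekly data I pulled.

```python
def us_cents_per_hryvnia(hryvnia_per_dollar):
    """Invert a hryvnia-per-dollar quote into U.S. cents per hryvnia."""
    return 100 / hryvnia_per_dollar


# Hypothetical quotes rising in equal steps of 8 hryvnia per dollar.
rates = [8.0, 16.0, 24.0]
cents = [us_cents_per_hryvnia(r) for r in rates]

# A rising hryvnia-per-dollar rate becomes a falling cents-per-hryvnia value,
# so the weakening currency points downward without inverting the axis.
first_drop = cents[0] - cents[1]
second_drop = cents[1] - cents[2]

print(cents[0])                  # 12.5
print(first_drop > second_drop)  # True
```

Equal steps in the original quote turn into unequal steps after inversion, which is why the two versions of the curve differ in steepness.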

Course Announcement: Data Visualization Workshop

The next installment of my data visualization workshop runs from April 7 to May 12 in New York City.

My workshop is modeled after a creative writing workshop. The focus of the six weeks is on giving and receiving feedback on a datavis project of the student's choosing. There are selected readings and industry speakers who provide some perspective on this fast-changing field.

You can find an outline of the course here.

Here are some comments about the course from past students:

"A terrific class. Excellent readings and a workshop structure that allowed for a high level of creativity and honed our skills in constructing and de-constructing effective visualizations."

"This was a great course that opened my eyes!!!"

"Fung brought in some fantastic speakers that are well respected in the industry. The workshop nature of the class helped hone our eye not just for our own project, but to observe and comment on those around us."

Registration information here. Please send along to your friends and/or colleagues.

PS. The course is part of the Certificate in Data Visualization although you can register for my course without doing the Certificate. The full set of courses is found here.


In addition, I'm announcing a new course called "Careers in Data Science and Business Analytics". Please see the announcement on my sister blog here.