
Beyond the obvious

Flowing Data has been doing some fine work on the baby names data. The Name Voyager, a project by Martin Wattenberg, has received praise from many corners. It is one of those projects that have taken on a commercial life, as you can see from the link.

Here is a typical area chart presentation of the baby names data:


The typical insight one takes from this chart is that the name "Michael" (as a boy's name) peaked in the 1970s and has not been as popular since. The data is organized as a series of trend lines, one for each combination of name and gender.

Speaking of area charts, I have never understood their appeal. If I click on Michael in the above chart, the display narrows to all names starting with "Michael", which means it includes Michael given to a girl and Michaela, for example. See below.


What is curious is that the peak has a red lining. At first glance, one expects to find, hiding behind the blue Michael, a girl's name that is almost as popular. But this is a stacked area chart, so the red band sits on top of the blue one and only its thickness, not its height, measures the girl's name (Michael given to a girl, if you mouse over it). That name is in fact far less popular than the boy's Michael: roughly 500 versus 20,000.
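To make the two behaviors above concrete (clicking a name selects every series that starts with it, and the selected series are then stacked), here is a minimal sketch. The counts are illustrative only: the two Michael figures are the rough values quoted above, while the Michaela figure is a made-up placeholder, and the functions `select_prefix` and `stack_bands` are hypothetical helpers, not part of the Name Voyager. The stacking order used here is simply insertion order.

```python
from itertools import accumulate

# Illustrative peak-year counts keyed by (name, gender).
# Michael M/F are the rough figures quoted in the text; Michaela is a MADE-UP placeholder.
PEAK_COUNTS = {
    ("Michael", "M"): 20_000,
    ("Michael", "F"): 500,
    ("Michaela", "F"): 300,
}


def select_prefix(counts, prefix):
    """Keep every series whose name starts with prefix, regardless of gender."""
    return {key: v for key, v in counts.items() if key[0].startswith(prefix)}


def stack_bands(values):
    """Return (bottom, top) edges for each band of a stacked area chart.

    Each band's top edge is a running total of all bands beneath it, so only
    the band's thickness (top - bottom) reflects that series' own value.
    """
    tops = list(accumulate(values))
    bottoms = [0] + tops[:-1]
    return list(zip(bottoms, tops))


selected = select_prefix(PEAK_COUNTS, "Michael")
for key, (lo, hi) in zip(selected, stack_bands(selected.values())):
    print(key, "edges:", lo, "to", hi, "thickness:", hi - lo)
```

The girl's Michael band runs from 20,000 to 20,500: its top edge is high on the chart, but its thickness is only 500, which is exactly the misreading the stacked layout invites.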


Nathan decides to dig a layer deeper. Is there more information beyond the popularity of baby names over time?

In this post, Nathan zeroes in on the subset of names that are "unisex," that is to say, names that have been given to both boys and girls. He selects the top 35 names based on a mean-square-error criterion and exposes the gender bias for each name. The metric being plotted is no longer raw popularity but the gender split. The larger the red area, the greater the proportion of girls being given that name.

You can readily see some interesting trends. Kim (#34) has become predominantly female since the 1960s. On the other hand, Robbie (#18) used to be predominantly female but is now mostly a boy's name.
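The gender-share metric and the selection step described above can be sketched as follows. This is an assumption-laden illustration: the post only says the 35 names were chosen by a "mean-square-error criterion", so scoring each name by the mean squared deviation of its yearly female share from an even 50/50 split is one plausible reading, not Nathan's actual formula. The helper names and the toy counts are hypothetical.

```python
def female_share(girls, boys):
    """Per-year fraction of babies given the name who were girls (None if no births)."""
    return [g / (g + b) if g + b > 0 else None for g, b in zip(girls, boys)]


def unisex_mse(girls, boys):
    """Mean squared deviation of the yearly female share from an even 50/50 split.

    ASSUMPTION: one plausible reading of a "mean-square-error criterion"; the
    original formula is not stated. Lower = more consistently unisex.
    Years in which nobody received the name are skipped.
    """
    shares = [s for s in female_share(girls, boys) if s is not None]
    if not shares:
        return float("inf")  # no data at all: rank last
    return sum((s - 0.5) ** 2 for s in shares) / len(shares)


def top_unisex(names, k=35):
    """Return the k names with the lowest unisex_mse.

    names maps name -> (yearly girl counts, yearly boy counts).
    """
    return sorted(names, key=lambda n: unisex_mse(*names[n]))[:k]


# MADE-UP counts for three hypothetical names over two years (not real data).
toy = {
    "NameA": ([50, 50], [50, 50]),  # always an even split
    "NameB": ([90, 10], [10, 90]),  # flips from mostly girls to mostly boys
    "NameC": ([100, 100], [0, 0]),  # girls only
}
print(top_unisex(toy, k=2))
```

With these toy counts the call prints `['NameA', 'NameB']`. Note the design consequence: under this reading, a name that flips sides over time, as Robbie did, scores worse than one that stays near 50/50, so a different criterion would be needed if the goal were to surface names whose gender changed.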


One useful tip when performing this analysis is to keep an eye on the popularity of each name (the original metric) even after switching to the new metric of gender bias. This is because the relative proportions are unstable and difficult to interpret for less popular names. For example, the Name Voyager shows no values for Gale (#29) after the 1970s, which probably explains the massive gyrations in the 1990s and beyond.
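Why proportions become unstable for rare names can be quantified with the textbook binomial standard error, sqrt(p(1 - p)/n). The sketch below applies it to a name's female share and flags low-count years. Treating each birth as an independent draw is a simplifying assumption, and the `min_total=100` cutoff and the helper names are hypothetical choices for illustration.

```python
import math


def share_std_error(p, n):
    """Approximate standard error of an observed proportion p based on n births.

    Uses the binomial approximation sqrt(p * (1 - p) / n), which assumes each
    birth is an independent draw; treat it as a rough guide only.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1 - p) / n)


def flag_unstable(girls, boys, min_total=100):
    """Return indices of years whose total count is below min_total (hypothetical cutoff)."""
    return [i for i, (g, b) in enumerate(zip(girls, boys)) if g + b < min_total]


# How much an even 50/50 share can wobble from sampling noise alone.
for n in (10, 100, 10_000):
    print(n, round(share_std_error(0.5, n), 3))
```

At 10 births a year the female share of an evenly split name can easily swing by around 16 percentage points from noise alone; at 10,000 births the wobble shrinks to about half a point. Checking raw popularity alongside the gender share is what separates real shifts from this noise.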

