Baby names and success
Jun 26, 2007
While we speak of baby names, David F. nominates this set of 6 charts from WSJ. Compare this with Wattenberg's names voyager, and the benefit of interactive graphics is immediately evident.
In David's words:
They show graphs of six different names, but the two on the bottom use a dramatically different scale (from 1st to ~20th, instead of from 1st to 1000th). The introductory text notes the difference, but it is still a shock.
We like the use of "small multiples" but their impact is compromised if we don't keep the background material constant so that readers can compare between charts. By having different scales, the message was distorted: Mary has had a much larger drop than David, and it's easily missed in these charts.
Lines should take the place of areas, which carry scant meaning in this context.
The use of blue and red is a nice touch but dovetailing the male and female charts strikes us as excessive fun. It would have been clearer to give the sons and the daughters their own columns.
The article itself relates the anguish of modern parents in naming their babies. Much of this angst can be traced to serious econometric studies that claim to have found cause-and-effect relationships between someone's name and their eventual success in life. Some of this research was highlighted in Freakonomics, for example. My stance is that all such studies are dubious, there being innumerable confounding factors (socio-economic, genetic, cultural, luck, etc.). In addition, the measured response can range from "happiness" to income to many other metrics. The danger of finding something because one looks hard enough is very real. We don't currently have tools powerful enough to substantiate this sort of study.
Source: "The Baby-Name Business", Wall Street Journal, June 22, 2007
A common logarithmic scale would have been a better choice. The decrease in David and Mary is still obvious, and the scale better reflects the importance of a change.
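To make the point about a log scale concrete, here is a minimal sketch in Python. The rank pairs are hypothetical and are not taken from the WSJ data, and the 1-to-1000 axis is assumed from the range described in the post. It compares how far a line moves on a linear rank axis versus a logarithmic one for the same nine-place drop.

```python
import math

def linear_rank_position(rank, max_rank=1000):
    """Vertical position in [0, 1] on a linear axis: 1.0 at rank 1, 0.0 at max_rank."""
    return (max_rank - rank) / (max_rank - 1)

def log_rank_position(rank, max_rank=1000):
    """Vertical position in [0, 1] on a log axis: 1.0 at rank 1, 0.0 at max_rank."""
    return 1.0 - math.log10(rank) / math.log10(max_rank)

# Hypothetical rank changes (illustrative only, not real name data).
for start, end in [(1, 10), (500, 509)]:
    linear_drop = linear_rank_position(start) - linear_rank_position(end)
    log_drop = log_rank_position(start) - log_rank_position(end)
    print(f"rank {start} -> {end}: linear drop {linear_drop:.3f}, log drop {log_drop:.3f}")
```

On the linear axis both drops cover the same vertical distance, while on the log axis the fall from 1st to 10th takes up a third of the chart and the fall from 500th to 509th is barely visible.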
Posted by: Ken | Jun 26, 2007 at 02:52 AM
For many, charts are the "lighter side" of data analysis. Thank God for that, because statistics are so boring... That's why they can't resist having some fun, even at the cost of a less (and less and less) clear message.
Posted by: Jorge Camoes | Jun 26, 2007 at 03:02 AM
as ET says...if your numbers are boring, you've got the wrong numbers.
Posted by: edmDusty | Jun 26, 2007 at 12:13 PM
I wonder how they'd look as sparklines? (apologies for the crude simulation of actual sparklines)
I realise that sparklines are not well equipped to deal with logarithmic scales.
Posted by: derek | Jun 26, 2007 at 03:18 PM
If you presented a lengthy table of names (so more people could find their own name, of course), rather than just a few, you would need sparklines. This would be a two-word display for each name: the name and the sparkline.
Posted by: Jon Peltier | Jun 27, 2007 at 09:01 AM
When I saw this in the original WSJ article, I thought it made the point well, and I still do. Logs would be confusing to the audience and the mindset of those reading a "baby name" article, even in the WSJ.
The graphs don't really need to be aligned to make the point in this case.
Full disclosure: My old grad school friend, Cleveland Kent Evans, is quoted in this article. Cleve's one of the few people named after two cities in Ohio. There's no sign of this becoming a trend.
Posted by: zbicyclist | Jun 27, 2007 at 09:43 PM
I don't have a problem with the reciprocal scale used, but I do have a problem with the filled area under the curve. That area is meaningless, and the curve should have been a line instead.
Posted by: derek | Jun 28, 2007 at 05:40 AM
I disagree. The area under the curve is arguably a measure of the name's overall popularity. Given a series of these graphs at the same scale (like the first four here), a comparison of those areas offers a quick way to gauge relative popularities (for instance, there's a lot more red for "Nicole" than for "Farrah"). Though I can't prove it, I would imagine having that area filled in (against a background color) allows our visual system to do the "integration" quicker.
Again, I think this would be useful when eyeballing a series of these graphs, such as the table suggested above. At worst, it adds nothing (if you don't care about "overall popularity"). At best, it facilitates the perception of another related metric.
Posted by: miked | Jun 28, 2007 at 11:33 PM
But wait, the area together with the reverse rank is dangerous! You can arbitrarily make the area as large as you want by picking an arbitrarily large end-point for the rank scale. In the case of David, one could choose a y-axis of 1 to 200 for example.
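A rough sketch of the effect, in Python. The rank series is made up for illustration and is not David's actual history, and the filled area is approximated by averaging the curve's height at equally spaced time points, which is an assumption of this sketch rather than how the WSJ drew its charts.

```python
def filled_area_fraction(ranks, axis_max):
    """Approximate share of the plot that is filled when rank 1 sits at the
    top of the y-axis and axis_max sits at the bottom."""
    heights = [(axis_max - rank) / (axis_max - 1) for rank in ranks]
    return sum(heights) / len(heights)

# Hypothetical ranks over time for a name sliding from 1st to 20th.
ranks = [1, 2, 5, 10, 15, 20]

for axis_max in (20, 200, 1000):
    share = filled_area_fraction(ranks, axis_max)
    print(f"y-axis 1 to {axis_max}: about {share:.0%} of the plot is filled")
```

The same six ranks fill more and more of the plot as the bottom of the axis is pushed out, so the size of the colored area says as much about the chosen end-point as about the name.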
Posted by: Kaiser | Jun 29, 2007 at 02:25 AM
I'd prefer plotting proportion of total names (as is done on that baby names website), rather than ranks.
Posted by: Andrew Gelman | Aug 02, 2007 at 09:10 AM
I think an illustration with more names would make more of a point. This only shows that Mary and David are getting uncommon. Oh and statistics get so boring!
Posted by: Ally sal | Sep 26, 2008 at 06:43 AM