Simple rendering of complex data

Andrew Gelman likes this line chart showing the day-by-day trend in childbirth:



Andrew makes a number of good points about this chart. Make sure you read the whole post.

One of his points concerns making the line smoother by removing the within-week fluctuations. Doing so removes the weekday/weekend effect. By removing effects that are not of interest, we can focus on effects that are interesting. The number of births on any given day is a confluence of many factors, weekday/weekend being one of them. If we don't remove some of the contributing factors, we'd have no idea which factors are more important and which are less so.
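To make the smoothing idea concrete, here is a minimal sketch in Python. It assumes a centered 7-day moving average as the smoother, which may not be what Gelman's post uses, and the birth counts are made up: a slow seasonal wave plus a fixed weekend dip. The names `centered_moving_average`, `WEEKEND_DIP` and `births` are mine, for illustration only.

```python
import math

def centered_moving_average(values, window=7):
    """Average each point with its neighbours.

    Positions where the full window does not fit are returned as None.
    """
    half = window // 2
    out = []
    for i in range(len(values)):
        if i < half or i + half >= len(values):
            out.append(None)
        else:
            chunk = values[i - half:i + half + 1]
            out.append(sum(chunk) / window)
    return out

# Hypothetical data, not the CDC figures: eight weeks of daily counts
# built from a slow seasonal wave minus a dip on two days of each week.
WEEKEND_DIP = 2000
births = []
for day in range(56):
    seasonal = 11000 + 500 * math.sin(2 * math.pi * day / 365)
    dip = WEEKEND_DIP if day % 7 in (5, 6) else 0
    births.append(seasonal - dip)

smooth = centered_moving_average(births)
```

Every 7-day window contains exactly two weekend days, so the weekend dip averages out to a constant and only the slow seasonal movement is left in `smooth`.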


The problem of "confounding" in complex datasets is demonstrated in the heat map, which Gelman also cited, without approval:

[Figure: heat map of birthdays]

Heat maps are great for certain datasets. This isn't one of them.

The weakness of heat maps is the reliance on color scales. Most software does not allow precise mapping of numbers to colors. The color pattern is automatically generated, which is often not to our liking. Even if the colors are acceptable, it is impossible to learn anything from a heat map other than the big-picture patterns.

The big-picture pattern we find here includes summer months being most popular for births while springtime is less popular. I fail to find any consistent patterns in the rows. If this is the key message, then we can collapse the rows, and even collapse the columns into seasons.

But what is the color scale? The colors correspond to ranks. Ranks ignore the actual difference between two data points. In other words, all the drastic troughs and peaks in the line chart disappear from this heat map. There are much better ways to turn count data into discrete bins.
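A tiny example shows what ranks throw away. The counts below are invented, and equal-width binning is only one possible alternative, chosen here for illustration; `rank_descending` and `equal_width_bins` are hypothetical helper names.

```python
def rank_descending(counts):
    """Rank 1 goes to the largest count; ties are broken by position."""
    order = sorted(range(len(counts)), key=lambda i: -counts[i])
    ranks = [0] * len(counts)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

def equal_width_bins(counts, n_bins):
    """Assign each count to one of n_bins bins of equal width on the count scale."""
    lo, hi = min(counts), max(counts)
    if hi == lo:
        return [0] * len(counts)
    width = (hi - lo) / n_bins
    # The maximum lands exactly on the upper edge, so clamp it into the top bin.
    return [min(int((c - lo) / width), n_bins - 1) for c in counts]

# Hypothetical daily counts: three ordinary days and one deep holiday trough.
counts = [11200, 11150, 11100, 7000]

ranks = rank_descending(counts)      # [1, 2, 3, 4]
bins = equal_width_bins(counts, 4)   # [3, 3, 3, 0]
```

In the ranks, the step from the third day to the fourth looks the same as the step from the first to the second, even though one gap is 4,100 births and the other is 50. Bins defined on the counts themselves keep the trough visibly apart from the ordinary days.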

Picking the right ranking scheme is the most pressing issue here. The designer ranks all 366 days in a single overall ranking. This plays up the summer bulge in births but obscures other patterns.

Alternatively, days can be ranked within each month. That would remove the month-to-month effect and highlight the day-of-the-month effect.
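Here is a rough sketch of the two ranking schemes side by side. The six records are invented (a low-birth month and a high-birth month, each with a dip on the 13th), and `rank_days` is a hypothetical helper, not anything from the original chart.

```python
from collections import defaultdict

def rank_days(records, group_of):
    """Rank days by count within each group; rank 1 = most births.

    records  : list of (month, day, count) tuples
    group_of : function (month, day) -> group label
    Returns a dict mapping (month, day) to its rank inside its group.
    """
    groups = defaultdict(list)
    for month, day, count in records:
        groups[group_of(month, day)].append((count, month, day))
    ranks = {}
    for items in groups.values():
        ordered = sorted(items, reverse=True)  # largest count first
        for position, (_, month, day) in enumerate(ordered, start=1):
            ranks[(month, day)] = position
    return ranks

# Hypothetical counts: April is a low month, September a high month.
records = [
    (4, 12, 10000), (4, 13, 9500), (4, 14, 10100),
    (9, 12, 12000), (9, 13, 11500), (9, 14, 12100),
]

overall = rank_days(records, lambda month, day: "all")
within_month = rank_days(records, lambda month, day: month)
```

In the overall ranking, September 13 comes out third of six and looks unremarkable because every September day outranks every April day. Ranked within its own month, it is last of three, just like April 13, so the day-of-the-month dip shows up in both months.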





Jon Peltier

I thought that heat map was terrible, but all the articles citing it said it was awesome.


"I fail to find any consistent patterns in the rows."

What about the 13th of each month being a low birth day? That heatmap view is one of the few ways that draws out that fact. It seems like people may be avoiding giving birth on the 13th out of superstition. There are other ways to pull out this fact, but I thought it jumped out well on the heatmap.

While it isn't noted in the author's post, since the data comes from the CDC, I expect it's for births in the United States. It's a very important fact that should be noted on the chart because of the correlations to US holidays.

Geoff Ellis

Yes, it shows the 13th low as well as the 4/5 July and 23/26 December. Must be the US .... interesting to see if public holidays in other countries show a similar trend. One thing it doesn't show is weekly patterns, which could be brought out by highlighting each Sunday, for instance.

"Alternatively, days can be ranked within each month". Rather than this (is a month interval really relevant?), why not subtract the smoothed value to see the weekly trends? Another view would be to stack up the weeks on top of each other.

Just goes to show that the key to exploring data is to give the user lots of different views to explore! To do this automatically is not so straightforward, but for instance an FFT would pick out the principal frequencies and views could be suggested based on this.
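As a rough illustration of the frequency idea, the sketch below uses a naive discrete Fourier transform from the Python standard library rather than a true FFT; the result is the same, only slower. The series is invented (a constant level with a weekend dip), and `dominant_period` is a hypothetical helper. Real data would also need its slow trend removed first, which this sketch does not attempt.

```python
import cmath

def dominant_period(series):
    """Return the period (in samples) of the strongest non-constant frequency.

    Uses a naive O(n^2) DFT after subtracting the mean.
    Returns None if the series has no variation.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    best_k, best_amplitude = None, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        if abs(coeff) > best_amplitude:
            best_k, best_amplitude = k, abs(coeff)
    return None if best_k is None else n / best_k

# Hypothetical data: eight weeks at 11,000 births a day, 2,000 fewer on weekends.
births = [11000 - (2000 if day % 7 in (5, 6) else 0) for day in range(56)]

period = dominant_period(births)
```

On this made-up series the strongest component has a period of seven days, which is the kind of hint that could prompt a week-by-week view.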

BTW, I wouldn't consider this as complex data!


If the downward spikes were reliably on one day of the week (as the Thanksgiving one is) I'd suggest fifty-two weeks plotted seven times, once for each weekday. But the other patterns don't fall on fixed weekdays, so instead I suggest two plots of fifty-two weeks each, one being the median of the seven days and the other the minimum of those same seven days. The weekly minimum will be, e.g., Christmas Day, whichever weekday it's on.
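The median-and-minimum suggestion can be sketched in a few lines of Python. The two weeks of counts are invented, with a holiday trough placed midweek in the second one, and `weekly_median_and_min` is a hypothetical helper name.

```python
from statistics import median

def weekly_median_and_min(daily, week_length=7):
    """Split daily counts into consecutive full weeks and summarise each one.

    Returns (medians, minimums); a trailing partial week is dropped.
    """
    weeks = [
        daily[i:i + week_length]
        for i in range(0, len(daily) - week_length + 1, week_length)
    ]
    return [median(w) for w in weeks], [min(w) for w in weeks]

# Hypothetical counts: an ordinary week, then a week with a midweek holiday.
daily = [
    11000, 11200, 11300, 11250, 11100, 9000, 8800,
    11050, 11150, 7500, 11200, 11150, 9100, 8900,
]

medians, minimums = weekly_median_and_min(daily)
# medians  -> [11100, 11050]
# minimums -> [8800, 7500]
```

The median barely moves between the two weeks, while the minimum drops sharply in the holiday week regardless of which weekday the holiday falls on.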

Alternatively, a year starting on the first weekday after 1 March should allow you to line up the seven days without smearing, even in a leap year. That would also solve the problem where 1 January is getting a raw deal by being on the left with no December figures for comparison. But Thanksgiving risks being smeared a little, or doubled at least. Can't be helped.

all good points about the heat map use for this visualization. thanks for this post.

at first glance, i thought the heat map was nice. but then having seen the simple line chart, the heat map does become very one dimensional. it loses out on being able to show the significant dips/spikes, which seems to me to be one of the most important findings here.

@derek thanks for sharing your thoughts on plotting. very informative and helpful. tnx!

