


I agree, the bubble chart might not be the best. IMO, the column chart a) is not visually pleasing and b) marginalizes the trend the authors are trying to emphasize, reducing it to about 4-6 lines in the histogram. Perhaps something more akin to a violin chart would be better? This is a case where the individual data points are not the point.


Bob: Smoothing the data first will solve this problem. The "trend" here requires interpolation.


I agree that this particular bar chart is not visually pleasing, and also that it doesn't quite drive the point home (though I don't think the original made much of a point either).

I don't agree that showing the point as "only" 4-6 lines in a histogram is marginalizing it. 4-6 lines in a histogram can mean a great deal.

Interpolation, or simply aggregation, will go a long way toward fixing that.

But I think the main point is that the bubbles, other than looking pretty, aren't really telling us much, and there are better ways to show it.
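To make the smoothing and aggregation suggestions in this thread concrete, here is a minimal sketch. The function names `weekly_totals` and `moving_average` are hypothetical, and the daily counts are invented for illustration; they are not the culling data behind the chart.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_totals(records):
    """Sum daily counts into weekly buckets keyed by the Monday of each week."""
    totals = defaultdict(int)
    for day, count in records:
        monday = day - timedelta(days=day.weekday())
        totals[monday] += count
    return dict(sorted(totals.items()))


def moving_average(values, window=3):
    """Centered moving average; the window shrinks at the two ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Invented daily counts (assumption: NOT the real data), as (date, birds) pairs.
daily = [
    (date(2015, 4, 6), 0),
    (date(2015, 4, 7), 120),
    (date(2015, 4, 8), 0),
    (date(2015, 4, 13), 300),
    (date(2015, 4, 15), 60),
]

weekly = weekly_totals(daily)
smoothed = moving_average([count for _, count in daily])
```

Weekly bins turn a handful of isolated spikes into a few comparable bars, while the moving average keeps the daily resolution but damps the spikes. Which one suits the chart depends on whether individual days carry any meaning.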


I had earlier wanted to link to Kosara's response to this blog post, but Typepad's spam filter struck again and removed a comment by the author of the blog!

Here is Kosara's take on this post: link.

As you can see from the above comments, I agree with him that smoothing helps.

When I retained the spikes, they were cued by the original chart in which certain days were highlighted. But I agree that those dates were probably not very meaningful.


My first issue here is that the data doesn't directly talk about the number of infections, but rather reports the number of birds killed; without knowing more about the relationship between infection and culling we can't judge to what extent the one is a good proxy for the other.

I also think that looking at just the totals is misleading; it misses the fact that there are more interesting stories here than are captured by these aggregate charts.

For example, a brief study of the data reveals that there are two strains of the virus: H5N8 and H5N2; the first large culls (23 Jan and 12 Feb) are of flocks with H5N8, so could (should?) be omitted if the story is about H5N2.

Based on the reported species, the vast majority (99.8%) of birds destroyed were chickens and turkeys of one sort or another. Grouping the birds into these two categories, disregarding the ducks, pheasants and other or mixed species, and looking at the timeline, the virus appears to affect turkeys first and chickens later.

This detail could lead to hypotheses about spread from the smaller (?) turkey population through mixed flocks to the large chicken flocks.

I don't have knowledge of poultry or virology, so I don't know if these are valid concerns and hypotheses, but these are the sorts of stories I'd like to tease out from the data.
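The strain and species breakdown described in this comment could be sketched roughly as follows. The record layout, the function name `first_cull_by_species`, and every value in `records` are assumptions made up for illustration; the real dataset's columns and numbers may differ.

```python
from datetime import date

# Invented records (assumption: NOT the real data): (date, strain, species, birds).
records = [
    (date(2015, 1, 23), "H5N8", "turkey", 100),
    (date(2015, 3, 5), "H5N2", "turkey", 200),
    (date(2015, 4, 20), "H5N2", "chicken", 5000),
    (date(2015, 4, 2), "H5N2", "turkey", 300),
]


def first_cull_by_species(records, strain):
    """Return the earliest cull date per species, keeping only one strain."""
    first = {}
    for day, rec_strain, species, _birds in records:
        if rec_strain != strain:
            continue  # drop flocks infected with the other strain
        if species not in first or day < first[species]:
            first[species] = day
    return first


def birds_by_species(records, strain):
    """Total birds destroyed per species for one strain."""
    totals = {}
    for _day, rec_strain, species, birds in records:
        if rec_strain == strain:
            totals[species] = totals.get(species, 0) + birds
    return totals


first_seen = first_cull_by_species(records, "H5N2")
totals = birds_by_species(records, "H5N2")
```

Comparing `first_seen` across species is one simple way to check the "turkeys first, chickens later" impression before plotting separate timelines.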


