
Wading in waste

[Image: Sciam_bacteria]

A poor graphic leaves readers wading in waste; in this case, the waste of time.  (Thanks to a tip from Dr. Bruce W.)

This very busy chart conveys a simple research finding: the density of bacteria increases with the prevalence of impervious surfaces.  As Bruce pointed out, underlying this chart are but six observations taken at selected tidal creeks, each observation being a (paired) measurement of bacteria count and prevalence of impervious surfaces.

A factory's worth of graphical elements was employed, including columns, pies, colors, data labels, legends and so on.  The result is utter confusion.  How is it that the tip of each column does not coincide with the center of each pie?  Do equal-sized pies imply equal surface areas?  What is the bacteria count at each location?

[Image: Redo_bacteria]

A scatter plot brings out the key correlation with minimal fuss.
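For readers who want to put a number on what the scatter plot shows, here is a minimal sketch of the Pearson correlation for paired measurements. The function name `pearson_r` and the six data pairs are my own placeholders for illustration; they are not the values from the Scientific American chart, which are not reproduced in this post.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired observations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# HYPOTHETICAL placeholder pairs, NOT the published creek measurements:
# (percent impervious surface, bacteria count) for six imaginary creeks.
impervious_pct = [5, 10, 20, 30, 40, 50]
bacteria_count = [10, 20, 45, 90, 180, 370]

r = pearson_r(impervious_pct, bacteria_count)
print(f"r = {r:.2f}")
```

With only six pairs, a single summary number like this should be read alongside the plot, not instead of it.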

Reference: "Wading in Waste", Scientific American, June 2006



Kelly O'Day


I've added correlation information and creek name labels to your version.


Jorge Camoes

Kaiser, Kelly: that's the easy solution. You should know by now that the media (in general) is afraid of scatter plots and does whatever it takes just to avoid them, as you can see in the example above.

And you are aware that your charts are so naked they are hardly decent, aren't you? You should replace each dot with a photo of each creek, or something like that. Just to hide the nature of the chart...

Aleks J

What strikes me about this truly awful chart is the correlation-causation junk. Paved surfaces are just one of the indicators of urbanization. Deciding on the effect of population density vs pet density and similar would have been much more enlightening. I guess one needs a chaotic chart to hide something very obvious.


Kelly, labels are the first thing I thought of too, but I'm not sure the exponential trend line means anything, even though the correlation is an impressive 99%.

I'd be more inclined to go for a y=mx^2 relationship, even if the correlation doesn't turn out to be as good.


I was grateful to Xan Gregg, the commenter in the rip-tide thread who provided the numbers the graph was based on. If anyone else acquires such a data set for one of Kaiser's charts, can they post a link to it in comments so we can all play?


That should be y=mx^2+c :-)


Picking up where Aleks left off - the degree of correlation in this data is uncanny; this usually implies either a bad design or data tampering. Part of the problem is that the small sample size of 6 does not justify fitting nonlinear models. Also, maybe some simple mechanism (latent variable) explains this correlation.
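To make the small-sample point concrete, here is a rough sketch that fits both curve shapes mentioned in this thread, an exponential and y=mx^2+c, to six points and reports R^2 for each. All names are hypothetical helpers and the six pairs are made-up placeholders, not the creek data; the exponential is fitted on the log scale, which is an assumption about how a typical spreadsheet trend line works.

```python
from math import exp, log


def linear_fit(us, vs):
    """Ordinary least squares for v = a + b*u; returns (a, b)."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    b = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs)) / sum(
        (u - mean_u) ** 2 for u in us
    )
    return mean_v - b * mean_u, b


def r_squared(ys, preds):
    """Coefficient of determination on the original (untransformed) scale."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def fit_exponential(xs, ys):
    """Fit y = A*exp(k*x) by regressing log(y) on x; returns fitted values."""
    log_a, k = linear_fit(xs, [log(y) for y in ys])
    return [exp(log_a + k * x) for x in xs]


def fit_quadratic(xs, ys):
    """Fit y = m*x^2 + c by regressing y on x^2; returns fitted values."""
    c, m = linear_fit([x * x for x in xs], ys)
    return [m * x * x + c for x in xs]


# HYPOTHETICAL placeholder pairs, NOT the published measurements.
xs = [5, 10, 20, 30, 40, 50]
ys = [12, 18, 40, 95, 170, 260]

print("exponential R^2:", round(r_squared(ys, fit_exponential(xs, ys)), 3))
print("quadratic   R^2:", round(r_squared(ys, fit_quadratic(xs, ys)), 3))
```

If both shapes score well on the same six points, the fit statistic alone cannot say which relationship is the right one.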


I'm guessing catchment area versus creek length, hence my thought about a quadratic fit.


kaiser, i have enjoyed reading your blog for half a year now and finally i've got something more or less useful to add:

i like the new design of your page better but it needs some re-design.
it is not intuitively clear that the comment is from the "posted by" below the comment because of the dotted line between them. the line should be under the "posted by". or: make the space between comments bigger.


失踪: That's a great point. I'll figure out if there is a way for me to change that setting.
