A matter of timing 2
May 12, 2008
Our last post generated much discussion around double axes. In this post, we take up Michael's suggestion of a scatter plot, and several suggestions to retain the original units.
Unfortunately, the scatter plot did not provide any insight in this case. See below. It just highlighted the jerkiness in the data, so we ended up with much zig-zagging.
Retaining the original units is not advisable because those units were not comparable. In the following caricature, we show how to shape the axis to tell any story we want.
Panel plots are slightly better insofar as such mischief could be spotted by the amount of white space.
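To make the axis mischief concrete, here is a minimal sketch in Python. It measures how much of the vertical axis a series occupies under different axis limits. The function name `axis_fill` and all the numbers are hypothetical, made up for illustration; they are not the bicycle data.

```python
def axis_fill(values, axis_min, axis_max):
    """Fraction of the axis height spanned by the data.

    Close to 1: the series fills the plot and looks dramatic.
    Close to 0: the series looks flat and most of the plot is white space.
    """
    return (max(values) - min(values)) / (axis_max - axis_min)


# Hypothetical series, not the data from the post.
series = [40, 44, 42, 48]

tight = axis_fill(series, 40, 48)   # axis hugs the data
loose = axis_fill(series, 0, 160)   # axis stretched far beyond the data

print(f"tight axis: data spans {tight:.0%} of the height")
print(f"loose axis: data spans {loose:.0%} of the height")
print(f"white space on the loose axis: {1 - loose:.0%}")
```

The same series can be made to look volatile or flat purely by choosing the limits. In a panel plot, a low fill value shows up as conspicuous white space, which is what gives the game away.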
Another way to make the two data series comparable is to plot the percentage change from year to year. This is similar to indexing; the difference is that it shows annual change rather than cumulative change since the base year.
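The two transformations can be sketched side by side in Python. This is only an illustration: the helper names `pct_change` and `index_to_base` and the numbers are assumptions, not the data behind the charts.

```python
def pct_change(values):
    """Year-over-year percentage change (one fewer element than the input)."""
    return [100.0 * (curr - prev) / prev for prev, curr in zip(values, values[1:])]


def index_to_base(values, base=100.0):
    """Rescale a series so the first year equals `base` (cumulative change)."""
    return [base * v / values[0] for v in values]


# Hypothetical annual counts, not the data from the post.
volume = [200, 220, 198, 297]

print(pct_change(volume))     # [10.0, -10.0, 50.0]
print(index_to_base(volume))  # [100.0, 110.0, 99.0, 148.5]
```

Both outputs are unit-free, so two series measured in different units can share one axis. The percentage-change series shows each year's swing on its own, while the indexed series shows where each year stands relative to the first.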
The problem with periodic percentage change is that the overall (cumulative) trends in the data are washed out by the periodic fluctuations.
Posted by: Jon Peltier | May 13, 2008 at 12:18 AM
I had to interpolate data values from your plots, so my data might be off.
Connecting the data points highlights the zig-zagging. A standard scatter plot and the correlation of -0.62 seem to indicate a fairly strong negative correlation between volume and crashes for this type of data.
Posted by: Michael Galloy | May 13, 2008 at 01:45 AM
I agree with Michael - why join the dots in the scatter plot?
Seems to me the scatter plot is both the correct and the most visually effective chart in this case.
Posted by: Stephen Hampshire | May 13, 2008 at 04:16 AM
I agree with Michael's comments on the previous post. Why not plot rates (crashes/volume)? Volume is an imperfect but reasonable measure of exposure to the risk of a bicycle crash/fatality.
Posted by: 9.2.5 | May 13, 2008 at 11:43 AM
Yes, I like the idea to plot volume vs. crashes/volume. Here's the scatter plot, r = -0.84.
Posted by: Michael Galloy | May 13, 2008 at 01:02 PM
It's a matter of what question you're trying to answer. If we are looking for a functional relationship between accidents and volume, then yes, a scatter plot without lines works better. In this case, and in most social science situations, the time dimension is important. The line serves to expose any trends; in this case, it's hard to tell. Plus, the time series is too short.
Michael: great to see another reader making your own charts!
Posted by: Kaiser | May 13, 2008 at 09:39 PM
There's a reason to join the time points in the scatterplot. It does provide insight.
Note that the points primarily go counter-clockwise.
As Krider et al. note (Marketing Science, 2005), the counter-clockwise pattern is associated with Y causing X -- or, more conservatively, evidence that it's not X [bike volume] causing Y [fewer bike accidents].
---
As a frequent bicyclist myself, I will make two non-graphical comments:
(1) Overall bicycle safety is likely to be improved with additional bicyclist volume -- vehicles are more likely to be aware of bicyclists.
(2) But increased volume can lead to a bunch of newbies who THINK they know how to ride in traffic and end up running lights, passing trucks on the right, and engaging in other forms of unsafe behavior.
Posted by: zbicyclist | May 14, 2008 at 09:28 AM