Form and function: when academia takes on weed
Apr 26, 2019
I have a longer article on the sister blog about the research design of a study claiming 420 "cannabis" Day caused more road accident fatalities (link). The blog also has a discussion of the graphics used to present the analysis, which I'm excerpting here for dataviz fans.
The original chart looks like this:
The question being asked is whether April 20 is a special day when viewed against the backdrop of every day of the year. The answer is pretty clear. From this chart, the reader can see:
- that April 20 is part of the background "noise". It's not standing out from the pack;
- that there are other days like July 4, Labor Day, Christmas, etc. that stand out more than April 20.
It doesn't even matter what the vertical axis is measuring. The visual elements did their job.
If you look closely, you can even assess the "magnitude" of the evidence, not just the "direction." While April 20 isn't special, it nonetheless is somewhat noteworthy. The vertical line associated with April 20 sits on the positive side of the range of possibilities, and appears to sit above most other days.
The chart form shown above is better at conveying the direction of the evidence than its strength. If the strength of the evidence is required, we use a different chart form.
I produced the following histogram, using the same data:
The histogram is produced by first sorting the midpoints# of the vertical lines into buckets, and then counting the number of days that fall into each bucket. (# Strictly speaking, I use the point estimates.)
The midpoints# are estimates of the fatal crash ratio, which is defined as the excess crash fatalities reported on the "analysis day" relative to the "reference days," which are situated one week before and one week after the analysis day. So April 20 is compared to April 13 and 27. Therefore, a ratio of 1 indicates no excess fatalities on the analysis day. And the further the ratio is above 1, the more special is the analysis day.
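To make the definition concrete, here is a minimal sketch of the ratio for a single analysis day. It assumes the ratio is the analysis-day count divided by the average of the two reference-day counts; the study's actual estimator may differ (the vertical lines in the chart suggest a model-based estimate with an interval). The function name `crash_ratio` and the counts are made up for illustration.

```python
def crash_ratio(analysis_count, before_count, after_count):
    """Fatalities on the analysis day relative to the average of the
    two reference days (one week before, one week after).

    Assumption: a simple ratio of counts, not the study's estimator.
    """
    reference_mean = (before_count + after_count) / 2
    if reference_mean == 0:
        raise ValueError("no fatalities on the reference days; ratio undefined")
    return analysis_count / reference_mean


# Made-up counts for illustration only, not figures from the study.
print(crash_ratio(60, 55, 45))  # 60 / 50 = 1.2, i.e. 20% above the reference days
print(crash_ratio(50, 50, 50))  # 1.0, i.e. no excess fatalities
```

A value of 1 falls out when the analysis day matches the reference average, which is the "no excess fatalities" case described above.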
If we pick a random day from the histogram above, we will likely land somewhere in the middle, which is to say, a day of the year on which no excess car crash fatalities could be confirmed in the data.
As shown above, the ratio for April 20 (about 1.12) is located on the right tail, at roughly the 94th percentile, meaning that about 6 percent of analysis days had more extreme ratios.
This is in line with our reading above, that April 20 is noteworthy but not extraordinary.
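For readers who want to reproduce this kind of reading, the bucketing and the percentile lookup can be sketched as follows. This is only an illustration: the bucket width of 0.1, the helper names, and the list of ratios are my assumptions, and the numbers are placeholders rather than the point estimates from the study.

```python
import math
from collections import Counter


def bucket_counts(ratios, width=0.1):
    """Count how many ratios fall into each bucket of the given width.

    Keys are integer bucket indices; bucket k covers [k*width, (k+1)*width).
    """
    return Counter(math.floor(r / width) for r in ratios)


def percentile_rank(ratios, value):
    """Percentage of ratios that are strictly below the given value."""
    below = sum(1 for r in ratios if r < value)
    return 100 * below / len(ratios)


# Placeholder ratios, one per analysis day; NOT the study's estimates.
ratios = [0.95, 0.97, 1.02, 1.05, 1.12, 1.31]

# Print the histogram as text, one row per non-empty bucket.
for k, n in sorted(bucket_counts(ratios).items()):
    print(f"{k * 0.1:.1f} to {(k + 1) * 0.1:.1f}: {'#' * n}")

# Share of days with a ratio below the day of interest.
print(percentile_rank(ratios, 1.12))
```

With the real point estimates in place of the placeholder list, a percentile rank near 94 for the April 20 ratio would correspond to the right-tail position described above.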
P.S. [4/27/2019] Replaced the first chart with a newer version from Harper's site. The newer version contains the point estimates inside the vertical lines, which are used to generate the histogram.
There may also be some seasonality to consider. NHTSA shows a correlation between fatalities and monthly average temperature. April is a month that has school bus traffic as well as people getting on motorcycles and bicycles.
Could you change your graph to compare 4/20 to other days in April and May with similar temperatures?
Posted by: Dave C. | Apr 26, 2019 at 02:24 PM
Dave: Would recommend reading the more detailed report at Harper's site. He and Palayew tried two other sets of reference days: going two weeks instead of one week around the analysis day; and something they call "all other control days," which I think means the same day of the week in the other 51 weeks. They also compared analyses of April 20 and of July 4. The July 4 effect is much more solid. Doesn't directly address your question, but it's a similar type of analysis.
Posted by: Kaiser | Apr 27, 2019 at 09:01 PM