I have a longer article on the sister blog about the research design of a study claiming 420 "cannabis" Day caused more road accident fatalities (link). The blog also has a discussion of the graphics used to present the analysis, which I'm excerpting here for dataviz fans.
The original chart looks like this:
The question being asked is whether April 20 is a special day when viewed against the backdrop of every day of the year. The answer is pretty clear. From this chart, the reader can see:
- that April 20 is part of the background "noise". It's not standing out from the pack;
- that there are other days like July 4, Labor Day, Christmas, etc. that stand out more than April 20
It doesn't even matter what the vertical axis is measuring. The visual elements did their job.
If you look closely, you can even assess the "magnitude" of the evidence, not just the "direction." While April 20 isn't special, it nonetheless is somewhat noteworthy. The vertical line associated with April 20 sits on the positive side of the range of possibilities, and appears to sit above most other days.
The chart form shown above is better at conveying the direction of the evidence than its strength. If the strength of the evidence is required, we use a different chart form.
I produced the following histogram, using the same data:
The histogram is produced by first locating the midpoints# of the vertical lines into buckets, and then counting the number of days that fall into each bucket. (# Strictly speaking, I use the point estimates.)
The midpoints# are estimates of the fatal crash ratio, which is defined as the excess crash fatalities reported on the "analysis day" relative to the "reference days," which are situated one week before and one week after the analysis day. So April 20 is compared to April 13 and 27. Therefore, a ratio of 1 indicates no excess fatalities on the analysis day. And the further the ratio is above 1, the more special is the analysis day.
If we were to pick a random day from the histogram above, we will likely land somewhere in the middle, which is to say, a day of the year in which no excess car crashes fatalities could be confirmed in the data.
As shown above, the ratio for April 20 (about 1.12) is located on the right tail, and at roughly the 94th percentile, meaning that there were 6 percent of analysis days in which the ratios would have been more extreme.
This is in line with our reading above, that April 20 is noteworthy but not extraordinary.
P.S. [4/27/2019] Replaced the first chart with a newer version from Harper's site. The newer version contains the point estimates inside the vertical lines, which are used to generate the histogram.