This is the continuation of the last post on Chapter 5 of John Adams's excellent book on Risk. In the previous post, I discussed the problems of measurement that undermine the mathematical theory of risk. This post focuses on the particularly compelling example of traffic black spots, which reveals some pitfalls in the statistical interpretation of risk data.
What Adams said:
Parts of the [road] network that have experienced a bad spell are defined as accident black spots. When they are "treated", the numbers of accidents usually go down - but they probably would have gone down anyway.
He goes on to show a table of data used to identify traffic black spots. By convention, you list all the sites by accident rate in descending order. The top sites with the most accidents are grouped together as black spots. In that table, 14 sites were identified as the worst, averaging 3.53 accidents in a "before" period. In the "after" period, their average accident rate dropped to 1.50.
Here's the catch: there was no treatment separating the "before" and "after" periods. The analysis is an "A/A test," which I have frequently advocated as a way for data analysts (or our clients) to get a feel for "uncertainty". So the halving of the accident rate has zilch to do with any safety measure. One plausible explanation for the sudden drop is the phenomenon of regression to the mean: if you select the worst sites in any given time window, then in the next window, those sites will likely experience below-average accident rates. (Vice versa: if you focus on the best sites in any given window, those sites will likely suffer above-average accident rates in the after period.)
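To see regression to the mean in action, here is a small simulation - a sketch using made-up assumptions, not Adams's data. I assume 2,614 hypothetical sites sharing an identical underlying accident rate of 0.26 per period, with "before" and "after" counts drawn independently. The sites that look worst in the "before" period revert to the overall average in the "after" period, even though nothing was treated:

```python
import math
import random

random.seed(42)

# Hypothetical setup: every site has the SAME true accident rate,
# chosen to match the aggregate average of 0.26 quoted from the table.
N_SITES = 2614
TRUE_RATE = 0.26
N_BLACK = 14

def poisson(lam):
    """Draw from a Poisson distribution (Knuth's algorithm, stdlib only)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

# Independent "before" and "after" accident counts at each site
before = [poisson(TRUE_RATE) for _ in range(N_SITES)]
after = [poisson(TRUE_RATE) for _ in range(N_SITES)]

# "Black spots": the 14 sites with the most accidents in the "before" period
ranked = sorted(range(N_SITES), key=lambda i: before[i], reverse=True)
black = ranked[:N_BLACK]

before_mean = sum(before[i] for i in black) / N_BLACK
after_mean = sum(after[i] for i in black) / N_BLACK
print(f"black-spot average, before: {before_mean:.2f}")
print(f"black-spot average, after:  {after_mean:.2f}")
```

Because every site has the same true rate in this sketch, the entire before/after drop at the "black spots" is regression to the mean. In real data, where underlying rates genuinely differ across sites, the regression would be partial rather than complete.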
If you read the table of data carefully, you will notice that the 2,000+ sites that were perfectly safe (zero accidents) in the "before" period subsequently suffered an increase in the accident rate in the "after" period. The average accident rate in the "after" period was 0.19, still low as an average, but multiplied across 2,000 sites, it can't be ignored. To see this even more clearly, in the "after" period, the black spots contributed 14 x 1.50 = 21 total accidents while the "no accident" sites contributed 2,000 x 0.19 = 380 total accidents, 18 times as many as the black spots!
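The arithmetic behind that comparison, using the group averages quoted from the table:

```python
# "After"-period totals implied by the table's group averages
black_spots_total = 14 * 1.50    # 14 black spots averaging 1.50 accidents each
safe_sites_total = 2000 * 0.19   # 2,000 zero-accident sites averaging 0.19 each

print(round(black_spots_total))                     # total at black spots
print(round(safe_sites_total))                      # total at "safe" sites
print(round(safe_sites_total / black_spots_total))  # ratio of the two totals
```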
The disease of the "Top X" analysis is an epidemic in our Big Data era; the OCCAM nature of these datasets worsens the disease (link). Pouring money into the "top" prospects is folly when one has not established a causal link between treatment and effect.
The "Top X" type of analysis is regrettably common in practice. You take a set of data and divide it into groups (called deciles, quartiles, etc., depending on how many groups). Then you compute some metric of "lift". For example, in the "before" period, across all 2,600+ sites, the average accident rate was 0.26. So if we predict that accidents will happen at the "black spots" (i.e. the worst class of sites), the lift is 3.53/0.26 = 13.5 times. That is to say, the black spots are predicted to see 13.5 times more accidents than the average site.
Unexpectedly, in the "after" period, this lift plunged to below 6 times. In other words, the predictive power is poor (when computed on data not even collected at the time of model-building). Given that no treatment was applied, if the predictive model captured the underlying dynamics, we should expect the lift to be close to 13 times again. (We're assuming no other known factor could have caused the accident rates to collapse during the "after" period - in fact, in aggregate, the total number of accidents slightly increased.)
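The lift numbers can be reproduced from the quoted averages. (As a simplification, I use the "before" overall rate of 0.26 as the baseline in both periods, since the aggregate rate moved only slightly.)

```python
# Averages quoted from Adams's table
overall_before = 0.26   # accidents per site across all 2,600+ sites, "before"
black_before = 3.53     # average at the 14 black spots, "before"
black_after = 1.50      # same 14 sites, "after"

lift_before = black_before / overall_before
lift_after = black_after / overall_before

print(f"lift before: {lift_before:.1f} times")   # ~13.6 times
print(f"lift after:  {lift_after:.1f} times")    # ~5.8 times
```

The gap between those two lift numbers, with no treatment applied, is the regression-to-the-mean effect that a naive before/after evaluation would mistake for the impact of a safety measure.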
Here is the more subtle error: usually, the "after" period comes after some safety measure is applied to the black spots (and not the other sites). The expected result is a drop in the accident rate at the black spots, taken as evidence that the safety measure caused the reduction in accidents. It is therefore very convenient for analysts (and the policymakers who hire them) to conclude that the drop from 13.5 times to 6 times proves the effectiveness of the safety measure. What the above analysis shows is that even in the absence of a safety measure, the accident rate could have dropped by half! So any safety measure would need to produce a massive effect over and above regression to the mean.
There are two reasons why analysts may fall prey to this type of bad analysis: one is confirmation bias - seeing in the data what you want or expect to see; the other is being what Nassim Taleb calls "fooled by randomness".
Adams goes on to discuss "accident migration". This is a second-order phenomenon: sites near treated black spots often see an increase in accidents. The hypothesis is that drivers take more care when passing through black spots; once the black spots are treated, drivers become more careless, and accidents spike in nearby areas. (Adams does point out that accident migration may not be a separate phenomenon from regression to the mean.)
Some related reading is the material on highway congestion in my book, Numbers Rule Your World (link). The problem of road congestion is slippery. If you successfully reduce the amount of traffic along one route, drivers will change their habits, returning that route to congestion - for example, those who previously avoided it by taking local roads will switch back.