Hits and misses 2
In the previous post, we discussed how charts need to address the key question posed by the data. In this case, the journalist was trying to show that police shots often go astray, and that whether a shot hits remains largely unpredictable even when the distance to the target is known.
Commenters asked to see a chart of hit rate versus distance. Because the data came to us in distance buckets, we do not have enough detail to take the analysis further. If one were to guess, the real curve would start at 100% accuracy at distance 0, fall sharply to a plateau in the 20-40% range at modest distances, and then drop again at large distances, decaying to zero.
Andrew Gelman has conducted this analysis for a similar problem, that of predicting accuracy of golf putts based on distance from the hole. Here are two key charts from his paper (joint with Deborah Nolan):
The left chart is the equivalent of our hit rate chart above, except that the golf data set is larger, which allows a curve to be fitted. The right chart shows the fitted curve, which is a "model" for the true relationship between accuracy and distance from the hole. The model fits the data well.
Gelman and Nolan didn't just find any best-fitting line through the data. They started out with a trigonometric model (shown on the right), in which the angle of the putt is a random variable. With this setup, they wrote down the formula for the probability that the putt will fall in, that is, the proportion of successes. The angle is assumed to follow a normal distribution whose standard deviation is an unknown parameter, and that standard deviation is estimated from the available data.
Of course, the human body is a bit harder to model than a hole in the ground, but this procedure could very well apply.
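To make the setup concrete, here is a minimal sketch of a model of this type. It is written under stated assumptions and is not a reproduction of the paper's code. It assumes the putt drops when the absolute angular error is smaller than arcsin((R - r) / x), where R is the hole radius, r the ball radius and x the distance, and that the angle is normal with mean zero and standard deviation sigma. The function names, the standard hole and ball dimensions, and the sigma value used in the loop are illustrative choices, not figures from this post.

```python
import math

# Assumed dimensions (standard golf equipment, converted to feet).
HOLE_RADIUS_FT = 4.25 / 2 / 12   # hole diameter 4.25 in
BALL_RADIUS_FT = 1.68 / 2 / 12   # ball diameter 1.68 in


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def putt_success_probability(distance_ft, sigma_rad):
    """Probability that a putt from distance_ft drops, assuming the
    angular error is Normal(0, sigma_rad) and the putt succeeds when
    |angle| < arcsin((R - r) / distance)."""
    clearance = HOLE_RADIUS_FT - BALL_RADIUS_FT
    if distance_ft <= clearance:
        return 1.0  # ball is already within the hole's margin
    threshold = math.asin(clearance / distance_ft)
    return 2.0 * normal_cdf(threshold / sigma_rad) - 1.0


# Illustrative sigma only; in practice it is estimated from the data.
for d in (2, 5, 10, 20):
    print(d, round(putt_success_probability(d, 0.03), 3))
```

The curve has a single free parameter, sigma, so fitting it amounts to choosing the sigma that best matches the observed success proportions at each distance.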
For more details, check out the paper (PDF). This example is also found in their book on teaching statistics.
Source: Gelman and Nolan, "A Probability Model for Golf Putting".
















