



It is even worse because most burglars commit more than one crime, so it might be 1 in 1,000 persons who are burglars. There is some good news: among people already convicted of burglary, a much higher proportion will be repeat offenders. Prediction models probably work well on that group.

One point is that these models have all the same problems as diagnostic testing in medicine, but no one in the data mining/science community seems to understand them. Some of the methods of evaluating performance are wrong and would make performance look better than it actually is.
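The diagnostic-testing analogy can be made concrete with a quick base-rate calculation. This is a minimal sketch with invented numbers (the 1-in-1,000 prevalence echoes the comment above; the sensitivity and specificity figures are assumptions for illustration):

```python
# Base-rate sketch: even a seemingly accurate model flags mostly innocent
# people when the condition is rare. All numbers are illustrative.
prevalence = 0.001    # assume 1 in 1,000 people is a burglar
sensitivity = 0.90    # assumed true positive rate
specificity = 0.95    # assumed true negative rate

population = 1_000_000
burglars = population * prevalence              # 1,000 burglars
non_burglars = population - burglars            # 999,000 everyone else

true_positives = burglars * sensitivity                 # 900 flagged, guilty
false_positives = non_burglars * (1 - specificity)      # 49,950 flagged, innocent

# Positive predictive value: of those flagged, what fraction are real?
ppv = true_positives / (true_positives + false_positives)
print(f"PPV: {ppv:.1%}")  # about 1.8% -- over 98% of flagged people are innocent
```

This is exactly the problem a screening test for a rare disease has; evaluation methods that report only sensitivity and specificity hide it.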


Burglaries != burglars.

This example ignores many simple issues in order to contrive a straw man. Foremost, the model is part of a decision process that includes setting thresholds for acting, and many other process steps that can mitigate or exacerbate the model results. The model does not "point the finger". Humans take the actions by using the scores to decide what to do.

Overlooking the burglaries issue, if we expect 770 events, then we could limit the action step to just the top 770 scores. Fewer people should mean fewer false positives, but also fewer true positives.

What we need, what all models need, is a good economic model of the errors to choose better thresholds.
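One way to sketch that idea: assign a cost to each error type and pick the score threshold that minimizes expected total cost. Everything below is an assumption for illustration; the population, the score distributions, and the dollar figures are invented, not taken from any real system:

```python
# Hedged sketch of an "economic model of errors" for threshold selection.
# Costs and score distributions are invented for illustration only.
import random

random.seed(0)

COST_FP = 100     # assumed cost of acting on an innocent person
COST_FN = 2_000   # assumed cost of missing a real offender

# Fake scored population: (score, is_offender). Offenders are given higher
# scores on average via a skewed beta distribution.
people = [(random.betavariate(2, 8), False) for _ in range(9_900)] + \
         [(random.betavariate(5, 3), True) for _ in range(100)]

def expected_cost(threshold):
    """Total cost of false positives and false negatives at this threshold."""
    fp = sum(1 for score, offender in people if score >= threshold and not offender)
    fn = sum(1 for score, offender in people if score < threshold and offender)
    return fp * COST_FP + fn * COST_FN

# Scan candidate thresholds and keep the cheapest one.
best = min((t / 100 for t in range(101)), key=expected_cost)
print(f"best threshold: {best:.2f}, expected cost: {expected_cost(best):,}")
```

The point is not the specific answer but the structure: once the relative costs of the two errors are stated explicitly, the threshold stops being an arbitrary modeling choice and becomes a policy decision that officials can debate.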

After that, the police and elected officials & lawmakers(!) can decide an appropriate follow-up to the scores. Maybe extra surveillance on some people, but not necessarily precautionary incarceration.

Maybe if the prediction was really good, then the burglar needs only to be diverted at the time the burglary is predicted? Like give him an education or have a Koch bro loan him $1000?


Chris: Thanks for the discussion. I am not disagreeing with your comments. My concern is on the difficulty/impossibility of measuring the actual impact of such algorithms. How many people were flagged? How many of these would not have been flagged by traditional methods? How many flagged by traditional methods would not be identified by the algorithm? How many were treated in what ways? How many were false positives? How many were false negatives? etc. If we can't measure the performance in real life, we can't assess its impact properly, and we can't improve these algorithms. And answering those questions I just listed takes a lot of effort and genius. In a follow-up post, I explored some ways of measuring them. Has any municipality or vendor published detailed statistics on the performance of these algorithms? Would be happy to look at them.


Thanks Kaiser. I commented here before reading Part 2, where you do answer some of my concerns. I work in health care on models of event prediction, so I don't have first-hand knowledge of these recidivism models, but health care has the same issue with counterfactual evaluations. We cannot measure how a person would have progressed without a treatment if they in fact received the treatment. We can only search for "twins" and evaluate their futures. Yes, we need to answer all those questions you raise in the data before implementation.

But I do think the solution is to look beyond the algorithm into the entire decision process that embeds the algorithm. There is a deeper moral question here as well, one well covered by ProPublica and various rebuttals on the COMPAS model. That question is "to what extent should we use historical data with known systematic bias to make these life altering decisions?"

Thanks for all the time you put into sharing these important questions. It's not easy to reach as wide an audience as you do.


Kaiser Fung. Business analytics and data visualization expert. Author and Speaker.