This is Part 3 of a multi-part post on student surveillance technologies. You may want to read Part 1 and Part 2 first.
In the Washington Post article on student surveillance, one of the featured use cases is predictive analytics. In short, the location data are merged with other data to “score” students. Scoring is a form of rating by means of a formula.
The most famous example of predictive scoring is the ubiquitous and consequential credit scoring system. The credit score is a prediction of whether an individual is likely to repay the credit card company which is lending him or her money to purchase something today (without any collateral). I consider it a fundamental example of big data analytics, and have an extensive discussion of credit scoring in Chapter 2 of Numbers Rule Your World (link).
[Image credit: Frankieleon on Flickr]
***
A misunderstanding of predictive analytics is flooding the marketplace with defective products that harm people. I'm quite concerned by the language the business folks in the Post article use to sing the praises of predicting student well-being. These statements reveal unfounded optimism about the accuracy of predictive analytics, and a shallow grasp of cause-effect relationships.
One vendor presented how their predictive model works: “a student avoiding the cafeteria might suffer from food insecurity or an eating disorder”. This kind of overly simplistic analysis is giving data science a bad name.
In reality, only a tiny minority of the people who avoid the cafeteria will be suffering from an eating disorder. Most scoring algorithms work with not one but multiple predictive factors, and even then, such an app will trigger many false alarms.
Even more shocking is the deduction that “a student skipping class might be grievously depressed”. It’s not the lack of modeling skill that concerns me here; it is the lack of common sense. For each student who skips class because of “grievous” depression, there are tons of non-depressed students who skip class. The insertion of “grievous” makes it even worse.
Behind any such deduction is a cause-effect model. It should be clear that Skip Class -> Grievously Depressed does not fly.
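This may be the fallacy of reversing a conditional. Grievously Depressed -> Skip Class is likely to hold. But reversing the causal order does not work: knowing that most depressed students skip class says little about how many class-skippers are depressed.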
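To put rough numbers on why the reversed direction fails, here is a minimal sketch in Python that applies Bayes' rule. The three inputs (the share of students who are grievously depressed, and the chance of skipping class with and without depression) are hypothetical values picked purely for illustration; they do not come from the Post article or from any vendor's model.

```python
def share_of_skippers_who_are_depressed(base_rate, p_skip_if_depressed, p_skip_if_not):
    """Bayes' rule: P(depressed | skips class)."""
    skippers_depressed = base_rate * p_skip_if_depressed
    skippers_not_depressed = (1 - base_rate) * p_skip_if_not
    return skippers_depressed / (skippers_depressed + skippers_not_depressed)


# Hypothetical inputs, chosen only for illustration (not real data):
base_rate = 0.02            # share of students who are grievously depressed
p_skip_if_depressed = 0.90  # Grievously Depressed -> Skip Class mostly holds
p_skip_if_not = 0.30        # many non-depressed students skip class too

posterior = share_of_skippers_who_are_depressed(
    base_rate, p_skip_if_depressed, p_skip_if_not
)

print(f"P(skip class | depressed) = {p_skip_if_depressed:.0%}")
print(f"P(depressed | skip class) = {posterior:.1%}")
```

Under these made-up inputs, 90 percent of depressed students skip class, yet fewer than 6 percent of the students who skip class are depressed. An alert triggered by skipping class alone would be a false alarm in the vast majority of cases.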
***
Flawed data science products have consequences. Welfare checks are wasted on large numbers of healthy students, who rightfully demand to be treated as adults. Meanwhile, the truly grievously depressed students may be missed.
***
In Chapter 4 of Numbers Rule Your World (link), and Chapter 5 of Numbersense (link), I discuss ways to examine predictive models and to deal with their inaccuracy.