This Wall Street Journal article about predictive modeling in the workplace is well worth reading. Your reaction may vary from excitement to stomach churning. (link)
As with most reporting on analytics, the author never addresses the issue of model errors. For example, the article opens with:
When looking for workers to staff its call centers, Xerox Corp. used to pay lots of attention to applicants who had done the job before. Then, a computer program told the printer and outsourcing company that experience doesn't matter.
Some of the fun may get squeezed out if we want a more balanced story; I get that. But surely someone should ask whether this "computer program" ever makes mistakes. Reading this style of reporting, you would think every prediction is correct.
The reporter does get into the creepiness and the legal murkiness of using such models, but those discussions hover over the unspoken assumption that the predictions issued by these computer algorithms using "big data" are infallible. It's one thing if the model excludes, say, women from being hired because every woman fails in that particular job; it's another thing if the model excludes women erroneously! (I can already see the "I took steroids but look, it didn't help" excuse.) Later in the article, we are told that a startup company has created the model of "an ideal call-center worker". That there could be an "ideal" at all reflects a level of over-confidence in models that borders on insanity. (And I'm someone who builds models for a living.)
As you read the article, bear these points in mind:
- In Chapter 2 of Numbers Rule Your World, I talk about models of correlations versus models of causation. In my view, when people's livelihoods are at stake, a correlational model is not good enough. All the models described in the article are correlation-based: the factors they use are things like commuting distance and social networks. There is a serious danger of "causation creep" here.
- An analogy: if you look at crime statistics, it is certainly true that African Americans are more likely to be criminals. If you build a crime prediction model, race will be a strong predictor even if it's not the only predictor. A lot of us are uncomfortable with this kind of racial profiling.
- A technical note: this type of model suffers from "rejection inference" (terminology from the credit-scoring world). I didn't find space to talk about rejection inference in Chapter 2, but I might as well say something here. The output of such a predictive model is used to make a hiring decision: the candidate is either hired or rejected. If the candidate is hired, the company can track his or her performance and determine, ex post, whether the hiring decision was good or bad. If the candidate is rejected, the company has no way of knowing whether the rejection was just or not. So when any company collects data to build a predictive hiring model, the dataset is already biased: it contains no outcomes for the candidates who were rejected. The toy simulation below illustrates the point.
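To make the bias concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the variables "experience" and "intangible" and all the numbers are hypothetical, not anything from Xerox's actual data or any real hiring model. The setup assumes the old hiring process selected on both experience and an intangible quality picked up in interviews; a model then trained only on the hires' records will conclude that experience barely matters, which is exactly the headline finding in the article.

```python
# Toy simulation of rejection inference / selection bias in hiring data.
# All variables and numbers are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

experience = rng.normal(size=n)    # observed applicant trait
intangible = rng.normal(size=n)    # unobserved trait (e.g., interview impression)
performance = experience + intangible + rng.normal(scale=0.5, size=n)

# The old hiring process valued experience AND picked up the intangible,
# so only applicants strong on the combination were ever hired -- and
# only their on-the-job performance was ever observed.
hired = (experience + intangible) > 1.0

def slope(x, y):
    # OLS slope of y on x
    return np.polyfit(x, y, 1)[0]

print("effect of experience, all applicants:",
      round(slope(experience, performance), 2))                  # ~1.0
print("effect of experience, hires only    :",
      round(slope(experience[hired], performance[hired]), 2))    # ~0.4
```

Within the hired pool, low-experience candidates got in only because they scored high on the intangible, so the two traits are negatively correlated among hires and the apparent effect of experience shrinks by more than half. A model built only on the hires' records would understate how much experience matters, and there is no data on the rejected candidates with which to correct it.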
That's kind of creepy. I'm not (entirely) doubting the ability of a survey to accurately predict which candidates are less likely to file disability claims; I'm just wondering on what basis the survey developers and business owners suggest that such claims are illegitimate. Some workers may feel enough loyalty to their companies that they are less likely to file disability claims, but that a company would rather hire "yes men" than pay out potentially legitimate claims for injuries its workers suffer on the job should be an ethical concern to the survey developers, the article's author, and WSJ's readers.
I question the degree to which we, as a society, privilege quantitative data and prediction models. We seem to accept all such products as given, a priori observations requiring no further dissection, investigation, or scrutiny. We're going to get ourselves into a lot of trouble soon, I'm afraid.
Posted by: Jordan G | 10/12/2012 at 12:01 PM
Jordan: I haven't read Nate Silver's book, but it seems he's bucking the trend by telling us why most predictions fail.
Posted by: Kaiser | 10/16/2012 at 12:16 AM
Surely it's self-defeating to create a model of the ideal candidate? Sure, you can create a list of ideals and compare applicants against it. But if the model ignores experience, you'll end up with no experienced people, because they're more expensive.
Presumably they have measured existing employees and run the statistical analysis to find the strongest predictors of quality. That then hinges on which factors were measured -- was race one of them? It also depends on the appropriateness of the quality measure.
If you measure the number of calls handled per hour, I can show you a room full of monkeys that will make your model ecstatic.
Posted by: Phil H | 10/19/2012 at 06:39 AM