The New York Times recently featured a story about customer targeting. In particular, it describes how Target predicts which of its female customers may be pregnant. Pregnancy is considered a major life event during which customers may be more willing to shift their spending from one retailer to another.
I recommend reading the article to get a sense of what companies do with our data these days. Bear in mind it's written by a journalist who has a good but not firm grip on the details of statistical modeling.
In particular, I'd like to shed some light on the last two paragraphs of the article, which I reprint here:
On my way back to the hotel, I stopped at a Target to pick up some deodorant, then also bought some T-shirts and a fancy hair gel. On a whim, I threw in some pacifiers, to see how the computers would react. Besides, our baby is now 9 months old. You can’t have too many pacifiers.
When I paid, I didn’t receive any sudden deals on diapers or formula, to my slight disappointment. It made sense, though: I was shopping in a city I never previously visited, at 9:45 p.m. on a weeknight, buying a random assortment of items. I was using a corporate credit card, and besides the pacifiers, hadn’t purchased any of the things that a parent needs. It was clear to Target’s computers that I was on a business trip. Pole’s prediction calculator took one look at me, ran the numbers and decided to bide its time. Back home, the offers would eventually come. As Pole told me the last time we spoke: “Just wait. We’ll be sending you coupons for things you want before you even know you want them.”
Charles Duhigg, the author, does what many reporters do when writing about predictive models: he gives no sense that these models can make errors. The fact that Duhigg didn't receive "sudden deals on diapers or formula" at the Target store is interpreted as an instance of accurate prediction, namely that the computer figured out he was on a business trip. The offers he expects to receive later "back home" are also interpreted as an instance of accurate prediction.
Earlier in the piece, Duhigg discussed Target's decision to dilute its marketing materials aimed at women predicted to be pregnant by mixing in products unrelated to pregnancy. He portrayed the tactic as a way to cope with the remarkable accuracy of the predictive models. These models supposedly even tell unsuspecting dads that their daughters are pregnant before the daughters tell their parents!
It's unfortunate that the coverage of statistical modeling has been laced with such hype. I hate to burst the bubble, but most predictions made by such models are simply wrong. These models may work on average, but that doesn't mean individual predictions will be right. (In Chapter 4 of Numbers Rule Your World, I discuss why businesses may deliberately make certain types of errors if they want to maximize profits.)
What's more embarrassing: sending a brochure filled with pregnancy-related products to women who are not pregnant, or sending the same brochure to women who are indeed pregnant but are surprised that Target knows? You see, mixing in random products serves to hide the inaccuracy of the underlying predictions, that is, the false positives.
Here's a simple way to see this. Let's say 10 percent of female customers are pregnant at any given time. In order to correctly identify even 6 percent of all customers (that is, 60 percent of the pregnant ones), Target's model will have to flag, say, 12 percent of the women as pregnant. Right there, 6 percent of customers are flagged incorrectly, and that is before counting the 4 percent of pregnant women the model misses. Yet the model I just posited is incredibly accurate: the base rate of pregnancy is 10 percent, while the pregnancy rate among those targeted is 6/12 = 50 percent, five times higher.
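The arithmetic above can be checked with a short sketch. It takes the post's illustrative percentages, scaled to a hypothetical population of 1,000 female customers, and computes the standard confusion-matrix quantities: the hit rate among those targeted (often called precision), the share of pregnant customers found (recall), and the lift over the base rate. The function name `targeting_summary` and its parameters are made up for illustration; nothing here reflects Target's actual data or model.

```python
from fractions import Fraction


def targeting_summary(customers, pregnant, flagged, flagged_and_pregnant):
    """Confusion-matrix summary of a hypothetical pregnancy-targeting model.

    customers:            total female customers
    pregnant:             customers who are actually pregnant
    flagged:              customers the model predicts to be pregnant
    flagged_and_pregnant: flagged customers who really are pregnant
    """
    false_positives = flagged - flagged_and_pregnant    # targeted but not pregnant
    false_negatives = pregnant - flagged_and_pregnant   # pregnant but missed
    base_rate = Fraction(pregnant, customers)           # pregnancy rate overall
    hit_rate = Fraction(flagged_and_pregnant, flagged)  # pregnancy rate among targeted (precision)
    recall = Fraction(flagged_and_pregnant, pregnant)   # share of pregnant customers found
    lift = hit_rate / base_rate                         # improvement over random targeting
    return {
        "false_positives": false_positives,
        "false_negatives": false_negatives,
        "base_rate": base_rate,
        "hit_rate": hit_rate,
        "recall": recall,
        "lift": lift,
    }


# The post's made-up percentages (10%, 12%, 6%), scaled to 1,000 customers.
summary = targeting_summary(customers=1000, pregnant=100, flagged=120, flagged_and_pregnant=60)
for name, value in summary.items():
    print(f"{name:>16}: {float(value):g}")
```

With these inputs the hit rate is 50 percent against a 10 percent base rate, a fivefold lift, yet 60 of the 120 flagged customers are not pregnant. Exact fractions are used so that the rates are not blurred by floating-point rounding.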
Duhigg's experience at the store cannot be explained by the pregnancy prediction model, because his baby is already 9 months old. In the confusing sentences quoted above, he presumes that Target has several other predictive models running: one that predicts whether the customer is on a business trip, one that relates the purchase of pacifiers to buying formula or diapers, and one that predicts what the customer would buy at home by analyzing what the customer bought while traveling.
Chances are Target doesn't have all those models. Remember that Duhigg himself told us marketers have identified the second trimester of pregnancy as the moment to target young couples. The implication is that it is very difficult to change their buying habits once the kid is 9 months old.