My friend Liliane pointed me to this HBR article about "machine learning." (link)
As a basic introduction, the article is pretty good. Here are 3 things you learned and 2 things you didn't learn from this article.
3 Things You Learned
- Machine learning is not the same as learning as you know it. At the end of step 5 ("Iterate"), the author proudly states "It has learned." But what does "learning" mean?
- Machine learning is an application of statistics. The author defines it this way: "machine learning applies statistical models to the data you have in order to make smart predictions about data you don’t have." The decision tree model predicts by estimating the probability of the target behavior within successively smaller subsets of the population. Each probability is the proportion of members of a given subset who exhibit the specified behavior, which means estimating it requires many similar members in each subset. Now, we can't have clones of Karl, but we can have lookalikes of Karl. This is where "big data" comes into play.
- Machines are taught to learn by human beings. The modelers frame the problem, determine the variables used in the model, express the variables in the scale suitable for the model, interpret the probabilities computed by the model, and make decisions based on those probabilities.
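To make the "proportion within a subset" idea concrete, here is a minimal sketch with entirely made-up subscriber records (the field names and values are hypothetical, not from the HBR article). One leaf of a decision tree is just a subset defined by a few rules, and its "probability" is a simple proportion:

```python
# Hypothetical subscriber data; "defected" is the target behavior.
subscribers = [
    {"tenure_years": 5, "autopay": True,  "defected": False},  # a "Karl lookalike"
    {"tenure_years": 4, "autopay": True,  "defected": False},
    {"tenure_years": 6, "autopay": True,  "defected": True},
    {"tenure_years": 1, "autopay": False, "defected": True},
    {"tenure_years": 2, "autopay": False, "defected": True},
    {"tenure_years": 1, "autopay": False, "defected": False},
]

# One leaf of the tree: long-tenured autopay customers -- Karl's lookalikes.
leaf = [s for s in subscribers if s["tenure_years"] >= 4 and s["autopay"]]

# The "machine-learned" probability is just the proportion who defected.
p_defect = sum(s["defected"] for s in leaf) / len(leaf)
print(f"Estimated defection probability for this subset: {p_defect:.0%}")
```

With only three lookalikes in the leaf, the estimate is crude; this is why the "big data" point matters — you need many lookalikes before the proportion is a trustworthy probability.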
2 Things You Didn't Learn
- The machine makes mistakes. As Nate Silver likes to say, the key to understanding data is probabilistic thinking, something completely absent from the decision tree description. Astute readers might wonder how the "machine" is able to make such precise decisions. In reality, the "machine" makes mistakes. Did Karl really remain a subscriber, and did Cathy really quit? Attached to each subset in the bottom layer of the decision tree is an estimated probability of defection. Cathy may belong to a subset with an estimated 30 percent chance of quitting - that might be three times the overall defection rate of 10 percent, and thus places Cathy in the high-likelihood-to-defect category. But you know what, 7 out of 10 customers in that subset are projected not to defect. And we haven't even gotten to talking about the uncertainty in this uncertainty.
- There is one key human decision that went unmentioned. The machine can go as far as placing any customer in one of those subsets at the bottom layer of the decision tree; each subset has an estimated probability of defection. So, from left to right, the probabilities might run 2%, 3%, 5%, 6%, 10%, etc. No machine can tell the business manager what the cutoff probability is for inclusion in the marketing campaign.
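These two points can be sketched together. Assume (hypothetically) that the tree's bottom layer produced the leaf probabilities mentioned above, plus Cathy's 30-percent leaf. The machine hands over the probabilities; a human still has to pick the cutoff, and a different cutoff targets a different set of customers:

```python
# Hypothetical leaf probabilities from the bottom layer of a decision tree.
leaf_probs = {"A": 0.02, "B": 0.03, "C": 0.05, "D": 0.06, "E": 0.10, "F": 0.30}

# Cathy sits in leaf F: flagged "high likelihood to defect," yet 7 of 10
# customers in that subset are still projected NOT to defect.
print(f"Leaf F: {leaf_probs['F']:.0%} defect, {1 - leaf_probs['F']:.0%} stay")

# The cutoff for inclusion in the marketing campaign is a human choice;
# no machine output dictates it.
for cutoff in (0.05, 0.10, 0.25):
    targeted = [leaf for leaf, p in leaf_probs.items() if p >= cutoff]
    print(f"cutoff {cutoff:.0%}: campaign targets subsets {targeted}")
```

The point of the sketch: nothing in `leaf_probs` tells the business manager whether 5%, 10%, or 25% is the right threshold — that depends on the cost of the campaign and the value of a retained customer, which live outside the model.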
I cover most of these points in my discussion of marketing data in Numbersense (link). A short HBR article perhaps doesn't have room for these deeper issues, but failing to appreciate them leads to failure to create value from these models.