At the start of the year, The Atlantic published a very nice, long article about Netflix's movie recommendation algorithm. You may remember that this algorithm (internally known as Cinematch) received a $1 million makeover several years ago (the Netflix Prize), only for the prize-winning entry to be deemed too complex, and its incremental value too small, to be put into production.
The reporter, Alexis Madrigal, noticed that Netflix has shifted attention from the queue of recommended movies to providing (micro-)genres of movies you might be interested in. His article is a great example of powerful data journalism: he reverse-engineered the internal structure of Netflix's new algorithm by extracting all of the keywords ("About Horses", "Critically Acclaimed", "Visually Striking", to name a few), and then creating all sensible combinations of these keywords (e.g. "Critically Acclaimed, Visually Striking Movies About Horses"), producing the roughly 80,000 possible microgenres used by Netflix. (It's clear that Netflix management endorsed this exercise and article but it's not clear how much proactive support they provided.)
One of my favorite columnists, Felix Salmon, reacted negatively to the change in algorithms, titling his post "Netflix's Dumbed-Down Algorithm". He interpreted the change as foreshadowing the day when Netflix could no longer offer any movie a user places in his or her queue, because the third-party content providers have ratcheted licensing costs too high. It's a longstanding weakness in Netflix's streaming business model.
Felix lamented that the genre-driven recommendations would be far inferior to the original recommendations:
The original Netflix prediction algorithm — the one which guessed how much you’d like a movie based on your ratings of other movies — was an amazing piece of computer technology, precisely because it managed to find things you didn’t know that you’d love. More than once I would order a movie based on a high predicted rating...
The next generation of Netflix personalization, by contrast, ratchets the sophistication down a few dozen notches: at this point, it’s just saying “well, you watched one of these Period Pieces About Royalty Based on Real Life, here’s a bunch more”.
***
Felix is right on the business model but misses the mark on the analytics. As someone who builds predictive models, I had the opposite reaction when reading The Atlantic's piece. I thought Netflix's data engineers learned something from the Netflix Prize "fiasco".
The major change to the analytical approach is shifting from predicting whether you'd like a movie to predicting whether you'd watch a movie. This shift makes a lot of sense for Netflix as a business. It is sensible even from the user's perspective: who among us only ever watches good movies? (Even the movies we place in the queue ourselves can turn out to be bad.)
One big problem with the Netflix Prize was its singular focus on the RMSE metric, which roughly speaking measures the average error of the predicted ratings against actual ratings. The ratings data, though, is extremely skewed, making an average error criterion worse than misleading. By skew, I mean (a) a very small number of popular movies receives the majority of the ratings and (b) a small number of highly active users contribute the majority of movie ratings. Put differently, missing data is far and away the most important feature of the data.
Because of missing data, it is next to impossible to get good predictions for niche movies (with few ratings) or for users who do not actively feed signals into the algorithm. Improving RMSE by 10 percent does not mean every user's predictions improved by 10 percent. The improvement is likely concentrated in user-movie pairings for which there is sufficient data to work with. It would be enlightening if someone did an analysis of the performance of the winning algorithms by segments of users (based on the amount of prior data available for each).
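To see why an overall RMSE figure can hide this, here is a minimal sketch with made-up numbers (the segment sizes and error levels are my assumptions, not Netflix's data): when one segment contributes most of the ratings, the overall RMSE is pulled toward that segment, so an "improvement" can say little about the sparse segment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical illustration: heavy raters contribute far more ratings than
# light raters, and their predictions are more accurate because the model
# has more signal to work with.
n_heavy, n_light = 9000, 1000            # ratings per segment (skewed)
err_heavy = rng.normal(0, 0.8, n_heavy)  # smaller errors: dense signal
err_light = rng.normal(0, 1.5, n_light)  # larger errors: sparse signal

def rmse(errors):
    """Root mean squared error of prediction errors."""
    return np.sqrt(np.mean(errors ** 2))

overall = rmse(np.concatenate([err_heavy, err_light]))
print(f"heavy-rater RMSE: {rmse(err_heavy):.2f}")
print(f"light-rater RMSE: {rmse(err_light):.2f}")
print(f"overall RMSE:     {overall:.2f}")
```

The overall number sits much closer to the heavy-rater segment than to the light-rater one, which is why a 10 percent RMSE gain can leave sparse users' predictions essentially unchanged.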
Now, consider predicting what you'd watch next based on the viewing behavior of you (and other users). For every user and movie combination, the user either has or has not watched the movie. Just like that, the missing-data issue vanishes. What Felix sees as a "dumbing down" may in fact be a smartening up.
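The contrast can be sketched in a few lines (the matrix sizes and the 2 percent watch rate are invented for illustration): a ratings matrix is mostly missing by nature, while a watched/not-watched matrix is fully observed by construction.

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_movies = 1000, 500

# Hypothetical illustration: only ~2% of user-movie pairs are watched.
watched = rng.random((n_users, n_movies)) < 0.02

# Ratings exist only where the movie was watched AND rated; everywhere
# else the entry is missing (NaN).
ratings = np.full((n_users, n_movies), np.nan)
ratings[watched] = rng.integers(1, 6, watched.sum())  # 1-5 stars

rating_coverage = np.mean(~np.isnan(ratings))
watch_coverage = 1.0  # every pair is defined: watched (1) or not (0)

print(f"ratings matrix observed: {rating_coverage:.1%}")
print(f"watch matrix observed:   {watch_coverage:.0%}")
```

The ratings matrix is over 95 percent empty here, while the binary watch matrix has a defined value in every cell, which is the sense in which the missing-data problem vanishes.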
***
As I pointed out in Chapter 5 of Numbersense (in talking about Groupon's bid to personalize offers; link), every business faces a set of conflicting objectives when trying to "personalize" marketing to customers. I believe this shift shows that Netflix has found a well-balanced solution.
Can the user-seen pair setup that does not suffer from the missing-data problem be treated as a special case of the user-rating pair, but with seen movies given rating 1 and unseen movies given rating 0?
Posted by: Max Lin | 03/05/2014 at 08:17 AM
Max: In terms of running the algorithm, you can do as you said. What you'll notice is that the matrix is now complete as opposed to very sparse in the case of ratings.
Posted by: Kaiser | 03/06/2014 at 01:58 PM
But doesn't the user-seen pair also suffer from not distinguishing between user - 'not aware movie existed' and user - 'chose not to watch it'?
Posted by: Chris | 03/17/2014 at 03:20 PM
Chris: Awareness is a different concept. The proposed model is based on actual watching; aware-but-didn't-watch is grouped with not-aware. You can of course build a more complicated model if you so desire.
Posted by: Kaiser | 03/19/2014 at 01:47 AM