Gillian Brockell at the Washington Post published a heartfelt essay pointing out the inhumanity of algorithmic (i.e. programmatic) advertising on social-media platforms like Facebook and Instagram (owned by Facebook). It's getting some attention in tech circles, which is a good thing. (Thanks to reader Antonio R. for tweeting it to me.)
Here's a tl;dr of her story: she works at the Post as a video editor, and is very active on social media, especially Facebook. Sadly, she recently suffered a stillbirth. She found that Facebook continued to bombard her with ads for pregnancy products, which kept bringing up sad memories. When she clicked on the "I don't want to see this ad" icon, the advertising did not stop but just switched gears, now assuming that she needed post-childbirth products.
She's social-media savvy enough to realize that the advertising platforms perfected by Facebook and Google are inhuman: driven by algorithms, and based on the silent, pervasive collection of personal details that are either shared publicly, shared within semi-private communities, or transacted privately. Advertisers rely on this trove of data to auto-pilot their advertising campaigns, most of which define the goal as ad clicks.
***
The tech industry's zeal to sell more servers, more processors, more boxes, and more tools has led to a level of over-confidence in the predictive algorithms that underlie Gillian's distressing experience. Such enthusiasm is matched by advertisers' hunger for more clicks and more sales, never mind the tenuous link between the two. As data scientists, it's easy to get caught up in the adulation and, to use Nassim Taleb's phrase, get fooled by randomness.
It is widely held that predictive algorithms are all-knowing, personalized, and omnipotent. The reality is not so clear-cut. Let's take a deeper look at how these algorithms work; from there, maybe the data-science community can come up with alterations that reduce the chance of such dissatisfaction.
***
ALL-KNOWING
Algorithms are not all-knowing, despite our tendency to believe they are. Each algorithm is driven by a fixed set of data inputs. In her article, Gillian suggested some possibilities: the tags on her Instagram posts, the contents of her Facebook posts or her friends' posts, her Google searches, the "metadata" on her Amazon wishlists, and so on.
All of those could be inputs to someone's algorithm, but each item has to be meticulously captured by code, and collecting every additional item requires more coding. How much is collected depends on the culture of the development team: some are extremely zealous, while others draw the line.
Every algorithm then weighs the importance of its different data inputs. These weights ultimately control the algorithm's actions, yet determining them is a guessing game. It's like deciding which factors should determine who makes a sports Hall of Fame: no two people agree, and neither do two algorithms.
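To make this concrete, here is a minimal sketch in Python, with made-up signals and weights, of how a weighted score might drive an ad-targeting decision. None of these names come from any real platform; they are invented for illustration.

```python
# Hypothetical sketch: an ad-targeting score as a weighted sum of signals.
# The signal names and weights are invented for illustration.

def targeting_score(signals, weights):
    """Weighted sum of whatever signals happen to be captured."""
    return sum(weights.get(k, 0.0) * v for k, v in signals.items())

# Two hypothetical users (1 = signal present, 0 = absent).
user1 = {"baby_hashtags": 1, "registry_clicks": 0, "hospital_checkin": 0}
user2 = {"baby_hashtags": 0, "registry_clicks": 0, "hospital_checkin": 1}

# Two teams guess different weights; neither choice is "correct".
weights_a = {"baby_hashtags": 0.6, "registry_clicks": 0.3, "hospital_checkin": 0.1}
weights_b = {"baby_hashtags": 0.1, "registry_clicks": 0.2, "hospital_checkin": 0.7}

print(targeting_score(user1, weights_a), targeting_score(user2, weights_a))  # 0.6 0.1
print(targeting_score(user1, weights_b), targeting_score(user2, weights_b))  # 0.1 0.7
```

Under the first set of weights, user1 is the better target; under the second, it's user2. Same data, different Hall of Fame criteria.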
***
PERSONALIZED
No algorithm can be truly "personalized," as in one-to-one personalized; one-to-one is incompatible with Big Data. If an algorithm tailors recommendations to you based only on what it knows about you, it will likely make lots of mistakes, because data available at the individual level is sparse and riddled with holes.
All algorithms leverage statistical averages. It's much easier to predict what movie the average teenager would watch than the preference of any particular teenager. Most algorithms work like this: if the teenager discloses all the movies s/he watched in the past, then that personal history forms the basis for the recommendations, which can be quite accurate; but for most people, that level of detail is not available, so the algorithm falls back on statistical averages, i.e. what the average teenager watches.
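Here is a sketch of that fallback logic, with invented viewing data: use the personal history when it exists, otherwise serve up the population favorite.

```python
from collections import Counter

# Invented population-level viewing counts ("the average teenager").
population_views = Counter({"Action Blockbuster": 900,
                            "Teen Comedy": 700,
                            "Nature Documentary": 50})

def recommend(personal_history):
    """Personal history if available; otherwise the statistical average."""
    if personal_history:
        return Counter(personal_history).most_common(1)[0][0]
    return population_views.most_common(1)[0][0]

print(recommend(["Nature Documentary", "Nature Documentary", "Teen Comedy"]))
# -> Nature Documentary (tailored to the individual)
print(recommend([]))
# -> Action Blockbuster (the average, not the person)
```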
Even within individuals, taste changes over time. So even for the data-hoarding teen, the recommendation derives from a "moving" average of his or her past record, that is, the average over time of past viewings. Because of this, the longer the history, the less sensitive the algorithm is to recent data. The GPA is a good analogy: it's much harder to move your GPA in your senior year than in your sophomore year because of the accumulated grades.
(PS. This is a call for the industry to take up the issue of the right to be forgotten. Deleting old data not only pays respect to your users but also removes a source of error from these algorithms!)
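A back-of-the-envelope calculation shows how the GPA analogy works, and why deleting old data (as in the PS above) restores sensitivity to recent behavior. The course counts are made up.

```python
def gpa(grades):
    return sum(grades) / len(grades)

sophomore = [3.0] * 10   # 10 grades accumulated so far (hypothetical)
senior = [3.0] * 35      # 35 grades accumulated so far (hypothetical)

# The same new grade (a 4.0) moves the short record far more.
print(round(gpa(sophomore + [4.0]) - gpa(sophomore), 3))  # 0.091
print(round(gpa(senior + [4.0]) - gpa(senior), 3))        # 0.028

# Dropping the senior's oldest grades makes the average responsive again.
print(round(gpa(senior[-10:] + [4.0]) - gpa(senior[-10:]), 3))  # 0.091
```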
***
OMNIPOTENT
The most damaging myth about predictive algorithms is their supposed omnipotence. Almost every mainstream-media report about predictive algorithms makes assertions such as "Google can predict when you will die" and "IBM can predict who's a good employee." What do they mean by "can predict"? Readers are led to believe that "can predict" means "will predict," or even "will predict accurately."
In reality, there is no black or white, no can or cannot, no accurate or inaccurate. Accuracy matters, it is typically much lower than advertised, and it is not a binary state but a continuum.
Predictive algorithms have been around for a long, long time. Think weather forecasts. If I say, "the weather channel can predict the weather," you do not presume they will predict every day accurately. Some of us might even think, "well, they can barely predict the weather." Having the ability to issue forecasts does not mean the forecasts are accurate or useful.
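One common way forecasters score such predictions (not necessarily what any particular weather channel uses) is the Brier score, which grades a probabilistic forecast on a continuum rather than as right or wrong. A sketch with made-up numbers:

```python
# Brier score: mean squared gap between forecast probability and outcome.
# Lower is better; 0 is perfect, and always guessing 50% earns 0.25.
forecasts = [0.9, 0.2, 0.7, 0.1, 0.6]   # predicted chance of rain (invented)
outcomes  = [1,   0,   0,   0,   1]     # 1 = it actually rained

brier = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)
print(round(brier, 3))  # 0.142 -- a position on a scale, not a yes/no verdict
```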
***
There are several other important aspects of predictive algorithms, which I delve into in my two books. For example, accuracy is measured in several ways, and always involves a trade-off between false positives and false negatives. Algorithms also behave according to the incentives of their owners. The chapter on steroid testing, terrorist prediction, and polygraphs in NUMBERS RULE YOUR WORLD (link) tackles this.
Also, in NUMBERSENSE (link), the chapter on Target's pregnancy-prediction model walks through how one measures the accuracy of predictive models, and how incentives play a role. Nate Silver's The Signal and the Noise is also recommended: he has a section discussing the accuracy of weather forecasts, and how they are systematically biased (due to incentives).
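To see the false-positive / false-negative trade-off in miniature, consider this sketch with invented scores and labels: moving the decision threshold trades one kind of error for the other, and no threshold eliminates both.

```python
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]   # model scores (invented)
truth  = [1,    1,    0,    1,    0,    0]      # 1 = actual positive

def errors(threshold):
    """Count false positives and false negatives at a given cutoff."""
    fp = sum(1 for s, t in zip(scores, truth) if s >= threshold and t == 0)
    fn = sum(1 for s, t in zip(scores, truth) if s < threshold and t == 1)
    return fp, fn

print(errors(0.5))   # (1, 1): one false alarm, one miss
print(errors(0.25))  # (2, 0): no misses, but more false alarms
```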