There is hope in our business journalism! That's the message I'm getting as I delight in the current stream of articles that take a critical look at the inherent unaccountability of technology giants. The underlying problems have been around for years, even decades, but better late than never.
- Uber drivers gaming the pricing algorithm (Wall Street Journal)
- Google's Android tracks you everywhere whether you allow it or not (Bloomberg)
- Finally, some coverage of the shame of IBM's marketing (Wall Street Journal)
As blog readers know, most companies in the Web/mobile ecosystem use an advertising-supported model. Even the most popular products, like YouTube, Google Maps, and Facebook, have not found it possible to charge users anything, even though in theory users derive value from them. Since the beginning, digital advertising has been heralded as inherently more accountable because it was supposedly more measurable. Just look at the mountains of data generated by the very act of putting up ads on a website. You can even trace a user's final action, say a purchase, back to the various interactions that preceded it. (This data is called the "clickstream".)
The marketing of digital marketing centers on its "efficiency," achieved by automating the process, taking human beings out of the equation, and trusting the black-box machines. This system is flawed even in its most theoretical form, assuming "complete data". The clickstream and other digital data have multiple problems, and assigning causality from them is highly suspect. I wrote about this years ago; see, for example, here.
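One reason causal credit is suspect: the same clickstream supports many equally "data-driven" ways of assigning credit for a sale. The minimal Python sketch below assumes a made-up set of purchase paths and applies three common textbook attribution rules (last-touch, first-touch, linear). The channel names, paths, and function names are hypothetical illustrations, not any ad platform's actual method.

```python
from collections import defaultdict

# Hypothetical clickstream: each list is the ordered sequence of ad
# touchpoints one buyer saw before purchasing. Invented for illustration.
PATHS = [
    ["display", "search", "email"],
    ["search", "email"],
    ["display", "display", "search"],
    ["email"],
]


def last_touch(path):
    """Give all credit for the purchase to the final touchpoint."""
    return {path[-1]: 1.0}


def first_touch(path):
    """Give all credit for the purchase to the first touchpoint."""
    return {path[0]: 1.0}


def linear(path):
    """Split the credit equally across every touchpoint in the path."""
    share = 1.0 / len(path)
    credit = defaultdict(float)
    for channel in path:
        credit[channel] += share
    return dict(credit)


def attribute(paths, rule):
    """Total credit each channel receives under one attribution rule."""
    totals = defaultdict(float)
    for path in paths:
        for channel, credit in rule(path).items():
            totals[channel] += credit
    return dict(totals)


if __name__ == "__main__":
    for rule in (last_touch, first_touch, linear):
        print(rule.__name__, attribute(PATHS, rule))
```

On these four invented paths, last-touch gives "email" three of the four sales and "display" none, while first-touch makes "display" the top channel. The data never changed; only the assumption about causality did.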
The opacity of the black-box machines offers not just efficiency but also shelter for fraud. Any kind of verification, quality checking, or common-sense review slows the system down. The more efficient the system, the easier and faster it is to commit fraud!
What kind of fraud are we talking about? Fake everything! Fake clicks, fake impressions, fake ads, fake viewers, fake plays, fake followers, fake accounts, fake likes, fake webpages, you name it. Since advertisers pay for clicks, impressions, ads, viewers, plays, followers, accounts, etc., and most advertisers have neither the incentive nor the nerve to look under the hood, they are paying real dollars for all the fakes.
The much-lauded efficiency of the black-box advertising model sits on a pile of fake data. The system is seriously flawed even assuming complete data; it is worse once we admit that not only do we lack complete data, but we are drowning in fake data.
Now, data scientists are hard at work building even more black-box machines that promise to suss out the fake stuff. To understand how this might work, think about spam emails, an analogous problem. In the machine-learning world, spam detection is regarded as a "solved" problem, and spam-filtering algorithms have existed for decades. Yet think about how many spam emails still sit in your inbox, and how many legitimate emails land in your spam folder, and you can begin to see why technical solutions are not potent enough to drive out fraudsters.
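A bit of arithmetic shows why even an excellent filter leaves mistakes in both directions. The Python sketch below computes expected counts for a filter with given error rates. The function name and every number in the example (10,000 emails, half of them spam, 99% catch rate, 99% pass rate for legitimate mail) are hypothetical assumptions chosen for illustration, not measurements of any real filter.

```python
def filter_outcomes(n_emails, spam_rate, sensitivity, specificity):
    """Expected counts for a spam filter with fixed error rates.

    sensitivity: fraction of spam the filter catches.
    specificity: fraction of legitimate mail the filter lets through.
    """
    spam = n_emails * spam_rate
    legit = n_emails - spam
    return {
        "spam_caught": spam * sensitivity,
        "spam_in_inbox": spam * (1 - sensitivity),           # misses
        "legit_in_spam_folder": legit * (1 - specificity),   # false alarms
        "legit_in_inbox": legit * specificity,
    }


# Hypothetical inputs: a "99% accurate" filter on 10,000 emails, half spam.
outcomes = filter_outcomes(10_000, 0.5, 0.99, 0.99)
for label, count in outcomes.items():
    print(f"{label}: {count:.0f}")
```

Under these assumptions, roughly 50 spam messages still reach the inbox and roughly 50 legitimate messages end up in the spam folder. Unlike this static model, real fraudsters also adapt to whatever the filter currently catches.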
Below are some comments on selected parts of the latest New York Times article on fake YouTube views (here).
- YouTube stars can make millions from advertisers by publishing viral videos. Google, which owns YouTube, makes tons of money by having people upload their content to the platform and charging rent for it. Advertisers ultimately pay for those views. YouTube stars, and by extension Google, can make lots of money from fake views, i.e. bots (software) pretending to play the video. Third parties approach wannabes to sell them fake views and take a cut of the action.
- It's a cat-and-mouse game that the fraudsters are winning. They have the upper hand because products are often designed without anticipating nefarious players gaming the system. Adding software to deal preemptively with such players slows down tech startups trying to build an initial market. Preventing bad players from acting badly also imposes restrictions on good players (an externality), which is another disincentive to tackle the problem.
- YouTube officially claims that "fake views represent just a tiny fraction of the total" (later defined as 1 percent). This is a guess masquerading as fact. Knowing what fraction of YouTube views is fake requires knowing whether each view is fake. In theory, one can review a sample of views and then infer the fraction in the entire catalog from that sample. In practice, it is hard to prove that a view is fake just from the data collected about the viewer's browser, operating system, and so on. A machine-learning algorithm will "predict" whether a view is fake, but such predictions should not be used as "ground truth".
- Even legitimate organizations buy views, including marketing agencies, news organizations and musicians. As a former employee said, "View count manipulation will be a problem as long as views and the popularity they signal are the currency of YouTube." It's not just YouTube: we know that when teachers are compensated based on student test scores, some become complicit in their students' cheating.
- This underground economy is quite well developed now, with wholesalers and resellers of fake views. If you are a reseller, you don't even need to know how to generate fake views. You just buy fakes from the wholesalers and charge a markup!
- The sellers of fake views are so sure they can evade YouTube's detection technologies that they openly discuss their clients, business models, etc. with the press. Either they know those technologies don't work, or they know that YouTube isn't serious about cracking down.
- Several buyers interviewed by the Times said they truly believed the fake-view sellers' marketing spin that the sellers were driving real users to watch their videos. This is a tad disingenuous. As one writer who purchased views pointed out, those purchased views led to no sales, and she couldn't get any data about who viewed the videos.
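To make the sampling point concrete, here is a minimal Python sketch under two stated assumptions. First, a random sample of views has been labeled fake or real by careful review (true ground truth), and a Wilson score interval, a standard choice I am swapping in, describes the uncertainty about the catalog-wide fraction. Second, if the labels come instead from a classifier with hypothetical error rates, the share of views it flags differs from the true share of fakes. All function names and numbers are illustrative; nothing here describes YouTube's actual procedure.

```python
import math


def wilson_interval(fake_count, sample_size, z=1.96):
    """Approximate 95% Wilson score interval for the fraction of fake views,
    given `fake_count` confirmed fakes in a hand-reviewed random sample."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    p_hat = fake_count / sample_size
    denom = 1 + z**2 / sample_size
    centre = (p_hat + z**2 / (2 * sample_size)) / denom
    half = z * math.sqrt(
        p_hat * (1 - p_hat) / sample_size + z**2 / (4 * sample_size**2)
    ) / denom
    return centre - half, centre + half


def apparent_fake_rate(true_rate, sensitivity, specificity):
    """Share of views an imperfect classifier would FLAG as fake.

    Flags = fakes correctly caught + real views wrongly flagged.
    """
    return true_rate * sensitivity + (1 - true_rate) * (1 - specificity)


# Hypothetical: 10 confirmed fakes among 1,000 hand-reviewed views.
print(wilson_interval(10, 1_000))
# Hypothetical classifier: catches 90% of fakes, wrongly flags 2% of real views.
print(apparent_fake_rate(0.01, 0.90, 0.98))
```

With these made-up inputs, 10 fakes in 1,000 reviewed views is consistent with a true fraction anywhere from about 0.5% to about 1.8%. A classifier that catches 90% of fakes but wrongly flags 2% of real views would flag about 2.9% of views when only 1% are fake, which is why its predictions are no substitute for ground truth.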
It's actually very easy to demonstrate fraud here because the fake views are barely hidden. The article came with a great graphic, which the text does not appear to address directly. So let me explain.
The reporter purchased fake views from a vendor called "BuyYoutubViews". The top chart shows that close to 100% of the purchased views ("in campaign") played the entire video, which is over 5 minutes long. The bottom chart shows that when the "campaign" was turned off, i.e. when the reporter stopped buying fake views, almost no one played the video for more than a few seconds. (What is not shown is that the number of views probably plunged after the fake-view campaign was shut down.)
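The chart amounts to a simple check anyone with watch-time data could run: compare the share of views that finish the video during a paid campaign against an organic baseline. The Python sketch below does this. The function names, the 95% "finished" threshold, the 0.5 gap cutoff, and all watch times are invented for illustration; they are not the Times' data or any platform's detection rule.

```python
VIDEO_SECONDS = 320  # hypothetical video just over 5 minutes long

# Made-up watch times (seconds), loosely mimicking the two charts' shapes.
CAMPAIGN_WATCH = [320] * 98 + [40, 12]           # nearly every view finishes
ORGANIC_WATCH = [5, 8, 3, 11, 320, 6, 4, 9, 2, 7]  # most viewers leave fast


def completion_share(watch_seconds, video_seconds, threshold=0.95):
    """Fraction of views that watched at least `threshold` of the video."""
    if not watch_seconds:
        return 0.0
    finished = sum(1 for s in watch_seconds if s >= threshold * video_seconds)
    return finished / len(watch_seconds)


def looks_suspicious(campaign_watch, organic_watch, video_seconds, gap=0.5):
    """Flag a campaign whose completion share exceeds the organic
    baseline by more than `gap` (an arbitrary, hypothetical cutoff)."""
    campaign = completion_share(campaign_watch, video_seconds)
    organic = completion_share(organic_watch, video_seconds)
    return campaign - organic > gap


print(completion_share(CAMPAIGN_WATCH, VIDEO_SECONDS))
print(completion_share(ORGANIC_WATCH, VIDEO_SECONDS))
print(looks_suspicious(CAMPAIGN_WATCH, ORGANIC_WATCH, VIDEO_SECONDS))
```

With these invented numbers, 98% of campaign views finish the video versus 10% of organic views, so the campaign is flagged. Real viewers rarely sit through an entire video in lockstep.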
Raise your hand if you think "BuyYoutubViews" is finding real people to watch the video!