In the second half of 2018, in the aftermath of the Cambridge Analytica-Facebook story, the news media broke out of their stupor and started realizing that the world of the Web and mobile is filled with fake data. I've written about this problem quite a bit. For example, my 2017 piece on the birth of the "fake news business" is still worth your time.
I also urged data analysts to recognize the amount of fake data that we are analyzing while looking the other way. This post describes some real examples of fake data discovered by journalists. Even earlier, in 2015, I authored a piece for HBR discussing Augustine Fou's work on uncovering ad fraud - driven by automated bots.
New York magazine just noticed that a lot of Web data are fake. This is a nice piece discussing the current state - it has only gotten worse. Among the trove of lies and damned lies mentioned in this article are fake web traffic, fake websites, fake clicks, fake mouse movements, fake social network accounts, fake cookies, fake video views, fake time on site, fake subscribers, fake video viewers, fake AI assistants, fake Instagram influencers, fake sponsored content (i.e. ads that pretend to be news), fake people, ... One inexplicable omission is the fake product review, which is probably the most commonly encountered species.
The tech industry is driving a lot of this and has not taken proper action to contain the problem before it gets out of hand. The fake data problem will evolve into a trust problem; in the last half of 2018, we started to hear rumblings of discontent.
Every benchmark ever created has been manipulated, so it is not surprising that web-based ones are too. It is just that now all the advantages of technology can be applied to beating the system.
Posted by: Ken | 01/27/2019 at 03:34 AM