Here is a problem staring many digital/Web/social media analysts in the face today: what if you are told that the majority of the data you have been dutifully reporting, analyzing and (gasp!) modeling are fake data?
By fake data, I mean useless numbers that have no bearing on reality: visits to websites that never happened, clicks on ads by hired hands, clicks on ads by bots, clicks on ads that are buried layers deep and invisible to any human, video "views" that result from automatically playing clips, video "views" that last one second, ad reach (i.e. the number of people who have seen the ad) that exceeds Census counts, reviews planted by hired hands, etc.
None of the above is fictional. Each is the reality of the uncontrolled, unaudited, secretive, and increasingly complex and machine-driven world of digital advertising. All major players - Google, Facebook, Microsoft, ad networks like AppNexus and Mediamath - are implicated.
I raised the alarm two years ago in an article at Harvard Business Review, featuring the work of leading ad fraud researcher Dr. Augustine Fou. Recently, there has been a tidal wave of news reports about all kinds of ad fraud and fake data.
Here are a few selected links to get you started:
I have invited Dr. Fou to comment on this fast-developing situation in the Principal Analytics Prep Webinar on Wednesday night. Learn more about the Webinar and register for free here.
Most news items take the perspective of brand advertisers, who are belatedly waking up to the huge number of dollars wasted. But a big story is being missed: such waste was enabled by massive amounts of data that we now know are fake.
What about the zillions of reports, analyses and models created over the last 20 years by countless data "scientists" and analysts, in which the data from Google, Facebook, and myriad digital marketing vendors are taken at face value as accurate?
In fact, the digital advertising industry was built on the promise that it is more measurable, more accountable and more cost-effective. What Dr. Fou shows is that only basic statistics is needed to uncover such fraud.
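To give a flavor of what "basic statistics" can mean here, below is a minimal sketch in Python of two sanity checks drawn from the examples listed earlier: reported ad reach that exceeds the Census population, and video "views" that last about a second. This is not Dr. Fou's method; the field names, the two-second threshold and all of the numbers are made up for illustration.

```python
# Hypothetical campaign report rows; every figure is invented for illustration.
reports = [
    {"segment": "A", "reported_reach": 1_200_000, "census_population": 2_000_000},
    {"segment": "B", "reported_reach": 3_500_000, "census_population": 2_800_000},
    {"segment": "C", "reported_reach": 900_000, "census_population": 950_000},
]

# Hypothetical view durations in seconds for one video ad.
view_durations = [1, 1, 1, 30, 1, 45, 1, 1, 12, 1]


def flag_impossible_reach(rows):
    """Return segments whose reported reach exceeds the population available to be reached."""
    return [r["segment"] for r in rows if r["reported_reach"] > r["census_population"]]


def share_of_short_views(durations_sec, threshold_sec=2.0):
    """Return the fraction of views shorter than threshold_sec (an assumed cutoff)."""
    if not durations_sec:
        return 0.0
    short = sum(1 for d in durations_sec if d < threshold_sec)
    return short / len(durations_sec)


impossible = flag_impossible_reach(reports)
short_share = share_of_short_views(view_durations)

print("Segments with reach above Census count:", impossible)
print(f"Share of views under 2 seconds: {short_share:.0%}")
```

Neither check proves fraud on its own, but a reach figure larger than the population, or a view count dominated by one-second plays, is a signal that the numbers should not be taken at face value.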
Data cleaning is already a huge time sink without fake data. Now we have to wrestle with mountains of it. But that is the reality, and we have to rise to the challenge.