I first wrote about the fake data menace here (back in 2017), and again in 2018. Earlier this year, I discussed an article in The Hustle about fake Amazon reviews, and made a video about it.
This holiday season, BuzzFeed finally noticed. Here's a nice article detailing the scams that abuse online review platforms. Amazon and other online retailers prefer to look the other way, for obvious reasons.
The reviewer featured in the BuzzFeed article has spent over $15,000 on "verified purchases" on Amazon - all of it merchandise she bought in order to earn commissions after posting 5-star reviews. She wrote hundreds of reviews, not a single one of them honest. She doesn't actually think she did anything wrong: "I'm just a pawn in their marketing scheme."
It's unfair to say Amazon and other retailers have no protections against fake reviews. The typical response to this problem is to set up community rules, and then look the other way. If a case is brought to their attention, they will enforce the rules - but only on that case. For example, Amazon bans sellers from giving away free products in exchange for reviews, but the ecommerce giant does not proactively seek out offenders. There are many Facebook groups in which sellers and fake reviewers find each other. The article even mentions a rebate website where people buy stuff from Amazon, wait for the return period to expire, and then collect a full "rebate". On a platform this vast, such reactive enforcement allows the vast majority of fake reviews to persist.
Now that the reviewer has seen how the sausage gets made, her attitude toward the data is different. She confessed that she does not look at reviews when buying for herself: "When I see an off-brand product that’s Amazon’s Choice, [that label] doesn’t mean anything to me anymore."
There is zero financial incentive to fight fake reviews, for the simple reason that they work. The sham reviews are amplified by data science products such as recommendation engines, which treat reviews as "data" - the heralded unstructured "big data" - and push products with large numbers of positive reviews (dubbed "relevant") to the top of search results, which are inevitably displayed in order of "relevance" by default. The top search results get clicked on the most, creating a positive feedback loop that keeps those products at the top. This is the "marketing scheme" mentioned above.
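To make this feedback loop concrete, here is a minimal sketch of a naive "relevance" ranker that rewards review volume and past clicks. The scoring formula, weights, and product data are invented for illustration; this is not Amazon's actual algorithm.

```python
# Minimal simulation of a naive "relevance" ranking with a click feedback loop.
# Scoring formula, weights, and catalog are invented for illustration only.
import math

products = [
    {"name": "honest_widget",  "stars": 4.2, "reviews": 35,  "clicks": 0},
    {"name": "boosted_widget", "stars": 4.9, "reviews": 800, "clicks": 0},  # padded with fake 5-star reviews
    {"name": "niche_widget",   "stars": 4.6, "reviews": 12,  "clicks": 0},
]

def relevance(p):
    # More positive reviews and more past clicks -> higher "relevance".
    return p["stars"] * math.log1p(p["reviews"]) + 0.5 * math.log1p(p["clicks"])

for _ in range(5):
    ranked = sorted(products, key=relevance, reverse=True)
    # Shoppers click mostly on the top result, only a little on the rest.
    for rank, p in enumerate(ranked):
        p["clicks"] += 100 if rank == 0 else 20
    print([p["name"] for p in ranked])
# The review-padded product starts on top, and its click lead only widens.
```

The point is not the particular formula: any ranker that treats review volume and clicks as signals of quality will reward whoever games those signals.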
***
It's hard to come up with a solution to this fake data menace. For ecommerce, the harm done by fake reviews is that shoppers are duped into buying things they don't need, or things that are not what they appear to be. Honest merchants and brands also lose market share. So the harm gets waved away with a "little white lies" justification. But if the reviews are for, say, medical services, then someone could be truly harmed by poor quality of care.
On the other side of the ledger, lots of people benefit from the scams: the online retailers and the sellers earn more revenue; the people who write fake reviews get free products and side gigs; friends of the fake-review writers get free handouts; suppliers of the sellers and retailers get more business; and so on. Even data scientists arguably benefit, as they have "more data" to work with.
Comment below if you have solutions!
Neither perfect nor complete (I'm only taking a buyer's perspective), but:
1) Manual solution: read the 1-, 2-, and 3-star reviews first;
2) Automatic solution: give each review a "genuineness" or "informativeness" score and compute a weighted mean rating accordingly.
For example: consider a review suspect if the reviewer's mean star rating over the last 12 months (or last 10 purchases, potentially weighted by price) is very high or very low with little or no variance. Do we really want to listen to an always-enthusiastic or always-whiny buyer? A fake reviewer would be caught quickly this way (see the sketch below).
If it's too much to hope that Amazon will take care of this, one could build a browser add-on to implement it.
In other words: try to fight bad data with data.
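A minimal sketch of the scoring idea, assuming we can see each reviewer's recent rating history; the thresholds and the down-weighting factor are made-up placeholders, not a validated method:

```python
# Rough sketch of the "genuineness" down-weighting idea described above.
# Thresholds (variance < 0.5, mean > 4.8 or < 1.5) and the 0.1 weight are
# assumptions for illustration, not a tested rule.
from statistics import mean, pvariance

def reviewer_weight(past_stars):
    """Down-weight reviewers whose recent ratings are extreme and near-constant."""
    if len(past_stars) < 3:
        return 1.0  # too little history to judge
    m, v = mean(past_stars), pvariance(past_stars)
    if v < 0.5 and (m > 4.8 or m < 1.5):
        return 0.1  # always-enthusiastic or always-whiny: mostly ignore
    return 1.0

def adjusted_rating(reviews):
    """reviews: list of (stars_for_this_product, reviewer_recent_star_history)."""
    num = sum(stars * reviewer_weight(history) for stars, history in reviews)
    den = sum(reviewer_weight(history) for _, history in reviews)
    return num / den if den else None

# Example: two suspiciously uniform 5-star reviewers vs. one mixed-history reviewer.
reviews = [
    (5, [5, 5, 5, 5, 5, 5]),
    (5, [5, 5, 5, 5, 5]),
    (3, [4, 2, 5, 3, 4]),
]
print(adjusted_rating(reviews))  # ~3.3, pulled toward the credible reviewer's 3 stars
```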
Posted by: Antonio Rinaldi | 12/05/2019 at 01:48 PM