I've talked about "fake data" before. A lot of fake data come from people trying to game algorithms or skew metrics, and oftentimes, automated bots are involved. Attempts to obscure these tactics typically involve creating layers of complexity so it is not easy to connect the dots.
I come across suspicious data all the time, and it's not always clear what's going on or why. So I thought I'd feature some examples here and see if anyone can figure them out.
***
This account on Yelp caught my attention because this user apparently uploaded a photo of her desktop to one of the restaurant pages. She has uploaded a total of four photos, all of them unrelated to food. The four photos were uploaded to two New York City restaurants, while she indicated that she lives in San Francisco. She did not review either of those NYC restaurants, but she has written one review for a cafe in Long Island City. The review seems genuine (although it's hard to tell unless you've been to that cafe).
She has five friends. While she lives in San Francisco, these friends live in Manhattan, Brooklyn, Scottsdale and Oceanside. She has no friends in the Bay Area. None of these friends has ever written a single review or received any likes. However, each of these friends has 100-400 friends. It's not clear why one would be friends with someone on Yelp who has no reviews or likes.
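For readers who like to see such observations written down as checks, here is a minimal sketch of the signals described above as rule-based flags. Everything in it is an assumption for illustration: the `Account` and `Friend` classes, the field names, the `many_friends` threshold of 100 and the friend counts in the example are all hypothetical, and nothing here uses a real Yelp API or real Yelp data. The flags only mark an account as worth a closer look; they do not show that it is fake.

```python
from dataclasses import dataclass, field


@dataclass
class Friend:
    city: str
    review_count: int
    friend_count: int


@dataclass
class Account:
    home_city: str
    photos_food_related: list[bool]   # one entry per uploaded photo
    activity_cities: list[str]        # cities of the businesses photographed or reviewed
    friends: list[Friend] = field(default_factory=list)


def suspicion_flags(acct: Account, many_friends: int = 100) -> list[str]:
    """Return the names of the hypothetical red flags that an account trips."""
    flags = []
    # Photos were uploaded, but none of them shows food.
    if acct.photos_food_related and not any(acct.photos_food_related):
        flags.append("no_food_photos")
    # All recorded activity is outside the self-reported home city.
    if acct.activity_cities and acct.home_city not in acct.activity_cities:
        flags.append("no_activity_in_home_city")
    # No friend lives in the self-reported home city.
    if acct.friends and all(f.city != acct.home_city for f in acct.friends):
        flags.append("no_local_friends")
    # Every friend has zero reviews yet a large friend list.
    if acct.friends and all(
        f.review_count == 0 and f.friend_count >= many_friends
        for f in acct.friends
    ):
        flags.append("friends_inactive_but_well_connected")
    return flags


# Illustrative values loosely modeled on the description above.
# The friend counts are placeholders, not observed numbers.
example = Account(
    home_city="San Francisco",
    photos_food_related=[False, False, False, False],
    activity_cities=["New York City", "New York City", "Long Island City"],
    friends=[
        Friend("Manhattan", 0, 150),
        Friend("Brooklyn", 0, 300),
        Friend("Scottsdale", 0, 200),
        Friend("Oceanside", 0, 400),
    ],
)

print(suspicion_flags(example))
```

Each rule has an innocent explanation (someone who recently moved would trip the location checks, for example), so a sketch like this is a way to shortlist accounts for manual review, not a classifier.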
***
Is this account fake? If so, why was it created? How did those photos get uploaded? How did they get placed in those particular restaurants? Who are these friends? Are they fake as well? If the account is fake, was that review also fake? Is it possible to predict that the review is fake?
So many questions, and so hard to get answers. What do you think is going on?
Some of it doesn't seem odd at all. Let's say when Grace joined Yelp she was living in NY but then soon moved to San Francisco. It'd make sense that her review is in NY while her (current/last self-reported) location is elsewhere. Yelp doesn't report her historical location at the time she wrote the review. I'd also ask questions about what it means to be "friends" on Yelp. Is there an acceptance process, or could these other users, who might well be fake, follow Grace without her consent (like Instagram can work for public profiles) and thus be fake without her caring? Understanding the data generating process would help us understand how much "verification" of the presented data might have occurred (e.g., Grace having to accept these people as friends).
It seems more likely to me that Grace is real and wrote a real review, while the other users with large friend counts might be the fake accounts.
Posted by: Adam Schwartz | 08/21/2019 at 08:57 AM
Yelp has the option of adding friends from Facebook. I expect that means that some people try Yelp, add all their Facebook friends and never use Yelp again.
Posted by: Ken | 08/23/2019 at 03:11 AM
Ah, see, I didn't know that @Ken. Good insight! So having lots of friends on Facebook may be relatively meaningless on Yelp even if technically the large counts show up there. Though fake accounts may well exist on Yelp, perhaps the simpler explanation should dominate here. I do like the overall point I think Kaiser is getting at, though: how much work would an analyst have to do to believe they have "good" data if every user is a research project like this one? It'd be exhausting.
Posted by: Adam Schwartz | 08/27/2019 at 09:50 AM
Does this mean you can add a Facebook friend without the friend agreeing to be a Yelp friend? If the friend doesn't have a Yelp account, does Yelp set up an account automatically?
Posted by: Kaiser | 08/27/2019 at 10:29 AM
It's a good question. Would have to experiment with linking Yelp to my Facebook account, but I generally try to avoid linking anything to Facebook explicitly. Why make it easier on them?
Posted by: Adam Schwartz | 08/29/2019 at 10:08 AM
AS: That's one of my tactics too. Yes, they probably can link you but why make it easier?
Posted by: Kaiser | 08/29/2019 at 10:59 AM
Kaiser, fairly sure I learned it from you. :)
Posted by: Adam Schwartz | 09/04/2019 at 01:47 PM