There has been a barrage of negative publicity related to Uber recently. The latest salvo is a long article in the New York Times (link). This piece focuses on Uber's CEO, who was trained as a computer engineer, but my interest lies primarily in several revelations about how Uber collects and uses customer data.
The key episode, picked up by various outlets (e.g. TechCrunch, Wired), involves Uber "secretly identifying and tagging iPhones even after its app had been deleted and the devices erased." What Uber engineers did was against Apple's rules, and they knew it: they also implemented an elaborate cover-up operation. Apple eventually discovered the ruse, and Apple's CEO, Tim Cook, called Uber to task.
Much of the reporting on this episode, whether in the Times or Wired, misses the mark. These reporters seem to have received Uber's side of the story from its PR team, and printed it without asking the tough questions. Uber claimed that its rule-breaking code was intended to "prevent fraud," and suggested that it is a standard practice used by many other app developers.
If those assertions were true, then Apple would have summoned a lot of developers to Cupertino! Uber clearly went beyond what other app developers were doing.
Also, it is not at all clear how this alleged fraud detection scheme works. Here are the details offered by the Times:
At the time, Uber was dealing with widespread account fraud in places like China, where tricksters bought stolen iPhones that were erased and resold. Some Uber drivers there would then create dozens of fake email addresses to sign up for new Uber rider accounts attached to each phone, and request rides from those phones, which they would then accept. Since Uber was handing out incentives to drivers to take more rides, the drivers could earn more money this way.
To halt the activity, Uber engineers assigned a persistent identity to iPhones with a small piece of code, a practice called “fingerprinting.” Uber could then identify an iPhone and prevent itself from being fooled even after the device was erased of its contents.
As someone with an engineering degree, I don't understand what those words mean. First, it seems that erasing the device does not remove the Uber-added piece of code, which is perhaps a more uncomfortable question for Apple than for Uber. Second, am I to believe that there are no legitimate, refurbished iPhones in use? That last sentence makes it sound as if every iPhone that switched users is fraudulent. Third, there are many other ways to detect fraud - those fake user accounts have to be tied to credit cards, for example. Fourth, if a promotion encourages gaming and fraud to the extent that a major software development effort, including a cover-up operation, is needed to support it, maybe the promotion should be retired, no?
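For readers wondering what "fingerprinting" generally means here (the Times does not explain Uber's actual mechanism, and neither can I): a device fingerprint is typically a hash of hardware attributes that survive an erase, so the same phone always maps to the same identifier. A minimal conceptual sketch, with made-up attribute names:

```python
import hashlib

def device_fingerprint(attributes):
    # Combine stable hardware attributes into one persistent identifier.
    # The attribute names and values are illustrative, not Uber's actual inputs.
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

# The same hardware yields the same fingerprint, even after the phone is
# wiped and a new owner installs the app under a new account.
phone = {"model": "iPhone 6", "serial": "ABC123", "wifi_mac": "a4:5e:60:01:02:03"}
print(device_fingerprint(phone))
```

Note that nothing in this scheme distinguishes a fraudster's wiped phone from a legitimately refurbished one, which is exactly the objection above.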
Readers should definitely read between the lines here. The most noteworthy items weren't even mentioned.
First, Uber is after personally identifying data. If we believe that fraud detection was the original intent of such unruly fingerprinting, then Uber wants to know which customer is using a specific phone, which implies that the data being collected must identify individuals (as well as individual phones).
Second, while Uber claimed that the code was developed for fraud detection, nowhere did it deny that the collected data have been used for other purposes, such as marketing. Such data is extremely useful for acquiring new customers. It tells Uber which phones do not have their app installed. It is as if the first owner of the iPhone who installs the Uber app places every future owner of that iPhone onto Uber's prospect list.
The data is also extremely useful for "winback" marketing efforts. The fingerprint persists for every user who deletes the Uber app, since Uber has no way of knowing whether the iPhone will change hands at the moment the app is deleted. So everyone who deletes the app can be tracked and presumably sent winback communications.
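The marketing logic implied by the last two paragraphs is simple set arithmetic: any fingerprint the company has ever seen that lacks a current install is either a prospect (a new owner of the phone) or a winback target (a lapsed user). A toy sketch with made-up fingerprints:

```python
# Hypothetical data: fingerprints ever observed vs. those with an active install.
ever_seen = {"fp_a", "fp_b", "fp_c", "fp_d"}
currently_installed = {"fp_b", "fp_d"}

# Phones known to exist but without the app: prospect/winback list.
targets = ever_seen - currently_installed
print(sorted(targets))  # ['fp_a', 'fp_c']
```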
It is debatable whether consumers care about being tracked 24/7, and I don't pretend to speak for everyone. Uber is not the only company that has developed software to follow its customers' every move. What is clear, though, is that the engineers who write the code to execute these concepts - at Uber or elsewhere - believe that consumers do not want to be tracked 24/7. They choose to do it anyway.
We know they know we don't want to be tracked. That's because the tracking code is typically hidden from view, and the only disclosure is a general one buried in Privacy Notices or Terms and Conditions that everyone knows nobody reads. If no one cared about tracking, it could be performed in the open. Similarly, the cover-up operation to hide the illicit code from Apple engineers reveals that the engineers knew they were breaking the rules, and they did it anyway.
The Times article contained another revelation. Uber buys data from a startup called Slice Intelligence, which resells data from Unroll.me. Unroll.me runs a free service that helps people get rid of the clutter of spam in their email boxes - you grant the company permission to peek into your mailbox, and it pulls out the unsubscribe links from various email lists. Well, it turns out that this service is a front for corporate espionage. Once inside your mailbox, their code gathers data about your purchases and sells the data to companies: in this case, Uber buys data from Slice about its competitor, Lyft.
Here is one of the unspoken secrets of the "big data" industry. Unroll.me is one of many, many apps designed to collect data about our daily lives while posing as something else. I am pretty sure that the various receipt-scanning apps (for expense reporting) are doing the same. I was told that weather apps are location databases that track all their users everywhere.
Again, it appears that the founders, managers or engineers who work for these outfits assume that their customers do not want to be tracked in such a manner because all such operations are hidden from view, and any disclosure is usually buried inside legalese that almost no one ever reads.
Slice Intelligence is hiding behind the weasel word "anonymized," which it explains as "no customer names." Usually, the deniers say they do not provide any "personally identifiable information" (PII), which would include phone numbers, addresses, emails, customer ids, etc. If any of those items are attached, not providing names is meaningless.
It is highly likely that Slice customers want personally identifying data - that is the key to connecting the dots. Analysts want to match John Smith on dataset A to John Smith on dataset B. If one of these datasets is truly anonymized, it will be very challenging, if not impossible, to correlate the data.
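The point about connecting the dots can be made concrete. If two datasets share any stable identifier - an email address, say - then removing the name field changes nothing: a simple join re-attaches the identity. A toy sketch with invented records:

```python
# Hypothetical dataset A: purchase records with names stripped ("anonymized").
purchases = [
    {"email": "jsmith@example.com", "merchant": "Lyft", "amount": 23.50},
    {"email": "adoe@example.com", "merchant": "Lyft", "amount": 11.00},
]

# Hypothetical dataset B: a customer file that still carries names.
profiles = [
    {"email": "jsmith@example.com", "name": "John Smith", "city": "San Francisco"},
]

# Joining on the shared identifier restores the "removed" names.
by_email = {p["email"]: p for p in profiles}
linked = [{**row, **by_email[row["email"]]}
          for row in purchases if row["email"] in by_email]
print(linked[0]["name"], linked[0]["merchant"])  # John Smith Lyft
```

This is why "no customer names" is a meaningless promise as long as any other identifier rides along.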
The "cover" of anonymized data is archaic, and I am surprised it is still in use. It has been shown repeatedly that analysts can recover people's identities easily, even in datasets stripped of PII. Say you are a loyal Uber customer, and pretty much take Uber cars everywhere. If I am given all of your trip information - origins, destinations and times of travel - I can immediately figure out where you live and where you work. From there, I will likely be able to identify you. Then I can look at the other trips to build a profile of your preferences, by analyzing what stores you shop at, what restaurants you eat at, how much you tip, etc.
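The home/work inference described above is only a few lines of code. A toy sketch with invented trip records (origin, destination, hour of day): the most frequent early-morning origin is almost certainly home, and the matching destination is work.

```python
from collections import Counter

# Hypothetical "anonymized" trip log: (origin, destination, hour_of_day).
trips = [
    ("oak_st", "office_park", 8), ("office_park", "oak_st", 18),
    ("oak_st", "office_park", 9), ("office_park", "oak_st", 19),
    ("oak_st", "restaurant_a", 20), ("restaurant_a", "oak_st", 22),
]

# Morning trips reveal the commute: the common origin is home,
# the common destination is the workplace.
morning = [(o, d) for o, d, h in trips if h < 10]
home = Counter(o for o, d in morning).most_common(1)[0][0]
work = Counter(d for o, d in morning).most_common(1)[0][0]
print(home, work)  # oak_st office_park
```

With home and work pinned down, public records or a company directory usually finish the job of naming the rider, even though no PII ever appeared in the trip log.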
Data sleaze is the data about one's own customers that are obtained secretly by businesses, and then sold to the highest bidders, also in secret transactions. The production of data sleaze is frequently justified by giving services away for "free." However, running a business as a "free service" fronting a profitable espionage operation is a choice made by management teams, not an inevitability. Indeed, many businesses that have a proper revenue model also produce data sleaze.