There has been a barrage of negative publicity related to Uber recently. The latest salvo is a long article in the New York Times (link). This piece focuses on Uber's CEO, who was trained as a computer engineer, but my interest lies primarily in several revelations about how Uber collects and uses customer data.
The key episode picked up by various outlets (e.g. TechCrunch, Wired) involves Uber "secretly identifying and tagging iPhones even after its app had been deleted and the devices erased." What Uber's engineers did was against Apple's rules, and they knew it: they also implemented an elaborate cover-up operation to hide the code from Apple. Apple eventually discovered the ruse, and its CEO, Tim Cook, took Uber to task.
Much of the reporting on this episode, whether in the Times, Wired, or elsewhere, misses the mark. The reporters seem to have received Uber's side of the story from its PR team, and printed it without asking the tough questions. Uber claimed that its rule-breaking code was intended to "prevent fraud," and suggested that it is a standard practice used by many other app developers.
If those assertions were true, then Apple would have summoned a lot of developers to Cupertino! Uber clearly went beyond what other app developers were doing.
Also, it is not at all clear how this alleged fraud detection scheme works. Here are the details offered by the Times:
At the time, Uber was dealing with widespread account fraud in places like China, where tricksters bought stolen iPhones that were erased and resold. Some Uber drivers there would then create dozens of fake email addresses to sign up for new Uber rider accounts attached to each phone, and request rides from those phones, which they would then accept. Since Uber was handing out incentives to drivers to take more rides, the drivers could earn more money this way.
To halt the activity, Uber engineers assigned a persistent identity to iPhones with a small piece of code, a practice called “fingerprinting.” Uber could then identify an iPhone and prevent itself from being fooled even after the device was erased of its contents.
As someone with an engineering degree, I don't understand what those words mean. First, it seems that erasing the device does not remove the Uber-added piece of code, which is an uncomfortable question more for Apple than for Uber. Second, am I to believe that there are no legitimate, refurbished iPhones in use? That last sentence reads as if every iPhone that switched users is a fraud. Third, there are many other ways to detect fraud - those fake user accounts have to be tied to credit cards, for example. Fourth, if a promotion encourages gaming and fraud, to the extent that a major software development effort, including a cover-up operation, is needed to support it, maybe the promotion should be retired, no?
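For concreteness, here is one plausible reading of the Times description - a minimal sketch, assuming the "fingerprint" is derived from some hardware identifier that survives an erase and is stored on Uber's servers. The names and the identifier choice are my assumptions, not disclosed facts.

```python
import hashlib

# Hypothetical server-side store: fingerprint -> account history.
# Because the underlying hardware identifier survives a factory reset,
# this lookup table effectively does too.
fingerprint_db = {}

def fingerprint(hardware_id: str) -> str:
    """Derive a stable fingerprint from a hardware identifier (assumed)."""
    return hashlib.sha256(hardware_id.encode()).hexdigest()

def allow_signup(hardware_id: str, email: str) -> bool:
    """Refuse new accounts on devices previously flagged for fraud."""
    fp = fingerprint(hardware_id)
    record = fingerprint_db.setdefault(fp, {"emails": [], "flagged": False})
    if record["flagged"]:
        return False
    record["emails"].append(email)
    return True
```

Note that nothing in this logic can tell a scammer's wiped phone apart from a legitimately refurbished one - which is exactly the second objection above.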
***
Readers should definitely read between the lines here. The most noteworthy items weren't even mentioned.
First, Uber is after personally identifying data. If we believe that fraud detection was the original intent of such unruly fingerprinting, then Uber wants to know which customer is using a specific phone, which implies that the data being collected must identify individuals (as well as individual phones).
Second, while Uber claimed that the code was developed for fraud detection, nowhere did it deny that the collected data have been used for other purposes, such as marketing. Such data is extremely useful for acquiring new customers. It tells Uber which phones do not have their app installed. It is as if the first owner of the iPhone who installs the Uber app places every future owner of that iPhone onto Uber's prospect list.
The data is also extremely useful for "winback" marketing efforts. The fingerprint persists for every user who deletes the Uber app, as Uber has no way of knowing whether the iPhone will change hands at the time the app is deleted. So everyone who deletes the app can be tracked, and presumably sent winback communications.
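To see why a persistent fingerprint is such a marketing asset, consider a toy version of the two lists just described. The names and values are hypothetical; the point is that both lists fall out of simple set arithmetic.

```python
# Hypothetical data: every device ever fingerprinted, devices with the
# app installed today, and devices ever tied to a rider account.
all_fingerprints = {"fp1", "fp2", "fp3", "fp4", "fp5"}
active_installs  = {"fp1", "fp3"}
ever_had_account = {"fp1", "fp2", "fp3"}

# Tagged devices with no app today: prospects (including resold phones).
prospects = all_fingerprints - active_installs   # {'fp2', 'fp4', 'fp5'}

# Devices that had an account but no longer have the app: winback targets.
winback = ever_had_account - active_installs     # {'fp2'}
```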
***
It is debatable whether consumers care about being tracked 24/7, and I don't pretend to speak for everyone. Uber is not the only company that has developed software to follow its customers' every move. What is clear, though, is that the engineers who write the code to execute these concepts - at Uber or elsewhere - believe that consumers do not want to be tracked 24/7. They choose to do it anyway.
We know they know we don't want to be tracked. That's because the tracking code is typically hidden from view, and the only disclosure is a general one buried in Privacy Notices or Terms and Conditions that everyone knows nobody reads. If no one cared about tracking, it could be performed in the open. Similarly, the cover-up operation to hide the illicit code from Apple's engineers reveals that Uber's engineers knew they were breaking the rules, and did it anyway.
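According to the Times, that cover-up took the form of "geofencing" Apple's Cupertino headquarters: within the fenced area, the fingerprinting behavior was hidden so Apple's own employees would not see it. A minimal sketch of the idea, using a crude bounding box with approximate coordinates; the actual implementation has not been made public.

```python
# Approximate bounding box around Cupertino: (lat_min, lat_max, lon_min, lon_max).
CUPERTINO_BOX = (37.29, 37.34, -122.07, -121.99)

def inside_geofence(lat: float, lon: float) -> bool:
    lat_min, lat_max, lon_min, lon_max = CUPERTINO_BOX
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max

def maybe_collect_fingerprint(lat: float, lon: float, hardware_id: str):
    # Behave innocently for anyone inside the fence - i.e. Apple
    # employees reviewing the app from headquarters.
    if inside_geofence(lat, lon):
        return None
    return hardware_id  # would be hashed and sent home in the real scheme
```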
The Times article contained another revelation. Uber buys data from a startup called Slice Intelligence, which resells data from Unroll.me. Unroll.me runs a free service that helps people clear the clutter of spam from their email boxes - you grant the company permission to peek into your mailbox, and it pulls out the unsubscribe links from various email lists. Well, it turns out that this service is a front for corporate espionage. Once inside your mailbox, its code gathers data about your purchases and sells the data to companies: in this case, Uber buys data from Slice about its competitor, Lyft.
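To make the "front" concrete, here is a toy sketch of how a mailbox-cleaning service could harvest receipts in the same pass that finds unsubscribe links. The message format and regular expressions are hypothetical simplifications.

```python
import re

# Toy inbox: each message is a (sender, body) pair.
inbox = [
    ("deals@store.com", "Big sale! ... unsubscribe: https://store.com/unsub?id=77"),
    ("receipts@rides.com", "Your Tuesday ride receipt: total $14.50. Thanks!"),
]

unsubscribe_links = []   # the advertised service
harvested_receipts = []  # the unadvertised product

for sender, body in inbox:
    link = re.search(r"https?://\S*unsub\S*", body)
    if link:
        unsubscribe_links.append(link.group())  # surfaced to the user
    amount = re.search(r"total \$(\d+\.\d{2})", body)
    if amount:
        harvested_receipts.append((sender, float(amount.group(1))))  # sold on

print(unsubscribe_links)   # ['https://store.com/unsub?id=77']
print(harvested_receipts)  # [('receipts@rides.com', 14.5)]
```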
Here is actually one of the unspoken secrets of the "big data" industry. Unroll.me is one of many, many apps that are designed to collect data about our daily lives while posing as something else. I am pretty sure that the various receipt-scanning apps (for expense reporting) are doing the same. I was told that some weather apps are location databases that track all their users everywhere.
Again, it appears that the founders, managers and engineers at these outfits assume that their customers do not want to be tracked in such a manner: all such operations are hidden from view, and any disclosure is buried inside legalese that almost no one ever reads.
***
Slice Intelligence is hiding behind the weasel word "anonymized," which it explains as "no customer names." Usually, the deniers say they do not provide any "personally identifiable information" (PII), which would include phone numbers, addresses, emails, customer IDs, etc. If any of those items are attached, withholding names is meaningless.
It is highly likely that Slice customers want personally identifying data - that is the key to connecting the dots. Analysts want to match John Smith on dataset A to John Smith on dataset B. If one of these datasets is truly anonymized, it will be very challenging, if not impossible, to correlate the data.
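A short illustration of why buyers want an identifying key, using pandas with made-up data:

```python
import pandas as pd

# Dataset A: ride history. Dataset B: purchase history from another seller.
a = pd.DataFrame({"email": ["john@x.com", "mary@y.com"], "rides": [42, 7]})
b = pd.DataFrame({"email": ["john@x.com", "mary@y.com"], "spend": [310, 95]})

profile = a.merge(b, on="email")  # trivial to connect the dots with a shared key

# Strip the identifier ("anonymize") and the join key disappears:
a_anon = a.drop(columns=["email"])
# a_anon.merge(b, ...)  # no common column left to match on
```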
The "cover" of anonymized data is archaic, and I am surprised it is still in use. It's been proven a number of times that analysts can recover people's identities easily, even in datasets that are stripped of PII. Say, you are a loyal Uber customer, and pretty much take Uber cars everywhere. If I am given all of your trip information, origins, destinations and time of travel, I can immediately figure out where you live and where you work. From there, I will likely be able to identify you. Then I can look at the other trips to build a profile of your preferences, by analyzing what stores you shop at, what restaurants you eat at, how much you tip, etc.
***
Data sleaze is data about one's own customers, obtained secretly by businesses and then sold to the highest bidders, also in secret transactions. The production of data sleaze is frequently justified by giving services away for "free." However, running a business as a "free service" fronting a profitable espionage operation is a choice made by management teams, not an inevitability. Indeed, many businesses with a proper revenue model also produce data sleaze.
Regarding the "persistent identity" I suspect this is done using something about the device that is persistent beyond the erasure of software. For example, the MAC address (it's similar to the IP address) of the network card is persistent and I believe unique. The operating system allows a developer to query that number for a variety of reasons. Obviously it also has the effect of uniquely identifying the device, even if wiped of software.
It may be that Uber isn't so much writing software that persists beyond wiping the device as it is possible for them to associate a certain device with having previously had an Uber account. When the device is wiped and the Uber app reinstalled, they can still query the hardware for that number and then re-associate that phone with a prior Uber account because of that persistent number.
Posted by: Adam Schwartz | 04/26/2017 at 09:58 AM
As other posters have pointed out, Uber did not add unique code to the device that persists when erased; instead, it accessed a unique device ID (such as the IMEI) in violation of Apple's Terms of Service.
Posted by: Cody L. Custis | 04/26/2017 at 05:10 PM
From experience I have found that I have to do some work to find out what is meant by "anonymized". For many it's "We deleted anything that directly identifies you e.g. names, address, SSN and phone number". Of course, as you explained above, it doesn't take much data to identify you. I prefer "confidentialised" myself.
Posted by: Richard Penny | 04/26/2017 at 06:44 PM
AS, CC: Thanks for clarifying. Then, they probably had the ID saved in their databases which they could reference later, as opposed to having some persistent "code" (something like a cookie) installed on the phone itself.
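In other words, the persistence lives in Uber's database, not on the phone. A minimal sketch of that server-side re-association (all identifiers hypothetical):

```python
# Server-side map from hardware ID to prior account, built while the app
# was first installed. It outlives any wipe of the phone itself.
hardware_to_account = {"IMEI-356938035643809": "rider-8841 (banned for fraud)"}

def on_fresh_install(queried_hardware_id: str) -> str:
    """On reinstall, the wiped phone looks brand new - but the queried
    hardware ID still matches the old record."""
    return hardware_to_account.get(queried_hardware_id, "new device")

print(on_fresh_install("IMEI-356938035643809"))  # rider-8841 (banned for fraud)
```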
RP: Yes, anonymized typically means no PII, which has a legal definition. However, the reporters clearly stated that Slice uses anonymized to mean no customer names. And yes, users can take actions themselves.
Posted by: Kaiser | 04/26/2017 at 09:32 PM
I expect the reason they didn't work something out with credit cards is that it would break the card companies' terms, which would be even more fatal. It is possible that they don't even see the card details, or are required not to record them in certain forms. I do have a little bit of sympathy for Uber: it is going to be difficult for any company running this type of business if, once they detect a phone being used in a fraudulent way, there is nothing they can do because the phone is simply rebirthed. On the other hand, anyone who has done software development on Apple computers knows that their rules are not allowed to be broken, whether or not there is a good reason.
Posted by: Ken | 04/27/2017 at 08:13 AM
Ken: Given that most second users of an iPhone will be legitimate people and not scammers, how does this fraud detection method work?
Posted by: Kaiser | 04/27/2017 at 10:26 AM
Kaiser: I don't think Uber's aim was to block every rebirthed iPhone. The method would be simple: extract the phone identifier using a method that wasn't officially allowed, and associate it with the customer details. Then, if the customer is blocked, also block their phone, which prevents them from repeating the scam, at least with that phone. It tends to make cheating poorer value for the driver. The problem is that Apple checks for this type of unofficial access, so Uber turned off the checks whenever the IP address was within the blocks allocated to Apple. Apple will have realised that this might happen and just rerouted blocks of IP addresses from elsewhere.
Posted by: Ken | 05/12/2017 at 07:32 AM