Vice discovered that one can spend $160 to buy information about people who have visited abortion clinics. Let's unpack this situation.
Is the data accurate?
The reporter also learned that not all of the clinics returned by a search for "Planned Parenthood" offer abortion services. Someone who buys this dataset may think everyone counted on the list has gone to an abortion clinic, but in reality only a portion of them have.
This illustrates one of the problems with today's data marketplace. Incorrectly matched data frequently get traded. These errors are unlikely to be corrected because correcting them usually reduces the size of the search results, which reduces the potential revenues that can be earned on them. Besides, the "victims" of inaccurate data have no clue that such data have been collected, or that such data have been assigned to them, or that such data have been used against their interests.
I don't think the vendor allows, or should allow, a more targeted search such as "abortion clinics" or "Planned Parenthood abortion clinics". So the inaccuracy may also be a deliberate business decision.
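To make the mismatch concrete, here is a minimal sketch of how one might compute the share of a purchased list that actually matches the buyer's assumption. The clinic names, visitor counts, and the `offers_service` flags are all made up for illustration; nothing here comes from the Vice dataset.

```python
def list_precision(visitors_by_location, offers_service):
    """Share of listed people who visited a location that offers the service."""
    total = sum(visitors_by_location.values())
    if total == 0:
        raise ValueError("the list is empty")
    matched = sum(
        count
        for location, count in visitors_by_location.items()
        if offers_service.get(location, False)
    )
    return matched / total


# Hypothetical search results for one brand name (invented numbers).
visitors = {"clinic_a": 120, "clinic_b": 80, "clinic_c": 200}
# Hypothetical flags: only one of the three locations offers the service.
offers = {"clinic_a": True, "clinic_b": False, "clinic_c": False}

precision = list_precision(visitors, offers)
print(f"{precision:.0%} of the people on the list match the buyer's assumption")
```

Removing the two non-matching locations would fix the error, but it would also shrink the list from 400 people to 120, which is the revenue disincentive described above.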
Abortion is a highly sensitive subject, and we may want to switch to something less emotional when thinking about the underlying issues. Facebook banned searches based on targeting college students. But if I were marketing to college students, I could buy location data capturing anyone who visits the campus bookstore, the pizzerias, etc. I would almost surely get good coverage of the student population, but I would also get "noise": professors, parents of students, visitors, prospective students, etc. In marketing, the cost of missing a potential customer is typically much higher than the cost of spamming.
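The asymmetry in costs can be sketched with a toy calculation. All figures below (the counts of missed students and wasted contacts, and the two unit costs) are invented assumptions, chosen only to show why a marketer would tolerate a noisy list.

```python
def expected_cost(missed_customers, wasted_contacts, cost_miss, cost_spam):
    """Total cost of a targeting strategy: lost sales plus wasted messages."""
    return missed_customers * cost_miss + wasted_contacts * cost_spam


# Assumed unit costs: a missed customer costs far more than a wasted message.
COST_MISS = 20.0
COST_SPAM = 0.05

# Broad list: nearly every student is captured, along with lots of noise.
broad = expected_cost(
    missed_customers=50, wasted_contacts=3000,
    cost_miss=COST_MISS, cost_spam=COST_SPAM,
)
# Narrow list: little noise, but many students are missed.
narrow = expected_cost(
    missed_customers=400, wasted_contacts=200,
    cost_miss=COST_MISS, cost_spam=COST_SPAM,
)

print(f"broad list cost:  {broad:.2f}")
print(f"narrow list cost: {narrow:.2f}")
```

Under these assumptions the noisy list is the cheaper choice by a wide margin, so the buyer has little reason to demand cleaner data.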
Who wants the data?
In the case of abortion clinics, the obvious, alarming use case is vigilantes, empowered by recent laws in some states, hunting down women who have had abortions.
But there are always two sides to a story. Abortion clinics themselves might desire this data for market research purposes, and pro-abortion researchers might like to analyze such data.
The entire surveillance capitalism industry missed a golden opportunity during the Covid-19 pandemic to shine a light on the positive aspects of data collection. They could have offered up location-based data for the common good. Such data would have been very useful to characterize the contact rates at local levels, and such public health use cases require only aggregated data, unlike marketing or law enforcement use cases, for which individuals must be identified.
The particular dataset acquired by Vice is at the census block level, which means it shows, for example, which census blocks the people who visited a specific Planned Parenthood location live in. This is aggregated data but not really. The census block is quite precise, especially in less dense areas. There are over 10 million blocks in the U.S. In a city, it resembles a city block. When we filter by time and space, we may end up with one or only a handful of individuals in a block. Then, most likely, we just need one other data element to fully recover people's identity.
Don't believe this? Just pick five names from your contact list. Can you find one thing that is unique for each of these five people? Perhaps only one likes sports, perhaps only one is a Buddhist, perhaps only one speaks a foreign language, perhaps only one is over 50 years old, etc.
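The same exercise can be run as a small script. This is only a sketch on made-up records: the names, block labels, and attributes are hypothetical, and `unique_records` is a helper written for this illustration, not part of any vendor's tooling.

```python
from collections import Counter


def unique_records(records, keys):
    """Return the records that are the only ones with their combination of keys."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return [r for r in records if counts[tuple(r[k] for k in keys)] == 1]


# Five hypothetical people living in two hypothetical census blocks.
people = [
    {"name": "A", "block": "B1", "age_band": "50+", "language": "English"},
    {"name": "B", "block": "B1", "age_band": "under 50", "language": "English"},
    {"name": "C", "block": "B1", "age_band": "under 50", "language": "Spanish"},
    {"name": "D", "block": "B2", "age_band": "under 50", "language": "English"},
    {"name": "E", "block": "B2", "age_band": "under 50", "language": "English"},
]

by_block = unique_records(people, ["block"])
with_age = unique_records(people, ["block", "age_band"])
with_language = unique_records(people, ["block", "age_band", "language"])

print(len(by_block), len(with_age), len(with_language))
```

With the block alone, nobody in this toy list is singled out. Adding an age band isolates one person, and adding a language isolates three of the five.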
Why is the data so cheap?
Big Data is not scarce data. Any company collecting your location data can sell such datasets. Who's collecting your location data? If you have an iPhone, just go to Privacy/Location Services and see which apps are "using" your location data. That would be the starting point.
Apps that require location, like maps, navigation, ride sharing, and weather, obviously have your location data. But any app you install can collect your location data. Some years ago, Yelp, the app about restaurants, was discovered to be retrieving your location continuously throughout the night (a good way to learn where you live). Your location at any time is one value, but that same value is collected by lots of companies who then resell it to lots of other companies.
Then read the Data notice of those apps, and you'll learn that there is a buzzing marketplace in which lots of third parties piggyback on these apps to get access to your data. They repackage the data and sell them.