With the U.K. report on Facebook, and the stern language within it, the train on regulating data sharing may finally reach the station this year. The FTC is also likely to impose a stiff fine on Facebook for violating a consent decree.
So let's learn more about this data sharing business. If you prefer a video, the gist of this post can be heard here.
***
First, let's talk about data flows and the "cloud". Data are stored in computers that are called servers. In the cloud computing model, these servers are owned not by the companies that collect the data but by large tech companies like Amazon, Google, and Microsoft, which are responsible for managing them. These servers are geographically dispersed, so when data enter the cloud, they get replicated and spread across many servers. The technical benefit of such replication is recoverability: if one server fails, another still holds a copy, which allows the use of cheaper, less reliable computers. The downside is that the data become much harder to delete.
Data become more telling if one combines different datasets measuring different aspects of our lives. For example, an auto insurer may have data on your past claims, and those data help predict your future claims. But if the auto insurer is able to get data about your car from, say, an automaker (how fast you drive, where you drive, etc.), those data combined with past claims improve the predictive power.
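To make the idea of combining datasets concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the driver ids, the field names (`past_claims`, `avg_speed_mph`, `night_trips`) and the values are assumptions, not anything a real insurer or automaker is known to use. It only shows the mechanical step of joining two datasets on a shared identifier.

```python
# Hypothetical insurer data: past claims per driver (values are invented).
claims = {
    "driver_1": {"past_claims": 0},
    "driver_2": {"past_claims": 2},
}

# Hypothetical automaker data about how each car is driven (values are invented).
telematics = {
    "driver_1": {"avg_speed_mph": 72, "night_trips": 14},
    "driver_2": {"avg_speed_mph": 55, "night_trips": 1},
}


def combine(claims, telematics):
    """Join the two datasets on the driver id they have in common."""
    combined = {}
    for driver_id in claims.keys() & telematics.keys():
        # Merge the fields from both sources into one profile per driver.
        combined[driver_id] = {**claims[driver_id], **telematics[driver_id]}
    return combined


profiles = combine(claims, telematics)
print(profiles["driver_1"])
```

Once the two sources are joined, each driver's profile carries the driving behavior alongside the claims history, which is what gives a predictive model more to work with.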
Thus, a data-sharing industry has been created. Companies make agreements to share data with one another. This becomes much easier in the "cloud" as those servers are already connected to one another. These agreements may include explicit payments, but even if they don't, both sides must be benefiting commercially from the arrangement, or else the agreements would not exist.
So when company A shares data with company B, the data flow from A's servers to B's servers. B may also use a cloud, which means the data would be replicated yet again, and dispersed geographically onto yet another set of servers.
And company B may also share data with company C, etc., etc.
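The chain described above can be sketched as a toy simulation. This is only an illustration under stated assumptions: the `Company` class, the replica counts (3, 3 and 2) and the record id `user42` are all hypothetical, and real cloud storage is far more complicated. The point is simply to count copies as a record moves from A to B to C, and to see what a deletion at A does and does not reach.

```python
from dataclasses import dataclass, field


@dataclass
class Company:
    name: str
    replicas: int  # assumed number of servers each record is copied to
    servers: list = field(init=False)

    def __post_init__(self):
        # Each server is modeled as a set of record ids it holds.
        self.servers = [set() for _ in range(self.replicas)]

    def store(self, record_id):
        # Storing a record replicates it to every server in this company's cloud.
        for server in self.servers:
            server.add(record_id)

    def share_with(self, other, record_id):
        # Sharing copies the record into the other company's cloud.
        if self.copies(record_id) > 0:
            other.store(record_id)

    def delete(self, record_id):
        # Deletion only reaches this company's own servers.
        for server in self.servers:
            server.discard(record_id)

    def copies(self, record_id):
        return sum(record_id in server for server in self.servers)


a, b, c = Company("A", 3), Company("B", 3), Company("C", 2)
a.store("user42")
a.share_with(b, "user42")
b.share_with(c, "user42")

total_before = sum(co.copies("user42") for co in (a, b, c))
a.delete("user42")
total_after = sum(co.copies("user42") for co in (a, b, c))

print(total_before, total_after)  # prints "8 5"
```

Even in this tiny example, one record turns into eight copies after two sharing agreements, and deleting it at company A leaves the five copies held by B and C untouched.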
***
An inexplicable part of the consent decree between Facebook and the FTC is the requirement that Facebook monitor what happens to the data after they are shared with third parties. I just can't figure out how that is possible. It isn't even possible within Facebook: if a user demands that his/her data be deleted, it will be very hard to ensure that all copies of the data are deleted from every server, including data that might have landed on an analyst's computer. In fact, most analysts probably don't know how many copies of data elements are being created during the analysis, and where those copies exist!
***
The next question of general interest is all the different ways in which tech companies collect people's data without people realizing what's happening. In the video, I look at contact lists, personality tests, two-factor authentication schemes, IoT devices, etc. in their roles as data collectors.
This is the reason why the video is called "Did you betray your friend today?"