Long-time reader Antonio R. pointed me to an article about Firefox, which is my preferred browser (link). Apparently, in a recent update, users were quietly opted in to a new tracking feature known as "privacy preserving attribution" (PPA). Since Firefox's brand is about anti-tracking, this discovery has unsettled many users.
There are two parts to the grievance. Aside from the PPA technology, a second issue is the use of opt-out, rather than opt-in. The opt-out tactic has long been adopted by tech companies to force users to do as the companies want; and certain segments of users, such as Firefox users, are unhappy about it.
***
Let's talk about cookies first. The cookie was originally a simple technology used by websites to remember user details so that the Web experience could be customized. For example, if a website pre-populates your user name when you attempt to log in, it's because it has stored your name from a previous log-in in the cookie (which is just a small text file kept by the browser).
Later, the Web was commercialized, and all the free things we get on the Web came to be funded by advertisers. The advertisers then seized on the cookie as a way to target people, since cookies contain information about individual browsers. (Eventually, the websites that charge for things also adopted the same tracking technologies.)
For example, an advertiser could store the number of ads they have shown to me in my cookie. Each time an ad is sent to my browser, a counter in the cookie is incremented. When the count exceeds a maximum frequency, they stop sending me ads.
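The frequency cap described above can be sketched in a few lines. This is only an illustration under assumptions of my own: the cookie name `ad_count`, the cap of three impressions, and the function name `should_serve_ad` are all hypothetical, and a real ad server would be far more involved.

```python
from http.cookies import SimpleCookie
from typing import Tuple

MAX_IMPRESSIONS = 3  # hypothetical cap chosen for this sketch


def should_serve_ad(cookie_header: str) -> Tuple[bool, str]:
    """Decide whether to serve an ad, given the Cookie header the browser sent.

    Returns (serve, updated_cookie), where updated_cookie is the value the
    server would write back to the browser.
    """
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    # "ad_count" is a made-up cookie name; it holds the impressions so far.
    count = int(cookie["ad_count"].value) if "ad_count" in cookie else 0
    if count >= MAX_IMPRESSIONS:
        return False, f"ad_count={count}"  # cap reached: stop sending ads
    return True, f"ad_count={count + 1}"  # serve the ad and bump the counter
```

A first-time visitor sends no cookie, gets the ad, and leaves with `ad_count=1`; once the stored count reaches the cap, the function says no and the counter stops moving.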
This use case is beneficial to the user, as it encodes the rational concept of diminishing returns to ad impressions. I don't need to see it yet another time. But most of the time, the advertiser is using the cookies in more offensive ways. One example is tracking past behavior. The cookie can store which ad you have clicked on, so that the next time they see you, they send you another ad that they think will make you click and buy.
Recently, there has been some momentum to get rid of cookies. The idea is to allow advertisers access only to aggregated data. This requires some way to define groups of users, and some way of classifying each user into one (or perhaps more) groups. Then, instead of the advertiser reading cookies to obtain individual data, it would read group-level data collected by the browser.
Over time, the browser has become another store of user data. Google, which owns the Chrome browser, has been doing it forever. So has Microsoft through its Edge browser. Recall, above, the original use case of cookies, that of storing user account names so that users do not have to re-enter them each time they log on. Over time, the browsers have taken over this feature - this is why the browser asks us for permission to remember the user name (or passwords, etc.).
If the user allows the browser to store user names, then any website the user logs on to can have the user name filled in from the browser's storage, instead of from the cookie.
And yes, this makes the browser developer a holder of our personal data - which is one of the main reasons why data-famished Google and Microsoft started making browsers.
***
Now let's get back to the nice idea of not sending individual data to advertisers but sending aggregated data. Advertisers of course hate this because they believe deeply that the more granular the data, the better their targeting. This story has made little sense from a statistical perspective from the start, but it's one of those fallacies that persist. It's an example of Kahneman's Law of Small Numbers.
Advertisers, generally speaking, may want to sell you the thing you've shown interest in, or some other things you're not aware of, but could be interested in. The latter is called cross-sell, and usually, cross-sell is supported by data from other customers like you who have done something you haven't yet.
For cross-sell, the natural thing to do is to place you within a group of customers who have similar behavior. This is the aggregated data. For regular sales, if they are chasing after you because you have clicked on two or more big-screen TV ads in the last week, it should make no difference whether they get the information about you specifically, or about you as part of a group of customers who all have clicked on two or more TV ads in the last week.
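To make the big-screen TV example concrete, here is a minimal sketch of what group-level data could look like. Everything in it is an assumption for illustration: the click log format, the `"tv"` category label, and the function names are invented, and no browser is known to work exactly this way.

```python
from collections import Counter
from datetime import date, timedelta


def tv_cohort(clicks, today, min_clicks=2, window_days=7):
    """Return the set of users who clicked at least `min_clicks` TV ads
    in the last `window_days` days.

    `clicks` is a hypothetical log of (user_id, ad_category, click_date).
    """
    cutoff = today - timedelta(days=window_days)
    counts = Counter(
        user
        for user, category, day in clicks
        if category == "tv" and cutoff <= day <= today
    )
    return {user for user, n in counts.items() if n >= min_clicks}


def group_report(clicks, today):
    """What the advertiser would see: the group label and its size only,
    with no user ids and no individual click histories."""
    return {"group": "2+ TV ad clicks in last 7 days",
            "size": len(tv_cohort(clicks, today))}
```

The individual click log never leaves whoever runs `tv_cohort`; the advertiser only receives the output of `group_report`. That is precisely why the identity of the party running the aggregation matters.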
I can't speak to whether these advertisers have other use cases which require individual data but the marketing use case should not require it.
Here's the catch, and the real source of the controversy.
***
Someone has to do the aggregation. Aggregation means taking individual data and classifying people into groups. It means defining what those groups are. Someone has to do it.
Whoever does it must have collected individual-level data. So who do you trust to do this aggregation? At this time, I'd still trust Firefox, more than any of the following: Google, Microsoft, advertisers, "third-party" data aggregators appointed by advertisers whose only customers are advertisers, etc.
In my view, this backlash against Firefox is part of a fight to control our personal data. Firefox probably annoyed many of those big players because they may lose control of some of the data.
***
Another possibility is that Firefox attracts all of the people who are most sensitive to invasion of privacy, and they are looking at the issue in an absolute, rather than relative, sense. Thus, in the original article, they said:
In this sense, Mozilla claims that the development of “privacy preserving attribution” improves user privacy by allowing ad performance to be measured without individual websites collecting personal data. In reality, part of the tracking is now done directly in Firefox. While this may be less invasive than unlimited tracking, which is still the norm in the US, it still interferes with user rights under the EU’s GDPR.
By this worldview, they would throw out the baby with the bath water. Browsers would no longer remember anything from the past, so every time we log on to a website, we would have to re-enter all information. Also, advertisers would know nothing about users between browser sessions, and thus send even less relevant ads.
Given that advertisers pretty much pay for most of the free stuff we get online, it's hard to imagine that a company like Firefox can survive if it takes an extreme view on advertisers.
It's actually a miracle that Firefox is alive. The big advertisers can simply work with the advertiser-friendly browsers like Microsoft Edge and Google Chrome. This has already led Google to dump its previous effort to move away from cookies. The current controversy reflects one of the reasons why Google failed: if Google runs the new aggregation technology, it continues to hold all of the individual data; it consolidates its stranglehold on the data further since other parties like advertisers will no longer be able to get individual-level data!
***
Let's also parse this bit:
In reality, this tracking option doesn’t replace cookies either, but is simply an alternative - additional - way for websites to target advertising.
One part of this is surely true... that the PPA technology cannot replace cookies completely. I hope that it could replace cookies as it relates to advertisers, though. PPA can't replace cookies because it would fail at the original use cases, such as remembering user account names, for which the websites need individual-level data.
***
What PPA actually is (link) is something highly limited. It certainly can't replace cookies completely. It addresses a single use case for advertisers, i.e. "attributing" user purchases to ad impressions.
It's doubling down on a highly dubious analytical framework, which I have critiqued in the past (see this post).
Advertisers place ads on third-party websites such as Facebook and the New York Times. These ads drive traffic to the advertisers' own websites, where visitors may make purchases. The purchases (outcomes) are wholly visible to these advertisers since the transactions occur on their own websites. However, their ads show up on third-party sites for which these advertisers don't have visibility. To work around this, they store information about which ads have been shown to a user in a cookie. So when the user makes a purchase, the advertiser can look inside the cookie to determine if the purchaser has previously been shown an ad.
Attribution, to say it bluntly, is the act of turning temporal correlation into causation. A typical attribution rule is if there exists an ad "view" within X days of a subsequent purchase, then the first such ad is regarded as the "source" of the purchase. More "advanced" attribution would dole out credit to a sequence of prior ad "views", rather than the closest one.
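The basic rule is simple enough to write down. The sketch below assumes that the credited ad is the "view" closest in time to the purchase, and that X defaults to 7 days; the function name and data layout are hypothetical.

```python
from datetime import datetime, timedelta


def attribute_purchase(impressions, purchase_time, window_days=7):
    """Return the ad_id credited with the purchase, or None.

    `impressions` is a list of (ad_id, shown_at) pairs. Only impressions
    shown within `window_days` before the purchase are eligible, and the
    one closest to the purchase gets all the credit (an assumption).
    """
    window_start = purchase_time - timedelta(days=window_days)
    eligible = [
        (shown_at, ad_id)
        for ad_id, shown_at in impressions
        if window_start <= shown_at < purchase_time
    ]
    if not eligible:
        return None
    # max() picks the latest timestamp, i.e. the view closest to the purchase.
    return max(eligible)[1]
```

Notice that nothing in the function asks whether the ad caused anything. It checks timestamps, which is the whole point of the complaint.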
I put "view" in quotes because in this industry, a "view" is the same as an "impression" which only means that some server attempted to deliver an ad to a user. It does not imply the user has "viewed" the ad. It does not even imply that the ad has fully loaded.
PPA moves these analytically-challenged computations to the browsers (actually, to third-party "DAP partners"). PPA groups users by which ads they have been shown, and returns the aggregated count of how many users made purchases (or other desired actions) subsequent to those ad "views".
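As a rough picture of the output, the sketch below counts purchases per ad from per-browser reports and returns only the totals. It is not the PPA protocol: the report format and function name are invented, and it leaves out whatever protections the DAP partners apply when combining reports.

```python
from collections import Counter


def aggregate_conversions(browser_reports):
    """Sum conversions per ad across browsers.

    `browser_reports` is a hypothetical iterable of (ad_id, converted)
    pairs, one per browser that was shown the ad. Only the per-ad totals
    are returned; no individual report is passed on.
    """
    totals = Counter()
    for ad_id, converted in browser_reports:
        totals[ad_id] += int(converted)  # True counts as 1, False as 0
    return dict(totals)
```

The advertiser learns that, say, two of the browsers shown a given ad later made a purchase, but not which two.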
It solves the simplest problem marketers care about but they want more information, such as the demographics of those who made purchases, the time lag of action, etc. If the advertiser is more serious about causal inference, they recognize that such data must be combined with other data sources (external to browsers) to yield a better attribution model - customers could have been influenced by other marketing channels (emails, TV, etc.) instead of online ads!
So I agree that PPA can't replace cookies right now. With future developments, it might come close. But inevitably, the data collection duties would have shifted from advertisers to browsers. That's why there is a lot of resistance to change.
***
Now back to the other issue of opt-out. I think the reason why Firefox decided to implement PPA by default while offering opt-out for users is that if most users are not part of PPA, then it is dead on arrival.
I hate opt-out. In this case, if we take the perspective of whether the world with PPA is better than the world without PPA, then there is an argument that it might be. If we take a purist perspective of whether PPA solves the unwanted user tracking problem, then PPA is definitely not the answer.
Thank you Kaiser for your clear analysis and brilliant explanation, as usual.
P.S.: MS browser is Edge, Bing is its search engine.
Posted by: Antonio | 10/01/2024 at 03:12 PM