In response to my post about the challenges of measuring digital marketing, Dean Eckles (Facebook) sent me a paper by him and his colleagues. The paper, "Social Influence in Social Advertising: Evidence from Field Experiments" by Bakshy, Eckles, et al. (ACM 2012), is an impressive piece of work. In this post, I summarize the research for those who don't want to read an academic paper, and then present some of the feedback I provided to Dean, along with Dean's response.
The fact that Bakshy et al. ran carefully designed randomized experiments is noteworthy in a field where statistical testing is much talked about but poorly practiced. The paper covers two sets of experiments. Facebook runs "sponsored stories" in the Newsfeed: these are advertising units in which a user is enticed to "like" a brand by being told the names of friends who have already liked the item.
Experiment 1 varies the number of friends shown in the ad, and concludes that listing more friends may increase the probability of the user clicking on the ad.
Experiment 2 compares the use of friends ("John Smith likes this") and the use of aggregate statistics ("14,892 people like this") in the ad copy, concluding that the inclusion of friends yields superior performance.
Experiment 3 (really just an interpretation of Experiment 2) considers the effect of stronger ties versus weaker ties on the chance of clicking. A stronger tie is someone with whom you exchange a relatively high proportion of communications. Not surprisingly, stronger ties are better influencers.
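To make the idea of tie strength concrete, here is a minimal sketch of one plausible reading of the definition above: a friend's tie strength is the share of a user's communications that involve that friend. The function name, the message format, and the sample data are all my own assumptions for illustration; the paper's exact measure may differ.

```python
from collections import Counter

def tie_strength(messages, user):
    """Return, for each friend, the share of `user`'s communications
    exchanged with that friend (a hypothetical tie-strength measure)."""
    counts = Counter()
    for sender, receiver in messages:
        if sender == user:
            counts[receiver] += 1
        elif receiver == user:
            counts[sender] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {friend: n / total for friend, n in counts.items()}

# Made-up (sender, receiver) pairs, purely for illustration.
messages = [
    ("ann", "bob"),
    ("bob", "ann"),
    ("ann", "cat"),
    ("dan", "ann"),
    ("bob", "cat"),  # does not involve ann, so it is ignored
]
strengths = tie_strength(messages, "ann")
strongest = max(strengths, key=strengths.get)
```

Under this reading, the friend with the largest share is the "strongest tie," and the other friends are weaker ties.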
I sent Dean some detailed comments, from which I selected a few to discuss here.
While these experiments are useful, their interpretation is limited. By design, each of these experiments measures whether one can improve response rates by varying the copy on the ad unit. None of them measures how much true benefit these sponsored stories create above and beyond other marketing activities.
There are two obstacles to using these experiments to conclude that "sponsored stories produce incremental value for advertisers." (Admittedly, the authors don't make this general claim.) The first is that neither experiment has a "holdout" cell that did not receive any sponsored story ads. A holdout is a great way of measuring the baseline response, or the counterfactual (what would have happened if the advertiser had not bought these ads).
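As a rough illustration of what a holdout cell buys you, the sketch below compares the response rate of users who were shown ads against the response rate of a holdout group that was shown none. The function name and all the counts are hypothetical, invented for this example; none of them come from the paper.

```python
def incremental_lift(treated_responses, treated_n, holdout_responses, holdout_n):
    """Compare the treated response rate against the holdout (baseline) rate.

    Returns (treated_rate, baseline_rate, absolute_lift, relative_lift).
    """
    treated_rate = treated_responses / treated_n
    baseline_rate = holdout_responses / holdout_n
    absolute_lift = treated_rate - baseline_rate
    # Relative lift is undefined when the baseline rate is zero.
    relative_lift = (
        absolute_lift / baseline_rate if baseline_rate > 0 else float("nan")
    )
    return treated_rate, baseline_rate, absolute_lift, relative_lift

# Hypothetical counts: 120 of 10,000 exposed users responded,
# versus 100 of 10,000 holdout users who saw no ad at all.
treated_rate, baseline_rate, absolute_lift, relative_lift = incremental_lift(
    120, 10_000, 100, 10_000
)
print(f"treated {treated_rate:.2%}, baseline {baseline_rate:.2%}, "
      f"relative lift {relative_lift:.0%}")
```

Without the holdout, only the treated rate is observed, and there is nothing to subtract from it. A real analysis would also attach a confidence interval to the lift, which this sketch leaves out.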
The second obstacle is the choice of response. The metric used is clicks on the Facebook like button. This metric is so specific to Facebook that the only way to affect that metric is by marketing on Facebook. It would be great to see a more inclusive metric, like visits to the advertiser's home page or purchases, both of which can be affected through multiple marketing vehicles.
Dean generally agrees with these points. He said:
Despite the experimental setting, interpreting results is still tricky. The simple conclusion that can be drawn from Experiment 2, for example, is that social cues are more powerful than statistics at influencing "like" behavior. We don't know, for example, that "liking" something is correlated with purchasing something.
Another possibility is "homophily," which the authors explain in the paper. Given that a social network is not a random group of people, one would expect that "birds of a feather flock together." So whether or not ad units are shown, it's quite possible that people within the same social network would like similar things. The fact that one member has indicated "like" on an item is predictive of another member of the same network also "liking" the item. And this latter action may not have been caused by advertising.
The paper is distinguished by the care taken to describe the many details of the experiments. As I pointed out in my Significance article, real-world online experiments are very complex and difficult to execute well. Let me describe several complications that the Facebook team had to navigate.
Something like Experiment 1 can only be implemented on the subset of Facebook users who have one or more friends who have liked that brand. If someone has two friends who have previously liked the brand, and the experiment calls for showing one friend on the ad unit, then there is a question of which of the two is selected. As Experiment 3 shows, friends are not created equal; tie strength varies. Thus, the choice of who appears on the ad matters.
Facebook also runs "social marketing," which includes targeting ads based on characteristics of the users. This is a separate activity from the design of the ad copy of sponsored stories. A researcher would need to make sure that any such optimization does not introduce bias to the experiment.
Before ending, I'll address the trickiest but also most fundamental design issue: the unit of sampling. An uninformed view would hold that this is simple: just pick random users. In practice, this is not feasible. At any one time, a variety of advertisers are buying ad space on Facebook, so you can't keep the ad units constant. In fact, many advertising campaigns are short-lived, so you won't be able to collect enough samples if you stick to one or a few campaigns. So, the research is based on the (user, ad unit) pair as a sampling unit.
This immediately presents problems of interpretation. For any given user, the ad units are not randomized -- in fact, the social marketing algorithm I mentioned above is likely to bias which ad units get served. Besides, the same user can appear with multiple ad units. This means that one must be careful trying to interpret the study at the user level.
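Here is a minimal sketch of what randomizing at the (user, ad unit) level can look like, and why user-level interpretation gets murky. It assigns each pair to a treatment arm by hashing the pair, which is a common way to get stable pseudo-random assignment; I am not claiming this is how the Facebook team implemented it. The arm labels, the salt, and the IDs are all hypothetical.

```python
import hashlib

# Hypothetical treatment arms, loosely modeled on Experiment 1.
ARMS = ["1 friend shown", "2 friends shown", "3 friends shown"]

def assign_arm(user_id, ad_id, arms=ARMS, salt="experiment-1"):
    """Deterministically map a (user, ad unit) pair to a treatment arm."""
    key = f"{salt}:{user_id}:{ad_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return arms[int(digest, 16) % len(arms)]

# Made-up users and ad units: every user is paired with every ad here,
# whereas in practice the ad-serving system decides which pairs occur.
pairs = [(u, a) for u in ("u1", "u2", "u3") for a in ("adA", "adB", "adC", "adD")]
assignments = {pair: assign_arm(*pair) for pair in pairs}

# The same user can land in different arms for different ad units.
arms_per_user = {}
for (user, _ad), arm in assignments.items():
    arms_per_user.setdefault(user, set()).add(arm)
```

Because the arm depends on the pair and not on the user alone, a single user may contribute observations to several arms, and those observations are not independent of one another. That is one reason the results are safer to read at the pair level than at the user level.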
What I like most about the paper is that the researchers have thought through these issues. Whether or not you agree with their design decisions, they deliberated on many of these complications. I'm looking forward to reading other studies by this research group.