Harvard Business Review devotes a long article to customer data privacy in the May issue (link). The article raises important issues, such as the low degree of knowledge about what data are being collected and traded, the value people place on their data privacy, and so on. In a separate post, I will discuss why I don't think the recommendations issued by the authors will resolve the issues they raised. In this post, I focus my comments on an instance of "story time", some questions about the underlying survey, and thoughts about the endowment effect.
Much of the power of this article comes from its reliance on survey data. The main survey used here is one conducted in 2014 by frog, the "global product strategy and design agency" that employs the authors. They "surveyed 900 people in five countries -- the United States, the United Kingdom, Germany, China, and India -- whose demographic mix represented the general online population". (At other points in the article, the authors reference different surveys, though none other than this one is explicitly described.)
Story time is the moment in a report on data analysis when the author deftly moves from reporting a finding from the data to telling stories based on assumptions that do not come from the data. Some degree of story-telling is required in any data analysis, so readers must be alert to when "story time" begins. Conclusions based on data carry different weight from stories based on assumptions. In the HBR article, story time begins just below the large graphic titled "Putting a Price on Data".
The graphic presented the authors' computation of how much people in the five nations value their privacy. They remarked that the valuations have very high variance. Then they said:
We don't believe this spectrum represents a "maturity model," in which attitudes in a country predictably shift in a given direction over time (say, from less privacy conscious to more). Rather, our findings reflect fundamental dissimilarities among cultures. The cultures of India and China, for example, are considered more hierarchical and collectivist, while Germany, the United States and the United Kingdom are more individualistic, which may account for their citizens' stronger feelings about personal information.
Their theory that there are cultural causes for differential valuation may or may not be right. The maturity model may or may not be right. Their survey data do not suggest that there is a cultural basis for the observed gap. This is classic "story time."
I wonder if the HBR editors reviewed the full survey results. As a statistician, I think the authors did not disclose enough details about how their survey was conducted. There are lots of known unknowns: we don't know the margins of error on anything, we don't know the statistical significance on anything, we don't know whether the survey was online or not, we don't know how most of the questions were phrased, and we don't know how respondents were selected.
What we do know about the survey raises questions. Nine hundred respondents spread out over five countries is a tiny poll. Gallup surveys 1,000 people in the U.S. alone. If the 900 were spread evenly across the five countries, their survey has fewer than 200 respondents per country. A rough calculation gives a margin of error of at least plus/minus 7 percent. If the sample is proportional to population size, then the margin of error for a smaller country like the U.K. will be even wider.
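The rough calculation above can be reproduced with the standard worst-case formula for the margin of error of a proportion (the survey itself doesn't publish one; this is a sketch assuming a simple random sample of 180 per country and a 95% confidence level):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Worst-case 95% margin of error for a proportion
    estimated from a simple random sample of size n.
    p = 0.5 maximizes p*(1-p), hence 'worst-case'."""
    return z * math.sqrt(p * (1 - p) / n)

# 900 respondents split evenly across five countries -> 180 per country
per_country = 900 // 5
print(round(margin_of_error(per_country) * 100, 1))  # about 7.3 percentage points
```

By comparison, Gallup's 1,000-person U.S. polls come in around plus or minus 3 percentage points, which is why 180 per country is so thin.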
The authors also claim that their sample is representative of the "demographic mix" of the "general online population." This is hard to believe, since the sample includes no one from South America, Africa, the Middle East, Australia, and so on.
The graphic referenced above, "Putting a Price on Data," supposedly gives a dollar amount for the value of different types of data. Here is the top of the chart to give you an idea.
The article said "To see how much consumers valued their data, we did conjoint analysis to determine what amount survey participants would be willing to pay to protect different types of information." Maybe my readers can help me understand how conjoint analysis applies to this problem.
A typical use of conjoint analysis is pricing new products. The product is decomposed into attributes: the Apple Watch, for example, may be thought of as a bundle of fashion, thickness, accuracy of reported time, and so on. Different watch prototypes are created by bundling different levels of those attributes, and people are asked how much they are willing to pay for each prototype. The goal is to put a value on the composite product, not on the individual attributes.
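To make the mechanics concrete, here is a minimal sketch of conjoint-style "part-worth" estimation on simulated data. Everything here is hypothetical: the three binary attributes, the true part-worth values, and the stated prices are all invented for illustration, not taken from the frog survey.

```python
import numpy as np

# Hypothetical setup: each prototype bundles three binary attributes
# (say, premium band, slim case, built-in GPS). Respondents state a
# price for each prototype; regressing price on the attribute dummies
# recovers the "part-worth" of each attribute.
rng = np.random.default_rng(0)
attrs = rng.integers(0, 2, size=(40, 3))        # 40 prototypes
true_worths = np.array([120.0, 40.0, 80.0])     # assumed, for the simulation
base_price = 200.0
prices = base_price + attrs @ true_worths + rng.normal(0, 10, size=40)

# Ordinary least squares: intercept + one dummy column per attribute
X = np.column_stack([np.ones(40), attrs])
coef, *_ = np.linalg.lstsq(X, prices, rcond=None)
# coef[0] estimates the base price; coef[1:] estimate the part-worths.
# The value of any composite bundle is the intercept plus the sum of
# the part-worths of the attributes it includes.
```

Note that even in this textbook setup, the part-worths are relative contributions to a product's price, which is why it is not obvious how the method yields a standalone dollar value for "protecting" a single type of data.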
Also interesting is the possibility of an "endowment effect" in the analysis of the value of privacy. We'd need to know the exact questions that the survey respondents were asked to be sure. It seems that people were asked how much they would pay to protect their data, i.e. to acquire privacy. In this framing, you don't have privacy and you have to buy it. A different way of assessing the same issue is to ask how much money you would accept to sell your data; in that framing, you own your privacy to start with. The behavioral psychologist Daniel Kahneman and his associates pioneered research showing that the values obtained by those two methods are frequently far apart!
In a classic paper (1990), Kahneman et al. gave one group of people a mug and asked how much money they would accept in exchange for it (the median was about $7). Another group was asked how much they were willing to pay to acquire the same mug; the median was below $3.
Is this why businesses keep telling the press that we don't have privacy and must buy it, rather than that we own our privacy and can sell it at the right price?
Despite my reservations, the HBR piece is well worth your time. It raises many issues about data collection that you should be paying attention to. Read the whole article here.