Thanks to @jeanniecrowley for pointing me to Rocky Agrawal's wonderful piece on how Google - and by extension, many hyped-up tech companies - abuses statistics to deceive the public. (link here).

The post is worth reading in full. The highlight for me is this bit:

In July, [Google CEO] Page claimed that the service had 10 million users who shared 1 billion items a day. That sounds incredibly impressive. But let’s do the math. That would mean that the *average* user was sharing 100 items a day...

So how did we get to that number? Well, it turns out Google was counting every *potential* recipient of that message. A single message from Scoble today would count 240,000 times toward that number. That’s preposterous.
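To see how much this way of counting can inflate a total, here is a minimal sketch. Only the 240,000 figure comes from the quote above; the other authors and their follower counts are made up for illustration, and I am assuming each post is counted once per potential recipient.

```python
# Each entry is one shared item. Only the 240,000 figure is from the quote;
# "user_a" and "user_b" and their follower counts are hypothetical.
posts = [
    {"author": "scoble", "followers": 240_000},
    {"author": "user_a", "followers": 50},
    {"author": "user_b", "followers": 3},
]

# Counting what was actually shared: one item per post.
actual_items = len(posts)

# Counting every potential recipient of every post (the assumed inflated method).
recipient_count = sum(post["followers"] for post in posts)

# How many times larger the inflated count is than the real one.
inflation = recipient_count / actual_items

print(f"items actually shared: {actual_items}")
print(f"items counted per potential recipient: {recipient_count}")
print(f"inflation factor: {inflation:,.0f}x")
```

With these made-up numbers, a single widely followed account accounts for almost the whole total, which is why the headline figure says so little about how much a typical user shares.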

There are many cases similar to this one where one can easily spot mischief just by taking a skeptical position. It's a simple division to get to an average of 100 items per user per day. If one knows a little bit of statistical thinking (say, stuff from Chapter 3 of **Numbers Rule Your World**), one quickly realizes that not all 10 million users can be like the "average user"... in fact, with so many users, there would be lots of inactive users, who would share zero items per day. For every non-sharer, there must be some active user who shares 200 items per day to keep the average at 100.

What Agrawal is arguing is that even the maximum sharer ("Scoble") probably didn't share 100 items a day, and the maximum can't be smaller than the average.
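The arithmetic is short enough to write out. In the sketch below, the 10 million users and 1 billion items are the claimed figures; the inactive shares (50% and 90%) are hypothetical values I picked to show how fast the burden on active users grows.

```python
# Figures from the claim quoted above.
users = 10_000_000
items_per_day = 1_000_000_000

# The "simple division": items per user per day.
average = items_per_day / users


def required_active_average(overall_average, inactive_share):
    """Average that active users must reach if `inactive_share` of all
    users share nothing, while the overall average stays unchanged."""
    if not 0 <= inactive_share < 1:
        raise ValueError("inactive_share must be in [0, 1)")
    return overall_average / (1 - inactive_share)


print(f"average items per user per day: {average:.0f}")

# Hypothetical inactive shares, for illustration only.
for share in (0.0, 0.5, 0.9):
    needed = required_active_average(average, share)
    print(f"{share:.0%} inactive: active users must average {needed:,.0f} items/day")
```

The 50% case is the "one 200-item sharer for every non-sharer" scenario. Since no one can share fewer than zero items, every inactive user pushes the required average for everyone else further above 100.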

***

I like to call these "true lies". Under certain assumptions, exclusions and definitions, one can certainly justify these statistics as "true", but their effect on consumers is to mislead. Frequently, one needs to examine a set of statistics side-by-side to fully understand the data.

Most importantly, one should start with why we are computing the statistic. Take the Facebook "Like" button, which is touted in a lot of places as a measure of marketing success. I just went over to the McDonald's U.S. Facebook page. It shows about 14 million Likes. What does that statistic mean? Since Facebook reports about 150 million U.S. users, does this mean favorability of less than 10%? Who are these people who "like" McDonald's? Does this reflect the success of social-media marketing in gaining more fans? Or are these 14 million hard-core fans who have always loved McDonald's and are now taking the opportunity to advertise their love to the world? Do we expect the act of "Liking" to generate additional revenues for McDonald's? If so, how so? What a surprise - the number of "Likes" is insufficient to answer any of these questions.

"What a surprise - the number of 'Likes' is insufficient to answer any of these questions."

Indeed. I think this is key. Society is getting lazy by measuring "popularity" by using easily available social media statistics such as the number of 'likes,' the number of followers on Twitter, or the number of hits for a Google search term. But statistics need to be evaluated for bias, variance, and those "lies and damned lies" that they potentially contain.

I've written a blog post on why it's a bad idea to estimate popularity based on the number of hits in a Google search:

http://blogs.sas.com/content/iml/2011/08/19/estimating-popularity-based-on-google-searches-why-its-a-bad-idea/

Posted by: Rick Wicklin | 01/27/2012 at 10:44 AM