Breaking news from the New York Times: Twitter restated its user counts for three years (link).
Twitter claimed that it mistakenly counted users of third-party apps as its own users, even though those app users did not do anything on its platform. It so happens that those third-party apps were customers of a division of Twitter (Digits) that has recently been sold to Google.
This story has some holes. The biggest problem, as with any of the other data scandals involving Facebook, Google and the like, is that no outside companies are auditing these non-financial metrics. However, non-financial metrics like user counts, activity counts and so on are fundamental to establishing the value of startup companies. Therefore, there is a strong incentive to game these metrics.
Here are some questions that the hypothetical auditor might have asked:
- Given that the user counts are of "active" users, and "active" users are typically defined by the number or presence of activities performed within the prior month, it is hard to miss a group of users whom Twitter now admits "did not reflect activity on the Twitter platform". Activities are logged when users do something, like send a tweet, press a button, etc. No activities means no log entries. It's as if someone invented something out of nothing. So is there a larger problem with how activities are being measured?
- Typically, if a company is issuing user counts, each user has an ID number assigned to him/her. Digits is described as a platform that uses Twitter to send authentication messages through SMS, presumably meaning that the users of Digits are not users of Twitter. If that is the case, it would seem that they do not have Twitter user IDs. So is there a larger problem with how users are being counted?
- This error somehow affected only MAUs (monthly active users) but not DAUs (daily active users). Given that the formula for MAU surely links it to DAU, it's hard to understand how this error can overstate one but not the other. This is a strange outcome that ought to be explained in more detail.
- "Twitter said its data-retention policies made it unable to reconcile the figures for periods before last year’s fourth quarter." How can they confidently state that the error started three years ago when they apparently do not have any data older than one year? Further, does this mean that they have no backup for any figures published more than a year ago? Did Google buy the Digits unit from Twitter without inspecting user/activity data?
- We are told that the over-counting problem started in 2014 (three years ago). Well, Twitter just happened to have gone public at the end of 2013, with its valuation almost purely justified by user counts since it was not profitable. The timing is a bit suspicious. How did they determine when the problem started?
- In recent quarters, Twitter has been under tremendous pressure because subscriber growth has stalled. This restatement has the effect of resetting the bar lower.
- The inability to reconcile errors even one year old is of great concern. Is it really just a data retention policy or are there technical reasons why the data were not retained? Some of the newer "NoSQL" databases may overwrite old data, making it impossible to reconcile anything with the past. I know because my teams have been bitten by this very issue. Is this part of the problem?
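To make the questions about activity logs and the MAU/DAU link concrete, here is a minimal sketch of how active-user counts are commonly derived from an event log. Everything in it is an assumption for illustration: the log layout, the user IDs, and the helper names `daily_active` and `monthly_active` are hypothetical, and Twitter has not published its actual formulas.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical activity log: one (user_id, day) entry per logged action.
LOG = [
    ("u1", date(2017, 9, 1)),
    ("u1", date(2017, 9, 2)),
    ("u2", date(2017, 9, 2)),
    ("u3", date(2017, 9, 20)),
]

def daily_active(log):
    """Map each day to the set of distinct users with a log entry that day."""
    by_day = defaultdict(set)
    for user_id, day in log:
        by_day[day].add(user_id)
    return dict(by_day)

def monthly_active(log, end, window_days=30):
    """Distinct users with at least one log entry in the trailing window."""
    start = end - timedelta(days=window_days - 1)
    return {user_id for user_id, day in log if start <= day <= end}

dau = daily_active(LOG)
mau = monthly_active(LOG, end=date(2017, 9, 30))
```

Under this assumed definition, a user with no log entries cannot appear in either count, and the MAU set is exactly the union of the daily sets in the window. Anyone wrongly counted in MAU would therefore also show up in at least one day's DAU, which is why an error that touches only one of the two metrics calls for an explanation.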
The credibility of web and app companies is at stake. Almost all of the stories relating to mis-reporting are about over-reporting. The combination of a closed, unaudited system and a business model justified by eyeballs rather than dollars creates a powerful incentive to game the usage data. It's hard for outsiders to tell an embarrassing mistake from systematic cheating.
Just like financial metrics, data metrics must also be audited.
P.S.
(1) Judging from the official Twitter announcement, they seem to have been using an extremely lax definition of an "active user". Users who "authenticate or access" Twitter's website or apps on any given day are considered "active." This raises a red flag. With such a loose definition, I can't see why those Digits users should be excluded.
(2) In this document, they did not explain how to obtain monthly active users (MAUs) from daily active users (DAUs).
(3) It is actually quite easy to make mistakes because of the modern way in which data are collected, stored and managed. What has been reported is surely just the tip of the iceberg. Without auditing, there is no accountability.
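Point (1) can be illustrated with a small sketch of how the choice of qualifying events changes the count. The event names, the `source` field, and the function `count_active` are hypothetical; the announcement does not describe Twitter's pipeline at this level of detail.

```python
# Hypothetical events: (user_id, event_type, source).
EVENTS = [
    ("u1", "tweet", "twitter_app"),
    ("u2", "authenticate", "twitter_app"),
    ("d1", "authenticate", "third_party_app"),
    ("d2", "authenticate", "third_party_app"),
]

def count_active(events, qualifying_types, sources=None):
    """Count distinct users with a qualifying event, optionally limited to given sources."""
    return len({
        user
        for user, event_type, source in events
        if event_type in qualifying_types
        and (sources is None or source in sources)
    })

# Lax: any authentication counts, whatever app it came from.
lax = count_active(EVENTS, {"tweet", "authenticate"})
# Stricter: same event types, but only from the company's own apps.
strict = count_active(EVENTS, {"tweet", "authenticate"}, sources={"twitter_app"})
# Strictest: only users who actually did something beyond logging in.
engaged = count_active(EVENTS, {"tweet"}, sources={"twitter_app"})
```

In this toy data the same log yields three different "active user" totals depending on the definition and the filter, which is the kind of choice an auditor would want written down and checked.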