Another sunset photo compilation? Not quite.
This chart acts and smells like the sunset chart: it too was generated by many unknowing collaborators, this time visitors to the content aggregation site Digg. For those unfamiliar, web surfers can "digg" any web page they find interesting (by clicking on an image), which causes a link to be generated on Digg's website. The number of Diggs can then be used to judge the value or popularity of a web page.
In effect, Digg is a gigantic save folder for the masses. What happens when we have huge amounts of data? We have to work really hard to dig out the useful information. This chart goes quite a long way to answer one specific question.
Digg users are plotted horizontally and the stories they digged are plotted vertically. The bright white vertical strip represents suspicious activity: one user digged a large number of stories within the time window of the chart, most likely a bot trying to subvert the mass rating system.
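To make the "bright vertical strip" concrete, here is a minimal sketch of how such a user might be flagged from raw data. It assumes the data arrive as (user, story) pairs, and the function name `flag_heavy_diggers` and the cutoff of ten times the median are my own illustrative choices, not anything Digg or the chart's creators are known to use.

```python
from collections import Counter
from statistics import median


def flag_heavy_diggers(diggs, ratio=10.0):
    """Return users whose distinct-story count exceeds `ratio` times the median.

    `diggs` is an iterable of (user, story) pairs. The threshold rule is a
    hypothetical heuristic chosen for illustration only.
    """
    # De-duplicate pairs so repeated clicks on one story count once.
    stories_per_user = Counter(user for user, _ in set(diggs))
    if not stories_per_user:
        return []
    typical = median(stories_per_user.values())
    return sorted(u for u, n in stories_per_user.items() if n > ratio * typical)


# Made-up data: three ordinary users and one account that digged 40 stories.
diggs = [("alice", "s1"), ("bob", "s2"), ("carol", "s1"), ("carol", "s3")]
diggs += [("bot42", f"s{i}") for i in range(1, 41)]

print(flag_heavy_diggers(diggs))
```

In the chart, each flagged user would correspond to one column lit up from top to bottom. A rule this crude would also catch any legitimately prolific user, so it only says where to look, not who is cheating.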
Flickr and Digg are two of the more prominent stories of the so-called "Web 2.0", or mass collaboration on the Web. Between my last post and this post, I have kind of lost enthusiasm for this type of chart, at least from a statistical perspective. There is no real collaboration: the photographer who contributed sunset No. 103 does not know the one who uploaded No. 31, for example. Using this logic, every survey or census ever conducted qualifies as mass collaboration, just because there are many participants providing data.
What's worse, a typical survey brings together results from a random sample. These charts all have highly biased samples, and I haven't seen any discussion yet of this issue. They cannot be interpreted without understanding who participated.
Reference: "How Digg Combats Cheaters", Technology Review, Jan 24, 2007.