Tom Carden

Long comment, sorry :)

It's a shame you're tired of this kind of chart just because you don't think the sites are collaborative (and who said they were?). We (Stamen) have more of these to come, including some that are even more reminiscent of the Flickr sunset chart in your last post.

With regard to the data sample, I don't think it's right to use a loaded word like "biased". As much as anything, these kinds of charts serve as thinking aids: part of understanding the bigger picture, not the picture itself.

I also think you're a bit off the mark about the collaboration aspect of things. Flickr isn't a site for collaborating to find out what time sunset is; it's a site for sharing photos and discussing them. Digg isn't a site for collaborating to find out what patterns of activity there are on the web; it's a site for finding interesting webpages and sharing and discussing them.

If you're looking for "web 2.0" collaboration in these sites, though, it's easy to find. Some of the groups on Flickr collaborate to collect and identify photos of plants, for example. I would argue that the Digg front page is a collaborative effort to find the most interesting pages on the web, right now, with a heavy emphasis on novelty.

All said and done, you're quite right that there's not enough discussion of the sample choice with this kind of chart. Most often it's just the most recently available data, not necessarily a statistically useful slice. I recently talked to a math expert who perceives a small renaissance in university statistics departments, largely thanks to data mining and the internet. Here's hoping it rubs off on those of us involved with sorting the meaning from the mess.


I have to admit that, in my haste, I sounded as if I had lost interest in this area. Far from it! What I really meant was that the initial excitement has been tempered by a dose of reality...

Internet data present lots of challenges for statisticians. Lurking behind the whole discussion of what caused the noise in the sunset chart is the issue of selection bias. Is there any control over how many of the sunset photographs are taken in different time zones, etc.? To be more exact, it's the classical statistical problem of self-selection.


I find the "charts" (to be simplistic at best) and the plotting of internet community usage and submission activity, on sites like Digg, completely and entirely fascinating. I can definitely see how web stats can drive a "small renaissance" in university statistics departments.

Thanks for the post.

Kaiser Fung. Business analytics and data visualization expert. Author and Speaker.