

Tom Carden

Long comment, sorry :)

It's a shame you're tired of this kind of chart just because you don't think the sites are collaborative (and who said they were?). We (Stamen) have more of these to come, including some that are even more reminiscent of the Flickr sunset chart in your last post.

With regard to the data sample, I don't think it's right to use a loaded word like "biased". As much as anything, these kinds of charts serve as thinking aids: part of understanding the bigger picture, not the picture itself.

I also think you're a bit off the mark about the collaboration aspect of things. Flickr isn't a site for collaborating to find out what time sunset is; it's a site for sharing photos and discussing them. Digg isn't a site for collaborating to find out what patterns of activity there are on the web; it's a site for finding interesting webpages and sharing and discussing them.

If you're looking for "web 2.0" collaboration on these sites, though, it's easy to find. Some of the groups on Flickr collaborate to collect and identify photos of plants, for example. I would argue that the Digg front page is a collaborative effort to find the most interesting pages on the web right now, with a heavy emphasis on novelty.

All said and done, you're quite right that there's not enough discussion of the sample choice with this kind of chart. Most often it's just the most recently available data, not necessarily a statistically useful slice. I recently talked to a math expert who perceives a small renaissance in university statistics departments, largely thanks to data mining and the internet. Here's hoping it rubs off on those of us involved with sorting the meaning from the mess.


I have to admit that, in my haste, I sounded as if I had lost interest in this area. Far from it! What I really meant was that the initial excitement has given way to a dose of reality...

Internet data present lots of challenges for statisticians. Lurking behind the whole discussion of what caused the noise in the sunset chart is the issue of selection bias. Is there any control over how many of the sunset photographs are taken in different time zones, etc.? To be more exact, it's the classical statistical problem of self-selection.


I find the "charts" (simplistic at best) and the plotting of internet community usage and submission activity on sites like Digg completely fascinating. I can definitely see how web stats could drive a "small renaissance" in university statistics departments.

Thanks for the post.





Marketing analytics and data visualization expert. Author and speaker. Currently at Columbia.


Graphics design by Amanda Lee
