Feb 01, 2007

Error spotting

My friend Augustine pointed me to this interesting graph showing the time of sunset over the course of a year.  (The original author's write-up is here.)

Flickr_sunset

Of course, one can produce a perfect chart by looking up meteorological records.  The main interest in this graph is how it was constructed.  Each cell in the graph represents an hour of a day, with days running across and time running down. The cells that are not dark each contain a photograph of the sunset contributed to Flickr, the photo-sharing site.  So this is in effect a graph created through mass collaboration (about 35,000 photos).

The "white" band roughly indicates the sunset.  What intrigues me is the variability... what are the reasons for lighted cells appearing all over the graph?

Some ideas include:

  • Different time zones
  • Incorrect time setting by some photographers
  • Erroneous tagging of photos as "sunset"

Comments

What the photo shows is a number of things:

It proves that the vast majority of the uploaded photos come from users located in the northern hemisphere. On a given day, there is no significant difference between sunset times for people living at the same latitude. If the providers of photos were evenly distributed across the latitudes, the sunset 'band' would be straight, not curved.

Although I tried, I can't spot a sunrise pattern. I think this is a good sign that most of the tagging of the photos is actually ok. Think about it: why would you tag a photo sunset, if it wasn't actually a sunset?

I think it is more likely that a photo is incorrectly timed, rather than incorrectly tagged.

Jens

In the comments, the original flickr-composite creator ('jbum') suggests two reasons for the possible errors:

3) If date/time information is not present, Flickr makes the date_taken the same as the date_uploaded.

4) #3 happens a lot, which accounts for most of the noise.

5) Other noise may be due to photos tagged 'sunset' which weren't actually photographs of the sunset. e.g. "sunset grill"

People take a lot of photographs of things like sunsets on holiday, but they don't often change the time on their cameras.

This is a pretty amazing chart, in large part because of the way it was made. I'd add to Jens' comment the fact that the photographers represented here were working mostly at the mid latitudes in the northern hemisphere (i.e. relatively few photographers in the equatorial regions or significantly northern latitudes -- or as Jens mentioned, in the southern hemisphere). If we had many photographers in, say, the northern latitudes (like 50 or 60 degrees north) we'd also see the same curved sinusoidal shape but a more extreme version than this one.

Some part of the width of this must come from observations from slightly different latitudes, and some from differences in longitude within a time zone. A guess at how much error exists from these (and other) sources probably could be made by looking at the point on the curve at the two equinoxes (~ March 21, ~ Sep 21) when the width of overlapped latitude graphs should be almost zero -- essentially a point. I can (maybe) see a little narrowing there, but not much. But this really is an interesting plot.
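To make the latitude argument concrete, here is a rough sketch using the textbook sunrise equation with a simple cosine model for the sun's declination. The function names (sunset_hour, seasonal_swing) are my own, and the model ignores the equation of time, atmospheric refraction, time zones and daylight saving, so treat the numbers as approximate local solar time rather than clock time.

```python
import math

def sunset_hour(day_of_year, latitude_deg):
    """Approximate sunset in local solar time (hours after midnight)."""
    # Simple cosine approximation of the solar declination, in degrees.
    decl = -23.44 * math.cos(math.radians(360.0 / 365.0 * (day_of_year + 10)))
    # Sunrise equation: cos(hour_angle) = -tan(latitude) * tan(declination).
    x = -math.tan(math.radians(latitude_deg)) * math.tan(math.radians(decl))
    x = max(-1.0, min(1.0, x))  # clamp so polar day/night does not raise an error
    hour_angle = math.degrees(math.acos(x))
    # The sun moves 15 degrees of hour angle per hour; solar noon is 12.0.
    return 12.0 + hour_angle / 15.0

def seasonal_swing(latitude_deg):
    """Difference in hours between the latest and earliest sunset of the year."""
    hours = [sunset_hour(d, latitude_deg) for d in range(1, 366)]
    return max(hours) - min(hours)

if __name__ == "__main__":
    for lat in (0, 40, 60, -40):
        print(lat, round(seasonal_swing(lat), 2))
    # Near the March equinox (day 80) the curves for different latitudes nearly meet.
    print(round(sunset_hour(80, 40), 2), round(sunset_hour(80, 60), 2))
```

Under these assumptions the swing is zero at the equator, grows with latitude, and flips phase in the southern hemisphere, while the curves for different latitudes almost coincide around the equinoxes.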

Is it my imagination or is there a shadow band that might correspond to sunrise in the upper part of the chart? If so this raises the credibility of the mislabelling theory or perhaps a misassociation if the images are pulled out by a machine. If there is a label with sunset nearby in some sense then other pictures might be pulled out.

Also, where a person lives in relationship to the time zone boundaries will account for some of the noise.
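A quick sketch of the size of that effect: the earth turns 15 degrees per hour, so each degree of longitude away from the zone's reference meridian shifts clock-time sunset by about 4 minutes. The function name and the sample longitudes below are illustrative assumptions, not measurements from the chart.

```python
def clock_offset_minutes(longitude_deg, zone_meridian_deg):
    """Minutes to add to local solar time to get clock time.

    Longitudes are in degrees, east positive. An observer west of the
    zone's reference meridian sees the sun set later by the clock.
    """
    # 360 degrees per 24 hours = 15 degrees per hour = 4 minutes per degree.
    return 4.0 * (zone_meridian_deg - longitude_deg)

if __name__ == "__main__":
    # Two hypothetical observers in a zone whose reference meridian is 75 W.
    east_side = clock_offset_minutes(-71.0, -75.0)   # 4 degrees east of the meridian
    west_side = clock_offset_minutes(-83.0, -75.0)   # 8 degrees west of the meridian
    print(east_side, west_side, west_side - east_side)
```

With these made-up positions, the two observers' clocks would disagree on the sunset time by 48 minutes even though they share a time zone, which is most of the height of one cell in the chart.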

am, I perceived the shadow band as well. It looks like it is 12 hours off the main band, which could be explained by a significant number of people having their cameras' clocks set 12 hours off, confusing the am and the pm.
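As a tiny illustration of the am/pm idea, here is what a 12-hour clock error would do to a set of sunset hours. The hours in the list are made-up examples, not data read off the chart.

```python
def shift_by_12_hours(hour):
    """Hour of day a camera would record if its clock were 12 hours off."""
    return (hour + 12) % 24

if __name__ == "__main__":
    # Hypothetical evening sunset hours on a 24-hour clock.
    sunset_hours = [16, 17, 18, 19, 20, 21]
    print([shift_by_12_hours(h) for h in sunset_hours])
```

The evening hours land in the early morning rows, so the faint band would follow the sunset curve shifted by 12 hours rather than the true sunrise curve, which bends the opposite way over the year.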
