The sad tally 2: the data
Nov 05, 2005
The last post contained a little riddle: which of the 9 graphs (if any) is different from the other 8? I will disclose the answer below, so to avoid the spoiler, read the previous post first.
Here is the data gleaned from the graphic in the SF Chronicle (any errors are purely mine):
Location,Frequency
101,31
99,29
97,35
95,35
93,25
91,24
89,12
87,25
85,32
83,12
81,15
79,12
77,22
75,17
73,17
71,40
69,61
67,28
65,13
63,12
61,19
59,11
57,11
55,12
53,13
51,11
49,4
47,16
45,17
43,18
There are two ways to solve the riddle.
First, one can think of it as a pattern matching problem: which of the 9 graphs contains a pattern that matches the one in the map? This really isn't the point I was trying to make, but I realize now that the question could have been interpreted this way. In this line of reasoning, one needs to identify the features that distinguish the pattern in the map. The most standout feature, for me, is the spike at location 69/70. Only the last two graphs contain spikes near this location, and more careful inspection will reveal the bottom right chart to have the real data.
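To make the pattern-matching idea concrete, here is a minimal Python sketch that locates the most prominent spike in the tally listed above. The names `TALLY`, `find_spike` and `spike_ratio` are hypothetical, and measuring the spike against the mean frequency is an assumption made for illustration, not a method taken from the Chronicle graphic.

```python
# Tally transcribed from the data listing above: (location, frequency) pairs.
TALLY = [
    (101, 31), (99, 29), (97, 35), (95, 35), (93, 25), (91, 24),
    (89, 12), (87, 25), (85, 32), (83, 12), (81, 15), (79, 12),
    (77, 22), (75, 17), (73, 17), (71, 40), (69, 61), (67, 28),
    (65, 13), (63, 12), (61, 19), (59, 11), (57, 11), (55, 12),
    (53, 13), (51, 11), (49, 4), (47, 16), (45, 17), (43, 18),
]


def find_spike(tally):
    """Return the (location, frequency) pair with the highest frequency."""
    return max(tally, key=lambda pair: pair[1])


def spike_ratio(tally):
    """Largest frequency divided by the mean frequency (an ad hoc measure)."""
    freqs = [freq for _, freq in tally]
    return max(freqs) / (sum(freqs) / len(freqs))


spike_location, spike_frequency = find_spike(TALLY)
print("spike at location", spike_location, "with frequency", spike_frequency)
print("ratio to mean frequency:", round(spike_ratio(TALLY), 2))
```

A candidate graph could then be checked by running the same two functions on its values and seeing whether its tallest bar falls near the same location.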
Alternatively, one can ignore the context (of the sad tally) and treat this as a problem of comparing probability distributions. This was my original intent. Is there an "odd man out" among the 9 distributions?
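One possible way to frame the comparison, sketched in Python under loud assumptions: suppose "random" means each of the 629 events is equally likely to land at any of the 30 locations. That uniform model is an assumption for illustration; the post does not say how the random charts were generated. The sketch computes a Pearson chi-square statistic against the uniform expectation for the real tally and for 8 simulated tallies. The function names and the seed are hypothetical.

```python
import random

# Frequencies in the order of the data listing above (location 101 down to 43).
FREQUENCIES = [
    31, 29, 35, 35, 25, 24, 12, 25, 32, 12, 15, 12, 22, 17, 17,
    40, 61, 28, 13, 12, 19, 11, 11, 12, 13, 11, 4, 16, 17, 18,
]


def chi_square_vs_uniform(freqs):
    """Pearson chi-square statistic against equal expected counts."""
    expected = sum(freqs) / len(freqs)
    return sum((f - expected) ** 2 / expected for f in freqs)


def simulate_uniform(total, bins, rng):
    """Drop `total` events into `bins` equally likely locations (assumed model)."""
    counts = [0] * bins
    for _ in range(total):
        counts[rng.randrange(bins)] += 1
    return counts


rng = random.Random(2005)  # fixed seed so that runs are repeatable
real_stat = chi_square_vs_uniform(FREQUENCIES)
random_stats = [
    chi_square_vs_uniform(
        simulate_uniform(sum(FREQUENCIES), len(FREQUENCIES), rng)
    )
    for _ in range(8)
]

print(f"real data: {real_stat:.1f}")
print("simulated:", [round(s, 1) for s in random_stats])
```

If the statistic for one data set sits far above the rest, that set is a candidate "odd man out" under this assumed model. Other summaries of a distribution could be swapped in for the chi-square statistic without changing the structure of the comparison.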
We now know that the bottom right chart contains the real data and the other 8 charts plot random data. If the real data is the "odd man out", which features of the distribution allow us to differentiate it from the other graphs? I'll discuss my findings on some features in the next post.
Cumulative distribution would be one.
Posted by: Robert | Nov 05, 2005 at 11:54 AM