John Shonder, a reader, alerted me to the following unusual chart which identifies the precise locations where people jumped from the Golden Gate Bridge:
He asks: is the choice of location "random"?
This is a very rich question and different statisticians will take different approaches. In this post, I take a purely visual, non-rigorous look at the question; and if I have time (and if other readers haven't commented already), I may discuss more rigorous methods in the future.
First, I restrict my attention to light poles 43 through 112, i.e. the bridge segment that lies above the water. Also, I only consider the north-south locations: in other words, 43 and 44 are counted as one, so are 111 and 112. Otherwise, the distribution is clearly biased (towards the water and the east side).
When we say "random", we usually mean there is an equal chance that someone will jump from location 43/44, from location 111/112, or from any location in between. There are 35 locations and 755 documented suicides, which averages to 21.6 suicides per location. But 21.6 is only an average, not something we should expect to observe at any one location; even if the choice of location is random, we would not find exactly 21 or 22 suicides at every location. (Similarly, even if there is a 50/50 chance of getting a head when we flip a coin, in any given run of 100 flips, it is very unlikely that we will see exactly 50 heads.)
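The arithmetic in this paragraph can be checked with a few lines of Python. This is a minimal sketch, not part of the original analysis; the variable names are made up for illustration, and the only inputs are the 35 locations and 755 suicides quoted above plus the assumption of a fair coin.

```python
from math import comb

N_LOCATIONS = 35   # light-pole pairs 43/44 through 111/112
N_JUMPS = 755      # documented suicides on this segment

# Average number of suicides per location under any allocation.
mean_per_location = N_JUMPS / N_LOCATIONS  # about 21.57

# Coin analogy: chance of exactly 50 heads in 100 fair flips.
p_exactly_50_heads = comb(100, 50) / 2**100  # about 0.08

# Under equal chances, the count at one location is Binomial(755, 1/35).
# Chance that a single given location ends up with exactly 21 or 22.
p = 1 / N_LOCATIONS
p_21_or_22 = sum(
    comb(N_JUMPS, k) * p**k * (1 - p) ** (N_JUMPS - k) for k in (21, 22)
)

print(round(mean_per_location, 2), round(p_exactly_50_heads, 3), round(p_21_or_22, 3))
```

Both probabilities come out well below one half, which is the point of the analogy: landing exactly on the average is the exception, not the rule.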
So, at some locations we will see more than 21.6 deaths; at others, fewer. The question becomes whether the fluctuations are large enough to refute the notion that the choice of location is random.
In the following set of graphs, I ran some simulations. Eight of the nine graphs represent scenarios in which I sent 755 people to the bridge and randomly assigned each of them one of the 35 locations to jump from (okay, this is a thought experiment only; please don't do this at home). The x-axis represents locations; the y-axis represents the number of suicides at each location, but on a standardized scale.
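One way to reproduce this kind of thought experiment is sketched below in Python. The post does not say what software or random number generator was used, so the function name, the seed, and the use of the standard library's `random` module are all assumptions made for illustration.

```python
import random

N_LOCATIONS = 35   # light-pole pairs 43/44 through 111/112
N_JUMPS = 755      # documented suicides on this segment
N_PANELS = 8       # simulated graphs (the ninth graph is the real data)


def simulate_counts(rng):
    """Assign each of the 755 people a location uniformly at random
    and return the number of people per location."""
    counts = [0] * N_LOCATIONS
    for _ in range(N_JUMPS):
        counts[rng.randrange(N_LOCATIONS)] += 1
    return counts


rng = random.Random(2024)  # arbitrary seed, only for repeatability
panels = [simulate_counts(rng) for _ in range(N_PANELS)]

for i, counts in enumerate(panels, start=1):
    print(f"panel {i}: min={min(counts)}, max={max(counts)}")
```

Each simulated panel always totals 755, but the minimum and maximum counts differ from panel to panel, which is the fluctuation the graphs are meant to show.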
The standardized scale allows us to compare across graphs. The zero line represents the mean number of suicides per location. The number of suicides at most locations is within one standard deviation of this mean (i.e. between -1 and 1 on the y-axis). In some extreme cases, the number of suicides is more than 3 standard deviations above the mean (i.e. greater than 3 on the y-axis).
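A standardized scale of this kind can be sketched in Python as follows. The post does not spell out how the standardization was done; this sketch assumes each graph is scaled by its own mean and (population) standard deviation, and it uses one simulated set of counts as stand-in data, not the real figures from the map.

```python
import random
from collections import Counter
from statistics import mean, pstdev

N_LOCATIONS = 35
N_JUMPS = 755

# Stand-in data: one random allocation of 755 people to 35 locations.
rng = random.Random(1)  # arbitrary seed, only for repeatability
tally = Counter(rng.choices(range(N_LOCATIONS), k=N_JUMPS))
counts = [tally[i] for i in range(N_LOCATIONS)]

# Assumed standardization: subtract the mean, divide by the standard deviation.
mu = mean(counts)       # always 755 / 35, about 21.57
sigma = pstdev(counts)  # spread of this particular graph
z = [(c - mu) / sigma for c in counts]

within_one = sum(abs(v) <= 1 for v in z)   # locations between -1 and 1
beyond_three = sum(v > 3 for v in z)       # locations above 3

print(f"{within_one} of {N_LOCATIONS} locations within one standard deviation")
print(f"{beyond_three} locations more than three standard deviations above the mean")
```

After this transformation every graph has mean 0 and standard deviation 1, so the vertical axes are directly comparable regardless of how spread out the raw counts were.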
Back to randomness: well, one of the 9 graphs is the real data from the map above. If you can guess which of the 9 is real, then the real data is probably not random. If you can't, then the real data may be random!
I will publish the answer tomorrow. In the meantime, feel free to take a guess and/or comment on what other approach you'd take. One take-away from this exercise is that it's very hard to tell non-random from random unless it is very obvious.