The hazard of casual analysis of hazards
Apr 04, 2011
Martin's quick analysis of the Japanese earthquake relative to other earthquakes produced a number of intriguing charts, including the following map which illustrates the "ring of fire" (Wiki link):
Imagine that one knows nothing about tectonic theory and geography. Such a map would be quite illuminating.
***
The comments section contains an exchange between two Martins about the hazard of non-experts plotting and then commenting on data they know little about. In the past, this conversation would take place behind closed doors as the statistician works with the geologist, each learning from the other. In our modern age, it is common for both sides to take up arms in public with colorful language.
The lesson for us spectators is the importance of understanding how data is collected, how data is defined, how data is processed, etc.
***
The plots Martin made do give me a chance to talk about a few interesting statistical issues relating to plotting data.
First, this bar chart of the number of earthquakes over time seems to indicate that earthquake activity has been increasing over time:
The chart doesn't have a vertical scale so it is hard to judge whether the growth is meaningful or not. The US Geological Survey doesn't seem to think this is a remarkable event, as the other Martin pointed out. One explanation is the increasing sensitivity of measuring equipment, which leads to more small-magnitude earthquakes being recorded over time.
Whether or not this hypothesis holds for earthquakes, the phenomenon is important and occurs often. One of the mysteries in the epidemiology of autism is the recent rise in the number of cases, but the picture is complicated by the much higher probability of diagnosis in recent times. Similarly, as the stigma against reporting rapes, harassment and other crimes dissipates, more such reports will be filed. How does one distinguish between higher reporting and higher incidence, both of which lead to higher counts?
This same phenomenon happened during the Toyota brake scare last year. It appeared as if the problem was getting worse while the scandal brewed. But as awareness of the potential risk increases, so too does the probability of someone reporting an issue.
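To make the ambiguity concrete, here is a minimal sketch in Python. It assumes the simplest possible model, in which the expected number of reports is the true number of incidents times the probability that an incident gets reported. The function name and all the numbers are hypothetical, chosen only for illustration.

```python
def expected_reports(true_incidents: int, reporting_prob: float) -> float:
    """Expected number of reports under the assumed model: incidents x P(report)."""
    return true_incidents * reporting_prob


# Scenario A (hypothetical): incidence is flat, reporting probability rises.
flat_incidence = [expected_reports(1000, p) for p in (0.2, 0.3, 0.4)]

# Scenario B (hypothetical): reporting probability is flat, incidence rises.
rising_incidence = [expected_reports(n, 0.2) for n in (1000, 1500, 2000)]

# Both scenarios yield the same rising series of counts (about 200, 300, 400),
# so the counts alone cannot tell us which story is true.
for a, b in zip(flat_incidence, rising_incidence):
    print(round(a), round(b))
```

Separating the two stories requires information from outside the counts, such as an independent estimate of the reporting probability.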
One other feature of the bar chart worth noting is the faux plunge in the number of earthquakes in 2011. My recommendation is either to omit the incomplete year, or to forecast the end-of-year count (labelling it clearly as a forecast).
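One simple way to produce such a forecast is to scale the year-to-date count up to a full year. The Python sketch below assumes a constant daily rate, which is a strong assumption for earthquakes (a large quake is followed by a burst of aftershocks), so treat the result as a rough placeholder. The function name and the count of 940 are hypothetical.

```python
from datetime import date


def annualized_count(count_so_far: int, as_of: date) -> float:
    """Scale a year-to-date count to a full year, assuming a constant daily rate."""
    year_start = date(as_of.year, 1, 1)
    next_year_start = date(as_of.year + 1, 1, 1)
    days_elapsed = (as_of - year_start).days + 1  # count the as_of day itself
    days_in_year = (next_year_start - year_start).days
    return count_so_far * days_in_year / days_elapsed


# Hypothetical: 940 earthquakes recorded through Apr 04, 2011 (day 94 of 365).
forecast = annualized_count(940, date(2011, 4, 4))
print(f"Forecast for 2011 (not an observed count): {forecast:.0f}")
```

On the chart, the forecast bar should look different from the observed bars (hatched or lighter, say) so that no one mistakes it for data.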
Finally, for data relating to rare events like earthquakes, one should try to take as long a view as possible. Twenty years is good but a hundred years would be better.
***
Next, Martin created this histogram that plots the number of earthquakes of different magnitudes over the last two decades or so.
As Martin noted, the Richter scale is a log scale, meaning a 1-unit increase along the horizontal axis is really a 10-fold increase in measured amplitude.
Martin's point is to show how strong the earthquake in Japan was compared to historical earthquakes. This is a useful and interesting question to ask, and his choice of graphics cannot be faulted.
What the other Martin was complaining about is the third corner of our Trifecta checkup: whether the right data has been used. Martin used what he was able to find. The histogram in fact looked quite nice, was rather symmetric, and suggested a tractable mathematical model. But if we proceed along that path, we would have chosen the wrong model.
The trap is a missing data problem that will not be evident to non-experts. As discussed above, small-magnitude quakes are not being tracked. Thus, the left side of the histogram severely under-represents the true frequency of such small tremors.
There is a Gutenberg-Richter law relating earthquake magnitude and frequency. It's a power law. Roughly speaking, for each 1-unit increase in Richter magnitude, earthquakes of that magnitude occur about one-tenth as frequently. This law fits the actual data really well when Richter magnitude is high, but at lower magnitudes geologists believe that our records severely under-count the quakes that actually occur.
In the chart shown on the right, which comes from this paper (PDF link) about predicting earthquakes, the dots sit on the line at higher magnitudes but the observed frequencies fall markedly below the line (projection) for magnitude lower than 2. It suggests that the bars on the left side of the histogram above should have been much taller than we are seeing. For a proper power law, the bars on the left should be the tallest of all!
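The relationship is usually written as log10(N) = a - b*M, where N is the number of earthquakes of magnitude M or greater, and b is typically close to 1. The Python sketch below shows how one might compare recorded counts against that line to gauge how much is missing at low magnitudes. The value of a, the recorded counts, and the function names are all hypothetical; they are not taken from the USGS data or from the paper.

```python
def gr_expected_count(magnitude: float, a: float, b: float = 1.0) -> float:
    """Expected number of quakes at or above `magnitude` under log10(N) = a - b*M."""
    return 10 ** (a - b * magnitude)


def recorded_fraction(recorded: float, magnitude: float, a: float, b: float = 1.0) -> float:
    """Recorded count as a fraction of what the fitted line predicts."""
    return recorded / gr_expected_count(magnitude, a, b)


A_HYPOTHETICAL = 8.0  # intercept chosen for illustration only

# With b = 1, each 1-unit increase in magnitude cuts the expected count tenfold.
for m in (4.0, 5.0, 6.0):
    print(m, gr_expected_count(m, A_HYPOTHETICAL))

# Hypothetical recorded counts: close to the line at magnitude 6,
# far below it at magnitude 2, where small quakes go unrecorded.
print(recorded_fraction(98, 6.0, A_HYPOTHETICAL))
print(recorded_fraction(50_000, 2.0, A_HYPOTHETICAL))
```

A fraction near 1 means the records track the law; a fraction far below 1 at low magnitudes is the signature of the missing-data problem described above.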
Hi Kaiser,
my comment should be called YANE, or Yet Another Non-Expert. We are clearly lacking the insight of a geological expert here, who can tell us the criteria that were used to "censor" smaller earthquakes - we just don't know.
Assuming that the mechanism is the same for the last 40 years for which we have data, we still can derive a whole lot of information from the sample, which - you are right - is "what I was able to find".
Under this assumption, I think my post should be free of statistical fallacies.
All in all, I am most disappointed in how poorly the data was described by the USGS, although they make it easily accessible to the public.
IF THERE IS ANY GEOLOGIST EXPERT LISTENING, PLEASE ENLIGHTEN US!
Martin
Posted by: Martin | Apr 04, 2011 at 11:29 AM
I'm wondering why Great Britain shows up so brightly in the map, even though it's well away from any plate boundaries.
Posted by: Tom West | Apr 04, 2011 at 12:40 PM
The stacked bar chart of the number of earthquakes over time stacks the most severe earthquakes on top, and makes adjacent severities so subtly different in colour that it's hard to see the boundaries.
I'd replace it with a stacked line chart with the highest severities on the bottom of the stack, so that any tendency, as time goes on, for less severe earthquakes to be recorded more conscientiously will be more obvious.
Posted by: derek | Apr 04, 2011 at 01:09 PM
Derek,
I agree, it is much better to look at the boxplot: http://www.theusRus.de/Blog-files/EQ-Boxplot.png.
The coloring was only used to get a linking between all three plots in a static representation.
Posted by: Martin | Apr 04, 2011 at 01:18 PM
Just a note. A 1-unit increase in Richter scale corresponds to a 31.6 (not 10) multiplicative factor.
http://en.wikipedia.org/wiki/Richter_magnitude_scale
Posted by: Antonio RInaldi | Apr 04, 2011 at 04:25 PM