Martin's quick analysis of the Japanese earthquake relative to other earthquakes produced a number of intriguing charts, including the following map, which illustrates the "ring of fire" (Wiki link):

Imagine that one knew nothing about plate tectonics or geography. To such a reader, this map would be quite illuminating.

***

The comments section contains an exchange between two Martins about the hazard of non-experts plotting data they know little about and then commenting on it. In the past, this conversation would have taken place behind closed doors, with the statistician working alongside the geologist and each learning from the other. In our modern age, it is common for both sides to take up arms in public, using colorful language.

The lesson for us spectators is the importance of understanding how data is collected, how data is defined, how data is processed, etc.

***

The plots Martin made do give me a chance to talk about a few interesting statistical issues relating to plotting data.

First, this bar chart of the number of earthquakes over time seems to indicate that earthquake activity has been increasing over time:

The chart doesn't have a vertical scale, so it is hard to judge whether the growth is meaningful. As the other Martin pointed out, the US Geological Survey does not seem to regard the increase as remarkable. One explanation is the increasing sensitivity of measuring equipment, which leads to more small-magnitude earthquakes being recorded over time.
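To see how a change in detection alone can manufacture a trend, here is a small simulation sketch. Every input is a hypothetical assumption, not a model of the real catalogue: the number of true quakes is held constant every year, magnitudes are drawn from the exponential distribution implied by the Gutenberg-Richter law (discussed further below) with a b-value of 1, and the detection threshold simply drops from magnitude 4.0 to 3.0 over a decade.

```python
import math
import random


def recorded_counts(thresholds, true_per_year=20_000, b_value=1.0, m_min=2.0, seed=1):
    """Simulate yearly *recorded* quake counts when only detection changes.

    Hypothetical assumptions, for illustration only:
    - every year has the same number of true quakes with magnitude >= m_min;
    - magnitudes follow a Gutenberg-Richter distribution, i.e. they are
      exponential above m_min with rate b_value * ln(10);
    - a quake is recorded if and only if its magnitude is >= that year's threshold.
    """
    rng = random.Random(seed)
    rate = b_value * math.log(10)
    counts = []
    for threshold in thresholds:
        recorded = sum(
            1
            for _ in range(true_per_year)
            if m_min + rng.expovariate(rate) >= threshold
        )
        counts.append(recorded)
    return counts


# Hypothetical: the detection threshold improves from magnitude 4.0 to 3.0 over 11 years.
thresholds = [4.0 - 0.1 * year for year in range(11)]
counts = recorded_counts(thresholds)
for t, c in zip(thresholds, counts):
    print(f"threshold {t:.1f}: {c} recorded")
```

Under these assumptions the expected recorded count rises about tenfold across the decade even though the underlying rate of earthquakes never changes.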

Whether or not this hypothesis holds for earthquakes, the phenomenon is important and occurs often. One of the mysteries in the epidemiology of autism is the recent rise in the number of cases, but the picture is complicated by the much higher probability of diagnosis in recent times. Similarly, as the stigma against reporting rape, harassment and other crimes dissipates, more such reports will be filed. How does one distinguish between higher reporting and higher incidence, both of which lead to higher counts?

The same phenomenon occurred during the Toyota brake scare last year. It appeared as if the problem was getting worse while the scandal brewed. But as awareness of the potential risk increased, so too did the probability of someone reporting an issue.

One other feature of the bar chart worth noting is the faux plunge in the number of earthquakes in 2011, which reflects an incomplete year rather than a quieter one. My recommendation is either to omit the incomplete year or to forecast the end-of-year count (labelling it clearly as a forecast).
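The crudest version of such a forecast scales the year-to-date count up to a full year. The sketch below assumes earthquakes are spread evenly through the calendar, which is a strong simplification; the function name and the example figures are hypothetical, and whatever it produces should still carry a "forecast" label on the chart.

```python
from datetime import date


def prorate_year_count(count_so_far, as_of):
    """Naive full-year forecast by scaling the year-to-date count.

    Assumes events are spread evenly over the calendar year (a simplification).
    `count_so_far` should include events on the `as_of` day itself.
    """
    start = date(as_of.year, 1, 1)
    end = date(as_of.year + 1, 1, 1)
    days_elapsed = (as_of - start).days + 1  # count the as_of day as elapsed
    days_in_year = (end - start).days        # 365 or 366
    return count_so_far * days_in_year / days_elapsed


# Hypothetical: 73 quakes recorded through 14 March 2011 (day 73 of the year).
print(prorate_year_count(73, date(2011, 3, 14)))
```

With those hypothetical figures the forecast is 365 quakes for the year. A seasonal or model-based forecast would be better, but even this rough version avoids the false impression of a plunge.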

Finally, for data relating to rare events like earthquakes, one should try to take as long a view as possible. Twenty years is good but a hundred years would be better.
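One way to see why a longer window helps: if we treat the count of large quakes as roughly Poisson (an assumption, not something established here), the relative uncertainty of an expected total of n events is about 1/sqrt(n). The rate of one large quake per year below is hypothetical.

```python
import math


def poisson_relative_error(expected_count):
    """Relative standard error of a Poisson count: sqrt(mu) / mu = 1 / sqrt(mu)."""
    return 1 / math.sqrt(expected_count)


RATE_PER_YEAR = 1.0  # hypothetical rate of large quakes per year
for years in (20, 100):
    mu = RATE_PER_YEAR * years
    print(f"{years} years: expect {mu:.0f} events, "
          f"relative error {poisson_relative_error(mu):.0%}")
```

Under that assumption, going from twenty years to a hundred cuts the relative noise from about 22% to 10%.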

***

Next, Martin created this histogram that plots the number of earthquakes of different magnitudes over the last two decades or so.

As Martin noted, the Richter scale is logarithmic: a 1-unit increase along the horizontal axis corresponds to a tenfold increase in the measured amplitude of the shaking.
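The arithmetic behind the log scale is easy to sketch. The function names are hypothetical, and the energy ratio uses the standard textbook approximation that radiated energy grows by a factor of 10^1.5 (about 32) per magnitude unit, so energy differences are even starker than amplitude differences.

```python
def amplitude_ratio(m1, m2):
    """Ratio of recorded wave amplitudes implied by a magnitude difference."""
    return 10 ** (m1 - m2)


def energy_ratio(m1, m2):
    """Approximate ratio of radiated energy, using log10(E) ~ 1.5 * M + constant."""
    return 10 ** (1.5 * (m1 - m2))


# A magnitude 9.0 quake versus a magnitude 7.0 quake.
print(amplitude_ratio(9.0, 7.0))  # 100.0
print(energy_ratio(9.0, 7.0))     # 1000.0
```

A two-unit gap on the histogram's horizontal axis is therefore a hundredfold difference in amplitude and roughly a thousandfold difference in energy.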

Martin's point is to show how strong the earthquake in Japan was compared to historical earthquakes. This is a useful and interesting question to ask, and his choice of graphics cannot be faulted.

What the other Martin was complaining about is the third corner of our Trifecta checkup: whether the right data has been used. Martin used what he was able to find. The histogram in fact looked quite nice: it was rather symmetric and suggested a tractable mathematical model. But had we proceeded along that path, we would have chosen the wrong model.

The trap is a missing-data problem that will not be evident to non-experts. As discussed above, many small-magnitude quakes go unrecorded, so the left side of the histogram severely under-represents the true frequency of small tremors.

The Gutenberg-Richter law relates earthquake magnitude to frequency. It's a power law. Roughly speaking, for each 1-unit increase in Richter magnitude, earthquakes occur about one-tenth as frequently; equivalently, each 1-unit decrease brings about ten times as many quakes. The law fits the actual data really well at high magnitudes, but at lower magnitudes, geologists believe that our records are severely under-counted.
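In its usual form the law reads log10(N) = a - b*M, where N is the number of quakes of magnitude M or larger and the b-value is typically close to 1. Here is a minimal sketch; the productivity constant a = 8 is hypothetical, chosen only to make the numbers readable.

```python
def gr_expected_count(magnitude, a, b=1.0):
    """Expected number of quakes with magnitude >= `magnitude` under the
    Gutenberg-Richter law log10(N) = a - b * M."""
    return 10 ** (a - b * magnitude)


# Hypothetical a = 8: with b = 1, each unit drop in magnitude multiplies the count by 10.
for m in [7, 6, 5, 4, 3]:
    print(f"M >= {m}: {gr_expected_count(m, a=8):,.0f}")
```

Plotted as log-count against magnitude this is a straight line with slope -b, which is why the smallest magnitudes should carry the largest counts.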

In the chart shown on the right, which comes from this paper (PDF link) about predicting earthquakes, the dots sit on the line at higher magnitudes, but the observed frequencies fall markedly below the line (the projection) for magnitudes below 2. This suggests that the bars on the left side of the histogram above should have been much taller than we are seeing. For a proper power law, the bars on the left should be the tallest of all!
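The comparison in that chart (fit the line where the catalogue is complete, then project it down to small magnitudes) can be sketched with simulated data. All inputs are hypothetical: the detector's 50% point at magnitude 2, its smoothness, the completeness cutoff of 3 and the catalogue size are illustrative choices, not values from the paper. The b-value is estimated with Aki's maximum-likelihood formula, b = log10(e) / (mean magnitude - Mc), a standard estimator that is not necessarily the method the paper used.

```python
import math
import random

LOG10_E = math.log10(math.e)


def simulate_recorded(n, b_value=1.0, m50=2.0, width=0.3, seed=7):
    """Simulate a catalogue with a hypothetical detection problem.

    True magnitudes follow a Gutenberg-Richter law above magnitude 0
    (exponential with rate b_value * ln 10). Each quake is recorded with a
    probability that rises smoothly with magnitude and equals 50% at m50,
    so most small quakes are missed.
    """
    rng = random.Random(seed)
    rate = b_value * math.log(10)
    recorded = []
    for _ in range(n):
        m = rng.expovariate(rate)
        p_detect = 1 / (1 + math.exp(-(m - m50) / width))
        if rng.random() < p_detect:
            recorded.append(m)
    return recorded


def aki_b_value(magnitudes, m_c):
    """Maximum-likelihood b-value (Aki's estimator) using quakes >= m_c."""
    above = [m for m in magnitudes if m >= m_c]
    mean_excess = sum(above) / len(above) - m_c
    return LOG10_E / mean_excess, len(above)


def compare_with_projection(magnitudes, m_c, edges):
    """Observed vs projected counts per bin, extrapolating the law fitted above m_c."""
    b, n_above = aki_b_value(magnitudes, m_c)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        observed = sum(1 for m in magnitudes if lo <= m < hi)
        # N(>= m) = n_above * 10^(-b (m - m_c)); a bin's count is the difference.
        projected = n_above * (10 ** (-b * (lo - m_c)) - 10 ** (-b * (hi - m_c)))
        rows.append((lo, hi, observed, projected))
    return b, rows


recorded = simulate_recorded(200_000)
edges = [0.5 * i for i in range(11)]  # bins 0.0-0.5, ..., 4.5-5.0
b, rows = compare_with_projection(recorded, m_c=3.0, edges=edges)
print(f"fitted b-value: {b:.2f}")
for lo, hi, observed, projected in rows:
    print(f"M {lo:.1f}-{hi:.1f}: observed {observed:6d}, projected {projected:9.0f}")
```

With these settings the recorded counts rise and then fall as magnitude decreases, producing the same kind of hump as Martin's histogram, while the projected counts keep growing roughly tenfold per unit of magnitude.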