... when all he's saying is something like I can tell you there is global warming but I can't tell you whether it will rain tomorrow.
And I am not the least bit convinced.
The reporter seems to be confused about what exactly this guy claims to be able to do. At one point, he tells us:
Clauset hopes, for example, that his work will enable predictions of when terrorists might get their hands on a nuclear, biological or chemical weapon — and when they might use it.
Later, the reporter says:
Clauset’s method is unlikely to predict exactly where or when an attack might occur. Instead, he deals in probabilities that unfold over months, years and decades.
So which is it? Is he predicting or not?
If you read the full article, you'll find that Clauset is very modest about what his research could do, so the impression of something ground-breaking is more of a journalistic flourish. Put differently, he may be contributing to our understanding of patterns of occurrence at a high level of abstraction, but he is not in any way focused on predicting the next attack.
***
This research has one fundamental limitation: it cannot be falsified. Macroeconomic forecasting and climate modeling share the same problem: we only have one history (technically, one sample path). Any number of models, with sufficient complexity, can fit that history.
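To make the "one sample path" point concrete, here is a toy sketch. The four data points and both models are made up for illustration and have nothing to do with Clauset's data or methods. Two different models reproduce the same short history exactly, yet disagree wildly once you extrapolate.

```python
from fractions import Fraction

# Hypothetical "history": four observations at t = 0, 1, 2, 3.
history = [(0, 1), (1, 2), (2, 4), (3, 8)]


def model_a(t):
    """Model A: exponential growth, y = 2**t."""
    return Fraction(2) ** t


def model_b(t):
    """Model B: the cubic polynomial through the same four points,
    evaluated by Lagrange interpolation (exact rational arithmetic)."""
    total = Fraction(0)
    for i, (ti, yi) in enumerate(history):
        term = Fraction(yi)
        for j, (tj, _) in enumerate(history):
            if i != j:
                term *= Fraction(t - tj, ti - tj)
        total += term
    return total


def in_sample_error(model):
    """Sum of absolute errors over the observed history."""
    return sum(abs(model(t) - y) for t, y in history)


# Both models fit the history perfectly...
error_a = in_sample_error(model_a)
error_b = in_sample_error(model_b)

# ...but give very different answers at t = 10.
forecast_a = model_a(10)
forecast_b = model_b(10)

print("in-sample errors:", error_a, error_b)
print("forecasts at t=10:", forecast_a, forecast_b)
```

With only the history to go on, nothing in the data favors one model over the other; the choice only gets tested when the future arrives.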
I think of these kinds of data mining exercises as essentially descriptive in nature. I have a problem with conflating them with "predictive modeling". The only way to assess predictive models is their predictive performance. It is very difficult to validate predictions of events that occur extremely rarely. Because of this, people in the terrorist prediction business get a free pass: how do you know whether they can predict the future or not?
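A back-of-the-envelope sketch shows why rare events are so hard to validate. The two annual probabilities below are hypothetical, and I am assuming years are independent, which is itself a modeling choice. The question is how much a quiet record tells us about which of two forecasters is closer to the truth.

```python
def prob_no_event(p_annual, years):
    """Probability of seeing zero events in `years` independent years."""
    return (1 - p_annual) ** years


def likelihood_ratio(p_low, p_high, years):
    """How much more likely a quiet record is under p_low than under p_high."""
    return prob_no_event(p_low, years) / prob_no_event(p_high, years)


# Two hypothetical forecasters: one says 1-in-1000 per year, the other 1-in-100.
P_LOW, P_HIGH = 0.001, 0.01

for years in (10, 50, 100):
    print(
        years,
        round(prob_no_event(P_LOW, years), 3),
        round(prob_no_event(P_HIGH, years), 3),
        round(likelihood_ratio(P_LOW, P_HIGH, years), 2),
    )
```

After ten quiet years the likelihood ratio is only about 1.09, so the data barely distinguish a forecast from one ten times larger. Even a century of quiet gets you to roughly 2.5 to 1.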
It turns out that this researcher has already had to retract a "prediction":
In a 2005 draft of their paper, Clauset and his collaborators projected that another 9/11-magnitude attack would occur within seven years, a finding that sparked newspaper headlines (“Physicists Predict Next 9/11 In Seven Years”). Clauset now says there were too many uncertainties in the data to make such a specific prediction. “What we had said was, if the future is exactly like the past and the assumptions of the model are correct, this is what you would expect,” he says. “But that number I don’t trust.”
This failure is presented as a lesson learned. In fact, this is one of the first lessons of data mining, perhaps even the first lesson. And yet, we are asked to trust him. We are told he "has been invited to consult with the Department of Defense, the Department of Homeland Security and other government agencies."
So, when he makes a prediction like "It’s well within the realm of possibility within the next 50 years that a low-yield nuclear bomb is detonated as a terrorist attack somewhere in the world," I don't know on what basis I should believe it. But the reporter has no such issues -- according to him, "Clearly, that is an eventuality society might want to be prepared for." I am unable to locate this "clarity".
***
The second lesson of data mining (or any kind of statistical modeling, for that matter) is what I have called the false belief in true models. Unfortunately, Clauset is not immune to this either. Consider this passage (my italics):
For example, knowing a group’s size should enable governments and law enforcement to gauge the true threat it poses (because the power law proves that size determines the frequency with which it can attack).
“It tells you that while a lot of things are flexible — different terrorist organizations are very different — there are a couple of things that they can’t change,” Clauset says. “That means that even if they know that we know this, they can’t do anything about it.”
So, because he's found patterns "over months, years, and decades", these patterns will simply have to persist in the future because the mathematics say they have to. These patterns are a kind of gravity that people can do nothing about. There is an implicit assumption that past history contains all useful information about the future.
***
If you're still reading, this is turning into a primer on data mining. Third lesson: models do not make decisions; people make decisions. This means that you can never keep politics out. Besides, the person building models has to make lots of choices during the construction process, and each such choice is subjective, and can have strong impacts on the nature of the models. (You only have to sample the controversies surrounding climate models to see what I mean.)
So it is folly to believe that this work is "less prone to ideological distortion" or that "it forces the conversation to remain analytical and apolitical". The right reference point is not model v. no model but model A v. model B. My experience is that a standoff between model A and model B is every bit as political and ideological as you could imagine. And that's because there is no such thing as a "true model" (certainly not in a social science or business setting).


Regarding the "Clearly..." comment, remember this:
When someone says "Clearly," it's not so clear (otherwise they wouldn't have needed to say "clearly" in the first place).
When someone says "I'm sure that," they're not sure.
When someone says "Obviously," etc.
This is a famous principle in mathematics. If someone writes, "We omit the proof because it is so simple," it really means that the proof is difficult.
Posted by: Andrew Gelman | 01/06/2011 at 09:04 AM
von Neumann: "With four parameters I can fit an elephant, and with five I can make him wiggle his trunk"
Posted by: Tom West | 01/06/2011 at 10:14 AM
“Terrorist attacks happen less often in the developed world, but when they do happen, they’re often bigger than in the developing world"
Doesn't this invalidate the notion of a universal power law relating the size and frequency of attacks? Or at least render it an artifact of the mixture of a number of dissimilar functions that apply under different circumstances?
Posted by: Morgan | 01/06/2011 at 11:15 AM
"Prediction is very difficult, especially if it's about the future." is the famous quote. Although in fact it is quite easy for rare events. If I claim a 1 in 1000 chance of something happening this year, it will either happen or it won't; either way I'm right.
Posted by: Ken | 01/06/2011 at 04:41 PM
James Wei showed in the 70s that you need 20 or 30 parameters to fit an elephant. There's a demo on Wolfram: http://demonstrations.wolfram.com/FittingAnElephant/
Posted by: Alex Cook | 01/09/2011 at 08:28 AM