The overheard conversation from the other day suggests a curiosity about a supremely important aspect of statistical predictions -- that of validating models.

In essence, the men on the street were using the outcome (rain drops) to lay down a judgment on the weather forecasters.

How should one use statistical thinking to deal with this problem?

***

Forecasters issue probabilities, e.g. a 20 percent chance of rain today. At the end of each day, the one-day-ahead forecast from the prior day can be evaluated: either it rained or it did not. While the forecast is a probability, the outcome is a binary yes/no, and this is why validation is not straightforward. As the men on the street showed, one day's worth of data (one Yes or one No) does not provide sufficient evidence to judge the accuracy of the forecast.

When a statistician predicts a 20 percent chance of something happening, she is imagining alternative universes. As she exists today, she imagines all possible tomorrows; in one universe, tomorrow would bring 1 inch of rain, in a different universe, tomorrow would bring 0.1 inch, and so on. When she says 20 percent chance, she means that in 20 percent of all possible scenarios for tomorrow, she expects to see rain.

Since tomorrow only occurs once, we have a problem: we cannot directly verify such an assertion. What to do?

***

We can indirectly verify this assertion. Do the following:

- On every day in which the forecast is given to be 20 percent, note the actual outcome (rain or no rain).
- Calculate the proportion of such days in which it actually rained.
- Do this also for 30-percent forecasts, 40-percent forecasts, etc. (in reality, you would establish ranges of probabilities rather than exact probabilities)
- If the actual proportion of rainy days mirrors the forecasted probabilities, then we have confidence that the forecasters are doing a good job.
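The steps above can be sketched in a few lines of Python. This is only an illustration: the function name `calibration_table`, the 10-point bin width, and the sample data are all assumptions made for the example, not anything a forecaster actually uses.

```python
from collections import defaultdict

def calibration_table(forecasts, outcomes, bin_width=0.1):
    """Group forecasts into probability ranges and compare each range
    with the proportion of days on which it actually rained.

    forecasts: probabilities between 0 and 1
    outcomes:  1 if it rained that day, 0 if not
    Returns rows of (range_low, range_high, n_days, mean_forecast, observed_rate).
    """
    n_bins = round(1 / bin_width)
    days_by_bin = defaultdict(list)
    for p, rained in zip(forecasts, outcomes):
        # The small epsilon guards against floating-point error at range edges;
        # a forecast of exactly 1.0 is placed in the top range.
        idx = min(int(p * n_bins + 1e-9), n_bins - 1)
        days_by_bin[idx].append((p, rained))

    rows = []
    for idx in sorted(days_by_bin):
        days = days_by_bin[idx]
        n = len(days)
        mean_forecast = sum(p for p, _ in days) / n
        observed_rate = sum(r for _, r in days) / n
        rows.append((idx / n_bins, (idx + 1) / n_bins, n, mean_forecast, observed_rate))
    return rows

# Made-up data: five days with a 20 percent forecast, five with 80 percent.
forecasts = [0.2] * 5 + [0.8] * 5
outcomes = [1, 0, 0, 0, 0] + [1, 1, 1, 1, 0]
for low, high, n, mean_f, rate in calibration_table(forecasts, outcomes):
    print(f"{low:.1f}-{high:.1f}: {n} days, mean forecast {mean_f:.2f}, rained {rate:.2f}")
```

With only a handful of days in each range, the observed proportion is very noisy, so in practice one would want many forecasts per range before drawing any conclusion.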

Notice the sleight of hand. In the idealistic version, the base consists of all possible tomorrows, and each day-ahead forecast could be validated. In this realistic version, we abandon hope of validating each forecast and shift our attention to validating groups of forecasts.

Please leave a comment if you have seen this sort of validation for weather forecasts, or if you know of other ways to do validation.

I've always been amused at weather sites like wunderground.com and weather.com that will claim a 10% chance of rain on a given day, but then you click the link for the hour-by-hour forecast, and it lists a 10% chance of rain at 9AM, 10% chance at 10AM, 10% chance at 11AM... If all of these probabilities were independent (which I realize they aren't), then a 10% chance of rain every hour for 24 hours would lead to around a 92% chance of rain on that day. I highly doubt these hourly forecasts would stand up well to validation.
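As a quick check of that arithmetic, still under the unrealistic assumption that the hours are independent, the chance of at least one rainy hour is one minus the chance that every hour stays dry. The function name below is made up for the illustration.

```python
def prob_rain_at_least_once(hourly_prob, hours):
    """Chance of rain in at least one hour, ASSUMING the hours are independent."""
    prob_all_dry = (1 - hourly_prob) ** hours
    return 1 - prob_all_dry

# A 10% chance in each of 24 hours gives roughly a 92% chance for the day.
print(round(prob_rain_at_least_once(0.10, 24), 3))
```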

Posted by: Brad | 06/04/2010 at 08:29 AM

I managed to dig up an article that I remembered running across a while ago that discusses this:

http://www.nssl.noaa.gov/users/brooks/public_html/prob/Probability.html

Ideally, a validation system would do a few things simultaneously; however, I'm not sure it's actually possible to do all of them at the same time.

1) Make sure that it rains on 10% of the days for which a 10% forecast is given.

2) "Punish" the forecaster for playing the system - for example, if the 10% forecasts have been averaging 8%, giving a 10% forecast on a couple of days when rain is extremely likely, just to balance things out.

3) "Reward" the forecaster for making their forecast as different as possible from the past average for that time period. If all the days in July over the past 50 years have had rain on 15% of the days, making a 15% forecast provides no additional knowledge.

Posted by: Nick | 06/04/2010 at 10:54 AM

I always thought probability also included an element of spatial probability (even if there is rain in an "area" it may not hit the whole area) and Nick's NOAA link confirms that (although it does convey a lot of things that I had not considered). Any validation would involve a very complex physical mapping of rain "coverage".

Posted by: floormaster squeeze | 06/04/2010 at 12:53 PM

I remember when I was young my father telling me that a 30% chance of rain meant that at any given time, it would be raining over 30% of the forecast area. I asked him, wouldn't that be 100% chance of rain?

Posted by: gary | 06/04/2010 at 03:04 PM

Have you looked at "proper scoring rules"? They might be just what you are looking for.

For example, each day the forecaster says that the probability of rain is p. Then each day give a score of 1+ln p if it rains and a score of 1+ln(1-p) if it doesn't.

This rule gives the right incentives to the forecasters to choose a p that best represents their knowledge. I believe that some weather forecasters are scored based on proper scoring rules such as this one.
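A small sketch of why the incentives work out, assuming the forecaster's real belief is a single probability `true_p`: the average score she can expect is highest when the p she reports equals that belief. The helper names and the grid search are just for illustration.

```python
import math

def log_score(p, rained):
    """Score from the rule above: 1 + ln p if it rains, 1 + ln(1 - p) if not."""
    return 1 + math.log(p if rained else 1 - p)

def expected_score(reported_p, true_p):
    """Average score for reporting reported_p when rain really occurs with chance true_p."""
    return (true_p * log_score(reported_p, True)
            + (1 - true_p) * log_score(reported_p, False))

def best_report(true_p):
    """Search reports 0.01, 0.02, ..., 0.99 for the one with the highest expected score."""
    candidates = [i / 100 for i in range(1, 100)]
    return max(candidates, key=lambda q: expected_score(q, true_p))

# If the forecaster really believes 20%, reporting 20% beats shading the
# forecast up or down.
print(best_report(0.2))
print(expected_score(0.2, 0.2), expected_score(0.1, 0.2), expected_score(0.5, 0.2))
```

Note that the rule is harsh on overconfidence: a report of exactly 0 or 1 that turns out wrong would score minus infinity, which is why the search above stops short of those endpoints.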

Posted by: Bill | 06/04/2010 at 05:25 PM

10 or so years ago when I took judgment and decision making, we discussed the relationship between accuracy and confidence and, indeed, that type of measure was brought up to test that relationship. And I distinctly remember that meteorologists were actually very good at this.

This type of measure is used when you research meta-cognition. So, for example, when we research eyewitness memory here at Lund we ask people to first recall what happened in a film, and then they note how confident they are in their memory. And, as part of the analysis of meta-cognition (how good you are at judging how good your memories are), we come up with that kind of chart - and a number of additional measures.

This has also been done on metacognition about your own performance (main finding - those who perform the worst tend to be bad at judging how bad they are).

http://www.informaworld.com/smpp/content~content=a905567536&db=all

That's a link to our paper (which I would guess is behind a university library wall, for those who do not have university access).

Posted by: Åse | 06/06/2010 at 02:46 AM

Yudkowsky explains scoring rules pretty well at http://yudkowsky.net/rational/technical#probability-density.

See http://predictionbook.com/ for the only publicly available free app I know of that'll help you keep track of your calibration as you suggest (disclaimer: personal interest in PredictionBook).

Posted by: Matthew Fallshaw | 06/06/2010 at 02:51 AM