
I've always been amused by weather sites like wunderground.com and weather.com that claim a 10% chance of rain on a given day, but when you click through to the hour-by-hour forecast, it lists a 10% chance of rain at 9AM, a 10% chance at 10AM, a 10% chance at 11AM... If all of these probabilities were independent (which I realize they aren't), a 10% chance of rain in every one of 24 hours would imply about a 92% chance of rain that day. I highly doubt these hourly forecasts would hold up well under validation.
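The 92% figure is the complement rule: if each of 24 hours independently had a 10% chance of rain, the chance of a completely dry day would be 0.9^24, roughly 0.08. A minimal Python check, under that same (unrealistic) independence assumption; the function name is just for illustration:

```python
def chance_of_any_rain(p_hour: float, hours: int = 24) -> float:
    """P(rain in at least one hour) = 1 - P(dry in every hour).

    Assumes (unrealistically) that every hour independently has
    the same chance of rain, p_hour.
    """
    return 1.0 - (1.0 - p_hour) ** hours


if __name__ == "__main__":
    print(f"{chance_of_any_rain(0.10):.1%}")  # 92.0%
```

In reality rain in neighbouring hours is strongly correlated, so the true daily figure sits somewhere between 10% and this upper-end number.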

I managed to dig up an article that I remembered running across a while ago that discusses this:
http://www.nssl.noaa.gov/users/brooks/public_html/prob/Probability.html
Ideally, a validation system would do a few things simultaneously; however, I'm not sure it's actually possible to do all of them at the same time.
1) Make sure that, of all the days on which a 10% forecast is given, it rains on about 10% of them.
2) "Punish" the forecaster for gaming the system: for example, if it has rained on only 8% of the days with a 10% forecast, issuing a 10% forecast on a couple of days when rain is extremely likely, just to bring the average back up.
3) "Reward" the forecaster for making forecasts that differ as much as possible from the historical average for that time period. If it has rained on 15% of July days over the past 50 years, a 15% forecast for a July day adds no information.
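Criteria 1 and 3 above can be checked numerically. A minimal sketch, assuming each day has one stated probability and a yes/no rain outcome (the function names and the toy data are made up for illustration): a calibration table groups days by stated probability and compares it with how often it actually rained, and the Brier skill score, a standard verification measure in meteorology, compares the forecasts against always quoting the climatological rate.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes):
    """Criterion 1: for each stated probability, how often did it actually rain?

    forecasts: stated probabilities of rain, e.g. 0.1 for "10%"
    outcomes:  1 if it rained that day, 0 if not
    Returns {stated probability: (number of days, observed rain frequency)}.
    """
    groups = defaultdict(list)
    for p, rained in zip(forecasts, outcomes):
        groups[round(p, 2)].append(rained)
    return {p: (len(obs), sum(obs) / len(obs)) for p, obs in sorted(groups.items())}


def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probability and outcome (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def brier_skill_score(forecasts, outcomes, climatology):
    """Criterion 3: improvement over always quoting the long-run (climatological) rate.

    1.0 is a perfect forecast, 0.0 is no better than climatology, negative is worse.
    """
    reference = [climatology] * len(outcomes)
    return 1.0 - brier_score(forecasts, outcomes) / brier_score(reference, outcomes)


if __name__ == "__main__":
    # Made-up illustrative data, not real forecasts.
    forecasts = [0.1] * 10 + [0.6] * 5
    outcomes = [0] * 9 + [1] + [1, 1, 1, 0, 0]
    print(calibration_table(forecasts, outcomes))  # {0.1: (10, 0.1), 0.6: (5, 0.6)}
    print(round(brier_skill_score(forecasts, outcomes, climatology=0.15), 3))  # 0.331
```

The calibration table alone cannot catch the gaming in point 2, because padding the 10% group with a few near-certain rain days brings its average back into line. The Brier score, however, is a proper score, so misreporting like that makes the forecaster's score worse on average.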

I always thought the probability also included a spatial element (even if it rains somewhere in an "area", it may not rain over the whole area), and Nick's NOAA link confirms that, although it also raises a lot of things I had not considered. Any validation would require a very complex physical mapping of rain "coverage".

I remember my father telling me, when I was young, that a 30% chance of rain meant that at any given time it would be raining over 30% of the forecast area. I asked him: wouldn't that be a 100% chance of rain?
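The US National Weather Service defines the probability of precipitation as PoP = C × A, where C is the forecaster's confidence that measurable precipitation will fall somewhere in the forecast area and A is the fraction of the area expected to receive it if it does. A tiny sketch (the function name is made up for illustration) shows why the "30% of the area" reading is only one possibility: several different combinations of C and A all come out as a 30% forecast.

```python
def probability_of_precipitation(confidence: float, area_fraction: float) -> float:
    """PoP = C x A (US National Weather Service definition).

    confidence:    C, confidence that measurable precipitation will fall
                   somewhere in the forecast area (0..1)
    area_fraction: A, fraction of the area expected to get it if it does (0..1)
    """
    if not (0.0 <= confidence <= 1.0 and 0.0 <= area_fraction <= 1.0):
        raise ValueError("confidence and area_fraction must be between 0 and 1")
    return confidence * area_fraction


if __name__ == "__main__":
    # Three different situations that all produce a "30% chance of rain".
    for c, a in [(1.0, 0.3), (0.3, 1.0), (0.6, 0.5)]:
        print(f"C={c:.0%}, A={a:.0%} -> PoP={probability_of_precipitation(c, a):.0%}")
```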

Have you looked at "proper scoring rules"? They might be just what you are looking for.

For example, each day the forecaster states that the probability of rain is p. Then score that day 1 + ln p if it rains and 1 + ln(1 - p) if it doesn't.

This rule gives forecasters the right incentive to choose the p that best represents their knowledge. I believe some weather forecasters are scored using proper scoring rules such as this one.
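A quick numerical check of the incentive claim: if a forecaster privately believes the chance of rain is 30%, which announced p gives the highest expected score under the 1 + ln p / 1 + ln(1 - p) rule? Analytically, the expected score q ln p + (1 - q) ln(1 - p) + 1 peaks at p = q. The sketch below (function names are my own, for illustration) confirms this by searching p from 1% to 99%.

```python
import math


def log_score(p: float, rained: bool) -> float:
    """The rule above: 1 + ln p if it rains, 1 + ln(1 - p) if it doesn't (higher is better)."""
    if not 0.0 < p < 1.0:
        # ln 0 is -infinity: a 0% or 100% forecast that turns out wrong is punished without limit.
        raise ValueError("p must be strictly between 0 and 1")
    return 1.0 + math.log(p if rained else 1.0 - p)


def expected_score(reported: float, believed: float) -> float:
    """Expected score when the forecaster believes `believed` but announces `reported`."""
    return believed * log_score(reported, True) + (1.0 - believed) * log_score(reported, False)


if __name__ == "__main__":
    belief = 0.3
    candidates = [i / 100 for i in range(1, 100)]  # 1%, 2%, ..., 99%
    best = max(candidates, key=lambda r: expected_score(r, belief))
    print(best)  # 0.3: announcing the honest belief maximizes expected score
    # Shading the forecast in either direction costs points on average.
    print(expected_score(0.1, belief) < expected_score(0.3, belief))  # True
    print(expected_score(0.5, belief) < expected_score(0.3, belief))  # True
```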

About 10 years ago, when I took a course in judgment and decision making, we discussed the relationship between accuracy and confidence, and this type of measure was brought up as a way to test that relationship. I distinctly remember that meteorologists turned out to be very good at this.

This type of measure is also used in research on metacognition. For example, when we study eyewitness memory here at Lund, we ask people to first recall what happened in a film and then rate how confident they are in their memory. As part of the analysis of metacognition (how good you are at judging how good your memories are), we produce that kind of chart, along with a number of additional measures.
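One common summary in confidence-accuracy research is the over/underconfidence score: mean confidence minus proportion correct. The sketch below is a minimal version with a hypothetical function name and made-up data; it is not necessarily one of the measures used in the Lund paper.

```python
def over_underconfidence(confidences, correct):
    """Mean confidence minus proportion correct.

    confidences: self-rated confidence for each answer, as a probability (0..1)
    correct:     1 if that answer was correct, 0 if not
    Positive = overconfident, negative = underconfident, near 0 = calibrated on average.
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need the same, non-zero number of confidence ratings and outcomes")
    n = len(confidences)
    return sum(confidences) / n - sum(correct) / n


if __name__ == "__main__":
    # Made-up example: four memory questions answered with fairly high confidence,
    # but only two of the answers are right.
    score = over_underconfidence([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
    print(round(score, 2))  # 0.25 (overconfident)
```

A single average like this can hide structure that the full chart shows (for example, overconfidence at high ratings and underconfidence at low ones), which is why it is usually reported alongside the calibration curve rather than instead of it.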

This has also been done for metacognition about one's own performance (main finding: those who perform worst tend to be bad at judging how badly they did).

http://www.informaworld.com/smpp/content~content=a905567536&db=all

That's a link to our paper, which I would guess is behind a paywall for anyone without university library access.

Yudkowsky explains scoring rules pretty well at http://yudkowsky.net/rational/technical#probability-density.

See http://predictionbook.com/ for the only publicly available free app I know of that'll help you keep track of your calibration as you suggest (disclaimer: personal interest in PredictionBook).


Business analytics and data visualization expert. Author and speaker. Founder of Principal Analytics Prep, MS Applied Analytics at Columbia.

## Junk Charts Blog

Graphics design by Amanda Lee
