
I've always been amused at weather sites like wunderground.com and weather.com that will claim a 10% chance of rain on a given day, but then you click the link for the hour-by-hour forecast, and it lists a 10% chance of rain at 9AM, 10% chance at 10AM, 10% chance at 11AM... If all of these probabilities were independent (which I realize they aren't), then a 10% chance of rain every hour for 24 hours would lead to around a 92% chance of rain on that day. I highly doubt these hourly forecasts would stand up well to validation.
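The 92% figure is just the complement rule: under the (admittedly unrealistic) assumption that the 24 hours are independent, the chance of no rain at all is 0.9 raised to the 24th power. Here is a minimal sketch that checks the arithmetic. The function name is hypothetical.

```python
def chance_of_any_rain(p_hourly: float, hours: int) -> float:
    """P(rain in at least one hour), assuming every hour is independent.

    Real hourly forecasts are strongly correlated, so this overstates the
    daily chance. It only reproduces the back-of-the-envelope figure.
    """
    return 1.0 - (1.0 - p_hourly) ** hours

print(f"{chance_of_any_rain(0.10, 24):.3f}")  # → 0.920
```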

I managed to dig up an article that I remembered running across a while ago that discusses this:
http://www.nssl.noaa.gov/users/brooks/public_html/prob/Probability.html
Ideally, a validation system would do three things, though I'm not sure it's actually possible to do all of them at once:
1) Make sure that it rains on 10% of the days when a 10% forecast is given.
2) "Punish" the forecaster for gaming the system. For example, if it has rained on only 8% of the days with a 10% forecast, the forecaster might issue a 10% forecast on a couple of days when rain is extremely likely, just to balance the numbers out.
3) "Reward" the forecaster for making the forecast as different as possible from the past average for that time period. If it has rained on 15% of July days over the past 50 years, a 15% forecast provides no additional knowledge.
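Points 1 and 3 correspond to standard forecast-verification tools. Below is a minimal Python sketch using made-up data. A calibration table checks point 1. For point 3, the sketch swaps in the Brier skill score measured against climatology, a common verification metric that is not taken from the linked article. A forecaster who always repeats the base rate scores 0. The Brier score is also a proper scoring rule, so in expectation, deliberately shading forecasts (point 2) does not beat reporting one's honest probability. All function names and data here are hypothetical.

```python
from collections import defaultdict

def calibration_table(forecasts, outcomes):
    """Point 1: for each stated probability, how often did it actually rain?

    forecasts: stated probabilities, e.g. 0.1 for a "10%" forecast
    outcomes:  1 if it rained that day, 0 if it did not
    Returns {stated probability: (number of days, observed rain frequency)}.
    """
    days = defaultdict(int)
    rainy = defaultdict(int)
    for p, rained in zip(forecasts, outcomes):
        days[p] += 1
        rainy[p] += rained
    return {p: (days[p], rainy[p] / days[p]) for p in sorted(days)}

def brier_score(forecasts, outcomes):
    """Mean squared gap between forecast and outcome; lower is better."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(outcomes)

def brier_skill_score(forecasts, outcomes, climatology):
    """Point 3: improvement over always forecasting the long-run base rate.

    1 = perfect, 0 = no better than climatology, negative = worse.
    Assumes 0 < climatology < 1 and at least one day of data.
    """
    reference = brier_score([climatology] * len(outcomes), outcomes)
    return 1.0 - brier_score(forecasts, outcomes) / reference

# Hypothetical record: ten "10%" days (one rainy) and four "50%" days (two rainy).
forecasts = [0.1] * 10 + [0.5] * 4
outcomes = [1] + [0] * 9 + [1, 0, 1, 0]
print(calibration_table(forecasts, outcomes))
# Compare against the 15% July base rate from point 3.
print(round(brier_skill_score(forecasts, outcomes, climatology=0.15), 3))
```

With real data, the forecasts would normally be grouped into bins (0 to 5%, 5 to 15%, and so on) instead of being matched exactly.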

I always thought probability also included an element of spatial probability (even if there is rain in an "area" it may not hit the whole area) and Nick's NOAA link confirms that (although it does convey a lot of things that I had not considered). Any validation would involve a very complex physical mapping of rain "coverage".

I remember my father telling me, when I was young, that a 30% chance of rain meant that at any given time, it would be raining over 30% of the forecast area. I asked him: wouldn't that be a 100% chance of rain?
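One convention that reconciles the two views is the US National Weather Service definition of the probability of precipitation (PoP): PoP = C × A. Here C is the confidence that measurable precipitation will occur somewhere in the forecast area, and A is the fraction of the area that would receive it. Under this definition, PoP is the chance of rain at any given point, so "certain rain over 30% of the area" is a 30% PoP, not 100%. The following is a minimal sketch. The function name is hypothetical, and other forecasters may define the number differently.

```python
def probability_of_precipitation(confidence: float, area_fraction: float) -> float:
    """NWS-style PoP = C x A.

    confidence:    chance that measurable precipitation falls somewhere in the area
    area_fraction: share of the area expected to get it, if it occurs at all
    """
    return confidence * area_fraction

# Certain that rain will fall, but only over 30% of the area.
print(probability_of_precipitation(1.0, 0.3))
# 60% confident of rain that would cover half the area: the same PoP.
print(probability_of_precipitation(0.6, 0.5))
```

Both calls give a 30% PoP. This shows that one number can hide very different weather situations, which is part of why validating it is hard.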

Have you looked at "proper scoring rules"? They might be just what you are looking for.

For example, each day the forecaster says that the probability of rain is p. Then, each day, give a score of 1 + ln p if it rains and a score of 1 + ln(1 - p) if it doesn't.

This rule gives forecasters the right incentive to choose the p that best represents their knowledge. I believe that some weather forecasters are scored based on proper scoring rules such as this one.
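To see why the incentive works, suppose the forecaster's honest belief is q. The expected score of reporting r is then 1 + q ln r + (1 - q) ln(1 - r), and this quantity is largest at r = q. The constant 1 shifts every score equally, so it has no effect on the choice. The following minimal sketch searches over reported values numerically. The function names and the 30% belief are hypothetical.

```python
import math

def log_score(p: float, rained: bool) -> float:
    """The rule above: 1 + ln p if it rains, 1 + ln(1 - p) if it doesn't."""
    return 1 + math.log(p) if rained else 1 + math.log(1 - p)

def expected_score(reported: float, true_prob: float) -> float:
    """Average score when rain really happens with probability true_prob."""
    return (true_prob * log_score(reported, True)
            + (1 - true_prob) * log_score(reported, False))

# Suppose the forecaster honestly believes the chance of rain is 30%.
belief = 0.3
candidates = [i / 100 for i in range(1, 100)]  # 0.01 .. 0.99; 0 and 1 would hit ln(0)
best = max(candidates, key=lambda r: expected_score(r, belief))
print(best)  # → 0.3
```

Shading the report in either direction (for example, saying 10% to look "safe") lowers the expected score. This is exactly the property that makes the rule proper. Note that a forecast of 0 or 1 that turns out wrong scores minus infinity, so this rule punishes overconfidence very harshly.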

About 10 years ago, when I took a course in judgment and decision making, we discussed the relationship between accuracy and confidence, and that type of measure was indeed brought up to test that relationship. I distinctly remember that meteorologists were actually very good at this.

This type of measure is used in research on metacognition. For example, in our eyewitness memory research here at Lund, we ask people first to recall what happened in a film and then to rate how confident they are in their memory. As part of the analysis of metacognition (how good you are at judging how good your memories are), we produce that kind of chart, along with a number of additional measures.
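One simple measure that often appears alongside a calibration chart in confidence-accuracy research is an over/underconfidence score: mean stated confidence minus the proportion of answers that were correct. The following is a minimal sketch, assuming confidence is recorded on a 0 to 1 scale. The function name and the participant data are hypothetical, and this is not necessarily the set of measures used in the Lund study.

```python
def over_underconfidence(confidences, correct):
    """Mean stated confidence minus proportion correct.

    Positive = overconfident, negative = underconfident, 0 = calibrated on average.
    confidences: values in [0, 1]; correct: 1 if the memory was accurate, else 0.
    """
    n = len(correct)
    return sum(confidences) / n - sum(correct) / n

# Hypothetical participant: 80% confident on average, but right only 60% of the time.
conf = [0.9, 0.8, 0.7, 0.8, 0.8]
right = [1, 0, 1, 1, 0]
print(round(over_underconfidence(conf, right), 2))
```

A score near +0.2 here means this participant is about 20 percentage points overconfident. Because this is an average, it can hide miscalibration that cancels out across confidence levels, which is why it is usually reported together with the full chart.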

This has also been done for metacognition about your own performance (main finding: those who perform the worst tend to be bad at judging how badly they performed).

http://www.informaworld.com/smpp/content~content=a905567536&db=all

That's a link to our paper (which I would guess is behind a paywall for those who do not have university library access).

Yudkowsky explains scoring rules pretty well at http://yudkowsky.net/rational/technical#probability-density.

See http://predictionbook.com/ for the only publicly available free app I know of that'll help you keep track of your calibration as you suggest (disclaimer: personal interest in PredictionBook).


Kaiser Fung. Business analytics and data visualization expert. Author and Speaker.