Earlier in the month, Prof. Gelman linked to Brandon's fascinating analysis of on-line weather forecasting accuracy. I have done some additional analysis of the data and the result can be visualized as follows.

I'll concentrate my comments on three observations:

- CNN was the clear winner in forecasting accuracy during this period based on two criteria: its median error in forecasting daily lows, and its median error in forecasting daily highs. Moreover, both median errors were zero, which gives us confidence about its accuracy. The Weather Channel (TWC) and Intellicast (INT) were not far behind.
- The ability to forecast highs was better across the board than that of forecasting lows (except BBC). I am not sure why this should be so.
- Overall, our weather forecasters were much too risk-averse. Notice that the errors were heavily biased toward the lower left quadrant. A negative error on low temperatures means the *predicted* low is higher than the *actual* low; a negative error on high temperatures means the *predicted* high is lower than the *actual* high. Taking these together, we observe that the range of actual temperatures has generally been larger than the range of predicted temperatures! No one was willing to go out on a limb, so to speak, to forecast extremes.

Actually, I believe this inability or unwillingness to forecast extreme values is endemic to all forecasting methodologies.

Before closing, I should mention that the graph was based on a subset of Brandon's data. I considered only same-day forecasts and excluded Unisys (because they didn't provide forecasts for lows). There might also be bias, since there were breaks in the time series. Finally, I retained the sign of the errors rather than taking absolute values as Brandon did.

Is computing medians enough? If I were to predict the median temperature every day, wouldn't I get a median difference of 0 (half would be above and half would be below)? Or did you take this into account in some other way?

Posted by: Hadley | Feb 25, 2007 at 07:24 PM

You write, "we observe that the range of actual temperatures have generally been larger than the range of predicted temperatures . . . this inability or unwillingness to forecast extreme values is endemic to all forecasting methodologies."

I think you want to be careful in separating two issues:

1. Predicted low temps tended to be lower than actual low temps (or the other way around, I couldn't quite follow). This is a real bias which could possibly be explained by the fact that it was an unusual season; I think you'd want to check with several years of data.

2. Point predictions often do (and, in fact, should) show less variation than actual data. Call the prediction y.pred and the actual value y, and let "data" be the data used in the prediction. Mathematically,

var(y) = var(E(y|data)) + E(var(y|data)).

The first term on the right side of this equation is the variance of the predicted values. Thus, predicted values (and Bayesian estimates in general) are less variable than data (and true values in general). Tom Louis has written some papers on this.
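The decomposition above can be checked with a small simulation. This is only a sketch on made-up numbers, not the forecast data: it assumes each day's temperature is a predictable "signal" plus independent noise, and that the forecaster predicts the signal exactly. The names `signal`, `noise`, and `y_pred` are illustrative.

```python
import random
import statistics

random.seed(0)

# Hypothetical setup (not the forecast data): each day has a predictable
# "signal" that the forecaster can see, plus noise that nobody can predict.
n = 20000
signal = [random.gauss(50, 10) for _ in range(n)]  # plays the role of E(y|data)
noise = [random.gauss(0, 5) for _ in range(n)]     # unpredictable part
y = [s + e for s, e in zip(signal, noise)]         # actual values

# Assumption: the point prediction is exactly the conditional mean.
y_pred = signal

var_y = statistics.pvariance(y)
var_pred = statistics.pvariance(y_pred)  # estimates var(E(y|data))
# The noise variance is the same every day here, so the variance of the
# residuals estimates E(var(y|data)).
var_resid = statistics.pvariance([a - p for a, p in zip(y, y_pred)])

print(f"var(y)                = {var_y:.1f}")
print(f"var(y.pred)           = {var_pred:.1f}")
print(f"var(y - y.pred)       = {var_resid:.1f}")
print(f"sum of the two pieces = {var_pred + var_resid:.1f}")
```

In this setup the predictions vary less than the actual values even though the forecaster is doing as well as anyone could, which is the point of item 2.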

Posted by: Andrew Gelman | Feb 25, 2007 at 08:50 PM

Hadley: it's the median error, where error is defined as (predicted temp - actual temp). The sample for each website was its daily predictions over the period of observation (n = 40 approx), so a median error of zero means that on half of those days the website made an error >= 0, and on the other half an error <= 0.

If this were a research project, I'd also look at mean error, max error, range of error and other factors. I was going to re-do this using mean error, which is better in this case, but ran out of time.
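As a rough sketch of why the extra summaries matter, here is Hadley's thought experiment on made-up numbers (not Brandon's data). The `summarize` helper and the "lazy" forecaster, who predicts the hindsight median every day, are hypothetical; error is predicted minus actual, as defined above.

```python
import random
import statistics

random.seed(1)

# Made-up daily highs for 40 days; these are NOT Brandon's data.
actual = [round(random.gauss(40, 8)) for _ in range(40)]

# A "lazy" forecaster who predicts the same number every day:
# the median of the actual highs (known only in hindsight).
lazy_pred = [statistics.median(actual)] * len(actual)


def summarize(predicted, actual):
    """Summaries of the signed error, where error = predicted - actual."""
    errors = [p - a for p, a in zip(predicted, actual)]
    abs_errors = [abs(e) for e in errors]
    return {
        "median_error": statistics.median(errors),
        "mean_error": statistics.mean(errors),
        "mean_abs_error": statistics.mean(abs_errors),
        "max_abs_error": max(abs_errors),
        "error_range": max(errors) - min(errors),
    }


lazy = summarize(lazy_pred, actual)
for name, value in lazy.items():
    print(f"{name:>15}: {value:.2f}")
```

The lazy forecaster's median error is zero by construction, but the mean absolute error, the maximum error, and the range of errors are all well away from zero, so those are the numbers that would separate it from a forecaster who actually tracks the weather.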

Andrew: on 1, surely a longer observation period would have made the analysis better. But a reliable forecasting system should work across different periods... unless as I think you are suggesting, the temperatures were so extreme as to cause extrapolation errors in this particular period.

Predicted lows were generally above actual lows, as I define error to be predicted - actual for both highs and lows.

on 2, thanks for the pointer. The consequence of this seems to be a bound on the predictive ability.

Posted by: Kaiser | Feb 25, 2007 at 10:14 PM

"Actually, I believe this inability or unwillingness to forecast extreme values is endemic to all forecasting methodologies."

On-line weather forecasts are produced from numerical weather prediction model output with little, if any human intervention.

In the US, there are at least three models run up to four times each day.

These models solve the basic equations of state on grid-point or spectral geographic domains as an initial value problem with 'small' time steps out to 384 hours.

The raw model output is subject to statistical 'adjustments' to account for known bias (too slow, too wet, too cold, etc) called 'model output statistics,' or MOS.

Some on-line sites use MOS from the Nested Grid Model (NGM). Some use MOS from the Global Forecast System (GFS). Some use North American Model (NAM) MOS.

If the model is right...the on-line forecast is right.

Nice blog. Tufte would be proud.

Posted by: TQ | Mar 01, 2007 at 10:02 PM

What are the standard deviations like? Because whoever has the lowest one is the winner in my book. I can just recalculate the weather manually.

Posted by: Jeremy Kandah | Mar 08, 2007 at 09:47 PM

The standard deviation can be visualized using the charts I created on a later day.

Information gain and loss

Posted by: Kaiser | Mar 08, 2007 at 10:52 PM