


Is computing medians enough? If I were to predict the median temperature every day, wouldn't I get a median difference of 0 (half would be above and half would be below)? Or did you take this into account in some other way?

Andrew Gelman

You write, "we observe that the range of actual temperatures have generally been larger than the range of predicted temperatures . . . this inability or unwillingness to forecast extreme values is endemic to all forecasting methodologies."

I think you want to be careful in separating two issues:

1. Predicted low temps tended to be lower than actual low temps (or the other way around, I couldn't quite follow). This is a real bias which could possibly be explained by the fact that it was an unusual season; I think you'd want to check with several years of data.

2. Point predictions often do (and, in fact, should) show less variation than actual data. Call the prediction y.pred and the actual value y, and let "data" be the data used in the prediction. Mathematically,
var(y) = var(E(y|data)) + E(var(y|data)).
The first term on the right side of this equation is the variance of the predicted values, and the second term cannot be negative. Thus, predicted values (and Bayesian estimates in general) are less variable than data (and true values in general). Tom Louis has written some papers on this.
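A small simulation can make the decomposition concrete. This is only a sketch under made-up assumptions: the forecaster observes a signal x, the actual temperature is y = x + noise with independent mean-zero noise, and the point prediction is E(y|data) = x. The numbers (mean 60, standard deviations 8 and 4) are hypothetical and not taken from the post.

```python
import random
import statistics


def simulate_variance_decomposition(n=20000, seed=0):
    """Compare var(y) with var(E(y|data)) + E(var(y|data)) on simulated data.

    Assumed toy model (not from the original post):
      data x ~ Normal(60, 8), noise ~ Normal(0, 4), y = x + noise.
    Under this model E(y|data) = x and var(y|data) = 4**2 = 16.
    """
    rng = random.Random(seed)
    signal = [rng.gauss(60.0, 8.0) for _ in range(n)]
    noise = [rng.gauss(0.0, 4.0) for _ in range(n)]
    actual = [s + e for s, e in zip(signal, noise)]
    predicted = signal  # the point prediction E(y | data)

    var_actual = statistics.pvariance(actual)
    var_predicted = statistics.pvariance(predicted)
    expected_conditional_var = 4.0 ** 2  # known by construction
    return var_actual, var_predicted, expected_conditional_var


var_y, var_pred, e_cond_var = simulate_variance_decomposition()
print(f"var(y)                      = {var_y:.1f}")
print(f"var(E(y|data))              = {var_pred:.1f}")
print(f"var(E(y|data)) + E(var(y|data)) = {var_pred + e_cond_var:.1f}")
```

In this toy setup the predictions are the best available point forecasts, yet their variance is smaller than the variance of the actual values by roughly the noise variance.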


Hadley: it's the median error, where error is defined as (predicted temp - actual temp). The sample for each website was its daily predictions over the period of observation (n = 40 approx), so a median error of zero means that on half of those days the website made an error >= 0, and on the other half an error <= 0.

If this were a research project, I'd also look at mean error, max error, range of error and other factors. I was going to redo this using mean error, which is better in this case, but ran out of time.
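To illustrate Hadley's point and the extra summaries mentioned above, here is a minimal sketch with made-up temperatures (not the data from the post). The function name `error_summary` and its output keys are hypothetical. A forecaster who predicts the median temperature every day gets a median error of exactly zero, while the other summaries expose how poor the forecast is.

```python
import statistics


def error_summary(predicted, actual):
    """Summarize forecast errors, with error = predicted - actual."""
    errors = [p - a for p, a in zip(predicted, actual)]
    abs_errors = [abs(e) for e in errors]
    return {
        "median_error": statistics.median(errors),
        "mean_error": statistics.fmean(errors),
        "mean_abs_error": statistics.fmean(abs_errors),
        "max_abs_error": max(abs_errors),
        "error_range": max(errors) - min(errors),
        "error_sd": statistics.stdev(errors),
    }


# Made-up actual temperatures for five days (illustrative only).
actual = [50, 55, 60, 65, 90]

# A "lazy" forecaster who predicts the median temperature every day.
constant_forecast = [statistics.median(actual)] * len(actual)

summary = error_summary(constant_forecast, actual)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The standard deviation of the errors is included because it speaks to the spread of a forecaster's misses rather than their center.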

Andrew: on 1, surely a longer observation period would have made the analysis better. But a reliable forecasting system should work across different periods... unless as I think you are suggesting, the temperatures were so extreme as to cause extrapolation errors in this particular period.

Predicted lows were generally above actual lows, as I define error to be predicted - actual for both highs and lows.

on 2, thanks for the pointer. The consequence of this seems to be a bound on the predictive ability.


"Actually, I believe this inability or unwillingness to forecast extreme values is endemic to all forecasting methodologies."

On-line weather forecasts are produced from numerical weather prediction model output with little, if any, human intervention.

In the US, there are at least three models run up to four times each day.

These models solve the basic equations of state on grid-point or spectral geographic domains as an initial value problem with 'small' time steps out to 384 hours.

The raw model output is subject to statistical 'adjustments' to account for known bias (too slow, too wet, too cold, etc) called 'model output statistics,' or MOS.
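The idea of a statistical adjustment for known bias can be sketched with a one-predictor least-squares fit of observed temperatures on raw model output. This is a toy illustration only: the function names and numbers are hypothetical, and operational MOS is considerably more elaborate than a single straight-line fit.

```python
def fit_linear_correction(raw, observed):
    """Least-squares fit of observed = intercept + slope * raw."""
    n = len(raw)
    mean_raw = sum(raw) / n
    mean_obs = sum(observed) / n
    sxx = sum((r - mean_raw) ** 2 for r in raw)
    sxy = sum((r - mean_raw) * (o - mean_obs) for r, o in zip(raw, observed))
    slope = sxy / sxx
    intercept = mean_obs - slope * mean_raw
    return intercept, slope


def apply_correction(raw_value, intercept, slope):
    """Adjust a new raw model forecast using the fitted correction."""
    return intercept + slope * raw_value


# Made-up history: the raw model ran 3 degrees too cold on every day.
raw_history = [40.0, 50.0, 60.0, 70.0]
observed_history = [43.0, 53.0, 63.0, 73.0]

intercept, slope = fit_linear_correction(raw_history, observed_history)
corrected = apply_correction(55.0, intercept, slope)
print(f"intercept={intercept:.2f}, slope={slope:.2f}, corrected={corrected:.2f}")
```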

Some on-line sites use MOS from the Nested Grid Model (NGM). Some use MOS from the Global Forecast System (GFS). Some use North American Model (NAM) MOS.

If the model is right...the on-line forecast is right.

Nice blog. Tufte would be proud.

Jeremy Kandah

What are the standard deviations like? Because whoever has the lowest one is the winner in my book. I can just recalculate the weather manually.


The standard deviation can be visualized using the charts I created on a later day.

Information gain and loss


