Before the Vancouver Olympics, economist Daniel Johnson used a predictive model to forecast the final medals table. Now that the games are over, Slate's Daniel Gross has compared the forecasts with reality ("America's Joy, Austria's Sorrow", March 2). The following observation is typical:
"Not only did they [the Americans] win the most of any nation ever, but they outperformed the model by the biggest margin ever," Johnson said. (At this point in the interview I interrupted Johnson's discourse with aggressive chants of "U.S.A.! U.S.A.!")
Unfortunately, Gross apparently did not speak with a statistician before writing this article. A statistical modeler would compute
REALITY - FORECAST = ERROR,
which is different from the reporter's perspective that
REALITY - EXPECTATION = OVERPERFORMANCE (or UNDERPERFORMANCE).
What's the difference?
***
The statistician takes reality as the truth, and any deviation from reality as a modeling error; the reporter takes "expectation" (the forecasts) as the truth, and any deviation from expectation as overperformance (if positive) or underperformance (if negative). When the model's forecasts are treated as yardsticks for performance, one is assuming that the expectations were set correctly, which means the model can never be wrong!
And this predictive model was disastrous! As Gross pointed out, it predicted 5 golds for Canada when it won 14, 26 total medals for the U.S. when it won 37, 25 for Austria when it won just 16, 19 for Italy when it won 5, 8 golds for Russia when it won 3, 2 golds for China when it won 5, and so on.
These deviations from forecast are forecasting errors. Different predictive models yield different expectations, and their relative accuracy is revealed only after the games are over, when the total error can be computed for each set of predictions.
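To make the arithmetic concrete, here is a minimal sketch (in Python) using the figures quoted above. Note that those figures mix gold counts and total-medal counts, so they serve purely to illustrate the calculation, and the scoring rule (total and mean absolute error) is my choice of yardstick, not necessarily Johnson's.

```python
# Figures quoted above, as (forecast, actual) pairs. Some are golds, some
# total medals; they are used here only to illustrate the error calculation.
results = {
    "Canada (golds)":  (5, 14),
    "U.S. (total)":    (26, 37),
    "Austria (total)": (25, 16),
    "Italy (total)":   (19, 5),
    "Russia (golds)":  (8, 3),
    "China (golds)":   (2, 5),
}

# The statistician's view: REALITY - FORECAST = ERROR.
errors = {country: actual - forecast
          for country, (forecast, actual) in results.items()}
for country, err in errors.items():
    print(f"{country:18s} error = {err:+d}")

# One way to score a whole set of predictions: total (or mean) absolute error.
total_abs_error = sum(abs(e) for e in errors.values())
print("Total absolute error:", total_abs_error)
print("Mean absolute error :", total_abs_error / len(errors))
```

A different model's forecasts would be scored the same way, and the one with the smaller total error is the more accurate forecaster.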
However, if the errors are interpreted as over/under performance, different predictive models cannot be compared because there are no "errors". Johnson's model has low expectations for Canada, so he concludes that Canada overperformed. Someone else's model may have high expectations (say, 18 golds), in which case she concludes that Canada underperformed. Whose model is better?
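Once the games are over, though, that question does have an answer: check each forecast's error against the actual result. A quick sketch for Canada's golds, where 18 is the hypothetical rival forecast from the paragraph above:

```python
actual_canada_golds = 14
forecasts = {"Johnson": 5, "Hypothetical rival": 18}

for name, forecast in forecasts.items():
    error = actual_canada_golds - forecast  # reality - forecast
    print(f"{name}: error = {error:+d}, absolute error = {abs(error)}")

# Johnson's error is +9 ("Canada overperformed" in his reading); the rival's
# is -4 ("Canada underperformed" in hers). Judged as forecasts, the rival's
# smaller absolute error makes it the more accurate model for Canada.
```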
In this interpretation, each modeler proclaims his or her model to be "true". This is a big blunder.
***
This reminds me of my post on the placebo effect in clinical trials. The effectiveness of a new drug is measured by the response of the treatment group minus the response of the placebo group. When this difference is very small, is it that the drug is ineffective, or is it that the placebo is "too effective"?