The website for financial skeptics, Zero Hedge, highlighted a chart showing the estimation errors committed by the statisticians at the Bureau of Labor Statistics in the weekly jobless claims number. A sobering sight:
Typically, the BLS publishes one number and later publishes revisions. The chart plots, week by week, the difference between the revised estimate and the original estimate. The series goes all the way back to 2000, which lets us observe the magnitude of these errors during both a boom period and a bust period.
What is striking is the clear positive bias: the revised estimates of jobless claims were more likely to be higher than the original estimates. If the statisticians were making errors at random, we would expect the area above the zero line to be about the same as the area below it. Random errors are the ideal, though, not the reality.
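One simple way to check whether revisions lean in one direction is a sign test. The sketch below uses made-up revision values, not the actual BLS series, and the function name `sign_test_p_value` is only illustrative. It counts how many weeks were revised upward and asks how surprising that count would be if upward and downward revisions were equally likely. Note that it compares the number of weeks above and below zero rather than the areas, and it assumes the weeks are independent of one another, which may not hold for real claims data.

```python
from math import comb

def sign_test_p_value(revisions):
    """One-sided sign test.

    Returns the probability of seeing at least as many positive revisions
    as observed, if positive and negative revisions were equally likely.
    """
    nonzero = [r for r in revisions if r != 0]  # zeros carry no sign information
    n = len(nonzero)
    k = sum(1 for r in nonzero if r > 0)        # weeks revised upward
    # Upper tail of a Binomial(n, 0.5) distribution
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# Hypothetical weekly revisions (revised minus original), for illustration only
revisions = [4000, 2000, -1000, 3000, 5000, 1000, 2000, -2000, 6000, 3000]
p = sign_test_p_value(revisions)
print(f"one-sided p-value: {p:.4f}")
```

A small p-value says the imbalance between upward and downward revisions would be unlikely under purely random errors. It says nothing about why the imbalance exists.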
What is even more disturbing is that the positive bias persists through the entire business cycle! I'd have guessed that a positive bias is likely during a bust period, because the statistical procedure may "lag" (fail to catch up with) the actual growth in jobless claims, but that theory went out the window when the direction of the bias stayed the same even during the boom years.
"Tyler Durden" at Zero Hedge takes this as evidence of "propaganda":
The implication is that fraudulent (and we sure hope this is inadvertent, although a 90% error rate definitely would invite a criminal investigation into just who and how stands to benefit from such an manipulative upward bias) data reporting is responsible for a persistent upward bias in data, and that fundamentals have been disconnected from the "government's reality" for years.
He is right to draw our attention to these numbers, and we should take a few things away from this discussion:
- Statisticians pay a lot of attention to studying the so-called "error term" because it tells us a lot about the quality of our models.
- Notice the difference between "bias" and "variance". Bias is the red line in the graph that shows that on average, the revision is about 3,000 worse than the original estimate. Variance consists of the fluctuations around this red line.
- The performance of any estimation procedure will be more easily improved by reducing the large variance (the huge spikes) than by reducing the bias (average error). This is one way in which the main idea behind Chapter 1 enters.
- We should always pay attention to the incentives of the people working with numbers; this includes the statisticians compiling the data, the government officials reporting the data, and last but not least, the pundits commenting on the data (including Zero Hedge). This is one of the key messages in Numbers Rule Your World. This is also why the entire "official statistics" field, which was one of the earliest fields in statistics, has always emphasized the need to be politically independent and to report the numbers honestly.
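The distinction between bias and variance drawn above can be made concrete with a little arithmetic. The sketch below is only an illustration: the error values are invented (chosen so that they average 3,000, like the red line), and `decompose_errors` is a hypothetical helper, not anything the BLS or Zero Hedge uses. It splits the mean squared error of a set of revisions into a squared-bias part and a variance part.

```python
from statistics import mean, pvariance

def decompose_errors(errors):
    """Split the mean squared error into squared bias plus variance.

    bias     : the average error (the level of the "red line")
    variance : the average squared fluctuation around that average
    mse      : the average squared error, equal to bias**2 + variance
    """
    bias = mean(errors)
    variance = pvariance(errors, mu=bias)  # population variance around the bias
    mse = mean(e * e for e in errors)
    return bias, variance, mse

# Hypothetical revisions (revised minus original), for illustration only
errors = [4000, 2000, -1000, 3000, 5000, 1000, 2000, -2000, 6000, 10000]
bias, variance, mse = decompose_errors(errors)
print(f"bias = {bias}, bias^2 = {bias ** 2}, variance = {variance}, mse = {mse}")
```

Comparing the squared bias with the variance shows which of the two contributes more to the overall error in a given set of revisions, and so which one offers more room for improvement.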