
Late-to-the-gate depression brought to you by the Census

This space was originally meant for a post about seasonal adjustments to time-series data. That now has to wait, because I am recovering from a bout of late-to-the-gate depression. You know the feeling: having arrived at the airport just in time, you half-run, half-walk to the gate, only to learn that it has just closed and all the strain has been for naught.

I didn't miss a flight. I was knocked over by the Census Bureau, and left mentally exhausted. Any of you who have processed lots of data know this feeling. Just before you decide to publish your results (thankfully before, and not after), you discover that the data you analyzed contain errors so egregious as to be nonsensical.

So I present you the data on "new privately owned housing units started", commonly known as "housing starts". The offending spreadsheet can be downloaded at the Census Bureau here. (Screen shot on left).

The file contains four sets of data: annual data, raw monthly numbers (not seasonally adjusted), seasonally adjusted monthly numbers, and the seasonal adjustment factors (which are just the ratios of the unadjusted to the adjusted numbers).


The shocker: the "seasonally adjusted" series is 10 times as big as the "unadjusted" series. I kid you not. In October 2000, the raw data found 140,000 units of housing started; after adjustment, we magically had 1.5 million units started.

Either I'm misreading the spreadsheet, or quality control is seriously missing at the Census Bureau.

Since the seasonal adjustment factors were provided, I tried to reconcile the two sets of numbers. Perhaps a factor-of-10 correction would be enough. This only caused more headaches.

According to the footnote, the factor is defined as "the ratio of unadjusted housing units started to the seasonally adjusted housing units started". For October 2000, this factor was given as 108, which I took to mean that the adjustment took the raw data down by about 8%.

But the digits wouldn't cooperate. Multiplying or dividing by 10 cannot resolve the fact that the seasonally adjusted "549" is larger than the unadjusted "397".


This is the unglamorous side of doing analytics and working with data. When I recover, I will write that post about seasonal adjustments.




Matthew F.

I think the key is that it is the seasonally adjusted ANNUAL rate.

Matthew F.

To go into a little more detail: The "StartsSA" tab says its data are the "Seasonally Adjusted Annual Rate". This means they divide the raw 140k by the adjustment factor of 108% to get 129.6k housing starts for the month, which adjusts for October typically having more housing starts than the average month. They then multiply that by 12 to get the annual rate, based on the assumption that the rate reflects a trend that can be used to predict future starts. That's where you get the 1.5 million starts.
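The two steps can be sketched in a few lines of Python. This is only an illustration of the arithmetic described here, assuming the factor is quoted as a percentage (108 meaning 1.08) and follows the footnote's definition of unadjusted divided by adjusted. The function name and its parameters are made up for this example and are not anything the Census Bureau publishes.

```python
def seasonally_adjusted_annual_rate(unadjusted_monthly, factor_percent, periods_per_year=12):
    """Return (adjusted monthly value, annualized rate).

    Assumes factor_percent = 100 * unadjusted / adjusted, as in the
    spreadsheet footnote, so removing seasonality means dividing.
    """
    # Step 1: the actual seasonal adjustment, done at the monthly level.
    adjusted_monthly = unadjusted_monthly / (factor_percent / 100.0)
    # Step 2: a pure rescaling from a monthly figure to an annual rate.
    annual_rate = adjusted_monthly * periods_per_year
    return adjusted_monthly, annual_rate


# October 2000 figures quoted in the post: 140,000 raw starts, factor 108.
adjusted, saar = seasonally_adjusted_annual_rate(140_000, 108)
print(f"Seasonally adjusted monthly starts: {adjusted:,.0f}")
print(f"Seasonally adjusted annual rate:    {saar:,.0f}")
```

With the October 2000 inputs this gives about 129,630 starts for the adjusted month and about 1,555,556 for the annual rate, which lines up with the 1.5 million in the spreadsheet.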

You see the same thing with car sales data: Every month when sales figures are released, they calculate the seasonally adjusted annual rate based on the month's performance relative to its usual percentage of the year's sales. Hence the extrapolation of an SAAR of 12.25 million auto sales - few outlets even bother to report the actual number of sales, because it's useless for comparison to previous months.


Matthew: Thanks for the clarification. I wish they made clear they are doing two things, not one. The real seasonal adjustment is at the monthly level; then, there is a projection based on a simple linear model from monthly to annual. The second step is aesthetics, not statistics: it just inflates each number by a factor of 12. Not saying it's not useful, just that it merely changes the scale.
The good news is that my previous work is not all wasted so look for a real post soon.
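To see why the multiplication by 12 only changes the scale, here is a small Python sketch. The monthly values are hypothetical numbers chosen for illustration, not Census data, and `pct_change` is a helper written for this example.

```python
def pct_change(previous, current):
    """Percent change from one period to the next."""
    return (current - previous) / previous * 100.0


# Hypothetical seasonally adjusted monthly starts, in thousands.
monthly = [125.0, 129.6, 131.2]
# The same series expressed as an annual rate.
annualized = [m * 12 for m in monthly]

monthly_changes = [pct_change(a, b) for a, b in zip(monthly, monthly[1:])]
annual_changes = [pct_change(a, b) for a, b in zip(annualized, annualized[1:])]

# The factor of 12 cancels out of every ratio, so the changes agree.
for m, a in zip(monthly_changes, annual_changes):
    print(f"monthly series: {m:.2f}%   annualized series: {a:.2f}%")
```

Any month-to-month comparison comes out the same on either series, so the annualized numbers carry no information beyond the adjusted monthly ones.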


I hope the next post will clarify several issues I could not really analyze in this one, such as the number of sales per month and several outcomes from this topic.

Kaiser, good job on this one. Now on to the next post.

Charles Wilson

The writer should have shown the Census data directly instead of linking to a download on another page, so that readers do not have a hard time looking for it.


The errors posted by the Census Bureau proved to be non-factual. They should not have missed any pertinent data, so as to prevent this kind of mishap. Think again.
