Like reader Chris P., I'm underwhelmed by this Wall Street Journal chart showing car sales by the top 5 brands during the past 5 years.
It's a stacked column chart for which I have never found a good use. While it contains a lot of data, readers can truly comprehend only the lowest series (in this case, GM) and the total. For any of the other series, we can never be sure whether the number went up or down because it depends on whether the cumulative total of the series below it went up or down and by how much. We have a lot of ink, a lot of data but almost no information.
Chris pointed out that several WSJ readers have complained about seasonal effects: car sales are not even throughout the year, and so a chart like the one above may encourage readers to make month-on-month comparisons, rather than year-on-year comparisons.
Their point is valid but misplaced. If you look at the GM data over time (the dark red bits), it is clear that the seasonal effect has already been removed, as the trend is rather flat within any given year. What they plotted is the so-called SAAR (seasonally adjusted annual rate). Think of SAAR as the "run rate" for annual car sales. Divide the number by 12, and you get the run rate at the monthly level. Now remove the seasonal adjustment, and you get the actual sales for that month.
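To make that arithmetic concrete, here is a minimal sketch. The numbers are made up, and the single multiplicative seasonal factor is my simplifying assumption; the actual adjustment procedure behind published SAAR figures is more involved.

```python
def saar_to_monthly(saar, seasonal_factor):
    """Back out an approximate unadjusted monthly figure from a SAAR value.

    Assumes a simple multiplicative seasonal factor (1.0 = an average month).
    This is a simplification for illustration only.
    """
    monthly_run_rate = saar / 12.0              # seasonally adjusted monthly rate
    return monthly_run_rate * seasonal_factor   # put the seasonality back in


# Hypothetical values: a SAAR of 12 million units, in a month that
# typically runs 10% below the average month.
adjusted_monthly = 12_000_000 / 12.0
unadjusted_estimate = saar_to_monthly(12_000_000, 0.9)
```

The point of the sketch is only that SAAR, the adjusted monthly rate, and the raw monthly figure are linked by a scale factor of 12 and a seasonal factor.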
From the point of view of the Trifecta checkup, I have no problems with the business question or the data, but I don't like the graphical construct.
To illustrate the point further, I'm switching to a different data set with a similar structure (I can't find a complete data set for the car sales SAAR). As reader Matthew F. pointed out in his comment on my previous post, the housing starts series published by the Census Bureau is also computed as SAAR. I just need to substitute car brand for region of country, and cars sold for housing starts.
In the panel on the right, focus on the top row of charts, which plot the unadjusted data. I have the housing starts separated by region, and within each region, I plotted the annual trend, one line for each year. (I smoothed the lines to bring out the seasonal pattern.)
What you see is that almost every line is an inverted U. This means that no matter what year, and what region, housing starts peak during the summer and ebb during the winter.
So if you compare the June starts with the October starts, it is a given that the October number will be lower than the June number. Reporting a drop from June to October is therefore meaningless. What is meaningful is whether this year's drop is unusually large or unusually small; to assess that, we have to know the average historical drop between June and October.
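A small sketch of that comparison, using invented numbers rather than the Census data (the function name and the figures are hypothetical):

```python
from statistics import mean


def june_to_october_change(year_data):
    """Percent change from June to October for one year of monthly figures."""
    return (year_data["oct"] - year_data["jun"]) / year_data["jun"] * 100


# Made-up history: in each of these years, October is 10% below June.
history = {
    2004: {"jun": 100, "oct": 90},
    2005: {"jun": 110, "oct": 99},
    2006: {"jun": 105, "oct": 94.5},
}
this_year = {"jun": 80, "oct": 60}

typical_change = mean(june_to_october_change(d) for d in history.values())
current_change = june_to_october_change(this_year)

# How much worse (negative) or better (positive) than a typical year.
excess_change = current_change - typical_change
```

In this toy example, the typical June-to-October change is about -10% while this year's is -25%, so roughly 15 percentage points of the drop are not explained by the usual seasonal pattern.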
Statisticians are looking for explanations for why housing starts vary from month to month. Some of the change is due to the persistent seasonal pattern. Some of the change is due to economic factors or other factors. The reason for seasonal adjustments is to get rid of the persistent seasonal pattern, or put differently, to focus attention on other factors deemed more interesting.
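One crude way to see the idea is a ratio-to-average adjustment, sketched below. This is a deliberately simplified stand-in, not the procedure the Census Bureau uses, and the data are invented (four "seasons" per year instead of twelve months, to keep it short).

```python
def seasonal_factors(years):
    """Average, across years, of each period's ratio to its own year's mean."""
    n_periods = len(years[0])
    factors = []
    for p in range(n_periods):
        ratios = [year[p] / (sum(year) / n_periods) for year in years]
        factors.append(sum(ratios) / len(ratios))
    return factors


def seasonally_adjust(year, factors):
    """Divide each raw value by its seasonal factor."""
    return [value / f for value, f in zip(year, factors)]


# Hypothetical history with a persistent inverted-U pattern.
history = [
    [80, 120, 120, 80],
    [88, 132, 132, 88],
]
factors = seasonal_factors(history)

# A hypothetical weak year: the raw numbers still look like an inverted U...
weak_year = [80, 115, 105, 65]
# ...but the adjusted numbers fall steadily through the year.
adjusted = seasonally_adjust(weak_year, factors)
```

The raw series rises and then falls, which looks like the ordinary seasonal shape; once the persistent pattern is divided out, what remains declines in every period, which is the part attributable to other factors.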
The bottom row of charts above contains the seasonally adjusted data. (I have used the monthly rather than annual rates to make it directly comparable to the unadjusted numbers.) Notice that the inverted U shape has pretty much disappeared everywhere.
Comparing the lines for 2008 for the South region (first column) is instructive. The unadjusted line shows October starts below June starts, and the familiar inverted U shape. But was it just a seasonal pattern or was there something else driving starts down? The bottom line shows clearly that after accounting for the seasonality, the number of housing starts was trending down the entire year in 2008, so indeed something else was going on.
[PS. Contrast this with 2003 when the unadjusted data show the usual inverted U shape but we learn that housing starts actually increased over that year relative to the average year.]
I think people have major problems with this because they think of each number (in the bottom row of charts) as an estimate of the actual figure for that month. And they would be right to object: we cannot take the seasonally adjusted monthly housing starts as an estimate of the true monthly housing starts.
However, statisticians invented seasonal adjustments for a different purpose. If you are a policy maker, you would like to know if the housing market is healthy or not. There are some factors you can't control, for example, the fact that construction companies are more active in the summer than in the winter. But such factors affect the trend in a major way: every winter, housing starts decline. This means that a decline in housing starts during the winter is not necessarily an indication of a weak housing market. However, if the winter decline is steeper than in a typical year, then the housing market must have weakened. The purpose of seasonal adjustment is to remove the seasonal effect so that the policy maker can see what is happening to housing starts (beyond seasonal effects). To use the adjustment properly, we should look at comparisons, not the individual numbers.
In the top chart, we see the inverted Us again. All this up-and-down action distracts us from seeing whether housing starts have improved or deteriorated in each region and period of time.
In the bottom chart, we can clearly see that the South had the greatest run-up prior to 2005, and then suffered a severe contraction till 2009, ending up at almost half the 2000 level. By contrast, the Northeast has seen no significant trend over the last 10 years.
The bottom chart is actually a variant of the WSJ stacked column chart; the only difference is that there is no stacking. The total housing starts across all 4 regions is not immediately visible from this chart. It is a trade-off with which I'm willing to live.
As I noted in my comment to Matthew's comment, the SAAR is just 12 times the seasonally adjusted data I plotted above, which means the SAAR chart will look exactly the same, only with a different vertical scale.
PS. As pointed out by Joe M., the original version of the chart showing the unadjusted monthly rate across the 4 regions plotted the data in the wrong order. The new version fixes this problem. Also, the year labels were off by 1.