Points of comparison

In light of the current housing crisis, which arose from mortgage defaults, I pulled this graphic from a Jan 2007 opinion piece that plotted historical default rates of mortgages.  Notice the heavy stretching of the vertical axis, which exaggerates the volatility: essentially, the annual delinquency rate ranged from 1.75% to 2.65% during the last six years or so.  One might be forgiven for thinking that a 2% default rate is quite acceptable.

Compare the above chart to the pair that showed up in the NYT in Oct 2007 (see right).  The default rates here are in the 10-20% range, very alarming indeed.

The two graphics illustrate a key issue of "aggregation" in statistical analysis.  The first graphic is super-aggregated: all types of mortgages of all ages are put together to calculate each year's default rate.  The second graphic hones in on subprime mortgages only.

More importantly, the second graphic presents data in "vintages".  Each line represents loans originated during a particular year (a "vintage").  This establishes comparability.  On the first chart, each point in time represents the default rate of mortgages averaged over all ages (some loans may be only a few months old; others may be 15 years old).  Since the default rate is much higher for very young mortgages than for older mortgages, such averaging hides crucial information.
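To make the averaging problem concrete, here is a minimal sketch in Python. Every number in it is made up for illustration and none comes from either chart; the names `book_a`, `book_b`, `pooled_rate` and `rate_by_vintage` are hypothetical. It shows how a pooled default rate can move only slightly while the newest vintage's rate doubles.

```python
# Hypothetical loan books: (vintage label, number of loans, defaults during the year).
# All figures are invented for illustration; they are NOT taken from the charts.
book_a = [("older", 9000, 135), ("newest", 1000, 50)]
book_b = [("older", 9000, 135), ("newest", 1000, 100)]


def pooled_rate(book):
    """Default rate with all vintages lumped together (the 'super-aggregated' view)."""
    total_loans = sum(n for _, n, _ in book)
    total_defaults = sum(d for _, _, d in book)
    return total_defaults / total_loans


def rate_by_vintage(book):
    """Default rate computed separately for each vintage."""
    return {vintage: d / n for vintage, n, d in book}


for name, book in (("A", book_a), ("B", book_b)):
    print(name, f"pooled={pooled_rate(book):.2%}",
          {v: f"{r:.1%}" for v, r in rate_by_vintage(book).items()})
```

In this invented example the older loans dominate the pool, so the newest vintage going from a 5% to a 10% default rate shifts the pooled figure by only half a percentage point.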

Overall, the NYT graphic very effectively conveys the alarming trend of new mortgages performing much worse, especially those originated in 2007.

It could benefit from two slight edits: adding a few more years, and using vertical lines (the most critical comparisons are between default rates for loans of a given age!).  Something like this...

Sources: "As Defaults Rise, Washington Worries", New York Times, Oct 16 2007; "Mounting Mortgage Credit Problems", Jan 23 2007.




I nearly always agree with your fine reasoning, but in this case I find that the NYT graph more clearly gives me a quick and understandable snapshot of the data.

I understand that the vertical lines in your graph give a finer grain of "comparability", but the NYT graph is easier to interpret at a glance.

Rosie Redfield

I agree with Tim. I assumed (and still think) that the lines in the NYT graph are the cumulative default rates. I still don't understand what your vertical lines represent.

Andrew Gelman


"Vintage" is an interesting word to use. The standard term in statistics is "cohort," I believe, but maybe "vintage" is easier to remember.


Andrew: in business, both words are used; I was wondering which one to use here, now we have both!

Tim & Rosie: Let me try again: there are two things going on in this chart. There is the maturation of each vintage/cohort. But more importantly, we are interested in the comparison between these vintages/cohorts. For example, the 2007 series ends at 7 months. An interesting question is: at seven months, how are the 2007 loans performing compared to 2006 at seven months?

If you drop your finger down from the top of the 2007 line and note where you hit the 2006 line, you'll notice that the vertical distance is much, much larger than it seems. This distortion arises because both lines start at zero and very thick lines were used, so our eyes tend to look at the white space, which leads us to underestimate the actual difference.
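The "same age" comparison can also be done in numbers rather than by eye. The sketch below is only an illustration: the `curves` values are invented, not read off the NYT chart, and `compare_at_age` and `latest_common_age` are hypothetical helper names. It looks up each vintage's cumulative default rate at one fixed loan age, which is what a vertical line on the chart does.

```python
# Hypothetical cumulative default rates (%) indexed by loan age in months.
# Index 0 is origination. Values are invented, NOT taken from the NYT chart.
curves = {
    2005: [0.0, 0.5, 1.0, 1.6, 2.2, 2.9, 3.5, 4.0, 4.6, 5.1],
    2006: [0.0, 0.8, 1.7, 2.7, 3.8, 5.0, 6.1, 7.2, 8.0, 8.9],
    2007: [0.0, 1.2, 2.6, 4.1, 5.9, 7.8, 9.6, 11.5],  # observed through month 7 only
}


def latest_common_age(curves):
    """Largest age (in months) at which every vintage has been observed."""
    return min(len(series) for series in curves.values()) - 1


def compare_at_age(curves, age):
    """Rate of each vintage at the given age, skipping vintages not yet that old."""
    return {v: series[age] for v, series in curves.items() if age < len(series)}


age = latest_common_age(curves)
at_age = compare_at_age(curves, age)
print(f"At {age} months:", at_age)
print("2007 minus 2006:", round(at_age[2007] - at_age[2006], 1), "points")
```

Reading the gap off as a number avoids the line-thickness and white-space illusion described above.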

Robert Kosara

So the key would be to combine the charts: use Kaiser's lines as the background for the NYT chart (perhaps with thinner lines) so that you can follow the individual years and tell the differences using the vertical lines as guides.


"Vintage"? I've been in a business setting for decades and have never heard that used instead of "cohort".

But there's different jargon all over the place. I'm curious as to what type of business uses this term (other than the wine industry, obviously).


Calculated Risk exhibited a plot done by Moody's using a longer time horizon here.

For what it's worth, they also use the term "vintage". Not sure why this ought to be controversial; vintage seems to be an apt metaphor.


That Calculated Risk graph is a bit too busy for my tastes: I would have preferred something plotting foreclosures against "vintage" for a selection of ages in months, like so.


In fact we don't really have to settle for only a selection of ages. Because the percentage rises monotonically for any given "vintage", the curve for the eight years could be given for all 21 ages without any confusing crossing over of the lines. A legend with 21 different entries would be too complicated, but would be unnecessary; just make the lines thin and all the same colour, and bold and label every fifth line, like the contours on a map.

I'm not going to extract the data points out of the chart to do it, though :-)

Chris G

Uh-oh, the choice-of-word issue has been broached. Let me add that "hones in" should be "homes in." "Hone" is a knife-sharpening word also used to mean "to perfect." It has recently crept into pseudo-legitimacy because of widespread misuse.

