
Rise and fall

Via Adam came this "colorful" chart of the rise and fall of house prices since 2000, as measured by the Case-Shiller index. He commented that it illustrated the old saw "the taller they are, the harder they fall".


A different chart allows us to test this theory directly. From the chart above, we noted that each curve was composed of two phases: a long rise from 2000 to roughly the mid-2000s, followed by a steep decline. We computed two data series: the average monthly growth rate during the inflation phase and the average monthly decline during the deflation phase. The scatter plot showed the correlation between the two.


The dots displayed pretty strong correlation, confirming that on average, the faster they rise, the steeper they fall.
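For readers who want to put a number on "pretty strong correlation", here is a minimal sketch of a Pearson correlation between rise rates and decline rates. The function name `pearson` and the four pairs of rates are made up for illustration; they are not the Case-Shiller figures behind the scatter plot.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical monthly rates in percent, one pair per city (NOT real data).
rise_rates = [0.5, 0.8, 1.0, 1.4]
decline_rates = [0.4, 0.9, 1.1, 1.9]

r = pearson(rise_rates, decline_rates)
print(f"correlation between rise and decline rates: {r:.2f}")
```

A value near 1 would support "the faster they rise, the steeper they fall"; a value near 0 would not.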

The diagonal line indicated equal rates of growth and subsequent decline.  The cities above the line, especially Boston and New York, have witnessed declines that were much slower than the earlier rises.  On the other end, cities like Detroit, Cleveland, Atlanta and Dallas suffered price deflation much faster than earlier inflation.  Indeed, the ratio of decline to rise rates is given by the slope from the origin to the dot.

As for the original chart, it showed all the signs of Excel defaults.  It just does not make sense for a charting program to pick a different color for each time series, no matter how many there are.  Beyond four or five colors, it is impossible for readers to tell the lines apart.  In these situations, we should adopt a foreground / background strategy: decide on the key lines, highlight those with color, gray out the remaining lines.
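The foreground / background strategy can be sketched as a small helper that decides, for each series, whether it gets a highlight color or recedes into gray. Everything here is an assumption for illustration: the function name `assign_line_styles`, the palette, and the choice of which cities to highlight. The dictionary keys (`color`, `linewidth`, `zorder`) happen to match matplotlib's line keyword names, but the plotting call itself is left out.

```python
# A short palette: the point is to stay at or below four or five colors.
HIGHLIGHT_COLORS = ["#d62728", "#1f77b4", "#2ca02c", "#ff7f0e"]
BACKGROUND_GRAY = "#bbbbbb"


def assign_line_styles(series_names, key_names):
    """Map each series name to a style: color for key lines, gray for the rest."""
    key_names = list(key_names)
    if len(key_names) > len(HIGHLIGHT_COLORS):
        raise ValueError("too many key lines; pick fewer to highlight")
    styles = {}
    for name in series_names:
        if name in key_names:
            # Foreground: distinct color, thicker line, drawn on top.
            styles[name] = {
                "color": HIGHLIGHT_COLORS[key_names.index(name)],
                "linewidth": 2.0,
                "zorder": 2,
            }
        else:
            # Background: uniform gray, thin line, drawn underneath.
            styles[name] = {"color": BACKGROUND_GRAY, "linewidth": 0.8, "zorder": 1}
    return styles


cities = ["Boston", "New York", "Detroit", "Cleveland", "Atlanta", "Dallas"]
styles = assign_line_styles(cities, key_names=["Boston", "Detroit"])
for city in cities:
    print(city, styles[city]["color"])
```

Raising an error when too many key lines are requested is a deliberate choice: it forces the chart maker to decide which lines matter instead of falling back to one color per series.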

Reference: Standard & Poor's




And they've got shadows with everything, and the ubiquitous massively-labelled scales.

Actually the year scale is the most hilarious. It consists of the same year repeated three times over before moving to the next.

Yihui Xie

Only a trivial suggestion: perhaps it's better to set the range of the x-axis of the scatter plot to [0%, 2.5%] too (the same as the y-axis) so that people can easily see where the real "diagonal line" is.


I agree that the relational plot is far more explanatory than the time series, but presuming the divide chosen was the max for each individual city, don't we have to take the apparent message of the plot with a grain of salt, as the deflationary time series for some of these cities is a lot shorter (so far) than that for other cities? I'd also argue that the steepness of the decline relative to the incline is impossible to gauge without a better definition of the start points (in the case of the inclining side) and end points (in the case of the declining side).


Martin: good points. With time series data, especially indices, knowing which time point was chosen as the reference level is very important; here, I didn't change the data, it's Jan 2000 = 100.

Note that the rates plotted are compound monthly growth rates, so the length of the decline does not matter. One can grouse that we should model the curve (exponential decay, etc.); I'll just leave it to others to explore this avenue.

The precise phase definitions were Jan 2000 to peak month for inflationary phase, and from peak month to May 2008 (current) for deflationary.
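To make the phase definitions concrete, here is a minimal sketch of how a compound monthly rate could be computed for each phase of a single city's index. The function names and the four-point series are hypothetical, chosen so the arithmetic is easy to check by hand; this is not the code used for the chart.

```python
def compound_monthly_rate(start_value, end_value, months):
    """Constant monthly rate r with start_value * (1 + r) ** months == end_value."""
    if months <= 0:
        raise ValueError("months must be positive")
    return (end_value / start_value) ** (1.0 / months) - 1.0


def phase_rates(index_values):
    """Split a monthly index series at its peak.

    Returns (rise_rate, decline_rate), both as positive compound monthly
    rates: start -> peak for the rise, peak -> last value for the decline.
    """
    last = len(index_values) - 1
    # Index of the first occurrence of the maximum value.
    peak = max(range(len(index_values)), key=index_values.__getitem__)
    if peak == 0 or peak == last:
        raise ValueError("series needs both a rise phase and a decline phase")
    rise = compound_monthly_rate(index_values[0], index_values[peak], peak)
    fall = compound_monthly_rate(index_values[peak], index_values[last], last - peak)
    return rise, -fall


# Made-up series with the base month set to 100 (NOT real Case-Shiller data):
# two months of +10% growth, then one month of -10%.
toy_index = [100.0, 110.0, 121.0, 108.9]
rise, decline = phase_rates(toy_index)
print(f"rise: {rise:.1%} per month, decline: {decline:.1%} per month")
```

Because the rate is compounded per month, a phase that has lasted 6 months and one that has lasted 24 months are expressed on the same per-month footing, which is the sense in which the length of the decline drops out.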


Yihui: I usually prefer to square out the plot as suggested. I tried it here as well but didn't like the result; by doing this, the entire right half of the plot would be empty.


You may put your annotation on the right inside the plot so the space will not be wasted :-) (and draw a diagonal line from top-left to bottom-right)


I get your drift, but I read Iacono (the source of the chart) regularly, and he does the 20 color bit because the folk following the Case-Shiller index love, just love, to be able to contrast and compare the 20 cities all at the same time - imagine what happens when C-S widens it out to 30 or 40 cities....

P.S. - I'm sending him a link to this post - he'll get a kick!


I too would love to be able to compare and contrast all 20 cities at the same time. The trouble is that the Excel default zillion-color scheme doesn't confer that ability. And it's not just an Excel default problem either; basically, color choices that would genuinely allow readers to do that are almost impossible for even the most skilled graph designer to arrange.

(I don't say completely impossible; I can see how a hierarchical hue-luminance set might make some sense of the pack. But the work hasn't been put in here)

Google for "William Cleveland", who did the early work on people's actual graphical abilities that showed that they weren't really able to see information beyond a certain complexity, if the means used to present that complexity were too far down a hierarchy that has color at the bottom, and area and angle near the bottom. (this is why pie charts are so bad for understanding, no matter how popular they are for their prettiness)

I wonder if there's a graphical equivalent of the Dunning-Kruger effect, where people who can't read a pretty colored graph don't know they can't, so they think it's a great graph? Not quite the same: I'm thinking of it as a property of the graph here, not of the people.

Jon Peltier has used complex collections of data to make an interactive chart that compares any two cases against a background of all cases. This would work quite well for the Case-Shiller set.

Hisham Abdel Maguid

Epic Systems together with Beemode have developed a data visualization software, "Trend Compass", almost ready to be released. It is an extension to Gapminder, which was invented by a Swedish professor. You can view it :


It is a new concept in viewing statistics and trends in an animated way. It could be used in presentation, analysis, research, decision making, etc.

Here are some links :
- Part of what we did with some Governmental institution:

- A project we did with Princeton University on US unemployment :

- April 2008 Media Monitoring on Cars TV ads (ad duration vs occurrences over time) :

- Ads Monitoring on TV Satellite Channels during April 2008. Pick Duration (Ads daily duration) vs Repeat (Ads repetition per day).

I hope you could evaluate it and give me your comments. So many ideas are there.

You can test the software by uploading data on our website and getting the corresponding Flash charts. This is for a limited number of users.


Eng. Hisham Abdel Maguid


Kaiser, which software do you use to prepare your charts?

If you have not done so before, a post about tools and software would be extremely valuable...


Stan Tyan

Data science is useless if you can’t communicate your findings to others, and visualizations are imperative if you’re speaking to a non-technical audience. If you come into a board room without presenting any visuals, you’re going to run out of work pretty soon.

More than that, visualizations are very helpful for data scientists themselves. Visual representations are much more intuitive to grasp than numerical abstractions. That’s just human nature, whether you’re a data scientist or not.
