
Transparent circles

Jens from Library House sent us this chart featuring house price to earnings ratios. In his own words:

"the key thing that I just love is that they have included the data points, but not as points, but as little transparent circles. This allows you to understand by how much two data points are spaced apart from each other, visualising growth and making this chart look very dynamic. I have never seen this in this form before: very nice. Beyond this, the axes are clearly labelled, all in all a very simple chart, beautifully executed."




They do leave out the, er, country. UK data, is it? Doesn't look right for what I know of US residential property markets. Also, I am not a fan of price multiples. Give me the inverse every time; a yield doesn't 'blow up' since its denominator rarely gets near zero.

Little things, sure, but they matter. I won't mention that yield data for one asset class, here residences, are always more interesting in the context of yields generally, like those on government bonds.

Long story short: pretty chart, yes, but weak analysis.
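
To make the multiple-versus-yield point concrete, here is a minimal Python sketch. The numbers are made up and the function names are hypothetical; it only assumes that the yield is the inverse of the multiple (earnings divided by price).

```python
def price_multiple(price, earnings):
    """Price divided by earnings; grows without bound as earnings approach zero."""
    return price / earnings

def earnings_yield(price, earnings):
    """Earnings divided by price; the inverse of the multiple."""
    return earnings / price

price = 100.0  # made-up price, held fixed for illustration
for earnings in (20.0, 5.0, 0.5):  # made-up earnings, shrinking towards zero
    print(earnings, price_multiple(price, earnings), earnings_yield(price, earnings))
```

As earnings shrink, the multiple climbs from 5 to 200, while the yield just drifts towards zero.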

derek c

I've experimented with open circles myself, for much the same reasons Jens gives (but also to convey uncertainty, so no one is under any illusions that a sharp point on the graph represents an equally sharp number in the field).

But this chart mostly ruins the effect by adding in a load of junk:

* The shaded area under the graph is meaningless. It would be an acceptable decorative feature on its own, but not with the circles.

* The line would be an acceptable feature on its own, or even maybe together with the open circles, but together with the open circles *and* the shading, it's too much junk.

* The alternating shaded bands between the years: chart junk. It's especially redundant as there are gridlines in there *as well*. The gridlines themselves could have been justified, and the shaded bands could have been argued as a replacement for gridlines, but the two together... this whole chart seems to have been designed by a committee, or someone afraid to come down to a straight decision between one effect or another.

* I personally think the y-axis has too many labelled ticks, and would have preferred labels only every one percent, with ticks alone remaining at every half percent.

* Not a chart junk issue, but I believe the y-axis scale should have been on the right, as we are more interested in the year 2006 than 1953.


Oops, sorry, not "percent", it's a ratio.


Cleveland was (is?) a big fan of the open circles too, so the idea has been around for a while. They can also be useful in situations of slight overplotting. The disadvantages are:

* it's difficult to tell exactly where the point is (e.g. the 1950s to 1970s is an indeterminate mess). If you want to show uncertainty, it's probably better to use another device, preferably calibrated for your data.

* it's harder to use colour to add additional information

derek c

Oh, another objection: the area shading is especially egregious, as the scale starts at 2.0, not 0.0. It gives the impression that a ratio of 6.0 is four times as high as 3.0, when it's only twice as high.
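
To put numbers on that objection, here is a minimal Python sketch. The helper `apparent_ratio` is hypothetical, and it assumes the reader judges the shaded area by its height above the axis baseline.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of the shaded heights drawn for values a and b when the
    y-axis starts at `baseline` (hypothetical helper, for illustration)."""
    if b == baseline:
        raise ValueError("b sits on the baseline, so its shaded height is zero")
    return (a - baseline) / (b - baseline)

true_ratio = 6.0 / 3.0                                  # actual ratio of the two values
shaded_ratio = apparent_ratio(6.0, 3.0, baseline=2.0)   # what the shading suggests
print(true_ratio, shaded_ratio)  # 2.0 4.0
```

With the baseline at 0.0 the two ratios agree, which is the usual argument for starting a filled area at zero.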


It appears that the primary information conveyed by the dots is the amount and timing of the data across time. One alternative is to plot short thin lines on the x-axis to indicate when there is data.


Are the points not equally spaced in the y-direction? I had assumed they were given the type of data.

derek c

Hadley, if that's true (and it looks like it, it looks like they're quarterly) then short thin lines on the y-axis might be a nice supplementary use of data-ink, to indicate how frequently the ratio has been at various levels over the last fifty or so years.

Ticks on the x-axis would tell you how frequently the data were collected over the years, which seems less interesting in this case.


yes if lines were plotted on the y axis, that gives a univariate frequency distribution... but it'd only work well if the values are mostly distinct
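
A rough way to check whether the values are distinct enough for ticks on the y-axis: count how many observations would land on the same tick once rounded to what the eye can resolve. This is only a sketch; the ratios below are made up, and `rug_ticks` and the 0.1 resolution are assumptions for illustration.

```python
from collections import Counter

def rug_ticks(values, resolution=0.1):
    """Group values into tick positions `resolution` apart and
    return {tick position: number of observations on that tick}."""
    counts = Counter(round(v / resolution) for v in values)
    return {round(k * resolution, 10): n for k, n in sorted(counts.items())}

ratios = [3.0, 3.02, 3.1, 3.14, 4.5, 5.9]  # made-up ratios, not the chart's data
ticks = rug_ticks(ratios)
overplotted = sum(1 for n in ticks.values() if n > 1)
print(len(ratios), "observations on", len(ticks), "ticks;", overplotted, "ticks overlap")
```

If many ticks carry more than one observation, plain ticks hide the frequency information and some jitter or a histogram may serve better.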


aka a rug plot; also see some of Tufte's ideas for encoding more information into the axes. See for some examples with code in R.


