## Necessity is the mother of invention

##### Feb 20, 2012

When there's a need to wow audiences with smart data analysis, there's invention.

Let's start with the U.S. home ownership data. The total occupied homes are subdivided into owner-occupied and renter-occupied. Thus, in any given year, we can compute the proportion of homes that are owner- or renter-occupied. We use blue for owner and red for renter, as follows:

Just to confirm, if we superimpose these two charts, we see that the proportions add up to 100%. One chart is the mirror image of the other:
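The same check can be done numerically. Here is a minimal sketch in Python; the unit counts are hypothetical values chosen for illustration (they are not the actual housing data), and `shares` is a helper name introduced here.

```python
def shares(owner, renter):
    """Return (owner share, renter share) of all occupied units."""
    total = owner + renter
    return owner / total, renter / total

# Hypothetical counts in millions of occupied units, for illustration only.
counts = {2001: (70.0, 30.0), 2006: (72.0, 30.5), 2011: (71.0, 34.0)}

for year, (owner, renter) in counts.items():
    owner_share, renter_share = shares(owner, renter)
    # The two shares always sum to 1, so one series is 1 minus the other.
    print(year, f"owner {owner_share:.1%}", f"renter {renter_share:.1%}")
```

Because the renter share is always one minus the owner share, any rise in one line is matched by an equal fall in the other, which is why the two charts mirror each other.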

Now that we have confirmed the data is okay, we pull the charts apart. We change the scale of the renter chart so that the change over time is more clearly displayed. Since the home ownership bubble burst, it's the rental market that has grown.

It's time for some magic! We superimpose the charts again to obtain this:

[Ed: The remainder of the post below is modified from the original version based on reader comments]

The chart designer managed to make the two data series look different even though one series is the mirror image of the other.

***

The inspiration for this post came from reader Leanne C., who submitted this MSNBC chart:

Initially, I mistakenly assumed that what is plotted are proportions. It just so happens that the total number of occupied units in the U.S. is in the 100M range, and owner v. rental is split roughly 70M / 30M. I looked at the left end of the chart and saw, in 2001, about 33 for rental and about 69 for owner, which happen to add up to 100 (with rounding error). But if I had looked at the right end of the chart, where rental is 39 and owner is 75, it would have been clear that the two don't add up to 100.

In any case, this chart looks different if we make the scales the same. In the following, each unit of both axes represents 2M units. There really is no justifiable reason why the scales should be different given that they both measure the same objects.

But using different ranges on each axis also presents a challenge: it is tempting to read meaning into the gaps between the two lines but these gaps merely reflect the choice of axis ranges.
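To see why the gap carries no meaning, here is a rough sketch in Python that computes where a value lands on an axis as a fraction of the plot height. The 2001 readings (69 and 33) are the approximate values quoted above; the axis ranges and the `axis_position` helper are assumptions made up for this illustration, not the ranges of the original chart.

```python
def axis_position(value, axis_min, axis_max):
    """Fraction of the plot height at which `value` is drawn (0 = bottom, 1 = top)."""
    return (value - axis_min) / (axis_max - axis_min)

# Approximate 2001 readings quoted in the post, in millions of units.
owner, renter = 69.0, 33.0

# Hypothetical dual axes: owner on 66-78, renter on 30-42.
gap_dual_a = axis_position(owner, 66, 78) - axis_position(renter, 30, 42)

# Same data, but the owner axis is changed to 60-80 (also hypothetical).
gap_dual_b = axis_position(owner, 60, 80) - axis_position(renter, 30, 42)

# One shared axis covering both series (hypothetical range 30-78).
gap_shared = axis_position(owner, 30, 78) - axis_position(renter, 30, 78)

print(gap_dual_a, gap_dual_b, gap_shared)
```

With the first pair of ranges the two points sit at the same height even though the values differ by 36 million units; changing only the owner axis opens up a gap. Only the shared axis gives a gap that reflects the data.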

Instead, we should convert all these units into growth indices. Let 100 be the year 2001 units. The following chart then shows what's really going on in housing:

Between 2001 and 2008, renter- and owner-occupied units experienced the same total growth (about 4%), although the trajectories were different: owner-occupied units went up steadily during this period, while renter-occupied units declined until 2004 and then grew at a faster rate between 2004 and 2008. Since 2008, renter-occupied units have continued at about the same growth rate, while owner-occupied units have flattened out and may be slightly declining.
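The indexing step itself is simple: divide every value by the base-year value and multiply by 100. Below is a minimal sketch in Python. It uses only the rough chart readings quoted earlier (2001 and the right end of the chart, whose year is not stated here, so it is labeled `"latest"`); these are eyeballed values rather than the underlying data, and `growth_index` is a helper name introduced for this example.

```python
def growth_index(series, base_key):
    """Rescale a series so that the value at `base_key` equals 100."""
    base = series[base_key]
    return {key: 100.0 * value / base for key, value in series.items()}

# Approximate readings quoted in the post, in millions of units.
owner = {"2001": 69.0, "latest": 75.0}
renter = {"2001": 33.0, "latest": 39.0}

owner_index = growth_index(owner, "2001")
renter_index = growth_index(renter, "2001")

for key in owner:
    print(key, f"owner {owner_index[key]:.1f}", f"renter {renter_index[key]:.1f}")
```

With these rough readings both series gain about 6 million units, yet the renter index ends well above the owner index because it starts from a much smaller base. Putting both series on a common index makes that relative growth directly comparable on a single axis.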


Really? They don't look like mirror images, even with the scale changed (in my head). Care to plot them on the same scale?

Awesome! I love how simply you broke this one down. Is there ever a time when dual-y axis plots are a good idea?

Personally, I would always include the origin so you get a better absolute idea of change (there are exceptions based on audience, but for a general audience this holds). You can make minuscule changes look big with scale. Playing around with scale on two axes means that I can make charts say nearly anything I want without any need for "reality" to play a role.

As has already been noted, the series are obviously not mirroring each other (one is constant in 2001-2004 while the other is growing). It's also easy to verify that they are not adding to 100%. You don't even need to look at the axes: the lines cross at different levels at the beginning and at the end of the period, so there is no way the sum could be constant. I guess you can still say the chart is evil because it doesn't start at zero or something... but it doesn't show percentages and it doesn't pretend to: "(in millions of units)".

Commenters #1 and #4: See my revised post. My point would be better made if the chart plotted proportions. But I hope you get the bigger message, which is that the dual-axes chart is open to a lot of mischief.

Adam: I almost never use dual-y axis. The only time I'd use it is if the two data series have exactly the same scale (in terms of both the range and the units), e.g. two data series both of which are proportions. But then in those cases, why not use a scatter plot?
