Clearing a forest of labels
Wayward legend takes sides in a chart of two sides, plus data woes

Clarifying comparisons in censored cohort data: UK housing affordability

If you're pondering over the following chart for five minutes or more, don't be ashamed. I took longer than that.


The chart accompanied a Financial Times article about inter-generational fairness in the U.K. To cut to the chase, a recently released study found that younger generations are spending substantially higher proportions of their incomes to pay for housing costs. The FT article is here (behind paywall). FT actually slightly modified the original chart, which I pulled from the Home Affront report by the Intergenerational Commission.


One stumbling block is to figure out what is plotted on the horizontal axis. The label "Age" has gone missing. Even though I am familiar with cohort analysis (here, generational analysis), it took effort to understand why the lines are not uniformly growing in lengths. Typically, the older generation is observed for a longer period of time, and thus should have a longer line.

In particular, the orange line, representing people born before 1895 only shows up for a five-year range, from ages 70 to 75. This was confusing because surely these people have lived through ages 20 to 70. I'm assuming the "left censoring" (missing data on the left side) is because of non-existence of old records.

The dataset is also right-censored (missing data on the right side). This occurs with the younger generations (the top three lines) because those cohorts have not yet reached certain ages. The interpretation is further complicated by the range of birth years in each cohort but let me not go there.

TL;DR ... each line represents a generation of Britons, defined by their birth years. The generations are compared by how much of their incomes did they spend on housing costs. The twist is that we control for age, meaning that we compare these generations at the same age (i.e. at each life stage).


Here is my version of the same chart:


Here are some of the key edits:

  • Vertical blocks are introduced to break up the analysis by life stage. These guide readers to compare the lines vertically i.e. across generations
  • The generations are explicitly described as cohorts by birth years
  • The labels for the generations are placed next to the lines
  • Gridlines are pushed to the back
  • The age axis is explicitly labeled
  • Age labels are thinned
  • A hierarchy on colors
  • The line segments with incomplete records are dimmed

The harmful effect of colors can be seen below. This chart is the same as the one above, except for retaining the colors of the original chart:





Feed You can follow this conversation by subscribing to the comment feed for this post.


Hi Kaiser - it's not obvious to me why the more colorful chart is bad?

The comments to this entry are closed.