Getting to first before going to second
Visual design is hard, brought to you by NYC subway

Start at zero, or start at wherever

Andrew's post about start-at-zero helps me refine my own thinking on this evergreen topic.

The specific example he gave is this one:

Andrewgelman_invitezeroin

The dataset is a numeric variable (y) with values over time (x). The minimum numeric value is around 3 and the range of values is from around 3 to just above 20. His advice is "If zero is in the neighborhood, invite it in". (Link)

The rule, as usual, sounds simpler than it really is. In the discussion, Andrew highlights several considerations.

Is zero a meaningful reference value? In his example, we assume it is and so we invite zero in. But, as Andrew also says, if zero is meaningless, then recall the invitation. So context must be accounted for.

In Chapter 1 of Numbersense (link), I looked at some SAT score data of applicants to competitive colleges. Is zero a meaningful reference value for SAT scores? Someone might argue yes, since it is the theoretical minimum score that anyone could get from the test. Any statistician will likely say no, since a competitive college will have never seen an applicant submitting a score of zero, or anywhere close to zero. Thus, starting such a chart at zero inserts a lot of whitespace and draws attention to a useless insight - how far above the theoretical worst performer is someone's score.

***

What about the left panel of Andrew's chart makes us uncomfortable? I ask myself this question. My answer is that the horizontal axis highlights an arbitrary value that distracts from the key patterns of the data.

As shown below, the arbitrary value is ~2.5. This is utterly meaningless.

Redo_andrewgelman_invitezeroin

What if 0 is also a meaningless value for this dataset? I'd recommend "bench the axis". Like this:

Redo_andrewgelman_benchtheaxis

An axis is a tool to help readers understand a chart. If it isn't serving a function, an axis doesn't need to be there. When I choose a line chart for time-series data, I'm drawing attention to temporal change in the numeric values, or the range of values. I'm not saying something about the values relative to some reference number.

From this example, we also see that the horizontal axis should not be regarded as a hanger for time labels. Time labels can exist by themselves.

 

 

Comments

Feed You can follow this conversation by subscribing to the comment feed for this post.

Colin

Um... Isn't... Isn't the minimum score on the SAT currently 400? Like you literally cannot get a 0. I don't think you've ever been able to get less than 200 on any section of it.

Kaiser

Colin: yes, you're right. Here's the official page on SAT scores, so the minimum is 400. https://collegereadiness.collegeboard.org/sat/scores/understanding-scores/structure

Verify your Comment

Previewing your Comment

This is only a preview. Your comment has not yet been posted.

Working...
Your comment could not be posted. Error type:
Your comment has been saved. Comments are moderated and will not appear until approved by the author. Post another comment

The letters and numbers you entered did not match the image. Please try again.

As a final step before posting your comment, enter the letters and numbers you see in the image below. This prevents automated programs from posting comments.

Having trouble reading this image? View an alternate.

Working...

Post a comment

Comments are moderated, and will not appear until the author has approved them.

Your Information

(Name is required. Email address will not be displayed with the comment.)