Comments

Andrew Gibson

I'm mostly in agreement here but with one caveat: I think there may be cases where the "base" y-axis value is not 0.

Immediately, I'm thinking of metrics plotted on a percentage scale where we are interested in how far below 100% they are. In the supply-chain domain, examples include forecast accuracy and order fill rates, which would ideally hit 100%; it's the degree to which they fall short of 100% that is of interest, not the distance from 0. In these cases I think the y-axis should "end at 100%" and I'm not concerned at all with where it starts.

Or how about temperature scales, where the 0 is essentially arbitrary (unless you are using the Kelvin scale)?

I imagine there are other examples where the base/reference/key value for the y-axis (what should this be called?) is non-zero too.

derek

Where the 0 is essentially arbitrary, you shouldn't be using a bar chart at all.

Chad Smith

In the first example you've cited, Andrew, would it not be best to plot the variance using 100% as the baseline value, with your data points appearing above or below the baseline as needed? This could be accomplished with equal effect using a column or line chart, although a line is probably preferable for plotting time-series values.

Andrew Gibson

Chad - I could plot it with 100% as the baseline but then I'm not really plotting the metric I started with, I'm (at least visually) representing its difference from 100%.

Andrew Gibson

Derek - you're right of course, perhaps that's a bad example. Let's stick with the percentage scale idea.

Forecast Accuracy is a VERY common business metric (actually defined as 1 - PercentageError). It's defined that way, I think, so that bigger is better, but what I really care about is the absolute and relative size of the error NOT the relative size of "accuracy". I can't really change the metric, it's deeply embedded in business usage.

Sometimes this is for time-series data where I use a line-chart and let both limits float. Where I'm comparing results across categories, my approach has been to force 100% as the maximum of the y-axis and let the minimum float.

Surely this isn't the only example of a non-0 base?

jlbriggs

The whole point of a zero base for bar/column (and area) charts is that the length of the bar encodes the totality of a value, starting from nothing. The length of the bar tells you nothing at all if it doesn't start from zero.

If you can't start the bar at (a meaningful) zero, then a bar chart is not the appropriate way to chart your data.

In the case of a variance chart, regardless of what label you put on the baseline, you are still in reality encoding that data from a zero base, where the value against which you are measuring variance is the zero mark.
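A minimal sketch of that point, assuming a 100% reference and made-up fill rates (the function name and the numbers are illustrative, not taken from anyone's data): once the reference is subtracted, the bars are plain signed values measured from zero, whatever label is printed on the baseline.

```python
def deviations_from_reference(values, reference=1.0):
    """Re-express each value as its signed distance from the reference.

    The reference (100% here, an assumption) becomes the zero mark,
    so a bar drawn for each result still starts at a true zero.
    """
    return [v - reference for v in values]


# Hypothetical order fill rates, expressed as fractions of 1.
fill_rates = [0.97, 0.88, 1.00]
deviations = deviations_from_reference(fill_rates)

for rate, dev in zip(fill_rates, deviations):
    print(f"fill rate {rate:.0%} -> bar from 0 to {dev:+.0%}")
```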

There are plenty of cases for non-zero bases for most other chart types - most often you will *not* want a zero base for a line chart, scatter plot or box plot (though obviously if zero is, incidentally, appropriate to the data set it should be used).

Andrew Gibson

Appreciate the comments from everyone as it's helping me clarify my own thinking. Mainly, I now see that the example that jumped to the front of my mind is probably not a widespread issue.

This particular metric, forecast accuracy, defined as 1 - [ForecastErrorRate], varies in the range (-inf, 100%] with typical values in the (20%, 100%) range. I did not invent the metric; it's a business construction so that bigger numbers are better.

As a stats guy I would prefer to see an error rate, which I could plot against a 0 base without causing anyone offense and get a reasonably accurate perception of absolute and relative values in the plot.

However, most business users do not recognize/understand the error metric and really want to see the accuracy values. I most definitely do NOT want to force a 0 base for this on any chart because it is meaningless.

By forcing 100% as the top of the axis, the gap between the top-axis and the plotted data (whatever chart construct I use) now becomes meaningful in absolute/relative terms. This seems better to me than leaving the axis to float at both ends, but I'm certainly open to other suggestions.
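As a rough sketch of that axis rule (the function name, the padding fraction, and the sample values are all assumptions for illustration), the limits could be computed like this, with the top pinned at 100% and the bottom floating just below the smallest value:

```python
def accuracy_axis_limits(values, top=1.0, padding=0.05):
    """Return (bottom, top) for a y-axis pinned at `top` with a floating minimum.

    `padding` is an arbitrary choice: the fraction of the largest gap
    to the top that is left as empty space under the lowest value.
    """
    if not values:
        raise ValueError("values must not be empty")
    if max(values) > top:
        raise ValueError("a value exceeds the fixed top of the axis")
    lowest = min(values)
    bottom = lowest - padding * (top - lowest)
    return bottom, top


# Hypothetical forecast accuracies for three categories.
accuracies = [0.62, 0.81, 0.93]
bottom, top = accuracy_axis_limits(accuracies)

# The distance from each point to the top of the axis is the error.
gaps = [top - a for a in accuracies]
print(f"y-axis from {bottom:.3f} to {top:.3f}; gaps to top: "
      + ", ".join(f"{g:.0%}" for g in gaps))
```

The resulting pair would then be handed to whatever call sets the y-limits in the plotting tool being used, for example matplotlib's `Axes.set_ylim(bottom, top)`.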

jlbriggs

@andrew - perhaps you can link to an example of a chart such as what you are describing.

jlbriggs

I have most often seen budget accuracy comparisons plotted as either

a) a variance chart, showing % above/below budget for each period (usually bars but also lines or areas)
b) a line plot with a line for actual and a line for budget
c) a bullet graph, or similar style bar chart

Chad Smith

@ Andrew - I'd also be curious to see an example if possible. I'm also working with budget data and would like to see the presentation of the metric you've described.

Andrew Gibson

Sorry for the delay. I think the problem may be in my communication. While I know it sounds like it, "Forecast accuracy" is not a typical variance measure.

We can define it as:
1 - [(weighted) Mean Absolute Percentage Error]
= 1 - SUM(ABS(Forecast-Actual))/SUM(Actual)

There are many variations on this, and it's the subject of heated debate on forecasting forums, but they all share similar characteristics:
- perfect forecasts would have no error and return 100%
- there is no effective lower bound on the metric, negative numbers of any scale are possible
- it's really the relative size of the error that we are interested in, even though business users insist on looking at "accuracy".
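To make those characteristics concrete, here is a small sketch of the definition above in Python. The function name and the sample numbers are made up for illustration; only the formula comes from the definition given.

```python
def forecast_accuracy(forecasts, actuals):
    """Return 1 - SUM(ABS(Forecast - Actual)) / SUM(Actual)."""
    if len(forecasts) != len(actuals):
        raise ValueError("forecasts and actuals must have the same length")
    total_actual = sum(actuals)
    if total_actual == 0:
        raise ValueError("sum of actuals is zero; the metric is undefined")
    total_abs_error = sum(abs(f - a) for f, a in zip(forecasts, actuals))
    return 1 - total_abs_error / total_actual


# Hypothetical data: the same actuals against three different forecasts.
actuals = [100, 200, 300]
perfect = forecast_accuracy([100, 200, 300], actuals)  # no error at all
typical = forecast_accuracy([120, 180, 330], actuals)  # total error 70 on 600
poor = forecast_accuracy([400, 50, 900], actuals)      # total error 1050 on 600

print(f"perfect: {perfect:.1%}, typical: {typical:.1%}, poor: {poor:.1%}")
```

The third case comes out negative because the summed absolute error is larger than the summed actuals, which is why the metric has no effective lower bound.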

I'll post an example next.

Andrew Gibson

I can't seem to add graphics into the comments section so I'll post it on my blog and drop a link in here.

Andrew Gibson

OK - I added a post on my blog so I can cover this in more detail. In particular I expanded on the characteristics of the forecast accuracy metric, as I think that is key to my problem. (I also thought of a handful of other supply chain metrics with the same issue.)

Link below.

Visualizing Forecast Accuracy. When not to use the "start at zero" rule ?

I appreciate the feedback from all the contributors here. Comments are welcome.


Marketing analytics and data visualization expert. Author and Speaker. Currently at Vimeo and NYU. See my full bio.
