
Where a scatter plot fails

Found this chart in the magazine that Charles Schwab sends to customers:


When there are two variables, and their correlation is of interest, a scatter plot is usually recommended. But not here!

The text labels completely dominate this chart. The designer tried very hard to place them, but a careful look reveals that some boxes sit above their dots while others sit to the right, and the box for "Short Treasuries" ends up quite far from its dot. This means the positions of the text boxes cannot stand in for the dots.


Here is a different view of this data:


I am using a bumps-style chart, which allows the labels to be written horizontally outside the canvas. Instead of plotting all categories on the same chart, I use a small-multiples setup to separate three types of risk-return relationships.
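To make the layout concrete, here is a minimal Python sketch of the data side of such a chart, under stated assumptions. Each asset class becomes one line joining its return on the left axis to its risk on the right axis; because both are in percent, the two sides can share one scale, and the `left`/`right` arguments let the sides be swapped. The `AssetClass` type, the helper names, the three-way grouping rule in `panel_of` (with its 5% threshold) and the demo numbers are all hypothetical: the post does not say how the three panels were chosen, and the values are not the chart's data.

```python
from dataclasses import dataclass


@dataclass
class AssetClass:
    name: str
    ret: float   # return, in percent
    risk: float  # risk as standard deviation of returns, in percent


def segments(assets, left="ret", right="risk"):
    """One line per asset class, joining its value on the left axis to its
    value on the right axis. Both axes use the same percent scale, so no
    rescaling is needed; pass left="risk", right="ret" to swap the sides."""
    return [(a.name, getattr(a, left), getattr(a, right)) for a in assets]


def panel_of(a, low=5.0):
    """Hypothetical grouping rule for the small multiples; the threshold
    `low` is an illustrative assumption, not the post's criterion."""
    if a.ret < low and a.risk < low:
        return "low risk, low return"
    return "risk above return" if a.risk > a.ret else "return at or above risk"


def small_multiples(assets):
    """Group the slope segments into panels, preserving input order."""
    panels = {}
    for a in assets:
        panels.setdefault(panel_of(a), []).extend(segments([a]))
    return panels


if __name__ == "__main__":
    # Made-up numbers for illustration only; not the chart's data.
    demo = [
        AssetClass("Asset A", 9.0, 18.0),
        AssetClass("Asset B", 7.0, 6.0),
        AssetClass("Asset C", 2.0, 1.0),
    ]
    for panel, lines in small_multiples(demo).items():
        print(panel)
        for name, y_left, y_right in lines:
            print(f"  {name}: {y_left:.1f}% -> {y_right:.1f}%")
```

Drawing is left to whatever plotting tool you prefer; the sketch only works out which values each panel's lines connect.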




What does it mean for an investment to have "20% risk"? Is this directly comparable to a 20% return? I'm wondering whether the correspondence is real. If not, does the redesign not suffer from a dual-axis problem?


"When there are two variables, and their correlation is of interest, a scatter plot is usually recommended. But not here!"

I don't know that I can agree with that statement.

What you really mean is that the [giant, blocked, saturated] text labels don't work, not that a scatter plot doesn't work.

I feel a scatter plot is still the right way to show the data - it just leaves a labeling problem to solve.


Interesting slopegraph:

However, this kind of chart is usually used to show data of the same type, in the same unit of measurement, on both the left and right sides. The second criterion addresses Jeff's concern. To help clarify: a risk of 20% usually refers to the standard deviation of the returns.


I agree with jlbriggs. The original chart just needed simpler, less conspicuous labels.


I don't like the fact that your chart has lines. In my opinion, a line always implies connecting two different data points. You are actually connecting two different attributes of the same data point. I agree with everyone above. A cleaner scatterplot with a legend on the side would work much better.


If one of you can send me a scatterplot with cleaner labels, I'd be happy to post it.

Anne: there is a whole class of graphs in which lines connect different entities. The Bumps chart is one that I'm particularly fond of. Others go by the names of parallel coordinates, profile charts, etc. I have addressed this issue in the past. If you use a column chart, you are asking readers to trace out the lines in their heads, so why not draw the lines explicitly?

Jonnie/Jeff: Risk is measured by the standard deviation of returns. A standard deviation always has the same unit as the quantity it measures. What you're reacting to is a shortcoming of using standard deviations: they do not give much intuition. What does an SD of 5% mean? Also, this risk-return view creates a two-dimensional problem, and it is hard to order all the classes of investment, which is what the reader is likely attempting to do.
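A small Python sketch of the point about units, assuming risk is the sample standard deviation of periodic returns (the post does not say which estimator or period Schwab used). The function names and the return series are hypothetical. Because a standard deviation carries the unit of its data, the mean return and the risk both come out in percent; the second helper offers one common reading of what an SD means, valid only under a normality assumption.

```python
import statistics


def risk_and_return(returns_pct):
    """Return (mean return, risk) for a series of periodic returns in percent.

    Risk is taken as the sample standard deviation of the returns. A standard
    deviation has the same unit as the data it summarizes, so both numbers
    are in percent and can share one axis.
    """
    mean = statistics.mean(returns_pct)
    sd = statistics.stdev(returns_pct)
    return mean, sd


def one_sd_band(mean, sd):
    """Rough intuition for an SD: if returns were normally distributed (a
    simplifying assumption), about 68% of periods would land in this band."""
    return mean - sd, mean + sd


if __name__ == "__main__":
    # Hypothetical yearly returns in percent, for illustration only.
    yearly = [10.0, -5.0, 20.0, 5.0]
    mean, sd = risk_and_return(yearly)
    print(f"return {mean:.1f}%, risk {sd:.1f}%")
    low, high = one_sd_band(mean, sd)
    print(f"about two years in three between {low:.1f}% and {high:.1f}%")
```

Rescaling the inputs from percent to fractions rescales the standard deviation by the same factor, which is another way to see that it shares the unit of the returns.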


Not having access to the original data, I just Photoshopped the original chart. I removed the distracting labels and replotted the points:


The data values are rough guesses, but this seems to work for me:

I am often on the fence with your use of bumps charts. This case is no different. I think it is an interesting way to show the data, though I think the actual example is a little clunky.

I think I would prefer to see them reversed, with risk on the left and return on the right. Lines would then slope up when return exceeds risk, which reads as more positive, and slope down when it does not.

And though it may be nit-picking, I would show the full plot area for the 3rd section, for visual consistency and clarity.


TML/jlbriggs: thanks for the contributions. Your versions are certainly better than the original. I'd argue that the text labels still draw one's attention away from the data. For this dataset, that is marginally acceptable. When there are more points, or longer labels, the labeling problem only gets worse.

On the third panel, I like to break the small-multiples rule to draw attention to special features of the data. In this case, the two classes have completely different risk-return profiles compared with the other two groups: both risk and return are much lower.


Kaiser - re the 3rd panel: It doesn't look like you've broken the rule at all. The scaling is the same as the other two panels, with the exception that you don't extend the empty portion of it to the top.

I understand separating them to highlight them in that way; I would just like to see the plot extended to the same extent as the others, to show that these are in fact on the same scale and to keep the visual continuity.

Regarding the labels, I do agree they are still a distraction, but I find it acceptable at this extent.

If there were a lot more data points, labels would be pointless.

If there were a few more data points, surely acceptable abbreviations could be determined.


The problem with the scatterplot is the large *and varying* size of the labels - they look like they're encoding a third dimension of data, which is spurious and therefore unhelpful. In a way, it's a scatter plot with the markers sized proportionally to the number of characters in the label:-0


