


What does it mean for an investment to have "20% risk"? Is this directly comparable to a 20% return? I'm wondering whether the correspondence is real. If not, doesn't the redesign suffer from a dual-axis problem?


"When there are two variables, and their correlation is of interest, a scatter plot is usually recommended. But not here!"

I don't know that I can agree with that statement.

What you really mean is that the [giant, blocked, saturated] text labels don't work, not that a scatter plot doesn't work.

I feel a scatter plot is still the right way to show the data - it just leaves a labeling problem to solve.


Interesting slopegraph: http://charliepark.org/slopegraphs/

However, this chart is usually used to show data of the same type on both the left and right sides, in the same unit of measurement. The second criterion addresses Jeff's concern. To help clarify: a 20% risk figure usually refers to the standard deviation of the returns.


I agree with jlbriggs. The original chart just needed simpler, less conspicuous labels.


I don't like the fact that your chart has lines. In my opinion, a line always implies connecting two different data points. You are actually connecting two different attributes of the same data point. I agree with everyone above. A cleaner scatterplot with a legend on the side would work much better.


If one of you can send me a scatterplot with cleaner labels, I'd be happy to post it.

Anne: there is a whole class of graphs in which lines connect different entities. The Bumps chart is one that I'm particularly fond of. Others go by the names of parallel coordinates, profile charts, etc. I have addressed this issue in the past. If you use a column chart, you are asking readers to trace out the lines in their heads, so why not draw the lines explicitly?

Jonnie/Jeff: Risk is measured by the standard deviation of returns. A standard deviation always has the same unit as the quantity it measures. What you're reacting to is a shortcoming of standard deviations: they don't give much intuition. What does an SD of 5% mean? Also, this risk-return view creates a two-dimensional problem, and it is hard to order all the classes of investment, which is what the reader is likely trying to do.
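To make the unit point concrete, here is a minimal Python sketch using made-up annual returns for a single hypothetical asset class (the numbers are illustrative, not from the post). It computes the mean return and the sample standard deviation, then shows that expressing the same returns as fractions instead of percentages rescales the SD by the same factor, i.e. the SD carries the unit of the returns.

```python
import statistics

# Hypothetical annual returns for one asset class, in percent (illustrative only).
annual_returns_pct = [12.0, -5.0, 20.0, 8.0, 15.0]

mean_return = statistics.mean(annual_returns_pct)  # in percent
risk = statistics.stdev(annual_returns_pct)        # sample standard deviation, also in percent

# Rescaling the inputs rescales the SD by the same factor: the unit carries through.
returns_as_fractions = [r / 100 for r in annual_returns_pct]
risk_from_fractions = statistics.stdev(returns_as_fractions) * 100

print(f"mean return = {mean_return:.1f}%, risk (SD) = {risk:.1f}%")
print(f"SD computed from fractions, converted back: {risk_from_fractions:.1f}%")
```

The intuition problem remains: knowing that these returns average 10% with an SD of about 9.5% says little on its own about how bad a bad year might be.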


Not having access to the original data, I just Photoshopped the original chart. I removed the distracting labels and replotted the points: http://i.imgur.com/emBvzVy.png


The data values are quick guesses, but this version seems to work for me:


I am often on the fence with your use of bumps charts. This case is no different. I think it is an interesting way to show the data, though I think the actual example is a little clunky.

I think I would prefer to see them reversed - risk on the left, return on the right, which would result in higher return on risk sloping up, seeming more positive, and vice-versa.
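With risk on the left axis and return on the right, and both measured in percent on one shared scale, a line rises exactly when an asset's return exceeds its risk. A minimal Python sketch of that rule follows; the asset-class names and numbers are hypothetical, and `slope_direction` is an illustrative helper, not part of any charting library.

```python
# Hypothetical (risk %, return %) pairs; illustrative only, not the post's data.
asset_classes = {
    "Class A": (18.0, 9.0),
    "Class B": (6.0, 7.0),
    "Class C": (2.0, 3.0),
}


def slope_direction(risk_pct: float, return_pct: float) -> str:
    """Direction of a slopegraph line drawn from risk (left) to return (right).

    Assumes both axes share the same percent scale, so the line rises
    exactly when the return exceeds the risk.
    """
    if return_pct > risk_pct:
        return "up"
    if return_pct < risk_pct:
        return "down"
    return "flat"


for name, (risk, ret) in asset_classes.items():
    print(f"{name}: {risk:.0f}% -> {ret:.0f}% ({slope_direction(risk, ret)})")
```

The rule only holds because both endpoints use one scale; with independently scaled axes the slope would carry no meaning, which is the dual-axis concern raised at the top of the thread.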

And though it may be nit-picking, I would show the full plot area for the 3rd section, for visual consistency and clarity.


TML/jbriggs: thanks for the contributions. Your versions are certainly better than the original. I'd argue that the text labels still draw one's attention away from the data. For this dataset, that is marginally acceptable. When there are more points, or longer labels, the labeling problem only gets worse.

On the third panel, I like to break the small-multiples rule to draw attention to special features of the data. In this case, the two classes have completely different risk-return profiles from the other two groups: both risk and return are much lower.


Kaiser - re the 3rd panel: It doesn't look like you've broken the rule at all. The scaling is the same as the other two panels, with the exception that you don't extend the empty portion of it to the top.

I understand separating them to highlight them in that way. I would just like to see the plot extended to the same extent as the others, to show that these are in fact on the same scale and to keep the visual continuity.

Regarding the labels, I do agree they are still a distraction, but I find it acceptable at this extent.

If there were a lot more data points, labels would be pointless.

If there were a few more data points, surely acceptable abbreviations could be determined.
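One mechanical way to derive such abbreviations is to cut each label to its shortest prefix that no other label shares. The Python sketch below assumes that approach; `shortest_unique_prefixes`, its `min_len` parameter, and the sample labels are all hypothetical, chosen only for illustration.

```python
def shortest_unique_prefixes(labels, min_len=3):
    """Map each label to its shortest prefix (at least min_len characters,
    or the whole label if shorter) that no other label starts with."""
    result = {}
    for label in labels:
        others = [other for other in labels if other != label]
        n = min(min_len, len(label))
        # Grow the prefix until no other label starts with it (or the label runs out).
        while n < len(label) and any(o.startswith(label[:n]) for o in others):
            n += 1
        result[label] = label[:n]
    return result


labels = ["Stocks", "Bonds", "Emerging equity", "Emerging debt"]
for full, short in shortest_unique_prefixes(labels).items():
    print(f"{full!r} -> {short!r}")
```

Labels that share a long common prefix still come out long, so hand-picked abbreviations may read better; the sketch only guarantees that the short labels stay distinct.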


The problem with the scatterplot is the large *and varying* size of the labels - they look like they're encoding a third dimension of data, which is spurious and therefore unhelpful. In a way, it's a scatter plot with the markers sized proportionally to the number of characters in the label:-0

Marketing analytics and data visualization expert. Author and speaker. Currently at Columbia.

Book Blog

Link to junkcharts

Graphics design by Amanda Lee

The Read

Good Books

Keep in Touch

follow me on Twitter