


I often find it worthwhile to plot such things (data sets which crowd the bottom left corner) on log scales. This spreads out the data points, and turns the radiating diagonals into parallel diagonals. The danger is that it can freak out people who don't understand log scales.

Tom West

You say "In other words, the survey revealed very little of use about those categories" just after you say "those are categories people don't care too much about, and among those people who care, there isn't a consensus about good or bad."
The useful thing the survey tells us is the latter: those categories don't matter. To me, that's a useful piece of information.


+1 Tom West. The degree of engagement is worth knowing. I would suggest improving the scatter plot by sizing the data points to the votes cast in each one, which would emphasise the significant points. If you wanted to be funkier, you could even colour the ones that represent a statistically significant result.
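A minimal sketch of the sizing and highlighting idea, assuming each category has a count of green (good) and red (bad) cards. The category names and counts are hypothetical, and the exact two-sided sign test at the 5% level is just one possible choice of test, not something the survey itself specifies.

```python
from math import comb, sqrt

def sign_test_p_value(green: int, red: int) -> float:
    """Exact two-sided binomial test of green vs. red under a 50/50 null."""
    n = green + red
    if n == 0:
        return 1.0  # no votes, no evidence either way
    k = min(green, red)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def marker_style(green: int, red: int, alpha: float = 0.05) -> dict:
    """Marker size grows with votes cast; a flag marks a significant split."""
    total = green + red
    return {
        "size": sqrt(total),  # radius ~ sqrt(total) so marker area tracks votes
        "significant": sign_test_p_value(green, red) < alpha,
    }

# Hypothetical (green, red) counts, for illustration only.
votes = {"Category A": (120, 60), "Category B": (9, 7)}
styles = {name: marker_style(g, r) for name, (g, r) in votes.items()}
```

The `size` and `significant` values can then be passed to whatever charting tool is in use as the marker size and colour.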


Derek: Log scales should be used with great care. In this case, I prefer not to use log scales because they would exaggerate the least interesting points on the chart.

Tom: Point taken. If you dare click on the link that says "show user created categories", you'd see why I think none of that data is meaningful, the way this survey was designed.

Graham Wills

Both charts would benefit by focusing on the re-formulation of {#RED, #GREEN} -> {#RED+#GREEN, #GREEN-#RED} -- essentially a typical sum-difference chart.

The bar chart originally showed just the sums, and the re-formulation works much better as it highlights the differences. However, it seems to have randomized the categorical dimension. The original chart sorted by #RED, but I would suggest that sorting by the sum or the difference makes more sense.

The scatterplot screams out for a sum-difference chart -- it makes the orientation correct and makes the h/v directions meaningful.
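A short sketch of the re-formulation, using hypothetical category names and counts. Sorting by the difference is one of the two orderings suggested above.

```python
def sum_difference(red: int, green: int) -> tuple[int, int]:
    """Map (#RED, #GREEN) to (#RED + #GREEN, #GREEN - #RED)."""
    return red + green, green - red

# Hypothetical (red, green) counts, for illustration only.
counts = {"Category A": (60, 120), "Category B": (40, 25), "Category C": (5, 9)}
transformed = {name: sum_difference(r, g) for name, (r, g) in counts.items()}

# Sort by the difference, most net-negative category first.
by_difference = sorted(transformed, key=lambda name: transformed[name][1])
```

Plotting the sum on one axis and the difference on the other puts total engagement and net sentiment on separate, directly readable axes.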

A log scale is not a great plan, as counts are rarely exponential in nature. Counts often benefit from a square root transform (see intermediate statistical texts for reasons why), so that might be attempted if the sum-difference chart does not fix the issue.
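A small sketch of the two transforms on hypothetical counts. One practical difference it shows: the square root is defined at zero, whereas the log is not, so zero-count categories need special handling on a log scale.

```python
from math import log10, sqrt

counts = [0, 4, 25, 100, 400]  # hypothetical vote counts

sqrt_scaled = [sqrt(c) for c in counts]
# log10(0) is undefined, so zero counts are mapped to None here.
log_scaled = [log10(c) if c > 0 else None for c in counts]
```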


What software did reader John G. use for his charts?

Canadian Nathan

That one scatterplot label should be revised to say "Top 3/Bottom 17 vote count," not "Top 4." One of the strengths of the scatterplot (sum-difference would do this too) is that it highlights three separate clusters of vote counts: the 10 categories people don't care too much about (<100 total votes), the 7 they kind of care about (150 to about 200) and the 3 they care tons about (from about 300 and up).

I'd be tempted to scrap the blue and yellow lines separating the vote count clusters (they add to the chart's confusion of criss-crossing lines at first glance), and see what it would look like to just circle the clusters, or maybe colour-code them. Sum-difference would give more options for highlighting the different clusters visually too.
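A rough sketch of the colour-coding idea. The cut points of 100 and 300 total votes are assumptions placed in the gaps between the three clusters described above, and the category names, totals, and colours are hypothetical.

```python
def vote_cluster(total_votes: int, low_cut: int = 100, high_cut: int = 300) -> str:
    """Assign a category to a cluster by total votes (cut points are assumptions)."""
    if total_votes < low_cut:
        return "low"
    if total_votes < high_cut:
        return "middle"
    return "high"

# Hypothetical colour per cluster, for a colour-coded scatterplot.
CLUSTER_COLOURS = {"low": "grey", "middle": "orange", "high": "red"}

totals = {"Category A": 350, "Category B": 180, "Category C": 40}  # hypothetical
colours = {name: CLUSTER_COLOURS[vote_cluster(t)] for name, t in totals.items()}
```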

Cool charts as always - and cool that John G. sent in improvements.


John: It's Excel as far as I can tell. Obviously, he's a power user.

John G

Graham: Excellent suggestions, the charts would be better, thank you.

John and Kaiser: Yes, I did use Excel 2003. Is your use of the term 'power user' genuine, or is it tongue-in-cheek?

Canadian Nathan: it actually is 4 and 17: there are 21 categories, and the fourth one above the line is just barely on the uphill side, hugging the 50/50 line: 'Product/System Quality'

Everyone: I love the comments, the more I learn the better I'll get!

Jeff Weir

John G - it would be interesting if you also added a dot marker to show the actual net margin of victory. Can you post your file somewhere so that I could do this, plus have a play with the data?

Michel B

"He also orders the categories by "margin of victory", which in effect is the net promoter score, with the category needing the most attention at the top."

Well, actually that has nothing to do with the Net Promoter Score... For more details go to www.netpromoter.com.

Secondly, when you are talking about customer loyalty, the net card picture is not necessarily relevant. Technical Support is a case in point. Although there are many more green cards than there are red, there are still a significant number of customers that are dissatisfied, so John G's chart puts it way down on the list of priorities, whereas in my book it would certainly warrant more attention.

Thirdly, your point about the scatter chart and those issues that customers do not comment on much says just that: these are categories that are not important to the customer, thereby allowing the company carrying out the survey to focus on those issues that customers are paying more attention to.

My view is that John G hasn't really understood what he is looking at.


I hadn't seen this post earlier, but I worked on the system this chart comes from. A few remarks:
- The categories are ordered this way because the business people wanted it this way: the area to focus on was the one with the most negative comments.

- The chart doesn't have much to do with NPS, but the first thing a surveyed person does is answer the NPS question by choosing a score on a 0-10 scale. So the survey does measure NPS (see the link; it lists the NPS score as 23%).
It also allows (but does not require) respondents to mark some categories as "good" or "bad" (green/red) and provide comments. This allowed the chart in question to be created, as well as some additional analysis, e.g. estimating what the NPS score would be for various categories (I don't remember the details anymore).
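For readers unfamiliar with the score, here is a minimal sketch of the standard Net Promoter Score calculation on the 0-10 scale, where 9-10 counts as a promoter and 0-6 as a detractor. The responses are hypothetical, and this is not the surveyed system's actual code.

```python
def net_promoter_score(scores: list[int]) -> float:
    """NPS = % promoters (9-10) minus % detractors (0-6), on a 0-10 scale."""
    if not scores:
        raise ValueError("need at least one response")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)

responses = [10, 10, 9, 7, 3]  # hypothetical survey responses
nps = net_promoter_score(responses)
```

Because the score depends only on the 0-10 answers, it is independent of the green/red card counts shown in the charts.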

Kaiser Fung. Business analytics and data visualization expert. Author and Speaker.