Showing three dimensions using a ternary plot

Long-time reader Daniel L. isn't a fan of this chart, especially when it is made to spin, as you can see at this link:

Datascienceblogfactoranalysis25datascienceskills.png

Like other 3D charts, this one is hard to read. The vertical lines are both good and bad: they make one dimension very easy to read, but their very existence makes one realize how hard it is to read the other dimensions without guidelines.

This dataset allows me to show a ternary plot. The ternary plot is an ingenious way of putting three dimensions onto a flat surface. I have found few good uses of this chart type, though.

Redo_datascience_v1

Let's get to the core of the issue: the analyst started with 25 skills that are frequently required by data science and analytics jobs, and his goal is to classify these skills into three groups. The underlying method used to create these groups is factor analysis.

Each dot above is a skill. The HQ of each grouping of skills (known as a factor) is a corner of the plot. The closer the dot is to the corner, the more relevant that skill is to the skill group.
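
For readers curious about the mechanics, here is a minimal sketch of how three scores can be turned into a position inside the triangle. It is not the code behind the chart above: the corner layout, the function name `ternary_xy`, and the sample scores are all assumptions made for illustration, and it assumes the three scores are non-negative.

```python
import math

# Corner positions of an equilateral triangle.
# This layout is an assumption, not taken from the chart in the post.
CORNERS = {
    "math_stats": (0.0, 0.0),
    "tech_programming": (1.0, 0.0),
    "business": (0.5, math.sqrt(3) / 2),
}


def ternary_xy(math_stats, tech_programming, business):
    """Map three non-negative scores to a 2D point inside the triangle."""
    total = math_stats + tech_programming + business
    if min(math_stats, tech_programming, business) < 0 or total <= 0:
        raise ValueError("scores must be non-negative with a positive sum")
    # Convert the scores to shares that add up to 1.
    weights = {
        "math_stats": math_stats / total,
        "tech_programming": tech_programming / total,
        "business": business / total,
    }
    # The point is the weighted average of the three corners.
    x = sum(w * CORNERS[name][0] for name, w in weights.items())
    y = sum(w * CORNERS[name][1] for name, w in weights.items())
    return x, y


# Hypothetical scores, for illustration only.
pure_business = ternary_xy(0.0, 0.0, 1.0)  # sits on the Business corner
straddler = ternary_xy(0.5, 0.0, 0.5)      # midway between Math/Stats and Business
```

A skill that loads on only one factor lands on that factor's corner, while a skill split evenly between two factors lands on the midpoint of the edge joining them.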

In the above chart, I highlighted four skills that are not clearly in one or another skill group. For example, Communication straddles the Math/Stats and Business dimensions but scores low on the Technology/Programming dimension.

***

The ternary plot has a few problems. Like any scatter plot, once you have 10 or more dots, it is hard to fit all the data labels. Further, the axis labels must be carefully done to help readers understand the plot. 

Before long, the chart looks very cluttered. There just isn't enough room to get all your words in. Here is another version of the same chart -- with a different set of annotations.

Redo_datascience_v2

Instead of drawing attention to those skills that have no clear home, this version of the chart focuses on the dots close to each corner.

I classified two of the skills differently from the original. For example, the Machine Learning skill is part of Math/Stats on my charts but part of Technology/Programming on the original.

The ternary plot is interesting and unusual but is useful only for selected problems.

Comments

Jeffrey Showman

This is the first time I've seen a ternary plot, thank you for the introduction. I can see how it has limited usefulness, though. Aside from the challenge of identifying or labeling individual data points, I find it difficult to keep track of the "high/low" ends of each axis.

Marc

I think it is important to note that the ternary diagram is showing the relative amount of a component with respect to the others. In other words, the total is not important. Therefore, you can see your observations summing up to a certain constant (1 as you plotted, or 100 or whatever).
In your example, a person with a high level of Math.Stats, Technology.Programming and Business, and another person with a poor level of all three, are going to be plotted in the same place (in the center).
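
A small sketch of that point, assuming each profile is rescaled so its three scores add up to 1 before plotting. The function name `shares` and the scores 0.9 and 0.1 are made up for illustration.

```python
import math


def shares(math_stats, tech_programming, business):
    """Rescale three non-negative scores so that they add up to 1."""
    total = math_stats + tech_programming + business
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return (math_stats / total, tech_programming / total, business / total)


# Hypothetical profiles: one strong on everything, one weak on everything.
strong = shares(0.9, 0.9, 0.9)
weak = shares(0.1, 0.1, 0.1)

# After rescaling, both profiles have (approximately) equal thirds,
# so a ternary plot would draw them at the same spot in the center.
same_spot = all(math.isclose(a, b) for a, b in zip(strong, weak))
```

The overall level is lost in the rescaling; only the mix between the three components survives.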

Berry

It's quite useful in some geosciences, like soil science.
Classifying a soil by grain size distribution (which has all sorts of implications for how plants can take up water, how water is retained in the soil and thus, e.g., how irrigation needs to be done) is often done on a ternary diagram called a texture triangle, like this one:

http://i.stack.imgur.com/r7fYI.png

Clay is the very small particles (not visible to the naked eye), sand is what you'd typically know as grains from the beach, and silt is in between. There's an exact definition for the grain size classes, btw.

So if you have several soil samples, on the diagram it's very easy to get a rough idea of soil texture and thus usability for agriculture.


Berry

In case it's interesting: clay soil doesn't let water in easily (but retains nutrients well), sand doesn't hold water, silt is more prone to erosion.
So the 'optimum' soil has a mixture of all (called loam) and resides in the middle of the texture triangle.
(Yes, this is a strong simplification. Read a soil science book for the nuances and details^^)

Mark

I have to disagree with the choice of a ternary plot for this data.

Such plots are perfect when the sums of the components are constant. This is the case in the soil example that Berry posted and also for the phase diagrams for ternary alloy systems where I first encountered them.

However, looking at the data provided in the table in the original article, the sum is not constant and, therefore, this plot represents an unnecessary distortion of the data.

My personal preference would have been to take a top-down projection of the scatter plot and encode the z-axis using marker size, which would still allow you to group the data by color as in the original.
