Several tips for visualizing matrices

Nov 07, 2023

Continuing my review of charts that were spammed to my inbox, today I look at the following visualization of a matrix of numbers:

The matrix shows pairwise correlations between the returns of 16 investment asset classes. Correlation is a number between -1 and 1. It is a symmetric scale around 0. It embeds two dimensions: the magnitude of the correlation, and its direction (positive or negative).

The correlation matrix is a special type of matrix: a bit easier to deal with as the data already come “standardized”. As with the other charts in this series, there are a good number of errors in the chart's execution.

I’ll leave the details for a possible future post. For now, just check two key properties of a correlation matrix: the diagonal, which consists of self-correlations, should contain all 1s; and the matrix should be symmetric across that diagonal.
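Those two checks are easy to automate. Here is a minimal sketch in plain Python, assuming the matrix is stored as a list of rows; the function name `check_correlation_matrix` and the tolerance `tol` are my own inventions for illustration, not something from the original chart.

```python
import math

def check_correlation_matrix(matrix, tol=1e-9):
    """Return a list of problems found; an empty list means both checks pass."""
    problems = []
    n = len(matrix)
    # The matrix must be square before the other checks make sense.
    for i in range(n):
        if len(matrix[i]) != n:
            problems.append(f"row {i} has {len(matrix[i])} entries, expected {n}")
            return problems
    for i in range(n):
        # Property 1: every self-correlation on the diagonal equals 1.
        if not math.isclose(matrix[i][i], 1.0, abs_tol=tol):
            problems.append(f"diagonal entry ({i}, {i}) is {matrix[i][i]}, not 1")
        # Property 2: each cell matches its mirror image across the diagonal.
        for j in range(i + 1, n):
            if not math.isclose(matrix[i][j], matrix[j][i], abs_tol=tol):
                problems.append(f"entries ({i}, {j}) and ({j}, {i}) differ")
    return problems

# Made-up 2x2 examples, not the data from the chart.
good = [[1.0, 0.3], [0.3, 1.0]]
bad = [[1.0, 0.3], [0.5, 0.9]]
print(check_correlation_matrix(good))
print(check_correlation_matrix(bad))
```

Passing both checks does not prove a matrix holds valid correlations; it only catches the two kinds of slip-up mentioned above.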

***

For this post, I want to cover nuances of visualizing matrices. The chart designer knows exactly what the message of the chart is - that the asset class called "art" is attractive because it has little correlation with other popular asset classes. Regardless of the chart's errors, it’s hard for the reader to find the message in the matrix shown above.

That's because the specific data carrying the message sit in the bottom row (and the rightmost column). The cells in this row (and column) have a light purple color, which is upstaged by the even lighter gray color used for the diagonal cells. These diagonal cells pop out of the chart despite being the least informative (they have the same values for all correlation matrices!).

***

Several tactics can be deployed to push the message to the fore.

First, let's bring the key data to the prime location on the chart - this is the top row and left column (for cultures which read top to bottom, left to right).

For all the drafts in this post, I have dropped the text descriptions of the asset classes, and replaced them with numbers so that it's easier to follow the changes. (For those who're paying attention, I also edited the data to make the matrix symmetric.)

Second, let's look at the color choice. Here, the designer made a wise choice of restricting the number of color levels to three (dark, medium and light). I retained that decision in the above revision. (Actually, I used four colors, but no values fall in one of the four sections, so effectively only three colors appear.) But let's look at what happens when the number of color levels is increased.

The more levels of color, the more strain it puts on our processing... with little reward.

Third, and most importantly, the order of the categories strongly affects perception. I have no idea what the designer used as the sorting criterion. In step one of the fix, I moved the art category to the front but left all the other categories in the original order.
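For readers who want to try that first step, here is a rough sketch of moving one category to the front while keeping the rest in their original order. The helper name `move_to_front`, the three labels, and the numbers are all made up for illustration; rows and columns have to be permuted together so the matrix stays symmetric.

```python
def move_to_front(matrix, labels, key):
    """Move the row and column for `key` to position 0; keep the others in order."""
    k = labels.index(key)
    order = [k] + [i for i in range(len(labels)) if i != k]
    new_labels = [labels[i] for i in order]
    # Apply the same permutation to rows and columns.
    new_matrix = [[matrix[i][j] for j in order] for i in order]
    return new_labels, new_matrix

# Hypothetical three-asset example, not the data from the chart.
labels = ["stocks", "bonds", "art"]
corr = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
new_labels, new_corr = move_to_front(corr, labels, "art")
print(new_labels)
for row in new_corr:
    print(row)
```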

The next chart has the asset classes organized from lowest to highest average correlation. Conveniently, using this sorting metric leaves the art category in its prime spot.
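One way to compute such an ordering is sketched below. I am assuming "average correlation" means the mean of each asset's correlations with the other assets, leaving out the self-correlation; including the diagonal would add the same constant to every row sum, so the ordering would not change. The function name and the sample numbers are invented for illustration.

```python
def sort_by_average_correlation(matrix):
    """Return (order, reordered matrix), sorted from lowest to highest
    average correlation with the other assets."""
    n = len(matrix)
    # Average each row over the off-diagonal cells only (an assumption).
    avg = [
        sum(matrix[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    order = sorted(range(n), key=lambda i: avg[i])
    # Permute rows and columns together to keep the matrix symmetric.
    reordered = [[matrix[i][j] for j in order] for i in order]
    return order, reordered

# Hypothetical three-asset example, not the data from the chart.
corr = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
order, reordered = sort_by_average_correlation(corr)
print(order)
for row in reordered:
    print(row)
```

In this toy example, the third asset has the lowest average correlation and so lands in the first row and column.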

Notice that the appearance has completely changed. The new version brings out clusters in the data much more effectively. Most of the assets at the bottom of the chart have high correlation with each other.

Finally, because the correlation matrix is symmetric across the diagonal of self-correlations, the two halves are mirror images and thus redundant. The following removes one of the mirrored halves, and also removes the diagonal, leading to a much cleaner look.
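Dropping the mirrored half and the diagonal amounts to masking every cell on or above the diagonal. A minimal sketch follows; which half to keep is a design choice, and I have arbitrarily kept the lower-left one here. The function name and the sample numbers are made up.

```python
def lower_triangle(matrix):
    """Keep cells strictly below the diagonal; replace the rest with None."""
    n = len(matrix)
    return [
        [matrix[i][j] if j < i else None for j in range(n)]
        for i in range(n)
    ]

# Hypothetical three-asset example, not the data from the chart.
corr = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
for row in lower_triangle(corr):
    print(row)
```

For n assets, this leaves n(n-1)/2 cells to plot. With 16 asset classes, that is 120 cells instead of 256.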

Next time you visualize a matrix, think about how you sort the rows/columns, how you choose the color scale, and whether to plot the mirrored image and the diagonal.

***

After this exercise, the graph seems to indicate that Asset=3 also has small correlations with the other assets. The heat map implies that art is not unique in being a portfolio diversifier (contrary to the title on the initial graph).

You might consider an alternative five-color map that encodes the bins
[-1, -0.6), [-0.6, -0.2), [-0.2, 0.2], (0.2, 0.6], (0.6, 1].
This assigns the same color to small correlations, regardless of sign. These bins correspond to "Large Neg," "Med Neg," "Almost Zero", "Med Pos," and "Large Pos". If you use a nearly white shade for the "Almost Zero" bin, I think the message would be even clearer.
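The binning suggested above could be sketched as follows, taking the fourth bin to be (0.2, 0.6] so that the five bins tile the whole range from -1 to 1. The function name `correlation_bin` is hypothetical; the bin labels are the ones given in the comment.

```python
def correlation_bin(r):
    """Map a correlation to one of five labeled bins, symmetric around zero."""
    if not -1.0 <= r <= 1.0:
        raise ValueError(f"correlation must lie in [-1, 1], got {r}")
    if r < -0.6:
        return "Large Neg"    # [-1, -0.6)
    if r < -0.2:
        return "Med Neg"      # [-0.6, -0.2)
    if r <= 0.2:
        return "Almost Zero"  # [-0.2, 0.2]
    if r <= 0.6:
        return "Med Pos"      # (0.2, 0.6]
    return "Large Pos"        # (0.6, 1]

for r in (-0.9, -0.3, 0.05, 0.4, 0.75):
    print(r, correlation_bin(r))
```

Each label can then be mapped to a color, with a nearly white shade for "Almost Zero" as proposed.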

Obviously a correlation matrix should be symmetric. And so this is not a correlation matrix, because it’s not symmetric. Your revision only plots half the data.

It says “correlation” at the top, but they must be computing something else.

Let me rephrase that: the numbers appear symmetric, but the coloring is not. It also doesn’t match the color legend. Weird!

RW: that would be "Commodities". So your observation is accurate. (This is probably the right place to point out that I'm using R's levelplot function, for which I have to code the assets according to the order in which I want them to be sorted. Thus asset=3 refers to a different asset in plots with different sorting strategies. I could have replaced those numbers with labels; however, for the purpose of illustration, I decided it's easier to look up column and row numbers than to look up words and phrases.)

Cris: Weird it is. And if you take a closer look, you'll find other strangeness.
