« January 2024 | Main | March 2024 »

Lost in the middle class

Washington Post asks people what it means to be middle class in the U.S. (link; paywall)

The following graphic illustrates one type of definition, purely based on income ranges.

Wpost_middleclass

For me, this chart is more taxing to read than it appears.

It can be read column by column. Each column represents a hypotheticial annual income for a family of four. People are asked whether they consider that family lower/working class, middle class or upper class. Be careful as the increments from column to column are not uniform.

Now, what's the question again? We're primarily interested in what incomes constitute middle class.

So, we should be looking at the deep green blocks that hang in the middle of each column. It's not easy to read the proportion of middle blocks in a stacked column chart.

***

I tried separating out the three perceived income classes, using a small-multiples design.

Junkcharts_redo_wpost_middleclass

One can more directly see what income ranges are most popularly perceived as being in each income class.

***

The article also goes into alternative definitions of middle class, using more qualitative metrics, such as "able to pay all bills on time without worry". That's a whole other post.

 


Neither the forest nor the trees

On the NYT's twitter feed, they featured an article titled "These Seven Tech Stocks are Driving the Market". The first sentence of the article reads: "The S&P 500 is at an all-time high, and investors have just a handful of stocks to thank for it."

Without having seen any data, I'd surmise from that line that (a) the S&P 500 index has gone up recently, and (b) most if not all of the gain in the index can be attributed to gains in the tech stocks mentioned in the headline. (For purists, a handful is five, not seven.)

The chart accompanying the tweet is a treemap:

Nyt_magnificentseven

The treemap is possibly the most overhyped chart type of the modern era. Its use here is tangential to the story of surging market value. That's because the treemap presents a snapshot of the composition of the index, but contains nothing about the trend (change over time) of the average index value or of its components.

***

Even in representing composition, the treemap is inferior to, gasp, a pie chart. Of course, we can only use a pie chart for small numbers of components. The following illustration takes the data from the NYT chart on the Magnificent Seven tech stocks, and compares a treemap versus a pie chart side by side:

Junkcharts_redo_nyt_magnificent7

The reason why the treemap is worse is that both the width and the height of the boxes are changing while only the radius (or angle) of the pie slices is varying. (Not saying use a pie chart, just saying the treemap is worse.)

There is a reason why the designer appended data labels to each of the seven boxes. The effect of not having those labels is readily felt when our eyes reach the next set of stocks – which carry company names but not their market values. What is the market value of Berkshire Hathaway?

Even more so, what proportion of the total is the market value of Berkshire Hathaway? Indeed, if the designer did not write down 29%, it would take a bit of work to figure out the aggregate value of yellow boxes relative to the entire box!

This design sucessfully draws our attention to the structural importance of various components of the whole. There are three layers - the yellow boxes (Magnificent Seven), the gray boxes with company names, and the other gray boxes. I also like how they positioned the text on the right column.

***

Going inside the NYT article itself, we find two line charts that convey the story as told.

Here's the first one:

Nyt_magnificent7_linechart1

They are comparing the most recent stock prices with those from October 12 2022, which is identified as the previous "low". (I'm actually confused by how the most recent "low" is defined, but that's a different subject.)

This chart carries a lot of good information, even though it does not plot "all the data", as in each of the 500 S&P components individually. Over the period under analysis, the average index value has gone up about 35% while the Magnificent Seven's value have skyrocketed by 65% in aggregate. The latter accounted for 30% of the total value at the most recent time point.

If we set the S&P 500 index value in 2024 as 100, then the M7 value in 2024 is 30. After unwinding the 65% growth, the M7 value in October 2022 was 18; the S&P 500 in October 2022 was 74. Thus, the weight of M7 was 24% (18/74) in October 2022, compared to 30% now. Consequently, the weight of the other 473 stocks declined from 76% to 70%.

This isn't even the full story because most of the action within the M7 is in Nvidia, the stock most tightly associated with the current AI hype, as shown in the other line chart.

Nyt_magnificent7_linechart2

Nvidia's value jumped by 430% in that time window. From the treemap, the total current value of M7 is $12.3 b while Nvidia's value is $1.4 b, thus Nvidia is 11.4% of M7 currently. Since M7 is 29% of the total S&P 500, Nvidia is 11.4%*29% = 3% of the S&P. Thus, in 2024, against 100 for the S&P, Nvidia's share is 3. After unwinding the 430% growth, Nvidia's share in October 2022 was 0.6, about 0.8% of 74. Its weight tripled during this period of time.


A nice plot of densities, but what's behind the colors?

I came across this chart by Planet Anomaly that compares air quality across the world's cities (link). The chart is in long form. The top part looks like this:

Visualcapitalist_airqualityinches_top

The bottom part looks like this:

Visualcapitalist_airqualityinches_bottom

You can go to the Visual Capitalist website to see the entire chart.

***

Plots of densities are relatively rare. The metric for air quality is micrograms of fine particulate matter (PM) per cubic meter, so showing densities is natural.

It's pretty clear the cities with the worst air quality at the bottom has a lot more PM in the air than the cleanest cities shown at the top.

This density chart plays looser with the data than our canonical chart types. The perceived densities of dots inside the squares do not represent the actual concentrations of PM. It's certainly not true that in New Delhi, the air is packed tightly with PM.

Further, a random number generator is required to scatter the red dots inside the circle. Thus, different software or designers will make the same chart look a bit different - the densities will be the same but the locations of the dots will not be.

I don't have a problem with this. Do you?

***

Another notable feature of this chart is the double encoding. The same metric is not just presented as densities; it is also encoded in a color scale.

Visualcapitalist_airqualityinches_color_scale

I don't think this adds much.

Both color and density are hard for humans to perceive precisely so adding color does not convey  precision to readers.

The color scale is gradated, so it effectively divided the cities into seven groups. But I don't attach particular significance to the classification. If that is important, it would be clearer to put boxes around the groups of plots. So I don't think the color scale convey clustering to readers effectively.

There is one important grouping which is defined by WHO's safe limit of 5 pg/cubic meter. A few cities pass this test while almost every other place fails. But the design pays no attention to this test, as it uses the same hue on both sides, and even the same tint changes on either side of the limit.

***

Another notable project that shows densities as red dots is this emotional chart by Mona Chalabi about measles, which I wrote about in 2019.

Monachalabi_measles

 


The art of making simple things harder

It's no longer a shock when a TV network such as MSNBC plays loose with the scaling of the column heights, as in this recent example:

Rachelbitecofer_markp_2024candidatescashonhand

Hat tip to Mark P. for forwarding the image, and Rachel for the original tweet.

***

What's shocking is that the designer appears to believe that the column heights of a column chart can be determined without reference to the data.

There is not a single relationship that has been retained on this chart. The designer just picks whatever size column is desired.

One obvious distortion is between the Biden and Trump columns. Trump's number is about 1/3 of Biden's (120 vs 40), and yet the red column's height is 70% of the blue's.

Furthermore, amongst the red columns, the heights are also haphazard. Trump's number is almost 3 times larger than Haley's; the ratio of column heights is almost 4 times. Haley's number is just a tad higher than DeSantis and yet Haley's column is twice the height of DeSantis.

Junkcharts_msnbc_candidatecash_analysis

***

There is a further, subtle distortion of the column's widths. By curving the chart canvas, certain columns are widened more than others. The diagram above retains the distorted widths and you can see that the Desantis column is wider than that of Haley's.

Here is what the undistorted column chart looks like:

Junkcharts_redo_msnbc_candidatecash

It's easy to make such a chart in Excel or any charting software, so it's mystery why this type of distortion happens. Did the designer open up an empty canvas and start putting up columns of any size?