Energy efficiency deserves visual efficiency

Long-time contributor Aleksander B. found a good one, in the World Energy Outlook Report, published by IEA (International Energy Agency).

Iea_balloonchart_emissions

The use of balloons is unusual, although after five minutes, I decided I must do some research to have any hope of understanding this data visualization.

A lot is going on. Below, I trace my own journey through this chart.

The text on the top left explains that the chart concerns emissions and temperature change. The first set of balloons (the grey ones) includes helpful annotations. The left-right position of the balloons indicates time points, in 10-year intervals except for the first.

The trapezoid that sits below the four balloons is more mysterious. It's labelled "median temperature rise in 2100". I debate two possibilities: (a) this trapezoid may serve as the fifth balloon, extending the time series from 2050 to 2100. This interpretation raises a couple of questions: why does the symbol change from balloon to trapezoid? why is the left-right time scale broken? (b) this trapezoid may represent something unrelated to the balloons. This interpretation also raises questions: its position on the horizontal axis still breaks the time series; and  if the new variable is "median temperature rise", then what determines its location on the chart?

That last question is answered if I move my glance all the way to the right edge of the chart where there are vertical axis labels. This axis is untitled but the labels shown in degree Celsius units are appropriate for "median temperature rise".

Turning to the balloons, I wonder what the scale is for the encoded emissions data. This is also puzzling because only a few balloons wear data labels, and a scale is nowhere to be found.

Iea_balloonchart_emissions_legend

The gridlines suggests that the vertical location of the balloons is meaningful. Tracing those gridlines to the right edge leads me back to the Celsius scale, which seems unrelated to emissions. The amount of emissions is probably encoded in the sizes of the balloons although none of these four balloons have any data labels so I'm rather flustered. My attention shifts to the colored balloons, a few of which are labelled. This confirms that the size of the balloons indeed measures the amount of emissions. Nevertheless, it is still impossible to gauge the change in emissions for the 10-year periods.

The colored balloons rising above, way above, the gridlines is an indication that the gridlines may lack a relationship with the balloons. But in some charts, the designer may deliberately use this device to draw attention to outlier values.

Next, I attempt to divine the informational content of the balloon strings. Presumably, the chart is concerned with drawing the correlation between emissions and temperature rise. Here I'm also stumped.

I start to look at the colored balloons. I've figured out that the amount of emissions is shown by the balloon size but I am still unclear about the elevation of the balloons. The vertical locations of these balloons change over time, hinting that they are data-driven. Yet, there is no axis, gridline, or data label that provides a key to its meaning.

Now I focus my attention on the trapezoids. I notice the labels "NZE", "APS", etc. The red section says "Pre-Paris Agreement" which would indicate these sections denote periods of time. However, I also understand the left-right positions of same-color balloons to indicate time progression. I'm completely lost. Understanding these labels is crucial to understanding the color scheme. Clearly, I have to read the report itself to decipher these acronyms.

The research reveals that NZE means "net zero emissions", which is a forecasting scenario - an utterly unrealistic one - in which every country is assumed to fulfil fully its obligations, a sort of best-case scenario but an unattainable optimum. APS and STEPS embed different assumptions about the level of effort countries would spend on reducing emissions and tackling global warming.

At this stage, I come upon another discovery. The grey section is missing any acronym labels. It's actually the legend of the chart. The balloon sizes, elevations, and left-right positions in the grey section are all arbitrary, and do not represent any real data! Surprisingly, this legend does not contain any numbers so it does not satisfy one of the traditional functions of a legend, which is to provide a scale.

There is still one final itch. Take a look at the green section:

Iea_balloonchart_emissions_green

What is this, hmm, caret symbol? It's labeled "Net Zero". Based on what I have been able to learn so far, I associate "net zero" to no "emissions" (this suggests they are talking about net emissions not gross emissions). For some reason, I also want to associate it with zero temperature rise. But this is not to be. The "net zero" line pins the balloon strings to a level of roughly 2.5 Celsius rise in temperature.

Wait, that's a misreading of the chart because the projected net temperature increase is found inside the trapezoid, meaning at "net zero", the scientists expect an increase in 1.5 degrees Celsius. If I accept this, I come face to face with the problem raised above: what is the meaning of the vertical positioning of the balloons? There must be a reason why the balloon strings are pinned at 2.5 degrees. I just have no idea why.

I'm also stealthily presuming that the top and bottom edges of the trapezoids represent confidence intervals around the median temperature rise values. The height of each trapezoid appears identical so I'm not sure.

I have just learned something else about this chart. The green "caret" must have been conceived as a fully deflated balloon since it represents the value zero. Its existence exposes two limitations imposed by the chosen visual design. Bubbles/circles should not be used when the value of zero holds significance. Besides, the use of balloon strings to indicate four discrete time points breaks down when there is a scenario which involves only three buoyant balloons.

***

The underlying dataset has five values (four emissions, one temperature rise) for four forecasting scenarios. It's taken a lot more time to explain the data visualization than to just show readers those 20 numbers. That's not good!

I'm sure the designer did not set out to confuse. I think what happened might be that the design wasn't shown to potential readers for feedback. Perhaps they were shown only to insiders who bring their domain knowledge. Insiders most likely would not have as much difficulty with reading this chart as did I.

This is an important lesson for using data visualization as a means of communications to the public. It's easy for specialists to assume knowledge that readers won't have.

For the IEA chart, here is a list of things not found explicitly on the chart that readers have to know in order to understand it.

  • Readers have to know about the various forecasting scenarios, and their acronyms (APS, NZE, etc.). This allows them to interpret the colors and section titles on the chart, and to decide whether the grey section is missing a scenario label, or is a legend.
  • Since the legend does not contain any scale information, neither for the balloon sizes nor for the temperatures, readers have to figure out the scales on their own. For temperature, they first learn from the legend that the temperature rise information is encoded in the trapezoid, then find the vertical axis on the right edge, notice that this axis has degree Celsius units, and recognize that the Celsius scale is appropriate for measuring median temperature rise.
  • For the balloon size scale, readers must resist the distracting gridlines around the grey balloons in the legend, notice the several data labels attached to the colored balloons, and accept that the designer has opted not to provide a proper size scale.

Finally, I still have several unresolved questions:

  • The horizontal axis may have no meaning at all, or it may only have meaning for emissions data but not for temperature
  • The vertical positioning of balloons probably has significance, or maybe it doesn't
  • The height of the trapezoids probably has significance, or maybe it doesn't

 

 


A graphical compass

A Twitter user pointed me to this article from Washington Post, ruminating about the correlation between gas prices and measures of political sentiment (such as Biden's approval rating or right-track-wrong-track). As common in this genre, the analyst proclaims that he has found something "counter intuitive".

The declarative statement strikes me as odd. In the first two paragraphs, he said the data showed "as gas prices fell, American optimism rose. As prices rose, optimism fell... This seems counterintuitive."

I'm struggling to see what's counterintuitive. Aren't the data suggesting people like lower prices? Is that not what we think people like?

The centerpiece of the article concerns the correlation between metrics. "If two numbers move in concert, they can be depicted literally moving in concert. One goes up, the other moves either up or down consistently." That's a confused statement and he qualifies it by typing "That sort of thing."

He's reacting to the following scatter plot with lines. The Twitter user presumably found it hard to understand. Count me in.

Washingtonpost_gasprices

Why is this chart difficult to grasp?

The biggest puzzle is: what differentiates those two lines? The red and the gray lines are not labelled. One would have to consult the article to learn that the gray line represents the "raw" data at weekly intervals. The red line is aggregated data at monthly intervals. In other words, each red dot is an average of 4 or 5 weekly data points. The red line is just a smoothed version of the gray line. Smoothed lines show the time trend better.

The next missing piece is the direction of time, which can only be inferred by reading the month labels on the red line. But the chart without the direction of time is like a map without a compass. Take this segment for example:

Wpost_gaspricesapproval_directionoftime

If time is running up to down, then approval ratings are increasing over time while gas prices are decreasing. If time is running down to up, then approval ratings are decreasing over time while gas prices are increasing. Exactly the opposite!

The labels on the red line are not sufficient. It's possible that time runs in the opposite direction on the gray line! We only exclude that possibility if we know that the red line is a smoothed version of the gray line.

This type of chart benefits from having a compass. Here's one:

Wpost_gaspricesapproval_compass

It's useful for readers to know that the southeast direction is "good" (higher approval ratings, lower gas prices) while the northwest direction is "bad". Going back to the original chart, one can see that the metrics went in the "bad" direction at the start of the year and has reverted to a "good" direction since.

***

What does this chart really say? The author remarked that "correlation is not causation". "Just because Biden’s approval rose as prices dropped doesn’t mean prices caused the drop."

Here's an alternative: People have general sentiments. When they feel good, they respond more positively to polls, as in they rate everything more positively. The approval ratings are at least partially driven by this general sentiment. The same author apparently has another article saying that the right-track-wrong-track sentiment also moved in tandem with gas prices.

One issue with this type of scatter plot is that it always cues readers to make an incorrect assumption: that the outcome variables (approval rating) is solely - or predominantly - driven by the one factor being visualized (gas prices). This visual choice completely biases the reader's perception.

P.S. [11-11-22] The source of the submission was incorrectly attributed.


Painting the corner

Found an old one sitting in my folder. This came from the Wall Street Journal in 2018.

At first glance, the chart looks like a pretty decent effort.

The scatter plot shows Ebitda against market value, both measured in billions of dollars. The placement of the vertical axis title on the far side is a little unusual.

Ebitda is a measure of business profit (something for a different post on the sister blog: the "b" in Ebitda means "before", and allows management to paint a picture of profits without accounting for the entire cost of running the business). In the financial markets, the market value is claimed to represent a "fair" assessment of the value of the business. The ratio of the market value to Ebitda is known as the "Ebitda multiple", which describes the number of dollars the "market" places on each dollar of Ebitda profit earned by the company.

Almost all scatter plots suffer from xyopia: the chart form encourages readers to take an overly simplistic view in which the market cares about one and only one business metric (Ebitda). The reality is that the market value contains information about Ebitda plus lots of other factors, such as competitors, growth potential, etc.

Consider Alphabet vs AT&T. On this chart, both companies have about $50 billion in Ebitda profits. However, the market value of Alphabet (Google's mother company) is about four times higher than that of AT&T. This excess valuation has nothing to do with profitability but partly explained by the market's view that Google has greater growth potential.

***

Unusually, the desginer chose not to utilize the log scale. The right side of the following display is the same chart with a log horizontal axis.

The big market values are artificially pulled into the middle while the small values are plied apart. As one reads from left to right, the same amount of distance represents more and more dollars. While all data visualization books love log scales, I am not a big fan of it. That's because the human brain doesn't process spatial information this way. We don't tend to think in terms of continuously evolving scales. Thus, presenting the log view causes readers to underestimate large values and overestimate small differences.

Now let's get to the main interest of this chart. Notice the bar chart shown on the top right, which by itself is very strange. The colors of the bar chart is coordinated with those on the scatter plot, as the colors divide the companies into two groups; "media" companies (old, red), and tech companies (new, orange).

Scratch that. Netflix is found in the scatter plot but with a red color while AT&T and Verizon appear on the scatter plot as orange dots. So it appears that the colors mean different things on different plots. As far as I could tell, on the scatter plot, the orange dots are companies with over $30 billion in Ebitda profits.

At this point, you may have noticed the stray orange dot. Look carefully at the top right corner, above the bar chart, and you'll find the orange dot representing Apple. It is by far the most important datum, the company that has the greatest market value and the largest Ebitda.

I'm not sure burying Apple in the corner was a feature or a bug. It really makes little sense to insert the bar chart where it is, creating a gulf between Apple and the rest of the companies. This placement draws the most attention away from the datum that demands the most attention.

 

 

 


Finding the right context to interpret household energy data

Bloomberg_energybillBloomberg's recent article on surging UK household energy costs, projected over this winter, contains data about which I have long been intrigued: how much energy does different household items consume?

A twitter follower alerted me to this chart, and she found it informative.

***
If the goal is to pick out the appliances and estimate the cost of running them, the chart serves its purpose. Because the entire set of data is printed, a data table would have done equally well.

I learned that the mobile phone costs almost nothing to charge: 1 pence for six hours of charging, which is deemed a "single use" which seems double what a full charge requires. The games console costs 14 pence for a "single use" of two hours. That might be an underestimate of how much time gamers spend gaming each day.

***

Understanding the design of the chart needs a bit more effort. Each appliance is measured by two metrics: the number of hours considered to be "single use", and a currency value.

It took me a while to figure out how to interpret these currency values. Each cost is associated with a single use, and the duration of a single use increases as we move down the list of appliances. Since the designer assumes a fixed cost of electicity (shown in the footnote as 34p per kWh), at first, it seems like the costs should just increase from top to bottom. That's not the case, though.

Something else is driving these numbers behind the scene, namely, the intensity of energy use by appliance. The wifi router listed at the bottom is turned on 24 hours a day, and the daily cost of running it is just 6p. Meanwhile, running the fridge and freezer the whole day costs 41p. Thus, the fridge&freezer consumes electricity at a rate that is almost 7 times higher than the router.

The chart uses a split axis, which artificially reduces the gap between 8 hours and 24 hours. Here is another look at the bottom of the chart:

Bloomberg_energycost_bottom

***

Let's examine the choice of "single use" as a common basis for comparing appliances. Consider this:

  • Continuous appliances (wifi router, refrigerator, etc.) are denoted as 24 hours, so a daily time window is also implied
  • Repeated-use appliances (e.g. coffee maker, kettle) may be run multiple times a day
  • Infrequent use appliances may be used less than once a day

I prefer standardizing to a "per day" metric. If I use the microwave three times a day, the daily cost is 3 x 3p = 9 p, which is more than I'd spend on the wifi router, run 24 hours. On the other hand, I use the washing machine once a week, so the frequency is 1/7, and the effective daily cost is 1/7 x 36 p = 5p, notably lower than using the microwave.

The choice of metric has key implications on the appearance of the chart. The bubble size encodes the relative energy costs. The biggest bubbles are in the heating category, which is no surprise. The next largest bubbles are tumble dryer, dishwasher, and electric oven. These are generally not used every day so the "per day" calculation would push them lower in rank.

***

Another noteworthy feature of the Bloomberg chart is the split legend. The colors divide appliances into five groups based on usage category (e.g. cleaning, food, utility). Instead of the usual color legend printed on a corner or side of the chart, the designer spreads the category labels around the chart. Each label is shown the first time a specific usage category appears on the chart. There is a presumption that the reader scans from top to bottom, which is probably true on average.

I like this arrangement as it delivers information to the reader when it's needed.

 

 

 


Where have the graduates gone?

Someone submitted this chart on Twitter as an example of good dataviz.

Washingtonpost_aftercollege

The chart shows the surprising leverage colleges have on where students live after graduation.

The primary virtue of this chart is conservation of space. If our main line of inquiry is the destination states of college graduations - by state, then it's hard to beat this chart's efficiency at delivering this information. For each state, it's easy to see what proportion of graduates leave the state after graduation, and then within those who leave, the reader can learn which are the most popular destination states, and their relative importance.

The colors link the most popular destination states (e.g. Texas in orange) but they are not enough because the designer uses state labels also. A next set of states are labeled without being differentiated by color. In particular, New York and Massachusetts share shades of blue, which also is the dominant color on the left side.

***

The following is a draft of a concept I have in my head.

Junkcharts_redo_washpost_postgraddestinations_1

I imagine this to be a tile map. The underlying data are not public so I just copied down a bunch of interesting states. This view brings out the spatial information, as we expect graduates are moving to neighboring states (or the states with big cities).

The students in the Western states are more likely to stay in their own state, and if they move, they stay in the West Coast. The graduates in the Eastern states also tend to stay nearby, except for California.

I decided to use groups of color - blue for East, green for South, red for West. Color is a powerful device, if used well. If the reader wants to know which states send graduates to New York, I'm hoping the reader will see the chart this way:

Junkcharts_redo_washpost_postgraddestinations_2

 


Trying too hard

Today, I return to the life expectancy graphic that Antonio submitted. In a previous post, I looked at the bumps chart. The centerpiece of that graphic is the following complicated bar chart.

Aburto_covid_lifeexpectancy

Let's start with the dual axes. On the left, age, and on the right, year of birth. I actually like this type of dual axes. The two axes present two versions of the same scale so the dual axes exist without distortion. It just allows the reader to pick which scale they want to use.

It baffles me that the range of each bar runs from 2.5 years to 7.5 years or 7.5 years to 2.5 years, with 5 or 10 years situated in the middle of each bar.

Reading the rest of the chart is like unentangling some balled up wires. The author has created a statistical model that attributes cause of death to male life expectancy in such a way that you can take the difference in life expectancy between two time points, and do a kind of waterfall analysis in which each cause of death either adds to or subtracts from the prior life expectancy, with the sum of these additions and substractions leading to the end-of-period life expectancy.

The model is complicated enough, and the chart doesn't make it any easier.

The bars are rooted at the zero value. The horizontal axis plots addition or substraction to life expectancy, thus zero represents no change during the period. Zero does not mean the cause of death (e.g. cancer) does not contribute to life expectancy; it just means the contribution remains the same.

The changes to life expectancy are shown in units of months. I'd prefer to see units of years because life expectancy is almost always given in years. Using years turn 2.5 months into 0.2 years which is a fraction, but it allows me to see the impact on the reported life expectancy without having to do a month-to-year conversion.

The chart highlights seven causes of death with seven different colors, plus gray for others.

What really does a number on readers is the shading, which adds another layer on top of the hues. Each color comes in one of two shading, referencing two periods of time. The unshaded bar segments concern changes between 2010 and "2019" while the shaded segments concern changes between "2019" and 2020. The two periods are chosen to highlight the impact of COVID-19 (the red-orange color), which did not exist before "2019".

Let's zoom in on one of the rows of data - the 72.5 to 77.5 age group.

Screen Shot 2022-09-14 at 1.06.59 PM

COVID-19 (red-orange) has a negative impact on life expectancy and that's the easy one to see. That's because COVID-19's contribution as a cause of death is exactly zero prior to "2019". Thus, the change in life expectancy is a change from zero. This is not how we can interpret any of the other colors.

Next, we look at cancer (blue). Since this bar segment sits on the right side of zero, cancer has contributed positively to change in life expectancy between 2010 and 2020. Practically, that means proportionally fewer people have died from cancer. Since the lengths of these bar segments correspond to the relative value, not absolute value, of life expectancy, longer bars do not necessarily indicate more numerous deaths.

Now the blue segment is actually divided into two parts, the shaded and not shaded. The not-shaded part is for the period "2019" to 2020 in the first year of the COVID-19 pandemic. The shaded part is for the period 2010 to "2019". It is a much wider span but it also contains 9 years of changes versus "1 year" so it's hard to tell if the single-year change is significantly different from the average single-year change of the past 9 years. (I'm using these quotes because I don't know whether they split the year 2019 in the middle since COVID-19 didn't show up till the end of that year.)

Next, we look at the yellow-brown color correponding to CVD. The key feature is that this block is split into two parts, one positive, one negative. Prior to "2019", CVD has been contributing positively to life expectancy changes while after "2019", it has contributed negatively. This observation raises some questions: why would CVD behave differently with the arrival of the pandemic? Are there data problems?

***

A small multiples design - splitting the period into two charts - may help here. To make those two charts comparable, I'd suggest annualizing the data so that the 9-year numbers represent the average annual values instead of the cumulative values.

 

 


Speedometer charts: love or hate

Pie chart hate is tired. In this post, I explain my speedometer hate. (Also called gauges,  dials)

Next to pie charts, speedometers are perhaps the second most beloved chart species found on business dashboards. Here is a typical example:

Speedometers_example

 

For this post, I found one on Reuters about natural gas in Europe. (Thanks to long-time contributor Antonio R. for the tip.)

Eugas_speedometer

The reason for my dislike is the inefficiency of this chart form. In classic Tufte-speak, the speedometer chart has a very poor data-to-ink ratio. The entire chart above contains just one datum (73%). Most of the ink are spilled over non-data things.

This single number has a large entourage:

- the curved axis
- ticks on the axis
- labels on the scale
- the dial
- the color segments
- the reference level "EU target"

These are not mere decorations. Taking these elements away makes it harder to understand what's on the chart.

Here is the chart without the curved axis:

Redo_eugas_noaxis

Here is the chart without axis labels:

Redo_eugas_noaxislabels

Here is the chart without ticks:

Redo_eugas_notickmarks

When the tick labels are present, the chart still functions.

Here is the chart without the dial:

Redo_eugas_nodial

The datum is redundantly encoded in the color segments of the "axis".

Here is the chart without the dial or the color segments:

Redo_eugas_nodialnosegments

If you find yourself stealing a peek at the chart title below, you're not alone.

All versions except one increases our cognitive load. This means the entourage is largely necessary if one encodes the single number in a speedometer chart.

The problem with the entourage is that readers may resort to reading the text rather than the chart.

***

The following is a minimalist version of the Reuters chart:

Redo_eugas_onedial

I removed the axis labels and the color segments. The number 73% is shown using the dial angle.

The next chart adds back the secondary message about the EU target, as an axis label, and uses color segments to show the 73% number.

Redo_eugas_nodialjustsegments

Like pie charts, there are limited situations in which speedometer charts are acceptable. But most of the ones we see out there are just not right.

***

One acceptable situation is to illustrate percentages or proportions, which is what the EU gas chart does. Of course, in that situation, one can alo use a pie chart without shame.

For illustrating proportions, I prefer to use a full semicircle, instead of the circular sector of arbitrary angle as Reuters did. The semicircle lends itself to easy marks of 25%, 50%, 75%, etc, eliminating the need to print those tick labels.

***

One use case to avoid is numeric data.

Take the regional sales chart pulled randomly from a Web search above:

Speedometers_example

These charts are completely useless without the axis labels.

Besides, because the span of the axis isn't 0% to 100%, every tick mark must be labelled with the numeric value. That's a lot of extra ink used to display a single value!


Another reminder that aggregate trends hide information

The last time I looked at the U.S. employment situation, it was during the pandemic. The data revealed the deep flaws of the so-called "not in labor force" classification. This classification is used to dehumanize unemployed people who are declared "not in labor force," in which case they are neither employed nor unemployed -- just not counted at all in the official unemployment (or employment) statistics.

The reason given for such a designation was that some people just have no interest in working, or even looking for a job. Now they are not merely discouraged - as there is a category of those people. In theory, these people haven't been looking for a job for so long that they are no longer visible to the bean counters at the Bureau of Labor Statistics.

What happened when the pandemic precipitated a shutdown in many major cities across America? The number of "not in labor force" shot up instantly, literally within a few weeks. That makes a mockery of the reason for such a designation. See this post for more.

***

The data we saw last time was up to April, 2020. That's more than two years old.

So I have updated the charts to show what has happened in the last couple of years.

Here is the overall picture.

Junkcharts_unemployment_notinLFparttime_all_2

In this new version, I centered the chart at the 1990 data. The chart features two key drivers of the headline unemployment rate - the proportion of people designated "invisible", and the proportion of those who are considered "employed" who are "part-time" workers.

The last two recessions have caused structural changes to the labor market. From 1990 to late 2000s, which included the dot-com bust, these two metrics circulated within a small area of the chart. The Great Recession of late 2000s led to a huge jump in the proportion called "invisible". It also pushed the proportion of part-timers to all0time highs. The proportion of part-timers has fallen although it is hard to interpret from this chart alone - because if the newly invisible were previously part-time employed, then the same cause can be responsible for either trend.

_numbersense_bookcoverReaders of Numbersense (link) might be reminded of a trick used by school deans to pump up their US News rankings. Some schools accept lots of transfer students. This subpopulation is invisible to the US News statisticians since they do not factor into the rankings. The recent scandal at Columbia University also involves reclassifying students (see this post).

Zooming in on the last two years. It appears that the pandemic-related unemployment situation has reversed.

***

Let's split the data by gender.

American men have been stuck in a negative spiral since the 1990s. With each recession, a higher proportion of men are designated BLS invisibles.

Junkcharts_unemployment_notinLFparttime_men_2

In the grid system set up in this scatter plot, the top right corner is the worse of all worlds - the work force has shrunken and there are more part-timers among those counted as employed. The U.S. men are not exiting this quadrant any time soon.

***
What about the women?

Junkcharts_unemployment_notinLFparttime_women_2

If we compare 1990 with 2022, the story is not bad. The female work force is gradually reaching the same scale as in 1990 while the proportion of part-time workers have declined.

However, celebrating the above is to ignore the tremendous gains American women made in the 1990s and 2000s. In 1990, only 58% of women are considered part of the work force - the other 42% are not working but they are not counted as unemployed. By 2000, the female work force has expanded to include about 60% with similar proportions counted as part-time employed as in 1990. That's great news.

The Great Recession of the late 2000s changed that picture. Just like men, many women became invisible to BLS. The invisible proportion reached 44% in 2015 and have not returned to anywhere near the 2000 level. Fewer women are counted as part-time employed; as I said above, it's hard to tell whether this is because the women exiting the work force previously worked part-time.

***

The color of the dots in all charts are determined by the headline unemployment number. Blue represents low unemployment. During the 1990-2022 period, there are three moments in which unemployment is reported as 4 percent or lower. These charts are intended to show that an aggregate statistic hides a lot of information. The three times at which unemployment rate reached historic lows represent three very different situations, if one were to consider the sizes of the work force and the number of part-time workers.

 

P.S. [8-15-2022] Some more background about the visualization can be found in prior posts on the blog: here is the introduction, and here's one that breaks it down by race. Chapter 6 of Numbersense (link) gets into the details of how unemployment rate is computed, and the implications of the choices BLS made.

P.S. [8-16-2022] Corrected the axis title on the charts (see comment below). Also, added source of data label.


Funnels and scatters

I took a peek at some of the work submitted by Ray Vella's students in his NYU dataviz class recently.

The following chart by Hosanah Bryan caught my eye:

Rich Get Richer_Hosanah Bryan (v2)

The data concern the GDP gap between rich and poor regions in various countries. In some countries, especially in the U.K., the gap is gigantic. In other countries, like Spain and Sweden, the gap is much smaller.

The above chart uses a funnel metaphor to organize the data, although the funnel does not add more meaning (not that it has to). Between that, the color scheme and the placement of text, it's visually clean and pleasant to look at.

The data being plotted are messy. They are not actual currency values of GDP. Each number is an index, and represents the relative level of the GDP gap in a given year and country. The gap being shown by the colored bars are differences in these indices 15 years apart. (The students were given this dataset to work with.)

So the chart is very hard to understand if one focuses on the underlying data. Nevertheless, the same visual form can hold other datasets which are less complicated.

One can nitpick about the slight misrepresentation of the values due to the slanted edges on both sides of the bars. This is yet another instance of the tradeoff between beauty and precision.

***

The next chart by Liz Delessert engages my mind for a different reason.

The Rich Get Richerv2

The scatter plot sets up four quadrants. The top right is "everyone gets richer". The top left, where most of the dots lie, is where "the rich get richer, the poor get poorer".  This chart shows a thoughtfulness about organizing the data, and the story-telling.

The grid setup cues readers toward a particular way of looking at the data.

But power comes with responsibility. Such scatter plots are particularly susceptible to the choice of data, in this case, countries. It is tempting to conclude that there are no countries in which everyone gets poorer. But that statement more likely tells us more about which countries were chosen than the real story.

I like to see the chart applied to other data transformations that are easier. For example, we can start with the % change in GDP computed separately for rich and for poor. Then we can form a ratio of these two percent changes.

 

 


Who trades with Sweden

It's great that the UN is publishing dataviz but it can do better than this effort:

Untradestats_sweden

Certain problems are obvious. The country names turned sideways. The meaningless use of color. The inexplicable sequencing of the country/region.

Some problems are subtler. "Area, nes" - upon research - is a custom term used by UN Trade Statistics, meaning "not elsewhere specified".

The gridlines are debatable. Their function is to help readers figure out the data values if they care. The design omitted the top and bottom gridlines, which makes it hard to judge the values for USA (dark blue), Netherlands (orange), and Germany (gray).

See here, where I added the top gridline.

Redo_untradestats_sweden_gridline

Now, we can see this value is around 3.6, just over the halfway point between gridlines.

***

A central feature of trading statistics is "balance". The following chart makes it clear that the positive numbers outweigh the negative numbers in the above chart.

Redo_untradestats_sweden

At the time I made the chart, I wasn't sure how to interpret the gap of 1.3%. Looking at the chart again, I think it's saying Sweden has a trade surplus equal to that amount.