Fantastic auto show from the Bloomberg crew

I really enjoyed the charts in this Bloomberg feature on the state of Japanese car manufacturers in the Southeast Asian and Chinese markets (link). This article contains five charts, each of which is both engaging and well-produced.

***

Each chart has a clear message, and the visual display is clearly adapted for purpose.

The simplest chart is the following side-by-side stacked bar chart, showing the trend in share of production of cars:

Bloomberg_japancars_production

Back in 1998, Japan was the top producer, making about 22% of all passenger cars in the world. China did not have much of a car industry. By 2023, China has dominated global car production, with almost 40% of share. Japan has slipped to second place, and its share has halved.

The designer is thoughtful about each label that is placed on the chart. If something is not required to tell the story, it's not there. Consistently across all five charts, they code Japan in red, and China in a medium gray color. (The coloring for the rest of the world is a bit inconsistent; we'll get to that later.)

Readers may misinterpret the cause of this share shift if this were the only chart presented to them. By itself, the chart suggests that China simply "stole" share from Japan (and other countries). What is true is that China has invested in a car manufacturing industry. A more subtle factor is that the global demand for cars has grown, with most of the growth coming from the Chinese domestic market and other emerging markets - and many consumers favor local brands. Said differently, the total market size in 2023 is much higher than that in 1998.

***

Bloomberg also made a chart that shows market share based on demand:

Bloomberg_japancars_marketshares

This is a small-multiples chart consisting of line charts. Each line chart shows market share trends in five markets (China and four Southeast Asian nations) from 2019 to 2024. Take the Chinese market for example. The darker gray line says Chinese brands have taken 20 percent additional market share since 2019; note that the data series is cumulative over the entire window. Meanwhile, brands from all other countries lost market share, with the Japanese brands (in red) losing the most.

The numbers are relative, which means that the other brands have not necessarily suffered declines in sales. This chart by itself doesn't tell us what happened to sales; all we know is the market shares of brands from different countries relative to their baseline market share in 2019. (Strange period to pick out as it includes the entire pandemic.)

The designer demonstrates complete awareness of the intended message of the chart. The lines for Chinese and Japanese brands were bolded to highlight the diverging fortunes, not just in China, but also in Southeast Asia, to various extents.

On this chart, the designer splits out US and German brands from the rest of the world. This is an odd decision because the categorization is not replicated in the other four charts. Thus, the light gray color on this chart excludes U.S. and Germany while the same color on the other charts includes them. I think they could have given U.S. and Germany their own colors throughout.

***

The primacy of local brands is hinted at in the following chart showing how individual brands fared in each Southeast Asian market:

Bloomberg_japancars_seasiamarkets

 

This chart takes the final numbers from the line charts above, that is to say, the change in market share from 2019 to 2024, but now breaks them down by individual brand names. As before, the red bubbles represent Japanese brands, and the gray bubbles Chinese brands. The American and German brands are lumped in with the rest of the world and show up as light gray bubbles.

I'll discuss this chart form in a next post. For now, I want to draw your attention to the Malaysia market which is the last row of this chart.

What we see there are two dominant brands (Perodua, Proton), both from "rest of the world" but both brands are Malaysian. These two brands are the biggest in Malaysia and they account for two of the three highest growing brands there. The other high-growth brand is Chery, which is a Chinese brand; even though it is growing faster, its market share is still much smaller than the Malaysian brands, and smaller than Toyota and Honda. Honda has suffered a lot in this market while Toyota eked out a small gain.

The impression given by this bubble chart is that Chinese brands have not made much of a dent in Malaysia. But that would not be correct, if we believe the line chart above. According to the line chart, Chinese brands roughly earned the same increase in market share (about 3%) as "other" brands.

What about the bubble chart might be throwing us off?

It seems that the Chinese brands were starting from zero, thus the growth is the whole bubble. For the Malaysian brands, the growth is in the outer ring of the bubbles, and the larger the bubble, the thinner is the ring. Our attention is dominated by the bubble size which represents a snapshot in the ending year, providing no information about the growth (which is shown on the horizontal axis).

***

For more discussion of Bloomberg graphics, see here.


Aligning V and Q by way of D

In the Trifecta Checkup (link), there is a green arrow between the Q (question) and V (visual) corners, indicating that they should align. This post illustrates what I mean by that.

I saw the following chart in a Washington Post article comparing dairy milk and plant-based "milks".

Vitamins

The article contains a whole series of charts. The one shown here focuses on vitamins.

The red color screams at the reader. At first, it appears to suggest that dairy milk is a standout on all four categories of vitamins. But that's not what the data say.

Let's take a look at the chart form: it's a grid of four plots, each containing one square for each of four types of "milk". The data are encoded in the areas of the squares. The red and green colors represent category labels and do not reflect data values.

Whenever we make bubble plots (the closest relative of these square plots), we have to solve a scale problem. What is the relationship between the scales of the four plots?

I noticed the largest square is the same size across all four plots. So, the size of each square is made relative to the maximum value in each plot, which is assigned a fixed size. In effect, the data encoding scheme is that the areas of the squares show the index values relative to the group maximum of each vitamin category. So, soy milk has 72% as much potassium as dairy milk while oat and almond milks have roughly 45% as much as dairy.

The same encoding scheme is applied also to riboflavin. Oat milk has the most riboflavin, so its square is the largest. Soy milk is 80% of oat, while dairy has 60% of oat.

***

_trifectacheckup_imageLet's step back to the Trifecta Checkup (link). What's the question being asked in this chart? We're interested in the amount of vitamins found in plant-based milk relative to dairy milk. We're less interested in which type of "milk" has the highest amount of a particular vitamin.

Thus, I'd prefer the indexing tied to the amount found in dairy milk, rather than the maximum value in each category. The following set of column charts show this encoding:

Junkcharts_redo_msn_dairyplantmilks_2

I changed the color coding so that blue columns represent higher amounts than dairy while yellow represent lower.

From the column chart, we find that plant-based "milks" contain significantly less potassium and phosphorus than dairy milk while oat and soy "milks" contain more riboflavin than dairy. Almond "milk" has negligible amounts of riboflavin and phosphorus. There is vritually no difference between the four "milk" types in providing vitamin D.

***

In the above redo, I strengthen the alignment of the Q and V corners. This is accomplished by making a stop at the D corner: I change how the raw data are transformed into index values. 

Just for comparison, if I only change the indexing strategy but retain the square plot chart form, the revised chart looks like this:

Junkcharts_redo_msn_dairyplantmilks_1

The four squares showing dairy on this version have the same size. Readers can evaluate the relative sizes of the other "milk" types.


The curse of dimensions

Usually the curse of dimensions concerns data with many dimensions. But today I want to talk about a different kind of curse. This is the curse of dimensions in mapping.

We are only talking about a few dimensions, typically between 3 and 6, so small number of dimensions. And yet it's already a curse. Maps are typically drawn in two dimensions. Those two dimensions are usually spoken for: they show the x- and y-coordinate of space. If we want to include a third, fourth or fifth dimension of data on the map, we have to appeal to colors, shapes, and so on. Cartographers have long realized that adding dimensions involves tradeoffs.

***

Andrew featured some colored bubble maps in a recent post. Here is one example:

Dorlingmap_percenthispanic

The above map shows the proportion of population in each U.S. county that is Hispanic. Each county is represented by a bubble pinned to the centroid of the county. The color of the bubble shows the data, divided into demi-deciles so they are using a equal-width binning method. The size of a bubble indicates the size of a county.

The map is sometimes called a "Dorling map" after its presumptive original designer.

I'm going to use this map to explore the curse of dimensions.

***

It's clear from the design that county-level details are regarded as extremely important. As there are about 3,000 counties in the U.S., I don't see how any visual design can satisfy this requirement without giving up clarity.

More details require more objects, which spread readers' attention. More details contain more stories, but that too dilutes their focus.

Another principle of this map is to not allow bubbles to overlap. Of course, having bubbles overlap or print on top of one another is a visual faux pas. But to prevent such behavior on this particular design means the precise locations are sacrificed. Consider the eastern seaboard where there are densely populated counties: they are not pinned to their centroids. Instead, the counties are pushed out of their normal positions, similar to making a cartogram.

I remarked at the start – erroneously but deliberately – that each bubble is centered at the centroid of each county. I wonder how many of you noticed the inaccuracy of that statement. If that rule were followed, then the bubbles in New England would have overlapped and overprinted. 

This tradeoff affects how we perceive regional patterns, as all the densely populated regions are bent out of shape.

Another aspect of the data that the designer treats as important is county population, or rather relative county population. Relative – because bubble size don't portray absolutes, plus the designer didn't bother to provide a legend to decipher bubble sizes.

The tradeoff is location. The varying bubble sizes, coupled with the previous stipulation of no overlapping, push bubbles from their proper centroids. This forced displacement disproportionately affects larger counties.

***

What if we are willing to sacrifice county-level details?

In this setting, we are not obliged to show every single county. One alternative is to perform spatial smoothing. Intuitively, think about the following steps: plot all these bubbles in their precise locations, turn the colors slightly transparent, let them overlap, blend away the edges, and then we have a nice picture of where the Hispanic people are located.

I have sacrificed the county-level details but the regional pattern becomes much clearer, and we don't need to deviate from the well-understood shape of the standard map.

This version reminds me of the language maps that Josh Katz made.

Joshkatz_languagemap

Here is an old post about these maps.

This map design only reduces but does not eliminate the geographical inaccuracy. It uses the same trick as the Dorling map: the "vertical" density of population has been turned into "horizontal" span. It's a bit better because the centroids are not displaced.

***

Which map is better depends on what tradeoffs one is making. In the above example, I'd have made different choices.

 

One final thing – it's minor but maybe not so minor. Most of the bubbles on the map especially in the middle are tiny; as most of them have Hispanic proportions that are on the left side of the scale, they should be showing light orange. However, all of them appear darker than they ought to be. That's because each bubble has a dark border. For small bubbles, the ratio of ink on the border is a high proportion of the ink for the entire object.


Two metrics in-fighting

The Wall Street Journal shows the following chart which pits two metrics against each other:

Wsj_salaries25to29

The primary metric is the change in median yearly salary between the two periods of time. We presume it's primary because of its presence in the chart title, and the blue bars being more readable than the green bubbles. The secondary metric is the median yearly salary in the later period.

That, I believe, was the intended design. When I saw this chart, my eyes went to the numbers inside the green bubbles. Perhaps it's because I didn't read the chart title first, and the horizontal axis wasn't labelled so it wasn't obvious what the blue bars coded.

As with most bubble charts, the data labels exist to cover up the inadequacy of circular areas. The self-sufficiency test - removing the data labels - shows this well:

Redo_wsj_salaries25to29

It's simply impossible to know what values should be in each bubble, or to perceive the relative sizes of those bubbles.

***

Reversing the order of the blue bars also helps:

Redo_wsjsalaries25to29_2

The original order is one of the more annoying features in most visualization packages. Because internally, the categories are numbered 1, 2, 3, ..., and because the convention is to have values run higher as they run up the vertical axis, these packages would place the top-ranked item at the bottom of the chart.

Most people read top to bottom, which means that they read the least important item first, and the most important item last!

In most visualization packages, it takes only 1 click or 1 action to reverse the order of the items. Please do it!

***

For change over time, I like using a Bumps chart, otherwise called a slope graph:

Redo_wsjsalaries25to29_3


What is the question is the question

I picked up a Fortune magazine while traveling, and saw this bag of bubbles chart.

Fortune_global500 copy

This chart is visually appealing, that must be said. Each circle represents the reported revenues of a corporation that belongs to the “Global 500 Companies” list. It is labeled by the location of the company’s headquarters. The largest bubble shows Beijing, the capital of China, indicating that companies based in Beijing count $6 trillion dollars of revenues amongst them. The color of the bubbles show large geographical units; the red bubbles are cities in Greater China.

I appreciate a couple of the design decisions. The chart title and legend are placed on the top, making it easy to find one’s bearing – effective while non-intrusive. The labeling signals a layering: the first and biggest group have icons; the second biggest group has both name and value inside the bubbles; the third group has values inside the bubbles but names outside; the smallest group contains no labels.

Note the judgement call the designer made. For cities that readers might not be familiar with, a country name (typically abbreviated) is added. This is a tough call since mileage varies.

***

As I discussed before (link), the bag of bubbles does not elevate comprehension. Just try answering any of the following questions, which any of us may have, using just the bag of bubbles:

  • What proportion of the total revenues are found in Beijing?
  • What proportion of the total revenues are found in Greater China?
  • What are the top 5 cities in Greater China?
  • What are the ranks of the six regions?

If we apply the self-sufficiency test and remove all the value labels, it’s even harder to figure out what’s what.

***

_trifectacheckup_image

Moving to the D corner of the Trifecta Checkup, we aren’t sure how to interpret this dataset. It’s unclear if these companies derive most of their revenues locally, or internationally. A company headquartered in Washington D.C. may earn most of its revenues in other places. Even if Beijing-based companies serve mostly Chinese customers, only a minority of revenues would be directly drawn from Beijing. Some U.S. corporations may choose its headquarters based on tax considerations. It’s a bit misleading to assign all revenues to one city.

As we explore this further, it becomes clear that the designer must establish a target – a strong idea of what question s/he wants to address. The Fortune piece comes with a paragraph. It appears that an important story is the spatial dispersion of corporate revenues in different countries. They point out that U.S. corporate HQs are more distributed geographically than Chinese corporate HQs, which tend to be found in the key cities.

There is a disconnect between the Question and the Data used to create the visualization. There is also a disconnect between the Question and the Visual display.


One bubble is a tragedy, and a bag of bubbles is...

From Kathleen Tyson's twitter account, I came across a graphic showing the destinations of Ukraine's grain exports since 2022 under the auspices of a UN deal. This graphic, made by AFP, uses one of the chart forms that baffle me - the bag of bubbles.

Ukraine_grains_bubbles

The first trouble with a bag of bubbles is the single bubble. The human brain is just not fit for comparing bubble sizes. The self-sufficiency test is my favorite device for demonstrating this weakness. The following is the European section of the above chart, with the data labels removed.

Redo_junkcharts_afp_ukrainegrains_europe_1

How much bigger is Spain than the Netherlands? What's the difference between Italy and the Netherlands? The answers don't come easily to mind. (The Netherlands is about 40% the size of Spain, and Italy is about 20% larger than the Netherlands.)

While comparing relative circular areas is a struggle, figuring out the relative ranks is not. Sure, it gets tougher with small differences (Germany vs S. Korea, Belgium vs Portugal) but saying those pairs are tied isn't a tragedy.

***

Another issue with bubble charts is how difficult it is to assess absolute values. A circle on its own has no reference point. The designer needs to add data labels or a legend. Adding data labels is an act of giving up. The data labels become the primary instrument for communicating the data, not the visual construct. Adding one data label is not enough, as the following shows:

Redo_junkcharts_afpukrainegrains_2

Being told that Spain's value is 4.1 does little to help estimate the values for the non-labelled bubbles.

The chart does come with the following legend:

Afp_ukrianegrains_legend

For this legend to work, the sample bubble sizes should span the range of the data. Notice that it's difficult to extrapolate from the size of the 1-million-ton bubble to 2-million, 4-million, etc. The analogy is a column chart in which the vertical axis does not extend through the full range of the dataset.

The designer totally gets this. The chart therefore contains both selected data labels and the partial legend. Every bubble larger than 1 million tons has an explicit data label. That's one solution for the above problem.

Nevertheless, why not use another chart form that avoids these problems altogether?

***

In Tyson's tweet, she showed another chart that pretty much contains the same information, this one from TASS.

Ukraine_grains_flows

This chart uses the flow diagram concept - in an abstract way, as I explained in previous post.

This chart form imposes structure on the data. The relative ranks of the countries within each region are listed from top to bottom. The relative amounts of grains are shown in black columns (and also in the thickness of the flows).

The aggregate value of movements within each region is called out in that middle section. It is impossible to learn this from the bag of bubbles version.

The designer did print the entire dataset onto this chart (except for the smallest countries grouped together as "other"). This decision takes away from the power of the underlying flow chart. Instead of thinking about the proportional representation of each country within its respective region, or the distribution of grains among regions, our eyes hone in on the data labels.

This brings me back to the principle of self-sufficiency: if we expect readers to consume the data labels - which comprise the entire dataset, why not just print a data table? If we decide to visualize, make the visual elements count!


Showing both absolute and relative values on the same chart 1

Visual Capitalist has a helpful overview on the "uninsured" deposits problem that has become the talking point of the recent banking crisis. Here is a snippet of the chart that you can see in full at this link:

Visualcapitalist_uninsureddeposits_top

This is in infographics style. It's a bar chart that shows the top X banks. Even though the headline says "by uninsured deposits", the sort order is really based on the proportion of deposits that are uninsured, i.e. residing in accounts that exceed $250K.  They used a red color to highlight the two failed banks, both of which have at least 90% of deposits uninsured.

The right column provides further context: the total amounts of deposits, presented both as a list of numbers as well as a column of bubbles. As readers know, bubbles are not self-sufficient, and if the list of numbers were removed, the bubbles lost most of their power of communication. Big, small, but how much smaller?

There are little nuggets of text in various corners that provide other information.

Overall, this is a pretty good one as far as infographics go.

***

I'd prefer to elevate information about the Too Big to Fail banks (which are hiding in plain sight). Addressing this surfaces the usual battle between relative and absolute values. While the smaller banks have some of the highest concentrations of uninsured deposits, each TBTF bank has multiples of the absolute dollars of uninsured deposits as the smaller banks.

Here is a revised version:

Redo_visualcapitalist_uninsuredassets_1

The banks are still ordered in the same way by the proportions of uninsured value. The data being plotted are not the proportions but the actual deposit amounts. Thus, the three TBTF banks (Citibank, Chase and Bank of America) stand out of the crowd. Aside from Citibank, the other two have relatively moderate proportions of uninsured assets but the sizes of the red bars for any of these three dominate those of the smaller banks.

Notice that I added the gray segments, which portray the amount of deposits that are FDIC protected. I did this not just to show the relative sizes of the banks. Having the other part of the deposits allow readers to answer additional questions, such as which banks have the most insured deposits? They also visually present the relative proportions.

***

The most amazing part of this dataset is the amount of uninsured money. I'm trying to think who these account holders are. It would seem like a very small collection of people and/or businesses would be holding these accounts. If they are mostly businesses, is FDIC insurance designed to protect business deposits? If they are mostly personal accounts, then surely only very wealthy individuals hold most of these accounts.

In the above chart, I'm assuming that deposits and assets are referring to the same thing. This may not be the correct interpretation. Deposits may be only a portion of the assets. It would be strange though that the analysts only have the proportions but not the actual deposit amounts at these banks. Nevertheless, until proven otherwise, you should see my revision as a sketch - what you can do if you have both the total deposits and the proportions uninsured.


Lay off bubbles

Wall Street Journal says that the scale of layoffs in the tech industry recently is worse than those caused by the pandemic lockdown. Here is the chart:

Redo_wsj_tech_layoffs_sufficiency

It's the dreaded bubble chart, complete with overlapping circles. Each bubble represents the total number of employees laid off in the U.S. in a given month.

The above isn't really the chart you find in the Journal. I have removed the two data labels from the chart. Look at the highlighted months of April 2020 and November 2022. Can you guess how much larger is the number of laid-off employees in November 2022 relative to April 2020?

***

If you guessed it's 100% - that the larger bubble is twice the size of the smaller one, then you're much better than I at reading bubble charts. Here is the published chart with the data labels:

Wsj tech layoffs

I like to run this exercise - removing data labels - in order to reveal whether the graphical elements on the page are sufficient to convey the underlying data. Bubbles are typically not great at this. (This is what I call the self-sufficiency test.)

***

Another problem with bubble charts is that the sizes of the bubbles are arbitrary. This allows the designer to convey different messages with the same data.

Take a look at these two bubble charts:

Redo_wsj_layoff_bubbles

The first one has huge bubbles, and lots of overlapping while the second one is roughly the same as the WSJ chart (I pulled a different dataset so the numbers may not be exactly the same).

Both charts are made from exactly the same data! In the second chart, the smallest bubbles are made very small while in the first chart, the smallest bubbles are still quite large.

Think twice before you make a bubble chart.

 


Energy efficiency deserves visual efficiency

Long-time contributor Aleksander B. found a good one, in the World Energy Outlook Report, published by IEA (International Energy Agency).

Iea_balloonchart_emissions

The use of balloons is unusual, although after five minutes, I decided I must do some research to have any hope of understanding this data visualization.

A lot is going on. Below, I trace my own journey through this chart.

The text on the top left explains that the chart concerns emissions and temperature change. The first set of balloons (the grey ones) includes helpful annotations. The left-right position of the balloons indicates time points, in 10-year intervals except for the first.

The trapezoid that sits below the four balloons is more mysterious. It's labelled "median temperature rise in 2100". I debate two possibilities: (a) this trapezoid may serve as the fifth balloon, extending the time series from 2050 to 2100. This interpretation raises a couple of questions: why does the symbol change from balloon to trapezoid? why is the left-right time scale broken? (b) this trapezoid may represent something unrelated to the balloons. This interpretation also raises questions: its position on the horizontal axis still breaks the time series; and  if the new variable is "median temperature rise", then what determines its location on the chart?

That last question is answered if I move my glance all the way to the right edge of the chart where there are vertical axis labels. This axis is untitled but the labels shown in degree Celsius units are appropriate for "median temperature rise".

Turning to the balloons, I wonder what the scale is for the encoded emissions data. This is also puzzling because only a few balloons wear data labels, and a scale is nowhere to be found.

Iea_balloonchart_emissions_legend

The gridlines suggests that the vertical location of the balloons is meaningful. Tracing those gridlines to the right edge leads me back to the Celsius scale, which seems unrelated to emissions. The amount of emissions is probably encoded in the sizes of the balloons although none of these four balloons have any data labels so I'm rather flustered. My attention shifts to the colored balloons, a few of which are labelled. This confirms that the size of the balloons indeed measures the amount of emissions. Nevertheless, it is still impossible to gauge the change in emissions for the 10-year periods.

The colored balloons rising above, way above, the gridlines is an indication that the gridlines may lack a relationship with the balloons. But in some charts, the designer may deliberately use this device to draw attention to outlier values.

Next, I attempt to divine the informational content of the balloon strings. Presumably, the chart is concerned with drawing the correlation between emissions and temperature rise. Here I'm also stumped.

I start to look at the colored balloons. I've figured out that the amount of emissions is shown by the balloon size but I am still unclear about the elevation of the balloons. The vertical locations of these balloons change over time, hinting that they are data-driven. Yet, there is no axis, gridline, or data label that provides a key to its meaning.

Now I focus my attention on the trapezoids. I notice the labels "NZE", "APS", etc. The red section says "Pre-Paris Agreement" which would indicate these sections denote periods of time. However, I also understand the left-right positions of same-color balloons to indicate time progression. I'm completely lost. Understanding these labels is crucial to understanding the color scheme. Clearly, I have to read the report itself to decipher these acronyms.

The research reveals that NZE means "net zero emissions", which is a forecasting scenario - an utterly unrealistic one - in which every country is assumed to fulfil fully its obligations, a sort of best-case scenario but an unattainable optimum. APS and STEPS embed different assumptions about the level of effort countries would spend on reducing emissions and tackling global warming.

At this stage, I come upon another discovery. The grey section is missing any acronym labels. It's actually the legend of the chart. The balloon sizes, elevations, and left-right positions in the grey section are all arbitrary, and do not represent any real data! Surprisingly, this legend does not contain any numbers so it does not satisfy one of the traditional functions of a legend, which is to provide a scale.

There is still one final itch. Take a look at the green section:

Iea_balloonchart_emissions_green

What is this, hmm, caret symbol? It's labeled "Net Zero". Based on what I have been able to learn so far, I associate "net zero" to no "emissions" (this suggests they are talking about net emissions not gross emissions). For some reason, I also want to associate it with zero temperature rise. But this is not to be. The "net zero" line pins the balloon strings to a level of roughly 2.5 Celsius rise in temperature.

Wait, that's a misreading of the chart because the projected net temperature increase is found inside the trapezoid, meaning at "net zero", the scientists expect an increase in 1.5 degrees Celsius. If I accept this, I come face to face with the problem raised above: what is the meaning of the vertical positioning of the balloons? There must be a reason why the balloon strings are pinned at 2.5 degrees. I just have no idea why.

I'm also stealthily presuming that the top and bottom edges of the trapezoids represent confidence intervals around the median temperature rise values. The height of each trapezoid appears identical so I'm not sure.

I have just learned something else about this chart. The green "caret" must have been conceived as a fully deflated balloon since it represents the value zero. Its existence exposes two limitations imposed by the chosen visual design. Bubbles/circles should not be used when the value of zero holds significance. Besides, the use of balloon strings to indicate four discrete time points breaks down when there is a scenario which involves only three buoyant balloons.

***

The underlying dataset has five values (four emissions, one temperature rise) for four forecasting scenarios. It's taken a lot more time to explain the data visualization than to just show readers those 20 numbers. That's not good!

I'm sure the designer did not set out to confuse. I think what happened might be that the design wasn't shown to potential readers for feedback. Perhaps they were shown only to insiders who bring their domain knowledge. Insiders most likely would not have as much difficulty with reading this chart as did I.

This is an important lesson for using data visualization as a means of communications to the public. It's easy for specialists to assume knowledge that readers won't have.

For the IEA chart, here is a list of things not found explicitly on the chart that readers have to know in order to understand it.

  • Readers have to know about the various forecasting scenarios, and their acronyms (APS, NZE, etc.). This allows them to interpret the colors and section titles on the chart, and to decide whether the grey section is missing a scenario label, or is a legend.
  • Since the legend does not contain any scale information, neither for the balloon sizes nor for the temperatures, readers have to figure out the scales on their own. For temperature, they first learn from the legend that the temperature rise information is encoded in the trapezoid, then find the vertical axis on the right edge, notice that this axis has degree Celsius units, and recognize that the Celsius scale is appropriate for measuring median temperature rise.
  • For the balloon size scale, readers must resist the distracting gridlines around the grey balloons in the legend, notice the several data labels attached to the colored balloons, and accept that the designer has opted not to provide a proper size scale.

Finally, I still have several unresolved questions:

  • The horizontal axis may have no meaning at all, or it may only have meaning for emissions data but not for temperature
  • The vertical positioning of balloons probably has significance, or maybe it doesn't
  • The height of the trapezoids probably has significance, or maybe it doesn't

 

 


Finding the right context to interpret household energy data

Bloomberg_energybillBloomberg's recent article on surging UK household energy costs, projected over this winter, contains data about which I have long been intrigued: how much energy does different household items consume?

A twitter follower alerted me to this chart, and she found it informative.

***
If the goal is to pick out the appliances and estimate the cost of running them, the chart serves its purpose. Because the entire set of data is printed, a data table would have done equally well.

I learned that the mobile phone costs almost nothing to charge: 1 pence for six hours of charging, which is deemed a "single use" which seems double what a full charge requires. The games console costs 14 pence for a "single use" of two hours. That might be an underestimate of how much time gamers spend gaming each day.

***

Understanding the design of the chart needs a bit more effort. Each appliance is measured by two metrics: the number of hours considered to be "single use", and a currency value.

It took me a while to figure out how to interpret these currency values. Each cost is associated with a single use, and the duration of a single use increases as we move down the list of appliances. Since the designer assumes a fixed cost of electicity (shown in the footnote as 34p per kWh), at first, it seems like the costs should just increase from top to bottom. That's not the case, though.

Something else is driving these numbers behind the scene, namely, the intensity of energy use by appliance. The wifi router listed at the bottom is turned on 24 hours a day, and the daily cost of running it is just 6p. Meanwhile, running the fridge and freezer the whole day costs 41p. Thus, the fridge&freezer consumes electricity at a rate that is almost 7 times higher than the router.

The chart uses a split axis, which artificially reduces the gap between 8 hours and 24 hours. Here is another look at the bottom of the chart:

Bloomberg_energycost_bottom

***

Let's examine the choice of "single use" as a common basis for comparing appliances. Consider this:

  • Continuous appliances (wifi router, refrigerator, etc.) are denoted as 24 hours, so a daily time window is also implied
  • Repeated-use appliances (e.g. coffee maker, kettle) may be run multiple times a day
  • Infrequent use appliances may be used less than once a day

I prefer standardizing to a "per day" metric. If I use the microwave three times a day, the daily cost is 3 x 3p = 9 p, which is more than I'd spend on the wifi router, run 24 hours. On the other hand, I use the washing machine once a week, so the frequency is 1/7, and the effective daily cost is 1/7 x 36 p = 5p, notably lower than using the microwave.

The choice of metric has key implications on the appearance of the chart. The bubble size encodes the relative energy costs. The biggest bubbles are in the heating category, which is no surprise. The next largest bubbles are tumble dryer, dishwasher, and electric oven. These are generally not used every day so the "per day" calculation would push them lower in rank.

***

Another noteworthy feature of the Bloomberg chart is the split legend. The colors divide appliances into five groups based on usage category (e.g. cleaning, food, utility). Instead of the usual color legend printed on a corner or side of the chart, the designer spreads the category labels around the chart. Each label is shown the first time a specific usage category appears on the chart. There is a presumption that the reader scans from top to bottom, which is probably true on average.

I like this arrangement as it delivers information to the reader when it's needed.