
Some like it packed, some like it piled, and some like it wrapped

In addition to Xan's "packed bars" (which I discussed here), there are some related efforts to improve upon the treemap. To recap, the treemap is a design that shows parts against the whole by packing rectangles into a bounding box. Frequently, this leads to oddly shaped rectangles, e.g. really thin or really tall ones, and it asks readers to estimate the relative areas of differently proportioned boxes. We often make mistakes in this task.

The packed bar chart approaches this challenge by allowing only the width of the box to vary with the data. The height of every box is identical, so readers only have to compare lengths.

Via Twitter, Adil pointed me to this article by him and his collaborators that describes a few alternatives.

One of the options is the "wrapped bar chart" introduced by Stephen Few. Like Xan, he restricts the variation to the lengths of the bars while keeping the heights fixed. But he goes further, and abandons packing completely. Instead of packing, Few wraps the bars. Start with a large bar chart with many categories filling up a tall plotting area. He then divides the bars into blocks and places the blocks side by side. Here is an example showing 50 states, ranked by total electoral votes:


You can see the white space because there is no packing. This version makes it easier to see the relative importance of the different blocks of states, but it is tough to tell how much of the total the first block of 13 states accounts for. The wrapped bar chart is organized like a set of small multiples, except that the scale in each panel is allowed to vary.
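The wrapping step can be sketched in a few lines of Python. This is a rough illustration and not Few's implementation: the function names `wrap_bars` and `column_scales`, the equal-sized blocks, and the toy values are all assumptions made for the example. It ranks the categories, cuts the ranked list into consecutive blocks, and gives each block its own scale, taken here to be the largest value in that block.

```python
from typing import Dict, List, Tuple

Block = List[Tuple[str, float]]


def wrap_bars(data: Dict[str, float], n_columns: int) -> List[Block]:
    """Rank categories by value (largest first) and cut the ranked list
    into at most n_columns consecutive blocks of roughly equal size."""
    if n_columns <= 0:
        raise ValueError("n_columns must be positive")
    ranked = sorted(data.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return []
    block_size = -(-len(ranked) // n_columns)  # ceiling division
    return [ranked[i:i + block_size] for i in range(0, len(ranked), block_size)]


def column_scales(blocks: List[Block]) -> List[float]:
    """Each block gets its own axis: assume its maximum is the largest bar."""
    return [max(value for _, value in block) for block in blocks]


# Made-up toy values, not the electoral vote data.
toy = {"a": 9, "b": 7, "c": 4, "d": 2, "e": 1}
blocks = wrap_bars(toy, n_columns=2)
print(blocks)                 # [[('a', 9), ('b', 7), ('c', 4)], [('d', 2), ('e', 1)]]
print(column_scales(blocks))  # [9, 2]
```

Because each block is rescaled, a bar in the second block cannot be compared by length with a bar in the first block, which is why the share of the first block is hard to read off the chart.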

Another option is the "piled bars." This option, presented by Yalçın, Elmqvist, and Bederson, brings packing back. But unlike the packed bars or the treemap, the outside envelope no longer represents the total amount. In the "piled bars" design, the top X categories act as the canvas, and the smaller categories are packed inside these bars rather than around them. Take a look at this example, which plots GDP growth of different countries:


The inset in the left column is instructive. The green (smallest) and red (medium) bars are packed inside the blue (largest) bars. In this example, it doesn't make sense to add up GDP growth rates, so it doesn't matter that the outer envelope does not equal the total. The design would not work as well with the electoral vote data in the previous example.
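One way to picture the piling is the sketch below. It is only a guess at the layout, not the authors' algorithm: it assumes that the ranked values are dealt out to rows in blocks, so that row i holds the i-th largest value of each block, and the name `pile_bars` and the toy numbers are made up for the illustration.

```python
from typing import List


def pile_bars(values: List[float], n_rows: int) -> List[List[float]]:
    """Rank values (largest first) and assign them to n_rows rows.

    Row i receives ranked[i], ranked[i + n_rows], ranked[i + 2 * n_rows], ...
    so every row starts with one of the top n_rows values (the "canvas")
    and the remaining values in the row are drawn inside that bar.
    """
    if n_rows <= 0:
        raise ValueError("n_rows must be positive")
    ranked = sorted(values, reverse=True)
    return [ranked[i::n_rows] for i in range(n_rows)]


# Made-up toy values.
rows = pile_bars([9, 8, 7, 5, 4, 3, 2, 1], n_rows=3)
print(rows)  # [[9, 5, 2], [8, 4, 1], [7, 3]]
```

Every bar in a row starts at the same baseline, so the longest bar in each row sets the row's extent and the sum of all values is not encoded anywhere in the outline.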

I wonder whether a piled dot plot works better than a piled bar chart. The piled bar chart shares a problem with the stacked area chart: other than the first piece, each visible piece represents the difference between its value and the next lower value, rather than the value itself. Readers are led to compare the green, red and blue pieces, but the quantities those pieces show are not truly comparable, nor of primary interest.

This problem goes away if the bars are represented by dots.
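A small numeric sketch makes the contrast concrete. The helper name `visible_segments` and the toy row are assumptions for illustration; the code assumes all bars in a row share a baseline and that shorter bars are drawn on top of longer ones.

```python
from typing import List


def visible_segments(row: List[float]) -> List[float]:
    """Lengths of the visible pieces in one piled row, shortest bar first.

    The shortest bar is fully visible; every longer bar only shows the part
    that sticks out beyond the next shorter bar.
    """
    if not row:
        return []
    ordered = sorted(row)
    return [ordered[0]] + [longer - shorter
                           for shorter, longer in zip(ordered, ordered[1:])]


row = [9, 5, 2]                 # made-up values for one piled row
print(visible_segments(row))    # [2, 3, 4]
print(sorted(row))              # [2, 5, 9]  <- positions a dot plot would mark
```

The visible pieces (2, 3, 4) are what the eye compares in the piled bars, while the dots sit at the values themselves (2, 5, 9).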


What strikes me as the key paragraph in the article by Yalçın et al. is the following:

To understand graphical perception performance, we studied three basic tasks:

1) How accurately can we estimate the difference between two data points?
2) How accurately can we estimate the rank of a data point among all the rest?
3) How accurately can we guess the distribution characteristic of the whole dataset?

As chart designers, we have to prioritize these tasks. There is unlikely to be a single chart form that prevails on all three. So the designer should start with the question that he or she wants to address; that leads to the key task that the visualization should enable, which in turn leads to the chart form that best facilitates that task.






Thanks for the review, Kaiser!

I am glad you found the three tasks (comparison, ranking, distribution) highly appropriate! In our GI2017 paper, we compared treemaps vs. wrapped bars for these tasks, and found that wrapped bars are significantly more accurate perceptually than treemaps in many cases. This is probably no surprise (area encoding on a flexible layout vs. length encoding in a targeted columnar layout). However, given the wide practice of reaching for treemaps for their visual appeal and ease of learning, I think this is an important take-away for practitioners.

We also tested the piled bars design using crowdsourcing. Unfortunately, the results couldn't make it into published work. However, we found that getting used to reading the piled bars design is not as trivial as learning treemaps or wrapped bars. The overlap likely led people to discard the section of a bar hidden under others, or to assume that the covered bars do not extend to the baseline.

As a last note, I want to touch on your mention of using dot plots instead of bars. It seems Daniel Zvinca had a similar idea, potentially developed separately. He worked with Stephen Few to generate some results and discussions:
https://www.perceptualedge.com/articles/visual_business_intelligence/journey_to_zvinca.pdf
Their report expands the design space of this multi-column, single-scale bar chart idea to present many more numeric values in a compact space.

Let's hope these chart variations will help others better communicate a dense set of numeric values with higher accuracy and visibility. To create wrapped and piled bars designs, I released an open-source JS library: adilyalcin.me/chubuk.js. I hope that's a good first step for re-use and further development :)


