
What do we think of the "packed" bar chart?

Xan Gregg, my partner in the #onelesspie campaign to replace terrible Wikipedia pie charts one at a time, has come up with a new chart form that he calls "packed bars". It is a hybrid of the bar chart and the treemap.

Here is an example of a packed bar chart that displays the top 10 companies in the S&P 500 index:


What he is doing is adding context to help readers interpret the data. These days we frequently encounter analyses of the "Top X" or "Bottom Y" type. Such analyses are of limited use because they ignore the bulk of the data: the extreme values say little or nothing about the rest. The problem is particularly acute with skewed data.

Compare the two versions:


The left chart is a Top 10 analysis. The reader learns nothing about the market cap of the other 490 companies. The right chart provides the context. We can see that the Top 10 companies have a combined market cap of roughly a quarter of the total market cap of the S&P 500. We also learn how the next 10 compare in size with the Top 10, and so on.
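The context the right-hand chart supplies boils down to a simple calculation: rank the values, then ask what share of the grand total each block of 10 holds. Here is a minimal Python sketch. The function name `group_shares` is my own, and the market caps are made up (a hypothetical, heavily skewed list of 500 values), so the numbers it prints are not the real S&P 500 figures.

```python
from typing import Sequence


def group_shares(values: Sequence[float], group_size: int) -> list[float]:
    """Share of the grand total held by each consecutive block of
    `group_size` items, after ranking from largest to smallest."""
    ranked = sorted(values, reverse=True)
    total = sum(ranked)
    return [
        sum(ranked[i:i + group_size]) / total
        for i in range(0, len(ranked), group_size)
    ]


# Hypothetical, skewed "market caps" for 500 companies (not real data).
caps = [1.0 / rank for rank in range(1, 501)]
shares = group_shares(caps, 10)
print(f"Top 10: {shares[0]:.1%} of total; next 10: {shares[1]:.1%}")
```

A Top 10 chart shows only the first block. The shares of the remaining blocks are what tell the reader whether the leaders dominate or are just the tip of a long tail.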

As with any chart form, the right dataset really brings out its power. I really like what the packed bar chart reveals about the county-level election data:


(Thanks to Xan for providing me this image.)

Notice the preponderance of red on the right side and the gradual shift from blue/purple to pink/red moving left to right. This is very effective at showing one of the most important patterns in American politics: the small counties are mostly deep red, while the Democratic base is found primarily in large metropolitan areas. I have previously featured a number of interesting election graphics here. The Washington Post's "nation of peaks" graphic is another way to surface this pattern.

Xan would love to get feedback about this chart type. He has put up a blog post here with more details. I also love this animation he created to show how the packing occurs.
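For readers who want to experiment with the layout, here is a minimal Python sketch of one way to pack the bars. It assumes a simple greedy rule, which is my assumption and not necessarily Xan's exact algorithm: the largest values become the ranked "primary" bars, one per row, and every remaining value (taken largest first) is appended to whichever row is currently shortest. The function name `pack_bars` and the sample values are hypothetical.

```python
import heapq
from typing import Sequence


def pack_bars(values: Sequence[float], n_rows: int) -> list[list[float]]:
    """Greedy packed-bars layout (a sketch under an assumed packing rule).

    The n_rows largest values become the ranked primary bars, one per row.
    Each remaining value is appended to the row with the smallest total.
    """
    if n_rows < 1:
        raise ValueError("n_rows must be at least 1")
    ranked = sorted(values, reverse=True)
    rows = [[v] for v in ranked[:n_rows]]
    # Min-heap of (current row length, row index): the shortest row sits on top.
    heap = [(row[0], i) for i, row in enumerate(rows)]
    heapq.heapify(heap)
    for v in ranked[n_rows:]:
        length, i = heapq.heappop(heap)
        rows[i].append(v)
        heapq.heappush(heap, (length + v, i))
    return rows


rows = pack_bars([40, 25, 20, 10, 8, 7, 5, 3, 2, 1], n_rows=3)
print(rows)  # [[40, 1], [25, 8, 5, 2], [20, 10, 7, 3]]
```

In this toy example the row totals come out nearly equal (41, 40, 40), which is what gives the chart its rectangular silhouette. Only the first cell of each row is in rank order, though: the secondary cells land wherever the greedy rule puts them.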







As Xan points out, a weakness of this viz is that it invites interpretation as a stacked bar. The fact that the vertical ordinal ranking applies only to the leftmost cells, while the order of everything to the right is just an artifact of the tiling algorithm, creates confusion.

I confess I'm not a fan of treemaps. At least the axes of a treemap have no pretension to numeric value. Here the vertical axis conveys size ranking for some of the leaves but not for the others. It seems to violate some cardinal principle of good design.


Can I meekly raise my hand about the consistency of the data in the second chart?

I happen to live in Hennepin County in MN. Is it really possible that Hennepin County's 1,232,483 souls have more voters than the 2,333,054 folks in Queens County, NY?

That aside, I think the chart type is good. I like the sense of the total it gives.


@James: I had to look it up, because that surprised me too. I assume your total population values are correct, and I took the election results from Politico. It looks like the Minnesota ballot had much more variety than New York's (9 candidates vs 4), but even with this extra "spread", there are 9,112 more Republican and Democratic voters in Hennepin County than in Queens County.

Hennepin also seems to bring out a much larger proportion of its population to vote: 54.5% vs 26.9%. There must be some significant demographic differences between Queens and the Twin Cities, but as I haven't spent time in either county, I'll leave you to look into how those differences come into play on election day.
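To see how the smaller county can cast more ballots, here is the back-of-the-envelope arithmetic as a short Python sketch. It assumes, as the comment says, that the turnout figures are ballots divided by total population (not registered or eligible voters). The helper name `implied_votes` is hypothetical, and because the rates are rounded to 0.1 percentage points, the results are only approximate.

```python
def implied_votes(population: int, turnout: float) -> float:
    """Ballots implied by a turnout rate expressed as a share of total population."""
    return population * turnout


# Population figures and turnout rates as quoted in the comments above.
counties = {
    "Hennepin County, MN": (1_232_483, 0.545),
    "Queens County, NY": (2_333_054, 0.269),
}

for name, (population, turnout) in counties.items():
    print(f"{name}: about {implied_votes(population, turnout):,.0f} ballots")
```

That works out to roughly 672,000 ballots in Hennepin against roughly 628,000 in Queens. The higher turnout more than makes up for Queens having nearly twice the population.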



Thanks for the follow-up. I was struggling to find any data for NY; voting and politics are near and dear to me, the data not so much.

I do know that historically, MN turns out a higher proportion of voters on election day than most other states. I've lived in and around the Twin Cities most of my life, and I can say that the polls are steadily busy on election day.

Anecdotally, at my polling place in 2008, I got in line at 7:45 and it was already several hundred people long. I walked out with my "I Voted" sticker around 8:15 to 8:20, and the line was just as long.

Are there any handy links you can point me to?


Also, for what it is worth, I think the ease with which Minnesotans can vote is a huge factor in turnout too: same-day registration, no-excuse absentee balloting, and so on.
