## On the bubble

##### Aug 08, 2007

A couple of you noticed this table of bubbles in the Times, and asked what I think of it.  Dustin J suggested that this could be considered a decent application of bubble charts.  I agree, with some reservations.

• Which Presidential candidates are getting the most face time?
• Are candidates seen equally often across the stations?
• Are there differences between network and cable stations in terms of total face time?  In terms of individual face time?
• Are there Democratic/Republican leanings by station?  by type of station?

The intrepid can even build a regression out of it.

The bubble chart contains answers to all those questions but nothing jumps out. Okay, it's easy to see the station that gives each candidate the most face time.  Anything else requires moderate to a lot of effort.  Here's the junkart version.

The list of things done to the data is long:

• Candidates are grouped together by party
• Candidates within each party are arranged in order of decreasing maximum face time
• Stations are arranged by increasing total face time; this order happens to retain the network vs cable divide
• A heat map construct is used instead of bubbles: the legend is missing but there are four shades for each color: darkest = top 10%; medium = 50th - 90th percentile; light = bottom 50% excepting zeroes; white = no face time.  In raw numbers, 90th percentile = 81 minutes, 50th percentile = 19 minutes.
• The only data shown are the totals by candidate and totals by station.
• On the right margin are little bar charts that show the distribution of network/cable for each candidate.
• On the bottom margin are little column charts showing the distribution of party affiliation by station.
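
For readers who want to reproduce the shading, here is a minimal sketch of the binning in Python. The two cutoffs (19 and 81 minutes) are the ones quoted in the list above; the function name `shade`, the handling of values that fall exactly on a cutoff, and most of the sample values are my own assumptions, not something taken from the Times data.

```python
# Cutoffs quoted in the post: 50th percentile = 19 min, 90th percentile = 81 min.
P50_MINUTES = 19
P90_MINUTES = 81


def shade(minutes: float) -> str:
    """Map face time (minutes) to one of the four heat-map levels.

    Assumption: a value exactly on a cutoff falls into the lower bin.
    """
    if minutes <= 0:
        return "white"    # no face time
    if minutes > P90_MINUTES:
        return "darkest"  # top 10%
    if minutes > P50_MINUTES:
        return "medium"   # between the 50th and 90th percentiles
    return "light"        # bottom half, excluding zeroes


# 128 and 132 are values cited in the comments (Dodd and Biden on MSNBC);
# the remaining numbers are made up purely for illustration.
for m in (0, 5, 19, 40, 81, 128, 132):
    print(f"{m:>4} min -> {shade(m)}")
```

With real data the cutoffs would be computed from the full table of minutes rather than hard-coded.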

A few observations follow:

• Cable stations gave much more face time to the candidates in general.  Fox, no surprise, gave Republicans 85% of its time while all the others split their time roughly equally between the parties.
• The more mainstream the candidate, the more balanced was the time spent on networks versus cable.  John McCain (R), Hillary Clinton (D) and John Edwards (D) had the highest proportion of network time.
• More time is not necessarily good, since McCain was the clear winner but his campaign is struggling.

Source: "Tracking Face Time", New York Times, August 1, 2007.


FYI: The order is a little off for the Republicans

Andrew: If you're talking about the row orders, then yes, they weren't ordered by total minutes. I ordered them by maximum face time; by that measure, Biden led with 132 minutes on MSNBC, Dodd was second with 128 minutes, etc. It's debatable which one is more useful. I was too lazy to change it back to total minutes.

Very nice. I think I would have put some space between the Democrats and Republicans and added two more little bar charts on the right showing network/cable breakdowns for each party. This would have answered the last of your initial questions.

I find the original to be more effective. While the addition of percentage totals on the margins is useful, I think the heatmap implementation makes the data harder to read, mainly because you have reduced the "resolution" of the data (by re-casting it into four "bins"). Arguably, a heatmap is best at showing gradients (e.g., temperature), and here you have eliminated the gradients that existed in the original data. In that case, I think size is a more legible "retinal variable" than color.

Also, as someone who is color-blind, I had trouble with your color choices. Obviously this is particular to my color-blindness, but I'm ashamed to say it took me a few minutes to realize that you were using two different color sets for the two political parties! This is something that infographic designers really need to be aware of, particularly when presenting data to a wide audience (such as New York Times readers). There are a lot of color-blind people out there.

Finally, to me, the palette you used for Republicans shows up as significantly darker than the one used for Democrats (compare RGB 0,0,255 to RGB 0,0,128 for the highest values). Given a heatmap representation, this runs the risk of creating the illusion that the darker Republican colors somehow correspond to higher values in that category. Again, this could be the color-blindness kicking in, but it is something to be aware of either way.

Very nice redesign, with the exception of the bar charts at the bottom (party affiliation by station). I found the fill patterns used there to be quite distracting; why not just use solid colors here?

miked: I have to admit I didn't exercise any "color choice". I used Excel (some old version) to create the heat map and it had exactly 4-5 levels of blue and red. The blue and red are party colors so I had to stick with tradition. It would be useful if you could point us to a resource that would help us think about color-blindness.

I do take issue with the "resolution" argument. Binning data brings out the patterns; that's also what the bubbles were intended to do. If someone is looking for the raw data, then the question he is asking is too detailed for a data graphic. He should just look up a table.

I strongly agree with those who don't favor the heatmap approach. Take for example Christopher Dodd... bubbles show clearly that MSNBC gave him more than double the face time of any other network, including the 2nd highest at CNN. The heatmap shows no distinction between CNN and >2x facetime at MSNBC. More significantly, I think, comparison between candidates in opposing parties is very challenging with the heatmap approach, since one needs to visually distinguish shades of 2 different colors, whereas the colored bubbles use color for party distinction but easily-compared and party-independent bubble size for facetime. Similarly, this characteristic makes it difficult to see the trends within a network as to which candidate might be favored there. I agree that heatmaps are nice for trends, but they are only really useful in one dimension, i.e., trends per candidate come out somewhat OK, but trends per network are not clear at all. Finally, I'm not color blind, but I could only pick up 3 color levels for either party, which seems a bit sparse given the variability across the data.

Mrweatherbee: Tufte (et al. I presume) has made the argument that use of circles for comparison purposes is risky. Humans aren't that great at comparing circular areas: pi * radius * radius is not easy to estimate. We are prone to compare diameters of circles rather than areas, which understates the difference in values.

In the Chris Dodd case I doubt that people could clearly tell that MSNBC was more than twice CNN. Including the value in the circle seems more useful for this purpose.
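
To put a number on the circle argument, here is a small Python sketch. It assumes the chart sizes each bubble so that its area is proportional to the value (I have not verified how the Times actually scaled them), and the function names are mine.

```python
import math


def radius_for_value(value: float) -> float:
    """Radius of a circle whose AREA equals `value` (area = pi * r * r)."""
    return math.sqrt(value / math.pi)


def diameter_ratio(value_a: float, value_b: float) -> float:
    """How much wider circle A looks than circle B when area encodes value."""
    return radius_for_value(value_a) / radius_for_value(value_b)


# A value twice as large yields a circle only sqrt(2) times as wide.
print(round(diameter_ratio(2.0, 1.0), 3))
```

So a reader who judges by diameter sees roughly a 1.4x difference where the data say 2x, which is the understatement described above.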

Kaiser: I don't necessarily have a problem with the way you have binned the data; essentially you are normalizing it against total face time per news outlet, and that's a valid thing to do. However, I might argue that that sort of normalization is less meaningful in this case, where the success of a political candidate often depends on the sheer number of "eyeballs reached" rather than their share of face time on a given news network. But this is debatable.

I'm more concerned that the heatmap "visual metaphor" is less effective than sized circles because it is more difficult to visually parse. As I said, my impression is that heatmaps work great for data that is continuous and features discernable gradients (the idea of a heatmap comes from the display of temperature measurements). But this data is unordered. There is no continuity expected between a list of candidates and a list of news networks, so a heatmap presents as randomly ordered blocks of color. To me, this ends up looking more like noise than if we used size as the retinal variable for facetime.

I guess what I'm arguing is that, for me at least, it is easier to distinguish circles of a particular size out of a field of randomly sized circles than it is to pick a particular shade of color out of a field of random colors. And I think this is particularly true for "extreme" values, which are arguably the meat of this data (i.e., we are mostly interested in which candidates are getting the "most" of something, and we don't care too much about the intermediate values). To that end, the NYT quite cleverly uses "oversized" circles for particularly high values that break the grid of the chart and immediately draw your attention to them. This doesn't seem to happen with the heatmap. The values don't "pop" as much visually.

Also, as Mrweatherbee says, it actually becomes harder to compare Republicans and Democrats because I have to do both a more complicated color comparison (due to the color scale) and a spatial comparison (comparing a set of things above to a set of things below) than just squinting at the NYT version and immediately seeing roughly the same amount of blue and red.

This attempt of mine is very simple, and I hope not too eccentric. The bars are made by using the | character and the REPT() function in Excel.

I ordered the Republican candidates by ascending face time on Fox News and the Democrats by descending face time on MSNBC, so Giuliani and Fred Thompson are next door to Biden and Dodd in the middle, and the other news channels are displayed in comparison to those respective channels.
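
For anyone without Excel, the same REPT() idea can be sketched in a few lines of Python. The function name `text_bar` and the scale of 4 minutes per mark are my own assumptions; the two values shown are the MSNBC minutes for Biden and Dodd quoted earlier in this thread.

```python
def text_bar(minutes: int, minutes_per_mark: int = 4) -> str:
    """Build a bar out of '|' characters, like REPT("|", n) in Excel.

    Integer division drops any fractional mark.
    """
    return "|" * (minutes // minutes_per_mark)


# MSNBC minutes quoted in the comments above.
msnbc_minutes = {"Biden": 132, "Dodd": 128}

for name, minutes in msnbc_minutes.items():
    print(f"{name:<6} {text_bar(minutes)} {minutes}")
```

Printing the value after the bar keeps the exact number available, which the bubbles and the heat map both leave out.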

Derek -

Good chart. The "bars" are much better for quantitative analysis than either the circles or the heatmap. The ordering of the categories is troubling to me.

Is it possible to rank candidates by total across all outlets? If so, I think we'd see McCain, Hunter, and Huckabee move closer to the bottom of your "red" list, and Obama, Edwards, and Richardson move closer to the top of the "blue" list.

I like keeping the parties separate, but I would have listed both parties in decreasing order, not peaking in the middle where they meet.

Anyone know how to create the NYT chart? Is there any such functionality in Excel, before doing the final wrap-up in Illustrator?

Thanks for any ideas!

Excellent redesign! I'd suggest sorting networks by bias, and listing total viewership. Total viewership would help understand the balance between FOX's extreme but solitary Rep bias and all other networks' slight Dem bias.

Also, why do you discretize the heatmap? Why not just use all the shades available instead of only 4 levels or so?
