Batmen not as interesting as it seems

When this post appears, I will be on my way to Seattle. Maybe I will meet some of you there. You can still register here.

I held onto this tip from a reader for a while. I think it came from Twitter:

20160326_woc432_1 batman

The Economist found a fun topic but what's up with the axis not starting at zero?

The height x weight gimmick seems cool but on second thought, weight is not the same as girth so it doesn't make much sense!

In the re-design, I use bubbles to indicate weight and vertical location to indicate height. The data aren't as interesting as one might think. All the actors pretty much stayed true to the comic-book ideal, with Adam West being the closest. I also changed the order of the actors.

Redo_batman

I left out the Lego, as it creates a design challenge that does not justify the effort.

 

 


More chart drama, and data aggregation

Robert Kosara posted a response to my previous post.

He raises an important issue in data visualization - the need to aggregate data, and not plot raw data. I have no objection to that point.

What was shown in my original post are two extremes. The bubble chart is high drama at the expense of data integrity. Readers cannot learn any of the following from that chart:

  • the shape of the growth and subsequent decline of the flu epidemic
  • the beginning and ending date of the epidemic
  • the peak of the epidemic*

* The peak can be inferred from the data label, although there appears to be at least one other circle of approximately equal size, which isn't labeled.

The column chart is low drama but high data integrity. To retain some dramatic element, I encoded the data redundantly in the color scale. I also emulated the original chart in labeling specific spikes.

The designer then simply has to choose a position along these two extremes. This will involve some smoothing or aggregation of the data. Robert showed a column chart that has weekly aggregates, and in his view, his version is closer to the bubble chart.

Robert's version indeed strikes a balance between drama and data integrity, and I am in favor of it. Here is the idea (I am responsible for the added color).

Kosara_avianflu2

***

Where I depart from Robert is how one reads a column chart such as the one I posted:

Redo_avianflu2

Robert thinks that readers will perceive each individual line separately, and in so doing, "details hide the story". When I look at a chart like this, I am drawn to the envelope of the columns. The lighter colors are chosen for the smaller spikes to push them into the background. What might be the problem are those data labels identifying specific spikes; they are a holdover from the original chart--I actually don't know why those specific dates are labeled.

***

In summary, the key takeaway is, as Robert puts it:

the point of this [dataset] is really not about individual days, it’s about the grand totals and the speed with which the outbreak happened.

We both agree that the weekly version is the best among these. I don't see how the reader can figure out grand totals and speed with which the outbreak happened by staring at those dramatic but overlapping bubbles.


Is it worth the drama?

Quite the eye-catching chart this:

Wsj_avianflu

The original accompanied this article in the Wall Street Journal about avian flu outbreaks in the U.S.

The point of the chart appears to be the peak in the flu season around May. The overlapping bubbles were probably used for drama.

A column chart, with appropriate colors, attains much of the drama but retains the ability to read the data.

Redo_avianflu2

 


How to tell if your graphic is underpowered?

Some time ago, this chart showed up in a NYT Magazine (it's about sex):

Nytm_circles

In this composition, the visual element (the circles) has no utility. A self-sufficiency test makes this point clear.

All the data (four numbers) are printed on the original graphic. When removed, the reader loses all ability to understand the data.

Nytm_circles_cropped

***

Redo_nytm_circles_1Even when the first number is revealed, it is impossible to know the values of the others.

If one knows the second (and largest) pink circle represents 58 percent, it is still impossible to guess that the adjacent circle is 40 percent.

Even both those numbers are provided, it is still impossible to infer the rest without a calculation.

In order to understand this graphic, readers must look at the data labels.

 

 

***

I made a couple of other versions for comparison.

The first uses the pie chart, which is almost readable without the data labels. 

Redo_nytm_circles_2

The second uses the bar chart, which requires only an axis.

Redo_nytm_circles_3

 

 

 

 

 


Observing Rosling’s Current Visual Style

On the sister blog, I wrote about Hans Rosling’s recent presentation in New York (link). I noted that Rosling has apparently simplified his visual palette.

Rosling is best known as the developer of the Gapminder tool, used to visualize global social statistics data collected by national statistical agencies. I wrote favorably about this tool in a series of posts (link). Gapminder made popular the moving bubble chart, although not the only graphical form present.

Gapminder_screengrab

These animated bubble charts also made Rosling a YouTube star (See here.)

***

In last week’s presentation, Rosling only showed one moving bubble chart. The rest of his graphics are noticeably simpler, something that anyone can produce on Excel or Powerpoint. Here is one example:

Image1
 

I’m particularly impressed by a simple sequence of charts in which Rosling explains the demographic changes the world is expecting to see in the next 50 to 100 years.

  Image2

This is an enhanced area chart. Each slice of area is subdivided into stick figures so that an axis for population counts becomes unnecessary.

Instead, the reader sees two useful dimensions: region of the world, and age group.

How the population ages as it grows is the feature story and the effect of aging is ingeniously portrayed as layers. This becomes apparent as Rosling lets time roll forward, and the layers literally walk off the page. (Unfortunately, I couldn't capture each step fast enough.)

Image3

 (This photo courtesy of Daniel Vadnais.)

When Rosling showed the 2085 projection, we find that the entire rectangle has filled up, so the world population has definitely grown, roughly by 30 percent. The growth happens by filling up of adults; the total number of children has not changed. This is one of the key insights from recent demographic data. The first photo above shows something remarkable: the fertility rate in Asian countries has plunged to about the same level of developed countries already.

***

This set of charts is unusually effective. It represents another level of simplification in visual means. At the same time, the message is sharpened.

As I reported the other day (link), Rosling does not believe modern tools have improved data analysis. This talk which utilized simple tools is a good demonstration of his point.


An infographic showing up here for the right reason

Infographics do not have to be "data ornaments" (link). Once in a blue moon, someone finds the right balance of pictures and data. Here is a nice example from the Wall Street Journal, via ThumbsUpViz.

 

Thumbsupviz_wsj_footballinjuries

 

Link to the image

 

What makes this work is that the picture of the running back serves a purpose here, in organizing the data.  Contrast this to the airplane from Consumer Reports (link), which did a poor job of providing structure. An alternative of using a bar chart is clearly inferior and much less engaging.

Redowsjinjuries_bar

***

I went ahead and experimented with it:

Redo_wsj_nflinjuries

 

I fixed the self-sufficiency issue, always present when using bubble charts. In this case, I don't think it matters whether the readers know the exact number of injuries so I removed all of the data from the chart.

Here are  three temptations that I did not implement:

  • Not include the legend
  • Not include the text labels, which are rendered redundant by the brilliant idea of using the running guy
  • Hide the bar charts behind a mouseover effect.

 


A patently pointless picture

I am mystified by the intention behind this chart, published in NYT Magazine (Sept 14, 2014).

Nytmag_windshield

It is not a data visualization since the circles were not placed to scale. The 650 and 660 should have been further to the right on a horizontal time scale. And if we were to take the radial time axis literally, the 390 circle would be closest to the center.

It is not a work of art. It doesn’t look particularly appealing. Sometimes, designers are inspired by imagery. The accompanying article concerns windshield wipers, and I’m not seeing the imagery.

***

The arrangement of the circles actually interfere with the reader’s comprehension. Here is a straightforward version of the data as a column chart.

Jc_windshield_1
 

Now, let’s turn it on the side, with time running vertically instead of horizontal (the convention).

  Jc_windshield_2

Then, we need to invert convention once again by making the vertical axis run in reverse so that time runs from up to down, instead of down to up.

Jc_windshield_3

Finally, distort the frequency axis, replace the bars with circles, and you have essentially replicated the original.

The point is each step obscures the pattern more. In this case, following conventions makes a better chart.

***

I have a pet peeve about presenting partial data next to complete data, even if it is labeled correctly. On this chart, the number 390 cannot be compared against any of the other numbers because we are not even half way into the decade of the 2010s. Instead of plotting total number of patents per decade, it would have been more useful to plot number of patents per year in each decade. 43, 26, 65, 41, etc. For the 2010s, I am assuming they have data for 3.5 years.

A simple column chart looks like this:

Jc_windshield_4

The per-year view shows that the 2010s is unusual. Of course, I should add a footnote to the chart to make it clear that we only have partial data for 2010, and that the assumption behind the averaging is that the pace of patents will remain the same on average for the remainder of the decade.

In the Trifecta Checkup, this is Type DV.