
A pretty chart hides the message

James C @annelidworm sent me to this BBC chart, which he thinks is "hard on the eyes":



I find a few things I like, and also a few I don't.

Unlike James, I actually find the chart quite pretty. The use of small multiples to compare season tickets with single tickets is also nice. For someone like me, who isn't well versed in the British map, the geography lesson is appreciated - although for a local reader, this may be superfluous. The thickness of the lines used to encode the data works all right.

There are a few problems with the chart:

There is a self-sufficiency problem. This is a chart in which every data element is printed on the chart, which means the graphical pieces are merely cosmetic. If the data labels were removed, the reader would be entirely lost. However, this problem can be solved by judicious use of colors.

Consider how color is used here. Blue and yellow distinguish between season and single tickets, but the small-multiples setup already does that job well enough. The tints vary in some arbitrary manner unrelated to the data, as far as I can tell.

Instead, price increases above the rate of inflation should be differentiated from price changes below the rate of inflation by using two colors. The special case of Birmingham's season ticket which increased exactly at the rate of inflation deserves its own color.

Speaking of increases relative to inflation, the analyst helpfully explains via the legend that any number above 66 percent is ahead of inflation, and any number below is behind inflation, meaning prices have actually come down in real terms. The entire dataset can be simplified by subtracting 66 percent from each number to show the "real" price changes.
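As a rough sketch of that simplification, the Python snippet below subtracts the 66 percent inflation figure from each nominal increase and labels the result as above, at, or below inflation, which is also the three-way split a color scheme could encode. The route names and percentages are made up for illustration and are not the BBC's data; `real_change` and `classify` are hypothetical helper names.

```python
INFLATION_PCT = 66  # threshold stated in the chart's legend

def real_change(nominal_pct, inflation_pct=INFLATION_PCT):
    """Price change relative to inflation, by simple subtraction."""
    return nominal_pct - inflation_pct

def classify(nominal_pct, inflation_pct=INFLATION_PCT):
    """Label a price change so each group can be given its own color."""
    diff = real_change(nominal_pct, inflation_pct)
    if diff > 0:
        return "above inflation"
    if diff < 0:
        return "below inflation"
    return "at inflation"

# Hypothetical nominal increases in percent (NOT the values on the chart).
sample = {"Route A": 80, "Route B": 66, "Route C": 50}

for route, pct in sample.items():
    print(f"{route}: {real_change(pct):+d} points, {classify(pct)}")
```

With the numbers recentered this way, the sign alone tells the reader whether a fare rose or fell in real terms.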


Take a step back. What is the story in this dataset? The numbers on the right side are all much higher than those on the left, with Shoeburyness being a bit of an exception. It appears that the rail company is trying to push sales of season tickets. Too bad this chart doesn't bring the story to the front.

Here is one attempt using paired bar charts.






Interpreting some charts about guns

Felix linked to a set of charts about guns in the U.S. (and elsewhere). The original charts, by Liz Fosslien, are found here.

I like the clean style used by Fosslien. Some of the charts are thought-provoking. Many of them may raise more questions than they answer. Here are a few that caught my eye.


A simplistic interpretation would claim that banning handguns is futile, and may even have an adverse impact on murder rate. However, this chart does not reveal the direction of causality. Did some countries ban handguns because they are reacting to higher violence? If that is the case, this chart is confirming that the countries with handgun bans are a self-selected group.



The U.S. is an outlier, both in terms of firearm ownership and firearm homicides. This makes the analysis much harder because the U.S. is really in a class of its own. It's not at all clear whether there is a positive correlation in the cluster below, and even if there is, it is dubious whether we can extend a straight line from that cluster up to the U.S. dot.



Fosslien is being cheeky to deny us the identity of the other outlier, the country with few firearms but an even higher death rate from intentional homicide. By the way, these scatter plots are great for showing bivariate distributions.



I'd still prefer a line chart for this type of data but this particular paired bar chart works for me as well. The contents of this chart are a shock to me.



I just don't get this one. Why is there a fan?

A reader likes the four-point perception range chart

Note to readers: Sorry for the infrequent updates. We'll be back on schedule in February with exciting news.


Robert Kosara wrote a rebuttal to my previous post on the chart that shows the human visual and auditory ranges of perception in a box. Here is his full post. The chart under discussion is shown on the right. It appeared at the Abstruse Goose site.

Like me, he has obviously spent time thinking about four points on a chart. I have to say I'm not convinced by his points.


Kosara writes:

The point of this chart is not to communicate a lot of data or to inform, but merely to entertain and perhaps to make people pause and think for a moment.

I buy the first part of the sentence only. For me, the chart is misleading unless, as I said last time, we are told how much important stuff we are missing in the dark regions.

Think of this analogy: for some people, to realize that there are planets, galaxies and universes beyond our own is a wow moment. I need to know more. If you are told that no life exists outside planet earth, that the rest is barren and nothingness, are you still as fascinated?


Kosara likes the log-log axes, saying first:

That provides an interesting comparison, that I don’t think a lot of people have seen before.


The difference in light frequencies contains the sound frequencies many billions of times.


Our perception largely works in a logarithmic way.

My point was that light and sound are measured on completely different scales. To plot them in a bivariate chart would require some kind of standardization. I'd imagine if we can figure out the minimum perceptible difference for each dimension, we'd have made some headway.

That third comment really intrigues me. I have never liked log charts. I always find that audiences can't read them. They have to imagine that each layer is 10 times the size of the one below even though visually they appear exactly the same. In my experience, it leads to underestimating the large values, and massively exaggerating the importance of tiny differences on the small end of the axis. I'd be intrigued to see some scientific studies that show that logarithmic perception is natural.
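To make the reading difficulty concrete, here is a minimal sketch in Python. It assumes a base-10 log axis and uses made-up values; `axis_gap` is a hypothetical helper and has nothing to do with how the Abstruse Goose chart was drawn.

```python
import math

def axis_gap(low, high):
    """Distance between two values on a base-10 log axis, in decades."""
    return math.log10(high) - math.log10(low)

# Made-up values: two adjacent "layers" of a log axis.
for low, high in [(10, 100), (100, 1000)]:
    print(f"{low} to {high}: "
          f"linear gap = {high - low}, "
          f"log-axis gap = {axis_gap(low, high):.1f}")
```

Both intervals take up the same distance on the log axis even though the second one spans ten times as many units, which is exactly the adjustment the reader has to make mentally.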



Ruining the cake with too much icing

Reader Steve S. tried to spoil my new year with this chart he didn't like:


Or maybe he's just chiding me for recommending bumps charts. This example is very confusing, a tangled mess.

But not so fast.

The dataset has two characteristics that don't sit well with bumps charts. One is too many things being ranked (twenty). Two is too much rank swapping that happens over time (14 periods).

The latter challenge can be tamed by aggregating the time dimension. For some reason, the period under examination was the first half year after the debut of these computers. Do we really need to know the weekly statistics?

We can keep all 14 periods. If so, we should be judicious in selecting the colors, the lines and dashed lines, and gridlines, and so on. In particular, look for a story and use foreground/background techniques to highlight the story.

Here's a version that focuses on the brands that moved the greatest number of ranks, either up or down, during this period:


Here's one that tracks how the top five fared over this period of time. It turns out that despite all the noisy movements, not much happened at the top of the rankings:


Not knowing many of these computer brands, I really have no idea why seven colors were used and why different tints of the six colors were chosen. I also don't have a clue why some lines were dashed and others were solid.

Looking closely, I learn that the Sony PC was given a black color because its label does not show up on either side. It was a product that did not rank among the top 20 either at the start or at the end of this time period. This Sony PC should be consigned to the dustbin of history, and yet in the color scheme selected for the original chart, the black solid line is the most visible!


I'd like to see an interactive layer added to this chart that brings out the "information". Two of the tabs can be "top movers" and "top five brands" as discussed above. If you hover over these tabs, the appropriate lines are highlighted.
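The selection logic behind those two tabs could be sketched roughly as follows, in Python. Everything here is an assumption for illustration: the brand names and ranks are invented (six brands over three periods rather than twenty over 14), every brand is assumed to have a rank in every period, and `top_movers`, `top_five` and `line_colors` are hypothetical helpers, not part of any charting library.

```python
# Hypothetical data: rank of each brand in each period (1 = best).
ranks = {
    "Alpha":   [1, 2, 1],
    "Beta":    [2, 1, 3],
    "Gamma":   [3, 4, 6],
    "Delta":   [4, 3, 4],
    "Epsilon": [5, 6, 5],
    "Zeta":    [6, 5, 2],
}

def top_movers(ranks, n=2):
    """Brands with the largest rank change, up or down, from first to last period."""
    change = {brand: series[-1] - series[0] for brand, series in ranks.items()}
    return sorted(change, key=lambda brand: abs(change[brand]), reverse=True)[:n]

def top_five(ranks, k=5):
    """Brands ranked in the top k in the first period, in starting order."""
    starters = [brand for brand, series in ranks.items() if series[0] <= k]
    return sorted(starters, key=lambda brand: ranks[brand][0])

def line_colors(ranks, highlighted, accent="red", muted="lightgray"):
    """Foreground color for the highlighted brands, background gray for the rest."""
    return {brand: (accent if brand in highlighted else muted) for brand in ranks}

print(top_movers(ranks))
print(top_five(ranks))
print(line_colors(ranks, top_movers(ranks)))
```

Each tab would then just pass a different list of highlighted brands to the same drawing routine, so the lines that carry the story sit in the foreground and everything else fades to gray.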