Darin M. points us to this speedometer chart, produced by IBM. IBM calls it the "Commuter Pain Index"; I call it a prickly eyebrow eyelashes chart. You be the judge.
The "eyebrows" on this chart are purely ornamental. The only way to read this chart is through the data labels, so it is a great example of a chart that fails the self-sufficiency test.
The simplest fix is to unwrap the arc, turning this into a bar chart. The speedometer is a cute idea but very difficult to pull off here, because the city names are long and vary in length.
Reader Joel D. submitted this chart of the revenues of major airlines around the world, yet another chart that puts bubbles on top of a map.
Cool, but really quite naughty. Take the bubbles for British Airways (12.8) and Air France-KLM (29.7): their sizes are grossly disproportionate. I appreciate the designer's attempt to introduce a geographic element, but the immediate takeaways from the bubbles are misleading.
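To see how far bubbles can stray, here is a quick check on the two numbers quoted above. If bubble area is proportional to the data (the usual convention), the radius should grow only with the square root of the value; if a designer instead makes the radius proportional to the data, the visible area exaggerates the gap. This is a sketch, not a reconstruction of the original chart: we don't know which rule its designer used.

```python
import math

# Revenue figures as quoted in the post for the two airlines.
revenues = {"British Airways": 12.8, "Air France-KLM": 29.7}


def radius_for_area(value: float) -> float:
    """Radius of a circle whose area equals `value` (area proportional to the data)."""
    return math.sqrt(value / math.pi)


value_ratio = revenues["Air France-KLM"] / revenues["British Airways"]

# Area-scaled bubbles: area tracks the data, so the radius ratio is the square root.
radius_ratio = radius_for_area(revenues["Air France-KLM"]) / radius_for_area(
    revenues["British Airways"]
)

# Radius-scaled bubbles: radius tracks the data, so the area ratio is the square.
area_ratio_if_radius_scaled = value_ratio ** 2

print(f"data ratio:                        {value_ratio:.2f}")
print(f"radius ratio, area-scaled bubbles: {radius_ratio:.2f}")
print(f"area ratio, radius-scaled bubbles: {area_ratio_if_radius_scaled:.2f}")
```

With area scaling, the Air France-KLM bubble should cover about 2.3 times the area of the British Airways bubble (about 1.5 times the radius); with radius scaling, readers would see an area more than five times as large.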
Sometimes a bar chart is all life needs.
I think it fails all three facets of the Trifecta Checkup: it does not address a well-defined practical question; the data is not processed properly; and the chart type does not work with this data.
Most airlines are multinational companies that make substantial revenues outside their home countries, so the locations of their registered headquarters are irrelevant. What question is being addressed? It would appear to be: where are the headquarters of the world's largest airlines? I don't think this is an especially engaging question. More interesting, for example, would be the split between domestic and international revenues for different airlines, or how the revenues within each continent are split among airlines.
Besides, aggregate revenue data is not very useful for comparison purposes because it ignores population: a circle in a European country is in reality much "larger" than a circle of the same size in China, because $200 million spread over 2 billion people is very different from $200 million spread over, say, 50 million. The right base for this data is probably something like revenue per passenger or per passenger mile.
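The normalization argument is plain division, sketched below with the post's illustrative figures (not real airline or census data). Population stands in as the base here only because it is the example given; the better bases suggested above, passengers or passenger miles, would need counts the original chart doesn't provide, but they slot into the same function as the divisor.

```python
def per_unit(revenue_usd: float, base: float) -> float:
    """Revenue divided by the base it is spread over (people, passengers, miles)."""
    return revenue_usd / base


# The post's illustrative numbers: the same $200 million over two very different bases.
big_base = per_unit(200_000_000, 2_000_000_000)   # $0.10 per person
small_base = per_unit(200_000_000, 50_000_000)    # $4.00 per person

print(f"${big_base:.2f} vs ${small_base:.2f} per person "
      f"({small_base / big_base:.0f}x apart)")
```

Two equal-sized circles can thus hide a fortyfold difference once the base is taken into account.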
The inclusion of FedEx also needs more careful thought. I'd imagine that all the large airlines of the world also have freight divisions, and if we really want to show both passenger and freight revenues on the same chart (which I don't favor), we should at least break out the freight revenues.
I have been getting a lot of reader suggestions lately. Thanks to all of you! For various reasons, I won't be able to write full posts on some of them, but they are still worth looking at.
Julien D. invites us to look in on a different community, graphic designers (graphics as in the fine arts, not statistical charts). This is a great post about how they would redesign boarding passes, akin to Ed Tufte's effort to redesign bus schedules.
While no statistics are involved in these designs, the virtues of clean, simple, direct communication of data to people still apply. Pretty is also desirable.
I first saw this infographic of U.S. tax brackets on Felix Salmon's Twitter feed (he found it "pretty"). Now it has landed in my inbox via Troy O. My reaction to Felix was "pretty but pointless". Sunset through a smoky lens? Impressionist painting? Scatter plot? Density plot? This one is crying out for your comments.
Julien D., same person as above, also pointed us to an animation, posted on Vimeo, of the rebirth of European air routes after the volcanic ash scare.
For this sort of exercise, my main interest lies in admiring the behind-the-scenes effort to collect, clean and process the data, and to code the graphics and animation. I'm not sure the animation tells us anything we couldn't learn from reading the news.
In response to my call for positive examples, reader Merle H. sent in an example of how good charts can make our lives simpler and easier.
All of us have seen the following presentation of air travel data.
I'm not trying to pick on Travelocity; it's the same format whether you use Expedia or any of the airline sites. For customers deciding which dates to travel in order to minimize their airfare, this format is very cumbersome.
What about this fare chart at FuncTravel.com?
As you mouse along the line chart, the average fare for each day is visible. Clicking on a particular day will fix the departure or return dates.
So much easier, isn't it?
A few caveats, though:
Instead of just providing the historical averages, they should consider including information on variability, such as bars that indicate the middle 50% or 75% of prices. Also, what about a sliding control for customers to decide which period of past history the averages should use? More recent data may be more representative.
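Both suggestions can be sketched with Python's standard `statistics` module. The function below, `fare_summary` (a hypothetical name, not anything FuncTravel or Travelocity exposes), takes fares previously observed for one departure day, keeps only those within a look-back window (standing in for the sliding control), and returns the mean alongside the middle-50% and middle-75% ranges. The fare history is made up.

```python
from datetime import date, timedelta
from statistics import mean, quantiles


def fare_summary(observations, today, window_days):
    """Average fare plus middle-50% and middle-75% ranges over a look-back window.

    `observations` is a list of (date_observed, fare) pairs for one departure day.
    `window_days` plays the role of the suggested sliding control.
    """
    cutoff = today - timedelta(days=window_days)
    fares = [fare for observed, fare in observations if observed >= cutoff]
    if len(fares) < 2:
        raise ValueError("need at least two fares in the window")
    quartiles = quantiles(fares, n=4)  # 3 cut points: 25th, 50th, 75th percentiles
    octiles = quantiles(fares, n=8)    # 7 cut points: 12.5th ... 87.5th percentiles
    return {
        "mean": mean(fares),
        "middle_50": (quartiles[0], quartiles[2]),
        "middle_75": (octiles[0], octiles[6]),
    }


# Hypothetical fare history: one observed fare per day for 120 days.
today = date(2010, 6, 1)
history = [(today - timedelta(days=i), 200 + (i * 37) % 90) for i in range(120)]

for window in (30, 90):
    print(window, fare_summary(history, today, window))
```

A chart built on this would draw the mean as the line and the two ranges as nested bands, and redraw them whenever the window changes.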
This particular feature appeals to the price-sensitive, date-flexible customer segment, but not everyone picks itineraries on those criteria. There is an easy fix: if customers had controls to indicate other preferences (e.g. exclude all British Airways flights, include only evening flights) and the chart updated itself based on those selections, it would become much more flexible and useful to many more customers.
As with many automatically generated charts, the chosen labels on the vertical axis are laughable. That should be relatively easy to fix, you'd think.
A great start. I happen to notice that Travelocity has a beta feature that shows a similar chart. A revolution in how travel sites present data to us is long overdue.
IA said: "The general idea is that the history of subway ridership tells a story about the history of a neighborhood that is much richer than the
Okay, but what about these sparklines would clarify that history? From what I can tell, this is a case of making the chart and then making sense of it.
The chart designer did make a memorable comment in his blog entry: "Hammer in hand, I of course saw this spreadsheet as a bucket of nails." The hammer is a piece of software he created; the nails, the data of trips taken.
Nathan at FlowingData gave a reluctant passing grade to this Wall Street Journal bubbles chart illustrating the recent U.S. bank "stress" test.
One should fight grade inflation with an iron fist. (Hat tip to Dean Malkiel at Princeton.) A simple profile chart would work nicely since the focus is primarily on ranks. The bubbles, as usual, add nothing to the chart, especially since one can create almost any dramatic effect simply by scaling them differently.
Nathan also pointed to the maps of the seven sins, which garnered some national attention. This set of maps is a great illustration of the weakness of maps for studying the spatial distribution of anything that is highly correlated with population. Do cows have envy too? See related discussion at the Gelman blog.
The first hint is its asymmetry: the right tail is longer than the left.
Further, the helpful labelling of the "average" does not coincide with the peak of the curve.
The author of the annotation seemed to understand this, calling the distribution "skewed". A bell curve is not skewed.
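Why the "average" need not sit at the peak: in a distribution with a long right tail, the few large values typically drag the mean past the median, which in turn sits past the mode (the top of the curve). The trip counts below are made up to have that shape; they are not the subway data.

```python
from statistics import mean, median, mode

# Hypothetical trip counts with a long right tail (not the subway data).
trips = ([40] * 3 + [45] * 6 + [50] * 9 + [55] * 7 +
         [60] * 5 + [70] * 3 + [85] * 2 + [100] * 1)

peak = mode(trips)       # the most common value: the top of the curve
middle = median(trips)   # half the riders above, half below
average = mean(trips)    # pulled toward the long right tail


def moment_skewness(xs):
    """Population moment coefficient of skewness; positive means a longer right tail."""
    m = mean(xs)
    n = len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


print(peak, middle, round(average, 1), round(moment_skewness(trips), 2))
```

For this sample the peak is at 50 trips, the median at 52.5 and the mean near 55.7, and the skewness is positive; a symmetric bell curve would put all three at the same spot.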
This is a pity, because the designer might have chosen a different chart type if she hadn't been so enamored of the bell curve.
The data tells us about users of 30-day unlimited passes in the New York City subway system: how many trips do they typically make? The card costs $81 while each trip costs $2, so anyone taking 40 or fewer trips in those 30 days (40 trips cost $80) would have been better off buying individual tickets. The "average" user took 56 trips. The range of trips taken was very wide, perhaps surprisingly so.
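The break-even arithmetic, as a small sketch using only the two prices quoted above (it ignores anything else that might affect the comparison):

```python
PASS_PRICE = 81.00   # 30-day unlimited card, as quoted in the post
FARE = 2.00          # single ride, as quoted in the post


def pay_per_ride_total(trips: int) -> float:
    """What `trips` rides would cost paying the single fare each time."""
    return trips * FARE


def better_choice(trips: int) -> str:
    """Cheaper option for a rider who takes `trips` rides in 30 days."""
    return "unlimited pass" if pay_per_ride_total(trips) > PASS_PRICE else "pay per ride"


breakeven = PASS_PRICE / FARE                              # 40.5 rides
savings_at_56 = pay_per_ride_total(56) - PASS_PRICE        # the "average" user's saving

print(breakeven, better_choice(40), better_choice(41), savings_at_56)
```

Forty rides cost $80, just under the $81 card, so the pass starts paying off at the 41st ride; the "average" user's 56 trips would have cost $112 at $2 each, a saving of $31.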
Several key pieces of information have been left off the chart. What is the total number of riders? Without it, readers have no way to interpret the figure of 15,185. What is the smallest (and largest) number of trips taken by any rider? Visually, the horizontal axis does not appear to start at zero.
It would have been better to show a cumulative distribution with the percentage of riders on the vertical axis. From such a chart, we can read off the median and any other percentile, so it would be much more informative.
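A cumulative chart is just the empirical CDF, and reading off a percentile is its inverse. The sketch below uses made-up trip counts for 15 riders; the percentile rule (smallest observed value with at least that share of riders at or below it, no interpolation) is one common convention among several.

```python
from bisect import bisect_right


def ecdf_at(values, x):
    """Share of riders who took at most x trips (the cumulative curve's height at x)."""
    xs = sorted(values)
    return bisect_right(xs, x) / len(xs)


def percentile(values, pct):
    """Smallest observed value with at least pct% of riders at or below it.

    Inverse of the empirical CDF, no interpolation; pct is an integer 0-100.
    """
    xs = sorted(values)
    k = max(1, -(-pct * len(xs) // 100))  # ceil(pct * n / 100) in integer arithmetic
    return xs[k - 1]


# Hypothetical trip counts for 15 pass holders (not the subway data).
trips = [12, 25, 33, 38, 41, 47, 52, 56, 58, 63, 70, 81, 95, 104, 130]

print(f"median: {percentile(trips, 50)} trips")
print(f"90th percentile: {percentile(trips, 90)} trips")
print(f"share at or below 40 trips: {ecdf_at(trips, 40):.0%}")
```

The last line answers the break-even question directly: on a cumulative chart, the share of riders who would have been better off paying per ride is simply the curve's height at 40 trips.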
As it stands, I very much like the annotations at the 56-trip and 100-trip points: they are great aids in deciphering the chart. It would be great to mark the 40-trip point too.
For the more technically inclined: the graph also raises the question of whether it is an actual or a modelled curve. It looks too smooth to be actual data. If it is a model, then it is definitely not a normal distribution. What could it be? A spline?
As this report from the Department of Transportation makes clear, congestion on our roadways causes travellers to add "buffer time" to their planned journeys. For instance, to guarantee on-time arrival, one may have to allocate 32 minutes for a trip that would take 20 minutes in uncongested traffic. The extra 12 minutes are either spent sitting on the road or wasted by arriving too early.
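The buffer-time arithmetic, sketched with the example above. Expressing the buffer as a share of the uncongested time makes trips of different lengths comparable. The `guaranteed_allowance` helper is a hypothetical illustration that takes "guarantee" literally as the worst trip in a sample; the report may define its measures differently, and the week of commute times is made up.

```python
def buffer_minutes(planned: float, free_flow: float) -> float:
    """Extra time allowed beyond the uncongested travel time."""
    return planned - free_flow


def buffer_share(planned: float, free_flow: float) -> float:
    """Buffer time expressed as a fraction of the uncongested travel time."""
    return buffer_minutes(planned, free_flow) / free_flow


def guaranteed_allowance(observed_minutes) -> float:
    """Allowance that covers every observed trip: the worst case in the sample."""
    return max(observed_minutes)


# The post's example: 32 minutes planned for a 20-minute uncongested trip.
print(buffer_minutes(32, 20), f"{buffer_share(32, 20):.0%}")

# Hypothetical week of commute times for the same trip.
week = [20, 23, 21, 32, 25, 22, 27]
allowance = guaranteed_allowance(week)
print(allowance, buffer_minutes(allowance, 20))
```

Here the 12-minute buffer is 60% of the uncongested trip: one bad day in the week sets the allowance for all the others.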
Buffer time applies to graphs too. Some graphs require readers to spend time fishing out the information, and the chart used to illustrate travel time belongs to this category. The clock analogy fails; in fact, it confuses matters, as the hour hand just sits there serving no purpose. The buffer time between staring and comprehending is too long!
Only four numbers underlie this chart: uncongested travel time and the buffer time needed to guarantee on-time arrival, for 1982 and 2001. The following version gets to the point without fuss. It shows that travel time increased significantly even in uncongested traffic; worse, the buffer time multiplied.
Reducing buffer time is always good, but some buffer time may be inevitable. In the traffic analogy, eliminating all buffer time would mean lots of unused capacity. In the context of graphs, more complicated charts require more time; the key is whether the reader is rewarded for the time spent figuring out the chart.
Source: "Traffic Congestion and Reliability", Department of Transportation.
However, it is very difficult to find a good way to show the information. In fact, the data contained very little of that. Curiously, the ratings are very dispersed, so that each line is graded high on some categories and low on others. Here's one view of it:
I have grouped the subway lines together (A/C/E, 4/5/6, etc.). The metrics are plotted left to right in the same order as in the original. Is it all noise and no signal?
(I just realized the vertical axis is reversed: best ratings are at the bottom, worst ratings at the top. Doesn't matter anyway since I can't see any patterns.)
I've been reading my friend's anti-smoking tome, and traced this "infographic" back to its source (World Health Organization).
I was very intrigued by the "lines of death", which seemed to make the point that the risk of death had a spatial correlation: specifically, that the death risk for male smokers was higher in the northern hemisphere (above the line), primarily developed countries, than in the southern hemisphere, mostly developing nations.
I found that somewhat counter-intuitive, but in a fascinating book like this, which brings together scientific, psychological and societal commentary, I was expecting to learn new things.
Looking at the legend, the red areas were regions in which deaths from tobacco use accounted for over 25% of "total deaths among men and women over 35". This explained some of it: perhaps there were more reasons to die (warfare, other diseases, mine accidents, etc.) in developing nations than in developed nations, or perhaps they had larger populations (so more deaths even at lower rates).
However, the description of the "lines of death" raised my eyebrows. It is now claimed that more than 25% of middle-aged people (35-69 years old) die from tobacco use in the red regions.
Did they mean that 25% of middle-aged people who die, die from smoking? Or that 25% of all middle-aged people die from smoking? A gigantic difference!
Percentages are very tricky things to use. Every time I see a percentage, the first thing I ask is: what is the base population? Here, the base appears to have gotten lost in translation.
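To see how much the base matters, take one set of made-up counts for a region's 35-69 age group (these are not WHO figures). The numerator, tobacco deaths, is the same in both readings; only the denominator changes.

```python
# Hypothetical counts for one region's 35-69 age group over some period.
population = 1_000_000
all_deaths = 40_000
tobacco_deaths = 10_000

share_of_deaths = tobacco_deaths / all_deaths        # base: people in the group who died
share_of_population = tobacco_deaths / population    # base: everyone in the age group

print(f"{share_of_deaths:.0%} of deaths, {share_of_population:.0%} of the population")

# Tobacco deaths needed for the population-based reading to also reach 25%.
needed = 0.25 * population
print(f"{needed:,.0f} tobacco deaths, {needed / tobacco_deaths:.0f}x as many")
```

The same 10,000 deaths are "25%" on one base and "1%" on the other; making the population-based claim true would take 250,000 tobacco deaths, 25 times as many.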
This set of maps also shows the peril of focusing too much on entertainment value, and losing the plot.
For those concerned about the effect of smoking on our society and our children, I highly recommend Dr. Rabinoff's very readable new book, "Ending the Tobacco Holocaust". It contains lots of interesting tidbits and brings together every cogent argument that exists, including the common ones you've heard and others you haven't.
One of my scientific heroes and seminal teachers is Professor Frank Kelly at Cambridge. What a pleasant surprise to see his involvement in a data visualization project. To cite his wise words:
"The travel-time maps are more than just pretty to look at; they also demonstrate an innovative way to use and present existing data. We are entering a world where we have access to vast quantities of data, and ways of turning that data into information, often involving clever ideas about visualisation, are becoming more and more important in science, government and our daily lives."
The little black dot near the center of the map marks the Mathematics building at Cambridge. The contours (only vaguely visible at our scale) mark 10-minute intervals of travel time by public transportation from the black dot. Each colored dot on the map shows the time at which a traveller must leave in order to reach the Mathematics building by 9 am, taking into account traffic conditions, time of day, and travel decisions. The aim of such maps is to help public-transit commuters plan their travel.
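The last step of such a map, turning a travel time into a departure time and a contour band, is simple clock arithmetic. The real maps are built from timetables, where travel time depends on when you leave; this sketch assumes a single fixed travel time per starting point, and the starting points, their times and the calendar date are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Tuple

# The post's target: arrive at the Mathematics building by 9 am (date is arbitrary).
ARRIVE_BY = datetime(2010, 6, 1, 9, 0)
CONTOUR_WIDTH = 10  # minutes between contour lines, as described in the post


def latest_departure(travel_minutes: int, arrive_by: datetime = ARRIVE_BY) -> datetime:
    """Latest time a traveller can leave and still arrive by `arrive_by`."""
    return arrive_by - timedelta(minutes=travel_minutes)


def contour_band(travel_minutes: int, width: int = CONTOUR_WIDTH) -> Tuple[int, int]:
    """The (lower, upper) band of minutes that a travel time falls into."""
    lower = (travel_minutes // width) * width
    return lower, lower + width


# Hypothetical travel times (minutes) from three starting points.
for place, minutes in {"A": 8, "B": 37, "C": 62}.items():
    leave = latest_departure(minutes).strftime("%H:%M")
    print(place, leave, contour_band(minutes))
```

A start 37 minutes away must leave by 08:23 and falls between the 30- and 40-minute contours.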
Professor Kelly has a very nice write-up on the intricacies of generating the data for such a map, which involves techniques such as sampling, smoothing and extrapolation. It is rare that we get such insight into the chart-making process. He also hosts a larger version of the travel-time map.