The New York Fed has put up some impressive infographics depicting credit conditions around the country.
(I was going to praise the producers for this effort, a rarity among government sites, but when I went back to the site today, I found that the graphic loads only occasionally, and try as I might, I could not get rid of the big brown patch that obscures the West Coast. Hope you have better luck!)
As with similar maps, the unresolved problem is that the pattern is strongly dominated by the relative density of people around the country. As we click through different maps, they all look similar.
But this map -- as do most infographics -- does well to put structure onto unwieldy, large data sets. It acts as a table of contents to the data set.
Nick Rapp's team at AP produced this great chart on US life expectancy:
There is much to like: very sensible axis labels, putting the line labels into the chart (rather than in a legend box far away), showing only the current data, not every data point, subtle coloring to bring out the decades, etc.
I'm not sure about varying the thickness and tinge of the lines. Is it necessary? It hinders a bit as we try to compare the slopes of the lines. Adding the initial data labels (for 1975) would also help us judge the net change over the period depicted. If the change in life expectancy over time is the main story, consider indexing each line to its 1975 value.
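Indexing to a base year can be sketched in a few lines of Python. This is only an illustration: the function name `index_to_base` and the numbers are made up, and they are not the CDC figures behind the AP chart.

```python
def index_to_base(series, base_year):
    """Rescale a {year: value} series so that the base year equals 100."""
    base = series[base_year]
    return {year: 100.0 * value / base for year, value in series.items()}


# Hypothetical life expectancies in years (illustrative only, not the CDC data).
group_a = {1975: 70.0, 1990: 73.5, 2005: 77.0}
group_b = {1975: 60.0, 1990: 63.0, 2005: 69.0}

indexed_a = index_to_base(group_a, 1975)
indexed_b = index_to_base(group_b, 1975)

# Every indexed line starts at 100, so the lines can be compared by
# relative change even though the raw levels differ.
for year in sorted(indexed_a):
    print(year, round(indexed_a[year], 1), round(indexed_b[year], 1))
```

Once every line starts at 100, the end points show the net change directly, which is the comparison the missing 1975 labels would otherwise have to support.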
Reference: "CDC says life expectancy in the US is up, deaths not", Miami Herald, Aug 19 2009.
PS. This is really an awful title for the article. It reads as if something bad has happened to the death rate while something good has happened to life expectancy, as if there were a paradox. But the article makes clear that the death rate declined, which makes the second half of the title redundant.
The original graph threw us off our sense of scale. It seemed to be saying all these oil companies are roughly the same size but one grew much faster than the others. The red color and the setting off of the data above the title of the chart seemed to announce some important find.
The junkart version on the right reversed everything to our normal sense of scale. It is a version of the bumps chart, one of my favorites.
So we find that Total is the smallest of these oil companies, about half the size of ExxonMobil -- you wouldn't know that from those abysmal bubbles! Adding to the problem is that the growth data is used to sort the companies while the actual production data is hidden in the data labels.
Total is indeed growing faster but BP is not far behind. The fall of ExxonMobil and Royal Dutch Shell is equally intriguing.
As a reader noted, this chart is essentially unreadable. It contains data for the composition of diets in four countries during two time periods.
What might we want to learn from this data?
Are there major differences in diet between countries?
Within each country, are there changes in diet composition over the thirty years?
If there were changes in diet inside a country over time, did those reflect a worldwide trend or a trend specific to that country?
Unfortunately, the use of donut charts, albeit in small multiples, does not help the cause. The added dimension of the size of the pies, used to display the total calories per person per day, serves little purpose. Seriously, who out there is comparing the pie sizes rather than reading off the numbers in the donut holes if she wants to compare total calories?
This data set has much potential, and allows me to show, yet again, why I love "bumps charts".
Here is one take on it. (Note that the closest data I found was for six different countries - China, Egypt, Mexico, South Africa, Philippines, India - and for different periods.)
The set of small multiples recognizes that the comparison between 1970 and 2000 is paramount to the exercise. There is a wealth of trends that can be pulled out of these charts. For example, the Chinese and Egyptians eat far more vegetables than the people of the other countries; in particular, the Chinese increased their consumption of vegetables drastically in those 30 years. (top row, second from left)
Or perhaps, for sugars and sweeteners, consumption has increased everywhere except in South Africa. In addition, the Chinese eat far less sugar than the other peoples. (top row, right)
Egg consumption also shows an interesting pattern. In 1970, the countries had similar levels but by 2000, Mexicans and the Chinese had outpaced the other countries. (bottom row, right)
These charts are very versatile. The example shown above is not yet ready for publication. The designer must now decide what are the key messages, and then can use color judiciously to draw the reader's attention to the relevant parts.
Also, some may not like the default scaling of the vertical axes. That can be easily fixed.
Finally, here is another take which focuses on countries rather than food groups. We note that too many categories of foods make it hard to separate them.
Joran E. pointed to this "icky" chart he found on Clive Crook's blog at the Atlantic.
He ordered a "junkchart treatment", so here it comes.
First we wanted to process the triangles, dots and squares to make sense of this data. We noted that the data came from a single year (2005) so the chart did not trace the development of the education sector over time. But wait, it used a different route to get at the same idea. The author compared different generations within each country to see if more and more citizens took university degrees. So each vertical "arrow" was kind of a historical record of different generations within a country. Under this criterion, Korea and Japan had come a long way while the US and China stagnated.
The chart is quite impossible to read as designed. There is little reason to sort by 25-34-year-old proportion when the message concerns improvement over generations. Besides, what about countries that apparently retrogressed? (like Russia and Germany)
For this data, I returned to my favored bumps chart. Here is version one. There are two ways to read this chart. Reading across countries, we note that most of the European states (blue) had similar profiles showing a roughly constant rate of growth. The Asian duo of Japan and Korea (brown) had the most marked growth. In North America (black), Canada has diverged from the US since the 35-44 generation.
Alternatively, we can focus on the change generation-over-generation. From 55-64 to 45-54, almost all countries in this sample (except Japan) grew at the same rate. Then between 45-54 and 35-44, the two Asian countries clearly set the pace. The generation between 35-44 and 25-34 is most interesting: Korea has not slowed, Japan has slowed a little but still grew as fast as Canada. A trio of European countries (Spain, Ireland, France) outpaced their neighbors.
Below I show version two. This one combines bumps chart with small multiples. North America, Europe and Asia/Australia are now in separate charts. This removes clutter.
This set of charts covered the back page of one of the New York Times' sections this weekend.
Regular readers will share my enthusiasm for the top chart. It makes a clear, cogent case to support the article's thesis concerning the rise of bottled water. Various renditions of this type of chart have appeared here before.
Specifically, the smart use of color to cluster the line objects helps interpret the trends. Blue sets out the two primary interests. (It's a mystery to me why the gray lines were separated into darker and lighter hues.)
The twenty-year horizon used is another nice touch. I'd remove the gridlines although they aren't too distracting here.
Sadly, the second graphic does not meet the high standard of the first. The biggest problem concerns the red rectangle, purportedly showing how much of the bottled water was imported. The choice of differently-sized bottles as objects makes it impossible to gauge what proportion of the total was imported. If the rectangle were placed over 1-litre bottles instead, it would look smaller.
The Harvard Social Science Statistics blog pointed to an NYT article about revenue optimization in the airline industry. Huge props to the Times for explaining the science (and art and politics) of one of the most successful applications of operations research.
In short, valuable business travellers want refundable tickets. Because of this and other reasons, about 10% of booked tickets become no-shows. Airlines recoup the loss by over-booking. Implicitly, they trade off the potential for dissatisfying a few unlucky passengers (who would be bumped from their flights) against the potential for flying with 10% empty seats (in addition to unsold seats). Optimization algorithms (constantly tuned by entry-level staff) try to strike a balance.
Recently, because the average percentage of seats sold has been going up, the room for such manoeuvring has been squeezed, leading to higher bump rates and more travellers being stranded. There is some variation across airlines due to the level of sophistication of their revenue optimization algorithms, corporate strategy, etc.
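The trade-off can be illustrated with a toy model in Python. This is a sketch under strong assumptions, not how any airline's system works: every ticket holder shows up independently with the same probability (0.9, matching the roughly 10% no-show figure above), and the function names and the 100-seat cabin are hypothetical.

```python
from math import comb


def binom_pmf(n, k, p):
    """Probability that exactly k of n independent passengers show up."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def expected_bumps(seats, tickets_sold, show_prob):
    """Expected number of passengers who show up but cannot be seated."""
    return sum(
        (shows - seats) * binom_pmf(tickets_sold, shows, show_prob)
        for shows in range(seats + 1, tickets_sold + 1)
    )


def expected_empty_seats(seats, tickets_sold, show_prob):
    """Expected number of seats that fly empty."""
    return sum(
        (seats - shows) * binom_pmf(tickets_sold, shows, show_prob)
        for shows in range(0, min(seats, tickets_sold) + 1)
    )


# Hypothetical 100-seat flight: selling more tickets trades empty seats for bumps.
for sold in (100, 105, 110, 115):
    print(
        sold,
        round(expected_empty_seats(100, sold, 0.9), 2),
        round(expected_bumps(100, sold, 0.9), 2),
    )
```

In this toy model, selling exactly 100 tickets produces no bumps but leaves about ten seats empty on average, and each extra ticket sold shifts the balance toward bumps. Real no-show behaviour varies by fare class, route and time of day, which is part of what the tuning is for.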
The following charts present data by airline of the bump rates in 2005 and 2006. One would be interested in answering questions such as:
Which airlines have the best (or worst) bump rate?
Are some airlines consistently better (or worse) at controlling the bump rate?
Which airlines have improved (or worsened) from year to year?
Are the differences of practical significance?
The original chart shown on the left does not reveal the answers readily. My favourite bumps chart offers them up clearly (well, except on the question of significance).
The biggest problem, though, is the header: number of passengers per 10,000 bumped. The data plotted appeared to be the reverse: the number of bumps per 10,000 passengers. Otherwise, there would have been more bumped passengers than passengers!
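A quick arithmetic check in Python shows why the header cannot be read literally. The counts and the plotted value of 1.2 below are hypothetical, chosen only to be of a plausible magnitude; they are not taken from the chart.

```python
def bumps_per_10k_passengers(bumped, passengers):
    """Bump rate expressed as bumps per 10,000 passengers."""
    return 10_000 * bumped / passengers


def implied_share_if_header_literal(plotted_value):
    """Fraction of passengers bumped if the plotted value really meant
    'passengers per 10,000 bumped', as the header says."""
    return 10_000 / plotted_value


# Hypothetical airline: 1,200 bumps out of 10 million passengers.
rate = bumps_per_10k_passengers(1_200, 10_000_000)

# Reading that same number under the header's literal wording implies
# thousands of bumped passengers for every passenger, which is impossible.
literal_share = implied_share_if_header_literal(rate)
print(rate, literal_share)
```

Any plotted value below 10,000 gives an implied share above 1 under the literal wording, so the header must have the ratio backwards.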
Many authors have exposed and harangued statistical liars (e.g. "How to Lie with Statistics"). Likewise, I rant here once in a while. However, not every distortion of reality is unwarranted. Sometimes, distorted data actually bring out key insights. I go back to the Bumps chart to illustrate this point.
In a previous post, I remarked that the vertical axis can represent either ranking or boat locations along the river. Reading the chart from left to right as if from start to finish of the race, we suggest the right-side list displays the ending ranks or ending locations of boats.
On second thought, the right-side list cannot give us the ending locations! Physically, the boats would have moved downstream so the entire list needs to be shifted downwards to be precise. But we feel comfortable with the current arrangement: this is a distortion of reality which does not affect our reading ability. Indeed, it enhances our ability to see into the data because now a horizontal line means no change in ranks.
If one is very particular, then one should interpret the right side as next year's starting locations rather than the current year's ending locations. Then all is well.
In many situations, reducing continuous data to ranks introduces significant distortion and is thus not advisable. For the Bumps chart, because the Bumps rules require that all boats start next year the same distance apart, in essence wiping out the year-end separations, the form perfectly fits the function! This distortion removed information not needed to grasp the key point of the chart, so no harm done!
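What the reduction to ranks throws away can be seen in a small Python sketch. The boat names and distances are invented for illustration; `to_ranks` is a hypothetical helper, not part of any Bumps tooling.

```python
def to_ranks(values):
    """Map {name: value} to {name: rank}, with rank 1 for the largest value."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}


# Two hypothetical finishes (distance covered, in metres): very different
# separations between the boats, identical ordering.
close_finish = {"A": 1000.0, "B": 998.0, "C": 997.0}
spread_finish = {"A": 1000.0, "B": 700.0, "C": 300.0}

ranks_close = to_ranks(close_finish)
ranks_spread = to_ranks(spread_finish)
print(ranks_close, ranks_spread)
```

Both finishes collapse to the same ranks, so the gaps are lost. For most data sets that loss is real; for the Bumps it is harmless, because the rules reset the gaps before the next year's races anyway.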
As a side note -- Tim Granger has produced a side-by-side Bumps chart, even more marvellous than the single-period chart. In my junkart version, I removed the horizontal line segments linking one year to the next. These line segments contain no data; besides, based on the discussion above, each right vertical axis should be interpreted as next year's starting locations rather than this year's ending locations, so these line segments are unnecessary.
PS. In case you're wondering, Tim colored some lines red to indicate boats that managed to bump up each of the four days in a specific year. These teams win an award called the "blades". If the purpose of the chart is to identify the rise and fall of boat club dynasties then we would have colored the trajectory of Pembroke (6) and Queens (19), for example.
On the left is my beloved Bumps chart (Cambridge 2005 May Bumps). It has a perfect union of function and form. Here are some salient features:
The horizontal axis records time: the first and second columns of text display the starting and ending orders of the college boats. The zigzagging lines delineate each boat's movement over the four days of the race.
The vertical axis serves dual functions: it both gives the current ranking and maps to the physical location of the boats along the river.
What we care about is the movement of a boat over the four days; what we really care about are boats that have moved a lot, either up or down. The chart manages to highlight precisely what we want to see: the larger the movement, the steeper the line, the more attention it gets from our eyes.
Focusing on #10 and #11: the criss-crossing lines tell a rich story of tit-for-tat over four days, in which the boats exchanged bumps during the first three days, with the Jesus boat leading after day 4.
The story at #1 (Caius) was altogether different: as "Head of the Cam", this strong boat eluded the chasing fleet all four days.
My alma mater (Trinity Hall) started and ended at #3.
A truly spectacular chart can be produced by placing all the historical 4-day charts side-by-side, painting a rich history of the rise and fall of different boat clubs over decades. If anyone has seen such a chart, please send it my way!