The New York Fed has put up some impressive infographics, depicting the credit conditions around the country.
(I was going to praise the producers for this effort - a rarity among government sites - but when I went back to the site today, I found that the graphic loads only occasionally, and try as I might, I could not get rid of the big brown patch that obscures the West Coast. Hope you have better luck!)
As with similar maps, the unresolved problem is that the pattern is strongly dominated by the relative density of people around the country. As we click through different maps, they all look similar.
But this map -- as do most infographics -- does well to put structure onto unwieldy, large data sets. It acts as a table of contents to the data set.
Nick Rapp's team at AP produced this great chart on US life expectancy:
Very sensible axis labels, putting the line labels into the chart (rather than in a legend box far away), showing only the current data, not every data point, subtle coloring to bring out the decades, etc.
I'm not sure about varying the thickness and tinge of the lines. Is it necessary? It hinders a bit as we try to compare the slopes of the lines. Adding the initial data labels (for 1975) would also help us judge the net change over the period depicted. If the change in life expectancy over time is the main story, consider indexing each line to its own 1975 value.
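To make the indexing suggestion concrete, here is a minimal sketch. The figures are made up for illustration (they are not the AP or CDC numbers), and `index_to_base` is a hypothetical helper name. Each series is rescaled so that its 1975 value becomes 100, which puts the net change of every line on a common footing.

```python
def index_to_base(series, base_year=1975):
    """Rescale a {year: value} series so that the base year equals 100."""
    base = series[base_year]
    return {year: 100.0 * value / base for year, value in series.items()}


# Hypothetical life-expectancy figures, invented purely for illustration.
group_a = {1975: 70.0, 1990: 73.5, 2005: 77.0}
group_b = {1975: 60.0, 1990: 64.5, 2005: 69.0}

for name, series in [("group_a", group_a), ("group_b", group_b)]:
    indexed = index_to_base(series)
    print(name, {year: round(value, 1) for year, value in indexed.items()})
```

In this invented example, group_b starts lower but gains proportionally more, something that is hard to read off the raw lines and obvious once both start at 100.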
Reference: "CDC says life expectancy in the US is up, deaths not", Miami Herald, Aug 19 2009.
PS. This is really an awful title for the article. It reads as if something bad has happened to the death rate while something good happened to life expectancy, as if there were a paradox. But the article makes clear that the death rate declined, which makes the point redundant.
When it comes to space or time in graphics, old habits die hard. When we have spatial data, the default is to put it on a map. When we have a time series, the default is to plot time along the horizontal axis. Sometimes, these defaults work; other times, breaking up the map or straight-time-line works better.
Thanks to a reader, I noticed that Google put up a "Flu Trends" website to help us track the flu season. They use two main charts to plot the data, as shown below.
On the right side is the time series, showing the severity of flu cases from month to month. There are many great things about this chart and one serious flaw. I love the fact that they did not plot time on the horizontal axis; they realize the seasonality and they create overlapping lines. They make good use of foreground and background; it's easy for us to compare year to year differences.
The serious flaw: no vertical scale. This was a problem with Google Trends from day one (see my post here). They still haven't fixed it. Because of this, we don't know if the peak shown was 5 cases or 5000 cases. For Google keyword searches, one can excuse them for trying to protect commercial secrets, but I would imagine that this public health data is, well, public. Since the apparent purpose of this chart is to allow citizens to declare a flu epidemic (say, when they see the current trend depart from the historical norm), not having the scale is a huge problem.
I also disagree with shifting the months around for the Northern Hemisphere so that the peaks of the graphs are aligned towards the middle. It is better for the peaks to appear on the left and let the order of the months conform to our expectation. (The "peak" would be split on the sides and the chart would look like a valley, which presumably is why they did it this way.)
The charts on the left side plot the spatial data, not surprisingly on maps. Sadly, the standard exhibited on the time-series charts is nowhere to be found on these maps.
A particular genre of graphics is designed to induce awe: certain bits are allowed to stick out like a sore thumb. Via reader Andre L., and an archive of US Army medical photos and illustrations:
This is a small multiples graph designed to display the somewhat seasonal pattern of deaths due to influenza over years. Basically, we see a U shape in almost every year; however, the height and the timing of the peak show quite a lot of variation. Further, some years exhibit more of an L-shape than a U-shape.
But the attention grabber here is the massive peak that occurred between 1918 and 1919. It was unusual in many ways: it was the second big peak during 1918, it occurred late in the year, and it elided with the next year's peak. The designer allowed these two components to bleed into the other charts.
From the perspective of scale, readability, cleanliness, this bit sticks out like a sore thumb! But one has to say it is effective.
A log scale is often used to deal with data containing such outliers. But while this makes neater charts, the impact of the orders-of-magnitude difference is lost on the reader, except in her imagination.
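A rough sketch of that trade-off, using invented counts rather than the Army data: `axis_positions` is a hypothetical helper that places each value between the bottom (0) and top (1) of an axis, on either a linear or a log scale.

```python
import math


def axis_positions(values, log=False):
    """Place each value between 0 (axis minimum) and 1 (axis maximum)."""
    scaled = [math.log10(v) for v in values] if log else list(values)
    low, high = min(scaled), max(scaled)
    return [(s - low) / (high - low) for s in scaled]


# Hypothetical death counts: three ordinary periods and one epidemic peak.
counts = [100, 200, 400, 40000]

linear = axis_positions(counts)
logged = axis_positions(counts, log=True)

for count, lin, lg in zip(counts, linear, logged):
    print(f"{count:>6}  linear={lin:.3f}  log={lg:.3f}")
```

On the linear axis the ordinary values are squashed into the bottom one percent of the chart. On the log axis they spread out nicely, but the peak now sits only about four times as far up the axis as the 400 mark, even though it is a hundred times larger.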
Seth followed up his post about graphics with a specific post about pie charts versus bar charts. He prefers pie charts. We happen to agree with his unhappiness with grouped bar charts. Unfortunately, he compared a univariate pie chart (depicting point-in-time data) with a multivariate bar chart (illustrating time-series data).
Here we present a different example, derived from a NYT article on diabetes in America. The original chart is a series of pie charts, one for each age group, and one for the aggregate data.
The junkart version uses a bar chart. Readers can get a more precise comparison of the prevalence rates across age groups because it is easier to judge lengths than areas. This has been scientifically proven by the likes of Cleveland.
Dirty trick, you might say, because the original chart actually prints the data in each pie.
So now there is no mistaking the data. This raises a philosophical question: why bother graphing the data if the reader needs to read the data in order to understand the chart? We call this the self-sufficiency test. The graphical elements of a pie chart can't stand on their own.
Christopher P submitted this chart, which is great for our light entertainment series. Apparently it came from the Netherlands and showed how privileged their citizens are compared to the rest of the world. It would appear that they need to reverse the color scheme (and font size?) to highlight the privileged. Comments welcome.
At first, this looks like a decent chart despite the donut construct, which I cannot stand (but the Economist loves).
The accompanying text proclaimed: "Rock stars are famous for excess, and some pay the price". The rest of the paragraph points out drug- and alcohol-related deaths, plus deaths due to "unhealthy lifestyles", which apparently include cancer and cardiovascular disease.
There is a gaping hole between what's on the chart and what's in the text. They just talk past each other.
The chart invites us to compare the European experience to the American experience. Each donut presents the proportion of total deaths by causes of death. The top donut presents American rock-star deaths, the bottom European ones. But this comparison has zilch to do with the key point, which is how rock stars are different from the rest of us. The chart tells us nothing about the rest of us. The 20% death by cancer would be entirely unremarkable if 20% of non-rock-star deaths also were attributed to cancer!
We must also bear in mind that the base population consists of rock stars who died young. This is a very specific demographic segment, and so the only valid point of reference is people who died young. If we think along those lines, then among unmusical people, if they died young, what might have been the causes of death? Drugs? Alcohol? Accidents? Suicide? You bet. I am not sure who is the authoritative source of such data but the CDC reported that among Americans aged 15-34 who died, the leading causes were "unintentional injury", suicides, homicides, cancer and heart disease. Not much different from the above list...
The deaths depicted in the two donuts totaled fewer than 100, and yet percentages are given to one decimal place. This creates a false sense of precision not justified by the sample size.
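A quick sketch of why the decimal place is unearned. The group sizes below are assumptions for illustration (the article only tells us the total is under 100), and `percentage_step` is a hypothetical helper: it returns how many percentage points a single death moves a category's share.

```python
def percentage_step(n_deaths):
    """Percentage points by which one death changes a category's share."""
    return 100.0 / n_deaths


# Assumed donut sizes, for illustration only.
for n in (40, 50, 100):
    print(f"{n} deaths: one death = {percentage_step(n):.1f} percentage points")
```

Even at 100 deaths per donut, the shares can only move in whole-point steps, so the digit after the decimal point carries no information.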
The deaths occurred over about 50 years. It is very likely that the causes of premature death have shifted during this time span, making an aggregate analysis questionable.
Charting is much more than just aesthetics. Some basic statistical common sense goes a long way. This was observed long ago by Huff.
David Leonhardt wrote in the NYT of a shocking incident of statistical abuse committed by Lou Dobbs and the CNN crew.
On several recent occasions, while commenting on the red-hot immigration issue, Lou and company remarked that "there had been 7,000 cases of leprosy in this country over the previous three years, far more than in the past". (Leprosy is a flesh-eating disease prevalent among immigrants, particularly of Asian or Latin American origin.)
When asked about fact-checking, Lou reportedly said: "If we reported it, it's a fact." A quick visit to the government's leprosy program website immediately reveals the time-series chart, shown on the left. With annual rates at about 150 in the last 5 years or so, one is hard pressed to find the 7,000 alleged cases!
Furthermore, because this chart offers no point of comparison, we fail to see that 150 cases a year out of a population of 300 million represent a minuscule risk.
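The arithmetic is worth spelling out. This sketch uses only the round numbers quoted above (about 150 cases a year, a population of 300 million, and the claimed 7,000 cases over three years); the variable names are mine.

```python
annual_cases = 150            # approximate annual count read off the chart
population = 300_000_000      # round US population figure used above
claimed_cases_3y = 7_000      # the figure quoted on air, for three years

# Annual risk expressed as cases per million residents.
cases_per_million = annual_cases * 1_000_000 / population

# What three years at the charted rate would add up to.
charted_cases_3y = annual_cases * 3

# How far the on-air claim overshoots the charted data.
overstatement = claimed_cases_3y / charted_cases_3y

print(f"{cases_per_million} cases per million people per year")
print(f"{charted_cases_3y} cases over three years at the charted rate")
print(f"claim is roughly {overstatement:.0f} times the charted figure")
```

That is half a case per million people per year, and the claimed total is more than fifteen times what three years at the charted rate would produce.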
A slight downward trend is evident in the last 20 years or so; this record is even more impressive when we realize the population grew during this period. These points can be made clearer in multivariate plots.
This graphic appeared on the front page of the British paper, the Independent. I find it to be effective, although definitely not efficient a la Tufte: the data-to-ink ratio is abysmal. Two data points on the entire page, with both data labels drawn in extra large font!
It can be improved if the 24 guys are given a different color so we can see the amount of improvement between 1971 and "NOW".
Some may complain that the use of percentages obscured population growth during this period. Perhaps there should be fewer men on the left than on the right. Unfortunately, that would in turn obscure the comparison of percentages.
A bit of research into the data (at Cancer Research UK) reveals that the average survival rate hides a very wide range of rates (by type of cancer, by gender, by gender and type, etc.). One might argue that the average is quite meaningless for most users.
An alternative construct is a time series chart showing the increase in survival rate over time. It would plot more data and depict a trend (or lack thereof). I'd have to agree with the editor that such a chart would look unattractive on the newsstand.
I've been reading my friend's anti-smoking tome, and traced this "infographic" back to its source (World Health Organization).
I was very intrigued by the "lines of death" which seemed to make the point that the risk of death had a spatial correlation: specifically, that the death risk for male smokers was higher in the northern hemisphere (above the line), primarily developed countries, as compared to the southern hemisphere, mostly developing nations.
I find that somewhat counter-intuitive, but in a fascinating book like this, one that brings together scientific, psychological and societal commentary, I was expecting to learn new things.
Looking at the legend, the red areas were regions in which deaths from tobacco use accounted for over 25% of "total deaths among men and women over 35". This explained some, as perhaps there were more reasons to die (warfare, other diseases, mine accidents, etc.) in developing nations than in developed nations, or that they had larger populations (so more deaths even at lower rates).
However, the description of the "lines of death" raised my eyebrows. It is now claimed that more than 25% of middle-aged people (35-69 years old) die from tobacco use in the red regions.
Did they mean 25% of the dead middle-aged people die from smoking? Or 25% of all middle-aged folks die from smoking? A gigantic difference!
Percentages are very tricky things to use. Every time I see a percentage, the first thing I ask is what is the base population. Here, the baseline appeared to have gotten lost in translation.
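To see how gigantic the difference is, here is a small sketch with invented numbers (they are not WHO figures). The same count of tobacco deaths is divided by the two candidate base populations; `share` is a hypothetical helper.

```python
def share(part, base):
    """Express part as a percentage of base."""
    return 100.0 * part / base


# Hypothetical region, numbers invented for illustration only.
middle_aged_people = 1000   # everyone aged 35-69
deaths = 100                # of whom this many died in the period
tobacco_deaths = 25         # deaths attributed to tobacco use

print(f"{share(tobacco_deaths, deaths)}% of middle-aged deaths")
print(f"{share(tobacco_deaths, middle_aged_people)}% of all middle-aged people")
```

Under the first reading the region earns its red shading; under the second, the same 25 deaths amount to 2.5%, a tenth of the threshold. The number on the map means nothing until we know which denominator was used.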
This set of maps also shows the peril of focusing too much on entertainment value, and losing the plot.
For those concerned about the effect of smoking on our society and our children, I highly recommend Dr. Rabinoff's highly readable new book, "Ending the tobacco holocaust". It contains lots of interesting tidbits and really brings together every cogent argument that exists, including the common ones you've heard and others you haven't.