Frequent contributor Bernard L. pointed me to this National Geographic "infographic". This surely belongs to the Art section of the infographics gallery, which I discussed in the "Whither Infographics" post. This fact is acknowledged by the editors who labeled this "Art: Fish Pharm".
It's a very pretty picture. And I'm cool to turn a blind eye to:
the uneven sizes of the pills
the dislocated, non-contiguous areas (diphenhydramine)
the dual-colored area (green-yellow), especially as the same green represented a different pill
the water bubbles treated as part of the fish
but I'm still debating:
Is it an artistic license taken too far to imply that pharma chemicals have completely stuffed the fish (so much as to also infect the exhaled bubbles) when the text actually said the fish contained "traces of pharmaceuticals and toiletries"?
The footnote apologizes for the percentages not adding up to 100 percent, but 100 percent of what?
And by the way, this is the first time I have seen the word "pharmaceutical" used as a noun to represent medicines manufactured by pharmaceutical companies. As a noun, I understand "pharmaceutical" to mean a company that designs and makes medicines.
The gulf between infographics and statistical graphics, that is.
Stan at Mashable praised "5 Amazing Infographics for the Health Conscious". They belong to the class of "pretty things" that are touted all over the Web but from a statistical graphics perspective, they are dull.
Reader Mike L. poked me about the snake oil chart (right) while I was writing up this post. The snake oil chart is by David McCandless whose Twitter chart I liked quite a bit.
This one, not very much.
If the location and cluster membership of the substances depicted had some meaning, I might even feel ok about the effervescence. But I don't think they do.
I continue to love his pithy text labels though; the "worth it line", truly.
The data (if verified) is pretty useful though since there are so many health supplements out there, and as a consumer, it's impossible to know which ones are sham. (Ben Goldacre's site may help.)
Now, let's run through the low lights of the rest:
I'm still trying to figure out what plus-minus means in the Dirty Water graphic.
The fact that the four buildings are not considered one complete unit also trips me up. The Truckee Meadows is depicted as 7 buildings, not divisible by 4. In addition, if 2 short buildings + 1 tall + 1 medium = 200,000 people, how many people live in 2 tall + 1 medium + 4 short buildings?
The obesity charts are pinatas.
The cost of health care chart is boring, just a prettied up data table. Why are life expectancy statistics expressed in 2 decimal places, and not in years and months?
Why 78.11 years and not 78 years (or 78 years, 1 month)?
The scatter chart relating survival rates of people with various ailments and the survival rates of viruses/bacteria left outside our bodies is alright but do we care about this correlation?
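On the life expectancy point above, converting a decimal figure into years and months is a one-liner. Here is a minimal sketch in Python; the function name is my own, and rounding to the nearest whole month is an assumption about how one would want to present it.

```python
def years_and_months(decimal_years: float) -> tuple[int, int]:
    """Split a decimal number of years into whole years and rounded months."""
    # Work in months so that rounding up to 12 months rolls over into a year.
    total_months = round(decimal_years * 12)
    return divmod(total_months, 12)


years, months = years_and_months(78.11)
print(f"{years} years, {months} month(s)")
```

The 0.11 of a year is about 1.3 months, which is why "78 years, 1 month" is the natural reading.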
I hate to be so negative but I can't believe these are examples of good infographics.
My appeal for readers to send in positive examples still stands!
Stef, who had a hand in the inkblot charts that many loved, sent in the following chart, with the note that he hasn't seen these line/area charts before.
This chart is interesting indeed. The objective of the chart is to compare the state of drinking water in different regions of the developing world. It tries to emphasize the amount of improvement attained between 1990 and 2006.
I can't quite figure out how the regions are ordered. It's not by any of the proportions depicted in the chart, nor by position on the map.
Next, with the areas catching much attention, I wanted to figure out what the areas mean.
To help in this exercise, I computed the key piece of information, i.e. the increase or decrease in proportion of each water source, and placed it in each piece of area, as shown below.
Based on this evidence, one has to conclude that the area has nothing to do with the change in proportions over time. The brown areas (unimproved sources) are negative changes while the blue and light blue areas are positive changes. Negative area is not a visually depictable concept, unfortunately.
Also, note the dark blue areas of Latin America vs Western Asia. The Western Asia one is a bit larger than the Latin America one but the change in proportions is exactly the reverse, 7% against 23%.
Is this a new type of chart? It took me a few days to figure it out.
How is the following chart related to the above chart?
The original chart is a cousin misfit of the above chart, as we can see below.
The key piece of data is embedded in the slopes of the connecting lines, and this cousin of the column chart with connecting lines draws our attention away from those lines and to the areas. The colored areas are in no way proportional to the slopes of the connecting lines and so the information has been distorted.
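To see why area and slope can disagree, here is a rough sketch in Python. It assumes each colored band is a trapezoid stretched between the 1990 and 2006 values, which is my reading of the chart and not something the designer stated; the two regions and their numbers are hypothetical, chosen only to echo the 7 versus 23 point contrast.

```python
def band_area(start, end, width=1.0):
    """Area of a trapezoid whose parallel sides are the start and end values."""
    return width * (start + end) / 2


def band_slope(start, end, width=1.0):
    """Change per unit of horizontal distance, i.e. the slope of the connecting line."""
    return (end - start) / width


# Hypothetical (1990, 2006) percentages, not taken from the chart.
regions = {"Region A": (80, 87), "Region B": (50, 73)}

for name, (start, end) in regions.items():
    print(name, "area:", band_area(start, end), "change:", band_slope(start, end))
```

Under this assumption the area tracks the average of the two values while the slope tracks their difference, so the region with the smaller improvement ends up with the larger band.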
Nice-looking chart but needs rethinking.
PS. Some commentators seem to think that I suggested that the paired column charts would be a better alternative than the original. No -- I am using charts to analyze charts. An improved chart would be like the following, in which the areas are de-emphasized in favor of the lines. (Please imagine the vertical axis.)
I hinted at it in the last post, and some readers also made similar suggestions. What happens if we plot the U.S. life expectancy data in relative terms (indexed) rather than in absolute terms?
The result is highly revealing, and that is why we should always look at the data many ways. While in the original chart, the differences in the race/gender segments were essentially obscured by the overall slowly-growing trend, in our new chart, we took out the trend, isolating the growth rates.
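For readers who want to try this at home, indexing just means rebasing every series to its own starting value. A minimal sketch in Python follows; the group names and figures are made up for illustration and are not the CDC data, and choosing the first year as the base is an assumption.

```python
def index_series(values, base=100.0):
    """Rescale a series so that its first observation equals `base`."""
    first = values[0]
    return [base * v / first for v in values]


# Hypothetical life expectancies (years) at three points in time.
life_expectancy = {
    "group_x": [60.0, 63.0, 66.0],
    "group_y": [75.0, 76.5, 78.0],
}

for group, series in life_expectancy.items():
    print(group, index_series(series))
```

In absolute terms the second group sits far above the first throughout; once indexed, the first group's faster relative growth becomes the main feature.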
The reconstructed chart showed that:
Between 1970 and roughly 1990, blacks of both genders gained in life expectancy at a rate higher than the national average, while white females lagged behind white males
However, almost all of the gain by blacks was attained between 1970 and 1984, and in the 10-15 years following, this excess gain was wiped out so that by 1992 or so, the black male, black female and white male lines again converged.
Starting in 1995, black males again achieved significant improvement in life expectancy. This time, black females did not follow their male counterparts. Meanwhile, white females continue to lag behind.
Not being a health care specialist, I can't say what happened to the cohorts of the 1970s, the 1980s and 1995. One thing is for sure: these insights are hard to glean from the original.
Reference: "CDC says life expectancy in the US is up, deaths not", Miami Herald, Aug 19 2009. CDC Life expectancy data.
Here is some interesting reading from other places:
Tag clouds have caught on since we approved them a while ago. One interesting use was at the Life Vicarious blog. They use it to compare the inclinations of three New York-based restaurant reviewers. What they should have done is to remove irrelevant words like "one", "also", "many", "make"/"made", etc. In statistics, this is called removing "noise" which helps bring out the "signal".
Andrew Gelman discussed the NYT article that reported the finding of unexpected male bias in the children of Asian American families. He can be counted on to make useful comments on any accompanying graphics. He rightly pointed out that this is one example of not starting at zero: the relevant baseline is 100 since the metric is essentially the over-age of males relative to females. I also agree that a line chart with a longer time series plotting percentages rather than over-age would work better.
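On the tag cloud item above, the noise removal I have in mind is simple to do before counting words. Here is a minimal sketch in Python; the stop-word list is a short hand-picked set (an assumption, a real one would be much longer) and the review text is invented for illustration.

```python
import re
from collections import Counter

# Hand-picked, deliberately short stop-word list; extend as needed.
STOP_WORDS = {"one", "also", "many", "make", "made", "the", "a", "and", "was", "is"}


def tag_counts(text, stop_words=STOP_WORDS):
    """Count words in `text`, ignoring case and skipping stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)


# Invented sentence standing in for a restaurant review.
review = "One also made many dishes; the pasta was rich and the pasta sauce was bright."
print(tag_counts(review).most_common(3))
```

The counts that survive are the ones that would be sized in the cloud, so the filler words no longer crowd out the words that distinguish one reviewer from another.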
The first thought that came to mind after browsing through all the charts was: what a great job they have done to generate interest in food data, which has no right to be entertaining. Specifically, this is a list of things I appreciated:
An obvious effort was undertaken to extract the most thought-provoking data out of a massive amount of statistics collected by various international agencies. There wasn't any chart that was overstuffed, which is a common problem.
It would be somewhat inappropriate to use our standard tools to critique these charts. Clearly, the purpose of the designer was to draw readers into statistics that they might otherwise not care for. Moreover, the Wired culture has long traded off efficiency for aesthetics, and this showed in a graph such as this, which is basically a line chart with two lines, and a lot of mysterious meaningless ornaments:
A nice use of a dual-line chart, though. It works because both data series share the same scale and only one vertical axis is necessary, which is very subtly annotated here.
The maintenance of the same motifs across several charts is well done. (See the pages on corn, beef, catfish)
It would be nice if Wired would be brave enough to adopt the self-sufficiency principle, i.e. graphs should not contain a copy of the entire data set being depicted. Otherwise, a data table would suffice. The graphical construct should be self-sufficient. This rule is not often followed because of "loss aversion"; there is the fear that a graph without all the data is like an orphan separated from the parents. Since, as I noted, these graphs are mostly made for awe, there is really no need to print all the underlying data. For instance, these "column"-type charts can stand on their own without the data (adding a scale would help).
Not sure if sorting the categories alphabetically in the column chart is preferred to sorting by size of the category. The side effect of sorting alphabetically is that it spreads out the long and the short chunks, which simplifies labelling and thus reading.
Not a fan of area charts (see below). Although it is labelled properly, it is easy at first glance to focus on the orange line rather than the orange area. That would be a grave mistake. The orange line actually plots the total of the two types of fish rearing, not the aquaculture component. The chart is somewhat misleading because it is difficult to assess the growth rate of aquaculture. Much better to plot the size of both markets as two lines (either indexed or not).
Reference: "The Future of Food", Wired, Oct 20 2008.
As a reader noted, this chart is essentially unreadable. It contains data for the composition of diets in four countries during two time periods.
What might we want to learn from this data?
Are there major differences in diet between countries?
Within each country, are there changes in diet composition over the thirty years?
If there were changes in diet inside a country over time, did those reflect a worldwide trend or a trend specific to that country?
Unfortunately, the use of donut charts, albeit in small multiples, does not help the cause. The added dimension of the size of the pies, used to display the total calories per person per day, serves little purpose. Seriously, who out there is comparing the pie sizes rather than reading off the numbers in the donut holes if she wants to compare total calories?
This data set has much potential, and allows me to show, yet again, why I love "bumps charts".
Here is one take on it. (Note that the closest data I found was for six different countries - China, Egypt, Mexico, South Africa, Philippines, India - and for different periods.)
The set of small multiples recognizes that the comparison between 1970 and 2000 is paramount to the exercise. There is a wealth of trends that can be pulled out of these charts. For example, the Chinese and Egyptians eat many more vegetables than the people of the other countries; in particular, the Chinese increased their consumption of vegetables drastically in those 30 years. (top row, second from left)
Or perhaps, for sugars and sweeteners, consumption has increased everywhere except for South Africa. In addition, the Chinese eat a lot less sugar than the other peoples. (top row, right)
Egg consumption also shows an interesting pattern. In 1970, the countries had similar levels but by 2000, Mexicans and the Chinese had outpaced the other countries. (bottom row, right)
These charts are very versatile. The example shown above is not yet ready for publication. The designer must now decide what the key messages are, and can then use color judiciously to draw the reader's attention to the relevant parts.
Also, some may not like the default scaling of the vertical axes. That can be easily fixed.
Finally, here is another take which focuses on countries rather than food groups. We note that too many categories of foods make it hard to separate them.
I find it embarrassing for the Economist to print an article like this one. (Do they have a statistics editor?)
The subtitle asserting "causality" is offensive. It is alleged that smoking bans in bars have "caused" more road accidents because people are forced to drive longer distances to find those bars that still allow smoking.
To assert causality so starkly for an undesigned observational study is unprofessional. I doubt that the authors of the study they cited even went so far. At best, they probably found a correlation.
Another problem is the practical significance of the finding. There is a 13% increase in fatal accident rate in a "typical county containing 680,000 people". There are two problems with this statement:
When I check the Census data, there are only about 85 counties in the entire U.S. with at least 680,000 people. What do they mean by "typical"?
13% is said to be an increment of 2.5 fatal accidents, presumably per year. The crane accident in Manhattan a few weeks ago killed at least five people. I just don't believe that one can prove definitively that such a tiny difference is not due to chance so even the correlation, let alone the causality, is suspect.
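To put the 2.5 in perspective, here is a back-of-envelope sketch in Python. It assumes that the yearly count of fatal accidents in a single county behaves like a Poisson variable, so that its standard deviation is the square root of its mean; that is my simplifying assumption, not something taken from the article or the paper. The study presumably pooled many counties and years, which would shrink the noise, so this speaks only to one county in one year.

```python
import math

increase = 2.5            # extra fatal accidents quoted in the article
relative_increase = 0.13  # the 13% quoted in the article

# Baseline implied by the two quoted figures: 2.5 is 13% of this number.
baseline = increase / relative_increase

# Under the Poisson assumption, year-to-year noise is about sqrt(mean).
poisson_sd = math.sqrt(baseline)

print(f"implied baseline: {baseline:.1f} fatal accidents per year")
print(f"Poisson standard deviation: {poisson_sd:.1f}")
print(f"increase as a share of one standard deviation: {increase / poisson_sd:.2f}")
```

The implied baseline is roughly 19 accidents a year with chance variation of about 4, so in any single county the claimed effect would sit well inside ordinary year-to-year noise.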
It appears that the paper is locked up in pre-publication. If you have seen it, let us know if the authors actually asserted causality.
Reference: "Unlucky Strikes", The Economist, April 3 2008.