A reader from Down Under sent this via twitter:
Seems like the editor fell asleep.
On my holiday travel, I found a disguised donut chart in the Delta Sky Magazine (Dec 2012), talking about manufacturing jobs in the U.S. Then, flipping through the Spanish section at the back of the same magazine, I found the translated article, plus a translated chart. To my surprise, they look different:
Surprise No. 1: the sizes of the cog wheels are different. Even though the color is still mapped to year in the same way, somehow one of these authors decided to take liberties with the relative size. The suspect is the Spanish author, who decided to make 2009 much larger and Jan to Sep 2012 much smaller.
Surprise No. 2: the use of commas within a number, and the format of dates differ by culture. That explains why the Spanish author removed the commas from the numbers, making it harder for me (English-speaking) to comprehend. Also, the swap from "01/12-09/12" to "Sep. 2012" suggests that Spanish speakers don't like the month/year formatting of dates. It also suggests that the Spanish readers have no trouble inferring that the "Sep. 2012" data point refers to "Jan. 2012 to Sep. 2012".
Surprise No. 3: The Spanish author improved the chart in one way. He grouped the annual data together via overlapping, leaving the 2012 partial-year data point by itself.
There are some problems with both charts. The most serious is the failure to project the 2012 jobs number. The chart seems to indicate that 2012 is a lackluster year, at best level with the previous years. In fact, the number of jobs in the first three quarters has already exceeded the full-year counts of 2011, 2010 and 2009. Unless the fourth quarter is a particularly bad quarter for manufacturing jobs, it would seem that the message should be that 2012 is a great year of recovery. You can't tell from these charts: in particular, the Spanish author decided to shrink the 2012 cog wheel into insignificance.
The issue here is providing context for comparison. Even if the projected 2012 full-year number is provided, that may not be enough to judge whether manufacturing is healthy. Other useful context can be the growth rate of manufacturing versus other sectors of the economy; and the growth rate of jobs in relation to the population/work force growth rate.
As usual, a simple line chart displays the time-series data more clearly. (I simply linearly extrapolated the 2012 full-year number, which is probably an over-estimate. In practice, you can look up the data and figure out the ratio of Jan-Sept jobs to full-year jobs on average and inflate the number that way.)
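The two ways of inflating a partial-year count mentioned above can be sketched in a few lines of Python. This is only an illustration: the function names and every number below are made up for the example and are not the figures from the magazine chart.

```python
def extrapolate_linear(ytd_total, months_observed, months_in_year=12):
    """Scale a year-to-date total to a full year, assuming every month contributes equally."""
    if not 0 < months_observed <= months_in_year:
        raise ValueError("months_observed must be between 1 and months_in_year")
    return ytd_total * months_in_year / months_observed


def extrapolate_by_history(ytd_total, history):
    """Inflate a year-to-date total using the average historical share.

    `history` is a list of (jan_to_sep_total, full_year_total) pairs from past years.
    """
    if not history:
        raise ValueError("history must contain at least one past year")
    shares = [ytd / full for ytd, full in history]
    average_share = sum(shares) / len(shares)
    return ytd_total / average_share


# Hypothetical inputs: 900 (thousand) jobs counted from January to September,
# and two past years in which Jan-Sep accounted for 80% of the annual total.
ytd_jobs = 900
past_years = [(800, 1000), (1000, 1250)]

print("Linear estimate:", extrapolate_linear(ytd_jobs, months_observed=9))
print("History-based estimate:", extrapolate_by_history(ytd_jobs, past_years))
```

With these invented inputs, the linear estimate comes to 1200 and the history-based estimate to 1125, which illustrates how straight-line extrapolation over-estimates whenever the fourth quarter is typically weaker than the rest of the year.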
The great Andrew Gelman did a Junk Charts style post today, and very well indeed.
Andrew created two alternatives. One is a line chart (profile chart), which is often a better option despite the data being categorical. The other is more creative, and the better of the two.
Some of Gelman's readers complained that he arbitrarily "standardized" the data by indexing against the average of the countries depicted; one can further grumble that a 50% "excess" may sound impressive but it would be equivalent to less than an hour, perhaps not as startling. These types of complaints are fair but do realize that blog posts like these are primarily concerned with how data is best visualized. If one prefers a different indexing method, or a different set of countries, or a different color for the lines, etc., one can easily revise the chart to reflect those preferences.
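To make the indexing complaint concrete, here is a minimal Python sketch of indexing each country against the average of the countries shown. The country names and minutes are hypothetical placeholders, not Gelman's data, and the function name is my own.

```python
def index_against_mean(values):
    """Return each value's percent excess over the mean of all values."""
    mean = sum(values.values()) / len(values)
    return {name: (value - mean) / mean * 100 for name, value in values.items()}


# Hypothetical minutes per day spent on some activity in three countries.
minutes_per_day = {"Country A": 150, "Country B": 100, "Country C": 50}

for country, excess in index_against_mean(minutes_per_day).items():
    # Printing the raw minutes next to the index keeps the absolute scale visible.
    print(f"{country}: {excess:+.0f}% vs. average ({minutes_per_day[country]} min)")
```

In this made-up case, Country A shows a 50% excess that amounts to only 50 minutes, which is exactly the kind of gap between relative and absolute scale that the readers were grumbling about. Swapping in a different baseline only requires changing how `mean` is computed.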
The easiest way to see why the third chart is better than the first is to compare their messages. The strongest message coming off the first chart is that there are no material differences between these six countries in terms of time usage. In the third chart, the designer (here, it's Gelman) is asserting that there are interesting differences.
Received a wonderful link via reader Lonnie P. to this website that presents a historical reconstruction of W.E.B. DuBois's exhibit of the "American negro" at the 1900 Paris Expo. Remarkably, DuBois presented a large series of data graphics to educate the world on the state (plight) of blacks in America over a century ago.
You can really spend a whole afternoon examining these charts (and more); too bad the charts have poor resolution and it is often hard to make out the details.
Judging from this evidence, we must face up to the fact that data graphics have made little progress during these eleven decades. Ideas, good or bad, get reinvented. Disappointingly, we haven't learned from the worst ones.
(see discussion here)
(see the Vampire chart here)
My dislike of donut charts has been well documented. Click here.
What I want to discuss is the use of interactivity, a feature of this chart but something that backfires. The underlying data is a 5-level rating of "corporate sentiment" by industry, by country, and over time. That would be 4 dimensions jostling for space on a surface. Obviously, some decisions have to be made as to which dimension to highlight and which to push to the background.
This chart highlights the 5-level ratings using the donut device. All other dimensions are well hidden by the interactive feature. Pressing on the forward/backward buttons reveals the industry dimension. Pressing on the arrow on the top left corner reveals the time dimension. Pressing on the map reveals the country dimension.
The problem with this level of detachment is that readers are obstructed from viewing multiple dimensions at once. For instance, it is very hard to understand the differences in sentiment between different industries, or between different countries, or the change in sentiment over time.
The version on the right shows, for instance, the distribution of ratings by industry for Q3 2010, and for all Asia combined. This is a rough sketch, and one would want to fix quite a few things: making the sector labels horizontal, reducing the distance between the columns, labeling rating 1 as "very positive", ordering the sectors from most positive to least positive, etc.
A chart of ratings by country (aggregate of all industry sectors) would follow the same format. Similarly, one can compare ratings across countries for a given sector, and this can be replicated 11 times, once for each sector. The same goes for ratings across industries for any given country.
For comparisons across time, I'd suggest using average ratings rather than keeping track of five proportions. This reduces a lot of clutter that does not improve readers' comprehension of the trends. A line chart would be preferred.
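As a rough illustration of collapsing five proportions into one number per period, the Python sketch below computes a share-weighted average rating. The quarter labels and shares are invented for the example, and the function name is my own.

```python
def average_rating(shares, tolerance=1e-9):
    """Collapse the shares of ratings 1 through 5 into one weighted average rating."""
    if len(shares) != 5:
        raise ValueError("expected one share for each of the five rating levels")
    if abs(sum(shares) - 1.0) > tolerance:
        raise ValueError("shares must sum to 1")
    # Rating levels run from 1 to 5, in the same order as the shares.
    return sum(level * share for level, share in enumerate(shares, start=1))


# Hypothetical shares of ratings 1..5 for one sector in two consecutive quarters.
shares_by_quarter = {
    "Quarter 1": [0.1, 0.2, 0.4, 0.2, 0.1],
    "Quarter 2": [0.2, 0.3, 0.3, 0.1, 0.1],
}

for quarter, shares in shares_by_quarter.items():
    print(f"{quarter}: average rating {average_rating(shares):.2f}")
```

Each period then contributes a single point to a line chart instead of five. Since rating 1 is the most positive level in this data, a falling line would indicate improving sentiment, and the axis should be labeled accordingly.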
A better way to organize the chart is to start with the types of questions that the reader is likely to want to answer. Clicking on each question (say, compare ratings across industries within a country) would reveal one of the above collections of charts.
Another improvement is to add annotations. For instance, one wonders whether the airlines colluded to all give a 2 rating. It is always a great idea to direct readers' attention to the most salient parts of a chart, especially if it contains a lot of data.
In October 2007, I wrote about the "canvas" metaphor for graphing software. This was what I said:
With the advent of AJAX and other interactive technologies, one can only hope that new graphing software will use the "canvas" metaphor. If we want to reduce the spacing between bars, we should be able to grab the bars and move them together. If we want to change the ordering, we should be able to mouse over some menu and select a pre-defined ordering scheme, or to drag and move bars around as we please. etc. etc.
To push this metaphor further, this kind of software should facilitate the "exploratory" stage of graph-making. I blogged about this stage of making sketches before. One longs for software that allows one to flip through many different chart types quickly, to settle on the desired type, and then to make the nitty-gritty changes to the axes, colors, dots, etc.
The revolution has arrived in the form of JMP's Graph Builder function. It is not perfect yet, as even the example I use will show, but I'm excited because we are getting closer to that "canvas" metaphor.
I'm going to re-make this inedible pair of donuts from an otherwise quite nice infographic on the growth and nature of spam in the last 10 years. (New Scientist)
I have often pointed out the biggest shortcoming of donut charts: the most important clue to the size of each sector of the underlying pie chart, that is, the angle at the center of the pie, has been cut out of the chart and often, as here, obscured by a number.
There are dramatic shifts in proportions of spam types during the last decade but the effect is underwhelming as depicted.
By clicking on the word "Year" and dragging it to a box called "Overlay", I made a paired bar chart:
What about a dot plot instead? This change requires a right click but is easy enough:
Here's where I encountered a little inconvenience. It's probably ignorance on my part since I didn't read the manual. I couldn't figure out how to increase the dot size for all dots at once, only one at a time.
In any case, I'm still searching. I want to do a small-multiples line chart. For this, I drag the word "Year" into the bottom of the chart labelled "X", and then right-click to add a line to the dot chart.
This is close to a desired chart type for this data. The change from year to year is highly apparent, and the increased and decreased spam types are also obvious. I would color the increases differently from the decreases if I had the time.
I had a very difficult time getting the year labels to say 1999 and 2009, which are the logical points for this data, and in the end I failed. JMP seems to have a mind of its own.
Since it takes no time, I experimented some more. By moving "Category" to "Wrap", I reproduced the above chart but in a matrix form:
Finally, I made the "Category" an "overlay" which resulted in this chart. This is kind of like the Bumps chart but obviously a bad idea for this data: (I'm not even showing the really ugly legend).
So, my dream toy -- the "canvas" style graph maker -- is here! It only takes a few minutes to move the data around this canvas, and see these different chart types.
I indicated that this goes a long way but isn't perfect. Right now, sketching and exploring is easy but refining and detailing is not as easy.
What I would like to see: once the general form of the chart is chosen, maybe a second canvas is needed, with Photoshop as a metaphor, in which we can chisel out the nitty-gritty details, like the axis labels, dot sizes, line widths and so on.
Also, the number of chart types can, and I presume will, be increased over time. For instance, I don't think the current version allows a profile chart; it seems to adhere to the overly-rigid rule that a categorical data series should not be connected by a line.
(I should say that in the current release, one way to accomplish this is to save the resulting graph-sketch as a "JMP script" and then go into the code and change things around. But since we are doing point and click, and visual interaction, why not go all the way?)
Most existing graphing software falls into one of two extremes: the Excel style, which is super-rigid, or the R style, which allows minute control over every little thing. This, I think, is the third way.
So said a reader, Stephen B., of the following graphic (note: pdf) in the London Times concerning Andy Murray's recent tennis triumphs.
As a reader noted, this chart is essentially unreadable. It contains data for the composition of diets in four countries during two time periods.
What might we want to learn from this data?
Are there major differences in diet between countries?
Within each country, are there changes in diet composition over the thirty years?
If there were changes in diet inside a country over time, did those reflect a worldwide trend or a trend specific to that country?
Unfortunately, the use of donut charts, albeit in small multiples, does not help the cause. The added dimension of the size of the pies, used to display the total calories per person per day, serves little purpose. Seriously, who out there is comparing the pie sizes rather than reading off the numbers in the donut holes if she wants to compare total calories?
This data set has much potential, and allows me to show, yet again, why I love "bumps charts".
Here is one take on it. (Note that the closest data I found was for six different countries - China, Egypt, Mexico, South Africa, Philippines, India - and for different periods.)
The set of small multiples recognizes that the comparison between 1970 and 2000 is paramount to the exercise. There is a wealth of trends that can be pulled out of these charts. For example, the Chinese and Egyptians eat many more vegetables than the people of the other countries; in particular, the Chinese increased their consumption of vegetables drastically in those 30 years. (top row, second from left)
Or perhaps, for sugars and sweeteners, consumption has increased everywhere except in South Africa. In addition, the Chinese eat far less sugar than the other peoples. (top row, right)
Egg consumption also shows an interesting pattern. In 1970, the countries had similar levels but by 2000, Mexicans and the Chinese had outpaced the other countries. (bottom row, right)
These charts are very versatile. The example shown above is not yet ready for publication. The designer must now decide what the key messages are, and then use color judiciously to draw the reader's attention to the relevant parts.
Also, some may not like the default scaling of the vertical axes. That can be easily fixed.
Finally, here is another take which focuses on countries rather than food groups. We note that too many categories of foods make it hard to separate them.
References: "Who's Eating What?", Wired, Oct 2008; "The Double burden of malnutrition", FAO, 2006.
The accompanying text proclaimed: "Rock stars are famous for excess, and some pay the price". The rest of the paragraph points out drug- and alcohol-related deaths, plus deaths due to "unhealthy lifestyles", which apparently include cancer and cardiovascular disease.
There is a gaping hole between what's on the chart and what's in the text. They just talk past each other.