Reading: WSJ Guide to Information Graphics
Mar 06, 2010
Dona Wong, who had stints on the graphics teams at both the Wall Street Journal and the New York Times, has contributed a how-to book on statistical graphics. It is called "The Wall Street Journal Guide to Information Graphics".
The biggest strength of this book is the material on data collection and selection, an often overlooked aspect of statistical graphics. The content of p.103, for example, is not typically found in similar books: on this page, Wong works through how to determine the scales for two stock-price charts in such a way that the distances represent relative changes in stock prices (rather than absolute changes). Chapter 3 ("Ready Reference"), which covers this type of material, is almost as big as Chapter 2, which runs through basic rules of making graphs that should be familiar to our readers. Her philosophy, then, leans toward Tukey's as espoused in his seminal book Exploratory Data Analysis (EDA), although Wong keeps to the most basic elements (percentages, indices, log scales, etc.), obviously aiming for a different audience than Tukey.
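To make the scaling idea concrete, here is a minimal Python sketch. It is an illustration of one way to get this property, not necessarily the procedure Wong gives on p.103; the function names and sample prices are made up. The idea is to give both charts linear axes with the same top-to-bottom ratio, so that one chart's axis is simply a multiple of the other's.

```python
def matched_axis_limits(prices_a, prices_b):
    """Return ((lo_a, hi_a), (lo_b, hi_b)) such that hi/lo is equal on both axes.

    Assumption for this sketch: each axis starts at the series minimum.
    """
    series = (prices_a, prices_b)
    if any(min(p) <= 0 for p in series):
        raise ValueError("prices must be positive")
    # Use the larger max/min ratio so that neither series is clipped.
    ratio = max(max(p) / min(p) for p in series)
    return tuple((min(p), min(p) * ratio) for p in series)


def height_fraction(price, limits):
    """Fraction of the chart height at which `price` is drawn."""
    lo, hi = limits
    return (price - lo) / (hi - lo)


# Hypothetical prices, for illustration only.
stock_a = [10, 12, 20]
stock_b = [50, 55, 60]
limits_a, limits_b = matched_axis_limits(stock_a, stock_b)
```

With these made-up prices, the first axis runs from 10 to 20 and the second from 50 to 100, so a 10% rise from the bottom of either chart covers the same fraction of the chart height. A log scale, which the book also covers, gives the same property at every point on a single axis.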
The guidelines relating to making charts are prescriptive and concise. The following snippet (pp.72-73) is typical of the style:
Wong focuses on saying what to do, but (usually) not why. Perhaps for this reason, the book has no references or notes, except for mentioning Ed Tufte as Wong's thesis adviser. Almost all the best practices described in the book would meet with our approval. One that has not been featured much on this blog is the preference for shades of the same color over many different colors of the same shade.
Despite the title, the book actually discusses statistical graphics (same as Junk Charts), not "infographics" (as covered by Information Aesthetics, for example). Almost all the graphical examples are conceptual rather than drawn from real-life data. This editorial decision has the advantage of sharpening the educational message but the disadvantage of being less engaging.
A unique feature of Wong's book is Chapter 5 ("Charting Your Course"), which covers business charts used to organize operational data, rather than present insights -- things like Gantt charts (which she calls work plans), org charts, flow charts, 2-by-2 matrices, and so on, the kind of thing found in the toolkit of management consultants. This is an under-studied area, and deserves more attention. I am reminded of Tufte's re-design of bus schedules. These types of charts differ in the need to print every piece of data onto the chart, the prevalence of text data (and the difficulty of incorporating it into charts), and efficient search as a primary goal. And it is in this chapter that the decision to stay conceptual diminishes the impact: it would be very valuable for readers to see a complete Gantt chart based on a real project, and how it evolves over the course of the project. I have always found these types of charts to start out nicely but gradually sink as details and detours pile up.
There is one chart on p.59 I would like to discuss.
Here, Wong allows the use of double axes in certain cases, basically when the two data series have linearly-related scales. She appends the advice: "Adhere to the correct chart type for each series -- lines for continuous data and bars for discrete quantities... The only exception is when both data series call for a chart with vertical bars. In such instances, convert one to a line." (Regular readers know I don't think much of this rule.)
Based on the chart above, Wong either considers both revenue and market share to be discrete quantities, or considers revenue to be discrete and market share to be continuous. In my mind, both series are continuous data and a chart with two lines is appropriate here.
Revenue is definitely discrete; though it doesn't say so on that chart, I would assume given the context that the revenue unit is "million dollars per month". Since market share is probably calculated from revenue, you could probably argue that one either way.
Posted by: Nick | Mar 06, 2010 at 03:55 PM
The double axis graph would be most effective as a scatterplot (revenue vs market share, with each data point labeled by its month). Despite being Tufte's former student, she doesn't seem to have taken on board his observation that the media overwhelmingly prefers time series data even when a single bivariate plot is more illuminating.
Nick: just because revenue is reported monthly doesn't make it "definitely discrete". The revenue is being earned continuously during those months, and market share also changes continuously. Binning into conveniently-sized time bins doesn't itself convert the underlying data from being continuous into discrete.
Posted by: Mike | Mar 07, 2010 at 03:04 PM
For me the problem is that you have to chart what you have, not what you want to have.
If Wong had monthly revenue and market share numbers to work with rather than more fine-grained readings, the chart above would be perfectly appropriate in my opinion. The choice is not a comment on the nature of the underlying data; it might just be an honest representation of the available data.
Am I missing something?
Posted by: Chris Mills | Apr 09, 2010 at 01:25 AM
Leaving aside the issue of whether the type of graph is appropriate, I really like how it looks.
I thumbed through a copy of the book a few weeks ago. I found a rule or two I didn't agree with, but I might buy it just because of how much I like the style of the graphs: simple, spare, compact, and clear.
Posted by: JF | Apr 09, 2010 at 04:07 PM