From Bernard L., another exemplary effort by the Times. This one really got me excited.
The set of line graphs shows how demographics of students in American schools have evolved in the last two decades. Here, I selected New York City schools, and the tool sensibly decided to compare those with New York State schools (gray line).
There is so much to learn from one simple chart:
The blue and gray lines are almost parallel everywhere, which tells us that in terms of the change in demographic composition, New York City pretty much resembled New York State during this entire period.
However, in terms of demographic composition, rather than the change in composition, New York City schools are very different from the rest of the state: the proportion of white students is lower by a third, while that of minority students, especially black and Hispanic students, is much higher.
State-wide (as well as city-wide), black and white students have been declining as a proportion while Hispanics and Asians have increased.
The extent of the change is immediately visible: Asians have jumped from 7% to 14%, for example.
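A jump like 7% to 14% can be read two ways, and the first-and-last data labels make both easy to compute. Here is a minimal sketch of the arithmetic; the function name `share_change` is my own, made up for illustration.

```python
def share_change(start_pct, end_pct):
    """Return (change in percentage points, relative change in percent)."""
    points = end_pct - start_pct
    relative = points / start_pct * 100
    return points, relative

# The 7% -> 14% figures are the ones quoted in the text above.
points, relative = share_change(7.0, 14.0)
print(points, relative)  # 7.0 100.0
```

In other words, the share rose by 7 percentage points, which is a doubling in relative terms.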
From a graph design perspective, the execution is very clean. Data labels are limited to the first and last values. A small multiples concept is used with the ethnic groups placed side by side. A great awareness of foreground and background as well. And imagine how much data has been visualized here, and be impressed. You can look at any county in the country.
Here's one where the county change does not exactly mirror the state change (Napa in California):
Reference: "Diversity in the classroom", New York Times, March 12 2009.
Jerome C., a reader and blogger, wrote up a wonderful piece on different ways to publish charts on the web. Highly, highly recommended.
*** Rant ***
One of the points he made was that images (jpegs, gifs, etc.) are often published with poor quality. I feel the pain. Ever since Typepad switched to its new and "improved" editor, this blog has been suffering from low-quality thumbnails. I know, I know... I need to move to Wordpress. But after 15 minutes of online research, I realized that moving a blog with lots of images is nearly impossible! All of the images would have to be uploaded, and a lot of links would need to be fixed. Maybe the next time I am on holiday, I will get around to it.
*** Excel ***
Take a look at his comparisons of four ways to forklift an Excel chart onto a blog. (The image on the right showed one of the four ways.) The difference in image sharpness is marked.
Resizing Excel charts is a common source of headache. Always right-size the chart inside Excel before exporting!
*** Swivel, Google, etc. ***
I also share Jerome's point of view on these on-line graphics creators. Good idea, wishing for more. In his words:
to make a point, you absolutely need to be able to control every aspect of your graph, even if its form remains familiar: combine series, group or highlight some datapoints, format axis, and so on.
I would like to explore the other options he cited, such as Processing.
*** Great example ***
Jerome's blog has a promising beginning. The following chart is both informative and beautifully crafted. It brings out the clear message that OECD countries have done admirably well in life expectancy, and have been particularly impressive in reducing the variance among member countries by lifting the expected age of the worse-off relative to the better-off, with most of the gain happening during the 1980s. (Adding quartiles may also be meaningful. And I prefer to put the labels outside the plot area.) The graph does not explain what caused the shift in the 1980s, but this is a great starting point for the curious.
Gelman pointed to this Brendan Nyhan post dissecting David Sirota's chart purportedly showing a "race chasm" in the Democratic primaries. The left chart is David's original and the right is a Nyhan revision.
Please see Nyhan for the political interpretation. Here, I want to note a number of improvements Brendan made to the chart:
Sirota plotted the ranks of the percent of black population, which is misleading. Nyhan plotted the actual percentages on his horizontal axis.
Sirota connected the dots, which highlighted the noise (ups and downs) in the data. Nyhan fitted a linear model (he also tried non-linear versions).
Sirota plotted Obama's overall margin of win/loss. Nyhan plotted his margin among white voters only, which more directly addressed the issue.
Nyhan exposed the excluded states in a footnote. Sirota didn't. For this chart, this piece of information is very important since so many states were excluded.
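To see why ranks mislead and why a fitted line beats connected dots, here is a minimal sketch. The numbers are invented for illustration and are not the primary data; the helper names `ranks` and `fit_line` are my own, and the fit is plain ordinary least squares, which may or may not be exactly what Nyhan used.

```python
def ranks(values):
    """Return 1-based ranks (smallest value gets rank 1); ties are not handled."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def fit_line(xs, ys):
    """Ordinary least-squares fit of y on x; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical values, made up for this sketch (NOT real state data).
pct_black = [1.0, 2.0, 3.0, 30.0]   # percent of population
margin = [10.0, 8.0, 9.0, -20.0]    # margin in points

# Ranks space the states evenly, hiding the gap between 3% and 30%.
print(ranks(pct_black))  # [1, 2, 3, 4]

# One fitted line summarizes the trend instead of tracing every up and down.
slope, intercept = fit_line(pct_black, margin)
print(slope, intercept)
```

The rank axis puts the fourth state one step away from the third, even though its percentage is ten times larger. That distortion is what plotting the actual percentages avoids.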
Nyhan walked us through multiple charts he used to explore the data. Much of the time was spent picking and choosing states to include or exclude. We learnt that Sirota excluded states with large Hispanic populations, with which Nyhan disagreed; Nyhan wanted to exclude Florida, which Sirota decided against; Sirota excluded Michigan, to which Nyhan consented; but Nyhan also wanted to exclude the caucus states, and so on...
Judging from the charts, this picking and choosing appears not to have changed the outcome in this case. In general, one should exercise great care in such decisions because one might end up seeing what one wants to see.
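One way to check whether picking and choosing matters is to refit the line with each state left out in turn and see whether the slope moves much. Below is a minimal sketch of that idea, under my own assumptions: the state labels and numbers are hypothetical, and `ols_slope` and `leave_one_out_slopes` are names I made up. This is not the procedure either author followed.

```python
def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def leave_one_out_slopes(labels, xs, ys):
    """Refit after dropping each point in turn; returns {label: slope without it}."""
    slopes = {}
    for i, label in enumerate(labels):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        slopes[label] = ols_slope(rest_x, rest_y)
    return slopes


# Hypothetical data, invented for illustration only.
states = ["A", "B", "C", "D", "E"]
pct = [2.0, 5.0, 10.0, 20.0, 30.0]
margin = [12.0, 9.0, 3.0, -8.0, -15.0]

for state, slope in leave_one_out_slopes(states, pct, margin).items():
    print(f"without {state}: slope = {slope:.2f}")
```

If the sign and rough size of the slope hold up no matter which state is dropped, the conclusion does not hinge on any single exclusion. If they flip, one is probably seeing what one wants to see.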
The following chart is missing from the post, which I think points out something more telling than the negative correlation between Obama's margin with white voters and the proportion of black population.
Reader Daniel sent us a great example of how even little things matter a lot in chart-making. The left chart is the original. The right chart (created by Daniel) shuffled the order of the legend to match the curves, and spaced them out. All of a sudden, the chart is much easier to read.
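The legend fix can be automated: sort the legend entries by where each curve ends, so the legend reads top to bottom in the same order as the lines at the right edge of the chart. A minimal sketch follows, with made-up series names and values; `legend_order` is a hypothetical helper of mine, not something from Daniel's chart.

```python
def legend_order(series):
    """Return series names sorted by their last value, highest first."""
    return sorted(series, key=lambda name: series[name][-1], reverse=True)


# Hypothetical series, invented for illustration.
series = {
    "Alpha": [1, 2, 3],
    "Beta": [5, 6, 9],
    "Gamma": [4, 4, 5],
}

print(legend_order(series))  # ['Beta', 'Gamma', 'Alpha']
```

Most charting tools let you pass the legend entries in an explicit order, so the sorted list can be fed straight in.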
I love articles that expose the behind-the-scenes of creating complex graphs. This Wall Street Journal blog post tells us some dirty secrets behind these cartograms that depict the "influence" of different media outlets throughout the world.
In the second interesting item of the week, I return to the fabulous Google Finance chart, which shows the distribution of stock market returns by sector. I wrote about it twice (here and here). In the original post, I saluted the engineers for figuring out the formidable technical issues of turning a live dynamic data stream into a live dynamic graphic but didn't go into details. (Trust me.)
The other night, this chart popped up on my browser.
If someone kept track of how often such a mishap showed up, the tally would probably be 1-5% of the time.
The triple challenge of generating this graphic is the volume of data that needs to be processed, the velocity at which it changes, and the flicker of time from input to output, probably not more than a few minutes. The analysis and charting must be maintained continuously during market hours. For any such project, the thing to manage is the error rate, and one should be totally thrilled if it's in the range Google engineers have achieved.
As the SSS blog pointed out, the section on how they decided to visualize the shift in party margins by House districts, and specifically the decision to dismiss scatter plots as too "difficult for the masses", is fascinating. It illustrates the idea of sketching that I have advocated here in the past. (The PDF of the complete graphic can be downloaded from here.)
From my point of view, the issue is less the type of chart than the level of aggregation. The chart has a very appealing data-ink ratio (à la Tufte), but could less be more? One of the secrets of making a good chart, and of any data analysis for that matter, is to reduce complexity. For example, is it crucial for every single district to receive equal treatment? (Similarly, if scatter plots were chosen, is it crucial to include every district?)
Several examples of great charts can be found in Matt's presentation. On slide 83, I admire the Bonds/Aaron/Ruth chart. The inset showing the acceleration of Bonds from age 35 to 39, as compared to the decline of Aaron and Ruth during the same age span, is powerful. Similarly, the effective use of foreground (blue) and background (gray) in comparing ARod, Pujols and Griffey against the big 3 is masterly (see right).
There is also a sequence on mapping the San Diego wildfires (slides 2-10), showing how they gathered population data to complement fire data, thus adding context to the threat to highly populated regions.
In a different vein, the SSS blog, written by the people at Harvard's Institute for Quantitative Social Science, has published a number of engaging posts on data graphics recently. Take a look at Visualizing Electoral Data, which coincidentally addresses an issue similar to that of the NYT party vote share graphic discussed above.
This graphic plots the degree of party swings by UK parliamentary constituency. The darker the color, the tighter the stranglehold by one party. Going from top to bottom, the authors show party swings over successive elections. The swing constituencies are therefore near the middle of the chart.
Jorge Camoes has been a regular reader and sometime commenter for a while. Little did we know that he has been blogging in Portuguese for the last 10 months. Recently, he has decided to join the English-speaking world. His new blog is, simply, Charts.
One post discusses the "population pyramid" chart for comparing advertising spending. He suggested the overlapping bar chart; see his comment here. By folding one side onto the other, this chart is clearly an improvement over the original, and yet it fails to convey the proportional spend, which is the key point being made in the article.
In another post, Jorge created a "screencast" (tutorial) of how to create a population pyramid in Excel. A lot of this mirrors my own experience using Excel for graphing. Those of you who have asked for tips in the past should definitely see it.
What you'll find is that creating a nice-looking chart in Excel requires a lot of tedious finger-work. It is truly incredible how many steps, how much opening and closing of windows, back and forth navigation, etc. users are made to suffer through to make cosmetic changes.
With the advent of AJAX and other interactive technologies, one can only hope that new graphing software will use the "canvas" metaphor. If we want to reduce the spacing between bars, we should be able to grab the bars and move them together. If we want to change the ordering, we should be able to mouse over some menu and select a pre-defined ordering scheme, or to drag and move bars around as we please. And so on.
(I have heard that Apple's spreadsheet software Numbers has some of these features. I have yet to use it myself. If any of you have, let us know what you think.)
From time to time, I get queries about what software I use to create junkart charts. This is my first post on the wide-ranging topic, which I shall take up again.
My first rule of thumb is: develop the concept first, then worry about tools.
I believe the software question is misplaced. One should never allow tools to get in the way of one's imagination.
Like an artist, I carry a sketchbook in which I draw many versions of charts for each data set I come across. Once I see each version, I can better judge what works and what doesn't. As I sketch, I'll sometimes find insights in the data I hadn't noticed before, which will prompt another round of sketches. Until I finalize the concept, I don't think about software. Until this point, it's as primitive as it gets.
What has all this got to do with the Madonna wall advertisement? Notice the artists standing on the crane in the lower left corner. I was walking in New York while thinking about this post, and thought what a perfect example of sketching, or developing the concept. The artists weren't deciding what and how to paint the ad while the crane scaled the ten-storey building; they already had it sketched out, both on paper and on the wall itself. Here is the blown-up image of Madonna's unfinished hand. The sketchmarks were clearly visible. So next time you make a chart, try making sketches first!