Apr 08, 2008

Pick-and-choose

Gelman pointed to this Brendan Nyhan post dissecting David Sirota's chart purportedly showing a "race chasm" in the Democratic primaries.  The left chart is David's original and the right is a Nyhan revision.
Sirota

Please see Nyhan for the political interpretation.  Here, I want to note a number of improvements Brendan made to the chart:

  • Sirota plotted the ranks of the percent of black population, which is misleading.  Nyhan plotted the actual percentages on his horizontal axis.
  • Sirota connected the dots, which highlighted the noise (ups and downs) in the data.  Nyhan fitted a linear model (he also tried non-linear versions).
  • Sirota plotted Obama's overall margin of win/loss.  Nyhan plotted his margin among white voters only, which more directly addressed the issue.
  • Nyhan exposed the excluded states in a footnote.  Sirota didn't.  For this chart, this piece of information is very important since so many states were excluded.
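To make the first two points concrete, here is a minimal sketch in Python. The numbers are made-up placeholders, not the primary data, and the helper names `ranks` and `fit_line` are my own for illustration; this is not Nyhan's actual code. It shows how ranks space the states evenly regardless of the real gaps, and how a least-squares line summarizes the trend instead of tracing every up and down.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks (smallest value gets rank 1; ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical placeholder values, NOT the real primary results.
pct_black = [1.0, 2.0, 3.0, 20.0, 30.0]   # percent black population
margin = [10.0, 8.0, 9.0, -5.0, -12.0]    # margin among white voters

# Ranks are evenly spaced, so the jump from 3% to 20% looks like any other step.
print(ranks(pct_black))

# A fitted line uses the actual percentages and smooths over the noise.
slope, intercept = fit_line(pct_black, margin)
print(slope, intercept)
```

Plotting `margin` against `pct_black` rather than against `ranks(pct_black)` keeps the horizontal distances honest, which is the substance of the first improvement.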

Nyhan walked us through multiple charts he used to explore the data.  Much of the time was spent picking and choosing states to include or exclude.  We learnt that Sirota excluded states with large Hispanic populations, which Nyhan disagreed with; Nyhan wanted to exclude Florida, which Sirota decided against; Sirota excluded Michigan, to which Nyhan consented; Nyhan also wanted to exclude the caucus states; and so on...

Judging from the charts, this picking and choosing appears not to have changed the outcome in this case.  In general, one should exercise great care in such decisions because one might end up seeing what one wants to see.

The following chart is missing from the post, which I think points out something more telling than the negative correlation between Obama's margin with white voters and the proportion of black population.

Sirota2




Apr 07, 2008

Little things

Reader Daniel sent us a great example of how even little things matter a lot in chart-making.  The left chart is the original.  The right chart (created by Daniel) shuffled the order of the legend to match the curves, and spaced them out.  All of a sudden, the chart is much easier to read.
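The legend fix can be automated.  Here is a minimal sketch, assuming each curve is stored as a list of values under its label; the function name `legend_order` and the sample series are hypothetical, not taken from the UCL report.  It sorts the labels by where each curve ends, so the legend reads top to bottom in the same order as the curves at the right edge of the chart.

```python
def legend_order(series):
    """Return series labels sorted by their final value, largest first."""
    return sorted(series, key=lambda name: series[name][-1], reverse=True)


# Hypothetical curves; the last value is where each curve ends on the right.
series = {
    "A": [1, 2, 3],
    "B": [2, 5, 9],
    "C": [3, 3, 4],
}

print(legend_order(series))  # ['B', 'C', 'A']
```

Sorting on the final value is a design choice that suits charts where the legend sits to the right of the plot; for a legend on the left, sorting on the first value would be the analogous rule.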


Library_2


Reference: "information behaviour of the researcher of the future", UCL, Jan 2008.

Apr 05, 2008

Making maps

I love articles that expose the behind-the-scenes of creating complex graphs.  This Wall Street Journal blog post tells us some dirty secrets behind these cartograms that depict the "influence" of different media outlets throughout the world.

Wsj_mediacartogram

(Via Andrew Sullivan; he's dissing NYT again)

Additional links of interest:

Original posting at Paris-based L’Observatoire des Médias blog

Boing Boing

Gawker

Online Journalism Blog (warning: this link is taken over by a rogue script from an advertiser or some other entity that distributes scripts so it wasn't loading when I tried)


Dec 02, 2007

Live dynamic graphics

In the second interesting item of the week, I return to the fabulous Google Finance chart, which shows the distribution of stock market returns by sector.  I wrote about it twice (here and here).  In the original post, I saluted the engineers for figuring out the formidable technical issues of turning a live dynamic data stream into a live dynamic graphic but didn't go into details.  (Trust me.)

The other night, this chart popped up on my browser.

Oops.

If someone kept track of each time such a mishap showed up, the tally would probably come to 1-5% of the time.

The triple challenge of generating this graphic is the volume of data that needs to be processed, the velocity at which it changes, and the flicker of time from input to output, probably not more than a few minutes.  The analysis and charting must be maintained continuously during market hours.  For any such project, the thing to manage is the error rate, and one should be totally thrilled if it's in the range Google engineers have achieved.

Nov 12, 2007

New York Times: a tribute

As many of you realize, this blog owes a lot to the New York Times.  The Times is unique in its willingness to print interesting, sophisticated graphics.  Via the Social Science Statistics blog, I found out that Matthew Ericson is a deputy graphics editor, and he recently gave a gigantic presentation at the IEEE InfoVis conference. (You can download the entire document from his website.)

As the SSS blog pointed out, the section on how they decided to visualize the shift in party margins by House districts, specifically to declare scatter plots as too "difficult for the masses", is fascinating.  It illustrates the idea of sketching that I have advocated here in the past. (The PDF of the complete graphic can be downloaded from here.)

From my point of view, the issue is less the type of chart than the level of aggregation.  The chart has a very appealing data-to-ink ratio (a la Tufte) but could less be more?  One of the secrets of making a good chart, and any data analysis for that matter, is to reduce complexity.  For example, is it crucial for every single district to receive equal treatment?  (Similarly, if scatter plots were chosen, is it crucial to include every district?)

*********************************************************

Several examples of great charts can be found in Matt's presentation.  On slide 83, I admire the Bonds/Aaron/Ruth chart.  The inset showing the acceleration of Bonds from age 35 to 39, as compared to the decline of Aaron and Ruth during the same age span, is powerful.  Similarly, the effective use of foreground (blue) and background (gray) in comparing ARod, Pujols and Griffey against the big 3 is masterly (see right).

There is also a sequence on mapping the San Diego wildfires (slides 2-10), showing how they gathered population data to complement fire data, thus adding context to the threat to highly populated regions.

******************************************

In a different vein, the SSS blog, written by the people at Harvard's Institute for Quantitative Social Sciences, has published a number of engaging posts on data graphics recently.  Take a look at Visualizing Electoral Data, which coincidentally addresses an issue similar to that of the NYT party vote share graphic discussed above.

This graphic plots the degree of party swings by UK parliamentary constituency.  The darker the color, the tighter the stranglehold by one party.  Going from top to bottom, the authors show party swings over successive elections.  The swing constituencies are therefore near the middle of the chart.

 

Oct 21, 2007

Charts, charts, charts

Jorge Camoes has been a regular reader and sometime commenter for a while.  Little did we know that he has been blogging in Portuguese for the last 10 months.  Recently, he has decided to join the English-speaking world.  His new blog is, simply, Charts.

One post discusses the "population pyramid" chart for comparing advertising spending. 
He suggested the overlapping bar chart; see his comment here.  By folding one side onto the other, this chart is clearly an improvement over the original, and yet it fails to convey the proportional spend, which is the key point being made in the article.

In another post, Jorge created a "screencast" (tutorial) of how to create a population pyramid in Excel.  A lot of this mirrors my own experience using Excel for graphing.  Those of you who have asked for tips in the past should definitely see it.

What you'll find is that creating a nice-looking chart in Excel requires a lot of tedious finger-work.  It is truly incredible how many steps, how much opening and closing of windows, back and forth navigation, etc. users are made to suffer through to make cosmetic changes.

With the advent of AJAX and other interactive technologies, one can only hope that new graphing software will use the "canvas" metaphor.  If we want to reduce the spacing between bars, we should be able to grab the bars and move them together.  If we want to change the ordering, we should be able to mouse over some menu and select a pre-defined ordering scheme, or drag and move bars around as we please, and so on.

(I have heard that Apple's spreadsheet software Numbers has some of these features.  I have yet to use it myself.  If any of you have, let us know what you think.)


Apr 01, 2007

Tricks of the trade 1

From time to time, I get queries about what software I use to create junkart charts.  This is my first post on the wide-ranging topic, which I shall take up again.

My first rule of thumb is: develop the concept first, then worry about tools. 

I believe the software question is misplaced.  One should never allow tools to get in the way of one's imagination.

Like an artist, I carry a sketchbook in which I draw many versions of charts for each data set I come across.  Once I see each version, I can better judge what works, and what doesn't.  As I sketch, I'll sometimes find insights in the data I hadn't noticed before, which will prompt another round of sketches.  Until I finalize the concept, I don't think about software.  Until this point, it's as primitive as it gets.

What has all this got to do with the Madonna wall advertisement?  Notice the artists standing on the crane in the lower left corner.  I was walking in New York while thinking about this post, and thought what a perfect example of sketching, or developing the concept.  The artists weren't deciding what and how to paint the ad while the crane scaled the ten-storey building; they already had it sketched out, both on paper and on the wall itself.  Here is the blown-up image of Madonna's unfinished hand.  The sketchmarks were clearly visible.  So next time you make a chart, try making sketches first!
