Jun 12, 2009

Reading comprehension

Note: I am in the middle of a holiday and so posting will be limited.

Andrew posted a pretty chart that caught my attention.  This is the sort of sophisticated chart that rewards careful reading. 

Vouchermaps2000

Below is a guide to reading the chart:

  • It is a small multiples chart with the components arranged in two dimensions (income levels, and a race-religion hybrid category).  The top row is a summary of voters of all race-religion groups, broken out by income.  Note that there is no corresponding summary column for voters of all incomes grouped by race-religion.
  • Source of data: 2000 poll but applied to 2008 demographic patterns.  In other words, there is an underlying assumption that opinions have stayed stable within the demographic groups.
  • The chart is in fact three dimensional because each map gives us the geographical (state by state) breakdown.
  • It is useful to figure out the smallest unit of data: in this case, this is the percentage support of federal school vouchers by voters of a given race-religion-income-and-state category.
  • The color scheme is such that red represents highest support and blue lowest support, with pink and purple in the middle.
  • It's almost always better to start from the aggregate (that is, the average) and then study variations along different dimensions, and this is how the chart is arranged from top to bottom.
  • On the top row, the higher income groups tended to favor vouchers more than lower income groups, with a break point around $75k; even here, the regional differences are significant, with the northeast and southwest hotter for vouchers at all income levels.
  • As we move from row to row, we realize that the aggregate data hides many disparities.  For example, white Catholics (second row) are more likely to support vouchers regardless of income level while white non-evangelical Protestants (fourth row) are much less likely than average to support vouchers at all income levels.
  • Notice that the statistician (Andrew) has carefully defined the race-religion categories to balance between collapsing subgroups that are distinct and showing too many subgroups so as to cloud the patterns.  That is why there are many more race-religion subgroups that are not shown.  The ones shown are of special interest.  Consider the white Protestants, evangelical vs. non-evangelical (third and fourth rows).  If one were to fix the race, geography and income dimensions, and even fix half of the religion dimension, we still find the two subgroups to be on different ends of the spectrum relating to the voucher issue.  This is why the evangelical-or-not dimension has been included.
  • The white space is interesting.  Here, the issue faced by the statistician is sparse data when one gets down to multi-dimensional subgroups.  Andrew chose to ignore all the data in those sparse cells and leave them blank, which is the wise thing to do.  With so few samples, it is particularly easy to draw bad conclusions.
  • Because of the white space, we get additional information on the spatial distribution of the demographic subgroups.  The black population (at least the voters) is predominantly found in the southeast while Hispanics are in the southwest.  The subgroup of income higher than $150k is essentially all white.  Admittedly, this is a very crude read because we only have two levels (below 2% of state population and above).  Of the colored states, we cannot differentiate between densely populated and not.
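
For readers who like to see the white-space rule spelled out, here is a minimal sketch of suppressing sparse cells.  It is not Andrew's code: the 2% cutoff is read off the chart, while the function name, the data layout and all the numbers are made up for illustration.

```python
def mask_sparse_cells(support, pop_share, min_share=0.02):
    """Blank out states where the subgroup is too small to plot.

    support   -- dict: state -> estimated share supporting vouchers
    pop_share -- dict: state -> subgroup's share of the state population
    min_share -- cells below this share are left blank (white on the map)
    """
    masked = {}
    for state, value in support.items():
        if pop_share.get(state, 0.0) < min_share:
            masked[state] = None   # too few people: show nothing
        else:
            masked[state] = value
    return masked


# Hypothetical numbers for one race-religion-income panel.
support = {"NY": 0.55, "VT": 0.40, "TX": 0.62}
pop_share = {"NY": 0.08, "VT": 0.01, "TX": 0.15}
panel = mask_sparse_cells(support, pop_share)
```

In this toy panel VT falls below the cutoff, so its estimate is dropped rather than colored.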

  

Such rich graphics deserve careful reading.  Enjoy!
  

May 12, 2009

Spinning multi-color 2

Here are two more versions of the greenhouse gas chart.

The first one is a Marimekko which many would consider to be appropriate for this type of data.  It is essentially a stacked bar chart where the width of the bar is scaled to the proportion of the type of gas.  Here's what one would be looking at:

Redo_greenhouse2


Marimekkos (also called mosaic charts) share many of the problems of pie charts.  Note the need to use multi-color, the difficulty in comparing the areas of the pieces (even worse than looking at sectors), and the difficulty in comparing across categories since the pieces float in irregular space (take for example the three pink pieces).  My rule is: avoid at all costs. (Well, like the pie chart, when the data is sufficiently simple, with very few pieces and with some outliers, these could be acceptable.)


Secondly, here is a recycled junkart chart, with all white space removed from the interior.  (Thanks to Derek for the suggestion.)

Redo_greenhouse3


Depending on what the purpose of the chart is, one can decide what is the base for the proportions.  My version preserves equity between the two dimensions.  Anything else will require the designer to make a choice.  If, for example, the base is 100% for each type of gas emitted, then the reader could not derive from the same chart the proportion of each source of emission (across all types of gases).
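
To make the choice of base concrete, here is a small sketch with a two-by-two table.  The gas and source labels and the amounts are invented, and the function is only an illustration of the arithmetic, not how any of the charts above were built.

```python
def normalize(table, base="total"):
    """Convert a two-way table of amounts into proportions.

    table -- dict of dicts: table[row][col] = amount
    base  -- "total": the whole table sums to 1
             "row":   each row sums to 1
             "col":   each column sums to 1
    """
    rows = list(table)
    cols = list(next(iter(table.values())))
    row_sums = {r: sum(table[r].values()) for r in rows}
    col_sums = {c: sum(table[r][c] for r in rows) for c in cols}
    grand = sum(row_sums.values())

    out = {}
    for r in rows:
        out[r] = {}
        for c in cols:
            if base == "row":
                denom = row_sums[r]
            elif base == "col":
                denom = col_sums[c]
            elif base == "total":
                denom = grand
            else:
                raise ValueError("base must be 'total', 'row' or 'col'")
            out[r][c] = table[r][c] / denom
    return out


# Made-up amounts: rows are types of gas, columns are sources of emission.
emissions = {
    "gas A": {"source X": 30, "source Y": 20},
    "gas B": {"source X": 10, "source Y": 40},
}
by_total = normalize(emissions, "total")   # treats both dimensions alike
by_gas = normalize(emissions, "row")       # 100% within each gas

# Share of source X across all gases is recoverable only from by_total.
share_x = sum(by_total[g]["source X"] for g in by_total)
```

With the grand total as the base, summing a column gives the share of each source; once each gas is scaled to 100%, that column sum no longer means anything.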


Spinning multi-color

The New York Times has a great pointer to the Global Warming Art website.  The author Robert Rohde wants to popularize environmental science by visualization of the data.  There are many interesting charts, and the site is well worth repeated visits.

These pie charts cry out for some re-dressing:

Greenhouse_Gas_by_Sector

The pie charts, the colors, the whole works.  Most troubling is that each pie has its own sorting scheme, and because the text labels were not reproduced in the smaller pies, the reader is sent scrambling around to find the right labels.

In addition, these pie charts, as with almost every other pie chart, fail the self-sufficiency test.  Without all the data printed next to each sector, the reader is simply unable to judge the size of each sector.

Further, the aggregate data (larger pie) may not be as relevant after realizing that the smaller pies show very different patterns.  The following junkart version tries to bring out this fact by treating both dimensions (type of greenhouse gas; source of emission) equitably.

Redo_greenshouse


While I picked on this particular chart, I must say I support Robert's effort and wish him luck in this very well-intentioned project.



Apr 21, 2009

Some things never fall

It's New York City.  House prices do not fall.  Enjoy the graphic!


Nyt_houseprice

Click here for the interactive version

Reference: "For Housing Crisis, the End Probably Isn't Near", New York Times, April 21 2009.

PS. Memo to grammar teachers: When did they start using contractions in titles? 

Apr 19, 2009

Don't mess with the scale

My friend Patrick pointed out the single biggest issue with the chart below -- that the designer chose a scale that precisely undermines the message of the chart.  Undermine may be too mild a word to use here; annihilate may be more apt.

Dow2


The lines in this chart are anchored at the zero point on the time line (horizontal axis) used to indicate the bottoms of various bear markets in the Dow from 1929 to 2007.  From that anchor, time runs to the left showing the amount of time for the Dow to go from peak to bottom (the decline); time runs to the right showing the amount of time for the Dow to climb back to the prior peak (the recovery).  As the caption said, the point of the chart is "if the decline was fast, the recovery took a considerable time".

Funny thing then that the distances from the zero point are roughly comparable on the left as on the right.

This illusion resulted from some very convoluted and perplexing messing around with the horizontal scale.  First, the left-of-center scale is in months while the right-of-center is in years.  Second, the left-of-center scale has normal spacing while the right-of-center seemingly was suffering from spasms.  Take a closer look:

Dow2_right

The first five years (0-5) took up about half the scale while the next five (5-10) took maybe one-eighth.  The first year (0-1) took about as much space as the next two years (1-3).

I am not quite sure what the logic behind this is, but since the message of the chart has everything to do with the time duration, it is most unfortunate to introduce such distortions.

There is yet another "innovation" in this chart.  Notice that on the right side, the axis labels are irregular (more spasms)... 0, 1, 3, 4, 5, 10, 15, 20, 25...  This is as if the designer is posing one of those IQ questions requiring readers to figure out the next number in the sequence.  The specific time intervals selected may have meaning: note that all the lines are straightened out in between these tick marks.  Given that each line represents a different historical sequence, it is difficult to comprehend the regularity of these intervals across history.  Perhaps this will prove to be the key to unlocking the secret of this chart.  Please comment below if you are able to unravel this mystery.

Besides, the same type of "innovation" was not applied to the left side of the chart.  Here, the designer opted to throw out all the data between the peak and the bottom and straightened out all the intermediate fluctuations.

Below are two different versions of this chart, basically restoring the time scale to the normal, equally spaced, symmetric appearance.  The top one used monthly Dow returns where the volatility obstructed our understanding of the trends, requiring the use of color to differentiate the lines.  In the next version, I used R to generate the loess estimates (a type of smoothing) and the trends became clearer.  (There was a prior discussion of loess on Junk Charts here.)
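
For the curious, the idea behind loess can be sketched in a few lines.  What follows is not the R routine used for the chart: it is a simplified local linear fit with tricube weights, and the function name, the span value and the toy series are all assumptions made for illustration.

```python
def loess_smooth(x, y, span=0.5):
    """Simplified loess: local linear fit with tricube weights at each x."""
    n = len(x)
    k = max(2, int(round(span * n)))          # neighbours used per local fit
    fitted = []
    for x0 in x:
        # The distance to the k-th nearest point sets the local bandwidth.
        dists = sorted(abs(xi - x0) for xi in x)
        h = max(dists[k - 1], 1e-12)
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]

        # Weighted least squares for y = a + b * (x - x0); a is the fit at x0.
        sw = sum(w)
        swx = sum(wi * (xi - x0) for wi, xi in zip(w, x))
        swy = sum(wi * yi for wi, yi in zip(w, y))
        swxx = sum(wi * (xi - x0) ** 2 for wi, xi in zip(w, x))
        swxy = sum(wi * (xi - x0) * yi for wi, xi, yi in zip(w, x, y))
        denom = sw * swxx - swx ** 2
        if abs(denom) < 1e-12:
            fitted.append(swy / sw)           # degenerate case: weighted mean
        else:
            fitted.append((swxx * swy - swx * swxy) / denom)
    return fitted


# Toy monthly series: a steady climb with a zigzag on top (made-up numbers).
months = list(range(24))
index = [100 + 0.5 * m + (3 if m % 2 else -3) for m in months]
trend = loess_smooth(months, index, span=0.5)
```

The smoothed values sit close to the underlying climb and the month-to-month zigzag mostly disappears, which is the effect that made the trends easier to see.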


Redo_dow2
 

Now, these pictures are very different from the original graph!

I'd be very cautious about reading into these charts anyway.  This question is not one suitable for statistical analysis.  The sample size of six is far too small.  Each recession is different in terms of causes, remedies and context.  The fact that we call them recessions does not make them comparable.  Further, it is also impossible to know at this stage if the 2007 decline has reached bottom.  The chart designer essentially assumed this to be the case but who knows?


PS. Nick Rapp, one of the designers of the chart, responds in the comments.  He has started a blog to feature the work of his graphics team at AP.  His colleague has created an interactive version.  More than anything, this post highlights an aspect of the chart that Nick and his team clearly spent a lot of time doodling over.  The concept of the chart itself is wonderful actually, if I didn't say so already; it is essentially the same chart as the oft-printed chart where the anchor point is the start of each recession, only here the anchor is the bottom of each recession.

Apr 06, 2009

Pure delight

Nyt_infantmortality

My favorite Bumps chart in the New York Times ...


For the purist, this is the original rank-based version.

With judicious use of color and background/foreground, this makes for a good story.

The color scheme here, however, is a bit bland.  Green for improvement, blue for decline and orange for USA.

Note, for example, that New Zealand and England both suffered drastic drops similar to that of the US.

It would be better to (for example) split out the large improvements and large declines, or to split out the developed world versus the developing world.

The chart was probably designed this way because the accompanying piece makes only passing reference to it, so the creator had no clear message about what to do with the data.

Interestingly, there were no ties in 1960 but quite a few ties in 2004.  I wonder why.  I'd shift the dots to the mid-point between ranks rather than move them up to the higher rank.
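
Here is a minimal sketch of the two ways of ranking tied values.  The rates are invented and the function is my own illustration, not the method used by the Times.

```python
def rank_with_ties(values, method="mid"):
    """Rank values from lowest (rank 1) upward, handling ties.

    method="min" gives tied items the best (lowest-numbered) rank they span;
    method="mid" gives them the mid-point of the ranks they span.
    """
    if method not in ("mid", "min"):
        raise ValueError("method must be 'mid' or 'min'")
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Find the end of the run of values tied with the one at pos.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # The tied items occupy ranks pos+1 through end+1.
        if method == "mid":
            r = (pos + 1 + end + 1) / 2
        else:
            r = float(pos + 1)
        for j in range(pos, end + 1):
            ranks[order[j]] = r
        pos = end + 1
    return ranks


# Hypothetical infant mortality rates for seven countries.
rates = [2.8, 3.1, 3.1, 4.0, 4.0, 4.0, 5.2]
moved_up = rank_with_ties(rates, method="min")   # ties take the higher rank
mid_point = rank_with_ties(rates, method="mid")  # ties sit between ranks
```

With the mid-point rule, the three countries tied at 4.0 all land on rank 5 instead of rank 4, so nobody gets credit for a rank they share.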

All in all, a much more engaging way to present this data than the reams of tables found in, say, the UN World Development Report.


Reference: "Vital Statistics: U.S. Still Struggling With Infant Mortality", New York Times, April 6 2009.




An art class?

Robert F. pointed us to these charts, via the Digital Design Blog.  A larger version is found here.  These look like scraps from an art class, exploring perspective and 3D.

Mtcc

These types of charts are quite prevalent in the web analytics area.  We have a long way to go in terms of producing good visualization of such data.


For even more light entertainment, click here.  (Warning: not for the easily offended or for language purists, and mildly not safe for work.)  (This is via Pete S.)

Mar 28, 2009

Knowing what one is doing

Jess B. sent in some entertainment.


Billshrink

In case the font is too small:

Billshrink2

There is a lot more here, including the author's note in the comments section.

Mar 22, 2009

Colorful maps

Bernard L. loved the recent NYT take on immigration in America.

The very pretty maps are found here.

Nyt_immigrants

An amazing amount of data is being visualized here.  Mousing over the map will pick up the specific data for each county.  There is a bar up top for discovering the evolution over time.  It would be great if there were an animation button so the map could be played out without clicking.  An animated gif would also do (similar to the disease map we featured some time ago).

Nyt_imimigrants_scale

The colors on the first map represent the origin of the top ethnic group in each county.  Within each group, the tint of the color further displays the percentage of the population that group accounts for.  The subgroups appear to be 0-2%, 2-5%, over 5%.  The last subgroup is very wide.


Not so keen on the second map with all those bubbles.  They show the number of people from each country by county.  The bubble size is proportional to population.  Every version of this map looks the same because the population is concentrated in the cities and the interior is sparsely populated, no matter what ethnic group.


Regardless, this is another laudable effort by the crew at the Times.


Reference: "Immigration Explorer", New York Times, March 10 2009.





Mar 17, 2009

The trouble with maps

Todd B. pointed us here.  These are maps that supposedly show the distribution of respondents for each answer choice in a survey exploring accents in different parts of the country.  The full set of maps for every question can be found here.  


Caramel

Amusingly, the researchers also provided a map of "all respondents".  (I won't ask how the proportions of respondents were reduced to binary output to produce the above maps.)

Caramel2
Here is Todd to lead off the discussion:

Just because you put data on a map doesn't make it effective. Check out these mapped responses that either tell us that there is no difference in dialects or fails to illustrate differences effectively.



Reference: "Dialect Survey", University of Wisconsin-Madison.
