
Transgender trends

One of the many gratifications of blogging is to connect with others who have similar interests; so it has been fantastic to receive user submissions (though admittedly I don't check my inbox frequently enough).  The thoughtfulness of these nominations continues to impress me.

Evan sent in 254 charts he created after looking at the post on baby names.  An example is shown on the right.

He is particularly interested in the question of names that are given to both males and females. 

For example, the bottom chart shows that Jordan is primarily a male name, and saw a period of growth followed by decline, although the decline has been more severe on the male side than the female side. 

It's a nice touch to label the most recent year.  I'd also label the values for the most recent year on the axes.

Evan also offers the following solution to the scaling problem we identified in the original WSJ chart:

My solution was just to put two charts on each chart. One at a fixed scale for every chart to give a sense of size and one at a variable scale to better show the shape of the plot.

In other words, for less popular names, the top chart would look much more compressed.

There are many more charts to sift through on his site.  Evan welcomes suggestions.


Noisy subways

This NYC subway report is impossible to read.
Nyt_subwayreport

However, it is very difficult to find a good way to show the information.  In fact, the data contain very little information.  Curiously, the ratings are so dispersed that each line is graded high on some categories and low on others.  Here's one view of it:

Redo_subwayreport

I have grouped the subway lines together (A/C/E, 4/5/6, etc.).  The metrics are plotted left to right in the same order as in the original.  Is it all noise and no signal?

(I just realized the vertical axis is reversed: best ratings are at the bottom, worst ratings at the top.  Doesn't matter anyway since I can't see any patterns.)

Source: "No. 1 Train is Rated Highest by Commuter Advocates", New York Times, July 24 2007.

PS. Two contributions from readers.  Still looking for insight from this data...

Trains789fg5_2 Trainspotmatrix_2



Exception to the rule

It's pretty hard to decree hard-and-fast rules for graphical design; every rule seems to admit its exception.  This reinforces Tufte's contribution as he has successfully organized the rules in his collection of books.

Dustin J sent in this chart from the Economist.  At first glance, it looks ugly and overly complex.

Econ_petrol

Dustin commented:

Steven Few says not to use stacked bar charts because you cannot compare individual values very easily and as a rule I avoid stacked bars with more than six or seven divisions. What do you think of this stacked bar--I think it is quite effective in telling the story.

On this blog, I have also re-done some stacked bar charts but this one is truly an exception to the rule.  This one works because it is not about the individual components; it shows that the US consumes more than all those countries combined.

If only it had a proper caption!  The Economist is uncharacteristically detached here: "Petrol consumption per day", "Litres bn, 2003".  How about "Goliath v. Davids"?  "US v. the World"? "Dream Team USA"?

It'd help if they toned down the colors; also, by simply annotating the total litres for the US and the total for the other countries, they would have made a clearer point without using gridlines.  But these are minor glitches in an otherwise effective chart.

Source: Economist, July 2007.


Mid-week entertainment: dogma

This chart from a Wall Street Journal editorial has been making the rounds lately, being ridiculed left and right.  A number of you have been leaving comments here so I'm putting it front and center as our light entertainment for the week.

The chart is being used to justify an economic concept called the "Laffer Curve", which claims that lowering tax rates can increase total tax receipts (for example, because fewer people will cheat the government).  As far as I know, it is dogma, and has never been proven empirically.

I also agree with Prof. Gelman's skepticism about using countries as experimental units to inform domestic policy.

Fire away!



Further reading:

Junk Chart readers

Economist's View
Tufte blog
Gelman blog


And more:

Cosmic Variance
Brad DeLong


Gauging the water level

This set of charts covered the back page of one of the New York Times' sections this weekend.

Regular readers will share my enthusiasm for the top chart.  It makes a clear, cogent case to support the article's thesis concerning the rise of bottled water.  Various renditions of this type of chart have appeared here, for example.

Specifically, the smart use of color to cluster the line objects helps interpret the trends.  Blue sets out the two primary interests.  (It's a mystery to me why the gray lines were separated into darker and lighter hues.)

The twenty-year horizon used is another nice touch. I'd remove the gridlines although they aren't too distracting here.

Sadly, the second graphic does not meet the high standard of the first.  The biggest problem concerns the red rectangle, purportedly showing how much of the bottled water was imported.  The choice of differently-sized bottles as objects makes it impossible to gauge what proportion of the total was imported.  If the rectangle were placed over 1-litre bottles instead, it would look smaller.

Source: "A Battle Between the Bottle and the Faucet", New York Times, July 15, 2007.


More prevalent versus more likely

Aleks pointed to an interesting Business Week chart used to explain what people in different age groups are doing on-line.  This is a pretty chart that does an admirable job with a difficult data set.

The key to this chart, unfortunately missing, is that the percentages must be read as vertical columns to make sense.  So the top left square says 34% of "Young Teens" who answered the survey said they create web pages on-line.  In addition, the total of each column can be much more than 100% because multiple responses were allowed.

Realizing the above, we should interpret the bottom (grey) row as saying: "Older boomers" and "seniors" are more likely to be "Inactives" than younger people.  A tempting interpretation is: "Inactives" are more likely to be "seniors" and "older boomers".  But this is wrong because the chart hides the age distribution.  While 70% of "Seniors" are inactive, "Seniors" may represent a small proportion of the population, and thus they may not account for a large proportion of "Inactives".  This is the difference between prevalence and incidence rate.  (Another way to grasp this is to add the percentages across a row and try and fail to understand what the row sum could mean.)
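To make the reversal concrete, here is a small sketch in Python.  Only the 70% inactive rate for "Seniors" comes from the chart; the population shares and the 40% rate for everyone else are made-up numbers, since the chart does not publish the age distribution.

```python
# Hypothetical inputs, for illustration only. The chart does not give the
# age distribution; only the 70% inactive rate for Seniors is from the chart.
groups = {
    # name: (assumed share of population, share of that group who are inactive)
    "Seniors": (0.10, 0.70),
    "Everyone else": (0.90, 0.40),
}


def share_of_inactives(groups):
    """Turn 'percent of each group that is inactive' into
    'percent of Inactives that belong to each group' (Bayes' rule)."""
    # Fraction of the whole population that is both in the group and inactive.
    joint = {name: share * rate for name, (share, rate) in groups.items()}
    total_inactive = sum(joint.values())
    return {name: value / total_inactive for name, value in joint.items()}


result = share_of_inactives(groups)
for name, p in result.items():
    print(f"{name}: {p:.1%} of Inactives")
```

Under these assumed shares, "Seniors" are the group most likely to be inactive, yet they make up only about 16% of all "Inactives".  Change the assumed population shares and the answer changes with them, which is why the chart alone cannot support the second reading.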

The construct of the square grids is less damaging than it seems.  In effect, the data has been rescaled by dividing by 10.  The reader is then forced to apply "rounding".  If you are someone who sees $19.95 as $19, then you'd round down the partial rows.  If you see $19.95 as $20, you'd round up the partial rows.  So the designer has pushed you to think in terms of whole numbers between 0 and 10, in other words, in units of 10%, rather than units of 1% or, horror of horrors, 0.1% or at some other unrealistic precision.
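The two ways of reading a partial row can be sketched in a few lines of Python.  This assumes each grid has ten squares per row with one square per percentage point, which is my reading of the chart rather than anything the designer states; `grid_reading` is a name made up for this illustration.

```python
def grid_reading(pct, round_up):
    """Return the percentage a reader takes away from a square grid,
    assuming 10 squares per row and 1 square per percentage point."""
    full_rows, partial = divmod(pct, 10)
    # A "$19.95 is $20" reader counts a partly filled row as a full row;
    # a "$19.95 is $19" reader ignores it.
    if partial and round_up:
        full_rows += 1
    return full_rows * 10


# 34% is the "Young Teens" figure quoted from the chart.
print(grid_reading(34, round_up=False))  # 30
print(grid_reading(34, round_up=True))   # 40
```

Either way the reader lands on a multiple of 10%, which is the level of precision the designer is nudging us toward.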

Here's another example where the profile chart shines.  Because the percentages don't sum up to 100%, the other alternatives like stacked bar charts and "Marimekkos"/mosaic charts don't work.  (Prior discussion of this issue here.)

Redo_onlinedata

This version gives a column view of the data, with the lines linking the percentages of each age group performing on-line activities.  The profiles nicely cluster into three groups: the younger people are more likely to say they are "joiners", "spectators" or "creators" but less likely to be "inactives".  We also see that the likelihood of being "Collectors" has little to do with age.

Source: "Inside Innovation -- In Data", Business Week, June 11 2007.



Adulterated education

A good teacher makes a great difference.  Reader Richard M drove this point home when he sent in a junk chart posing as educational material. The offending graphic is used by BBC's Skillswise website to teach "Handling data: Graphs and Charts".  Skillswise is an otherwise laudable effort to help adults "improve their basic skills in reading, writing and maths".

Even for pros, each question is a challenge.  Question 7 really requires a new pair of glasses.

The entire worksheet is located here.  The use of patterns for shading is especially disconcerting.  The graphic also lacks self-sufficiency as we have trouble comparing countries without referencing the underlying data.  As we discussed before, a good graphic is one in which graphical objects (bars, pies, dots, etc.) illuminate the underlying data; when all the data must be printed next to the objects, the graphic is most likely redundant.

Source: BBC Skillswise website.