Apr 25, 2008

Knit-picking

Nyt_tuitionfree2

In celebrating the recent trend among "elite" colleges toward lowering the cost of education, the Times printed this chart, the top part of which is shown here.

The three colors represent different levels of aid.  Blue means "grants replace loans"; red means "free tuition"; yellow means "parents pay nothing".  The colleges are grouped by the minimum qualifying income for the blue category.

The whole effect is of a knit.  We shall call this the "knit chart".

I believe a simple data table will do the job nicely.  If any reader has other ideas, please show us your work!

A few points to note about the original:

  • Ordering by the minimum income to qualify for "grants replace loans" is arbitrary, as is alphabetizing colleges within each group
  • Qualifying "at any income level" should be shown on the left of "$40,000 or below" rather than to the right of $100,000.  The current order is such that qualifying level increases with income from left to right, except from $100,000 to "any income", where it falls off a cliff.
  • Qualifying at any income level is better shown as a separate column on the right disconnected from the income scale.  The current configuration devalues the effort spent in making a proper income scale.
  • Too many lines of equal length, and too few yellow and red lines to make the knit chart effective
  • Should the graph cater to parents interested in seeing what aid they qualify for given their income level?  Or should the graph highlight the breadth of aid available at individual colleges?

Reference: "The (Yes) Low Cost of Higher Ed", New York Times, April 20 2008.

PS. The original point about the "any income level" was incorrect as pointed out by Chris below.  I have replaced that with a different issue.

PPS. Matias' version (see comments) is a superb demonstration of the power of data tables, well-applied.  It is clean and simple, and addresses both questions raised in the last bullet point.  The only thing sacrificed was the visual representation of the relative size of the income requirements, which I agree is the least valuable part of the original.  As usual, many thanks to our readers for coming up with great ideas!

Redo_tuitionfree2

Apr 19, 2008

Cram it like Koby

You have to gradually build up your gut by eating larger and larger amounts of food, and then be sure to work it all off so body fat doesn't put a squeeze on the expansion of your stomach in competition  -- Takeru Kobayashi, six-time champion of the Coney Island hot dog eating contest

Kobayashi is a phenom.  He can stuff 60 hot dogs or 100 burgers in ten or twelve minutes and show no consequences.  Ordinary people can't hope to emulate these feats.

Junk Charts sees Kobayashi as a hero; an anti-hero really.  We are ordinary people; we can't hope to cram it like Koby.  A message we keep repeating here is: too much data sinks a chart.

Econ_anglosaxon

Not long after this chart showed up in the Economist, several readers urged us to take a look.  It's a well-nourished chart indeed, one to challenge Kobayashi, but for all that it contains, the reader has to try very hard to find insights.  Small wonder, what with the multiple colors, iron-fisted gridlines, above-and-below boxes, dotted and solid lines, and a legend with nine pieces split across two spots.  Besides, the U.S. boxes grab all the attention by virtue of being wider (the country being more partisan).

The key to unraveling this chart is to identify the relevant comparisons:

  • UK average vs US average
  • UK left vs US left
  • UK right vs US right
  • UK independent vs US independent

And then for the gluttonous:

  • UK right vs US left
  • UK left vs independent vs right
  • US left vs independent vs right

In the junkchart version, we address these comparisons sequentially.

Redo_anglosaxon1a
(Apologies for the tiny font.)

We are again using a small multiples approach that places four comparisons next to each other: average, left, independent, right.  Consistently, the British are to the left of the Americans.  The only places where the two cultures meet are where liberals agree on "ideology" and "military action".

Also note that we use a symmetric horizontal scale centered at 0.  There are too many charts out there where the center is not at the center!
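For readers who build charts in code, here is a minimal sketch of how one might compute axis limits that keep zero at the center.  The function name `symmetric_limits`, the `pad` parameter and the sample scores are all assumptions made up for illustration; none of them come from the Economist chart.

```python
def symmetric_limits(values, pad=0.05):
    """Return (lo, hi) axis limits centered at zero that cover every value.

    `pad` widens the range by a fraction so points do not sit on the edge.
    """
    values = list(values)
    if not values:
        raise ValueError("values must be non-empty")
    # The largest distance from zero decides the half-width of the axis.
    extent = max(abs(v) for v in values) * (1 + pad)
    return (-extent, extent)


# Hypothetical scores, NOT taken from the original chart.
scores = [-0.8, 0.3, 1.5]
lo, hi = symmetric_limits(scores, pad=0.0)
print(lo, hi)  # -1.5 1.5
```

The limits depend only on the largest absolute value, so the left and right halves of the axis always have the same width regardless of which side the data favors.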

A similar presentation addresses the other three comparisons.  Democrats in the U.S. are miles to the right of Tories in terms of "religion".  In the UK, Labour and Tories are not much different except on "ideology".  In the US, Independents lean closer to Democrats.

Redo_anglosaxon2a

Joining the lines (I hear the grumbles) helps bring out the gap between the groups being compared.  Without lines, the chart would look like this.

Redo_anglosaxon3a

It is often hard to keep track of which dot is which as they trade order from issue to issue.

PS. Does anyone know what is being measured on the horizontal axis?  The original graph mysteriously stated "respondents' views".


References: 

Eric Talmadge: "Pigout champion Kobayashi limbers up for hot dog gold" June 25, 2004

"Anglo-Saxon Attitudes", Economist, Mar 27 2008.

Mar 30, 2008

Small multiples re-imagineered

Nyt_disney

This chart gave me trouble.  I kept staring at it, staring.  Searching for the legend.  What could the several lines, in different colors, represent?  Take a look yourself.




Well, it turns out all three graphs were duplicates.  A different line was given dark blue to highlight a particular amusement park.

I have not seen this tactic used before.  This is like a small multiples concept except that every chart contains the same data.  Is it better than having just one chart?

Reference: "Will Disney Keep Us Amused?", New York Times, Feb 10 2008.




PS. [4/6/2008]  Here are two alternative charts contributed by our readers.  See comments below.

Derek suggested using sparklines:

Redo_parks1

Zuil reverted to basics:

Redo_parks2

Mar 08, 2008

Chart cleanup

Anna E. submitted this great example from Yahoo! Green.  A well-meaning chart but stuffed with redundancy.
Yahoo_bostongreen

Much appears to be going on, and yet the entire chart contains only 15 data points: Boston's ranks in each of 15 categories.  The bar lengths convey the same information as the data labels.  The legend provides a catchy name for each level of rank (0-10 = "leader"; 10-20 = "advances"; etc.).  The colors merely reiterate the catchy titles.  Similarly, the colored squares repeat the information in the bars.

In the name of green, we cleaned up this chart:

Redo_bostongreen

As a standalone graph, the categories should be ordered by Boston's ranks.  Here, we assume that cross-referencing cities is needed so we leave the order unchanged.


Feb 03, 2008

Redundancy

Nick B., who occasionally writes about statistical graphics, found some classic chart junk from a Canadian report on the Afghan army.  Here's one example, together with the junkchart version.

Redoafghan_2

Redundancy is an enemy of good graphics, and incongruous redundancy is worse.  Here, troop level is variously described as "total force size", "strength" and "army growth"; the chart on the right uses only the army concept.  The data labels ("47000 Strength"), the axis labels ("50000 Total Force Size"), and the gridlines all germinate from the five grand data points underlying the entire chart!

Another distorting feature is the use of different-sized time intervals, which we space out appropriately on the right chart.

Ultimately, the key message should be growth in the army size, not the absolute number of troops.  The slopes of the line segments encode this information.  Alternatively, a data table can be rather powerful for simple data like this:

Redoafghan2

By what is called the "end state", there would be 70% more troops than those as of December 2007.

 


Jan 10, 2008

Football rankings 1

The Times' sports pages made wise use of graphics in a series of NFL articles recently.  Here is a rank plot (below left) comparing Jaguars quarterback David Garrard to seven other quarterbacks who started the weekend of January 5.

Nyt_garrard

Simple and effective, this chart does not fuss around in showing us where Garrard ranks relative to the others. 

Redo_garrard

The junkchart revision (below right) plays with a different scale: the spacing between the tick marks represents proportional differences in the underlying metric.  This gives us a little more: for example, Garrard's second rank in completion percentage is less remarkable than first thought, as he essentially tied with the 3rd and 4th best while the top six were bunched between 60 and 65 percent.

But Garrard's touchdown-to-interception ratio stands out, as the next best quarterback attained only about half his ratio.  (Todd Collins, who had not thrown an interception until that time, was omitted; he also had started only four games.)
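The difference between the two scales can be sketched in a few lines.  This is only an illustration under stated assumptions: the function names and the completion percentages are hypothetical and are not the ESPN figures behind either chart.

```python
def rank_positions(stats):
    """Evenly spaced positions by rank: best at 1.0, worst at 0.0."""
    if len(stats) < 2:
        raise ValueError("need at least two entries")
    ranked = sorted(stats, key=stats.get, reverse=True)
    n = len(ranked)
    return {name: (n - 1 - i) / (n - 1) for i, name in enumerate(ranked)}


def proportional_positions(stats):
    """Positions proportional to the metric: max at 1.0, min at 0.0."""
    lo, hi = min(stats.values()), max(stats.values())
    if hi == lo:
        raise ValueError("all values are equal; nothing to scale")
    return {name: (value - lo) / (hi - lo) for name, value in stats.items()}


# Hypothetical completion percentages, NOT the real quarterback data.
completion = {"QB A": 70.0, "QB B": 64.0, "QB C": 63.5, "QB D": 63.0, "QB E": 60.0}

by_rank = rank_positions(completion)
by_value = proportional_positions(completion)
for name in completion:
    print(name, round(by_rank[name], 2), round(by_value[name], 2))
```

On the rank scale, QB B sits three quarters of the way up, well clear of QB C and QB D.  On the proportional scale the three are nearly on top of each other, which is the bunching a rank plot hides.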


References: "Two Dreams (One Big, One Tiny) Come True", New York Times, Jan 4 2008; ESPN statistics.

Dec 09, 2007

Lacking buzz

Nielsen, they of the ratings, is roughing it in the information age.  When they announced on-line tracking tools, Wired quipped: "It's looking like online video policing companies will have to make room for another deputy."  Last year, cable companies revolted over a service measuring the effectiveness of commercials.

Via the Data Mining blog, I learnt about yet another new on-line offering, called "Hey! Nielsen" for obscure reasons.  (Perhaps Hey! Nielsen is the new Yahoo! !)

The site is an enigma wrapped in a mystery.  The official description says:

Hey! Nielsen is the place to make a name for yourself while trading opinions on TV, movies, music, personalities, web sites and more.

How does one "trade" opinions?

According to the FAQ, the "Hey! Nielsen" score, the cornerstone of the site, is:

a real-time indicator of a topic's impact and value and you play a major role. As the site evolves and users submit their opinions and commentary, the score will rise or fall based on a number of factors including, but not limited to, user opinions, news coverage, and raw data from our sister sites Billboard.com, HollywoodReporter.com, and BlogPulse.com.

Sounds like a product aimed at marketers to help them track public opinion but offering little control over sampling. 

The "Hey! Nielsen" buzz chart (below) captures the change in "Hey! Nielsen" score over time.

Heynielsen

This chart is an unfortunate case of flipping background into foreground.  What grabs our attention are those hideous white circles with numbers in them.  The legend explains that these are the daily numbers of opinions on the subject, in other words, the daily sample sizes.  As they stand now (with the site still in beta), they serve to expose the low level of participation, leading to small sample sizes, and irrelevance.  But what if the site became super-popular: would the circles say 56234, 19245, 90257, etc.?  Why would visitors care about daily sample sizes anyway?  Mousing over these circles reveals text, but in most cases it is blocked by neighboring white circles.

In the meantime, the circles obscure the line which shows the trend in the "Hey! Nielsen" score over time.  This chart reminds me of that Google toy known as Google Trends.  The Googlers provide no vertical scale so the graphs are unreadable.  "Hey! Nielsen"ers provide a vertical scale -- kind of -- but the graphs are still meaningless: what does a score of 881 mean?  how about 724?  what is the maximum score?  what is the minimum?  Beware numbers without context.

The vertical axis does start from zero but has an odd spacing of tick labels. The gridlines are distracting and serve no purpose.  The orange area under the curve also makes little sense.

We look forward to seeing version 2.0.

 

Sep 23, 2007

Buffer time

As this report from the Department of Transportation makes clear, congestion on our roadways causes travellers to add "buffer time" to their planned journeys.  So, for instance, one may have to allocate 32 minutes for a trip that would have taken 20 minutes in uncongested traffic if one would like to guarantee on-time arrival.  The 12 minutes would either become time spent sitting on the road or wasted time due to arriving too early.
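The arithmetic in this example is simple enough to write down.  The sketch below uses the 32-minute and 20-minute figures from the paragraph above; the function name and the ratio of buffer to uncongested time are my own illustration, not a measure quoted from the report.

```python
def buffer_minutes(planned, uncongested):
    """Extra minutes allotted beyond the uncongested travel time."""
    if planned < uncongested:
        raise ValueError("planned time cannot be shorter than uncongested time")
    return planned - uncongested


planned, uncongested = 32, 20          # minutes, from the example above
extra = buffer_minutes(planned, uncongested)
share = extra / uncongested            # buffer relative to the uncongested trip
print(extra, share)  # 12 0.6
```

In other words, the traveller in this example budgets 60% more time than the trip would need on an empty road.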

Buffer time can be applied to graphs too.  Some graphs require readers to spend time fishing out the information.  The chart used to illustrate travel time belongs to this category. 
Dottraveltime_2

The clock analogy fails; in fact, it confuses matters as the hour hand just sits there serving no purpose.  The buffer time between staring and comprehending is too much!

Only four numbers underlie this chart: travel time when uncongested and buffer time to guarantee on-time arrival, for 1982 and 2001.  The following version gets to the point without fuss.
Redotraveltime

It shows that the travel time increased significantly even under uncongested traffic; worse, the buffer time multiplied.

Reducing buffer time is always good but some buffer time may be inevitable.  In the traffic analogy, to eliminate all buffer time would mean lots of unused capacity.  In the context of graphs, more complicated charts would require more time; the key is whether the reader is rewarded for the time spent figuring out the chart.



Source: "Traffic Congestion and Reliability", Department of Transportation.

Aug 28, 2007

Cheers

Nyt_mets07


This is an exemplary chart from the NYT Sports page.  It provides a clear, informative and exciting way to visualize how the baseball season has gone for the Mets this and last year.  It's been mostly up and not much down. 

We can observe the more subtle differences: last season was a steady rise with only two prolonged down periods; this season's curve is driven by two up periods (including right now), outside of which the record has hovered around two levels (0, +3).

Especially commendable is the judicious use of axis labels.  However, I'm not clear on how some of the labels were chosen.  For example, 14 games ahead seems to me a rather arbitrary one.

All in all, a job well done.

Source: "Not Only Yankee Fans Cheering for Week 22", New York Times, Aug 27, 2007

Jul 21, 2007

Exception to the rule

It's pretty hard to decree hard-and-fast rules for graphical design; every rule seems to admit its exception.  This reinforces Tufte's contribution as he has successfully organized the rules in his collection of books.

Dustin J sent in this chart from the Economist.  Its first impression is ugly and overly complex.

Econ_petrol

Dustin commented:

Steven Few says not to use stacked bar charts because you cannot compare individual values very easily and as a rule I avoid stacked bars with more than six or seven divisions. What do you think of this stacked bar--I think it is quite effective in telling the story.

On this blog, I have also re-done some stacked bar charts but this one is truly an exception to the rule.  The reason why this one works is that it's not about the individual components, it's showing that the US consumes more than all those countries combined. 

If only it had the proper caption!  The Economist is uncharacteristically detached here: "Petrol consumption per day", "Litres bn, 2003".  How about "Goliath v. Davids"?  "US v. the World"?  "Dream Team USA"?

It'd help if they toned down the colors; also, by simply annotating the total litres for the US and the total for the other countries, they would have made a clearer point without using gridlines.  But these are minor glitches in an otherwise effective chart.

Source: Economist, July 2007.
