Apr 25, 2008

Knit-picking

Nyt_tuitionfree2

In celebrating the recent trend among "elite" colleges of lowering the cost of education, the Times printed this chart, the top part of which is shown here.

The three colors represent different levels of aid.  Blue means "grants replace loans"; red means "free tuition"; yellow means "parents pay nothing".  The colleges are grouped by the minimum qualifying income for the blue category.

The whole effect is of a knit.  We shall call this the "knit chart".

I believe a simple data table will do the job nicely.  If any reader has other ideas, please show us your work!

A few points to note about the original:

  • Ordering by the minimum income to qualify for "grants replace loans" is arbitrary, as is alphabetizing colleges within each group
  • Qualifying "at any income level" should be shown on the left of "$40,000 or below" rather than to the right of $100,000.  The current order is such that qualifying level increases with income from left to right, except from $100,000 to "any income", where it falls off a cliff.
  • Qualifying at any income level is better shown as a separate column on the right disconnected from the income scale.  The current configuration devalues the effort spent in making a proper income scale.
  • Too many lines of equal length, and too few yellow and red lines to make the knit chart effective
  • Should the graph cater to parents interested in seeing what aid they qualify for given their income level?  Or should the graph highlight the breadth of aid available at individual colleges?

Reference: "The (Yes) Low Cost of Higher Ed", New York Times, April 20 2008.

PS. The original point about the "any income level" was incorrect as pointed out by Chris below.  I have replaced that with a different issue.

PPS. Matias' version (see comments) is a superb demonstration of the power of data tables, well-applied.   It is clean and simple, and addresses both the questions pointed out in the last bullet point.  The only thing sacrificed was the visual representation of the relative size of the income requirements, which I agree is the least valuable part of the original.  As usual, many thanks to our readers for coming up with great ideas!

Redo_tuitionfree2

Apr 14, 2008

Progress and retrogress

Joran E. pointed to this "icky" chart he found on Clive Crook's blog at the Atlantic.
Orig_tertiary

He ordered a "junkchart treatment", so here it comes.

First we wanted to process the triangles, dots and squares to make sense of this data.  We noted that the data came from a single year (2005) so the chart did not trace the development of the education sector over time.  But wait, it used a different route to get at the same idea.  The author compared different generations within each country to see if more and more citizens took university degrees.  So each vertical "arrow" was kind of a historical record of different generations within a country.  Under this criterion, Korea and Japan had come a long way while the US and China stagnated.

The chart is quite impossible to read as designed.  There is little reason to sort by the 25-34-year-old proportion when the message concerns improvement over generations.  Besides, what about countries that apparently retrogressed (like Russia and Germany)?

Redo_tertiary2

For this data, I returned to my favored bumps chart.  Here is version one.  There are two ways to read this chart.  Across countries, we note that most of the European states (blue) had similar profiles showing roughly a constant rate of growth.  The Asian duo of Japan and Korea (brown) had the most marked growth.  In North America (black), Canada has diverged from the US since the 35-44 generation.

Alternatively, we can focus on the change generation-over-generation.  From 55-64 to 45-54, almost all countries in this sample (except Japan) grew at the same rate.  Then between 45-54 and 35-44, the two Asian countries clearly set the pace.  The change between 35-44 and 25-34 is most interesting: Korea has not slowed; Japan has slowed a little but still grew as fast as Canada.  A trio of European countries (Spain, Ireland, France) outpaced their neighbors.

Below I show version two.  This one combines the bumps chart with small multiples.  North America, Europe and Asia/Australia are now in separate charts.  This removes clutter.

Redo_tertiary

 

Apr 12, 2008

Hanging tough

Orig_literacy

Reader Nick B. sent in this example calling it "interesting".  The chart tells a compelling story once we figure out what it is.  Grasping the tree structure is key.

It illustrates the important idea that averaging sometimes masks variations in the data.  For example, while the province of Guerrero scored 78% on literacy, the municipalities within Guerrero had scores ranging from 28% to 90%.

It also shows that the gender gap was larger in the less literate Metlatonoc municipality than in the more literate Cuautitian.

In addition, it tells us that while Mexico on average measured very well on literacy, subpopulations within Mexico spanned the world's best and worst (from about Mali's level to Italy's).

While I find this chart adequate, the pieces hanging off each other did not seem ideal, especially the two overlapping municipality pieces which were placed next to each other.  However, it is tough to come up with an alternative.  Here's one attempt; the changes are mild.

Redo_literacy_2

I prefer the horizontal orientation.

The branches are emphasized (as opposed to the "T" junction) because that's a key part of the story.

The national level, especially the span between Mali and Italy, is de-emphasized; I treat it as gridlines.

Instead of placing the overlapping pieces next to each other, I let the ranges literally overlap, which serves to stress this feature.


 

 

Jan 04, 2008

Maps and dots

Happy New Year

The cosmos of university ranking got more interesting recently with the advent of the "brain map" by Wired magazine.  This new league table counts the total number of winners of five prestigious international prizes (Nobel, Fields, Lasker, Turing, Gairdner) in the past 20 years (up to 2007); and the researcher found that almost all winners were affiliated with American institutions.
Wired_brainmap
As discussed before, the map is a difficult graphical object; it acts like a controlling boss.  In this brain map, the concentration of institutions in the North American land mass causes over-crowding, forcing the designer to insert guiding lines drawing our attention in myriad directions.  These lines scatter the data asunder, interfering with the primary activity of comparing universities.

Wired_dots

The chain of dots object cannot stand by itself without an implicit structure (e.g. rows of 10).  This limitation was apparent in the hits and misses chart as well.  Sticking fat fingers on paper to count dots is frustrating.  Simple bars allow readers to compare relative strength with less effort.

Redo_brainmap_2

In the junkchart version, we ditched the map construct completely, retaining only the east-west axis.  [For lack of space (and time), I omitted the US East Coast and Washington-St. Louis.]  With this small multiples presentation, one can better contrast institutions.

To help comprehend the row structure, I inserted thin strikes to indicate zero awards.  A limitation of the ranking method is also exposed: UC-SF has a strong medical school and, not surprisingly, it has received a fair share of Nobel (medicine), Lasker and Gairdner prizes; but at another institution, zero Lasker and Gairdner prizes could be due to a less competitive medical school or to having none at all!


Reference: "Mapping Who's Winning the Most Prestigious Prizes in Science and Technology", Wired magazine, Nov 2007.

Nov 25, 2007

A dangerous equation

Graduation rates at 47 new small public high schools that have opened since 2002 are substantially higher than the citywide average, an indication that the Bloomberg administration’s decision to break up many large failing high schools has achieved some early success.

Most of the schools have made considerable advances over the low-performing large high schools they replaced. Eight schools out of the 47 small schools graduated more than 90 percent of their students.

Nyt_smallsch

This graphic included in the NYT article lent support to the "small schools movement".  In particular, note the last sentence of the above quotation: it incorporates the oft-used device of subgroup support of a hypothesis, in this case, the subgroup of eight top-performing schools.

Such analysis is "dangerous", according to Howard Wainer, who discusses this and other examples of misapplication in a recent article in American Scientist, entitled "The Most Dangerous Equation".  He alleged that billions have been wasted in the pursuit of small schools.

The issue concerns sample size.  Dr. Wainer and associates analyzed math scores from Pennsylvania public schools.

Wainer_mathscores

Average scores for smaller schools are based on a smaller number of students, and are therefore less stable (more variable).  More variability means more extremes.  Thus, by chance alone, we expect to find more smaller schools among the top performers.  Similarly, by chance alone, we also expect to find more smaller schools among the worst performers.

The scatter plot lays out their argument. Focusing only on the top performers (blue dots), one might conclude that smaller schools do better.  However, when the bottom performers (green) are also considered, the story no longer holds.  Indeed, the regression line is essentially flat, indicating that scores are not correlated with school size.
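The chance argument can be checked with a small simulation.  What follows is only a sketch under stated assumptions, not Dr. Wainer's analysis: the school sizes (50 and 1,000 students), the score distribution (mean 500, standard deviation 100) and every name in the code are hypothetical.  Every student draws from the same distribution, so school size has no real effect on performance.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

def school_mean(n_students, mu=500.0, sigma=100.0):
    """Average score of one simulated school (hypothetical score distribution)."""
    return statistics.fmean(random.gauss(mu, sigma) for _ in range(n_students))

# 500 small schools and 500 large schools, all drawn from the SAME distribution.
small = [("small", school_mean(50)) for _ in range(500)]
large = [("large", school_mean(1000)) for _ in range(500)]

# Rank all schools by average score and look at both extremes.
schools = sorted(small + large, key=lambda s: s[1])
k = 50
bottom, top = schools[:k], schools[-k:]

small_in_top = sum(1 for size, _ in top if size == "small")
small_in_bottom = sum(1 for size, _ in bottom if size == "small")

print(f"small schools among the top {k}: {small_in_top}")
print(f"small schools among the bottom {k}: {small_in_bottom}")
```

Small schools crowd both ends of the ranking even though, by construction, they are no better and no worse than large ones.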

This is all nicely explained via the standard error formula (De Moivre's equation) in Dr. Wainer's article.  Here is a NYT article from the mid 1990s describing this same phenomenon.
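For reference, the standard error formula can be written as below.  The symbols are notation chosen here, not necessarily Dr. Wainer's: \sigma is the standard deviation of individual student scores, n is the number of students in a school, and \sigma_{\bar{x}} is the standard deviation of the school's average score.

```latex
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
```

Because n sits under a square root, a school with one quarter as many students has an average that is twice as variable.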

File this as another comparability problem.  Because estimates based on smaller samples are less reliable, one must take extra care when comparing small samples to large samples.

Dr. Wainer is publishing a new book next year, called "The Second Watch: navigating the uncertain world".  I'm eagerly looking forward to it.  His previous books Graphic Discovery and Visual Revelations are both part of the Junk Charts collection.

Sources: "The Most Dangerous Equation", American Scientist, November 2007; "Small Schools Are Ahead in Graduation", New York Times, June 30 2007.


P.S. Referring back to the NYT chart above, one might wonder at the impossible feat of raising graduation rates across the board simply by breaking up large schools into smaller ones.  This topic was taken up here, here and here.  When evaluating the "small schools" policy, it is a mistake to discuss only the performance of small schools; any responsible analysis must look at improvement over all schools.  Otherwise, it's a simple matter of letting small schools skim off the cream from larger schools.

 

Aug 12, 2007

Non-elites

From Mikhail Simkin comes some intriguing analysis of "experts"; in this line of research, experts are compared to the "general public" and often "proved" to be shenanigans. Stock pickers don't do better than apes; economists don't do better than Big Macs; you get the idea.  In a new twist, Simkin puts twelve images of modern art on his website, and asks visitors to distinguish between those by grand masters and those "ridiculous fakes" produced by him apparently on a computer.

Since conventional wisdom says elite universities provide better education, Simkin attempted to find out if there is a difference between "elites" and "the crowd" in their ability to recognize modern art. (Elites, to him, meant the Ivy League and Oxbridge.)  The following pair of histograms clinched his point:

we see that there is not much difference between the elite and the crowd.

Simkin_fakeart


Since the shapes of the histograms are similar, one might be inclined to agree with the statement.  That is, until one notes the wildly different scales, used because only 143 of the 56,020 quiz-takers could be identified as "elites".

The shapes are clarified if we use a relative scale (percentages) rather than an absolute scale.  Further, the difference is more easily seen when cumulative percentages are plotted.  In other words, we are interested in comparing the proportion of respondents who score at most X points out of 12.

Redo_fakeart

Two features are worth noting:

  • A gap opens up between 4 and 7: specifically, 40% of "non-elites" scored 7 points or below while only 25% of "elites" scored 7 points or below.
  • The curves criss-cross around 11 to 12: this shows that "non-elites" were more likely to have perfect scores (although this difference is small).  Perhaps museum directors don't have .edu addresses.
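The move from raw counts to cumulative percentages can be sketched as follows.  The counts are made up for illustration and are NOT Simkin's data; the function name `cumulative_percent` is likewise hypothetical.  Each output value is the percentage of a group scoring X points or below, the quantity quoted in the bullets above.

```python
from itertools import accumulate

def cumulative_percent(counts):
    """Percent of respondents scoring X or below, for X = 0, 1, ..., len(counts) - 1."""
    total = sum(counts)
    return [100.0 * running / total for running in accumulate(counts)]

# Hypothetical counts of respondents by score (0 to 12 correct); not the real data.
elite_counts = [0, 0, 1, 2, 4, 8, 12, 18, 25, 30, 25, 15, 3]
crowd_counts = [50, 150, 400, 900, 1800, 3200, 5000, 6500, 7000, 6000, 4000, 2000, 600]

elite_cum = cumulative_percent(elite_counts)
crowd_cum = cumulative_percent(crowd_counts)

# Putting both groups on a 0-100 scale removes the difference in group size.
for score, (e, c) in enumerate(zip(elite_cum, crowd_cum)):
    print(f"{score:2d} or below: elite {e:5.1f}%   non-elite {c:5.1f}%")
```

Subtracting the value at X - 1 from 100 gives the share scoring at least X, should that direction be preferred.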

Notice that I plotted Elite vs Non-Elite rather than Elite vs All Respondents.  Using "All Respondents" seems innocuous, and in this case there is no noticeable difference since Elites were a tiny proportion.  But when the test group accounts for a significant proportion of the total, the value for "All Respondents" will be influenced by that for the test group.  As a general rule, compare A to not A.
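A little arithmetic shows how much the choice of baseline can matter.  In this sketch the group means (8 and 7 points) are hypothetical, as is the function name `gap_vs_all`; only the 143-of-56,020 share comes from the quiz.

```python
def gap_vs_all(mean_a, mean_not_a, share_a):
    """Gap between group A and the all-respondents mean, which itself includes A."""
    mean_all = share_a * mean_a + (1.0 - share_a) * mean_not_a
    return mean_a - mean_all

mean_elite, mean_crowd = 8.0, 7.0  # hypothetical average scores out of 12
true_gap = mean_elite - mean_crowd  # the A vs not-A comparison

for share in (143 / 56020, 0.25, 0.50):
    shrunk = gap_vs_all(mean_elite, mean_crowd, share)
    print(f"share of A = {share:.3f}: A vs not-A gap = {true_gap:.2f}, "
          f"A vs All gap = {shrunk:.2f}")
```

The gap against "All Respondents" equals the true gap multiplied by one minus A's share of the total, so it is nearly untouched when A is tiny and cut in half when A makes up half of the respondents.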

Simkin's exercise raises many statistical issues of design, which we won't discuss here.

Source: "Properly Prescribed" (via RSS Significance)

Jul 09, 2007

Adulterated education

A good teacher makes a great difference.  Reader Richard M drove this point home when he sent in a junk chart posing as educational material. The offending graphic is used by BBC's Skillswise website to teach "Handling data: Graphs and Charts".  Skillswise is an otherwise laudable effort to help adults "improve their basic skills in reading, writing and maths".

Skillswise

Even for pros, each question is a challenge.  Question 7 really requires a new pair of glasses.

The entire worksheet is located here.  The use of patterns for shading is especially disconcerting.  The graphic also lacks self-sufficiency as we have trouble comparing countries without referencing the underlying data.  As we discussed before, a good graphic is one in which graphical objects (bars, pies, dots, etc.) illuminate the underlying data; when all the data must be printed next to the objects, the graphic is most likely redundant.

Source: BBC Skillswise website.
 

Jan 28, 2007

Is it random?

A puzzle (from Laurie Snell, via the Gelman blog): are the tiles on this wall random or not? How does one prove something's random? Or that something's not random?

Dartmouthtiles
On this blog, we have addressed this question before when we discussed lottery numbers and suicides.

 

Reference: Gelman's blog
