
Slow news day

It must be a slow news day when the media spends hundreds of words discussing a chart that has yet to be unveiled. But the New York Times writer surely got it right with the opener: "Whatever you do, don't call it a pie chart."

We are being told that the government will replace the "food pyramid" with a food pie chart, although they will call it something else. This thing, which is not a pie chart, has not been revealed yet, but because chart purists have such political clout, it was thought necessary to release a trial balloon on a holiday weekend to gauge what the response might be.

I think we should reserve our judgment till we see this thing.


In any case, the focus on encouraging people to eat the right proportions of foods is wrong-headed. Firstly, it is next to impossible for anyone to keep track of the distribution of foods consumed in any given day without keeping a diary. Secondly, nutritionists know that the biggest contributor to obesity is the quantity of food being eaten. Thus, a much more effective approach is to encourage smaller portions, or to teach people when to stop eating. This method also happens to be much easier to put into practice.

Rank confusion

This chart, found in Princeton Alumni Weekly, only partially scanned here, supposedly gave reasons for "Princeton's top-rated [Ph.D.] programs" "to celebrate". My alma mater has outstanding academic departments, but it would be difficult to know from this chart!


Due to the color scheme, the numbers that jump out at you are the ones on the bright orange background, which show how many other departments are ranked equal to Princeton's in those subjects. It takes some effort to realize that the more zeroes there are in the top buckets (fading orange), the better.

The editor started with a nice idea, which is to convert raw rankings into clusters of rankings. She recognized that in this type of ranking (see a related post on my book blog here), it is meaningless to gloat about #1 versus #2 because they are probably statistically the same. For instance, in the ranking of Architecture departments (ARC), 37 schools (including Princeton) all belonged to the same cluster, judged to be a statistical tie.

One of the main reasons why this chart looks so confusing is that it fails the self-sufficiency test. It really is a disguised data table, with some colorful background and shadows; the graphical elements add nothing to the data at all. If one covered up all the data, there would be nothing left to see!

In the following rework, I emphasize the cluster structure. Each subject has three possible clusters, schools ranked above, equal to, and below Princeton. Instead of plotting raw numbers, the chart shows proportions of schools in each category. The order is roughly such that the departments with the relatively higher standing float to the top. Because a bar chart is used, the department names could be spelt out in their entirety and placed horizontally.
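To make the arithmetic behind the rework concrete, here is a minimal sketch of turning cluster counts into proportions and ordering the subjects. The subject names and counts are invented for illustration and are not the figures from the Princeton Alumni Weekly chart. The sort rule (fewest schools above first, then fewest ties) is just one plausible way to get the stronger departments to float to the top.

```python
def cluster_shares(above, tied, below):
    """Return the shares of schools ranked above, tied with, and below the reference school."""
    total = above + tied + below
    if total == 0:
        raise ValueError("no schools listed for this subject")
    return (above / total, tied / total, below / total)


# Hypothetical counts per subject: (above, tied, below)
counts = {
    "Subject A": (0, 5, 95),
    "Subject B": (10, 30, 60),
    "Subject C": (2, 8, 40),
}

shares = {subject: cluster_shares(*c) for subject, c in counts.items()}

# Order subjects so that those with the smallest share of schools above come first;
# ties on that share are broken by the smaller share of tied schools.
ordered = sorted(shares, key=lambda s: (shares[s][0], shares[s][1]))

for subject in ordered:
    above, tied, below = shares[subject]
    print(f"{subject}: above {above:.0%}, tied {tied:.0%}, below {below:.0%}")
```

Because each subject's three shares sum to 100%, subjects with different numbers of ranked schools can be compared on the same stacked bar.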


If one has access to the raw data, it would be even better to reveal the entire cluster structure. It is quite possible that the clusters above and below Princeton can be further subdivided into more clusters. This will allow readers to understand better what the cluster ranks mean.


A good question deserves good data

The last chart in the infographic on OECD education data asks another intriguing question: do countries that pay teachers more achieve better test scores?


This chart suffers from the same ill as the one previously discussed (here): the data is not suitable to address the question. It is mighty hard to see any pattern in the set of bar charts on offer. This lack of correlation can be confirmed by displaying the data in a scatter plot:

The scatter on the left presents the data as shown in the original, with a regression line drawn in that appears to indicate a positive correlation of higher spending and higher achievement.

Here, spending is measured by the ratio of primary teacher pay after 15 years of service to average GDP while achievement is indicated by the proportion of students who attain a "top" level of proficiency in any or all of the three test subjects.

But notice the solitary point sitting on the top right corner (labelled "1"). That point is Korea, which has both the highest achievement and the highest spending (by far). Korea is an outlier (known as a leverage point). The chart on the right is the same as the one on the left with Korea removed. What appears to be a moderate positive correlation vanishes. (The numbers plotted are the ranking of countries by the proportion of students attaining top proficiency, the metric on the vertical axis.)
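The effect of a single leverage point on a correlation is easy to reproduce. The sketch below uses made-up (pay ratio, percent of top students) pairs, not the OECD figures, with one extreme point standing in for Korea. It computes the Pearson correlation with and without that point.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: (teacher pay relative to GDP, % of students at top proficiency).
# The last point is an invented outlier, far from the rest on both axes.
points = [(0.9, 12.0), (1.0, 9.0), (1.1, 13.0), (1.2, 8.0),
          (1.3, 12.0), (1.4, 10.0), (2.2, 22.0)]

xs, ys = zip(*points)
r_all = pearson(xs, ys)
r_without = pearson(xs[:-1], ys[:-1])  # drop the outlier

print(f"with outlier:    r = {r_all:.2f}")
print(f"without outlier: r = {r_without:.2f}")
```

With these invented numbers, the correlation is strongly positive when the extreme point is included and close to zero once it is dropped, which is the same pattern as in the two scatter plots.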

So, either the message is that achievement and spending are uncorrelated (for every country except Korea), or that we have a measurement problem. I think the latter is more likely, and would defer to psychometricians to say what the acceptable measures for spending and for achievement are. Does the pay of primary teachers with 15 years or more of service represent "education spending"? Do top students adequately capture general achievement in the education system?


The original chart contains a serious misinterpretation of the data (source: Education at a Glance 2009, OECD). It falsely assumes that the proportions of students attaining top proficiency in each subject are additive. In fact, because the same student could be top in more than one subject, such a sum counts some students more than once, and its base would not be 100%.

In my version, the metric used is the proportion of students who attain top proficiency in 1, 2 or all 3 subjects. This metric is computed off a 100% base.
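A toy example shows why the per-subject proportions cannot simply be added. The ten students below are invented; each is represented by the set of subjects in which they reached the top level.

```python
# Hypothetical class of ten students; each set lists the subjects in which
# that student attained top proficiency (an empty set means none).
students = (
    [{"reading", "math", "science"}, {"math"}, {"reading", "science"}]
    + [set() for _ in range(7)]
)

subjects = ["reading", "math", "science"]
n = len(students)

# Share of students who are top in each subject, computed separately.
per_subject = {subj: sum(subj in s for s in students) / n for subj in subjects}

# Naive sum of the three shares: students who are top in several subjects
# are counted once per subject.
summed = sum(per_subject.values())

# Share of students who are top in at least one subject: each student counts once.
at_least_one = sum(1 for s in students if s) / n

print(per_subject)
print(f"sum of per-subject shares:   {summed:.0%}")
print(f"top in at least one subject: {at_least_one:.0%}")
```

Only three of the ten students are top in anything, yet the three per-subject shares add up to 60% because two of those students are counted more than once.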

I also removed the breakdown by gender. It creates clutter, and I can't find anything of interest in the male or female data.


See also our first post on this infographic.


Nothing is as simple as it seems

Thanks to reader Chris P. (again) for pointing us to this infographic about teacher pay. This one is much better than your run-of-the-mill infographics poster. The designer has set out to answer specific questions like "how much do teachers make?", and has organized the chart accordingly.

This post is about the very first chart because I couldn't get past it. It's a simple bar chart, with one data series indexed by country, showing the relative starting salary of a primary-school teacher with minimal training. This one:


The chart tells us that the range of salaries goes from about $12,000 at the low end (Poland) to over $65,000 at the high end (Luxembourg), with the U.S. roughly at the 67th percentile, running at $42,000 per year. The footnote says that the source was the OECD.

The chart is clean and simple, as a routine chart like this should be. One might complain that it would be easier to read if flipped 90 degrees, with country labels on the left and bars instead of columns. But that's not where I got stuck... mentally.

I couldn't get past this chart because it generated so many unanswered questions. The point of the chart is to compare U.S. teacher pay against the rest of the world (apologies to readers outside the U.S., I'm just going with the designer's intention). And yet, it doesn't answer that question satisfactorily.

Our perception of the percentile ranking of the U.S. is fully determined by the choice of countries depicted. One wonders how that choice was made. Do the countries provide a nice sampling of the range of incomes from around the world? Is Poland truly representative of low pay and Luxembourg of high pay? Why are Korea and Japan the only two Asian countries shown and not, say, China or India? Why is there a need to plot Belgium (Fl.) separately from Belgium (Fr.), especially since the difference between the two parts of Belgium is dwarfed by the difference between Belgium and any other country? This last one may seem unimportant but a small detail like this changes the perceived ranks.

Further, why is the starting salary used for this comparison? Why not average salary? Median salary? Salary with x years of experience? Perhaps starting salary is highly correlated to these other metrics, perhaps not.

Have there been sharp changes in the salaries over time in any of these countries? It's quite possible that salaries are in flux in less developed countries, and more stable in more developed countries.

Also, given the gap in cost of living between, say, Luxembourg and Mexico, it's not clear that the Mexican teacher earning about $20,000 is worse off than the Luxembourger taking home about $65,000. I was curious enough to do a little homework: the PPP GDP per capita in Luxembourg was about $80,000, compared to $15,000 in Mexico, according to the IMF (source: Wikipedia). So after accounting for cost of living, the Mexican earns an above-average salary while the Luxembourger takes home a below-average salary. Thus, the chart completely misses the point.


Using the Trifecta checkup, one would address this type of issue when selecting the appropriate data series to address the meaningful question.

Too often, we pick up any data set we can lay our hands on, and the data fails to answer the question, and may even mislead readers.




PS. On a second look, I realized that the PPP analysis shown above was not strictly accurate, as I compared an unadjusted salary to a PPP-adjusted income figure. A better analysis is as follows: divide the per-capita PPP GDP of each country by its per-capita unadjusted GDP to form the adjustment factor. Using IMF numbers, this is 0.74 for Luxembourg and 1.57 for Mexico. Now, adjust the teacher salary by those factors. For Luxembourg, the salary adjusted for cost of living is $48,000 (note that this is an adjustment downwards due to the higher cost of living in that country), and for Mexico, the adjusted salary is inflated to $31,000. Now, these numbers can be appropriately compared to the $80,000 and $15,000 respectively. The story stays the same.
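The adjustment in this PS can be written out in a few lines. This is only a sketch: the salaries are the approximate values read off the chart, the factors and GDP figures are the ones quoted in this post, and the function and field names are my own.

```python
def adjust_for_cost_of_living(salary, factor):
    """Scale a nominal salary by the ratio of PPP GDP to unadjusted GDP per capita."""
    return salary * factor


# Approximate figures quoted in the post (salaries are rough readings of the chart).
countries = {
    "Luxembourg": {"salary": 65_000, "factor": 0.74, "ppp_gdp_per_capita": 80_000},
    "Mexico": {"salary": 20_000, "factor": 1.57, "ppp_gdp_per_capita": 15_000},
}

results = {}
for name, d in countries.items():
    adjusted = adjust_for_cost_of_living(d["salary"], d["factor"])
    # Ratio above 1 means the adjusted salary exceeds PPP GDP per capita.
    ratio = adjusted / d["ppp_gdp_per_capita"]
    results[name] = (adjusted, ratio)
    print(f"{name}: adjusted salary ${adjusted:,.0f}, {ratio:.2f}x PPP GDP per capita")
```

The adjusted salaries land near the $48,000 and $31,000 quoted above. The Luxembourg teacher still earns less than that country's PPP GDP per capita, while the Mexican teacher earns roughly double Mexico's.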