This chart, found in Princeton Alumni Weekly, only partially scanned here, supposedly gave reasons for "Princeton's top-rated [Ph.D.] programs" "to celebrate". My alma mater has outstanding academic departments, but it would be difficult to know from this chart!
Due to the color scheme, the numbers that jump out at you are the ones on the bright orange background, which show how many other departments are ranked equal to Princeton's in those subjects. It takes some effort to realize that the more zeroes there are in the top buckets (fading orange), the better.
The editor started with a nice idea, which is to convert raw rankings into clusters of rankings. She recognized that in this type of ranking (see a related post on my book blog here), it is meaningless to gloat about #1 versus #2 because they are probably statistically the same. For instance, in the ranking of Architecture departments (ARC), 37 schools (including Princeton) all belonged to the same cluster, judged to be a statistical tie.
One of the main reasons why this chart looks so confusing is its failing the self-sufficiency test. It really is a disguised data table, with some colorful background and shadows; the graphical elements add nothing to the data at all. If one covered up all the data, there is nothing left to see!
In the following rework, I emphasize the cluster structure. Each subject has three possible clusters, schools ranked above, equal to, and below Princeton. Instead of plotting raw numbers, the chart shows proportions of schools in each category. The order is roughly such that the departments with the relatively higher standing float to the top. Because a bar chart is used, the department names could be spelt out in their entirety and placed horizontally.
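The conversion from counts to proportions can be sketched in a few lines of Python. The counts below are hypothetical placeholders, not the numbers from the magazine chart, and the sort rule (smallest share of schools ranked above first, ties broken by the share ranked equal) is only my assumption about what "roughly" floating the stronger departments to the top could mean.

```python
def cluster_shares(above, tied, below):
    """Share of schools ranked above, tied with, and below the reference school."""
    total = above + tied + below
    if total == 0:
        raise ValueError("no schools in this subject")
    return {"above": above / total, "tied": tied / total, "below": below / total}

# Hypothetical counts (above, tied, below); NOT the values from the original chart.
subjects = {
    "Subject C": (12, 20, 18),
    "Subject A": (0, 5, 45),
    "Subject B": (3, 10, 37),
}

def sort_key(item):
    # Assumed ordering rule: fewer schools above is better, then fewer schools tied.
    shares = cluster_shares(*item[1])
    return (shares["above"], shares["tied"])

ranked = sorted(subjects.items(), key=sort_key)

for name, counts in ranked:
    shares = cluster_shares(*counts)
    print(name, {k: round(v, 2) for k, v in shares.items()})
```

Each row of shares sums to one, which is what lets every subject be drawn as a bar of the same length.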
If one has access to the raw data, it would be even better to reveal the entire cluster structure. It is quite possible that the clusters above and below Princeton can be further subdivided into more clusters. This will allow readers to understand better what the cluster ranks mean.
The last chart in the infographics on OECD education data asks another intriguing question: do countries that pay teachers more achieve better test scores?
This chart suffers from the same ill as the one previously discussed (here): the data is not suitable to address the question. It is mighty hard to see any pattern in the set of bar charts on offer. This lack of correlation can be confirmed by displaying the data in a scatter plot:
The scatter on the left presents the data as shown in the original, with a regression line drawn in that appears to indicate a positive correlation of higher spending and higher achievement.
Here, spending is measured by the ratio of primary teacher pay after 15 years of service to average GDP while achievement is indicated by the proportion of students who attain a "top" level of proficiency in any or all of the three test subjects.
But notice the solitary point sitting on the top right corner (labelled "1"). That point is Korea, which has both the highest achievement and the highest spending (by far). Korea is an outlier (known as a leverage point). The chart on the right is the same as the one on the left with Korea removed. What appears to be a moderate positive correlation vanishes. (The numbers plotted are the ranking of countries by the proportion of students attaining top proficiency, the metric on the vertical axis.)
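To see how a single leverage point can manufacture a correlation, here is a small Python sketch. The numbers are made up for illustration and are not the OECD data; the "outlier" simply plays the role of Korea, sitting far to the top right of an otherwise patternless cloud.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up (pay ratio, % top performers) pairs; NOT the OECD figures.
points = [(0.8, 12), (0.9, 9), (1.0, 14), (1.1, 8), (1.2, 13), (1.3, 10), (1.4, 11)]
outlier = (2.3, 25)  # hypothetical stand-in for the top-right point

xs_all, ys_all = zip(*(points + [outlier]))
xs_rest, ys_rest = zip(*points)

r_with = pearson(xs_all, ys_all)      # clearly positive, driven by one point
r_without = pearson(xs_rest, ys_rest)  # close to zero

print(f"with outlier:    r = {r_with:.2f}")
print(f"without outlier: r = {r_without:.2f}")
```

The same comparison on the real data is what the pair of scatter plots shows: the fitted slope depends almost entirely on one country.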
So, either the message is that achievement and spending are uncorrelated (for every country except Korea), or that we have a measurement problem. I think the latter is more likely, and would defer to psychometricians to say what are acceptable measures for spending and for achievement. Do primary teachers with 15 years or more of service represent "education spending"? Do top students adequately capture general achievement in the education system?
The original chart contains a serious misinterpretation of the data (source: Education at a Glance 2009, OECD). It falsely assumes that the proportion of students attaining top proficiency in each subject is additive. In fact, because the same student could be top in more than one subject, adding the three proportions counts such students more than once, and the base of such a sum would not be 100%.
In my version, the metric used is the proportion of students who attain top proficiency in 1, 2 or all 3 subjects. This metric is computed off a 100% base.
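The double counting is easy to demonstrate with a toy example in Python. The ten students below are hypothetical; each set lists the subjects in which that student reaches the top level.

```python
# Hypothetical students; each set holds the subjects in which the student is a top performer.
students = [
    {"reading", "math", "science"},
    {"math"},
    {"reading", "science"},
    set(), set(), set(), set(), set(), set(), set(),
]
n = len(students)
subjects = ("reading", "math", "science")

# Share of top performers in each subject, each on a 100% base.
share_by_subject = {s: sum(s in st for st in students) / n for s in subjects}

# Adding the three shares counts multi-subject students more than once.
naive_sum = sum(share_by_subject.values())

# Share of students who are top performers in at least one subject.
share_any = sum(1 for st in students if st) / n

print(share_by_subject)
print("sum of subject shares:", round(naive_sum, 2))
print("top in 1, 2 or 3 subjects:", round(share_any, 2))
```

Only the last figure can be read as a proportion of all students.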
I also removed the breakdown by gender. This creates clutter, and I can't find any interest in the male or female data.
Thanks to reader Chris P. (again) for pointing us to this infographics about teacher pay. This one is much better than your run-of-the-mill infographics poster. The designer has set out to answer specific questions like "how much do teachers make?", and has organized the chart in this way.
This post is about the very first chart because I couldn't get past it. It's a simple bar chart, with one data series indexed by country, showing the relative starting salary of a primary-school teacher with minimal training. This one:
The chart tells us that the range of salaries goes from about $12,000 at the low end (Poland) to over $65,000 at the high end (Luxembourg), with the U.S. roughly at the 67th percentile, running at $42,000 per year. The footnote says that the source was OECD.
The chart is clean and simple, as a routine chart like this should be. One might complain that it would be easier to read if flipped 90 degrees, with country labels on the left and bars instead of columns. But that's not where I got stuck... mentally.
I couldn't get past this chart because it generated so many unanswered questions. The point of the chart is to compare U.S. teacher pay against the rest of the world (apologies to readers outside the U.S., I'm just going with the designer's intention). And yet, it doesn't answer that question satisfactorily.
Our perception of the percentile ranking of the U.S. is fully determined by the choice of countries depicted. One wonders how that choice was made. Do the countries provide a nice sampling of the range of incomes from around the world? Is Poland truly representative of low pay and Luxembourg of high pay? Why are Korea and Japan the only two Asian countries shown and not, say, China or India? Why is there a need to plot Belgium (Fl.) separately from Belgium (Fr.), especially since the difference between the two parts of Belgium is dwarfed by the difference between Belgium and any other country? This last one may seem unimportant but a small detail like this changes the perceived ranks.
Further, why is the starting salary used for this comparison? Why not average salary? Median salary? Salary with x years of experience? Perhaps starting salary is highly correlated to these other metrics, perhaps not.
Have there been sharp changes in the salaries over time in any of these countries? It's quite possible that salaries are in flux in less developed countries, and more stable in more developed countries.
Also, given the gap in cost of living between, say, Luxembourg and Mexico, it's not clear that the Mexican teacher earning about $20,000 is worse off than the Luxembourger taking home about $65,000. I was curious enough to do a little homework: the PPP GDP per capita in Luxembourg was about $80,000, compared to $15,000 in Mexico, according to IMF (source: Wikipedia), so after accounting for cost of living, the Mexican earns an above-average salary while the Luxembourger takes home a below-average salary. Thus, the chart completely misses the point.
Using the Trifecta checkup, one would address this type of issue when selecting the appropriate data series to address the meaningful question.
Too often, we pick up any data set we can lay our hands on, and the data fails to answer the question, and may even mislead readers.
PS. On a second look, I realized that the PPP analysis shown above was not strictly accurate as I compared an unadjusted salary to an adjusted salary. A better analysis is as follows: divide the per-capita PPP GDP of each country by its per-capita unadjusted GDP to form the adjustment factor. Using IMF numbers, for Luxembourg, this is 0.74 and for Mexico, this is 1.57. Now, adjust the teacher salary by those factors. For Luxembourg, the salary adjusted for cost of living is $48,000 (note that this is an adjustment downwards due to higher cost of living in that country), and for Mexico, the adjusted salary is inflated to $31,000. Now, these numbers can be appropriately compared to the $80,000 and $15,000 respectively. The story stays the same.
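For readers who want to check the arithmetic, here is a short Python sketch using only the rounded figures quoted in this post. The function and variable names are my own, and the adjustment factors are entered directly because the unadjusted GDP figures behind them are not given here.

```python
# Rounded figures quoted in the post (US dollars).
salary = {"Luxembourg": 65_000, "Mexico": 20_000}
ppp_gdp_per_capita = {"Luxembourg": 80_000, "Mexico": 15_000}

# Adjustment factor = per-capita PPP GDP / per-capita unadjusted GDP,
# taken as quoted rather than recomputed.
factor = {"Luxembourg": 0.74, "Mexico": 1.57}

def adjusted_salary(country):
    """Teacher salary adjusted for cost of living."""
    return salary[country] * factor[country]

for country in salary:
    adjusted = adjusted_salary(country)
    ratio = adjusted / ppp_gdp_per_capita[country]
    print(f"{country}: adjusted salary ${adjusted:,.0f}, "
          f"{ratio:.2f} times PPP GDP per capita")
```

A ratio above 1 means the teacher earns more than the per-capita figure, and a ratio below 1 means less, which is the comparison the paragraph above draws.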
My coworker pointed me to a Huffington Post article claiming a Bill Gates byline that contains some highly dubious analysis and a horrific chart. We presume Gates was fed this information by some analysts but even so, one wishes he wouldn't promote innumeracy. But then, he has a history: Howard Wainer demolished analysis by his foundation used to channel lots of dollars to the "small schools" movement a few years ago; I wrote about that before.
First, the offensive chart:
Using double axes earns justified heckles but using two gridlines is a scandal! A scatter plot is the default for this type of data. (See next section for why this particular set of data is not informative anyway.)
I can't understand the choice of scale for the score axis. The orange line, for instance, seems to have a positive slope. In any case, since these scores are "scaled", and the "standard error" is about 1 (this number is surprisingly hard to find, even on Google), it would appear that between 300 and 400 on the score axis, there are 100 units of standard error. By convention, a value three standard errors away from the average is considered a rare event. There is no conceivable way that the average score could jump by that much.
The analysis is also flawed. Here's the key paragraph:
Over the last four decades, the per-student cost of running our K-12 schools has more than doubled, while our student achievement has remained flat, and other countries have raced ahead. The same pattern holds for higher education. Spending has climbed, but our percentage of college graduates has dropped compared to other countries... For more than 30 years, spending has risen while performance stayed flat. Now we need to raise performance without spending a lot more.
This argument contains several statistical fallacies:
Comparing apples and oranges: a glaring piece of missing information is whether other countries have increased their per-student spending on education, and if so, how fast the growth is compared to that in the U.S. Without this, the analysis makes no sense.
Confusing correlation and causation: so spending increased while test scores stagnated. In order to conclude that there is something wrong with the spending, one must first believe that spending has a causal effect on test scores. Observe that this is not a conclusion from the data; it is an assumption going into the analysis, neither supported nor disputed by the data since the data merely show a (lack of) correlation. This is another instance of "story time": we see data, we see conclusion, we are misled into thinking that data supports conclusion but in fact, the data is an irrelevant distraction. (For other instances of "story time", see this link to my book blog.)
Fallacy #1 and fallacy #2 combined: even if you believe that spending affects test scores, it is still a stretch to say that spending in U.S. schools affects the gap in test scores between U.S. students and foreign students. In a world where foreign countries are frozen in time, maybe so; but in one where foreign countries are investing in education, one can't say anything about the test score gap without first knowing what's going on overseas.
Assumption invalidating the analysis: In a short breath, the analyst admits the possibility of (a) spending increase together with flat scores and (b) score increase together with flat spending. One model under which both of those possibilities coexist is one in which test scores are independent of spending. If so, why would one even look at a plot of these two quantities?
The dilemma of being together (a la Chapter 3 of Numbers Rule Your World): sorry to say, but the spending on pupils is likely to have a highly skewed distribution depending on school district. Also, the average test score is likely to have high variability across school districts. Thus, using an average for the entire country muddies the water.
Needless to say, test scores are a poor measure of the quality of education, especially in light of the frequent discovery of large-scale coordinated cheating by principals and teachers driven by perverse incentives of the high-stakes testing movement.
In the same article, Gates asserts that quality of teaching is the greatest decisive factor explaining student achievement. Which study proves that, we are not told. How one can measure such an intangible quantity as "excellent teaching", we are not told. How student achievement is defined, well, you guessed it, we are not told.
It's great that the Gates Foundation supports investment in education. Apparently they need some statistical expertise so that they don't waste more money on unproductive projects based on innumerate analyses.
Given the recent post questioning the value of the MBA degree, one would think the Economist powers-that-be would not be staffing up MBAs. But then, if not useless MBAs, how would the Economist explain this chart they printed next to the said article?
This chart appears to tell us that all the top MBA programs succeed in reducing their students' earning potential. In each case, the "pre-MBA salary" exceeds the "salary on graduation".
More likely, the red part is the incremental salary, possibly explained by the value of the degree while the gray part is the pre-MBA salary.
However, since the author has few nice words to say about business schools, one can never be 100% sure if he is presenting some counter-intuitive data.
In the Trifecta checkup, one would find nothing wrong with the chart type, nor is there anything wrong with asking about the return on investment of an MBA degree.
The third component -- having the right data -- is what renders this effort a failure. It is too simplistic to measure return on investment on the salary upon graduation. Surely, one must also include future career paths, intangible benefits from network relationships, personal development, etc.
In a prior post, I showed a chart of Pisa test scores that can be used to investigate differences between any pair of countries. At least one reader found it confusing, containing too much data. I then realized that if the objective of the chart is re-stated as "How the UK fared relative to other OECD countries", which was the intent of the original Guardian chart, the chart could be presented in the following simplified fashion:
Simplification can be achieved in many ways, one of which is simplifying the objective. In fact, I'd not be opposed to showing just the left side of the chart, which addresses an even more general question, which is how the countries fared in a general sense.
While the lines in the Guardian chart display correlations of math, reading and science scores within specific countries, essentially a parallel coordinates plot, the same correlation can be visualized in a scatterplot matrix (see this post).
Each scatter plot here relates the scores of two subject areas as indicated by the axis labels. The simplest observation is the high degree of positive correlation on all three panels: in other words, countries in general do well in all three subjects, or poorly in all three subjects.
This pattern confirms why it isn't very productive to focus readers' attention on this set of correlations when dealing with this data set.
You'll notice the use of colored dots on the scatter plots. Imagine that I have put the countries into groups based on overall scores (rather than just reading scores) as in my earlier analysis. The dots of the same color represent countries that are deemed to have performed similarly. The black cross indicates the "average country".
Focusing on the colors for the moment, you can confirm yet again that a country doing well in one subject is highly predictive of it doing well in the other subjects.
As I pointed out at the start of the prior post, using a little statistical technique allows us to understand the data better, and plotting summaries of the data allows us to draw more interesting conclusions than putting all the data, unperturbed, onto a canvas.
Information graphics is one of many terms used to describe charts showing data -- and a very ambitious one at that. It promises the delivery of "information". Too often, readers are disappointed, sometimes because the "information" cannot be found on the chart, and sometimes because the "information" is resolutely hidden behind thickets.
Statistical techniques are useful to expose the hidden information. They work by getting rid of the extraneous or misleading bits of data, and by accentuating the most informative parts. A statistical graphic distinguishes itself by not showing all the raw data.
Here is the Guardian's take on the OECD PISA scores that were released recently. (Perhaps some of you are playing around with this data, which I featured in the Open Call... alas, no takers so far.) I only excerpted the top part of the chart.
This graphic is not bad, could have been much worse, and I'm sure there are much worse out there.
But think about this for a moment: what question did the designer hope to address with this chart? The headline says comparing UK against other OECD countries, which is a simple objective that does not justify such a complex chart.
The most noticeable features are the line segments showing the correlation of ranks among the three subject areas within each country. So, South Korea is ranked first in reading and math, and third in science. Equally prominent is the rank of countries shown on the left-hand side of the chart (which, on inspection, shows the ranking of reading scores); this ranking also determines the colors used, another eye-catching part of this chart. (The thick black UK line is, of course, important also.)
In my opinion, those are not the three or four most interesting questions about this data set. In such a rich data set, there could be dozens of interesting questions. I'm not arguing that we have to agree on which ones are the most prominent. I'm saying the designer should be clear in his or her own mind what questions are being answered -- prior to digging around the data.
*** With that in mind, I decided that a popular question concerns the comparison of scores between any pair of countries. From there, I worked on how to simplify the data to bring out the "information". Specifically, I used a little statistics to classify countries into 7 groups; countries within each group are judged to have performed equally well in the test and any difference could be considered statistical noise. (I will discuss how I put countries into these groups in a future post, just focusing on the chart here.)
Here is the result: (PS. Just realized the axis should be labelled "PISA Reading Score Differentials from the Reference Country Group" as they show pairwise differences, not scores.)
Each row uses one of the country groups as the reference level. For example, the first row shows that Finland and South Korea, the two best performing countries, did significantly better than all other country groups, except those in A2. The relative distance of each set of countries from the reference level is meaningful, and gives information about how much worse they did.
(The standard error seems to be about 3-6 based on some table I found on the web, which may or may not be correct. This value leads to very high standardized score differentials, indicating that the spread between countries is very wide.
I have done this for the reading test only. The test scores were standardized, which is not necessary if we are only concerned about the reading test. But since I was also looking at correlations between the three subjects, I chose to standardize the scores, which is another way of saying putting them on an identical scale.)
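For readers unfamiliar with the term, one common way of standardizing is the z-score: subtract the mean and divide by the standard deviation. The Python sketch below shows that reading of it on hypothetical scores; it is not necessarily the exact procedure behind the chart, and the numbers are placeholders rather than PISA results.

```python
from statistics import mean, pstdev

def standardize(scores):
    """Convert scores to z-scores: mean 0, standard deviation 1."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation
    if sigma == 0:
        raise ValueError("all scores are identical; cannot standardize")
    return [(s - mu) / sigma for s in scores]

# Hypothetical country averages on two tests with different spreads.
reading = [540, 520, 500, 480, 460]
math_scores = [600, 550, 500, 450, 400]

z_reading = standardize(reading)
z_math = standardize(math_scores)

# After standardizing, both subjects sit on an identical scale.
print([round(z, 2) for z in z_reading])
print([round(z, 2) for z in z_math])
```

In this toy case the two standardized lists coincide even though the raw spreads differ by a factor of 2.5, which is what being on an identical scale buys when comparing across subjects.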
Before settling on the above chart, I produced this version:
This post is getting too long so I'll be brief on this next point. You may wonder whether having all 7 rows is redundant. The reason why they are all there is that the pairwise differences lack "transitivity": e.g., the difference between Finland and the UK is not the difference between Finland and Sweden plus the difference between Sweden and the UK. The right way to read it is to cling to the reference country group, and only look at the differences between the reference group and each of the other groups. The differences between two country groups, neither of which is a reference group, should be ignored in this chart: instead, look up the two rows for which those countries are a reference group.
Before that, I tried a more typical network graph. It looks "sophisticated" and is much more compact but it contains less information than the previous chart, and gets murkier as the number of entities increases. Readers have to work hard to dig out the interesting bits.
I noticed a burst of activity on Twitter with "Junk Charts" nominations, too many for me to take care of. So, I'm trying a new feature, the Open Call. It's your chance to start the conversation on these charts.