No doubt this question is better answered with traditional statistical techniques such as analysis of variance. What the chart does not show is how significant the difference is (the p-value).

I think Kolmogorov-Smirnov is better suited to this particular problem, although a chi-square test (on elite vs. non-elite, not elite vs. all) can be used as well. I think I'd prefer K-S.
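For anyone who wants to try the K-S idea, here is a minimal sketch of the two-sample statistic in plain Python. The score counts are made up for illustration (they are not the quiz data), and the function names are my own. The critical value uses the usual large-sample approximation, which tends to be conservative when the data are discrete and heavily tied, as quiz scores are.

```python
import math

# Hypothetical counts of quiz-takers per score 0..12.
# Made-up numbers for illustration only, NOT the actual quiz data.
elite_counts = [0, 1, 2, 4, 6, 9, 12, 14, 10, 7, 4, 2, 1]
other_counts = [1, 3, 8, 15, 24, 30, 28, 20, 12, 6, 3, 1, 1]

def ecdf(counts):
    """Cumulative proportion scoring <= k, for each score k."""
    total = sum(counts)
    running, out = 0, []
    for c in counts:
        running += c
        out.append(running / total)
    return out

def ks_statistic(counts_a, counts_b):
    """Two-sample K-S statistic: largest vertical gap between the two ECDFs."""
    return max(abs(a - b) for a, b in zip(ecdf(counts_a), ecdf(counts_b)))

n, m = sum(elite_counts), sum(other_counts)
d = ks_statistic(elite_counts, other_counts)

# Large-sample critical value at alpha = 0.05 (an approximation).
alpha = 0.05
critical = math.sqrt(-math.log(alpha / 2) / 2) * math.sqrt((n + m) / (n * m))

print(f"D = {d:.3f}, approx. {alpha:.0%} critical value = {critical:.3f}")
```

If D exceeds the critical value, the two distributions differ at roughly that significance level. Note that this compares elite against non-elite, not elite against everyone.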

The big problem with this graphic is that we are terrible at comparing the horizontal distance between two curves - our eyes compare the shortest distance between them. A plot of the difference between the curves could be revealing.
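A rough sketch of that difference plot, again with made-up counts standing in for the real data: it computes the vertical gap between the two cumulative curves at each score and prints it as a crude text bar chart. The names here are hypothetical.

```python
from itertools import accumulate

# Hypothetical counts per score 0..12 (illustration only, not the quiz data).
elite_counts = [0, 1, 2, 4, 6, 9, 12, 14, 10, 7, 4, 2, 1]
other_counts = [1, 3, 8, 15, 24, 30, 28, 20, 12, 6, 3, 1, 1]

def cumulative_pct(counts):
    """Cumulative percent of takers scoring <= k, for each score k."""
    total = sum(counts)
    return [100 * c / total for c in accumulate(counts)]

# Gap in percentage points at each score: general curve minus elite curve.
gaps = [
    other - elite
    for elite, other in zip(cumulative_pct(elite_counts), cumulative_pct(other_counts))
]

for score, gap in enumerate(gaps):
    bar = "#" * round(abs(gap))  # one '#' per percentage point
    print(f"{score:2d} {gap:+6.1f} {bar}")
```

Plotting the gap directly means the eye reads one curve against a flat baseline instead of judging the distance between two sloping lines.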

I'd be interested to see what that looks like when the y-axis is transformed to make a probability chart; will that opening up between 4 and 7 be as apparent, or more apparent?

I agree that the histogram has to be normalized to percentages, but the cumulative % doesn't work for me in terms of building an immediate visual intuition of what is going on.

For comparing two simple distributions such as these, I'd use two superimposed histograms. Here is a variation on this theme - a superimposed violin plot.

http://img508.imageshack.us/img508/4882/eliteviolin2ba7.png

The labeling of the cumulative probability graph is wrong. It says "100% of subjects (in both groups) got AT LEAST 12 questions CORRECT."

I don't have a problem with that label; instead I'd invert the cumulative curve to start at 100% ("100% got at least 0 correct") and slope down to one or two percent ("1.6% got at least 12 questions correct"). Then the "elite" line would be to the right of, and above, the general line, properly conveying the information that they did a little better overall.
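The inverted curve is easy to compute. A small sketch, assuming the data come as counts per score (the counts below are invented, and `pct_at_least` is just a name I picked):

```python
def pct_at_least(counts):
    """Percent of takers scoring >= k, for k = 0 .. len(counts) - 1."""
    total = sum(counts)
    remaining, out = total, []
    for c in counts:
        out.append(100 * remaining / total)
        remaining -= c  # drop everyone who scored exactly k
    return out

# Hypothetical counts per score 0..12 (illustration only, not the quiz data).
elite_counts = [0, 1, 2, 4, 6, 9, 12, 14, 10, 7, 4, 2, 1]

for score, pct in enumerate(pct_at_least(elite_counts)):
    print(f"{pct:5.1f}% got at least {score} correct")
```

The first value is always 100% ("at least 0 correct"), and the curve can only go down from there.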

I tried a probability scale, but it was disappointing, because the cumulative curve inevitably lost either the 100% or 0% cumulative values off to infinity. It did nicely display the differences at all other points though.
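To show why the endpoints drop off, here is a small sketch of the probability-scale transform using Python's `statistics.NormalDist`. The cumulative proportions are hypothetical, and `probit` is my own helper name.

```python
from statistics import NormalDist, StatisticsError

def probit(p):
    """Map a cumulative proportion to a normal quantile (z-score).

    Returns None for p <= 0 or p >= 1, which have no finite z-score.
    """
    try:
        return NormalDist().inv_cdf(p)
    except StatisticsError:
        return None

# Hypothetical cumulative proportions, including the troublesome endpoints.
cumulative = [0.0, 0.02, 0.16, 0.5, 0.84, 0.98, 1.0]

for p in cumulative:
    z = probit(p)
    label = "off the scale" if z is None else f"z = {z:+.2f}"
    print(f"{p:4.0%} -> {label}")
```

Exactly 0% and 100% map to minus and plus infinity, so a probability axis can never show them; every point in between gets spread out, which is what makes the differences easier to see.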

The superimposed histograms make it clear that the distributions have very different shapes, and that the Elite are not normally distributed. Since the distributions are so different here, I'm not sure any conclusion can be drawn from the data.

I'd say that realistically, there's a good chance that the Elites are a compound of two distributions: liberal arts majors who'd seen many of the pieces before and the rest, the rest being distributed identically to the plebeians. The liberal arts majors are added to a small sample of the rest and skew things to the right. It's impossible to know without more detailed information about the quiz-takers.
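To make the compound-distribution idea concrete, here is a toy sketch. Everything in it is an assumption for illustration: scores are modeled as binomial over 12 questions, and the success rates and the 30% share of well-prepared takers are invented numbers, not estimates from the quiz.

```python
from math import comb

N_QUESTIONS = 12  # assumed quiz length

def binom_pmf(n, p):
    """Probability of each score 0..n when every question is right with chance p."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def mixture_pmf(weight, pmf_a, pmf_b):
    """Blend two distributions: `weight` of pmf_a plus (1 - weight) of pmf_b."""
    return [weight * a + (1 - weight) * b for a, b in zip(pmf_a, pmf_b)]

def mean(pmf):
    return sum(k * p for k, p in enumerate(pmf))

# Invented parameters: the general population and a well-prepared subgroup.
general = binom_pmf(N_QUESTIONS, 0.45)
prepared = binom_pmf(N_QUESTIONS, 0.75)
elite = mixture_pmf(0.30, prepared, general)

print(f"general mean = {mean(general):.2f}, elite mean = {mean(elite):.2f}")
for score, (g, e) in enumerate(zip(general, elite)):
    print(f"{score:2d}  general {g:6.1%}  elite {e:6.1%}")
```

With a mix like this, the elite histogram keeps most of the general shape but picks up extra weight on the right, which would be hard to tell apart from a genuine shift without knowing who the quiz-takers were.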

