


No doubt, this question is better answered using traditional statistical techniques such as Analysis of Variance. What the chart does not show is how significant the difference is (the p-value).

John Johnson

I think Kolmogorov-Smirnov is better suited to this particular problem, although a chi-square test (on elite vs. non-elite, not elite vs. all) can be used as well. I think I'd prefer K-S.
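To make the K-S suggestion concrete, here is a rough sketch of the two-sample K-S statistic, which is simply the largest vertical gap between the two empirical cumulative curves. The scores are made up for illustration (they are not the quiz data), and the helper names `ecdf` and `ks_statistic` are my own.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return a function giving the fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n

def ks_statistic(a, b):
    """Largest vertical gap between the two empirical CDFs."""
    cdf_a, cdf_b = ecdf(a), ecdf(b)
    # The gap can only change at observed values, so checking those is enough.
    points = sorted(set(a) | set(b))
    return max(abs(cdf_a(x) - cdf_b(x)) for x in points)

# Hypothetical scores (number correct out of 12), NOT the real quiz data.
elite = [5, 6, 6, 7, 7, 8, 8, 9, 10, 11]
general = [3, 4, 4, 5, 5, 6, 6, 7, 7, 8]

d = ks_statistic(elite, general)
print(f"K-S statistic D = {d:.2f}")
```

This only computes the statistic. Turning it into a p-value is best left to a statistics package, and with discrete scores and many ties the usual K-S p-value is only approximate.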

Hadley Wickham

The big problem with this graphic is that we are terrible at comparing the horizontal distance between two curves - our eyes compare the shortest distance between them. A plot of the difference between the curves could be revealing.
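A difference plot is easy to rough out. The sketch below subtracts one cumulative curve from the other at each score and prints a crude text bar chart. The scores are invented for illustration, and `cumulative_pct` is just a name I picked.

```python
def cumulative_pct(scores, max_score=12):
    """Percent of quiz-takers with at most k correct, for k = 0..max_score."""
    n = len(scores)
    return [100.0 * sum(s <= k for s in scores) / n for k in range(max_score + 1)]

# Hypothetical scores (number correct out of 12), NOT the real quiz data.
elite = [5, 6, 6, 7, 7, 8, 8, 9, 10, 11]
general = [3, 4, 4, 5, 5, 6, 6, 7, 7, 8]

# Vertical gap between the curves at each score, in percentage points.
gap = [g - e for g, e in zip(cumulative_pct(general), cumulative_pct(elite))]

for k, diff in enumerate(gap):
    # One '#' per 5 percentage points; negative gaps print no bar.
    print(f"{k:2d} {diff:6.1f} " + "#" * round(diff / 5))
```

Plotting the gap directly takes the guesswork out of judging the distance between two sloping lines.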


I'd be interested to see what that looks like when the y-axis is transformed to make a probability chart; will that opening up between 4 and 7 be as apparent, or more apparent?

Zuil Serip

I agree that the histogram has to be normalized in terms of %, but the cumulative % doesn't work for me in terms of building an immediate visual intuition of what is going on.
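Normalizing to % just means dividing each bin count by the size of its group, so that groups of different sizes can share one axis. A minimal sketch, with invented scores and deliberately unequal group sizes (`percent_histogram` is a hypothetical helper):

```python
from collections import Counter

def percent_histogram(scores, max_score=12):
    """Percent of the group landing on each score 0..max_score."""
    counts = Counter(scores)
    n = len(scores)
    return [100.0 * counts[k] / n for k in range(max_score + 1)]

# Made-up scores, NOT the real quiz data; the groups differ in size on purpose.
elite = [5, 6, 6, 7, 7, 8, 8, 9, 10, 11]
general = [3, 4, 4, 5, 5, 6, 6, 7, 7, 8] * 3

elite_pct = percent_histogram(elite)
general_pct = percent_histogram(general)

for k, (e, g) in enumerate(zip(elite_pct, general_pct)):
    print(f"{k:2d}  elite {e:5.1f}%  general {g:5.1f}%")
```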

For comparing two simple distributions such as these, I'd use two superimposed histograms. Here is a variation on this theme - a superimposed violin plot.



The labeling of the cumulative probability graph is wrong. It says "100% of subjects (in both groups) got AT LEAST 12 questions CORRECT." For a cumulative curve that rises to 100% at 12, the reading should be "at most 12 correct."


I don't have a problem with that label; instead I'd invert the cumulative curve to start at 100% ("100% got at least 0 correct") and slope down to one or two percent ("1.6% got at least 12 questions correct"). Then the "elite" line would be to the right of, and above, the general line, properly conveying the information that they did a little better overall.
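For what it's worth, the inverted curve is a one-line change: count scores of k or more instead of k or fewer. Here is a sketch with invented scores, not the quiz data; `pct_at_least` is my own name for it.

```python
def pct_at_least(scores, max_score=12):
    """Percent of quiz-takers with at least k correct, for k = 0..max_score."""
    n = len(scores)
    return [100.0 * sum(s >= k for s in scores) / n for k in range(max_score + 1)]

# Hypothetical scores (number correct out of 12), NOT the real quiz data.
elite = [5, 6, 6, 7, 7, 8, 8, 9, 10, 11]
general = [3, 4, 4, 5, 5, 6, 6, 7, 7, 8]

elite_curve = pct_at_least(elite)
general_curve = pct_at_least(general)

for k, (e, g) in enumerate(zip(elite_curve, general_curve)):
    print(f"at least {k:2d} correct: elite {e:5.1f}%  general {g:5.1f}%")
```

Both curves start at 100% and can only go down, and in this made-up data the elite curve never drops below the general one.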

I tried a probability scale, but it was disappointing, because the cumulative curve inevitably lost either the 100% or 0% cumulative values off to infinity. It did nicely display the differences at all other points though.
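The runaway endpoints are built into the transform: a probability scale places each cumulative proportion at its standard normal quantile, and the quantiles of 0 and 1 are minus and plus infinity. A small sketch using Python's standard library; the proportions are made up, and `probit` is a hypothetical wrapper that returns infinities at the ends explicitly, since `NormalDist.inv_cdf` raises an error there.

```python
import math
from statistics import NormalDist

def probit(p):
    """Map a cumulative proportion to the normal probability scale."""
    if p <= 0.0:
        return -math.inf
    if p >= 1.0:
        return math.inf
    return NormalDist().inv_cdf(p)

# Made-up cumulative proportions, including the troublesome 0 and 1.
cumulative = [0.0, 0.016, 0.10, 0.50, 0.90, 0.984, 1.0]
z = [probit(p) for p in cumulative]

for p, value in zip(cumulative, z):
    print(f"{p:6.3f} -> {value:7.3f}")
```

The usual workaround is to leave the 0% and 100% points off the chart and plot only the interior values.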


The superimposed histograms make it clear that the distributions have very different shapes, and that the Elite are not normally distributed. Since the distributions are so different here, I'm not sure any conclusion can be drawn from the data.

I'd say that realistically, there's a good chance that the Elites are a compound of two distributions: liberal arts majors who'd seen many of the pieces before and the rest, the rest being distributed identically to the plebeians. The liberal arts majors are added to a small sample of the rest and skew things to the right. It's impossible to know without more detailed information about the quiz-takers.
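The compound idea can at least be played with in simulation. The sketch below is pure speculation in code form: every number in it (the 30% share of majors, the per-question hit rates of 0.45 and 0.75, the group size) is an assumption chosen for illustration, not something estimated from the quiz.

```python
import random

def quiz_score(p_correct, rng, n_questions=12):
    """One simulated score: each question is right with probability p_correct."""
    return sum(rng.random() < p_correct for _ in range(n_questions))

def simulate_group(size, share_majors, rng, p_rest=0.45, p_majors=0.75):
    """Mix 'majors' (higher hit rate) into an otherwise ordinary group."""
    scores = []
    for _ in range(size):
        p = p_majors if rng.random() < share_majors else p_rest
        scores.append(quiz_score(p, rng))
    return scores

def mean(xs):
    return sum(xs) / len(xs)

rng = random.Random(0)  # fixed seed so the run is repeatable
general = simulate_group(2000, share_majors=0.0, rng=rng)
elite = simulate_group(2000, share_majors=0.3, rng=rng)

print(f"general mean {mean(general):.2f}, elite mean {mean(elite):.2f}")
```

A mixture like this pushes the elite average up and stretches the right side of their histogram, but matching the shape proves nothing without knowing who actually took the quiz.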


Kaiser Fung. Business analytics and data visualization expert. Author and Speaker.