
How representative is your sample?

Taking a hint from Mahalanobis, I dug into Howard Wainer's other book (Visual Revelations) and found the following gem. Imagine you're an engineer working for the military. You have the ingenious idea of inspecting the planes that returned home and plotting the pattern of bullet holes. The dark regions had a high density of bullet holes. Your task is to recommend where to put extra armour on the new planes. What would you recommend? (Note: the answer appears after the graphic!)



Howard credits Abraham Wald with the counter-intuitive insight: we should put extra armour in the white regions, not the dark regions. The inference is that planes hit in the dark regions still managed to return to base, while the planes that never returned were presumably hit in the white regions.

What has this to do with sampling? If we forget about the planes that never came back, we may jump to the conclusion that we should reinforce the dark regions. The sample we didn't see is as important as the sample we observed. To wit:


Statisticians call this "survivorship bias". We only observe the survivors, but we must not forget about the non-survivors!
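For readers who like to see the effect in numbers, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration: the region names, the loss probabilities in `LOSS_PROB`, and the rule that each plane takes exactly one hit, spread evenly across regions. None of it comes from Wald's or Wainer's data. It simply compares where the hits fall on all planes with where they fall on the planes that return.

```python
import random

# Hypothetical chance that a single hit in each region brings the plane down.
# These numbers are illustrative assumptions, not historical data.
LOSS_PROB = {"engine": 0.6, "cockpit": 0.4, "fuselage": 0.05, "wings": 0.05}


def simulate(n_planes=100_000, seed=0):
    """Return (hits on all planes, hits on returning planes), counted by region."""
    rng = random.Random(seed)
    regions = list(LOSS_PROB)
    all_hits = dict.fromkeys(regions, 0)
    survivor_hits = dict.fromkeys(regions, 0)
    for _ in range(n_planes):
        # Assumption: each plane takes exactly one hit, equally likely in any region.
        region = rng.choice(regions)
        all_hits[region] += 1
        # The plane returns only if the hit was not fatal.
        if rng.random() >= LOSS_PROB[region]:
            survivor_hits[region] += 1
    return all_hits, survivor_hits


def shares(counts):
    """Convert raw counts into proportions of the total."""
    total = sum(counts.values())
    return {region: count / total for region, count in counts.items()}


if __name__ == "__main__":
    all_hits, survivor_hits = simulate()
    all_share, seen_share = shares(all_hits), shares(survivor_hits)
    for region in LOSS_PROB:
        print(f"{region:9s} all planes: {all_share[region]:.2f}   returned planes: {seen_share[region]:.2f}")
```

Under these made-up numbers, hits are spread evenly over all planes, yet among the returned planes the engine and cockpit look lightly hit and the fuselage and wings look heavily hit. That is the pattern of dark and white regions the engineer sees, and it arises purely from which planes made it back.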

A related page I found on the Web: Steve Simon

