## How representative is your sample?

##### Aug 20, 2005

Taking a hint from Mahalanobis, I dug into Howard Wainer's other book (Visual Revelations) and found the following gem. Imagine you're an engineer working for the military. You have the ingenious idea to inspect planes that returned home and plot the pattern of bullet holes. The dark regions had a high density of bullet holes. Your task is to recommend where to put extra armour on the new planes. What would you recommend? (Note: the answer appears after the graphic!)

Howard credited Abraham Wald for the counter-intuitive insight: we should put extra armour in the white regions, not the dark regions. The inference is that planes shot in the dark regions still managed to return to base, while the planes that never returned were presumably hit in the white regions.

What has this to do with sampling? If we forget about the planes that never came back, we may jump to the conclusion that we should reinforce the dark regions. The sample we didn't see is as important as the sample we observed. To wit:

Statisticians call this "survivorship bias". We only observe survivors, but we must not forget about the non-survivors!
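For readers who like to see the effect in numbers, here is a rough simulation sketch. Everything in it is made up for illustration: the region names, the loss probabilities in `LOSS_PROB`, and the assumption that each plane takes exactly one hit in a uniformly chosen region. None of it comes from Wald's actual data. The point is only that the hits counted on returning planes can look very different from the hits on all planes.

```python
import random

# Hypothetical probability that a single hit in each region downs the plane.
# These numbers are invented for illustration, not historical estimates.
LOSS_PROB = {"engine": 0.6, "cockpit": 0.4, "fuselage": 0.05, "wings": 0.05}


def simulate(n_planes=100_000, seed=0):
    """Give each plane one hit in a random region; tally all hits and survivor hits."""
    rng = random.Random(seed)
    regions = list(LOSS_PROB)
    all_hits = dict.fromkeys(regions, 0)
    survivor_hits = dict.fromkeys(regions, 0)
    for _ in range(n_planes):
        region = rng.choice(regions)  # assumption: hits land uniformly across regions
        all_hits[region] += 1
        if rng.random() >= LOSS_PROB[region]:  # the plane makes it home
            survivor_hits[region] += 1
    return all_hits, survivor_hits


def shares(counts):
    """Convert raw counts into fractions of the total."""
    total = sum(counts.values())
    return {region: count / total for region, count in counts.items()}


if __name__ == "__main__":
    all_hits, survivor_hits = simulate()
    all_share, survivor_share = shares(all_hits), shares(survivor_hits)
    for region in LOSS_PROB:
        print(f"{region:9s} all planes: {all_share[region]:.2f}   "
              f"returned planes: {survivor_share[region]:.2f}")
```

Under these assumptions every region receives about a quarter of the hits, yet the returning planes show relatively few engine hits and plenty of fuselage and wing hits. Someone who only sees the survivors would armour exactly the wrong places.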

A related page I found on the Web: Steve Simon