## Digging deeper

##### Nov 30, 2007

Two items from other places caught my eye this week as they directly relate to some things we discussed on this blog.

First, I second Andrew's suggestion of a recent NYT article for teaching the concept of margin of error, or how to read political poll coverage intelligently. Towards the end of this piece is a small gem:

> Some pundits began by saying the horse race numbers were close but then tried to marshal evidence that they were not. On ABC's own Web site, Chris Cillizza, wrote: "Among women in the Post poll, Obama actually leads Clinton 32 percent to 31 percent among women. Voters 45 years of age or older are similarly divided, choosing Clinton by a 27 percent to 26 percent margin over Obama. Ditto for those who earn $50,000 or less a year; 29 percent for Clinton, 29 percent for Obama."
>
> Mr. Cillizza failed to mention that if the margin of sampling error is plus or minus five percentage points for all of the likely Democratic caucus goers, then it is even higher for subgroups like women.

In a recent post, I called this the "oft-used device of subgroup support of a hypothesis". This example illustrates the fallacy more clearly. It's the "let's dig deeper since we haven't found the gold yet" phenomenon. Such analysis suffers from two serious statistical problems. The article deals with the sample size problem: a subgroup contains fewer respondents than the full sample, so its margin of error is necessarily larger. This means the bar for statistical significance has been raised, and rare is the case where such analysis leads to any further insight. (Of course, I am assuming the original poll was not designed to be analyzed at the subgroup level.)
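To put rough numbers on the sample size problem, here is a minimal sketch. It assumes simple random sampling, a 95% confidence level, and the worst-case proportion of 50%. The sample sizes are my own back-of-envelope assumptions, not figures from the poll: I back out the full sample from the stated five-point margin and then suppose, purely for illustration, that women make up half of the respondents.

```python
import math

Z_95 = 1.96  # normal critical value for a 95% confidence level


def margin_of_error(n, p=0.5, z=Z_95):
    """Half-width of the confidence interval for a proportion p from n respondents."""
    return z * math.sqrt(p * (1.0 - p) / n)


def sample_size_for_margin(moe, p=0.5, z=Z_95):
    """Smallest n whose margin of error is at most moe."""
    return math.ceil(z ** 2 * p * (1.0 - p) / moe ** 2)


# Full sample implied by a +/- 5 point margin (worst case p = 0.5).
full_n = sample_size_for_margin(0.05)

# Hypothetical assumption: women make up half of the respondents.
women_n = full_n // 2

full_moe = margin_of_error(full_n)
women_moe = margin_of_error(women_n)

print(f"full sample: n = {full_n}, margin = +/- {100 * full_moe:.1f} points")
print(f"women only:  n = {women_n}, margin = +/- {100 * women_moe:.1f} points")
```

Under these assumptions, halving the sample pushes the margin from five points to roughly seven, so a 32 percent to 31 percent "lead" among women sits deep inside the noise.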

The other issue -- more difficult to explain and omitted in the article -- is the multiple hypothesis problem. It is well known that if we dig around long enough, we may get so dizzy that anything that glitters will look like gold. In other words, we end up with false positives: the more subgroups we test, the more likely it is that at least one shows a "significant" gap by chance alone. As with the sample size problem, the remedy is to raise the bar for statistical significance even higher. In practice, this frequently wipes out the rationale for such analysis.
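The multiple hypothesis problem can also be sketched in a few lines. The first function assumes the subgroup tests are independent, which is not really true of poll subgroups (women, older voters and lower earners overlap), so treat the numbers as an illustration rather than an exact calculation. The second function is the Bonferroni correction, one common and fairly conservative way of raising the bar; nothing in the article says which correction, if any, a pollster would use.

```python
def familywise_error_rate(alpha, k):
    """Chance of at least one false positive in k independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_threshold(alpha, k):
    """Per-test significance level that keeps the overall rate at or below alpha."""
    return alpha / k


ALPHA = 0.05  # the conventional 5% significance level

# Hypothetical numbers of subgroup cuts an analyst might try.
for k in (1, 3, 10, 20):
    fwer = familywise_error_rate(ALPHA, k)
    per_test = bonferroni_threshold(ALPHA, k)
    print(f"{k:2d} subgroups: P(at least one false positive) = {fwer:.3f}, "
          f"Bonferroni per-test level = {per_test:.4f}")
```

With ten subgroup cuts, the chance of at least one spurious "finding" is about 40 percent, and the corrected per-test level drops to 0.005. Combine that stricter bar with the wider subgroup margins and very little survives.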

I will address the other interesting item in a new post.