I can't bring myself to read the paper, but it sounds like they showed each subject multiple chart pairs, say k pairs, which would mean they had 20*k observations and not just 20. Still a hell of a multiple comparisons problem, though. Of course, the real problem here is that the experimental procedure can be fairly paraphrased as "someone with an ax to grind asked leading questions until he was happy with the answers".
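To put a rough number on the multiple comparisons problem: if each of k chart comparisons is tested at the usual 0.05 level, the chance of at least one spurious "significant" result grows quickly with k. This is a minimal sketch. It assumes the tests are independent, which repeated measures on the same people won't be, and k is a placeholder because the actual number of comparisons isn't given here.

```python
def familywise_error_rate(k: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across k independent tests,
    each run at level alpha, when every null hypothesis is true."""
    return 1 - (1 - alpha) ** k


def bonferroni_threshold(k: int, alpha: float = 0.05) -> float:
    """Per-test level that keeps the familywise error rate at or below alpha."""
    return alpha / k


# Illustrative values of k only; the real count of comparisons is unknown here.
for k in (1, 7, 14):
    print(k, round(familywise_error_rate(k), 3), bonferroni_threshold(k))
```

For example, k = 7 already gives roughly a 30% chance of at least one false positive.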

Cosma: Not really. They are not randomizing treatment. Every one of the 20 participants inspected both sets of charts. They should have used a paired-difference test, but they didn't. The comparison is of the difference between the total score for chart type A and the total score for chart type B, replicated 20 times. So I think you have to call that 20 observations.
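A paired-difference test reduces each participant to a single number, their chart-type-A total minus their chart-type-B total, and asks whether the mean difference is zero. Below is a minimal sketch of the paired t statistic in plain Python. It assumes per-participant totals are available. The function name and inputs are hypothetical, and a real analysis would also get a p-value, for example with `scipy.stats.ttest_rel`.

```python
import math
from statistics import mean, stdev


def paired_t(scores_a, scores_b):
    """Paired-difference t statistic with one difference per participant.

    scores_a[i] and scores_b[i] are participant i's total scores for chart
    type A and chart type B (hypothetical inputs). Returns (t, degrees of
    freedom). Raises ZeroDivisionError if every difference is identical.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("need exactly one A total and one B total per participant")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # sample SD of the differences
    return mean(diffs) / standard_error, n - 1
```

Because the unit of analysis is the participant, 20 participants give 20 differences and 19 degrees of freedom, however many charts each person looked at.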

Are you sure? The paper says, "Each participant saw only one version of each chart, either Holmes or plain."

Jerzy: I know it's confusing because the way they set it up is super complicated. Each participant saw 14 charts: 7 of the Holmes variety and 7 of the plain variety. They alternate between Holmes and plain charts and insert a blank chart to provide a "visual break".
So each participant is exposed to both "treatments" in alternating fashion.
However, each participant sees any given chart (better described as a given data set) only once. So in the above figure, you either saw the Holmes version or the plain version. But if you saw the Holmes version for the Diamond chart, your next chart would be the plain version.
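The alternating design can be sketched as follows: each participant gets every data set once, with the version flipping between Holmes and plain from one chart to the next. This is only a sketch under stated assumptions. The function name is hypothetical, data sets are shown in a fixed index order, the blank "visual break" charts are left out, and the rule for which version a participant starts with is a guess, because the exact assignment scheme isn't described here.

```python
def chart_sequence(n_datasets=14, first="Holmes"):
    """Return (dataset_index, version) pairs for one participant.

    Hypothetical sketch: versions alternate Holmes/plain and each data set
    appears exactly once, so the participant sees n_datasets // 2 of each.
    """
    other = {"Holmes": "plain", "plain": "Holmes"}
    version = first
    sequence = []
    for i in range(n_datasets):
        sequence.append((i, version))
        version = other[version]  # flip version for the next chart
    return sequence


# Assumed counterbalancing across 20 participants: alternate the starting version.
schedules = [chart_sequence(first="Holmes" if p % 2 == 0 else "plain") for p in range(20)]
```

Every schedule contains both "treatments", which is why treatment varies within each participant rather than between participants.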

So I think you have to call that 20 observations.

If that's what they did, then yes, it's just n=20. Wow (and not in a good way). "We can perhaps say what the experiment died of."

