I recently came across a series of papers by Irwin Levin (link), about how well people estimate statistical averages from a given set of numbers. In contrast to the findings of Tversky and Kahneman, Gigerenzer, etc. on probability, it seems like we are able to guess average values pretty well, even in the presence of outliers.

It must be said that the sample sizes in Levin's experiments were tiny (12 students in one case, though working with something like 75 sets of numbers). That said, the experimental setup was remarkable. Take this paper as an example. The numbers were shown either in sequence or all at the same time. Levin created three types of tasks: a descriptive task, in which the goal was to give the average of the numbers presented, including the outliers; an inference task, in which the goal was to guess the average of the population of numbers from which the sample was drawn, in which case we expect the subjects to discount the outliers; and a discounting task, in which subjects were presented with data including outliers but were asked to ignore them.

***

The reason for this post is that Levin's work was done in the 1970s (Levin himself retired this year according to his webpage). There doesn't appear to have been much interest in this subject since then.

It seems like researchers may find the estimation of summary statistics like means, medians, etc. not interesting enough. All the new research that I know of concerns judging probability distributions, margins of error, variability, etc.

However, I'm more interested in point estimates, and I feel that the early research left the question unsettled. I haven't found any research on how good we are at guessing the median of a set of numbers, or the mode, or trimmed means, or moving averages. If we see repeated numbers, are we likely to use the average, the median, the mode, or some other statistic to summarize that information? Given what we now know about irrationality and biases in judging probabilities, would we be able to replicate Levin's findings? Or would we find that his results do not hold up with better samples?
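To make the candidate statistics concrete, here is a minimal Python sketch of the point estimates mentioned above. The sample numbers are made up for illustration (they are not Levin's stimuli), and the helper names `trimmed_mean` and `moving_average` are my own, not anything from his papers.

```python
from statistics import mean, median, multimode


def trimmed_mean(values, proportion=0.1):
    """Mean after dropping `proportion` of the values from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of values cut from each tail
    return mean(ordered[k:len(ordered) - k])


def moving_average(values, window=3):
    """Simple moving average over consecutive windows of the given size."""
    return [mean(values[i:i + window])
            for i in range(len(values) - window + 1)]


# Hypothetical sample with one obvious outlier (95).
sample = [12, 15, 15, 17, 18, 20, 95]

summaries = {
    "mean": mean(sample),                    # pulled upward by the outlier
    "median": median(sample),                # middle value, ignores the outlier
    "mode(s)": multimode(sample),            # most frequent value(s)
    "trimmed mean": trimmed_mean(sample, 0.2),
    "moving average": moving_average(sample, 3),
}

for name, value in summaries.items():
    print(f"{name}: {value}")
```

With a sample like this, the mean sits well above the median and the trimmed mean, which is exactly the gap that would let an experiment tell which statistic a subject's guess is closest to.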

It would be interesting to explore this topic graphically rather than numerically. A simple trial would be to present a scatterplot and ask people to draw a best-fit line. Simpler would be a number line inviting the subject to plot the mean. The subject could be asked to match distributions of points they believed arose from the most similar probability functions... Sounds like a kind of fun study to be a participant in.

When I took cognitive psych back in college I did something sort of similar: I asked people to add an additional "random" point to a cloud of previously generated pseudorandom points. The results suggested that the process the subject went through to select a point was complex: it combined an overall tendency toward certain spaces that was unique to each person with a tendency toward certain areas (e.g. those far from other points) that everyone preferred with a given arrangement of seed points.

Overall I find it very interesting, but I'm not sure what the paths are from results of studies like these to applications. It may be that it could help make choices about where to lead the reader's eye to statistically significant patterns, and where to rely on them to see those patterns without help? Or perhaps it could help analysts inoculate themselves against human weaknesses in the interpretation of data?

Posted by: GroundTruthTrek | 11/15/2012 at 02:40 AM

GTT: Here's the application that I was thinking about when I wrote this post. Imagine you are one of those people who receive regular reports with data in them or are in a position to observe data as they arise. Say the manager of a call center, or a shop. You are not the analyst or accountant. But because you're there, you develop an intuition of what the week's sales number is, or the average person who calls the call center. I'm interested in how that representative statistic is generated in the absence of doing the numbers, and how accurate that estimate is compared to the real numbers.

Posted by: Kaiser | 11/15/2012 at 04:43 PM

Following that line of reasoning along, a very simple utility would be in identifying cases where more explicit statistics are needed to pre-empt poor intuition on the part of a data observer. If people are great at picking out the mean of un-analyzed data, then there's not much need to calculate that mean on the fly. But (for example) people aren't necessarily that great at identifying significant clumpiness in data, so if that's a relevant question it might be good to report data along with some quantification of that clumpiness so that the observer doesn't jump to the conclusion that there's something going on when there isn't.

Specific example: A business that sold high-value items might see very few sales per week, and in this situation clumps due to random chance will be common. However real clumps, related to factors the business operators weren't yet aware of, might well happen. So if they had a little helper utility to watch the data stream and estimate the chances that a given clump resulted from random variability, that might be valuable.
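A rough sketch of what such a helper might compute, in Python. It rests on assumptions that go beyond the example above: sales arrive independently at a constant average weekly rate (a Poisson model), and a "clump" means one week with at least some number of sales. The function names are hypothetical.

```python
import math


def poisson_tail(k, expected):
    """P(X >= k) for X ~ Poisson(expected)."""
    if k <= 0:
        return 1.0
    # Sum P(X = i) for i = 0 .. k-1, then take the complement.
    cdf = sum(math.exp(-expected) * expected ** i / math.factorial(i)
              for i in range(k))
    return 1.0 - cdf


def clump_chance(weekly_rate, sales_in_week, weeks_observed):
    """Chance that at least one of `weeks_observed` independent weeks
    shows `sales_in_week` or more sales purely by random variation."""
    p_single_week = poisson_tail(sales_in_week, weekly_rate)
    return 1.0 - (1.0 - p_single_week) ** weeks_observed


# Hypothetical business: one sale every two weeks on average.
# How surprising is a week with 3 or more sales?
print(poisson_tail(3, 0.5))      # chance for one particular week
print(clump_chance(0.5, 3, 52))  # chance of seeing it at least once in a year
```

The second number is much larger than the first, which is the point: a week that looks remarkable on its own can be quite likely to turn up somewhere in a year of data.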

That's one step beyond the research you're talking about here, but I can see how the research might suggest approaches like this.

Posted by: GroundTruthTrek | 11/15/2012 at 08:02 PM

There's some nice work on the representation of statistics by the visual system that might be of interest:

http://visionlab.harvard.edu/Members/George/Publications_files/Alvarez-Oliva-2008-PsychSci.pdf

(and the papers that follow).

This work is more perceptual in nature, but it suggests that our visual system computes a variety of summary statistics over the displays we see.

Posted by: Mike Frank | 11/16/2012 at 12:39 PM

Here's another example of an application. I just finished marking a stack of midterm exams. If you ask me what the mean score was in the class, how accurate would my guesstimate be? What are the heuristics I would be using to come up with that guess? Am I affected by the last few papers I marked? Am I affected by the frequency of repeated scores? Am I affected by the sections of the exam that count for more points? Am I affected by the max and min scores I've seen? How does memory play into this?

I understand that there is work done on the visual side and I am of course interested in visualizations. But I don't believe the two areas overlap. What I'm looking for is different.

Posted by: Kaiser | 11/17/2012 at 03:32 PM

Here's something from another vision lab -- not necessarily about summary statistics, but much more on the visualization side of things.

http://perception.research.yale.edu/papers/12-Newman-Scholl-PBR.pdf

Posted by: Emily Ward | 11/20/2012 at 07:58 PM