I recently came across a series of papers by Irwin Levin about how well people estimate statistical averages from a given set of numbers. In contrast to the findings of Tversky and Kahneman, Gigerenzer, and others on probability judgment, it seems that we are able to guess averages pretty well, even in the presence of outliers.
It must be said that the sample sizes in Levin's experiments were tiny (12 students in one case, although they worked with something like 75 sets of numbers). That said, the experimental setup was remarkable. Take this paper as an example. The numbers were shown either in sequence or all at once. Levin created three types of tasks: a descriptive task, in which the goal was to estimate the average of the numbers presented, outliers included; an inference task, in which the goal was to guess the average of the population from which the sample was drawn, so we would expect subjects to discount the outliers; and a discounting task, in which subjects were shown data that included outliers but were asked to ignore them.
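To make the difference between the three tasks concrete, here is a small Python sketch of the answer each task implies for one made-up set of numbers. The numbers are hypothetical, the rule for flagging outliers (Tukey fences at 1.5 times the interquartile range) is my assumption rather than Levin's criterion, and the median is only one possible stand-in for the inference task's population estimate.

```python
from statistics import mean, median, quantiles

def tukey_outliers(xs, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (an assumed outlier rule)."""
    q1, _, q3 = quantiles(xs, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in xs if x < low or x > high]

sample = [52, 48, 50, 47, 53, 51, 49, 95]  # hypothetical stimulus set with one outlier
outliers = tukey_outliers(sample)
kept = [x for x in sample if x not in outliers]

# Descriptive task: the average of everything shown, outliers included.
descriptive_target = mean(sample)
# Discounting task: the average after ignoring the flagged outliers.
discounting_target = mean(kept)
# Inference task: a population average. The median is used here only as one
# robust stand-in, not as Levin's scoring rule.
inference_guess = median(sample)

print(outliers, descriptive_target, discounting_target, inference_guess)
```

With this sample, the single large value pulls the descriptive mean up to 55.625, while dropping it gives 50 and the median sits at 50.5, which is the gap the three tasks are designed to probe.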
The reason for this post is that Levin's work was done in the 1970s (Levin himself retired this year, according to his webpage), and there doesn't appear to have been much interest in the subject since then.
It seems that researchers may find the estimation of summary statistics such as means and medians not interesting enough. All the recent research I know of concerns judging probability distributions, margins of error, variability, and so on.
However, I'm more interested in point estimates, and I feel that the early research left the question unsettled. I haven't found any research on how good we are at guessing the median of a set of numbers, its mode, a trimmed mean, or a moving average. When we see repeated numbers, are we likely to summarize them with the mean, the median, the mode, or some other statistic? Given what we now know about irrationality and biases in judging probabilities, can Levin's findings be replicated? Or would they fail to hold with better samples?
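For reference, the candidate point estimates mentioned above can disagree quite a lot on the same data. The sketch below computes them for a hypothetical sequence with repeated values and one outlier; the trimming proportion (12.5% from each end) and the moving-average window of 3 are arbitrary illustrative choices, not anything taken from Levin's studies.

```python
from statistics import mean, median, mode

def trimmed_mean(xs, prop=0.1):
    """Mean after dropping the lowest and highest `prop` fraction of values."""
    ys = sorted(xs)
    k = int(len(ys) * prop)  # number of values cut from each end
    return mean(ys[k:len(ys) - k])

def moving_average(xs, window=3):
    """Simple moving average, e.g. for numbers shown one at a time."""
    return [mean(xs[i:i + window]) for i in range(len(xs) - window + 1)]

numbers = [3, 4, 4, 4, 5, 6, 7, 30]  # hypothetical sequence: repeats plus one outlier

summary = {
    "mean": mean(numbers),
    "median": median(numbers),
    "mode": mode(numbers),
    "trimmed_mean": trimmed_mean(numbers, prop=0.125),
    "moving_average": moving_average(numbers, window=3),
}
for name, value in summary.items():
    print(name, value)
```

Here the mean (7.875) is dragged up by the outlier, the mode (4) reflects only the repeated value, and the median (4.5) and trimmed mean (5) fall in between. A study of which of these people's guesses track would need stimuli where they differ like this.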