You can follow this conversation by subscribing to the comment feed for this post.

It would be interesting to explore this topic graphically rather than numerically. A simple trial would be to present a scatterplot and ask people to draw a best-fit line. Simpler would be a number line inviting the subject to plot the mean. The subject could be asked to match distributions of points they believed arose from the most similar probability functions... Sounds like a kind of fun study to be a participant in.

When I took cognitive psych back in college I did something sort of similar: I asked people to add an additional "random" point to a cloud of previously generated pseudorandom points. The results suggested that the process the subject went through to select a point was complex - they intersected an overal tendency toward certain spaces that was unique to each person with a tendency to certain areas (e.g. those far from other points) that everyone preferred with a given arrangement of seed points.

Overal I find it very interesting, but I'm not sure what the paths are from results of studies like these to applications. It may be that it could help make choices about where to lead the reader's eye to statistically significant patterns, and where to rely on them to see those patterns without help? Or perhaps it could help analysts inoculate themselves against human weaknesses in the interpretation of data?

GTT: Here's the application that I was thinking about when I wrote this post. Imagine you are one of those people who receive regular reports with data in them or are in a position to observe data as they arise. Say the manager of a call center, or a shop. You are not the analyst or accountant. But because you're there, you develop an intuition of what the week's sales number is, or the average person who calls the call center. I'm interested in how that representative statistic is generated in the absence of doing the numbers, and how accurate that estimate is compared to the real numbers.

Following that line of reasoning along, a very simple utility would be in identifying cases where more explicit statistics are needed to pre-empt poor intuition on the part of a data observer. If people are great at picking out the mean of un-analyzed data, then there's not much need to calculate that mean on the fly. But (for example) people aren't necessarily that great at identifying significant clumpiness in data, so if that's a relevant question it might be good to report data along with some quantification of that clumpiness so that the observer doesn't jump to the conclusion that there's something going on when there isn't.

Specific example: A business that sold high-value items might see very few sales per week, and in this situation clumps due to random chance will be common. However real clumps, related to factors the business operators weren't yet aware of, might well happen. So if they had a little helper utility to watch the data stream and estimate the chances that a given clump resulted from random variability, that might be valuable.

That's one step beyond the research you're talking about here, but I can see how the research might suggest approaches like this.

There's some nice work on the representation of statistics by the visual system that might be of interest:

http://visionlab.harvard.edu/Members/George/Publications_files/Alvarez-Oliva-2008-PsychSci.pdf

(and the papers that follow).
This work is more perceptual in nature, but it suggests that our visual system computes a variety of summary statistics over the displays we see.

Here's another example of an application. I just finished marking a stack of midterm exams. If you ask me to what the mean score was in the class, how accurate would my guesstimate be? What are the heuristics I would be using to come up with that guess? Am I affected by the last few papers I marked? Am I affected by the frequency of repeated scores? Am I affected by the sections of the exam that count for more points? Am I affected by the max and min scores I've seen? How does memory play into this?

I understand that there is work done on the visual side and I am of course interested in visualizations. But I don't believe the two areas overlap. What I'm looking for is different.

Here's something from another vision lab -- not necessarily about summary statistics, but much more on the visualization side of things.

http://perception.research.yale.edu/papers/12-Newman-Scholl-PBR.pdf

This is only a preview. Your comment has not yet been posted.

Your comment could not be posted. Error type:
Your comment has been posted. Post another comment

The letters and numbers you entered did not match the image. Please try again.

As a final step before posting your comment, enter the letters and numbers you see in the image below. This prevents automated programs from posting comments.

Having trouble reading this image? View an alternate.

(Name is required. Email address will not be displayed with the comment.)

## NEW BOOTCAMP

See our curriculum, instructors. Apply.
Business analytics and data visualization expert. Author and Speaker. Founder of Principal Analytics Prep, MS Applied Analytics at Columbia. See my full bio.

## Next Events

May: 2 New York Marketing Association Big Data Workshop, NYC

May: 5 NYPL Analytics Careers Talk, NYC

May: 8 Data Visualization Seminar, Denver, CO

May: 15 Data Visualization Seminar, Cambridge, MA

May: 17 Data Visualization Seminar, Philadelphia, PA

May: 22 Data Visualization Seminar, San Ramon, CA

See here

## Future Courses (New York)

Summer: Statistical Reasoning & Numbersense, Principal Analytics Prep (4 weeks)

Summer: Applied Analytics Frameworks & Methods, Columbia (6 weeks)

## Junk Charts Blog

Graphics design by Amanda Lee

## Search3

•  only in Big Data