



It would be interesting to explore this topic graphically rather than numerically. A simple trial would be to present a scatterplot and ask people to draw a best-fit line. Simpler would be a number line inviting the subject to plot the mean. The subject could be asked to match distributions of points they believed arose from the most similar probability functions... Sounds like a kind of fun study to be a participant in.

When I took cognitive psych back in college I did something sort of similar: I asked people to add an additional "random" point to a cloud of previously generated pseudorandom points. The results suggested that the process the subject went through to select a point was complex: it combined an overall tendency toward certain spaces that was unique to each person with a tendency toward certain areas (e.g. those far from other points) that everyone preferred for a given arrangement of seed points.

Overall I find it very interesting, but I'm not sure what the paths are from results of studies like these to applications. It may be that it could help make choices about where to lead the reader's eye to statistically significant patterns, and where to rely on them to see those patterns without help? Or perhaps it could help analysts inoculate themselves against human weaknesses in the interpretation of data?


GTT: Here's the application that I was thinking about when I wrote this post. Imagine you are one of those people who receive regular reports with data in them or are in a position to observe data as they arise. Say the manager of a call center, or a shop. You are not the analyst or accountant. But because you're there, you develop an intuition of what the week's sales number is, or what the average caller to the call center is like. I'm interested in how that representative statistic is generated in the absence of doing the numbers, and how accurate that estimate is compared to the real numbers.


Following that line of reasoning along, a very simple utility would be in identifying cases where more explicit statistics are needed to pre-empt poor intuition on the part of a data observer. If people are great at picking out the mean of un-analyzed data, then there's not much need to calculate that mean on the fly. But (for example) people aren't necessarily that great at identifying significant clumpiness in data, so if that's a relevant question it might be good to report data along with some quantification of that clumpiness so that the observer doesn't jump to the conclusion that there's something going on when there isn't.

Specific example: A business that sold high-value items might see very few sales per week, and in this situation clumps due to random chance will be common. However, real clumps, related to factors the business operators weren't yet aware of, might well happen. So if they had a little helper utility to watch the data stream and estimate the chances that a given clump resulted from random variability, that might be valuable.

That's one step beyond the research you're talking about here, but I can see how the research might suggest approaches like this.
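
For what it's worth, here is a rough sketch of what such a helper might compute. It rests on assumptions that are mine, not anything established above: sales arrive independently at a constant long-run weekly rate (a Poisson model), and weeks are independent of each other. The function names `poisson_tail` and `chance_of_clump_somewhere` are made up for illustration.

```python
import math


def poisson_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam): the chance of k or more sales
    in one week when the long-run average is lam sales per week."""
    if k <= 0:
        return 1.0
    # Accumulate P(X = 0) + ... + P(X = k - 1), then take the complement.
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def chance_of_clump_somewhere(k: int, lam: float, n_weeks: int) -> float:
    """Chance that at least one of n_weeks independent weeks shows
    k or more sales purely through random variability."""
    p_single = poisson_tail(k, lam)
    return 1.0 - (1.0 - p_single) ** n_weeks


if __name__ == "__main__":
    # Hypothetical shop: one sale per week on average, and a week with four.
    print(poisson_tail(4, 1.0))
    print(chance_of_clump_somewhere(4, 1.0, 52))
```

Under these assumptions, a four-sale week at an average of one sale per week has only about a 2% chance in any given week, yet about a 63% chance of turning up at least once over a year. That gap is the sort of thing intuition tends to miss, and a real utility would need to check whether the constant-rate assumption holds at all.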

Mike Frank

There's some nice work on the representation of statistics by the visual system that might be of interest:


(and the papers that follow).
This work is more perceptual in nature, but it suggests that our visual system computes a variety of summary statistics over the displays we see.


Here's another example of an application. I just finished marking a stack of midterm exams. If you ask me what the mean score was in the class, how accurate would my guesstimate be? What are the heuristics I would be using to come up with that guess? Am I affected by the last few papers I marked? Am I affected by the frequency of repeated scores? Am I affected by the sections of the exam that count for more points? Am I affected by the max and min scores I've seen? How does memory play into this?

I understand that there is work done on the visual side and I am of course interested in visualizations. But I don't believe the two areas overlap. What I'm looking for is different.

Emily Ward

Here's something from another vision lab -- not necessarily about summary statistics, but much more on the visualization side of things.


Kaiser Fung. Business analytics and data visualization expert. Author and Speaker.