Big data
Dec 21, 2005
To me, statistics is about searching for beauty in simplicity. Much of our discipline is concerned with data reduction, or finding creative representations that consume less space than the raw data. That's why I have mixed feelings about complicated, multi-dimensional, dynamic, user-controlled, gee-whiz displays of data. I have no issue with these as works of art, but they tend not to enhance our understanding of the data.
I take a marginally relevant example from Google's well-illustrated 2005 Zeitgeist report.
The annotator nailed the key insights from this data, especially the flatness of "surfing" versus the seasonality of "snowboarding". Something is not right with the week-by-week fluctuations: these represent noise that interferes with our perception of the underlying seasonal trend. An easy remedy is to "smooth" the data using a moving average, exponential smoothing, etc. The smoothed data will not contain such jaggedness, making it easier to read.
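As a rough illustration of the two smoothers mentioned above, here is a minimal sketch in Python. The weekly series is synthetic (a seasonal cosine plus random noise), not Google's actual data, and the function names, window length, and smoothing constant are assumptions chosen only for the example.

```python
import math
import random


def moving_average(values, window):
    """Centered moving average; the window shrinks near the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def exponential_smoothing(values, alpha):
    """Simple exponential smoothing: s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    smoothed = [values[0]]
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def roughness(series):
    """Mean absolute week-to-week change; lower means less jagged."""
    return sum(abs(b - a) for a, b in zip(series, series[1:])) / (len(series) - 1)


# Synthetic stand-in for a seasonal search term: two years of weekly values
# with a yearly cycle plus noise. These numbers are made up for illustration.
random.seed(0)
raw = [50 + 40 * math.cos(2 * math.pi * week / 52) + random.gauss(0, 10)
       for week in range(104)]

ma = moving_average(raw, window=5)          # 5-week window is an arbitrary choice
es = exponential_smoothing(raw, alpha=0.3)  # alpha = 0.3 is an arbitrary choice

print("roughness raw:", round(roughness(raw), 2))
print("roughness moving average:", round(roughness(ma), 2))
print("roughness exponential:", round(roughness(es), 2))
```

A wider window or a smaller alpha smooths more aggressively, at the cost of blurring or lagging the seasonal peaks, so the right setting depends on how much of the week-by-week movement one is willing to treat as noise.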
Scribbles was the best the Google geniuses could do? There are so many OTHER things wrong with these charts: no scale, no indication of data granularity, fat lines that obscure each other (especially the black line for surfing), and no link to underlying data. I'm tagging this one as "half-assed graphics."
Posted by: Mike Anderson | Dec 21, 2005 at 08:17 AM
Sometimes when you have a complex high-dimensional data set you do need "complicated, multi-dimensional, dynamic, user-controlled, gee-whiz displays". It's naive to assume you can always understand the data from simple 2D plots.
It's also important to make the distinction between displays for exploration and displays for communication. Exploration is a very personal task and at first you don't need to explain your findings to others. You can create very rough and ready graphics, as complicated as you can understand. When you want to communicate your findings, you need to extract simpler views of the data -- but without complex exploratory graphics it is often hard to find these simplifying views.
Posted by: Hadley | Dec 21, 2005 at 04:52 PM
Hadley makes some good points. But the Google graph has been released for public consumption, so I would say communication is the primary goal. I also think that some kind of smoothing is called for.
Posted by: Nick Barrowman | Dec 21, 2005 at 09:23 PM
There's no question that the Google graph could be substantially improved (the easiest way would be to just stretch the graph horizontally). I just took umbrage at the statement implying that all interactive and dynamic graphics are flim-flammery.
Posted by: Hadley | Dec 22, 2005 at 03:00 PM
I'd stand by my comment on gee-whiz charts until someone shows me a bunch of examples that really work.
Tufte is fond of the Napoleon Russian campaign chart, for example. It's one of a kind in the sense that it cannot be easily ported to other situations with other variables. Also, it collapses all 5-6 dimensions to 2-D.
In the past, I have highlighted at least two interactive/dynamic charts here that I find to be enlightening. One was the obesity map as it evolved over time. The other was the reading and math assessment scores: at the official site, they have these stacked bar charts where the reader can line up different segments.
One weakness of dynamic graphs is that they rely on the reader's visual memory of what took place before. Even in the obesity map case, I have to review the animation multiple times and each time focus on a different corner of the map!
Posted by: Kaiser | Dec 22, 2005 at 06:10 PM
Have a look at the demos on the ggobi site. The teaching one will probably be most persuasive as it has been designed to show how dynamic/interactive graphics can be useful for teaching multivariate statistics, rather than demonstrating the functionality of ggobi. For a more historical perspective, check out the ASA graphics section online videos.
I'd agree that neither of the interactive/dynamic charts you have shown here in the past was particularly good. Even though interactive graphics have a rich history, and are an incredibly powerful set of techniques, they have yet to reach the mainstream of statistics, let alone the wider public.
There are some insights that you simply cannot gain from static graphics. (Not to mention all the wonderful small things that interactive graphics bring, like being able to match outlying points to the record in the original dataset.) I would strongly suggest you become familiar with at least one interactive/dynamic graphics package (e.g. ggobi, mondrian, manet, datadesk, crystalvision) before making any more cracks about gee-whizzery!
Posted by: Hadley | Dec 23, 2005 at 06:19 AM