Order

To me, statistics is about searching for beauty in simplicity. Much of our discipline is concerned with data reduction, or finding creative representations that consume less space than the raw data. That's why I have mixed feelings about complicated, multi-dimensional, dynamic, user-controlled, gee-whiz displays of data. I have no issue with these as works of art, but they tend not to enhance our understanding of the data.

I take a marginally relevant example from Google's well-illustrated 2005 Zeitgeist report.

[Figure: Google Zeitgeist 2005 chart of search volume over time for "surfing" and "snowboarding"]
The annotator nailed the key insights from this data, especially the flatness of "surfing" versus the seasonality of "snowboarding". What is not right is the week-by-week fluctuations: these represent noise that interferes with our perception of the underlying seasonal trend. An easy remedy is to "smooth" the data using a moving average, exponential smoothing, or a similar technique. The smoothed data will not contain such jaggedness, making the chart easier to read.
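To make the smoothing idea concrete, here is a minimal Python sketch of the two techniques named above. The underlying Google data is not available, so the weekly series is synthetic (a seasonal cosine curve plus random noise). The function names, the window of 5 weeks and the smoothing factor of 0.3 are all illustrative assumptions, not anything taken from the Zeitgeist report.

```python
import math
import random


def moving_average(values, window):
    """Trailing moving average; returns len(values) - window + 1 points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    out = [total / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value, drop the oldest.
        total += values[i] - values[i - window]
        out.append(total / window)
    return out


def exponential_smoothing(values, alpha):
    """Simple exponential smoothing; alpha near 1 follows the data closely."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [values[0]]
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def roughness(series):
    """Mean absolute week-to-week change, a crude measure of jaggedness."""
    return sum(abs(b - a) for a, b in zip(series, series[1:])) / (len(series) - 1)


# Synthetic stand-in for a seasonal search term (hypothetical data).
random.seed(0)
weekly = [
    50 + 40 * math.cos(2 * math.pi * week / 52) + random.uniform(-10, 10)
    for week in range(52)
]

print("raw roughness:        ", round(roughness(weekly), 2))
print("moving average (w=5): ", round(roughness(moving_average(weekly, 5)), 2))
print("exponential (a=0.3):  ", round(roughness(exponential_smoothing(weekly, 0.3)), 2))
```

A wider window or a smaller alpha gives a smoother line, at the cost of lagging behind genuine turns in the seasonal pattern, so the amount of smoothing remains a judgment call.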


Comments


Mike Anderson

Scribbles was the best the Google geniuses could do? There are so many OTHER things wrong with these charts: no scale, no indication of data granularity, fat lines that obscure each other (especially the black line for surfing), and no link to underlying data. I'm tagging this one as "half-assed graphics."

Hadley

Sometimes when you have a complex high-dimensional data set you do need "complicated, multi-dimensional, dynamic, user-controlled, gee-whiz displays". It's naive to assume you can always understand the data from simple 2D plots.

It's also important to make the distinction between displays for exploration and displays for communication. Exploration is a very personal task and at first you don't need to explain your findings to others. You can create very rough and ready graphics, as complicated as you can understand. When you want to communicate your findings, you need to extract simpler views of the data -- but without complex exploratory graphics it is often hard to find these simplifying views.

Nick Barrowman

Hadley makes some good points. But the Google graph has been released for public consumption, so I would say communication is the primary goal. I also think that some kind of smoothing is called for.

Hadley

There's no question that the Google graph could be substantially improved (the easiest way would be to just stretch the graph horizontally). I just took umbrage at the statement implying that all interactive and dynamic graphics are flim-flammery.

Kaiser

I'd stand by my comment on gee-whiz charts until someone shows me a bunch of examples that really work.

Tufte is fond of the Napoleon Russian campaign chart, for example. It's one of a kind in the sense that it cannot be easily ported to other situations with other variables. Also, it collapses all 5-6 dimensions to 2-D.

In the past, I have highlighted at least two interactive/dynamic charts here that I find to be enlightening. One was the obesity map as it evolved over time. The other was the reading and math assessment scores: at the official site, they have these stacked bar charts where the reader can line up different segments.

One weakness of dynamic graphs is that they rely on the reader's visual memory of what took place before. Even in the obesity map case, I have to review the animation multiple times and each time focus on a different corner of the map!

Hadley

Have a look at the demos on the ggobi site. The teaching one will probably be most persuasive as it has been designed to show how dynamic/interactive graphics can be useful for teaching multivariate statistics, rather than demonstrating the functionality of ggobi. For a more historical perspective, check out the ASA graphics section online videos.

I'd agree that neither of the interactive/dynamic charts you have shown here in the past was particularly good. Even though interactive graphics have a rich history, and are an incredibly powerful set of techniques, they have yet to reach the mainstream of statistics, let alone the wider public.

There are some insights that you simply cannot gain from static graphics. (Not to mention all the wonderful small things that interactive graphics bring, like being able to match outlying points to the record in the original dataset.) I would strongly suggest you become familiar with at least one interactive/dynamic graphics package (e.g., ggobi, mondrian, manet, datadesk, crystalvision) before making any more cracks about gee-whizzery!
