Happy to report on the dataviz workshop, a first-time offering at NYU. I previously posted the syllabus here.
I made minor changes to the syllabus, adding Alberto Cairo's book, The Functional Art (link), as optional reading, some articles from the recent debate in the book review circle about the utility of "negative reviews" (start here), and some blog posts by Stephen Few.
The Cairo and Few readings, together with Tufte, are closest to what I want to accomplish in the first two classes, before we start discussing individual projects: encouraging students to adopt the mentality of the course, that is to say, to think of dataviz as an artform. An artform implies many things: one is a seriousness about the output, and another is the recognition that the work has an audience.
The field of data visualization is sorely lacking high-level theory, immersed as so many of us are in tools, data, and rules of thumb. It is my hope that these workshop discussions will lead to a crystallization of the core principles of the field.
We went on a tour of many dataviz blogs, and documented various styles of criticism. In the next class, we will discuss what style we'd adopt in the course.
The composition of the class brings me great excitement. There are 12 enrolled students, which is probably the maximum for a class of this type. One student subsequently dropped out, after learning that the workshop is really not for true beginners.
The workshop participants come from all three schools of dataviz: computer science, statistics, and design. Amongst us are an academic economist trained in statistical methods, several IT professionals, and an art director. This should make for rewarding conversation, as inevitably there will be differences in perspective.
REQUEST FOR HELP: A variety of projects have been proposed; several are using this opportunity to explore data sets from their work. That said, some participants are hoping to find certain datasets. If you know of good sources for the following, please write a comment below and link to them:
- Opening-day ratings from sites like Rotten Tomatoes
- New York City water quality measures by county (or other geographical unit), probably from an environmental agency
- Data about donors/donations to public media companies
Since this is a dataviz blog, I want to include a chart with this post. I did a poll of the enrolled students, and one of the questions was about what dataviz tools they use to generate charts. I present here two views of the same data.
The first is a standard column chart, plotting the number of students who include a particular tool in their toolset (each student is allowed to name more than one tool). This presents a simple piece of information simply: Excel is the most popular, although the long tail indicates the variety of tools people use in practice.
What the first option doesn't bring out is the correlation between tools, that is, which tools tend to be used by the same participant. The second option makes this clear, with each column representing a student. This chart is richer, as it also shows how many tools the average student uses and how the different tools relate to one another.
The tradeoff is that the reader has to work a little more to understand the relative importance of the different tools, a message that is very clear in the first option.
This second option is also not scalable. If there are thousands of students, the chart will lose its punch (although it will undoubtedly be called beautiful).
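For readers who want to try both views on their own poll data, here is a minimal Python sketch of the numbers behind each chart. The responses below are hypothetical, made up purely for illustration; they are not the class's actual answers, and the function names are placeholders rather than part of any library.

```python
from collections import Counter
from itertools import combinations

# Hypothetical poll responses (NOT the real class data):
# one set of tools per student.
responses = [
    {"Excel", "R"},
    {"Excel", "Tableau"},
    {"Excel", "R", "Illustrator"},
    {"D3"},
    {"Excel"},
]

def tool_counts(responses):
    """View 1: how many students named each tool (the column chart)."""
    return Counter(tool for tools in responses for tool in tools)

def student_tool_matrix(responses):
    """View 2: one row per tool, one column per student, 1 if the student uses it."""
    counts = tool_counts(responses)
    # Order tools by popularity, breaking ties alphabetically.
    tools = sorted(counts, key=lambda t: (-counts[t], t))
    return {tool: [int(tool in r) for r in responses] for tool in tools}

def pair_counts(responses):
    """How often two tools are named by the same student."""
    return Counter(
        pair
        for tools in responses
        for pair in combinations(sorted(tools), 2)
    )

def mean_tools_per_student(responses):
    """Average number of tools named per student."""
    return sum(len(tools) for tools in responses) / len(responses)

if __name__ == "__main__":
    print(tool_counts(responses).most_common())
    for tool, row in student_tool_matrix(responses).items():
        print(f"{tool:<12}", " ".join(str(cell) for cell in row))
    print(pair_counts(responses).most_common(3))
    print(mean_tools_per_student(responses))
```

The first function feeds the column chart; the matrix, pair counts, and average are the extra information that only the per-student view exposes. The matrix grows by one column per student, which is where the scalability problem comes from.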
Which version do you like? Are there even better ways to present this information?