Book review: Getting (more out of) Graphics by Antony Unwin
Oct 30, 2024
Antony Unwin, a statistics professor at Augsburg, has published a new dataviz textbook called "Getting (more out of) Graphics", and he kindly sent me a review copy. (Amazon link)
I am - not surprisingly - in the prime audience for such a book. It covers some gaps in the market:
a) it emphasizes exploratory graphics rather than presentation graphics
b) it deals not just with designing graphics but also interpreting (i.e. reading) them
c) it covers data pre-processing and data visualization in a more balanced way
d) it develops full case studies involving multiple graphics from the same data sources
The book is divided into two parts: the first, which covers 75% of the material, details case studies, while the final quarter of the book offers "advice". The book has a GitHub page containing R code which, as I shall explain below, is indispensable to the serious reader.
Given the aforementioned design, the case studies in Unwin's book have a certain flavor: most of the data sets are relatively complex, with many variables, including a time component. The primary goal of Unwin's exploratory graphics can be stated as stimulating "entertaining discussions" about and "involvement" with the data. They are open-ended, and frequently inconclusive. This is a major departure from other data visualization textbooks on the market, and also many of my own blog posts, where we focus on selecting a good graphic for presenting insights visually to an intended audience, without assuming domain expertise.
I particularly enjoyed the following sections: a discussion of building graphs via "layering" (starting on p. 326), enumeration of iterative improvement to graphics (starting on p. 402), and several examples of data wrangling (e.g. p. 52).
Unwin does not give "advice" in the typical style of do this, don't do that. His advice is fashioned in the style of an analyst. He frames and describes the issues, shows rather than tells. This paragraph from the section about grouping data is representative:
Sorting into groups gets complicated when there are several grouping variables. Variables may be nested in a hierarchy... or they may have no such structure... Groupings need to be found that reflect the aims of the study. (p. 371)
He writes down what he has done, may provide a reason for his choices, but is always understated. He sees no point in selling his reasoning.
The structure of the last part of the book, the "advice" chapters, is quite unusual. The chapter headers are: (data) provenance and quality; wrangling; colour; setting the scene (scaling, layout, etc.); ordering, sorting and arranging; what affects interpretation; and varieties of plots.
What you won't find are extended descriptions of chart forms, rules of visualization, or flowcharts tying data types to chart forms. Those are easily found online if you want them (you probably won't care if you're reading Unwin's book).
***
For the serious reader, the book should be consumed together with the code on GitHub. Find specific graphs from the case studies that interest you, open the code in your R editor, and follow how Unwin did it. The "advice" chapters highlight points of interest from the case studies presented earlier, so you may start there, cross-reference the case studies, then jump to the code.
Unfortunately, the code is sparsely commented. So also open up your favorite chatbot, which helps to explain the code, and annotate it yourself. Unwin uses R, and in particular, lives in the "tidyverse".
To understand the data manipulation bits, reviewing the code is essential. It's hard to grasp what is being done to the data without actually seeing the datasets. There are no visuals of the datasets in the book, as the text is primarily focused on the workflow leading to a graphic. The data processing can get quite involved, as in Chapter 16.
I'm glad Unwin has taken the time to write this book and publish the code. It rewards the serious reader with skills that are not commonly covered in other textbooks. For example, I was rather amazed to find this sentence (p. 366):
To ensure that a return to a particular ordering is always possible, it is essential to have a variable with a unique value for every case, possibly an ID variable constructed for just this reason. Being able to return to the initial order of a dataset is useful if something goes wrong (and something will).
Anyone who has analyzed real-world datasets would immediately recognize this as good advice, but who'd have thought to put it down in a book?
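To make the point concrete, here is a minimal sketch of the idea in plain Python. It is not Unwin's code (he works in R and the tidyverse), and the toy data and the helper names `add_row_id` and `restore_order` are hypothetical, made up purely for illustration.

```python
# Hypothetical toy data; not taken from the book.
rows = [
    {"city": "Lyon", "sales": 120},
    {"city": "Bonn", "sales": 340},
    {"city": "Graz", "sales": 95},
]


def add_row_id(records, key="row_id"):
    """Return copies of the records with a unique, sequential ID.

    The ID records each case's position in the initial ordering.
    """
    return [{key: i, **rec} for i, rec in enumerate(records, start=1)]


def restore_order(records, key="row_id"):
    """Return the records sorted back into their initial order."""
    return sorted(records, key=lambda rec: rec[key])


# Attach the ID before doing anything else to the data.
with_ids = add_row_id(rows)

# Reorder for some exploratory purpose, e.g. largest sales first.
by_sales = sorted(with_ids, key=lambda rec: rec["sales"], reverse=True)

# If something goes wrong, the ID gets us back to where we started.
restored = restore_order(by_sales)
```

The ID has to be attached before any sorting or filtering happens; once the initial order is lost, nothing in the other columns can recover it.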