How to read a graph
Oct 13, 2008
Via Gelman, here is a nifty book-buying map from Amazon, displaying the split between "red books" and "blue books" bought by Amazon users in each state in the months leading up to the 2004 and 2008 presidential elections.
Gelman noted the similarity between the Amazon map and the red-blue split of rich voters.
This post is about how to read a graph. Here are some things that come to mind looking at the map:
- Sampling bias: how does Amazon's customer base compare with the U.S. population, or rich voters? It would be prudent to check this before making generalizations. Gelman's point may be that Amazon customers behave like rich voters.
- Sampling period: is the period long enough to capture the average inclination of the book buyers? As is well known, book sales follow a long-tail distribution (Chris Anderson wrote an entire book based on this observation). Best-sellers have a disproportionate influence on average values. If the time period is too short, the data may represent only the best-sellers. Consider the following two maps in successive periods in 2004:
- Classification: The long-tailed nature of book sales has wide-reaching implications for interpreting the data. The essential feature is that individual best-sellers have a disproportionate impact on average sales. Since the key metric here is the proportion of red (or blue) books, it follows that whether a best-seller is classified as red or blue makes a huge difference.
If the purple books include best-sellers, then the decision to call a best-seller purple rather than red or blue excludes an influential book from the calculation. We often forget that the decision to exclude is not a neutral decision; it is an active decision that says the excluded data contains no useful information.
This is not to say that excluding those books is the wrong decision. But we must make such decisions with considerable care: when book sales have a long-tailed distribution, excluding best-sellers is not something to take lightly.
- Causality: Let's say we are sufficiently satisfied that we can make a statement about book-buying habits and voting behavior. Then we need to think about the direction of causality. Is the map saying that red book buyers are likely to vote red? Or that red voters are likely to buy red books? No amount of staring at this data set will resolve the issue; other data would be needed to address it.
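To make the classification point above concrete, here is a minimal Python sketch. All sales figures and labels are invented for illustration and are not Amazon's data, and `red_share` is a hypothetical helper that assumes the metric is simply red copies divided by red plus blue copies, with purple books left out.

```python
def red_share(books):
    """Share of red copies among copies labeled red or blue.

    `books` is a list of (copies_sold, label) pairs. Purple books are
    excluded from both numerator and denominator (an assumption about
    how such a map might be computed).
    """
    red = sum(copies for copies, label in books if label == "red")
    blue = sum(copies for copies, label in books if label == "blue")
    return red / (red + blue)


# Hypothetical long tail: nine modest sellers, mostly blue by copies sold.
tail = [
    (120, "blue"), (100, "red"), (90, "blue"),
    (80, "blue"), (70, "red"), (60, "blue"),
    (50, "blue"), (40, "red"), (30, "blue"),
]

# One hypothetical best-seller that outsells the whole tail combined.
BESTSELLER_COPIES = 1000

# Recompute the metric under each possible label for the best-seller.
shares = {
    label: red_share([(BESTSELLER_COPIES, label)] + tail)
    for label in ("red", "blue", "purple")
}

for label, share in shares.items():
    print(f"best-seller labeled {label}: red share = {share:.1%}")
```

With these made-up numbers, the red share is about 74% if the best-seller is labeled red, about 13% if it is labeled blue, and about 33% if it is set aside as purple. Nothing about the buyers changes between the three cases; only the label on one book does.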
The more data goes into a graph, the harder it is to interpret. But the pay-off for spending the time is all the sweeter. Happy graph-reading!
One final note: there is no doubt that this interactive map feature is a brilliant marketing move by Amazon. This is a great and fun way for readers to find interesting books.
Reference: "Amazon, U.S.A.", Gelman blog, Oct 5 2008.
It is a sad world where we only choose to talk to those we agree with, and only read books that confirm what we already know, how we already think. This polarises the field and only fuels misunderstandings and false representations of each other's ideas. We start seeing differences where none exist.
This graph seems to assume this is already the case. Although many of the books tracked here might be "election books" that only serve narrow purposes, I don't think this assumption is very well founded.
Posted by: TH | Oct 13, 2008 at 05:46 AM
Another skew in the data is the mismatch between area of a state on the map and population of that state. My home state of Rhode Island barely appears on the map, but its population outnumbers several states with much greater area: Montana, South Dakota, North Dakota, Alaska, and Wyoming, plus Vermont and Delaware, which are not geographically large.
Posted by: Jon Peltier | Oct 13, 2008 at 09:39 AM
Thanks for sharing the graphing information. Very interesting topic.
Posted by: G Horse | Oct 20, 2008 at 08:09 PM
That's some very useful information...
"It is a sad world where we only choose to talk to those we agree with, and only read books that confirm what we already know, how we already think"
And that is very very true lol
Posted by: Tony | Oct 21, 2008 at 03:11 AM