Last week, my Columbia students discussed this nice article in the New York Times called "The Most Detailed Map of Gay Marriages in America". (link)
The center of the article is this map:
I asked the students to identify the problem that this dataviz is supposed to address. Someone responded that it tells us where gay married couples are found geographically. I asked her what the answer to that question is. She said they are mostly concentrated on the coasts, with relatively few in middle America.
Then I asked what assumption, one not found on the map itself, is needed for that conclusion to be valid. If you look at the legend of the map, you'll see that the data being plotted are proportions of married couples that are same-sex, not counts of same-sex couples. Thus, when one draws that conclusion about coastal skew, one is using the knowledge that the population of the U.S. is concentrated on the coasts. (If the population in the middle were sufficiently larger than on the coasts, there could be more same-sex couples in the middle despite the lighter shades of orange.)
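To make the parenthetical concrete, here is a minimal sketch in Python. The two regions and all of their numbers are invented for illustration and do not come from the article or the map; the only point is that a count is a proportion multiplied by its denominator, so the ranking by count can differ from the ranking by proportion.

```python
# Hypothetical regions. These figures are made up for illustration only;
# they are NOT taken from the New York Times article.
regions = {
    "coastal": {"married_couples": 200_000, "same_sex_share": 0.020},
    "middle": {"married_couples": 1_000_000, "same_sex_share": 0.005},
}


def same_sex_count(region):
    """Turn a proportion back into a count by multiplying by the denominator."""
    return region["married_couples"] * region["same_sex_share"]


counts = {name: same_sex_count(r) for name, r in regions.items()}

for name, r in regions.items():
    print(f"{name}: share = {r['same_sex_share']:.1%}, count = {counts[name]:,.0f}")

# The region with the darker shade (higher share) need not have more couples.
highest_share = max(regions, key=lambda n: regions[n]["same_sex_share"])
highest_count = max(counts, key=counts.get)
print("highest share:", highest_share)
print("highest count:", highest_count)
```

In this made-up case the coastal region has four times the share, yet the middle region has more same-sex married couples because its denominator is five times larger. The map shows only the shares, so the reader has to supply the denominators from outside knowledge.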
Next, I wanted an answer to a simple question: which three-digit ZIP code (the unit of analysis on this map) has the highest proportion of same-sex married couples? Students scratched their heads. This "most detailed" map is not well equipped to answer it. The issue raised is whether the amount of detail is an obvious virtue of a dataviz.
We then touched upon two other topics that are also very important:
1) How was this data obtained and computed? The author does a great job explaining two ways of arriving at these counts: a sample survey versus administrative records (tax filings).
2) How should the data be interpreted? The author walks through a number of unexpected comparisons, all of which point to the importance of statistical controls. For example, why do same-sex couples make more money than opposite-sex couples? Why do lesbian couples make less than gay-male couples?
I highly recommend reading this article closely.
PS. At the end of the class, one student approached me to suggest that I should not have wasted class time discussing this article.
Did that one student give reasons why he/she thought it was a waste of time?
Posted by: Kenneth | 09/25/2016 at 12:08 AM
For comparison to the map, I did a funnel plot of similar data last year, trying to account for the small samples in some counties.
http://blogs.sas.com/content/jmp/2015/07/15/graph-makeover-where-same-sex-couples-live-in-the-us/
Using ZIP3 should help with the sampling, but doesn't help as much as I would expect with the shape size issue (San Francisco's 940xx is still too small to see until you zoom in). And the speckled look in Iowa comes from discontiguous ZIP3s, forcing a pattern that probably doesn't exist.
Posted by: Xan Gregg | 09/25/2016 at 07:53 PM