Popularizing public data
Mar 29, 2012
Dona Wong, whose graphics book I reviewed two years ago (link), has recently joined the New York Fed to lead an effort to visualize data. This is exciting because consumers are unlikely to learn anything from Excel spreadsheets, HTML tables, and the like, which are the typical formats of public data.
One of their efforts is visualization of mortgage delinquency data in the Tri-state and Long Island regions (link). This animation reminds me of the CDC obesity map, to which I gave a positive review in 2005 (link). This type of chart is great for revealing the evolution of a metric over time and over space. The sliding control is a very nice extra touch. This allows readers to freeze-frame the map and examine the details.
I sent Dona a few comments:
- A speed control would be nice
- Remove the word "quintiles" from the legend
- Add some cumulative measures, such as "90-day delinquency or worse"
***
Let's start with the last recommendation. Any given delinquent mortgage will move between the different states of delinquency over time, so it is useful to look at the evolution of all three levels of problems simultaneously. In particular, a reduction in 90-day delinquencies may mean that people are starting to pay off their loans, or it may mean that some of these mortgages have become foreclosures.
In Long Island, for example, the proportion of loans 90 days delinquent appeared to decrease from Jan 2010 to Dec 2011, but as the second set of charts showed, the reduction was probably because many of these loans turned into foreclosures.
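To make the case for a cumulative measure concrete, here is a minimal sketch in Python. The rates are made up for illustration (they are not the New York Fed's numbers), and the field names are my own. It also assumes that the 90-day and foreclosure categories do not overlap, so the two rates can simply be added.

```python
# Hypothetical rates (percent of loans) for one region at two dates.
# These numbers are invented for illustration only.
snapshots = {
    "Jan 2010": {"delinquent_90_day": 4.0, "foreclosure": 2.0},
    "Dec 2011": {"delinquent_90_day": 3.0, "foreclosure": 4.5},
}


def ninety_day_or_worse(rates):
    """Cumulative measure: 90-day delinquent loans plus loans in foreclosure.

    Assumes the two categories are mutually exclusive.
    """
    return rates["delinquent_90_day"] + rates["foreclosure"]


cumulative = {date: ninety_day_or_worse(r) for date, r in snapshots.items()}

for date, rates in snapshots.items():
    print(date, "90-day:", rates["delinquent_90_day"],
          "90-day or worse:", cumulative[date])
```

In this made-up example, the 90-day rate falls while the "90-day or worse" rate rises, which is the pattern a single-category map can hide.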
***
The second recommendation requires a bit of explanation, and peering behind the scenes. The legend is shown on the right (this is for Tri-State, foreclosures).
The easiest way to read the chart is to ignore the quintiles, and think of the colors as representing different ranges of delinquency rates within a county.
The quintile is a description of how the designer divides the counties into five equal-sized groups (corresponding to the five shades of colors). All the counties are sorted in increasing order of delinquency rate from the lowest to the highest. The bottom 20% of counties are classified as Quintile 1, the next 20% as Quintile 2, and so on up to Quintile 5 (the worst performing counties).
Each quintile represents a range of delinquency rates. For instance, in the above example for Tri-State foreclosures, Quintile 1 consists of counties with foreclosure rates between 0% and 1.3%. My point is that it is sufficient for readers to know the range of foreclosure rates associated with each color. Sometimes, introducing technical terms is more trouble than it's worth.
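The sort-and-split procedure described above can be sketched in a few lines of Python. The county names and rates below are hypothetical, and this is only one simple way to form quintiles (ties are broken by sort order), not necessarily how the map's designers did it.

```python
def assign_quintiles(rates):
    """Sort names by rate and split them into five equal-sized groups.

    Returns a dict mapping each name to a quintile, 1 (lowest) to 5 (highest).
    """
    ranked = sorted(rates, key=rates.get)
    n = len(ranked)
    return {name: (i * 5) // n + 1 for i, name in enumerate(ranked)}


def quintile_ranges(rates, quintiles):
    """For each quintile, find the lowest and highest rate it contains."""
    ranges = {}
    for name, q in quintiles.items():
        lo, hi = ranges.get(q, (rates[name], rates[name]))
        ranges[q] = (min(lo, rates[name]), max(hi, rates[name]))
    return ranges


# Hypothetical foreclosure rates (percent) for ten made-up counties.
county_rates = {
    "A": 0.4, "B": 1.3, "C": 1.6, "D": 2.1, "E": 2.4,
    "F": 3.0, "G": 3.3, "H": 4.1, "I": 4.8, "J": 6.2,
}

quintiles = assign_quintiles(county_rates)
ranges = quintile_ranges(county_rates, quintiles)

for q in sorted(ranges):
    print("Quintile", q, "covers rates from", ranges[q][0], "to", ranges[q][1])
```

The printed ranges are what a reader actually needs from the legend; the quintile numbers are just bookkeeping.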
***
There is another reason why I would hide the quintile information if I were the designer of this chart. It's because there are different ways to define quintiles, and it takes too much time to explain your choice.
Notice that the maps of 2007 are very light colored across most counties (see example on the right) and the later maps are much darker colored. This means that my description above is too simplistic: Quintile 1 is not really the best-performing 20% of counties - it contains the best-performing 20% of county-month-year combinations. The designer starts with a list which contains an entry for each county for each month of each year, instead of a list that contains one entry for each county.
Both ways of defining quintiles are legitimate. The resulting maps will emphasize different aspects of the data. The way Dona's team did this, the maps emphasize the general worsening of delinquency rates over this period of time. (This is why the delinquency rates rather than quintiles are more important to understanding the chart.)
Alternatively, one can choose to take each month as a separate dataset, and then divide the list of counties into quintiles. The maps would now look totally different because in this rendition, all five colors will feature in equal numbers in each freeze-frame. This view allows readers to know at any moment in time which are the best counties and which are the worst. However, it has the disadvantage that the range of delinquency rates defining each color would shift from month to month. In this version, the legend should be described in terms of quintiles rather than rates.
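The difference between the two definitions is easier to see side by side. Below is a rough Python sketch with invented rates for five counties in two months; the data, the county names, and the helper functions are all hypothetical. The first assignment pools every county-month entry into one list, and the second ranks counties within each month separately.

```python
from collections import Counter


def assign_quintiles(values):
    """Map each key to a quintile 1..5 by rank (ties broken by sort order)."""
    ranked = sorted(values, key=values.get)
    n = len(ranked)
    return {key: (i * 5) // n + 1 for i, key in enumerate(ranked)}


# Hypothetical delinquency rates (percent), keyed by (county, month).
rates = {
    ("A", "2007-01"): 0.5, ("B", "2007-01"): 0.8, ("C", "2007-01"): 1.0,
    ("D", "2007-01"): 1.2, ("E", "2007-01"): 1.5,
    ("A", "2011-12"): 2.0, ("B", "2011-12"): 3.5, ("C", "2011-12"): 4.0,
    ("D", "2011-12"): 5.0, ("E", "2011-12"): 6.5,
}

# Definition 1: one list with an entry per county per month.
pooled = assign_quintiles(rates)

# Definition 2: a separate list of counties for each month.
per_month = {}
for month in sorted({m for _, m in rates}):
    subset = {key: rate for key, rate in rates.items() if key[1] == month}
    per_month.update(assign_quintiles(subset))


def colors_by_month(assignment):
    """Count how many counties fall in each quintile (color) per month."""
    counts = {}
    for (_county, month), q in assignment.items():
        counts.setdefault(month, Counter())[q] += 1
    return counts


print("pooled:   ", colors_by_month(pooled))
print("per month:", colors_by_month(per_month))
```

With the pooled definition, the early month uses only the lighter shades and the late month only the darker ones, so the map shows the overall worsening. With the per-month definition, every shade appears once in each month, but the rate cutoffs behind each shade differ between months.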
In the end, I think the way this chart is constructed makes sense. My little suggestion is to not mention the quintiles at all and let that work in the background.
***
When reading complex charts like this, you may not realize the number of decisions that have already been made. This is a great example of how such decisions affect the appearance of the final product.