Statistical adjustment in charts
Dec 05, 2011
On the book blog, I often talk about the reasons why statisticians adjust data, and why it is necessary in order to paint a proper picture of what the data is saying. (See here or here.)
On this blog, I have frequently complained about how the "prior information" on maps is too strong - large regions dominate our perception regardless of the data. In the U.S., large but sparsely populated states attain disproportionate attention.
So, why not bring "statistical adjustment" to maps?
That's exactly what cartograms do. For example, look at the following pair of maps created by the people at Leicestershire County Council. (PDF link here)
The map on the left and the cartogram on the right plot identical data. The only difference is that each hexagon on the cartogram represents an equal number of people. The two views give very different impressions: the big dark green patch on the middle-right of the map -- representing a relatively sparse neighborhood -- is shrunk to a single dark green hexagon on the cartogram. Meanwhile, the most deprived areas (dark purple) which look relatively small on the map are expanded to quite a few hexagons.
According to the map, most of the county's residents appear to live in areas ranked in the half considered less deprived (green), and that is good news. But wait... there is a lot of purple in the cartogram!
The real piece of news is that the majority of people live in the half of the neighborhoods considered more deprived (purple) but this uncomfortable fact is well-hidden in the mostly green map on the left.
Given that the measures of "deprivation" are about people, not geographical neighborhoods, the cartogram is much closer to the real world experience... notwithstanding the obvious geographical distortion introduced by the statistical adjustment.
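To make the adjustment concrete, here is a minimal sketch of the two ways of weighting the same neighborhoods. The five neighborhoods and their numbers are invented for illustration (they are not Leicestershire data); the map effectively weights each neighborhood by land area, while the cartogram weights it by population.

```python
# Hypothetical neighbourhoods: (deprivation half, land area in km^2, population).
# All values are made up for illustration; they are NOT the Leicestershire data.
areas = [
    ("less deprived", 40.0, 1500),
    ("less deprived", 35.0, 1500),
    ("more deprived", 2.0, 1500),
    ("more deprived", 1.5, 1500),
    ("more deprived", 1.5, 1500),
]

def share(rows, weight_index, label):
    """Fraction of the total weight (area or population) carried by `label`."""
    total = sum(row[weight_index] for row in rows)
    matching = sum(row[weight_index] for row in rows if row[0] == label)
    return matching / total

# What the eye sees on the map: share of land area.
area_share = share(areas, 1, "more deprived")
# What the cartogram shows: share of people.
pop_share = share(areas, 2, "more deprived")

print(f"more deprived, by area:       {area_share:.1%}")
print(f"more deprived, by population: {pop_share:.1%}")
```

In this toy example the more deprived neighborhoods cover a small fraction of the land but hold the majority of the people, which is the same kind of reversal the pair of charts reveals.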
According to Alex L., who is part of the team producing these graphics:
LSOAs were created for the 2001 [UK] Census to disseminate the data and are generally considered to represent 'neighbourhoods'. They are created to have a broadly consistent population (approx 1500 people in 2001) and socio-economic traits.
Question: Is there any reason to show the map at all?
The map is useful for people familiar with Leicestershire's geography who want to quickly identify the deprivation levels of their own and neighbouring areas, etc.
Plus, of course, it's useful as an illustration of the point you make: that sparsely-populated areas tend to be less deprived. This would be hidden if just the cartogram was shown.
Measures of deprivation are not only about 'people' but also about infrastructure, so I think to conceal the geographic version altogether might bring in its own distortions.
Posted by: Mo | Dec 06, 2011 at 05:37 AM
There most certainly is! But first, a clarification: LSOAs (hexagons in the cartogram) do not have “an equal number of people”. They have broadly similar sizes. In the Office for National Statistics’ estimates for 2007 the mean population nationally was 1573, SD 288, range 552-8808. The data is generally a good fit for a normal distribution, but with a long tail at the upper end. To shoe-horn them into equal-sized hexagons in itself introduces a distortion.
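As a rough check on that long tail, the sketch below expresses the two extremes of the quoted range in standard deviations from the mean. It uses only the figures above; the helper name `z_score` is just an illustrative choice.

```python
# Figures quoted above (ONS 2007 population estimates for LSOAs).
mean, sd = 1573, 288
smallest, largest = 552, 8808

def z_score(value, mean, sd):
    """Number of standard deviations `value` lies from the mean."""
    return (value - mean) / sd

z_low = z_score(smallest, mean, sd)
z_high = z_score(largest, mean, sd)

print(f"smallest LSOA: {z_low:.1f} SD from the mean")
print(f"largest LSOA:  {z_high:.1f} SD from the mean")
```

The smallest LSOA sits about 3.5 SD below the mean, while the largest is roughly 25 SD above it, so a single hexagon can stand for very different numbers of people at the upper end.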
I would first ask, what is the purpose of the analysis and presentation? Presumably to improve understanding of the client population for local authority services, and probably health service provision also. In that case, the location is all important, and however superficially attractive the cartogram may be it adds little to our understanding of this. The authors of the impressive original report make this point about geographic distortion explicitly in their narrative. No-one familiar with this sort of data will be surprised that less deprived people tend to live in more sparsely populated areas. Part of being better off is the ability to get away from the neighbours.
There are other questions. Most analysis of this dataset presents the results in local and/or national deciles or quintiles of deprivation. In this case the bandings are lowest 10%, next 40%, next 40%, and highest 10%. The legends need careful reading. I also wonder at the choice of multiple colours to represent parts of a continuous range rather than using shades of a single colour, which would generally be considered best practice.
Posted by: Meic Goodyear | Dec 06, 2011 at 06:15 AM
Just a couple of points in reply to Meic:
1. LSOAs were broadly consistent at the time of the 2001 Census (although there are variations). Obviously populations change over time and the release of the 2011 Census data next year will shine more light on that. They are still more consistent than other geographies (e.g. parish) although the population consistency (or otherwise) is not always the reason we use them, rather their role as a common data container that can be used as a proxy for ‘neighbourhood’.
2. The deprivation dataset is an interesting one and doesn't always follow the pattern of urban-more deprived (housing is a great example of this). The cartogram draws our attention to the data rather than the area and makes it easier to put these messages out.
3. The deprivation data is a ranking, so from a policy perspective, it is valuable to know where our 'most' (by any measure) deprived areas are. This was a conscious decision to salami-slice the key areas for focus. The colour scheme was a divergent scheme taken from www.colorbrewer2.org. We didn't use a continuous scheme as lower-ranking areas are not necessarily 'affluent' by this measure, just 'less deprived'. The scheme allows us to focus on the purple areas in the context of the report.
Hope this clarifies things!
Posted by: Alex L | Dec 07, 2011 at 06:40 AM
Agree with Mo about map, and Meic Goodyear about color. But rather than coloring shapes it might be better to color bubbles sized according to population. This way we give importance to population instead of land mass.
Something like this: http://i.imgur.com/o0CCW.png
Posted by: Mark Ledwich | Dec 09, 2011 at 01:03 AM
Maps are indeed problematic because of prior information. If that is not information that is vital for understanding the added data then it is a bad choice.
Almost even worse is the prior information in the post before. The order of countries in the circle is not related to the data shown and makes the arrows have different lengths. Circular network graphs are a good tool for starting out and exploring (and sorting) networks, but not for presenting data in this way.
Posted by: Jörgen Abrahamsson | Dec 09, 2011 at 04:23 AM