Note: I am in the middle of a holiday and so posting will be limited.
Andrew posted a pretty chart that caught my attention. This is the sort of sophisticated chart that rewards careful reading.
Below is a guide to reading the chart:
- It is a small multiples chart with the components arranged in two dimensions (income levels, and a race-religion hybrid category). The top row is a summary of voters across all race-religion groups, broken down by income. Note that there is no corresponding summary column for voters across all incomes, broken down by race-religion.
- Source of data: 2000 poll but applied to 2008 demographic patterns. In other words, there is an underlying assumption that opinions have stayed stable within the demographic groups.
- The chart is in fact three-dimensional because each map gives us the geographical (state-by-state) breakdown.
- It is useful to figure out the smallest unit of data: in this case, it is the percentage support for federal school vouchers among voters in a given race-religion, income, and state category.
- The color scheme is such that red represents the highest support and blue the lowest, with pink and purple in the middle.
- It's almost always better to start from the aggregate (that is, the average) and then study variations along different dimensions, and this is how the chart is arranged from top to bottom.
- On the top row, the higher income groups tended to favor vouchers more than the lower income groups, with a break point around $75k; even here, the regional differences are significant, with the northeast and southwest hotter for vouchers at all income levels.
- As we move from row to row, we realize that the aggregate data hides many disparities. For example, white Catholics (second row) are more likely to support vouchers regardless of income level while white non-evangelical Protestants (fourth row) are much less likely than average to support vouchers at all income levels.
- Notice that the statistician (Andrew) has carefully defined the race-religion categories to strike a balance between collapsing subgroups that are distinct and showing so many subgroups that the patterns are clouded. That is why many race-religion subgroups are not shown; the ones shown are of special interest. Consider the white Protestants, evangelical vs. non-evangelical (third and fourth rows). Even if one fixes the race, geography and income dimensions, and fixes half of the religion dimension (Protestant), the two subgroups still sit at opposite ends of the spectrum on the voucher issue. This is why the evangelical-or-not dimension has been included.
- The white space is interesting. Here, the issue faced by the statistician is sparse data once one gets down to multi-dimensional subgroups. Andrew chose to leave those cells blank rather than plot what little data there is, which is the wise thing to do. With so few samples, it is particularly easy to draw bad conclusions.
- Because of the white space, we get additional information on the spatial distribution of the demographic subgroups. The black population (at least the voters) is predominantly found in the southeast while Hispanics are in the southwest. The subgroup with income higher than $150k is essentially all white. Admittedly, this is a very crude read because we only have two levels (below 2% of state population and above). Among the states that are colored in, we cannot differentiate between those where the subgroup is densely populated and those where it is not.
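
For readers who want to see the blanking rule concretely, here is a minimal sketch of how cells below a population-share threshold could be left white. Only the 2% figure comes from the chart as described above. The function name, the data layout, the state and group labels, and every number in the example are made up for illustration, and treating a share of exactly 2% as "shown" is my assumption. This is not Andrew's actual code.

```python
# Threshold described in the post: subgroups making up less than 2% of a
# state's population are left blank (white) on the maps.
MIN_SHARE = 0.02


def mask_sparse_cells(cells, min_share=MIN_SHARE):
    """Blank out cells whose subgroup is too small to estimate reliably.

    `cells` maps (state, group, income) -> (support, population_share).
    Returns a dict with the same keys; the value is the support figure,
    or None where the cell should be drawn as white space.
    Assumption: a share exactly equal to the threshold is still shown.
    """
    masked = {}
    for key, (support, share) in cells.items():
        masked[key] = support if share >= min_share else None
    return masked


# Hypothetical numbers, purely illustrative; not read off the chart.
example_cells = {
    ("StateA", "GroupX", "under $75k"): (0.48, 0.150),
    ("StateA", "GroupY", "under $75k"): (0.61, 0.010),  # below 2%: blank
    ("StateB", "GroupY", "under $75k"): (0.55, 0.020),  # at 2%: shown
}

masked = mask_sparse_cells(example_cells)
for key, value in masked.items():
    label = "blank" if value is None else f"{value:.0%}"
    print(key, label)
```

The point of returning `None` rather than dropping the cell is that the blank itself carries information: as noted above, where the white space falls tells us roughly where each subgroup lives.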