## Discoloring the chart to re-discover its plot

##### Apr 05, 2018

Today's chart comes from Pew Research Center, and the big question is: why all the colors?

The data show the age distributions of the followers of different religions. It's a stacked bar chart, in which the ages have been grouped into the young (under 15), the old (60 plus) and everyone else. Five religions are afforded their own bars, while "folk" religions are grouped as one, as are "other" religions. There is even a bar for the unaffiliated. "World" is presumably the aggregate of all the other bars, weighted by the size of each religious group.
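To make the "weighted by size" guess concrete, here is a minimal sketch of how such an aggregate bar could be computed. The group names and numbers are hypothetical placeholders, not Pew's figures, and the chart does not confirm that "World" was built this way.

```python
def weighted_share(shares, weights):
    """Average the per-group shares (in percent), weighting each group by its size."""
    total = sum(weights.values())
    return sum(shares[g] * weights[g] for g in shares) / total


# Hypothetical numbers for illustration only; these are NOT Pew's figures.
young_share = {"A": 40.0, "B": 20.0}  # percent under 15 in each group
population = {"A": 1.0, "B": 3.0}     # relative size of each group

world_young = weighted_share(young_share, population)
print(world_young)  # 25.0, versus 30.0 for a plain unweighted mean
```

The larger group pulls the aggregate toward its own value, which is why a "World" bar need not sit midway between the other bars.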

So far so good. But what is it that demands 9 colors and 27 shades in total? That is one shade for every data point on this chart: nine bars times three age groups.

Here is a more restrained view:

***

Let's follow the designer's various decisions. The choice of those age groups indicates that the story is really happening at the "margins": Muslims and Hindus have higher proportions of younger followers while Jews and Buddhists have higher concentrations of older followers.

Therein lies the problem. Because of their lengths, their central locations, and their tints, the middle sections of the bars are the most eye-catching: the reader is glancing at the wrong part of the chart.

So, let me fix this by re-ordering the three panels:

Is there really a need to draw those gray bars? The middle age group (a catch-all) only exists to assure readers that everyone who's supposed to be included has been included. Why plot it?

The above chart says "trust me, what isn't drawn here constitutes the remaining population, and the whole adds to 100%."

***

Another issue with these charts, exacerbated by inflexible software defaults, is the forced choice of elevating one variable above the others. In the Pew chart, the rows are ordered by decreasing proportion of the young age group, except for the "everyone" group, which is pinned to the bottom row. As a result, the green bars (old age group) are in no particular order, and their pattern is much harder to comprehend.

In the final version, I drop the requirement that bars of the same religion sit on the same row:
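In code terms, the idea is simply to sort each panel on its own variable instead of sharing one row order. A minimal sketch, using hypothetical group names and values rather than the Pew data:

```python
def panel_order(shares):
    """Return group names sorted by decreasing share, for one panel only."""
    return sorted(shares, key=shares.get, reverse=True)


# Hypothetical values for illustration only; these are NOT the Pew figures.
young = {"A": 35, "B": 30, "C": 20}  # percent under 15
old = {"A": 7, "B": 8, "C": 19}      # percent 60 plus

young_rows = panel_order(young)  # row order for the "young" panel
old_rows = panel_order(old)      # row order for the "old" panel
print(young_rows, old_rows)
```

Because the two orderings differ, something other than row position has to link a religion across the panels.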

Five colors are used. Three of them are used to cluster similar religions: Muslims and Hindus (in blue) have higher proportions of the young compared to the world average (gray) while the religions painted in green have higher proportions of the old. Christians (in orange) are unusual in that the proportions are higher than average in both young and old age groups. Everyone and unaffiliated are given separate colors.
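The clustering rule described above can be sketched as a comparison of each group against the world average. The function name, the fallback label, and the numbers are assumptions made for illustration; the final chart was not necessarily produced this way.

```python
# Color per cluster, following the description in the text.
CLUSTER_COLOR = {"younger": "blue", "older": "green", "both": "orange"}


def cluster(young, old, world_young, world_old):
    """Label a group by where it exceeds the world average."""
    if young > world_young and old > world_old:
        return "both"
    if young > world_young:
        return "younger"
    if old > world_old:
        return "older"
    return "neither"  # hypothetical fallback; the post does not name this case


# Hypothetical shares (percent) for illustration only; NOT the Pew figures.
world_young, world_old = 25, 12
label = cluster(young=30, old=8, world_young=world_young, world_old=world_old)
print(label, CLUSTER_COLOR[label])
```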

The colors here serve two purposes: connecting the two panels, and revealing the cluster structure.

Scatter plot?

Thinking about the blocks of age distribution I really question the bins chosen, especially that under 15 bin.

The study was actually a demographic survey of global census data, so I wonder, for each individual census, was the under 15 actually actively involved or was it just the parent, stating the under 15's religious affiliation?

Anywho, I would want to really penetrate the methodology before making any broad conclusions if this is an example of the analytical rigour.

Reminds me of a Stephen Few quote that I heartily agree with:

"If you’re inclined to think that a data visualization should limit itself to a single graph, it’s time to leave that constraint behind. The best solutions often require multiple graphs."

(Whether a given piece of software plots these two charts together as one entity or not, this solution is two separate charts, as far as I am concerned)

I also agree with Michael that I find the data itself questionable, considering the focus on under 15, and the dubiousness of their involvement and/or honesty in the process. I am having a hard time finding much useful to actually say from the data, whether plotted well or poorly :)

I don't think I would have set up the colors this way - the connections seem a little weak.

But the one thing I would have done differently for sure is to remove the "Everyone" bar, and instead plot that value as a reference line. I find it considerably easier to comprehend the relationship to a comparison value in that way - each bar is more directly compared to the one line that crosses them all.

I appreciate the more concise view provided by the new dual graph, with its concentrated use of color. What I think would clarify the data is an additional bit of information on family size, as the other responses are correct to question the validity of the under-15 answers. I would suggest that the choice of family size (or lack of choice) is a cultural difference, with religion playing a significant part. That is something these two graphs cannot show but might be useful to provide.
Great post, though, very useful. It's just that we data folks can never leave well enough alone and want to know the rest of the story.
