The closer you look, the more confused

A Twitter follower submitted this chart showing the shift in the ethnic mix of Texas:


If you blinked, you probably took away the wrong message. Our "prior" tells us that the proportion of Hispanics in Texas has been rising quite rapidly. So, like me, you might home in on the blue columns, which rise drastically from 32% to 68%.

Things start to fall apart.

First, you might notice that the blue label says "Non-Hispanic Whites," the exact opposite of our hypothesis. For a moment, we are confused. Could the Hispanic population in Texas have been shrinking?

Then, you might notice that the "information in our head" led us to assume that the horizontal axis represents time. On closer inspection, it does not: what's plotted from left to right are age groups. In fact, it's a kind of reversed time. The columns on the right show today's ethnic mix among people born more than 60 years ago, while the columns on the left represent younger generations.

Finally, the gray columns are redundant and distracting.

On the other hand, the designer is admirably restrained with data labels and includes icons of a baby and of a crooked man with a stick to provide some guidance; both are good ideas.


If I apply the Trifecta Checkup to this chart, the biggest issue is the misalignment between the interesting question (how the ethnic mix of Texans is changing) and the data used to explore it. The current ethnic mix within an age group reflects not only the ethnic composition at birth but also the net migration of different races and differences in their longevity. As pointed out above, the split by age group forces us into a kind of reversed-time thinking.

A simple fix involves expressing ages as birth years and using a single line instead of columns:


This version addresses neither the tendency to read the left-right axis as time nor the excessive number of age groups.
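Converting age groups to birth years is just subtraction from the year the data were collected. The sketch below assumes age groups are labeled like "25-29" or "85+", and takes the collection year as an input because the chart does not state it; `birth_year_range` and `survey_year` are hypothetical names, not anything from the original chart.

```python
import re
from typing import Optional, Tuple


def birth_year_range(age_label: str, survey_year: int) -> Tuple[Optional[int], int]:
    """Map an age-group label to an approximate (earliest, latest) birth-year range.

    Assumption: labels look like "25-29" (closed band) or "85+" (open-ended top band).
    We ignore the one-year fuzz from whether a birthday has passed in survey_year.
    For an open-ended band, the earliest birth year is unknown and returned as None.
    """
    closed = re.fullmatch(r"\s*(\d+)\s*-\s*(\d+)\s*", age_label)
    if closed:
        youngest, oldest = int(closed.group(1)), int(closed.group(2))
        # The oldest people in the band were born earliest, so the order flips.
        return survey_year - oldest, survey_year - youngest

    open_ended = re.fullmatch(r"\s*(\d+)\s*\+\s*", age_label)
    if open_ended:
        youngest = int(open_ended.group(1))
        return None, survey_year - youngest

    raise ValueError(f"unrecognized age label: {age_label!r}")
```

Because older bands map to earlier birth years, sorting the bands by birth year reverses the chart's left-to-right order, which is exactly the "reversed time" the redesign removes.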

An even better chart would put time on the horizontal axis, with multiple lines, each representing the proportion of non-Hispanic whites in a specific age group. The framing may be a political choice; I'm not sure why they chose to plot the declining proportion of non-Hispanic whites, lumping Hispanics into "all others," rather than the increasing proportion of Hispanics.
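That chart needs the same age groups measured at several points in time (for example, repeated census rounds), which the original chart does not show. As a minimal sketch under that assumption, the records can be regrouped into one chronological series per age group, one line each; the record layout and the name `series_by_age_group` are hypothetical, and shares are assumed to be fractions between 0 and 1.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (survey_year, age_group, share_non_hispanic_white)
Record = Tuple[int, str, float]


def series_by_age_group(records: Iterable[Record]) -> Dict[str, List[Tuple[int, float]]]:
    """Regroup observations into one time series per age group.

    Each returned series is a list of (year, share) pairs, sorted by year,
    ready to be drawn as one line with time on the horizontal axis.
    """
    series: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for year, age_group, share in records:
        if not 0.0 <= share <= 1.0:
            raise ValueError(f"share out of range for {age_group} in {year}: {share}")
        series[age_group].append((year, share))
    # Sort each line chronologically so it reads left to right in real time.
    return {group: sorted(points) for group, points in series.items()}
```

With matplotlib, for instance, each series could be unzipped into years and shares and passed to `plt.plot(years, shares, label=group)`, giving one labeled line per age group.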






It seems my comment got mangled there near the end. I meant to say "a decreasing proportion of Caucasians (aka white non-Hispanics)".


OK, I'll retry; feel free to delete the previous one.

Seems to me the focus is not, as you may have assumed, the increasing proportion of Hispanics, but instead the decreasing proportion of Caucasians (aka non-Hispanic Whites).


Yet my label is again confusing, as surely some Hispanics see themselves as Caucasian. Thus "Non-Hispanic Whites" versus "All Others" best characterizes the intended split.
