## Dot plots are under-valued, that's all

##### Aug 30, 2016

Bar charts are over-used and over-rated. Just casually, I found this example at US News:

Are you comparing bar widths? Or the printed data?

Here is a dot plot:

These show proportions within groups, but I'd like to know the size of the groups as well. Which of these proportions matters most to the election?

Is there a particular reason you'd prefer a dot plot to a bar chart that shows just the delta between the two candidates? It seems like it'd be cleaner to show a bar going left for Clinton's margin and going right for Trump's margin. (I'm ignoring the "neither" option given the small %.) Also, I concur with Jon that the size of the populations matters to some degree and doesn't get expressed in either chart. Is there a good way to show that?

Third, and not that it's your doing, the chart conspicuously omits other religious affiliations, Jewish and Muslim, for example. It may even muddy the waters by mixing national origin with religion (Hispanic Catholic, e.g.). For that particular group, the important characteristic may be not religion so much as national origin, given Trump's stance on immigration.
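The diverging-bar alternative suggested above is easy to prototype. The Python below is a minimal sketch, assuming only the Trump-minus-Clinton margin is plotted and the "neither" share is dropped; the function names, the text rendering and the group numbers are hypothetical placeholders, not the US News data.

```python
def margin(clinton_pct: float, trump_pct: float) -> float:
    """Signed margin in points: negative favours Clinton, positive favours Trump."""
    return trump_pct - clinton_pct


def diverging_bar(m: float, half_width: int = 20, scale: float = 2.5) -> str:
    """Draw a signed margin as '#' characters either side of a centre line '|'.

    Each '#' stands for `scale` percentage points; bars longer than
    `half_width` characters are clipped.
    """
    n = min(half_width, round(abs(m) / scale))
    left = " " * half_width
    right = " " * half_width
    if m < 0:  # Clinton margin grows to the left
        left = " " * (half_width - n) + "#" * n
    elif m > 0:  # Trump margin grows to the right
        right = "#" * n + " " * (half_width - n)
    return left + "|" + right


# Hypothetical (Clinton %, Trump %) pairs for illustration only.
groups = {"Group A": (30, 60), "Group B": (65, 25), "Group C": (48, 45)}
for name, (c, t) in groups.items():
    m = margin(c, t)
    print(f"{name:<8} {diverging_bar(m)} {m:+.0f}")
```

The trade-off of this design is visible in the code: a single signed number per row is very clean, but it throws away each candidate's absolute level of support.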

@jon_plummer. I completely agree; however, I am asking myself how I might do this. If you scaled the dots relative to the number of potential votes, you could answer the question. But it is well known that readers struggle to tell the difference between the areas of two circles (except in extreme cases). Additionally, the reader would want to mentally sum the sizes of the circles to get an overall winner across these groupings.

Some discussion on this would be great as this type of problem comes up all the time!
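If one did scale the dots, the usual pitfall is scaling the radius rather than the area. A minimal Python sketch, assuming each dot should encode a group's size by its area; `radius_for_area`, `max_radius` and the group sizes are hypothetical, chosen only for illustration.

```python
import math


def radius_for_area(value: float, max_value: float, max_radius: float = 20.0) -> float:
    """Radius that makes a circle's AREA proportional to `value`.

    Area = pi * r**2, so r must grow with the square root of the value;
    scaling r linearly would exaggerate differences quadratically.
    """
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value > 0")
    return max_radius * math.sqrt(value / max_value)


# Hypothetical group sizes (millions of potential voters), illustration only.
sizes = {"Group A": 40.0, "Group B": 10.0}
biggest = max(sizes.values())
for name, size in sizes.items():
    print(name, round(radius_for_area(size, biggest), 1))
```

A 4:1 difference in group size comes out as only a 2:1 difference in radius, which is part of why, as noted above, such circles are hard to compare by eye.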

I would just go for the good old-fashioned clustered bar chart. For the size of each group, I think just adding n=??? to the name label would work. This can even be done in Excel.

The reason for not including Muslims and Jews is that, at 1% and 2% of the population respectively, the numbers are not large enough. A typical larger survey has 3000 respondents, so that is approximately 30 Muslims and 60 Jews, not enough to estimate voting intention.
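The arithmetic behind this point can be sketched in a few lines of Python, assuming simple random sampling and the usual normal-approximation 95% margin of error with the worst case p = 0.5. Real polls use weighting, which typically widens these intervals further; the function names are hypothetical.

```python
import math


def expected_subgroup_size(total_n: int, share: float) -> float:
    """Expected respondents from a group making up `share` of the population."""
    return total_n * share


def margin_of_error(n: float, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion: z * sqrt(p(1-p)/n)."""
    return z * math.sqrt(p * (1 - p) / n)


for total in (1000, 3000):
    for label, share in (("1% group", 0.01), ("2% group", 0.02)):
        n = expected_subgroup_size(total, share)
        print(f"n_total={total}: {label} ~{n:.0f} respondents, "
              f"MoE ~ +/-{100 * margin_of_error(n):.0f} points")
```

For a 3000-person survey this gives roughly +/-18 points for 30 respondents and +/-13 points for 60, far too wide to say much about voting intention in those groups.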

@Ken, totally agree that Jewish and Muslim voters would be small proportions of the population, but, using Kaiser's DVQ model, I perceive the absence of that data to be a "Q" issue. That is, what question is this chart answering? Is it "how does religious affiliation impact voting preferences?" In that case, showing polarization even among small demographic groups provides useful information. Or is the question "how is the population going to vote?" In that case, a top-line number produced from a representative sample tells the story. Or is it something else (race, national origin, etc.)? And if this graphic came from an article titled something like "how does religion impact the presidential election," they'd be foolish to assume that a simple stratification like this provides good estimates of the religion effect.

What I perceive the original chart designer did was take some data and naively stratify it by one aspect, ignoring that in this particular election one's preference for a candidate may be driven primarily by any of many other factors. For example, I'd guess unaffiliated voters also tend to be younger voters (who skew Democratic anyway). To me, dot plot or bar plot may be irrelevant... it's overly reductive given the complexity of the question at hand.

@Adam, at some point the noise overwhelms any information. Results for a particular group that are only accurate to, say, +/-15% or worse wouldn't provide anything useful, and they would not want to have to explain that caveat. While I suggested that their survey may have had 3000 subjects, it may also have had 1000, in which case the information on the minorities would be essentially nil.

This is the problem that people designing quality surveys face if they want to gain data on minorities. They need to design sampling schemes that oversample these groups. The tradeoff is that more total subjects are required.
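One way to make this trade-off concrete is a Python sketch, assuming simple random sampling within each group and standard design weights (population share divided by sample share) to undo the oversampling in pooled estimates. The +/-5-point target, the 1%/99% split and the 1,000 "everyone else" interviews are hypothetical choices for illustration.

```python
import math


def required_group_n(moe: float, p: float = 0.5, z: float = 1.96) -> int:
    """Respondents needed within one group for a target margin of error
    (normal approximation, simple random sampling within the group)."""
    return math.ceil((z / moe) ** 2 * p * (1 - p))


def design_weights(pop_share: dict[str, float],
                   sample_n: dict[str, int]) -> dict[str, float]:
    """Weight per respondent = population share / sample share, so an
    oversampled group is scaled back to its true share in pooled estimates."""
    total = sum(sample_n.values())
    return {g: pop_share[g] / (sample_n[g] / total) for g in pop_share}


# Hypothetical design: a group that is 1% of the population is oversampled
# to reach +/-5 points, alongside 1,000 respondents from everyone else.
n_small = required_group_n(0.05)
pop = {"small group": 0.01, "everyone else": 0.99}
sample = {"small group": n_small, "everyone else": 1000}
weights = design_weights(pop, sample)
print(n_small, weights)
```

Here the small group needs 385 respondents. With purely proportional sampling that would take about 38,500 interviews in total; oversampling gets there with 1,385, at the price of unequal weights, which usually costs some precision in the pooled estimate.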

@Ken, totally agree. Having meaningful data would likely mean needing to oversample to get good representations for these small subgroups. But that's sort of my point. If this is a graphic about how religious preferences are related to voting preferences, good data is needed and that means gathering data necessary to answer the question at hand. Since the article is making the case for the decline of the influence of the Christian voter, it'd seem important to attempt to estimate the religion effect well.

I suspect the article confounds "white" and "Christian" in some odd ways, sort of conveniently ignoring that Hispanic Catholics tend to heavily favor Clinton, which is some evidence that the issue might not have been religion in the first place.

And the article text reads "While the majority of immigrants to the U.S. are Christians, the share of Christians has declined, while the share of other religions, including Islam, Hinduism and unaffiliated, has grown." So you'd think if the article wants to make that claim it ought to back it up with some data... I guess it depends on whether you want good research or simply something good enough to be called a news story. :)

I initially had trouble interpreting the dot plot because the positions of the 'Clinton' (top left) and 'Trump' (top right) labels overrode (in my mind) the use of red and blue to indicate which candidate each dot represented. So I kept interpreting the dots on the right side as representing Trump and the ones on the left as representing Clinton.

Maybe the dots just need to be bigger so their colours are more conspicuous. Better would be to also put a 'C' or 'T' inside each dot to remind the viewer who it represents.

Or a double bar chart, with the Trump and Clinton bars side-by-side on the same axis.

Rosie: I like the suggestion of replacing the dots with C/T or some kind of image.
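The "C"/"T" idea can be prototyped even without a charting library. Below is a hypothetical text rendering in Python, not how the chart above was made: the letters themselves serve as the markers on a 0-100% axis, so position and identity can no longer be confused. The function name, the axis width and the sample rows are assumptions for illustration.

```python
def letter_dot_row(label: str, clinton: float, trump: float, width: int = 51) -> str:
    """One row of a text dot plot on a 0-100% axis, using 'C' and 'T' as the
    markers so each point names its candidate. '*' marks an overlap."""
    for pct in (clinton, trump):
        if not 0 <= pct <= 100:
            raise ValueError("percentages must be between 0 and 100")

    def col(pct: float) -> int:
        # Map a percentage onto a character column 0 .. width - 1.
        return round(pct * (width - 1) / 100)

    axis = ["-"] * width
    c, t = col(clinton), col(trump)
    axis[c] = "C"
    axis[t] = "T" if t != c else "*"
    return f"{label:<10}|{''.join(axis)}|"


# Hypothetical rows, illustration only.
for name, (c, t) in {"Group A": (30, 60), "Group B": (65, 25)}.items():
    print(letter_dot_row(name, c, t))
```

In a graphical tool the same effect would come from drawing a centred text label at each point instead of, or on top of, a coloured marker.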

This might be a stupid question, but I didn't understand the original numbers. They are percentages, but they sum to more than 100%. What am I missing here?

Franklin: the numbers add up by row. There is a missing category, probably voting for third-party candidates or refusal to respond, which I have added to the new version via the vertical bar.

One of the cognitive issues with the original bar charts is that they encourage readers to read by column, seeing those bars as a kind of histogram. This exacerbates the problem discussed above: the vastly different sizes of the underlying demographic segments. In the dot plot, readers are subtly guided to read by row.
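Franklin's puzzle comes down to reading by row versus by column. A minimal Python sketch, assuming each row is one demographic group and the unreported remainder is treated as a single "other" category (third-party votes or refusals, as guessed above); the function name and the rows are hypothetical.

```python
def with_residual(clinton: float, trump: float) -> tuple[float, float, float]:
    """Complete one row (one demographic group) to 100%.

    The residual is inferred, not reported: it stands for e.g. third-party
    votes or refusals, and also absorbs rounding in the published figures.
    """
    other = 100 - clinton - trump
    if other < 0:
        raise ValueError("row exceeds 100%; check the source figures")
    return clinton, trump, other


# Hypothetical rows: each row sums to 100%, but summing a column is meaningless.
rows = {"Group A": (30, 60), "Group B": (65, 25)}
for name, (c, t) in rows.items():
    print(name, with_residual(c, t))
print("column sum (meaningless):", sum(c for c, _ in rows.values()))
```

Each row is a distribution within one group, so only row sums are meaningful; a column adds up shares of different-sized groups, which is why it can exceed 100%.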

Rosie: I like the suggestion of using C and T instead of dots to further enhance the dot plot.

Adam/Ken/others: To continue the above discussion, I'd try using a treemap to plot the underlying distribution of religions and then within each rectangle, use a (gasp) pie chart to show the candidate preference. Ultimately, it depends on what message you are trying to convey with the visual.
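A rough Python sketch of the geometry behind the treemap-plus-pies idea, assuming the simplest one-level "slice" treemap (vertical strips with area proportional to group size) rather than a squarified layout, which would give better-shaped rectangles. All names and numbers are hypothetical, and the drawing itself is left to whatever plotting tool you prefer.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    label: str
    x: float
    y: float
    w: float
    h: float


def slice_layout(sizes: dict[str, float], width: float, height: float) -> list[Rect]:
    """One-level slice treemap: vertical strips whose areas are proportional
    to each group's size."""
    total = sum(sizes.values())
    rects, x = [], 0.0
    for label, size in sizes.items():
        w = width * size / total
        rects.append(Rect(label, x, 0.0, w, height))
        x += w
    return rects


def pie_in_rect(r: Rect) -> tuple[float, float, float]:
    """Centre (cx, cy) and radius of the largest pie that fits inside r."""
    return r.x + r.w / 2, r.y + r.h / 2, min(r.w, r.h) / 2


def pie_angles(shares: dict[str, float]) -> dict[str, tuple[float, float]]:
    """Start/end angle in degrees for each wedge, proportional to its share."""
    total = sum(shares.values())
    out, start = {}, 0.0
    for label, share in shares.items():
        end = start + 360.0 * share / total
        out[label] = (start, end)
        start = end
    return out


# Hypothetical group sizes and candidate shares, illustration only.
sizes = {"Group A": 3, "Group B": 1}
for rect in slice_layout(sizes, 100, 50):
    print(rect, pie_in_rect(rect))
print(pie_angles({"Clinton": 30, "Trump": 60, "Other": 10}))
```

Group size is carried by rectangle area and candidate preference by wedge angle, so the two questions raised in the comments get separate encodings.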

