Probabilities and proportions: which one is the chart showing
May 11, 2021
The New York Times showed this chart (link):
My first read: oh my gosh, 40-50% of the unvaccinated Americans are living their normal lives - dining at restaurants, assembling with more than 10 people, going to religious gatherings.
After reading the text around this chart, I realized I had misinterpreted it.
The chart should be read by columns. Each column is a "pie chart". For example, the first column shows that half the restaurant diners are not vaccinated, a third are fully vaccinated, and the remainder are partially vaccinated. The other columns have roughly the same proportions.
The author says "The rates of vaccination among people doing these activities largely reflect the rates in the population." This line is perhaps more confusing than intended. What she's saying is that in the general population, half of us are unvaccinated, a third are fully vaccinated, and the remainder are partially vaccinated.
Here's a picture:
What this chart is saying is that the people dining out are like a random sample from all Americans. So too the other groups depicted. What Americans are choosing to do is independent of their vaccination status.
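To make the independence claim concrete, here is a minimal sketch in Python. The population shares are rounded from the description above; the 20% dining-out rate is a made-up number, since the chart does not report how many people did each activity. If the vaccination mix among diners matches the mix in the population, Bayes' rule gives the same dining-out rate for every vaccination status.

```python
# Population shares by vaccination status, rounded from the post's description
# (half unvaccinated, a third fully vaccinated, the remainder partially).
population_share = {"unvaccinated": 0.50, "fully": 0.33, "partially": 0.17}

# Shares among restaurant diners. The chart suggests these roughly match the
# population, so this sketch assumes they match exactly.
share_among_diners = dict(population_share)

# Hypothetical: fraction of all respondents who dined out (not in the chart).
p_dined = 0.20


def activity_rate_by_status(share_among_active, population_share, p_active):
    """Bayes' rule: P(activity | status) = P(status | activity) * P(activity) / P(status)."""
    return {
        status: share_among_active[status] * p_active / population_share[status]
        for status in population_share
    }


rates = activity_rate_by_status(share_among_diners, population_share, p_dined)
for status, rate in rates.items():
    print(f"{status}: {rate:.0%} dined out")
```

Under these assumptions every group comes out at the same dining-out rate, which is what "independent of vaccination status" means. If the shares among diners differed from the population shares, the rates would differ across groups.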
Unvaccinated people are no less likely to be doing all these activities than the fully vaccinated. This raises the question: are half of the people not wearing masks outdoors unvaccinated?
***
Why did I read the chart wrongly in the first place? It has to do with expectations.
Most survey charts plot probabilities not proportions. I haphazardly grabbed the following Pew Research chart as an example:
From this chart, we learn that 30% of kids 9-11 years old use TikTok compared to 11% of kids 5-8. The percentages down a column do not sum to 100%.
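The two readings come from the same table of counts, divided by different totals. Here is a rough sketch in Python using made-up survey counts (none of these numbers come from the NYT or Pew data, and the function names are mine). The first function divides by the number of people who did each activity, as in the NYT chart; the second divides by the size of each vaccination group, as in the typical survey chart.

```python
# Made-up counts: counts[status][activity] = respondents with that status
# who did that activity. Hypothetical numbers, for illustration only.
counts = {
    "unvaccinated": {"dined out": 80, "religious gathering": 60},
    "vaccinated": {"dined out": 120, "religious gathering": 40},
}
# Hypothetical size of each status group, including people who did neither.
group_size = {"unvaccinated": 400, "vaccinated": 600}


def proportions_within_activity(counts):
    """Share of each status among people who did the activity.

    This is the NYT-style reading: for each activity the shares sum to 1.
    """
    activities = list(next(iter(counts.values())))
    result = {}
    for activity in activities:
        total = sum(counts[status][activity] for status in counts)
        result[activity] = {
            status: counts[status][activity] / total for status in counts
        }
    return result


def rates_within_status(counts, group_size):
    """Share of each status group that did the activity.

    This is the typical survey reading: the rates need not sum to 1.
    """
    return {
        status: {
            activity: n / group_size[status] for activity, n in by_activity.items()
        }
        for status, by_activity in counts.items()
    }


proportions = proportions_within_activity(counts)
rates = rates_within_status(counts, group_size)
print(proportions["dined out"])  # vaccination mix among diners
print(rates["unvaccinated"])     # activity rates among the unvaccinated
```

With these made-up counts, 40% of diners are unvaccinated, yet only 20% of the unvaccinated dined out. Same counts, different base.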
Nice discussion of an important topic. Those who communicate statistical ideas should clearly state whether they are presenting conditional probabilities. There would be less confusion if the graph had a title such as "Vaccination Status for Participants in Various Activities." In other words, of the people who participated in these activities, here is a breakdown of their vaccination status.
To get the best of both worlds, we could include the proportion of people who do each activity. For example, 20% of people surveyed dined out, 15% of people surveyed gathered outside, and so forth. Then you could figure out the proportions in the population.
Posted by: Rick Wicklin | May 11, 2021 at 09:11 AM
Rick points out the biggest problem with the chart: one must first understand that these proportions are of the people who answered yes to each question. No information is given to assess the proportions of those who answered yes versus no. Presumably the data are available.
It is also interesting to note that the half vaccinated proportion is half or more of the fully vaccinated, implying a surge in vaccinations in the approximately two weeks before the survey. That says people are rapidly acting on their perceived "protected" status.
Posted by: Richard Krablin | May 12, 2021 at 10:49 AM
Nice discussion, though I'm not sure I misread the chart (I read past it to the discussion without fully analysing it, but I think I read it correctly first time, though adding the values down to 100 probably helped). Are you suggesting that the rows & columns should have been transposed? I'd have probably done this first since the labels are so much longer, and you'd have fewer columns. But can you explain "Most survey charts plot probabilities not proportions." in a little more detail pls? Thanks!
Posted by: Antro | May 15, 2021 at 10:56 PM
Antro: Thanks for raising this important point. It is not just a transpose. The calculations are different because the base of the proportions is different. In the NYT chart, the base is the subset of the population that has undertaken an activity, and then it shows the proportion of that subset who are vaccinated and unvaccinated. In the alternative - more typical - presentation, the base is the subset of the population with a specific vaccination status, and then it shows the proportion of that subset who have undertaken an activity. The first thing to notice is that 20% of the former base is not the same as 20% of the latter base since the base populations are different.
Here's a current example to illustrate the difference (numbers are made up but directionally correct): maybe 60% of those who died from Covid-19 are 65 and above but we can't just transpose and say 60% of 65 and above have died from Covid-19... that is clearly way too high a proportion.
Posted by: Kaiser | May 16, 2021 at 02:31 PM