## Avert your eyes

##### Jul 08, 2013

Reader omegatron came back with another shocking instance of a pie chart:

Here is the link to the AVERT organization in the U.K. that published the chart and several others.

For the umpteenth time, the pie chart plots proportions. All proportions are percentages but some percentages are not proportions. The data here would appear to be "rate of diagnosis" rather than proportion of diagnoses by age.

The data came from Table 3a of this CDC report (link), where they are clearly labelled "Rate". The footnote even discloses that the "Rate" is measured per 100,000 people, so the chart mislabels them as percentages.

Let's summarize. The "percentages" add up to much more than 100%, so they are clearly not proportions. They are not even percentages; they are rates per 100,000.
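A quick sanity check before reaching for a pie chart is to confirm that the numbers add up to 100%. Here is a minimal sketch of that check in Python. The function name, the tolerance, and both lists of numbers are my own illustrations, not the CDC figures.

```python
import math

def looks_like_proportions(values, total=100.0, tolerance=0.5):
    """Return True if values are non-negative and sum to roughly `total`."""
    non_negative = all(v >= 0 for v in values)
    return non_negative and math.isclose(sum(values), total, abs_tol=tolerance)

# Hypothetical numbers for illustration only (not taken from the CDC table).
rates_per_100k = [0.2, 11.4, 36.9, 35.2, 27.5, 16.1]
shares_of_total = [10.0, 25.0, 40.0, 25.0]

print(looks_like_proportions(rates_per_100k))   # sums to well over 100
print(looks_like_proportions(shares_of_total))  # sums to exactly 100
```

Passing the check is necessary but not sufficient: a set of rates could sum to 100 by coincidence and still not be shares of a whole.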

omegatron even got confused by the colors. You'd think that the slices would be arranged by age group but no! The order of the slices is by size of the pie slices, with one exception--the lime green slice of 11.4%, which I cannot explain. In practice, this means the order goes from Under 13 to 13-14 to Over 65 to 60-64 to 50-54, etc.

A smarter use of color here would be to stick to one color while varying the shade according to the rate of diagnosis. Using 13 colors for 13 age groups is distracting.
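One way to build such a single-hue scale is to fix the hue and let the lightness fall as the rate rises. The sketch below uses only Python's standard `colorsys` module. The function name, the choice of a blue hue, and the lightness range are assumptions of mine, not anything taken from the original chart.

```python
import colorsys

def shade_for_rate(rate, max_rate, hue=0.6, lightest=0.9, darkest=0.3):
    """Map a rate to a hex color of a single hue: higher rate, darker shade."""
    # Fraction of the maximum rate, clamped to the range [0, 1].
    fraction = 0.0 if max_rate <= 0 else min(max(rate / max_rate, 0.0), 1.0)
    lightness = lightest - fraction * (lightest - darkest)
    # colorsys takes hue, lightness, saturation (in that order), each in [0, 1].
    r, g, b = colorsys.hls_to_rgb(hue, lightness, 0.6)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))

# Hypothetical rates for a few age groups, for illustration only.
example_rates = {"group A": 2.0, "group B": 15.0, "group C": 30.0}
top = max(example_rates.values())
for group, rate in example_rates.items():
    print(group, shade_for_rate(rate, top))
```

With a scale like this, the color carries the same information as the bar height instead of competing with it.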

Here is the same data using a column chart:

***

As a teacher, I find it shocking that such pie charts continue to see the light of day. It's very disappointing, as I'd assume every teacher who teaches the pie chart will have pointed out the pitfalls. Why is this happening?

***

With this chart, I'm mostly baffled by the top corner of the Trifecta Checkup. What is the point of this data? If I understand the "per 100,000 population" definition, these rates are computed as the number of people diagnosed divided by the population in each age group, scaled to 100,000. So the diagnosis rate is a function of how many people in each age group are actually infected, how effective the diagnosis procedures are, and whether that effectiveness varies with age. Add to that the completeness of reporting by age group: the footnote acknowledged that the mathematical model does not account for incomplete reporting. To call a spade a spade, that means the model assumes complete reporting.

The rate of diagnosis can be low because the rate of infection is low or because the proportion of the infected who get diagnosed is low. I just can't conceive of a use for data that confound these factors.
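To make the confounding concrete, here is a small sketch with made-up numbers. It assumes the rate is simply diagnoses divided by population, scaled to 100,000, which is my reading of the footnote rather than the CDC's documented method. All the counts and names below are hypothetical.

```python
def rate_per_100k(diagnosed, population):
    """Diagnoses per 100,000 people in an age group."""
    return 100_000 * diagnosed / population

population = 1_000_000  # hypothetical size of one age group

# Scenario A: 2,000 people infected, but only half of them get diagnosed.
infected_a, share_diagnosed_a = 2_000, 0.5
# Scenario B: 1,000 people infected, and every one of them gets diagnosed.
infected_b, share_diagnosed_b = 1_000, 1.0

rate_a = rate_per_100k(infected_a * share_diagnosed_a, population)
rate_b = rate_per_100k(infected_b * share_diagnosed_b, population)

print(rate_a, rate_b)  # both scenarios yield the same diagnosis rate
```

Scenario A has twice the infections of Scenario B, yet the published rate cannot tell the two apart.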

A time series treatment would be interesting, although that addresses a different question.

### Comments

You don't even mention the problems with the categories themselves.

Anyone aged 65 would not have a place in this chart: too old for "60-64", and too young for "over 65".

bob: thanks for mentioning it. I wrote, then deleted the sentence about age categories. The other crazy category is 13-14 when all other non-edge groups are 5-year groups. This issue originates from the CDC data though.

avert.org appears now to have seen the error of their ways, and (presumably) spent some of their donors' money on creating bar charts instead. "Do it nice, or do it twice".
