## Bubbles of death 2

##### Feb 22, 2007

Here is an alternative way to present the death risk data. It's a variation of Tukey's stem-and-leaf plot. Instead of presenting the exact odds, I believe it is sufficient to generalize the data by grouping them into categories. Not much is to be gained by knowing that the odds of dying from fire and smoke are 1 in 1113, as opposed to knowing that they fall in the range 1 in 1000 to 1 in 10,000 and are comparable to those of drowning, motorcycle accidents, etc.
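As a rough sketch of the grouping idea, each "1 in N" figure can be assigned to a power-of-ten bin by taking the floor of log10(N). The three figures used here are the only ones quoted in this post and its comments; the helper name `decade_bin` is made up for illustration and is not how the original table was built.

```python
import math
from collections import defaultdict

# Odds quoted in this post and its comments, written as "1 in N".
odds = {"heart disease": 5, "cancer": 7, "fire and smoke": 1113}

def decade_bin(n):
    """Return the lower edge of the power-of-ten bin holding 1-in-n odds."""
    return 10 ** math.floor(math.log10(n))

# Group causes by bin, much like leaves hanging off a stem.
bins = defaultdict(list)
for cause, n in odds.items():
    bins[decade_bin(n)].append(cause)

for lower in sorted(bins):
    print(f"1 in {lower:,} to 1 in {lower * 10:,}: {', '.join(bins[lower])}")
```

Each printed line names both edges of the range, which avoids reading a bare "1" label as "1 in 1".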

PS. Be sure to look at Derek's chart in the comments.


Maybe it's just me, but your table seems to be lying.
It says that anyone has a one-in-one chance of dying of heart disease or cancer. Well, based on the Bubbles of Death, I can easily see that the chances are one-in-five and one-in-seven respectively. Shouldn't your attempt to clean up the original chart actually improve our understanding of the data?

I agree with Kaiser that most of the quoted probabilities are inappropriately precise. I see Omomyid's point about 1:1 being a strange probability for something that not everyone dies of, but I'd just fix that by multiplying all the categories so that they describe the outer limit and not the inner. So "1" becomes "10", "10" becomes "100", and so on until "100000", which becomes "1,000,000" or "1 million".

Alternatively, you could leave all the other number categories where they are, and just change the "1" to read "<10", or physically move the labels so that they are displayed staggered between the described cause groups instead of beside them.
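The relabelling suggested in this comment can be sketched as follows. It assumes the table's six bins are indexed by the exponents 0 through 5 (labels "1" through "100000"); the function names are hypothetical.

```python
def lower_label(exponent):
    """Label a bin by its inner (lower) limit, as in the original table."""
    return f"{10 ** exponent:,}"

def upper_label(exponent):
    """Label the same bin by its outer (upper) limit instead."""
    return f"<{10 ** (exponent + 1):,}"

# Exponents 0..5 correspond to the labels "1" through "100,000".
for k in range(6):
    print(f"{lower_label(k):>8}  becomes  {upper_label(k)}")
```

With outer-limit labels, the first row reads "<10" rather than "1", so heart disease and cancer no longer appear to be certainties.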

If the very high probability causes of death are of special interest, perhaps they could be divided into smaller bins, 1/1-1/3, 1/3-1/10, 1/10-1/30, and 1/30-1/100. Beyond that, though, I agree that there's not a lot of value in learning that the probability of death from fireworks discharge is in the 1/300k-1/million bin, instead of the broader 1/100k-1/million bin.
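A minimal sketch of the finer bins for the high-probability causes, assuming the edges 1, 3, 10, 30, 100 listed above and treating each bin as including its lower edge. The name `fine_bin` is illustrative only.

```python
import bisect

# Roughly half-decade edges for odds between 1 in 1 and 1 in 100.
edges = [1, 3, 10, 30, 100]

def fine_bin(n):
    """Return the (low, high) edges of the bin holding 1-in-n odds."""
    if not edges[0] <= n < edges[-1]:
        raise ValueError("this sketch only covers odds from 1 in 1 up to 1 in 100")
    i = bisect.bisect_right(edges, n) - 1
    return edges[i], edges[i + 1]

# Figures quoted earlier in the thread: heart disease 1 in 5, cancer 1 in 7.
for cause, n in [("heart disease", 5), ("cancer", 7)]:
    low, high = fine_bin(n)
    print(f"{cause}: 1/{low}-1/{high}")
```

Both causes land in the 1/3-1/10 bin, which is already more informative than a single 1/1-1/10 row.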

What about making this a "ranked" stem-and-leaf, similar to Subjectivity?
The vertical axis would essentially be logarithmic bins, and the horizontal axis would be rank within the bin. (I think it would be most natural to have the highest rank nearest the stem, so the "origin" would have "any cause", and mid-rank in that bin would be "heart disease".)

m4tts1m, like this?

The weakness in Kaiser's original table and the strength of Derek's example are the difference between a histogram with the label directly under the bar (a default Excel column chart) and one with labels between adjacent bars (Mike Middleton's Better Histogram).

derek, here's what I was thinking.
I think yours is cleaner than mine, but it's too close to a graph, and I think this data is too general for that. My attempt satisfies my desire for bins AND a spread, but it's not as clean as I had originally envisioned. Oh well.
Though one advantage of the clumpiness is in seeing which causes are essentially equal.

I don't like m4tt's 2D chart, it's hard to make sense of the data this way, IMHO. Derek's chart is nice, though the logarithmic scale is always tricky. But it makes more sense to me than having a different order of magnitude on each line. I wonder if there's an inherently better way to show numbers that are orders of magnitude apart, and that does not require figuring out the numbers. Maybe the original "junk" chart wasn't so bad after all ...

And on an unrelated note, I wanted to point you guys to another website: Pictures of Numbers. Mike over there has some great examples of junk charts, with very thorough analyses of what's wrong. This is not to distract from this fine site, of course, but it's a great complement.

m4tt, yes I see what you mean now; I failed to read the part about bins plus position in bin. I don't think "rank" is the word you meant, though: rank is taken care of in Kaiser's original table, 1, 2, 3... I think you mean relative logarithmic value within bin.
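One possible reading of "relative logarithmic value within bin" is the fractional part of log10(N) for odds of 1 in N: the integer part picks the bin and the fraction gives the horizontal position inside it. This is an interpretation, not necessarily what either chart used, and `position_in_bin` is a hypothetical name.

```python
import math

def position_in_bin(n):
    """Split 1-in-n odds into (bin exponent, fractional position in [0, 1))."""
    log_n = math.log10(n)
    exponent = math.floor(log_n)
    return exponent, log_n - exponent

# Figures quoted in the post and comments.
for cause, n in [("heart disease", 5), ("cancer", 7), ("fire and smoke", 1113)]:
    exponent, frac = position_in_bin(n)
    print(f"{cause}: bin 10^{exponent}, position {frac:.2f}")
```

Under this reading, fire and smoke sits near the start of its bin, while cancer sits close to the far end of the first one.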

I like the style of your table/graph, it's a genuine halfway house between my ranked dot plot and Kaiser's table, and it addresses what I see as a significant weakness of these ranked dot plots that span large logarithmic intervals, which is their tendency to occupy a large rectangular space with their diagonal pattern, without making good use of most of it. I wonder if I could recast my Solar System objects dot plot in that form?

I agree with your table. My eyes are focused on that motorcycle accidents part. I like riding in the sun on my motorbike a lot. My folks told me to be a little bit more cautious on my riding because they know that I go fast. They've consulted an Oakland motorcycle accident lawyer in case I'd have an accident on the road. I'm supposed to say that there's a slim chance of me being in an accident, but I guess it's better to be safe than sorry.

Even though only 1 out of 10 people die from vehicle-related accidents, we should take note that there are bigger chances that this may happen. It can happen to you whether you are healthy, or sick with a deadly disease.

The numbers prove that despite the policies, there are just people who are negligent and stubborn enough to ignore road rules. Take for instance those who drink and drive. I can see that there is truth in your table.
