
Listed below are links to weblogs that reference Convenience charting:

Comments

derek

I just noticed a really bizarre feature of the Washington Post graph: a little pair of break lines below "75" on the "percentage survival" scale, below which was a "0" next to the "childs age" scale.

I wonder what the graph's designer thought he was achieving by that? I suppose he had vaguely remembered that you're supposed to start a scale from zero, but hadn't quite thought it through...

meep

The other problem is if the kid dies before the other kids are born... so a family may have had 12 kids over time because they kept losing them. Looking at huge early age mortality, this seems likely to me. Little Mary died, so Ma and Pa try for another. =That= one died as a baby, so they go for it again.

Mike Anderson

Other than the differing scales, poor curve labels, weird shading, and impossibly vague reference, these are perfectly good--if idealized--survival curves. More sophisticated versions of these charts are the norm in many medical journals, as well as engineering journals, where they're called reliability functions. The transformed chart is nice, BUT NOBODY USES THAT KIND OF CHART.

Confidence intervals are nice, and the charter could have started off with ONE curve plus confidence interval to let the reader "calibrate his eye." But the story of these charts is how the survival function changes with number of children, and confidence bands would have cluttered a static chart to the point of unreadability.

Definitely NOT a junk or convenience chart, just an interesting chart poorly executed.

Jon Peltier

Derek: I find the break to help me know without thinking about the numbers that the axis doesn't really start at zero. So I don't think it's a defect.

Mike: I agree. As a former engineer, I'm familiar with survival curves. The reworked chart does nothing for me. I don't think it's depicting standard errors or confidence intervals, it just shows the distribution in a different way, a way I find harder to internalize quickly.

What's lacking in the original charts is the analysis. Meep took a good first cut at it ("a family may have had 12 kids over time because they kept losing them"). Likewise, maternal mortality was relatively high in the first month or so post partum, and the more kids she had, the more worn out a mother was, so the less likely she would be to survive yet another delivery. In addition, both mothers and fathers would be older for each subsequent birth, and the survival curves would be expected to be lower for older subjects.

Jon Peltier

Continuing the analysis of why the curves change as they do: If a kid had more siblings, it would enjoy a lower proportion of a parent's care. The parents would also be older and maybe less capable of providing adequate care.

derek

By the way, Kaiser, I'm liking these "multiple one-dimensional scatter graphs" more and more as I use them and see them used; I produced a labelled one at work yesterday that really clarified the discussion we were having.

Is there a catchy name for them? If there isn't I think we should coin one and spread it around. How about "dash graph"?

Steven Citron-Pousty

I have to agree with Mike: I found the original graphs easier to understand than the new graphs. The lines do an effective job of showing me decreasing survival over time, and the four lines on the same chart help show the relative differences. I am not saying they are great charts, but I think they show the information better than the new set...

derek

Jon, I know it's boring of me to say "Tufte" all the time, but I'm with Tufte on this one: if the fact that the scale doesn't go down to the origin needs emphasising, then don't let the vertical and horizontal scales touch. This designer went to the trouble of taking the vertical scale all the way down until it touched the horizontal scale, and then added an artificial gap, when a natural gap would have done the same job.

Actually, I see on closer examination that he also went to the trouble of making the horizontal scale extend past the zero mark just so that it could touch the vertical!

Robert

Survivorship curves are very common, but graphs of age-specific death probabilities are not at all rare--it just depends on what the researcher is focusing on. I've read the original PNAS paper; the WashPost figure on age-specific child survival is for sibship size, but there is also a figure in the paper for birth order.

Kaiser

I'll just add that we should feel for people like Dustin (and I'm sure many other readers) who do not have engineering or statistics training. Concepts of survival and censoring mean nothing to them.

On confidence levels, that would depend on the sample sizes, which in turn determine how wide those bands would be. I'm suggesting that the bands for 8-11 and 12+ would be very wide, given what I think is the low incidence of such families, but I could be wrong.
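To make the link between sample size and band width concrete, here is a minimal sketch using the normal approximation for a single proportion. The survival proportion and the sample sizes are hypothetical, chosen only for illustration, and are not taken from the study; the fitted curves in the paper would have their own model-based intervals.

```python
import math

def band_half_width(p, n, z=1.96):
    """Approximate 95% half-width for a proportion p estimated from n
    observations, using the normal approximation sqrt(p * (1 - p) / n)."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical survival proportion and sample sizes (assumptions, not study data).
p_survive = 0.8
for n in (100, 1000, 10000):
    print(n, round(band_half_width(p_survive, n), 4))
```

Under this approximation the half-width shrinks with the square root of the sample size, so a group with 100 times as many children has a band about 10 times narrower.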

One subtle thing that I changed was the data series on the horizontal axis (family size instead of child's age).

meep

Wait a sec -- and that mortality of mother by number of kids in family.... might it also be that women who have 12+ kids are =older= when they give birth to the last one? Even back in ye olden days women could be having children into their 40s (not likely, but it's possible).

I don't see that they're controlling for factors that would bias this data, such as having more children because of a factor such as having a farm to run (as opposed to a less labor-intensive store), or because for some reason the children keep dying at a young age and thus you go for more kids. And yes, people in ye olden days could control having or not having kids in the obvious manner.

Robert

Kaiser:

Mean children ever born in this population is 8.04, so the bands for 8-11 and 12+ aren't as small as you might think.

Meep:

The kid survivorship estimates were from Cox regressions with controls that included whether the parents were alive 5 years after the birth of the last kid, sibship size, and birth order. And this population is one known to be without much parity-specific birth control.

Robert

Kaiser:

I thought someone else had already mentioned this, but in re-reading the comments I see that it wasn't specifically pointed out. In your original post, you said that the original WashPost figure had mis-labeled the child survivorship vertical axis with percentages rather than probabilities. Actually, survivorship curves are for the proportion surviving. (These are, of course, equivalent to the cumulative probability of surviving to a particular age.)
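A small sketch of that equivalence, using a made-up cohort with no censoring (the counts are hypothetical, not from the PNAS paper): the proportion still alive at each age matches the running product of the interval-by-interval conditional survival probabilities.

```python
# Hypothetical cohort (assumption for illustration): 1000 births and the
# number of deaths observed in each successive age interval.
births = 1000
deaths_by_interval = [100, 50, 30, 20]

def proportion_surviving(births, deaths):
    """Proportion of the original cohort still alive after each interval."""
    alive = births
    curve = []
    for d in deaths:
        alive -= d
        curve.append(alive / births)
    return curve

def cumulative_survival_probability(births, deaths):
    """Running product of conditional probabilities of surviving each
    interval, given survival to the start of that interval."""
    alive = births
    prob = 1.0
    curve = []
    for d in deaths:
        prob *= (alive - d) / alive
        alive -= d
        curve.append(prob)
    return curve

print(proportion_surviving(births, deaths_by_interval))
print(cumulative_survival_probability(births, deaths_by_interval))
```

The two lists agree up to floating-point rounding, which is the sense in which a percentage-surviving axis and a probability axis describe the same curve.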

Kaiser

Wow, we are getting technical here.

Robert: especially since the survival curves are not empirical but fitted, the vertical axis represents the "ideal" probability of survival. Hiding in there is the frequentist view of probabilities as the limiting case of proportions.

Robert: thanks for pointing out the mean family size; I didn't note that this data was from the 19th century and from a Salt Lake City-based database.

I've re-read the article and noticed that the way they described the curve was something like "probability of a child dying by age 18" rather than "probability of a child surviving through age 18". This is one of those subjective things about graphing. Some of us will prefer the former concept; others the latter.


Robert

Kaiser wrote: "Wow, we are getting technical here."

Yes, and that's actually a remarkable thing. The study was pretty technical but the graphics in the WashPost, flawed though they are, make the topic quite accessible.

Jon Peltier

Derek -

"[I]f the fact that the scale doesn't go down to the origin needs emphasising, then don't let the vertical and horizontal scales touch."

Good point, but then you'd have to add a vertical line along the axis, so it was clear that it didn't extend all the way down, thus adding chart-ink. (That's a bit tongue-in-cheek: Unlike Tufte, I don't mind a few extra black pixels if it helps clarify, as does an interrupted axis, or one that doesn't touch the other.)


Marketing analytics and data visualization expert. Author and Speaker. Currently at Vimeo and NYU. See my full bio.
