Relevance, to you or me: a response to Cairo

Alberto Cairo discussed a graphic by the New York Times on the slowing growth of Medicare spending (link).

[Figure: Medicarespend_combined]

The chart on the top is the one that was published. It depicts the quite dramatic flattening of the growth in average spending over the last few years, the average being the total spend divided by the number of Medicare recipients. The other point of the story is that the decline is unexpected, in the literal sense that the Congressional Budget Office planners did not project its magnitude. (The planners did lower their projections over time, so they did get the direction right.)

Meanwhile, Cairo asked for a chart of total spend, and Kevin Quealy obliged with the chart shown at the bottom. It shows almost straight-line growth.

Cairo's point is that the average does not give the full picture, and we should aim to "show all the relevant data".


I want to follow that line of thinking further.

My first reaction is that Cairo did not say "show all the data"; he said "show the relevant data". That is a crucial difference. For complex social problems like Medicare, and in general for "Big Data", it is not wise to show all the data. Pick out the data of interest, and focus on those.

A second reaction. How can "relevance" be defined? Doesn't it depend on what the question is? Doesn't it depend on the interests and persuasion of the chart designer (or reader)? One of the key messages I wish to impart in my book Numbersense (link) is that reasonable people using uncontroversial statistical methods to analyze the same dataset can come to different, even opposite, conclusions. 

Statistical analysis is concerned with figuring out what is relevant and what isn't. This is no different from Nate Silver's choice of signal versus noise. Noise is not just what is bad but also what is irrelevant.

In practice, you present what is relevant to your story. Someone else will do the same. The particular parts of the data that support each story may be different. The two sides have to engage each other, and debate which story has a greater chance of being close to the truth. If the "truth" can be verified in the future, the debate is more easily settled.

Unfortunately, there is no universal standard of relevance.


Going back to the NYT story. The chart on total Medicare spending is not as useful as it may seem. This is because an aggregate metric like this for a social phenomenon is influenced by a multitude of factors. Clearly, population growth is a notable factor here. When they use the word "real", I don't know if this means actualized (as opposed to projected), or "in real terms" (that is, inflation adjusted). If not the latter, the value of money would be another factor affecting our interpretation of the lines.

Without some reference levels for population and value of money, it is hard to interpret whether the straight-line growth implies higher or lower spending intensity. For the second chart, I suggest plotting the growth in the number of Medicare recipients. I believe one of the goals of the Affordable Care Act is to reduce the ranks of the uninsured, so a direct depiction of this result would be interesting.

The average spend can be thought of as population-adjusted. It is a more interpretable number -- but as Cairo pointed out, it is also narrow in scope. This is a tradeoff inherent in all of statistics. To grow understanding, we narrow the scope; but as we focus, we lose the big picture. So, we compile a set of focal points to paint a fuller picture.
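
To make the population adjustment concrete, here is a minimal sketch in Python. The numbers are made up for illustration and are not taken from the NYT charts or the CBO; the function names are also my own. It shows how a total that grows in a nearly straight line can coexist with a much flatter average once the number of recipients is also growing, since total spend is just recipients times average spend.

```python
# Hypothetical figures for illustration only; these are NOT actual Medicare data.
years = [2010, 2011, 2012, 2013]
recipients_millions = [47.0, 48.5, 50.0, 51.5]        # assumed number of recipients
total_spend_billions = [520.0, 540.0, 560.0, 580.0]   # assumed total, straight-line growth


def average_spend(total_billions, recipients_mil):
    """Per-recipient spend in dollars: total spend divided by number of recipients."""
    return (total_billions * 1e9) / (recipients_mil * 1e6)


def pct_growth(series):
    """Year-over-year percentage growth for a list of values."""
    return [100.0 * (later - earlier) / earlier
            for earlier, later in zip(series, series[1:])]


averages = [average_spend(t, r)
            for t, r in zip(total_spend_billions, recipients_millions)]

total_growth = pct_growth(total_spend_billions)
recipient_growth = pct_growth(recipients_millions)
average_growth = pct_growth(averages)

# Print the three growth rates side by side for each pair of consecutive years.
for year, g_total, g_recip, g_avg in zip(years[1:], total_growth,
                                         recipient_growth, average_growth):
    print(f"{year}: total {g_total:.2f}%, recipients {g_recip:.2f}%, "
          f"average {g_avg:.2f}%")
```

With these assumed inputs, most of the growth in the total comes from the growing number of recipients, which is why the total chart alone cannot tell us whether spending intensity is rising or falling.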





Jon Peltier

One more graph would make the analysis more complete: Total Medicare Recipients.
