Chapter 1 of Numbersense (link) uses the example of the U.S. News ranking of law schools to explore the national pastime of ranking almost anything. Since there is no objective standard for the "correct" ranking, it is pointless to complain about "arbitrary" weighting and so on. Every replacement has its own assumptions.
A more productive path forward is to understand how the composite ranking is created, and shine a light on the underlying assumptions.
***
The New York Times recently published an article entitled "What's the Matter with Eastern Kentucky?" (link). The problem with Eastern Kentucky, as the reporter saw it, is that those counties rank at the bottom of their list. Here is their ranking methodology:
The team at The Upshot, a Times news and data-analysis venture, compiled six basic metrics to give a picture of the quality and longevity of life in each county of the nation: educational attainment, household income, jobless rate, disability rate, life expectancy and obesity rate. Weighting each equally, six counties in eastern Kentucky’s coal country (Breathitt, Clay, Jackson, Lee, Leslie and Magoffin) rank among the bottom 10.
There is a companion blog at The Upshot, giving more context, and a county-level map of the ranking (link). Here are the relevant sentences.
The Upshot came to this conclusion by looking at six data points for each county in the United States: education (percentage of residents with at least a bachelor’s degree), median household income, unemployment rate, disability rate, life expectancy and obesity. We then averaged each county’s relative rank in these categories to create an overall ranking.
(We tried to include other factors, including income mobility and measures of environmental quality, but we were not able to find data sets covering all counties in the United States.)
We used disability — the percentage of the population collecting federal disability benefits but not also collecting Social Security retirement benefits — as a proxy for the number of working-age people who don’t have jobs but are not counted as unemployed.
How should we read this article?
***
What is this a ranking of? What is the research question? The answer is "how hard it is to live in specific counties". Right away, we know any answer is subjective, even if data is proffered.
Look out for the relative weights. The authors tell us it's equally weighted. "Equal weighting" implies fairness but frequently hides the inequity. Are those six factors equally important? Are there strong correlations among some of those factors?
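To make the weighting question concrete, here is a minimal sketch of an average-of-ranks composite. Everything in it is hypothetical: the four counties, their numbers, the three metrics and the alternative weights are made up for illustration, and this is not The Upshot's code or data. Income and education are deliberately set up to rank the counties identically, so under "equal" weights that shared dimension effectively counts twice.

```python
# Hypothetical data for four made-up counties (illustration only).
counties = {
    "A": {"income": 70000, "education": 35.0, "unemployment": 8.0},
    "B": {"income": 60000, "education": 28.0, "unemployment": 9.0},
    "C": {"income": 50000, "education": 20.0, "unemployment": 4.0},
    "D": {"income": 40000, "education": 15.0, "unemployment": 5.0},
}

# Direction of each metric: True means a higher value is better.
higher_is_better = {"income": True, "education": True, "unemployment": False}


def rank_metric(metric):
    """Rank counties on one metric, 1 = best. Assumes no tied values."""
    ordered = sorted(
        counties,
        key=lambda name: counties[name][metric],
        reverse=higher_is_better[metric],
    )
    return {name: position + 1 for position, name in enumerate(ordered)}


ranks = {metric: rank_metric(metric) for metric in higher_is_better}


def composite_order(weights):
    """Order counties by the weighted average of their ranks, best first."""
    score = {
        name: sum(weights[m] * ranks[m][name] for m in weights)
        for name in counties
    }
    return sorted(counties, key=score.get)


equal = {"income": 1 / 3, "education": 1 / 3, "unemployment": 1 / 3}
# Alternative assumption: income and education jointly get 40% of the weight.
alternative = {"income": 0.2, "education": 0.2, "unemployment": 0.6}

order_equal = composite_order(equal)
order_alternative = composite_order(alternative)
print(order_equal)        # ['A', 'C', 'B', 'D']
print(order_alternative)  # ['C', 'A', 'D', 'B']
```

In this toy setup, the "best" county changes from A to C once the two correlated metrics are no longer counted at full weight each. Neither ordering is the correct one; each simply reflects a different assumption about what matters.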
The blog post discloses that each of the six metrics is first converted to ranks before being averaged. This means we need to worry about how much each metric varies from county to county. Take obesity rate, for example. Here is a map of obesity at the county level published by the CDC, based on a model estimate (link).
The people who made this map placed the counties into five groups. The middle groups are narrowly defined, for example, 29.2% to 30.8%. Any analyst who converts the county-level obesity rates to ranks creates over 3,000 gradations of obesity rate. Said differently, on the rank scale the worst county scores over 3,000 times worse than the best county, even though the underlying rates differ by far less. In the case of obesity, the medical community would consider most of these counties unhealthy, so many of those fine gradations separate counties that are in much the same condition.
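A small sketch shows what the rank conversion does to the size of the gaps. The four obesity rates below are invented for illustration and are not taken from the CDC map; the county labels are placeholders.

```python
# Hypothetical obesity rates (percent) for four made-up counties.
rates = {"P": 29.2, "Q": 29.3, "R": 29.4, "S": 40.0}

# Rank from lowest (best) to highest (worst) rate. Assumes no ties.
ordered = sorted(rates, key=rates.get)
ranks = {name: position + 1 for position, name in enumerate(ordered)}

# For each pair of neighbors in the ordering, compare the gap in the
# underlying rate with the gap in rank.
steps = []
for lower, higher in zip(ordered, ordered[1:]):
    rate_gap = round(rates[higher] - rates[lower], 1)
    rank_gap = ranks[higher] - ranks[lower]
    steps.append((lower, higher, rate_gap, rank_gap))

for lower, higher, rate_gap, rank_gap in steps:
    print(f"{lower} -> {higher}: rate gap {rate_gap} points, rank gap {rank_gap}")
```

With these numbers, a difference of 0.1 percentage points and a difference of 10.6 percentage points each count as exactly one rank step. Ranks keep the order but throw away the magnitudes, which is why a metric that barely varies across counties can still spread them over the entire rank scale.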
This is an example of how too much granularity can hurt you, a core insight of statistics that may seem counterintuitive.
***
Ultimately, it's for you to decide whether you believe this ranking makes sense or not. I'm not here to dismiss it because, as I said in Numbersense (link), you can replace this methodology with something else, but the new method will also have its own assumptions.