Sloppy statistics
Aug 23, 2008
As hinted in the previous post, there are rare situations in which pie charts are acceptable; typically, these charts must show proportions that add up to 100%. If column charts (or line charts) are used instead, readers who aren't careful may assume incorrectly that the columns add up to the whole.
Pie charts show distributions. How should one state the key message of the following pie chart?
I. Type A is the majority.
II. The most frequent type is Type A.
III. Type A is a minority.
IV. Every type other than A, taken together, forms the majority.
I would pick statement II, followed by statement I. Statement I is the only false statement of the four if one uses a strict definition of "majority" (more than half). If one goes by the spirit rather than the letter of the law, statement I does pick up the key message, albeit imprecisely. Statement III is true but particularly misleading in the context of this pie chart, for every type is a minority type if we define "minority" as less than half. Statement IV is a tortuous way to define a "majority" where there is none.
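To make the four readings concrete, here is a minimal Python sketch. The shares are made-up numbers for illustration, not the data behind the chart, and the function name `check_statements` is my own. It uses the strict definitions: "majority" means more than half, "minority" means less than half.

```python
def check_statements(shares, group):
    """Evaluate statements I-IV for one group, given a dict of shares."""
    total = sum(shares.values())
    own = shares[group]
    others = total - own
    return {
        # I. The group is the majority (strictly more than half).
        "I": own * 2 > total,
        # II. The group is the most frequent type (larger than each other type).
        "II": all(own > v for name, v in shares.items() if name != group),
        # III. The group is a minority (strictly less than half).
        "III": own * 2 < total,
        # IV. All other types combined form the majority.
        "IV": others * 2 > total,
    }


# Hypothetical shares (in percent); NOT the Census projections.
hypothetical_shares = {"A": 46, "B": 30, "C": 15, "D": 9}

for name in hypothetical_shares:
    print(name, check_statements(hypothetical_shares, name))
```

With these made-up shares, statements III and IV come out true for every type, not just for A, which is why they say so little. Only statement II singles out A.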
Neither III nor IV points to a key feature of the data. It seems ridiculous to even include them. Let's reveal the underlying data.
Last week, a story coursed through the mainstream media, relating to the above projections published by the Census Bureau. (Projections were created for 2050 but mention was made of the fact that the largest racial group would account for less than half the population by 2042.) Here were some of the headlines:
"2042 to see a white minority" (New York Post, 8/14/2008) -- III
"Minorities fixed to become new majority" (Daily Vidette, Illinois State University, 8/20/2008) -- IV
"US set for dramatic change as white America becomes minority by 2042" (Guardian, 8/15/2008) -- III
"...minorities collectively will make up the majority of people in America by 2042..." (Detroit Free Press, 8/21/2008) -- IV
As I said, statement III is strictly speaking true, but by 2042 every race is projected to be a minority. Statement IV is just odd: of course, if one adds up enough "minority" types, one will eventually reach a majority.
Not all is lost, however. The following headlines painted a more vivid image:
"Whites to lose majority status in US by 2042" (Wall Street Journal, 8/14/2008)
"White Americans no longer a majority by 2042" (Associated Press, 8/13/2008)
Elsewhere, a Boston Globe column makes an important observation: that Hispanic whites should probably be grouped with whites rather than with Hispanics. Technically, the columnist argues, Hispanic is not a race. From his point of view, the pie chart looks like this:
The Boston article, I really don't get it... there are 22 million *Spanish* in the US? How can it change 25 pct of the US racial breakdown? People are increasingly "white"? Does the guy realize how the views he puts in Franklin's mouth are complete historical nonsense if he considers a bit of non-US history? And then one day races do not exist and that will be heaven, what?
The guy's supposedly the only conservative at the Boston Globe; maybe that is why he tries to out-BS his audience.
Posted by: nicolas | Aug 24, 2008 at 03:02 AM
The sloppy approach to statistics that leads to pie charts extends to sloppy analysis of those pie charts.
Posted by: Jon Peltier | Aug 24, 2008 at 09:44 AM
I think your criticism would have been moot if you had bothered to mirror what these articles do, which is compare the projected situation in 2050 with today. Currently, the US population is made up of about 80% non-Hispanic whites, so all the other races/ethnicities only make up a total of 20%. That is an absolute majority, and a large one at that.
The projection says that by 2050, whites will no longer have that absolute majority, and that is a big change. That will have huge implications for the kind of politics made in this country, the culture, language (think the discussion about making English the official language was stupid? You ain't seen nothing yet!), etc. The US is clearly dominated by white Anglo-Saxons right now, and the changes that are bound to happen will not go over quietly, and there will be a lot of resistance.
Of course, these overall statistics never describe every place the same way. The city where I live is about 35% black and slightly less than 50% non-Hispanic white right now. A cut from 4 out of 5 to 2 out of 5 white people across the country will mean huge changes, making white folks a rather small minority in many places.
So I completely disagree with your analysis. The definition of a majority is unambiguous, and there are terms for what you describe (absolute and relative majority, plurality vs. simple majority, etc.). The pie chart here works perfectly well by showing the relative sizes of different groups that make up the entire population. The rest is a comparison with where we are right now, and the realization that that will have consequences beyond the angles on a pie chart.
Posted by: Robert Kosara | Aug 24, 2008 at 10:42 AM
Robert: I'm not arguing the point that there will be drastic and discomforting changes. I'm saying that the statement "whites to lose majority status by 2042" is much more responsible than the statement "2042 to see a white minority". Both statements are true but not equal. This is a common theme in statistics: there are many true statements that paint a partial and thus misleading picture of the underlying data. Simpson's paradox is one famous example.
Using this type of logic, we are led to conclude that the U.S. is a minority sports power since it won less than 50% of the Olympic medals, and that the collection of countries not called U.S. is the majority power. Completely nuts, in my book.
Posted by: Kaiser | Aug 24, 2008 at 04:50 PM
How about:
V. Type A represents a plurality?
Less tortured--more correct!
Posted by: Tom Webster | Aug 25, 2008 at 08:20 AM
I haven't looked at the referenced Census press release or data, but I'll say a couple of things anyway.
One point is alluded to in the mention of a Boston Globe column: that Hispanic isn't a race. The current standard on data collection of race and ethnicity is to collect *two* separate pieces of data: Ethnicity (Hispanic / non-Hispanic) and Race (a pick from a short list). Some versions of Race identification use a "multi" category, while others do not. Generally, even those that do use multi don't track the components of that multi-race. So, someone can be Hispanic *plus* something else, or non-Hispanic plus something else, but there wouldn't be tracking of, say, Asian plus something else.
There's also a self-identification issue. I've seen data for twins where one twin was coded "black" and the other twin was "white". I'm guessing both are really "multi". The identification sometimes changes from year to year, too, for various reasons.
As in this analysis, there is the question of how to combine data on Ethnicity and Race. Often, when the data are combined, Hispanic trumps Race, so the race categories are effectively a disaggregation of non-Hispanic. If it's not combined into one data point, then it's typically an attempt to (as was mentioned above) identify the white/non-white split.
I can't say I agree with attempts to focus on the Hispanic/non-Hispanic split and disaggregations any more than I'd agree with a focus on a White/non-White split.
I'll also note that even here, where even the anti-pie-chart folks think a pie chart might have something to say, pie charts are misleading. While they could be used to effectively demonstrate the reduced percentage of whites in the country, they don't show that this is primarily because the projected pie is bigger -- categories other than non-Hispanic whites are projected to grow faster in the coming years.
Posted by: DQKennard | Aug 26, 2008 at 03:43 PM