Pick-and-choose
Apr 08, 2008
Gelman pointed to this Brendan Nyhan post dissecting David Sirota's chart purportedly showing a "race chasm" in the Democratic primaries. The left chart is Sirota's original and the right is Nyhan's revision.
Please see Nyhan for the political interpretation. Here, I want to note a number of improvements Brendan made to the chart:
- Sirota plotted the ranks of the percent of black population, which is misleading. Nyhan plotted the actual percentages on his horizontal axis.
- Sirota connected the dots, which highlighted the noise (ups and downs) in the data. Nyhan fitted a linear model (he also tried non-linear versions).
- Sirota plotted Obama's overall margin of win/loss. Nyhan plotted his margin among white voters only, which more directly addressed the issue.
- Nyhan exposed the excluded states in a footnote. Sirota didn't. For this chart, this piece of information is very important since so many states were excluded.
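To make the first two points concrete, here is a minimal Python sketch. The numbers are made up purely for illustration and are not the primary data; `ranks` and `fit_line` are hypothetical helper names, and the ordinary least-squares fit is only assumed to resemble the linear model Nyhan used. It shows how ranking spaces the states evenly on the horizontal axis regardless of the actual gaps, and how a fitted line summarizes the trend without tracing every up and down.

```python
# Hypothetical numbers for illustration only; NOT the actual primary data.
black_pct = [1.0, 2.0, 3.0, 15.0, 30.0]        # percent black population
white_margin = [10.0, 8.0, 9.0, -5.0, -20.0]   # margin among white voters


def ranks(values):
    """Return the 1-based rank of each value (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Ranks are evenly spaced, so the jump from 3% to 15% looks the same
# as the step from 1% to 2%.
rank_x = ranks(black_pct)

# Fitting against the actual percentages keeps the real spacing.
slope, intercept = fit_line(black_pct, white_margin)
print(rank_x, round(slope, 2), round(intercept, 2))
```

Plotting `rank_x` instead of `black_pct` would hide how far apart the states really are, which is the distortion described in the first bullet.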
Nyhan walked us through multiple charts he used to explore the data. Much of the time was spent picking and choosing states to include or exclude. We learnt that Sirota excluded states with large Hispanic populations, which Nyhan disagreed with; Nyhan wanted to exclude Florida, which Sirota decided against, even though Sirota excluded Michigan, to which Nyhan consented; Nyhan also wanted to exclude the caucus states; and so on...
Judging from the charts, this picking and choosing appears not to have changed the outcome in this case. In general, one should exercise great care in such decisions because one might end up seeing what one wants to see.
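One simple way to check whether inclusion decisions drive a result is to refit the line with each state left out in turn and see whether the slope changes much. The sketch below assumes an ordinary least-squares slope and uses made-up numbers with placeholder state labels; `fit_slope` and `leave_one_out_slopes` are hypothetical names, and this is not the procedure either Sirota or Nyhan followed.

```python
# Hypothetical numbers and placeholder labels; NOT the actual primary data.
states = ["A", "B", "C", "D", "E"]
black_pct = [1.0, 2.0, 3.0, 15.0, 30.0]        # percent black population
white_margin = [10.0, 8.0, 9.0, -5.0, -20.0]   # margin among white voters


def fit_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def leave_one_out_slopes(labels, x, y):
    """Refit the slope once per label, with that label's point excluded."""
    slopes = {}
    for i, label in enumerate(labels):
        x_kept = x[:i] + x[i + 1:]
        y_kept = y[:i] + y[i + 1:]
        slopes[label] = fit_slope(x_kept, y_kept)
    return slopes


full_slope = fit_slope(black_pct, white_margin)
for label, s in leave_one_out_slopes(states, black_pct, white_margin).items():
    # A slope that flips sign or moves a lot when one state is dropped
    # signals that the conclusion depends on that inclusion decision.
    print(f"without {label}: slope {s:.2f} (all states: {full_slope:.2f})")
```

If the slope keeps its sign and rough size across these refits, the choice of which single state to exclude is unlikely to be what produces the pattern.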
Missing from the post is the following chart, which I think points out something more telling than the negative correlation between Obama's margin with white voters and the proportion of black population.
No relation to this post, but here is a great chart: http://pixdaus.com/single.php?id=37189&frm=krsn
Posted by: V | Apr 09, 2008 at 02:10 AM
I would argue that Sirota's chart is ineffective as well. Although it is better than Gelman's chart, Sirota's chart lacks proper labeling and isn't reader-friendly. For the most part, we're making charts to help the average person understand a topic, not chart junkies. And then there is the orange line. The reader shouldn't have to guess what it means. By the way, I don't know what it means. Maybe that reflects poorly on me. Sirota uses the percentage of the black population for the horizontal axis, but labels it in an obscure way. Why not just label it with a percentage sign? Same thing with the vertical axis. A lot of people will not take the time to dissect a chart if it requires a lot of work to interpret. We're making charts to be read and understood, after all. I would label the vertical axis as percentage of the vote and keep the win/lose line running horizontally.
Another fault is that Sirota totally disregards the black vote. I would propose color-coding the votes for white and black people, so two dots for each state. That will better show the "race chasm." Plotting only white voters does not give us any insight into the black vote. For all the reader knows, the black voters voted the same way as white voters.
To further help the readability of the chart, I would create a grid in five percent intervals to help the reader interpret the value of each dot. I would run a .5 pt gray line for the 10 percent intervals and a .25 pt gray line for the five percent intervals in between. Using gray lines will cut down on visual noise. For the major axis lines I would use a 1 pt black line. This will dramatically improve the readability of the chart. Also, the only states I would disregard are Michigan and Florida. They were the only states to screw up their primaries.
Posted by: GJ | Apr 10, 2008 at 02:43 PM
Sorry, I transposed the two names. I was referring to Nyhan's chart, not Sirota's.
Posted by: GJ | Apr 10, 2008 at 02:46 PM