In a recent post on the dataviz blog, I featured a project by FiveThirtyEight on the impact of potential state-specific abortion bans. For my comments on the visualization itself, please see that post. This post discusses the analytics behind the visualization.
The data journalists measure the impact on women seeking abortions along two dimensions: "distance" and "congestion". Distance is how far a woman must travel to reach the nearest abortion clinic, and congestion is represented by a proxy: the service capacity of those clinics. (To be precise, the authors said they counted "the number of reproductive-age women that each clinic serves, which serves as a rough stand-in for appointment availability.")
I'd rather they call a spade a spade. I don't think much is lost by calling the second metric "capacity". Congestion results from matching supply with demand, whereas they only have data on supply.
***
The smoothness and niceness of the dataviz hide some of its underlying assumptions (and thus biases). As consumers of dataviz, we must recognize them.
Look at the following pair of map excerpts. Both feature Arizona and its neighbors. The left panel shows the current situation, while the right panel shows the projected situation if Arizona were to ban abortions. I have highlighted Pima county, which has an abortion clinic.
In this example, the county went from the lower left corner of the color legend to the upper right. Women who have easy access to a clinic today would have to travel much further (266 miles) to find one in the future. In addition, that substitute clinic in New Mexico has a much lower capacity than the one in Pima county.
It appears that they are plotting the absolute levels of distance and capacity. How about plotting the relative changes instead? How much farther would women have to travel compared with today? What is the difference in capacity between the current clinic and the substitute clinic?
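To make the suggestion concrete, here is a minimal sketch of the two relative measures for a single county. The `Access` record and `relative_change` function are hypothetical names, and the numbers in the example are made up rather than taken from FiveThirtyEight's data.

```python
from dataclasses import dataclass


@dataclass
class Access:
    """A county's access to its assigned clinic (hypothetical record)."""
    miles: float      # travel distance to the clinic
    capacity: float   # capacity proxy for that clinic


def relative_change(now: Access, post_ban: Access) -> dict:
    """Compare post-ban access with current access."""
    return {
        # Extra miles rather than a ratio: a county with its own clinic
        # has a current distance near zero, which makes a ratio blow up.
        "extra_miles": post_ban.miles - now.miles,
        # Fractional change in capacity; negative means less capacity.
        "capacity_change": (post_ban.capacity - now.capacity) / now.capacity,
    }


print(relative_change(Access(10, 100), Access(260, 40)))
# {'extra_miles': 250, 'capacity_change': -0.6}
```

A map built on these two numbers would color each county by how much worse (or better) things get, rather than by where it ends up.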
***
Clicking around, I dwelled on Coconino county, which showed something odd.
Notice that the color for Coconino county went from dark green (current) to lighter green (post-ban). It appears as if Coconino residents would benefit from an abortion ban. How could that be?
Digging into this reveals assumptions that the dataviz designer must make when producing such a chart. In the left panel, Clark county is rated as low congestion (i.e. high capacity) while Maricopa county is rated as medium capacity. Both counties are rated as low distance from Coconino county. And yet, the algo assumes that women in Coconino prefer the facility in Maricopa.
We don't have detailed data on every woman seeking service at these clinics. And if we did, we would find that they don't all go to the same clinic. The designer is aiming for the average behavior here. Someone else might decide to connect Coconino to Clark county in the current state. Each designer would justify their algo by arguing for their assumptions.
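To see how two designers could reach different answers from the same inputs, here is a minimal sketch of two plausible assignment rules. The rule names, the 300-mile threshold, and the distances and capacity ratings are all hypothetical; neither function is meant to reproduce FiveThirtyEight's algorithm.

```python
from typing import NamedTuple


class Clinic(NamedTuple):
    name: str
    miles: float   # travel distance from the county (made-up value)
    capacity: int  # 1 = low, 2 = medium, 3 = high (made-up scale)


def nearest(options):
    """Designer A: women go to the closest clinic."""
    return min(options, key=lambda c: c.miles)


def best_capacity_within(options, max_miles):
    """Designer B: among clinics within max_miles, pick the one with the most
    capacity (ties go to the closer clinic); fall back to the nearest clinic."""
    reachable = [c for c in options if c.miles <= max_miles]
    if not reachable:
        return nearest(options)
    return max(reachable, key=lambda c: (c.capacity, -c.miles))


coconino_options = [Clinic("Maricopa", 150, 2), Clinic("Clark", 250, 3)]
print(nearest(coconino_options).name)                    # Maricopa
print(best_capacity_within(coconino_options, 300).name)  # Clark
```

Both rules are defensible, and each encodes a different guess about what women actually do.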
In the case of FiveThirtyEight, there could be a number of reasons to prefer Maricopa: state loyalty; insurers' preference for in-state providers; a drive to Maricopa that is nicer, safer, and/or faster than the drive to Clark county; capacity in Maricopa that may already be sufficient; differences in the services the two clinics offer; and so on.
It would be interesting to learn what proportion of counties have the same predicted outcome as Coconino, i.e. their women would appear to benefit from a ban, at least according to the two criteria being considered.
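Computing that proportion requires a definition of "benefit" on two criteria at once. A minimal sketch follows, assuming one possible definition: post-ban access is no worse on distance or capacity and strictly better on at least one. The function names and the four toy counties are hypothetical and the values are made up.

```python
def benefits(before, after):
    """before/after are (miles, capacity) pairs for one county. Assumed
    definition: no worse on either criterion and strictly better on one."""
    (d0, c0), (d1, c1) = before, after
    no_worse = d1 <= d0 and c1 >= c0
    better = d1 < d0 or c1 > c0
    return no_worse and better


def share_benefiting(counties):
    """Fraction of counties whose post-ban access 'benefits' by the rule above.
    counties maps a name to ((miles, capacity) now, (miles, capacity) post-ban)."""
    if not counties:
        return 0.0
    return sum(benefits(now, post) for now, post in counties.values()) / len(counties)


toy = {  # hypothetical counties and made-up values
    "A": ((150, 2), (150, 3)),  # more capacity, same distance: benefits
    "B": ((0, 3), (260, 1)),    # farther and less capacity: worse off
    "C": ((40, 2), (40, 2)),    # unchanged
    "D": ((90, 1), (80, 1)),    # slightly closer: benefits
}
print(share_benefiting(toy))  # 0.5
```

A county that gains on one criterion but loses on the other counts as neither better nor worse here; a different definition would give a different share.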
***
Comparing the pre- and post-ban maps of Arizona reveals another assumption. Notice that the colors of the surrounding states do not change at all.
This highlights what I pointed out before. "Congestion" is not congestion. It is "capacity". If it were congestion, then an entire county's worth of women from Coconino, now expected to show up at Clark county's clinic, would have caused more congestion in Clark.
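The distinction can be sketched in a few lines: a congestion measure divides the demand assigned to a clinic by its capacity, so it moves when women are reassigned, while capacity alone does not. The `clinic_load` function and all counts are hypothetical, and Maricopa county's own women are left out for brevity.

```python
from collections import defaultdict


def clinic_load(assignment, women, capacity):
    """Return women assigned to each open clinic divided by its capacity.

    assignment: county -> clinic; women: county -> women of reproductive age;
    capacity: open clinic -> capacity (all toy units).
    """
    demand = defaultdict(float)
    for county, clinic in assignment.items():
        if clinic not in capacity:
            raise ValueError(f"{county} is assigned to closed clinic {clinic}")
        demand[clinic] += women[county]
    return {clinic: demand[clinic] / capacity[clinic] for clinic in capacity}


women = {"Coconino": 30, "Clark": 500}  # made-up counts

# Current: Coconino women go to the Maricopa clinic.
before = clinic_load({"Coconino": "Maricopa", "Clark": "Clark"},
                     women, {"Maricopa": 15, "Clark": 10})
# Post-ban: the Arizona clinic closes and Coconino is reassigned to Clark.
after = clinic_load({"Coconino": "Clark", "Clark": "Clark"},
                    women, {"Clark": 10})
print(before)  # {'Maricopa': 2.0, 'Clark': 50.0}
print(after)   # {'Clark': 53.0}
```

Clark's capacity is 10 in both scenarios, but its load rises once Coconino's women arrive. A map colored by capacity would show no change in Nevada; a map colored by load would.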
***
In short, to make this data visualization, the designer must make assumptions about people's behavioral choices. There is an underlying optimization model that decides what women would do depending on where they live and which states have bans. Without such a model, there would be so many possible scenarios that the graphic would become too complex.
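One generic form such a model could take is a capacitated assignment: send each county's women to one open clinic so that total travel (women × miles) is as small as possible without any clinic exceeding its capacity. The sketch below brute-forces that on a toy problem. The function name, counties A and B, clinics X and Y, and all numbers are hypothetical, and this is not the model FiveThirtyEight used.

```python
from itertools import product


def optimal_assignment(women, capacity, miles):
    """Try every county-to-clinic assignment; keep the cheapest feasible one.

    women: county -> women to place; capacity: clinic -> max women served;
    miles: county -> {clinic: distance}. Returns (total woman-miles, assignment)
    or None if no assignment respects every capacity.
    """
    counties, clinics = list(women), list(capacity)
    best = None
    for choice in product(clinics, repeat=len(counties)):
        load = dict.fromkeys(clinics, 0)
        for county, clinic in zip(counties, choice):
            load[clinic] += women[county]
        if any(load[c] > capacity[c] for c in clinics):
            continue  # this assignment overloads at least one clinic
        cost = sum(women[k] * miles[k][c] for k, c in zip(counties, choice))
        if best is None or cost < best[0]:
            best = (cost, dict(zip(counties, choice)))
    return best


women = {"A": 30, "B": 50}
capacity = {"X": 60, "Y": 100}
miles = {"A": {"X": 10, "Y": 40}, "B": {"X": 20, "Y": 30}}
print(optimal_assignment(women, capacity, miles))  # (1800, {'A': 'X', 'B': 'Y'})
```

Without the capacity limit, both counties would pick the nearer clinic X; the limit pushes county B to Y. Brute force grows exponentially with the number of counties, so at the scale of a national map one would use an integer-programming solver instead.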