In China, 2019 is the Year of the Pig. Half of the world's pigs live in China. This graphic is inspired by this BouncyMaps project, which generated the following cartogram.
Tim Harford tweeted about a nice project visualizing the world's distribution of population, and wondered why he likes it so much.
That's the question we'd love to answer on this blog! Charts make us emotional - some we love, some we hate. We like to think that designers can control those emotions, via design choices.
I happen to like the "Population Mountains" project as well. It fits nicely into a geography class.
1. Chart Form
The key feature is the adoption of a 3D column chart form, instead of the more conventional choropleth or dot density. The use of columns is particularly effective here because it is natural - cities do tend to expand vertically upward as ever more people cram into the same amount of surface area.
Imagine the same chart form is used to plot the number of swimming pools per square meter. It just doesn't make the same impact.
2. Color Scale
The designer also made judicious choices on the color scale. The discrete, 5-color scheme is a clear winner over the more conventional, continuous color scale. This must have been a deliberate choice, since most software defaults to a continuous color scale for continuous data (population density per square meter).
Also, notice that the color intervals in the 5-color scale are not set uniformly, because a power law is in effect - the dense areas are orders of magnitude denser than the sparsely populated areas, and most locations are low-density.
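The idea behind the non-uniform intervals can be sketched quickly. One common way to bin power-law data is to space the breaks logarithmically rather than linearly; the densities below are made up for illustration, not taken from the project:

```python
import numpy as np

# Population density spans orders of magnitude, so uniform breaks would
# dump almost every location into the lowest bin. Log-spaced breaks give
# each order of magnitude its own color. (Densities are hypothetical.)
densities = np.array([5, 40, 300, 2500, 60000])  # people per sq km

# 6 break points define 5 bins: [1, 10, 100, 1000, 10000, 100000]
breaks = np.logspace(np.log10(1), np.log10(100000), num=6)

bins = np.digitize(densities, breaks)
print(breaks)
print(bins)  # each sample location lands in a different color bin
```

With uniform breaks over the same range, the first four densities would all share the bottom bin - which is exactly the flattening the designer avoided.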
These decisions have a strong influence on the perception of the information: it affects the heights of the peaks, the contrasts between the highs and lows, etc. It also injects a degree of subjectivity into the data visualization exercise that some find offensive.
3. Background

The background map is stripped of unnecessary details so that the attention is focused on these "population mountains". No unnecessary labels, roads, relief, etc. This demonstrates an acute awareness of foreground/background issues.
4. Insights on the "shape" of the data
The article makes the following comment:
What stands out is each city’s form, a unique mountain that might be like the steep peaks of lower Manhattan or the sprawling hills of suburban Atlanta. When I first saw a city in 3D, I had a feel for its population size that I had never experienced before.
I'd strike out population size and replace it with population density. In theory, the total volume of the columns within any given surface area gives you the "population size," but given the fluctuating heights of these columns, and the different surface areas (sprawls) of different cities, it is an Olympian task to estimate the volumes of the population mountains!
The more salient features of these mountains, most easily felt by readers, are the heights of the peak columns, the sprawl of the cities, and the general form of the mass of columns. The volume of the mountain is one of the tougher things to see. Similarly, the taller 3D columns hide what's behind them, and you'd need to spin and rotate the map to really get a good feel.
Here is the contrast between Paris and London, with comparable population sizes. You can see that the population in Paris (and by extension, France) is much more concentrated than in the U.K. This difference is a surprise to me.
Some of the other mountains, especially those in India and China, look a bit odd to me, which leads me to wonder about the source of the data. This project has an excellent set of footnotes that not only point to the source of the data but also discuss its limitations, including the possibility of inaccuracies in places like India and China.
Check out Population Mountains!
I very much enjoyed reading The Chronicle's article on "education deserts" in the U.S., defined as places where there are no public colleges within reach of potential students.
In particular, the data visualization deployed to illustrate the story is superb. For example, this map shows 1,500 colleges and their "catchment areas" defined as places within 60 minutes' drive.
It does a great job walking through the logic of the analysis (even if the logic may not totally convince - more below). The areas not within reach of these 1,500 colleges are labeled "deserts". They then take Census data and look at the adult population in those deserts:
This leads to an analysis of the racial composition of the people living in these "deserts". We now arrive at the only chart in the sequence that disappoints. It is a pair of pie charts:
The color scheme makes it hard to pair up the pie slices. The focus of the chart should be on the over or under representation of races in education deserts relative to the U.S. average. The challenge of this dataset is the coexistence of one large number, and many small numbers.
Here is one solution:
The Chronicle made a commendable effort to describe this social issue. But the analysis has a lot of built-in assumptions. Readers should look at the following list and see if they agree with the assumptions:
At the end of the piece, the author creates a "story time" moment. Story time is when you are served a bunch of data or analyses, and then when you are about to doze off, the analyst calls story time, and starts making conclusions that stray from the data just served!
Story time starts with the following sentence: "What would it take to make sure that distance doesn’t prevent students from obtaining a college degree? "
The analysis provided has nowhere shown that distance has prevented students from obtaining a college degree. We haven't seen anything that says that people living in the "education deserts" have fewer college degrees. We don't know that distance is the reason why people in those areas don't go to college (if true) - what about poverty? We don't know if 60 minutes is the hurdle that causes people not to go to college (if true). We know the number of adults living in those neighborhoods but not the number of potential students.
The data only showed two things: 1) which areas of the country are not within 60 minutes' driving of the subset of public colleges under consideration, 2) the number of adults living in those Census blocks.
So we have a case where the analysis is incomplete but the visualization of the analysis is superb. So in our Trifecta analysis, this chart poses a nice question and has nice graphics but the use of data can be improved. (Type QV)
The animation grabs your attention. I'm not convinced by the right side of the color scale, in which the white comes after the red. I'd want the white in the middle, then the yellow, and finally the red.
In order to understand this map and the other map in the article, the reader has to bring a lot of domain knowledge. This visualization isn't easy to decipher for a layperson.
Here I put the two animations side by side:
The area being depicted is the same. One map shows "ground deformation" while the other shows "subsidence". Are they the same? What's the connection between the two concepts (if any)? On a further look, one notices that the time window for the two charts differ: the right map is clearly labeled 1995 to 2003 but there is no corresponding label on the left map. To find the time window of the left map, the reader must inspect the little graph on the top right (1996 to 2000).
This means the time window of the left map is a subset of the time window of the right map. The left map shows a sinusoidal curve that moves up and down rhythmically as the ground shifts. How should I interpret the right map? The periodicity is no longer there despite this map illustrating a longer time window. The scale on the right map is twice the magnitude of the left map. Maybe on average the ground level is collapsing? If that were true, shouldn't the sinusoidal curve drift downward over time?
I also wonder how this curve is related to the map it accompanies. The curve looks like a model - perfect oscillations of a fixed period and amplitude. But one supposes the amount of fluctuation should vary by location, based on geographical features and human activities.
The author of the article points to both natural and human impacts on the ground level. Humans affect this by water usage and also by management policies dictated by law. It would be very helpful to have a map that sheds light on the causes of the movements.
The Thai cave rescue was a great story with a happy ending. It's also one that lends itself to visualization. A good visualization can explain the rescue operation more efficiently than mere words.
A good visual should bring out the most salient features of the story, such as:
In terms of what made the rescue challenging, some of the following are pertinent:
There were many attempts at visualizing the Thai cave rescue operation. The best ones I saw were: BBC (here, here), The New York Times (here), South China Morning Post (here) and Straits Times (here). It turns out each of these efforts focuses on some of the aspects above, and you have to look at all of them to get the full picture.
BBC's coverage began with a top-down view of the route of the rescue, which seems to be the most popular view adopted by news organizations. This is easily understood because of the standard map aesthetic.
The BBC map is missing a smaller map of Thailand to place this in a geographical context.
While this map provides basic information, it doesn't address many of the elements that make the Thai cave rescue story compelling. In particular, human beings are missing from this visualization. The focus is on the actions ("diving", "standing"). This perspective also does not address the water level, the key underlying environmental factor.
Another popular perspective is the sideway cross-section. The Straits Times has one:
The excerpt of the infographic presents a nice collection of data that show the effort of the rescue. The sideway cross-section shows the distance and the up-and-down nature of the journey, the level of flooding along the route, plus a bit about the headroom available at different points. Most of these diagrams bring out the "horizontal" distance but somehow ignore the "vertical" distance. One possibility is that the real trajectory is curvy - but if we can straighten out the horizontal, we should be able to straighten out the vertical too.
The NYT article gives a more detailed view of the same perspective, with annotations that describe key moments along the rescue route.
If, like me, you like to place humans into this picture, then you have to go back to the Straits Times, where they have an expanded version of the sideway cross-section.
This is probably my favorite single visualization of the rescue operation.
There are better cartoons of the specific diving actions, though. For example, the BBC has this visual that shows the particularly narrow part of the route, corresponding to the circular inset in the Straits Times version above.
NYT also has a set of cartoons. Here's one:
There is one perspective that curiously has been underserved in all of the visualizations - this is the first-person perspective. Imagine the rescuer (or the kids) navigating the rescue route. It's a cross-section from the front, not from the side.
Various publications try to address this by augmenting the top-down route view with sporadic cross-sectional diagrams. Recall the first map we showed from the BBC. On the right column are little annotations of this type (here):
I picked out this part of the map because it shows that the little human figure serves two potentially conflicting purposes. In the bottom diagram, the figurine shows that there is limited headroom in this part of the cave, plus the actual position of the figurine on the ledge conveys information about where the kids were. However, on the top cross-section, the location of the figure conveys no information; the only purpose of the human figure is to show how tall the cave is at that site.
The South China Morning Post (here - site appears to be down when I wrote this) has this wonderful animation of how the shape of the headroom changed as they navigated the route. Please visit their page to see the full animation. Here are two screenshots:
This little clip adds a lot to the story! It'd be even better if the horizontal timeline at the bottom is replaced by the top-down route map.
Thank you all the various dataviz teams for these great efforts.
The U.S. was primarily an agrarian economy in 1997, if you believe your eyes.
Here is a poorly-scaled bubble map:
New Yorkers have all become Citibikers, if you believe what you see.
Last week, I saw a nice dot map embedded inside this New York Times Graphics feature on the destruction of the Syrian city of Raqqa.
Before I conclude that the destruction was broadly felt, I'd like to check the scale on the map to make sure it doesn't have the problem seen above. What is helpful here is the scale provided on the map itself.
That line segment representing a quarter mile fits about 15 dots side by side. Then, I found out that a Manhattan avenue (longer) block is roughly a quarter mile. That means the map places about 15 buildings to an avenue block. In my experience, that sounds about right: I'd imagine 15-20 buildings per block.
So I'm convinced that the designer chose an appropriate scale to display the data. It is actually true that the entire city of Raqqa was pretty much annihilated by U.S. bombs.
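The sanity check behind that conclusion is simple arithmetic, and can be written out explicitly. The only inputs are the numbers from the text: a quarter mile (1,320 feet) spanning roughly 15 dots:

```python
# Back-of-envelope check of the Raqqa map scale, using figures from the
# text: the quarter-mile scale bar fits about 15 dots side by side.
QUARTER_MILE_FT = 1320          # a quarter mile in feet
DOTS_PER_SCALE_BAR = 15         # estimated from the map's scale bar

ft_per_dot = QUARTER_MILE_FT / DOTS_PER_SCALE_BAR
print(ft_per_dot)  # 88.0 feet of frontage per dot
```

At roughly 88 feet per dot, each dot plausibly stands for a single building, which squares with the estimate of 15-20 buildings per avenue block.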
Here is how the Economist sees it - geographically speaking.
In the Trifecta Checkup analysis, one of the questions to ask is "What does the visual say?" - and whether that answer addresses the question being asked.
The question is how much the problem of human waste in SF has grown from 2011 to 2017.
What does the visual say?
The number of complaints about human waste has increased from 2011 to 2014 to 2017.
The areas where there are complaints about human waste expanded.
The worst areas are around downtown, and that has not changed during this period of time.
Now, what does the visual not say?
Let's make a list:
In other words, the set of maps provides almost no information about the excrement problem in San Francisco.
After you finish working, go back and ask what the visual is saying about the question you're trying to address!
As a reference, I found this map of the population density in San Francisco (link):
Let's describe what's going on here.
The map plots cities (N = 2,562) in the U.S. Each city is represented by a bubble. The color of the bubble ranges from purple to green, encoding the percentile ranking based on the amount of credit card debt that was paid down by consumers. Purple represents 1st percentile, the lowest amount of paydown while green represents 99th percentile, the highest amount of paydown.
The bubble size is encoding exactly the same data, apparently in a coarser gradation. The more purple the color, the smaller the bubble. The more green the color, the larger the bubble.
The design decisions are baffling.
Purple is more noticeable than green, yet it signifies the less important cities, those with the smaller paydowns.
With over 2,500 bubbles crowding onto the map, over-plotting is inevitable. The purple bubbles are printed last, dominating the attention but those are the least important cities (1st percentile). The green bubbles, despite being larger, lie underneath the smaller, purple bubbles.
What might be the message of this chart? Our best guess is: the map explores the regional variation in the paydown rate of credit card debt.
The analyst provides all the data beneath the map.
From this table, we learn that the ranking is not based on total amount of debt paydown, but the amount of paydown per household in each city (last column). That makes sense.
Shouldn't it be ranked by the paydown rate instead of the per-household number? Dividing the "Total Credit Card Paydown by City" by "Total Credit Card Debt Q1 2018" should yield the paydown rate. Surprise! This formula yields a column entirely consisting of 4.16%.
What does this mean? They applied the national paydown rate of 4.16% to every one of 2,562 cities in the country. If they had plotted the paydown rate, every city would attain the same color. To create "variability," they plotted the per-household debt paydown amount. Said differently, the color scale encodes not credit card paydown as asserted but amount of credit card debt per household by city.
Here is a scatter plot of the credit card amount against the paydown amount.
A perfect alignment!
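The giveaway is easy to reproduce: if every city's "paydown" is just its debt multiplied by one national rate, then the ratio column is constant no matter how the debt varies. A minimal sketch, with made-up city figures (only the 4.16% rate comes from the original table):

```python
# If the analyst applied a single national paydown rate to every city,
# paydown / debt is constant everywhere. City debt figures below are
# hypothetical; 0.0416 is the national rate cited in the text.
NATIONAL_RATE = 0.0416

debt = {
    "Springfield": 120_000_000,
    "Riverton": 45_000_000,
    "Lakeside": 300_000_000,
}
paydown = {city: amt * NATIONAL_RATE for city, amt in debt.items()}

ratios = {city: paydown[city] / debt[city] for city in debt}
print(ratios)  # every city shows the same 4.16% - a dead straight line
```

A scatter of paydown against debt built this way is a perfect line through the origin, which is exactly the "perfect alignment" seen above - the map's apparent variation is in debt per household, not in paydown behavior.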
This credit card debt paydown map is an example of a QDV chart, in which there isn't a clear question, there is almost no data, and the visual contains several flaws. (See our Trifecta checkup guide.) We are presented 2,562 ways of saying the same thing: 4.16%.
P.S. [6/22/2018] Added scatter plot, and cleaned up some language.
The two variables plotted are the wealth of each province (measured by GDP per capita) and the level of Internet penetration. The designer made the following choices:
If we apply the self-sufficiency test (i.e. by removing the printed data from the chart), it's immediately clear that the visual elements convey zero information about Internet penetration. This is a serious problem for a chart about the "digital silkroad"!
If those two variables are chosen, it would seem appropriate to convey to readers the correlation between the two variables. The following sketch is focused on surfacing the correlation.
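Before sketching such a chart, it is worth quantifying the correlation itself. A minimal check, using hypothetical provincial figures (the real data live in the original graphic):

```python
import numpy as np

# Hypothetical provincial data to illustrate measuring the correlation
# between wealth and Internet penetration. Values are made up.
gdp_per_capita = np.array([30, 45, 52, 70, 95, 120])        # thousand yuan
penetration = np.array([0.38, 0.45, 0.50, 0.55, 0.68, 0.75])  # share online

r = np.corrcoef(gdp_per_capita, penetration)[0, 1]
print(round(r, 2))  # a strongly positive r for this illustrative data
```

A correlation this strong, if it holds in the real data, is precisely what the original graphic fails to show, since its visual elements ignore the penetration variable.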
(Click on the image to see it in full.) Here is the top of the graphic:
The individual maps are not strictly necessary. Just placing provincial names onto the grid is enough, because regional pattern isn't salient here.
The Internet penetration data were grouped into five categories as well, putting them on an equal footing with GDP per capita.
Stolen drugs are a problem at federal VA hospitals, according to the following map.
VISUAL - Pass by a whisker. The chosen visual form of a map is standard for geographic data although the map snatches story-telling from our claws, just as people steal drugs from hospitals. Looking at the map, it's not clear what the message is. Is there one?
The 50 states plus DC are placed into five groups based on the reported number of incidents of theft. From the headline, it appears that the journalist conducted a Top 2 Box analysis, defining "significant" losses of drugs as 300 incidents or more. The visual design ignores this definition of "significance."
DATA - Fail. The map tells us where the VA hospitals are located. It doesn't tell us which states are most egregious in drug theft. To learn that, we need to compute a rate, based on the number of hospitals or patients or the amount of spending on drugs.
Looking more carefully, it's not clear they used a Top 2 Box analysis either. I counted seven states with the highest level of theft, followed by another seven states with the second highest level of theft. So the cutoff of twelve states awkwardly lands in between the two levels.
QUESTION - Fail. Drug theft from hospitals is an interesting topic but the graphic does not provide a good answer to the question.
Even if we don't have data to compute a rate, the chart is a bit better if proportions are emphasized, rather than counts.
The proportions are most easily understood by thinking of the whole as four quarters. The first group is just over a quarter; the second group is exactly a quarter. The third group plus the first group roughly make up a half. The fourth and fifth groups together almost fill out a quarter.
In the original map, we are told about at least 400 incidents of theft in Texas but given no context to interpret this statistic. What proportion of the total thefts occur in Texas?