This chart is in the Sept/Oct edition of Harvard Magazine:
Pretty standard fare. It is even Tufte-esque in its sparing use of axes, labels, and other non-data-ink.
Does it bug you how much work you need to do to understand this chart?
Here is the junkchart version:
In the accompanying article, the journalist declared that student progress on NAEP tests came to a virtual standstill, and this version highlights the drop in performance between the two periods, as measured by these "gain scores."
The clarity is achieved through proximity as well as slopes.
The column chart form has a number of deficiencies when used to illustrate this data. It requires too many colors. It induces involuntary head-shaking.
Most unforgivably, it leaves us with a puzzle: does the absence of a column mean no progress, or no data?
PS. The inclusion of 2009 in both time periods is probably an editorial oversight.
My friend Tonny M. sent me a tip about two pretty nice charts depicting the state of U.S. healthcare spending (link).
The first shows the U.S. as an outlier:
This chart is a replica, with some added details, of the Lane Kenworthy chart that I have praised here before. It remains one of the most impactful charts I have seen. The added time-series detail allows us to see a divergence starting around 1980.
The second chart shows the inequity of healthcare spending among Americans. The top 10% of spenders consume about 6.5 times as much as the average person, while the bottom 16% do not spend anything at all.
This chart form is standard for depicting imbalance in scientific publications. But the general public finds this chart difficult to interpret, mostly because both axes operate on a cumulative scale. Further, encoding inequity in the bend of the curve is not particularly intuitive.
So I tried out some other possibilities. Both alternatives are based on incremental, not cumulative, metrics. I take the spend of each of the ten groups (deciles) and work with those dollars. Also, I provide a reference point: the level of spend of each decile if the total were distributed evenly among all ten groups.
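The metric behind both alternatives can be sketched in a few lines. The decile shares and total spend below are hypothetical numbers chosen only to roughly match the figures quoted above (top decile about 6.5 times the average, bottom deciles near zero); the real values come from the linked chart.

```python
# Hypothetical spend share per decile, lowest to highest spenders.
decile_share = [0.0, 0.0, 0.01, 0.01, 0.02, 0.03, 0.05, 0.09, 0.14, 0.65]

total_spend = 3_000  # hypothetical total, in billions of dollars

even_share = 1 / len(decile_share)  # reference point: an even ten-way split

# "Excess" (positive) or "deficient" (negative) spend per decile,
# i.e. the distance of each decile from the even-split reference.
# These are the quantities plotted as column segments or slopes.
excess = [(share - even_share) * total_spend for share in decile_share]

for i, e in enumerate(excess, start=1):
    print(f"decile {i:2d}: {e:+8.0f}")
```

By construction, the excess amounts sum to zero across the ten deciles, which is what makes the even-split reference line a natural baseline for both chart forms.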
The first alternative depicts the "excess" or "deficient" spend as column segments.
The second alternative shows the level of excess or deficient spending as slopes of lines. I am aiming for a bit more drama here.
Now, the interpretation of this chart is not simple. Since illness is not evenly spread out within the population, this distribution might just be the normal state of affairs. Nevertheless, this pattern can also result from the top spenders purchasing very expensive experimental treatments with little chance of success, for example.
A reader didn't like this graphic in the Wall Street Journal:
One could turn every panel into a bar chart but unfortunately, the situation does not improve much. Some charts just can't be fixed by altering the visual design.
The chart is frustrating to read: typically, colors are used to signify objects that should be compared. Focus on the brown wedges for a moment: Basic EDA 46%, Data cleaning 31%, Machine learning 27%, etc. Those are proportions of respondents who said they spent 1 to 3 hours a day on the respective tasks. That is one weird way of describing time use. The people who spent 1 to 3 hours a day on EDA do not necessarily overlap with those who spent 1 to 3 hours a day on data cleaning. In addition, there is no summation formula that lets us know how any individual, or the average data scientist, spends his or her time during a typical day.
But none of this is the graphics designer's fault.
The trouble with the chart is in the D corner of the Trifecta checkup. The survey question was poorly posed. The data came from a study by O'Reilly Media. They asked questions of this form:
How much time did you spend on basic exploratory data analysis on average?
A. Less than 1 hour a week
B. 1 to 4 hours a week
C. 1 to 3 hours a day
D. 4 or more hours a day
It is not obvious that those four levels are collectively exhaustive. In fact, they aren't. One hour a day for five working days adds up to 5 hours a week. Those who spent between 4 and 5 hours a week have nowhere to go.
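The coverage gap is easy to verify once the daily options are converted to weekly hours. The five-day working week is my assumption, not something stated in the survey:

```python
# Assumed conversion: a 5-day working week (not from the survey itself).
DAYS_PER_WEEK = 5

# Each answer option expressed as a [low, high) range of hours per WEEK.
options = {
    "A. Less than 1 hour a week": (0, 1),
    "B. 1 to 4 hours a week": (1, 4),
    "C. 1 to 3 hours a day": (1 * DAYS_PER_WEEK, 3 * DAYS_PER_WEEK),  # 5 to 15
    "D. 4 or more hours a day": (4 * DAYS_PER_WEEK, float("inf")),    # 20+
}

# Someone averaging 4.5 hours a week falls into none of the ranges:
t = 4.5
matches = [label for label, (lo, hi) in options.items() if lo <= t < hi]
print(matches)  # empty: the gap between B's top (4) and C's bottom (5)
```

The same arithmetic exposes a second hole between C's top (15 hours a week) and D's bottom (20 hours a week).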
Further, if one had access to individual responses, it's likely that many respondents' answers, summed across tasks, would add up to implausibly many or implausibly few working hours.
The panels are separate questions which bear no relationship to each other, even though the tasks are clearly related by the fact that there are only so many working hours in a day.
To fix this chart, one must first fix the data. To fix the data, one must ask the right questions.
Scott Klein's team at ProPublica published a worthy news application called "Hell and High Water" (link). I took some time taking in the experience. It's a project that needs room to breathe.
The setting is Houston, Texas, and the subject is what happens when the next big hurricane hits the region. The reference point is Hurricane Ike, which struck Galveston in 2008.
This image shows the depth of flooding at the height of the disaster in 2008.
The app takes readers through multiple scenarios. This next image depicts what would happen (according to simulations) if a storm similar to Ike, but with 15 percent stronger winds, were to hit Galveston.
One can also speculate about what might happen if the so-called "Mid Bay" solution is implemented:
This solution is estimated to cost about $3 billion.
I am drawn to this project because the designers liberally use some things I praised in my summer talk at the Data Meets Viz conference in Germany.
Here is an example of hover-overs used to annotate text. (My mouse is on the words "Nassau Bay" at the bottom of the paragraph. Much of the Bay would be submerged at the height of this scenario.)
The design has a keen awareness of foreground/background issues. The map uses sparse static labels, indicating the most important landmarks. All other labels are hidden unless the reader hovers over specific words in the text.
I think plotting population density would have been more impactful. With the current set of labels, the perspective is focused on business and institutional impact. I think there is a missed opportunity to highlight the human impact. This can be achieved by coding population density into the map colors. I believe the colors on the map currently represent terrain.
This is a successful interactive project. The technical feats are impressive (read more about them here). A lot of research went into the articles; huge amounts of details are included in the maps. A narrative flow was carefully constructed, and the linkage between the text and the graphics is among the best I've seen.
I spent time with the family in California, wiping out any chance of a white Christmas, although I hear that the probability would have been minuscule even had I stayed.
I did come across a graphic that tried to drive the point home, via NOAA.
Unfortunately, this reminded me a little of the controversial Florida gun-deaths chart (see here):
In this graphic, the designer played with the up-is-bigger convention, drawing some loud dissent.
Begin with the question addressed by the NOAA graphic: which parts of the country have the highest likelihood of a white Christmas? My first instinct is to look at the darkest regions, which ironically mark the places with the smallest chance of snow.
Surely, the designer's idea is to play on the word "white" in white Christmas. But I am not liking the result.
Then I happened upon an older version (2012) of this map, also by NOAA. (See this Washington Post blog, for example.)
There are a number of design choices that make this version more effective.
The use of an unrelated brown color to cordon off the bottom category (0-10%) is a great idea.
Similarly, the play of hue and shade allows readers to see the data at multiple levels, first at the top level of more likely, less likely, and not likely, and then at the more detailed level of 10 categories.
Finally, there is no whiteness inside the US boundary. The top category is the lightest shade of purple, not exactly white. In the 2015 version above, the white of the snowy regions is not differentiated from the white of the Great Lakes.
I am still not convinced about the inversion of the darker-is-larger convention though. How about you?
My friend Alberto Cairo said it best: if you see bullshit, say "bullshit!"
He was very incensed by this egregious "infographic": (link to his post)
Emily Schuch provided a re-visualization:
The new version provides a much richer story of how Planned Parenthood has shifted priorities over the last few years.
It also exposed how the AUL (Americans United for Life) organization distorted the story.
The designer extracted only two of the lines, so readers do not see that the category of services that really replaced the loss of cancer screening was STI/STD testing and treatment. This is a bit ironic given the other story circulating this week: the big jump in STDs among Americans (link).
Then, the designer placed the two lines on dual axes, which is a dead giveaway that something awful lies beneath.
Further, this designer dumped the data from intervening years, and drew a straight line from the first to the last year. The straight arrow misleads by pretending that there has been a linear trend, and that it would go on forever.
But the masterstroke is in the treatment of the axes. Let's look at the axes, one at a time:
The horizontal axis: Let me recap. The designer dumped all but the starting and ending years, and drew a straight line between the endpoints. While the data are no longer there, the axis labels are retained. So, our attention is drawn to an area of the chart that is void of data.
The vertical axes: Let me recap. The designer has two series of data with the same units (number of people served) and decided to plot each series on a different scale with dual axes. But readers are not supposed to notice the scales, so they do not show up on the chart.
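The mechanics of the dual-axes trick can be shown with a few lines of arithmetic. The numbers below are hypothetical stand-ins, not the actual Planned Parenthood figures; the point is that independently rescaled axes erase the true relative magnitudes:

```python
def rescale(series):
    """Map a series onto [0, 1] -- what a chart does along its own axis."""
    lo, hi = min(series), max(series)
    return [(v - lo) / (hi - lo) for v in series]

# Hypothetical start/end counts for the two extracted series (same units:
# number of people served), standing in for the real data.
cancer_screenings = [2_000_000, 900_000]
abortions = [290_000, 330_000]

# On separate hidden axes, each series spans the full plot height, so a
# modest rise is drawn with the same visual magnitude as a huge drop.
print(rescale(cancer_screenings))  # [1.0, 0.0]
print(rescale(abortions))          # [0.0, 1.0]
```

With the scales hidden, the two lines form a dramatic symmetric X regardless of whether the second series rose by 14 percent or by 300 percent, which is exactly why unlabeled dual axes are a propagandist's friend.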
To summarize, where there are no data, we have a set of functionless labels; where labels are needed to differentiate the scales, we have no axes.
This is a tried-and-true tactic employed by propagandists. The egregious chart brings back some bad memories.
This chart, which I found flipping through Stern magazine in Germany, accomplishes one important goal. It makes me stop flipping, and look.
The chart presents a point of view that is refreshing. The Airbus A320 is a true collaborative effort. The chart presents a good amount of information efficiently. Reminds me of diagrams in instruction manuals for building airplane models.
It is in essence a map. And as with maps, it has a built-in bias. The size of a part is not proportional to its importance or value. So, one issue with this diagram is it draws attention to large parts with uncomplicated shapes.
One way to address this is to use an informative legend. Notice that the map up top takes up a lot of space while serving little purpose. Instead, one can use a bar chart with a colored bar for each country. This bar chart allows one to add an extra measure. For example, the proportion of value accounted for by each country.
European readers: I wonder if there is a standard color scheme for different countries. What do you think of their choice of color?
Surveys generate a lot of data. And, if you have used a survey vendor, you know they generate a ton of charts.
I was in Germany to attend the Data Meets Viz workshop organized by Antony Unwin. Paul and Sascha from Zeit Online presented some of their work at the German publication, and I was highly impressed by this effort to visualize survey results. (I hope the link works for you. I found that the "scroll" fails on some platforms.)
The survey questions attempted to assess the gap between West and East Germans 25 years after reunification.
The best feature of this presentation is the maintenance of one chart form throughout. This is the general format:
The survey asks whether working mothers are a good thing. The designers plot how the percentage agreeing that working mothers are good changes over time. The blue line represents the East German average and the yellow line the West German average. There is a big gap in attitude between the two sides on this issue, although both regions have grown more accepting of working mothers over time.
All the other lines in the background indicate different subgroups of interest. These subgroups are accessible via the tabs on top. They include gender, education level, and age.
The little red "i" conceals some text explaining the insight from this chart.
Hovering over the "Men" tab leads to the following visual:
Both lines for men sit under the respective average but the shape is roughly the same. (Clicking on the tab highlights the two lines for men while moving the aggregate lines to the background.)
The Zeit team really does an amazing job keeping this chart clean while still answering a variety of questions.
They did make an important choice: not to put every number on this chart. We don't see the percent disagreeing or those who are ambivalent or chose not to answer the question.
Like I said before, what makes this set of charts work is the seamless transition between one question and the next. Every question is given the same graphical treatment, which eliminates the learning time in going from one chart to the next.
Here is one using a Likert scale, and accordingly, the vertical axis goes from 1 to 7. They plotted the average score within each subgroup and the overall average:
Here is one where they combined the top categories into a "Top 2 Box" type metric:
Finally, I appreciate the nice touch of adding tooltips to the series of dots used to aid navigation.
The theme of the workshop was interactive graphics. This effort by the Zeit team is one of the best I have seen. Market researchers take note!
First, I saw Alberto tweet his design for the Wall Street Journal (below is the English version):
The yellow space is the size of the smallest "livable" apartment in Hong Kong, known as the "mosquito" apartment. Livability is defined by the real estate developers.
If you've lived in a tropical area like Hong Kong, you'll understand the obsession with mosquitoes. The itching for days! The sneaky little things that suck your blood!
In Manhattan, it seems we prefer to say "shoebox apartment." By comparison, the name is not as scary, and the shoebox apartment is larger too.
The graphic is fantastic because it offers comparisons to everyday spaces, like a NYC parking space and a basketball court, whose proportions many Americans have a feel for.
This chart leads me down an unexpected path. I found a set of very powerful photos, commissioned by a humanitarian association in Hong Kong. Overwhelming. Here's one:
Yes, that is the entire living space for this family. All of forty square feet.
This article describes the project, as well as links to a number of other equally astounding photos.
These photos are unfair competition for any graphic designer.
Finally, I came across an inspiring, ingenious design. Gary Chang, an architect in Hong Kong, created his own apartment (344 square feet, almost nine times the size of the space in the photo above, and twice as large as the mosquito apartment) with an amazing, space-saving design.
Through a series of movable walls and beds, his apartment can be configured in 24 different ways. This is a small multiples layout!
Here is an article about his achievement, together with a video tour of his home. Not to be missed. It is the definition of making something out of nothing.
Here is a little graphic describing certain transformations: