On Twitter, someone pointed me to the following map of journalists who were killed between 1993 and 2015.
I wasn't sure if the person who posted this liked or disliked this graphic. We see a clear metaphor of gunshots and bloodshed. But in delivering the metaphor, a number of things are sacrificed:
the number of deaths is hard to read
the location of deaths is distorted, both in large countries (Russia) where the deaths are too concentrated, and in small countries (Philippines) where the deaths are too dispersed
despite the use of a country-level map, it is hard to read off the death counts by country
The Committee to Protect Journalists (CPJ), which publishes the data, used a more conventional choropleth map, which was reproduced and enhanced by Global Post:
They added country names and death counts via a list at the bottom. There is also now a color scale. (Note the different sets of dates.)
In a Trifecta Checkup, I would give this effort a Type DV. While the map is competently produced, it doesn't get at the meat of the data. In addition, these raw counts of deaths do not reveal much about the level of risk experienced by journalists working in different countries.
The limitation of the map can be seen in the following heatmap:
While this is not a definitive visualization of the dataset, I use this heatmap to highlight the trouble with hiding the time dimension. Deaths are correlated with particular events that occurred at particular times.
Iraq is far and away the most dangerous, but only after the Iraq War began, and primarily during the War and its immediate aftermath. Similarly, it was perfectly safe to work in Syria until the last few years.
A journalist can use this heatmap as a blueprint, annotating it with the events that caused spikes in deaths.
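The data preparation behind such a heatmap is just a cross-tabulation of deaths by country and year. Here is a minimal sketch in Python; the records below are invented for illustration, not the actual CPJ data:

```python
from collections import Counter

# Hypothetical (country, year) records, one per death -- not the real CPJ data.
records = [
    ("Iraq", 2004), ("Iraq", 2004), ("Iraq", 2005),
    ("Syria", 2013), ("Syria", 2014),
    ("Russia", 1995), ("Philippines", 2009),
]

# Cross-tabulate into the country-by-year matrix that a heatmap needs.
counts = Counter(records)
countries = sorted({c for c, _ in records})
years = sorted({y for _, y in records})
matrix = [[counts[(c, y)] for y in years] for c in countries]

for country, row in zip(countries, matrix):
    print(country, row)
```

Each row of the matrix becomes one horizontal band of the heatmap, which is exactly why the time dimension cannot be hidden in this form.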
Now the real question in this dataset is the risk faced by journalists in different countries. The death counts give a rather obvious and thus not so interesting answer: more journalists are killed in war zones.
A denominator is missing. How many journalists are working in the respective countries? How many non-journalists died in the same countries?
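To make the denominator point concrete, here is a sketch of the rate computation. All of the numbers below are invented; the counts of working journalists in particular are pure assumptions, since that is exactly the data the chart lacks:

```python
# Hypothetical counts -- for illustration only, not real CPJ figures.
deaths = {"Iraq": 150, "Philippines": 75}
journalists = {"Iraq": 1500, "Philippines": 7500}  # invented denominators

# Risk per 1,000 working journalists: the metric the raw-count map cannot show.
risk = {c: 1000 * deaths[c] / journalists[c] for c in deaths}
print(risk)
```

With these made-up inputs, a country with half the death count can still carry a fraction of the risk, which is the whole argument for normalizing.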
Also, separating out the causes of death can be insightful.
A friend asked me to comment on the following chart:
Specifically, he points out the challenge of trying to convey both absolute and relative metrics for a given data series.
This chart presents projections of growth in the U.S. mobile display advertising market. It is specifically pointing out that the programmatic segment of this market is growing rapidly (visualized as the black columns).
The blue and red lines then make a mess of the situation. Even though both lines express percentages, they refer to different scales. The red line represents growth rates while the blue line represents share of market.
Both of these metrics are relative metrics useful for interpreting the trend. The growth rates (red) interpret the dollar values on the basis of past values while the market shares (blue) interpret the dollar values on the basis of the total market.
It is rarely a good idea to have multiple scales on the same canvas. Looking at the blue line for a moment, it is shocking to find that the values depicted almost double from one end to the other. The line appears much too gentle for that kind of growth.
In the makeover, I expressed everything in the same scale (billions of dollars). I used side-by-side charts (small multiples) to isolate each trend that is found in the data. I allow readers to look at each individual segment of the market, and then examine how the individual trends affect the total market.
One might argue that the stacked column chart by itself is sufficient. If there is a severe space limitation, I'd let go of the other two panels. However, having those panels makes the messages easier to grasp. This is particularly true of the steady-growth assumption behind the programmatic spending trend (the orange columns).
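The makeover's arithmetic can be sketched as follows. The totals and shares below are invented numbers, used only to show how both relative metrics collapse back into a single dollar scale:

```python
# Hypothetical U.S. mobile display market, in billions of dollars (invented).
total = [4.0, 6.0, 9.0, 12.0]          # total market by year
prog_share = [0.25, 0.40, 0.50, 0.60]  # programmatic share of market

# Express everything in the same unit: billions of dollars.
programmatic = [t * s for t, s in zip(total, prog_share)]
other = [t - p for t, p in zip(total, programmatic)]

# The growth rate (the red line) is then derivable, not a separate scale.
growth = [p2 / p1 - 1 for p1, p2 in zip(programmatic, programmatic[1:])]
print([round(p, 2) for p in programmatic], [round(g, 2) for g in growth])
```

Once both series are in dollars, readers can compare segment sizes directly, and the growth rates become annotations rather than a second axis.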
Reader Jeffrey S. saw this graphic inside a Dec 2 tweet from the National Weather Service (NWS) in Phoenix, Arizona.
In a Trifecta checkup (link), I'd classify this as Type QV.
The problems with the visual design are numerous and legendary. The column chart where the heights of the columns are not proportional to the data. The unnecessary 3D effect. The lack of self-sufficiency (link). The distracting gridlines. The confusion of year labels that do not increment from left to right.
The more hidden but more serious issue with this chart is the framing of the question. The main message of the original chart is that the last two years have been the hottest two years in a long time. But it is difficult for readers to know if the differences of less than one degree from the first to the last column are meaningful since we are not shown the variability of the time series.
The green line asserts that 1981 to 2010 represents the "normal". It is unclear why that period is normal and the years 2011-2015 are abnormal. Maybe they are using the word normal in a purely technical sense to mean "average." If so, it is better to just say average.
***
For this data, I prefer to see the entire time series from 1981 to 2015, which allows readers to judge the variability as well as the trend in the average temperatures. In the following chart, I also label the five years with the highest average temperatures.
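The "normal" line and the labeled hottest years are both simple derivations from the full series. A sketch with a fabricated temperature series (not the NWS data):

```python
# Fabricated average temperatures by year, 1981-2015, with a gentle upward trend.
temps = {1981 + i: 70.0 + 0.05 * i for i in range(35)}

# The "normal" is just the 1981-2010 average.
normal = sum(temps[y] for y in range(1981, 2011)) / 30

# Label the five years with the highest averages.
top5 = sorted(temps, key=temps.get, reverse=True)[:5]
print(round(normal, 3), top5)
```

Plotting the full series alongside this average also shows the year-to-year variability, which is what the original bar chart hides.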
My friend Alberto Cairo said it best: if you see bullshit, say "bullshit!"
He was very incensed by this egregious "infographic": (link to his post)
Emily Schuch provided a re-visualization:
The new version provides a much richer story of how Planned Parenthood has shifted priorities over the last few years.
It also exposed how the AUL (Americans United for Life) organization distorted the story.
The designer extracted only two of the lines, so readers do not see that the category of services that actually replaced the loss of cancer screening was STI/STD testing and treatment. This is a bit ironic given the other story that has circulated this week - the big jump in STDs among Americans (link).
Then, the designer placed the two lines on dual axes, which is a dead giveaway that something awful lies beneath.
Further, this designer dumped the data from intervening years, and drew a straight line from the first to the last year. The straight arrow misleads by pretending that there has been a linear trend, and that it would go on forever.
But the masterstroke is in the treatment of the axes. Let's look at the axes, one at a time:
The horizontal axis: Let me recap. The designer dumped all but the starting and ending years, and drew a straight line between the endpoints. While the data are no longer there, the axis labels are retained. So, our attention is drawn to an area of the chart that is void of data.
The vertical axes: Let me recap. The designer has two series of data with the same units (number of people served) and decided to plot each series on a different scale with dual axes. But readers are not supposed to notice the scales, so they do not show up on the chart.
To summarize, where there are no data, we have a set of functionless labels; where labels are needed to differentiate the scales, we have no axes.
This is a tried-and-true tactic employed by propagandists. The egregious chart brings back some bad memories.
It's gratifying to live through the incredible rise of statistics as a discipline. In a recent report by the American Statistical Association (ASA), we learned that enrollment at all levels (bachelor, master and doctorate) has exploded in the last 5-10 years, as "Big Data" gathers momentum.
But my sense of pride takes a hit while looking at the charts that appear in the report. These graphs demonstrate again the hegemony of Excel defaults in the world of data visualization.
Here are all five charts organized in a panel:
Chart #5 (bottom right) catches the eye because it is the only chart with two lines instead of three. You then flip to the prior page to find the legend. The legend tells you the red line is Bachelor's and the green line is PhD. That seems wrong, unless biostats departments do not give out master's degrees.
This is confirmed by chart #2, where we find the blue line (Master) hugging zero.
Presumably the designer removed the blue line from chart #5 because the low counts mean that it fluctuates wildly between 0 and 100 percent and so disrupts the visual design. But the designer forgets to tell readers why the blue line is missing.
It turns out the article itself contradicts all of the above:
For biostatistics degrees, for which NCES started providing data specifically in 1992, master’s degrees track the overall increase from 2010–2014 at 47%...The number of undergraduate degrees in biostatistics remains below 30.
In other words, the legend is mislabeled: the blue line represents Bachelor's while the red line represents Master's. (The error must have been caught after the print edition went out, because the online version has the correct legend.)
There is another mystery. Charts #2, #3, and #5, all dealing with biostats, have time starting from 1992, while Charts #1 and #4 start from 1987. The charts aren't lined up in a way that would allow comparisons across time.
Similarly, the vertical scale of each chart is different (aside from Charts #3 and #4). This design choice impairs comparison across charts.
In the article, it is explained that 1992 was when the agency started collecting data about biostatistics degrees. Between 1987 and 1992, were there no biostatistics majors? Were biostatistics majors lumped into the counts of statistics majors? It's hard to tell.
While Excel is a powerful tool that has served our community well, its flexibility is often a source of errors. The remedy to this problem is to invest ample time in overriding pretty much every default decision in the system.
This chart, a reproduction of Chart #1 above, was entirely produced in Excel.
Surveys generate a lot of data. And, if you have used a survey vendor, you know they generate a ton of charts.
I was in Germany to attend the Data Meets Viz workshop organized by Antony Unwin. Paul and Sascha from Zeit Online presented some of their work at the German publication, and I was highly impressed by this effort to visualize survey results. (I hope the link works for you. I found that the "scroll" fails on some platforms.)
The survey questions attempted to assess the gap between West and East Germans 25 years after reunification.
The best feature of this presentation is the maintenance of one chart form throughout. This is the general format:
The survey asks whether working mothers are a good thing or not. They choose to plot how the percentage agreeing that working mothers are good changes over time. The blue line represents the East German average and the yellow line the West German average. There is a big gap in attitude between the two sides on this issue, although both regions have seen increasing acceptance of working mothers over time.
All the other lines in the background indicate different subgroups of interest. These subgroups are accessible via the tabs on top. They include gender, education level, and age.
The little red "i" conceals some text explaining the insight from this chart.
Hovering over the "Men" tab leads to the following visual:
Both lines for men sit under the respective average but the shape is roughly the same. (Clicking on the tab highlights the two lines for men while moving the aggregate lines to the background.)
The Zeit team really does an amazing job keeping this chart clean while still answering a variety of questions.
They did make an important choice: not to put every number on this chart. We don't see the percent disagreeing or those who are ambivalent or chose not to answer the question.
Like I said before, what makes this set of charts work is the seamless transition from one question to the next. Every question is given the same graphical treatment. This eliminates learning time going from one chart to the next.
Here is one using a Likert scale, and accordingly, the vertical axis goes from 1 to 7. They plotted the average score within each subgroup and the overall average:
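Behind each of these panels sits the same aggregation: a mean response per subgroup per year, plus an overall mean. A sketch with invented responses on a 1-7 Likert scale (the regions, years, and scores are all made up):

```python
from collections import defaultdict
from statistics import mean

# Invented survey responses: (region, year, score on a 1-7 Likert scale).
responses = [
    ("East", 2000, 5), ("East", 2000, 6), ("West", 2000, 3),
    ("East", 2010, 6), ("West", 2010, 4), ("West", 2010, 5),
]

# Average score per (region, year) -- one line per subgroup on the chart.
groups = defaultdict(list)
for region, year, score in responses:
    groups[(region, year)].append(score)
averages = {key: mean(scores) for key, scores in groups.items()}

# The overall average per year -- the bold reference line.
overall = defaultdict(list)
for _, year, score in responses:
    overall[year].append(score)
overall_avg = {year: mean(scores) for year, scores in overall.items()}
print(averages, overall_avg)
```

Because every question reduces to the same per-subgroup, per-year table, the same chart form can carry the whole survey.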
Here is one where they combined the top categories into a "Top 2 Box" type metric:
Finally, I appreciate the nice touch of adding tooltips to the series of dots used to aid navigation.
The theme of the workshop was interactive graphics. This effort by the Zeit team is one of the best I have seen. Market researchers take note!
I was enjoying this yummy piece of German cake the other day.
I started flipping through the recent issue of Stern magazine, and came across this German pie chart that probably presents results from a poll. In particular, it draws attention to changes between the current and the prior poll, I think.
When a pie chart is used to handle data with more than three or four categories, we frequently encounter objects with a rainbow of colors, and a jumble of text labels. In this case, the order of the labels in the legend doesn't match the order of the pie sectors.
In addition, such pie charts almost always fail the self-sufficiency test. All of the data are printed on the chart itself, inviting readers to ignore the visual presentation.
A bumps-style chart works well for this type of data. I tried something different here:
The challenge is to elegantly handle the current data plus the change from the last poll.
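A bumps-style makeover needs, for each category, the current share and the change since the prior poll; both come straight from the two snapshots. A sketch with invented poll numbers (not the Stern data):

```python
# Invented poll results, in percent -- not the actual Stern poll.
prior = {"A": 35, "B": 28, "C": 20, "D": 17}
current = {"A": 33, "B": 30, "C": 22, "D": 15}

# Each party's current level plus its change: the two facts the chart must carry.
change = {p: current[p] - prior[p] for p in current}

# Order parties by current share, as a bumps chart would.
order = sorted(current, key=current.get, reverse=True)
for p in order:
    print(f"{p}: {current[p]}% ({change[p]:+d})")
```

The bumps form then encodes the change as the slope of each line between the two polls, so no separate arrows or deltas are needed.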
The Boston Globe has an eye-catching full-page poster about Hillary's current endorsements among 115 important New Hampshire people. (link) This is an excerpt of the poster:
Each of the 115 people are represented by a circle, with their names, titles and reasons for importance written below. The circles are colored according to the following legend:
I like the concept behind the chart, identifying the important endorsements and tabulating their current positions.
A tiny addition to the legend would much improve the readability of the whole poster:
I also wonder how the people are ordered on the chart. They are certainly not alphabetical. Is it geographical? By presumed influence? It's not clear.
Explaining the order will improve our comprehension. Let's assume the circles are ordered with the most influential people at the top. This knowledge immediately alters our perception of the chart. We can now see that Hillary has done well pretty evenly across the spectrum while O'Malley's two endorsements are in the bottom half of the ledger.
If this is indeed the ordering criterion, all the chart needs is an annotation to let readers know.