
How to print cash, graphically

Twitter user @glennrice called out a "journalist" for producing the following chart:


You can't say the Columbia Heartbeat site doesn't deserve a beating over this graph. I don't recognize the software, but my guess is that it is one of those business intelligence (BI) tools that produce canned reports with a button click.

Until I read the article, I kept thinking that several overlapping lines were being plotted. But it's really a 3D plus color effect!

Wait, there's more. This software treats years as categories rather than as a continuous variable. So it gave equal width on the axis to intervals of 2 years, 1 year, 2 years, and 8 years. I am still not sure how this happened because the data set given at the bottom of the article contains annual data.
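To make the axis distortion concrete, here is a minimal Python sketch. The years are hypothetical, chosen only to reproduce gaps of 2, 1, 2, and 8 years; they are not taken from the article, and the function names are mine.

```python
# Hypothetical years reproducing gaps of 2, 1, 2 and 8 years (not the article's data).
years = [2000, 2002, 2003, 2005, 2013]

def categorical_positions(labels):
    """Treat each label as a category: evenly spaced slots 0, 1, 2, ..."""
    return [float(i) for i in range(len(labels))]

def continuous_positions(values):
    """Treat each value as a number, rescaled to the same overall axis length."""
    span = values[-1] - values[0]
    scale = (len(values) - 1) / span
    return [(v - values[0]) * scale for v in values]

cat_pos = categorical_positions(years)
cont_pos = continuous_positions(years)

# Width of each interval on the axis under the two treatments.
cat_widths = [b - a for a, b in zip(cat_pos, cat_pos[1:])]
cont_widths = [b - a for a, b in zip(cont_pos, cont_pos[1:])]

print(cat_widths)
print([round(w, 2) for w in cont_widths])
```

On the categorical axis every interval gets the same width. On the continuous axis the 8-year interval is eight times as wide as the 1-year interval, which is what a time axis should show.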

The y-axis labels, the gridlines, the acronym in the chart title, the unnecessary invocation of start-at-zero, etc. almost make this feel like a parody.


Aside from visual design issues, I am not liking the analysis either. The claim is that taxes have been increasing every year in Columbia, Missouri, and that the additional revenue ended up sitting in banks as cash. 

We need to see a number of other data series in order to accept this conclusion. What was the growth in tax revenues relative to the increase in cash? What was the growth in population in Columbia during this period? Did the cash holding per capita increase or decrease? What were the changes in expenditure on schools, public works, etc.?

This is a Type DV chart. There is an interesting question being asked but the analysis must be sharpened and the graphing software must be upgraded asap.

PS. On second thought, I think the time axis might be deliberately distorted. Judging from the slope of the line, the cumulative increase in the last 8 years equals the increase in each of the earlier two-year increments. So if the proper scale is used, the line would flatten out significantly, demolishing the thesis of the article. Thus, it is a case of printing cash, graphically.
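The flattening can be checked with simple arithmetic. The sketch below assumes, as read off the slope of the line, that the same rise is drawn over a 2-year interval and over the 8-year interval; the rise of 1 unit is a placeholder, not a figure from the article.

```python
def per_year_change(total_change, n_years):
    """Average change per year over an interval."""
    return total_change / n_years

# Placeholder: the same rise (1 unit) spread over a 2-year and an 8-year interval.
same_rise = 1.0
slope_two_year = per_year_change(same_rise, 2)
slope_eight_year = per_year_change(same_rise, 8)

# How steep the last segment should look relative to a two-year segment.
ratio = slope_eight_year / slope_two_year
print(ratio)
```

Under that assumption, the last segment on a proper time scale would be a quarter as steep as a two-year segment, even though the chart draws them at the same slope.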



Where but when and why: deaths of journalism

On Twitter, someone pointed me to the following map of journalists who were killed between 1993 and 2015.


I wasn't sure if the person who posted this liked or disliked this graphic. We see a clear metaphor of gunshots and bloodshed. But in delivering the metaphor, a number of things are sacrificed:

  • the number of deaths is hard to read
  • the location of deaths is distorted, both in large countries (Russia) where the deaths are too concentrated, and in small countries (Philippines) where the deaths are too dispersed
  • despite the use of a country-level map, it is hard to learn the deaths by country

The Committee to Protect Journalists (CPJ), which publishes the data, used a more conventional choropleth map, which was reproduced and enhanced by Global Post:


They added country names and death counts via a list at the bottom. There is also now a color scale. (Note the different sets of dates.)


In a Trifecta Checkup, I would give this effort a Type DV. While the map is competently produced, it doesn't get at the meat of the data. In addition, these raw counts of deaths do not reveal much about the level of risk experienced by journalists working in different countries.

The limitation of the map can be seen in the following heatmap:


While this is not a definitive visualization of the dataset, I use this heatmap to highlight the trouble with hiding the time dimension. Deaths are correlated with particular events that occurred at particular times.

Iraq is far and away the most dangerous country, but only since the start of the Iraq War, and primarily during the war and its immediate aftermath. Similarly, it was perfectly safe to work in Syria until the last few years.

A journalist can use this heatmap as a blueprint, and start annotating it with the events that drove up the death counts.
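For readers who want to build such a heatmap, the underlying grid is just a country-by-year table of counts. Here is a minimal Python sketch, assuming the data arrive as one record per death; the records are placeholders, not CPJ data, and the function name is mine.

```python
from collections import Counter

# Placeholder records, one per death: (country, year). Not CPJ data.
records = [
    ("Country A", 2003), ("Country A", 2004), ("Country A", 2004),
    ("Country B", 2012), ("Country B", 2013), ("Country B", 2013),
    ("Country A", 2006),
]

def heatmap_matrix(records):
    """Return (countries, years, matrix) with one row per country, one column per year."""
    counts = Counter(records)
    countries = sorted({country for country, _ in records})
    first = min(year for _, year in records)
    last = max(year for _, year in records)
    years = list(range(first, last + 1))  # keep empty years so time stays continuous
    matrix = [[counts.get((c, y), 0) for y in years] for c in countries]
    return countries, years, matrix

countries, years, matrix = heatmap_matrix(records)
for country, row in zip(countries, matrix):
    print(country, row)
```

Keeping the years with zero deaths is the point: the gaps are what let a reader line up the spikes with particular events.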


Now the real question in this dataset is the risk faced by journalists in different countries. The death counts give a rather obvious and thus not so interesting answer: more journalists are killed in war zones.

A denominator is missing. How many journalists are working in the respective countries? How many non-journalists died in the same countries?
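A quick sketch of why the denominator matters, in Python. All figures are made up for illustration (I do not have counts of working journalists by country), and the names are mine.

```python
# Made-up figures: (deaths, journalists working in the country).
data = {
    "Country A": (30, 6000),
    "Country B": (10, 500),
}

def deaths_per_thousand(deaths, journalists):
    """Deaths per 1,000 working journalists."""
    return 1000 * deaths / journalists

rates = {c: deaths_per_thousand(d, n) for c, (d, n) in data.items()}

most_deaths = max(data, key=lambda c: data[c][0])  # ranked by raw count
highest_risk = max(rates, key=rates.get)           # ranked by rate

print(most_deaths, highest_risk, rates)
```

With these made-up numbers, the country with the most deaths is not the one with the highest risk per journalist. Raw counts alone cannot tell us which case we are in.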

Also, separating out the causes of death can be insightful.

Treating absolute and relative data simultaneously

A friend asked me to comment on the following chart:


Specifically, he points out the challenge of trying to convey both absolute and relative metrics for a given data series.

This chart presents projections of growth in the U.S. mobile display advertising market. It is specifically pointing out that the programmatic segment of this market is growing rapidly (visualized as the black columns).

The blue and red lines then make a mess of the situation. Even though both of these lines express percentages, they are read against different scales. The red line represents growth rates while the blue line represents share of market.

Both of these metrics are relative metrics useful for interpreting the trend. The growth rates (red) interpret the dollar values on the basis of past values while the market shares (blue) interpret the dollar values on the basis of the total market.
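The two relative metrics differ only in what they divide by. A minimal Python sketch, using placeholder dollar values in billions rather than the chart's figures:

```python
# Placeholder dollar values in billions, one per year (not the chart's figures).
programmatic = [2.0, 4.0, 6.0]
total_market = [10.0, 12.0, 15.0]

def growth_rates(values):
    """Year-over-year growth: each value relative to the previous year's value."""
    return [(curr - prev) / prev for prev, curr in zip(values, values[1:])]

def market_shares(segment, total):
    """Share of market: each value relative to the same year's total."""
    return [s / t for s, t in zip(segment, total)]

growth = growth_rates(programmatic)
share = market_shares(programmatic, total_market)

print(growth)
print([round(s, 3) for s in share])
```

Both outputs are percentages of something, but the base is different in each case, so plotting them against one axis invites a false comparison.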

It is rarely a good idea to have many scales on the same canvas. Looking at the blue line for the moment, it is shocking to find that the values depicted almost doubled from one end to the other end. The blue line appears much too gentle.


In the makeover, I expressed everything in the same scale (billions of dollars). I used side-by-side charts (small multiples) to isolate each trend that is found in the data. I allow readers to look at each individual segment of the market, and then examine how the individual trends affect the total market.


One might argue that the stacked column chart by itself is sufficient. If there is a severe space limitation, I'd let go of the other two panels. However, having those panels makes the messages easier to obtain. This is particularly true of the steady growth assumption behind the programmatic spending trend (the orange columns).

Happy new year. Did you have a white Christmas?

Happy 2016.

I spent time with the family in California, wiping out any chance of a white Christmas, although I hear that the probability would have been minuscule even had I stayed.

I did come across a graphic that tried to drive the point home, via NOAA.


Unfortunately, this reminded me a little of the controversial Florida gun-deaths chart (see here):


In this graphic, the designer played with the up-is-bigger convention, drawing some loud dissent.

Begin with the question addressed by the NOAA graphic: which parts of the country have the highest likelihood of having a white Christmas? My first instinct is to look at the darkest regions, which ironically match the places with the smallest chance of snow.

Surely, the designer's idea is to play with white Christmas. But I am not liking the result.


Then, I happened upon an older version (2012) of this map, also done by NOAA. (See this Washington Post blog for example.)


There are a number of design choices that make this version more effective.

The use of an unrelated brown color to cordon off the bottom category (0-10%) is a great idea.

Similarly, the play of hue and shade allows readers to see the data at multiple levels, first at the top level of more likely, less likely, and not likely, and then at the more detailed level of 10 categories.
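The two-level reading can be sketched as a pair of lookups in Python. The 10 equal-width categories and the isolated 0-10% bottom category follow the map; the 50% cutoff between "less likely" and "more likely" is my assumption, as I have not checked where NOAA switches hue.

```python
def detailed_category(prob_pct):
    """Map a probability in percent (0 to 100) to one of 10 categories, 0 through 9."""
    if not 0 <= prob_pct <= 100:
        raise ValueError("probability must be between 0 and 100")
    return min(int(prob_pct // 10), 9)  # 100% falls in the top category

def top_level(prob_pct, cutoff_category=5):
    """Collapse the 10 categories into three groups.

    cutoff_category=5 (that is, 50%) is an assumed boundary, not NOAA's.
    """
    category = detailed_category(prob_pct)
    if category == 0:
        return "not likely"      # the bottom category, set apart in brown
    if category < cutoff_category:
        return "less likely"
    return "more likely"

for p in (5, 35, 75, 100):
    print(p, detailed_category(p), top_level(p))
```

In the map, hue carries the top-level group and shade carries the detailed category, so the reader gets both answers from one color.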

Finally, there is no whiteness inside the US boundary. The top category is the lightest shade of purple, not exactly white. In the 2015 version above, the white of the snowy regions is not differentiated from the white of the Great Lakes.

I am still not convinced about the inversion of the darker-is-larger convention though. How about you?