A small step for interactivity

Alberto links to a nice ProPublica chart on average annual spend per dialysis patient on ambulances by state. (link to chart and article)

Propublica_ambulance

It's a nice small-multiples setup with two tabs, one showing the states in order of descending spend and the other, alphabetical.

In the article itself, they excerpt the top of the chart containing the states that have suspiciously high per-patient spend.

Several types of comparisons are facilitated: comparison over time within each state, comparison of each state against the national average, comparison of trend across states, and comparison of state to state given the year.

The first comparison is simple as it happens inside each chart component.

The second type of comparison is enabled by the orange line being replicated on every component. (I'd have removed the columns from the first component as it is both redundant and potentially confusing, although I suspect that the designer may need it for technical reasons.)

The third type of comparison is also relatively easy. Just look at the shape of the columns from one component to the next.

The fourth type of comparison is where the challenge lies for any small-multiples construction. This is also a secret of this chart. If you mouse over any year on any component, every component now highlights that particular year's data so that one can easily make state by state comparisons. Like this for 2008:

Propublica_ambulance_2008

You see that every chart now shows 2008 on the horizontal axis and the data label is the amount for 2008. The respective columns are given a different color. Of course, if this is the most important comparison, then the dimensions should be switched around so that this particular set of comparisons occurs within a chart component--but obviously, this is a minor comparison so it gets minor billing.

***

I love to see this type of thoughtfulness! This is an example of using interactivity in a smart way, to enhance the user experience.

The Boston subway charts I featured before also introduce interactivity in a smart way. Make sure you read that post.

Also, I have a few comments about the data analysis on the sister blog.


Conventions, novelty and the double edge

This chart from Reuters is making the rounds on Twitter today.

Reuters_US-FLORIDA0214

Quickly, tell me whether the Gun Law in Florida did well or poorly.

That of course is the entire purpose of the chart.

***

If you are like me, that is, you have knowledge in your head of time-series line charts, you probably experienced that moment where the bottom fell out and you didn't know which way was up.

This is the double edge of novelty in charts. There should be a very high bar against running counter to convention. Readers do bring their "baggage" to the chart, and the designer should take that into consideration.

Some commentators are complaining about trickery. That may be true. But it's also possible the designer actually thought reversing the direction of the vertical axis made the chart better.

Don't forget that we have another convention: up is good and down is bad. Fewer murders is good and more murders is bad. So why not make it such that a rising line indicates goodness (fewer murders)?

***

Going back to the Trifecta Checkup. This chart has dual problems. We just talked about the syncing between the data and the graphical element.

The other issue is that the data is insufficient to draw conclusions about the underlying question: what explains the shift in number of murders since the late 2000s? This is a complex problem--the chapter in Freakonomics about abortion and crime rate is still instructive, not for the disputed conclusion but for the process of testing various hypotheses. The reduction of the complex causal structure to a single factor is dissatisfying.

 

 

 


Advocacy graphics

Note: If you are here to read about Google Flu Trends, please see this roundup of the coverage. My blog is organized into two sections: the section you are on is about data visualization; the other section concerns Big Data and use of statistical thinking in daily life--click to go there. Or, you can follow me on Twitter which combines both feeds.

***

Because the visual medium is powerful, it is a favorite of advocates. Creating a chart for advocacy is tricky. One must strike the proper balance between education and messaging. The chart needs to present the policy position strongly and also enlighten the unconverted with useful information.

In my interview with MathBabe Cathy O'Neil (link), she points to this graphic by Pew that illustrates where death-penalty executions have been administered in the past two decades in the U.S. (link) Here is a screenshot of the geographic distribution for 2006:

Pew_deathpenalty

The chart is a variant of the CDC map of obesity, which I discussed years ago. At one level, the structure of the data is the same. Each state is evaluated on a particular metric (proportion obese, and number of executions) once a year. Both designers choose to roll through a sequence of small-multiple maps.

The key distinction is that the obesity map encodes the data in color while the executions map encodes data in the density of semi-transparent, overlapping dots, each dot representing a single execution.

Perhaps the idea is to combat one of the weaknesses of color encoding: humans don't have an instinctive sense of the mapping between a numerical scale and a color scale. If the color transitions from yellow to orange, how many more executions would that map to? By contrast, if you see 200 dots instead of 160, we know the difference is 40.

***

The switch to the dots aesthetic introduces a host of problems.

Density, as you recall from geometry class, is the count divided by the area. High density can be due to a lot of executions or a very small area. Look at Delaware (DE) versus Georgia (GA). The density of red appears similar but there have been far fewer executions in Delaware.

This is a serious mistake. By using dot density, the designer encourages readers to think in terms of area of each state but why should the number of executions be related to area? As Cathy pointed out, a more relevant reference point is the population of each state. An even cleverer reference point might be the number of criminals/convictions in each state.
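To make the denominator problem concrete, here is a minimal sketch in Python. The two states and all of their figures are made up for illustration (they are not Pew's data, and they are not Delaware or Georgia); the point is only that two states can show the same dot density per unit area while differing sharply per capita.

```python
# Hypothetical figures for two made-up states; NOT real execution data.
states = {
    "Small State": {"executions": 10, "area_sq_mi": 2_000, "population": 1_000_000},
    "Large State": {"executions": 300, "area_sq_mi": 60_000, "population": 10_000_000},
}


def rate(count, denominator, per=1.0):
    """Return count per `per` units of the chosen denominator."""
    return count / denominator * per


def summarize(states):
    """Compute both candidate metrics for every state."""
    rows = {}
    for name, s in states.items():
        rows[name] = {
            # What dot density on a map effectively shows.
            "per_1000_sq_mi": rate(s["executions"], s["area_sq_mi"], 1_000),
            # The reference point suggested in the text.
            "per_million_people": rate(s["executions"], s["population"], 1_000_000),
        }
    return rows


summary = summarize(states)
for name, r in summary.items():
    print(f"{name}: {r['per_1000_sq_mi']:.2f} per 1,000 sq mi, "
          f"{r['per_million_people']:.2f} per million people")
```

With these invented numbers, both states look equally "dense" on a map, yet the larger state has 30 times the executions and three times the per-capita rate.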

Pew_deathpenalty_note

Another design issue relates to the note at the bottom of the chart (shown on the right). Here, the designer is fighting against the knowledge in the reader's head. It is natural for a dot on a map to represent location and yet the spatial distribution of the dots here provides no information. Credit the designer for clarifying this in a footnote; but also let this be a warning that there are other visual representations that do not require such disclaimers.

***

I am confused by why dots appear but never disappear. It seems that the chart is plotting cumulative counts of executions from 1977, rather than the number of executions in each year, as the chart title suggests. (If you go to the Pew website, you find a version with "cumulative" in the title; when they produced the animated gif, they decided to simplify the title, which is a poor decision.)
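The difference between the two readings of the chart is easy to see in a small sketch. The yearly counts below are invented for one hypothetical state (not Pew's numbers); the cumulative series never goes down, which is why dots appear but never disappear.

```python
from itertools import accumulate

# Hypothetical yearly counts for one state; NOT real data.
years = [1977, 1978, 1979, 1980, 1981]
yearly = [1, 0, 2, 0, 1]

# Running total: what the animated map appears to plot.
cumulative = list(accumulate(yearly))


def yearly_from_cumulative(cum):
    """Recover per-year counts by differencing a cumulative series."""
    return [cum[0]] + [b - a for a, b in zip(cum, cum[1:])]


for year, n, c in zip(years, yearly, cumulative):
    print(year, "in year:", n, "cumulative:", c)
```

A title that promises executions "in each year" matches the `yearly` series; the dots on the map behave like `cumulative`.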

It requires a quick visit to Wikipedia to learn that there was a break in executions in the 70s. This is a missed opportunity to educate readers about the context of this data. Similarly, a good chart presenting this data should distinguish between states that have banned the death penalty and states that have zero or low numbers of executions.

***

A great way to visualize this data is via a heatmap. Here, I whipped up a quick sketch (pardon the sideways text on the legend):

Executions_sketch

I forgot to add the footnote listing the states where the death penalty is banned. I could also add axis labels to the side histogram showing counts.

 

 


Pets may need shelter from this terrible chart

Josh tweeted quite a shocking attack ad to me last week. He told me it came from the DC Metro. The ad is taken out by a group called HumaneWatch.Org, which apparently is a watchdog checking up on charity organizations. The ad attacks a specific group called the Humane Society of the United States. Here is the map that is the centerpiece of the copy:

Dcmetro_map_sm

Trifecta_checkup

I like to use the Trifecta checkup to evaluate graphics. It's a nice way to organize your visualization critique. You progress through three corners: figuring out what practical question the graphic addresses, then evaluating what data is being deployed, and finally judging whether the graphical elements (the chart itself) are well executed in relation to the question and the data.

THE QUESTION:

Based on the map, it appears that HumaneWatch is interested in the spending on pet shelters. Every number shown is tiny: on a quick scan, the range may be from 0% to 0.35%. The all-caps title "A Whole Lotta Nothing" confirms that this is the intended message.

Knowing nothing about either of these organizations leaves me confused. Should the "Humane Society" be spending the bulk of its budget on pet shelters? If it doesn't, is it because the staff is pilfering money, or because it has wasteful spending, or because pets are not its major cause, or because pet shelters are not the key way this organization helps pets?

I did look up Humane Society to learn that it is an animal rights group. The four bullet points at the bottom of the ad provide a clue as to what the designer wanted to convey: namely, that this charity is a scam, with too much overhead spending, and spending on pensions.

Dcmetro_map_bottom

So I think the question being asked is sufficiently clarified, and it's a pretty important one. How is this organization spending its donations? Is it irresponsible compared to other similar organizations?

THE DATA:

The data should be in sync with the question being addressed; that's why there is a link between the two corners of the Trifecta. Given the trouble I endured understanding the question being addressed, it would come as no surprise that this chart scores poorly on the DATA corner.

I don't understand why budget spent on pet shelters is the key bone of contention. Based on the perceived objectives, it seems that they should display directly what proportion of the budget went to overhead, and what proportion went to pensions, with suitable comparisons.

The analysis by state is a disease of having too much data. Let's imagine that the proportions averaged across all states come to 0.1%. If we replaced those 50 numbers with one number printed across all states: "The Humane Society spends less than 0.1% of its budget on pet shelters.", the message would have been identical, while being less confusing.

And it's not just confusion. Cutting the data by state introduces complications. The analyst would need to make sure that any differences between states are not due to factors such as the number of pets, the proportion of households owning pets, the average spending per pet, the supply and demand for pet shelters, the existence of alternatives to pet shelters, etc. None of these issues need to worry the designer who does not slice the data down.

The same reason goes for why the absolute amount of spending (encoded in the colors of individual states) is not worth the ink it's printed on. The range between 0% and 0.35% has been chopped into seven pieces, which creates artificial gaps between the states. This design muddles the graphic's key message, "A Whole Lotta Nothing".

 
THE CHART ITSELF:

As we land on the final corner of the Trifecta, we ignore our previous complaint and accept that the proportion of budget is an interesting data series to visualize, and turn attention to the graphical elements. This chart scores poorly on chart execution as well!

Notice that the designer simultaneously plots two data series on the same map: the dollar value of pet shelter spending, and that spending as a proportion of budget. The former is encoded in the color of the state areas while the latter is printed directly as data labels. This is a map equivalent of "dual-axes" line charts, and equally unreadable.

Dcmetro_map_colors

Based on the color legend, our brain tells us the yellow states are better than the blue states but the huge numbers printed on the map convey the opposite message. The progression of colors makes little sense. The red and yellow stand out but those states are in the middle of the range.

It's a little blurry but I think there are a number of New England states in the high spending category (black and dark gray colors), and the map just happens to obscure this key feature.

 

SUMMARY:

PRACTICAL QUESTION: Fair

DATA: Very Poor

CHART: Poor


Light entertainment: Behold the 10 percent change!

Reader Orjan L. sent in this Swedish delight:

Swedish_1

It's on the last page of this report, and I'm told it's about the number of weapons seized by Swedish customs each year.

***

On p. 8, I found a hockey-stick chart:

Swedish_2

Sweden in ecstasy.

***

For those who love cross-over charts, look no further than p. 3 which has a reverse hockey stick.

Swedish_3


Hard work pays off

At the NY Tech Meetup, Andrei Scheinkman showed off some work his team at Huffington Post did relating to gun violence in America.

Huff_gunviolencemap

 

Interactive version is here. The animation shows day by day, where the victims of gun violence were located. The table below contains the details of each victim, and links to the news story covering the event.

***

What is not seen on the chart is even more impressive. Andrei described how they looked around for databases that would provide them the raw materials for creating this chart but no timely source exists. This means that a team of 15 (if I heard correctly) spent a month or so manually collecting all the data on a spreadsheet.

It's also the reason why they cannot continue the map indefinitely, as people have other things to do.

Andrei also contrasted this visualization with a text article that describes the state of gun violence in words. You guessed it, the visual presentation is hands-down more compelling.


Doing legwork, doing justice

The New York Times brought attention to the Bronx courtrooms this weekend. (link) The following small-multiples chart effectively illustrates how the Bronx system is uniquely unproductive, compared to the other boroughs:

Nyt_bronxcourts_0

The above chart shows the outcomes. The next chart shows the possible cause.

Nyt_bronx_courts

It appears that at any time of the day, at least one-third of the courtrooms are not actively conducting business. In fact, outside of the period between 10:30 and 12:30, and 2:30, fewer than half of the courtrooms have a judge present.

I want to draw your attention to the caption below the chart. It said: "The Times visited all 47 courtrooms at the Bronx County Hall of Justice in 30-minute intervals tallying how many were open and actively in session, ..."

Too often, we analyze and plot whatever data has been collected conveniently by some machine. Such data frequently do not address the questions we'd like to answer. We let the data dictate our research question.

Most great work in statistics comes from people who put in the effort to define their research goals first, and then manually collect the specific data needed to accomplish those goals.

 




Interpreting some charts about guns

Felix linked to a set of charts about guns in the U.S. (and elsewhere). The original charts, by Liz Fosslien, are found here.

I like the clean style used by Fosslien. Some of the charts are thought-provoking. Many of them may raise more questions than they answer. Here are a few that caught my eye.

Handguns_1

A simplistic interpretation would claim that banning handguns is futile, and may even have an adverse impact on murder rate. However, this chart does not reveal the direction of causality. Did some countries ban handguns because they are reacting to higher violence? If that is the case, this chart is confirming that the countries with handgun bans are a self-selected group.

***

Handguns_2

The U.S. is an outlier, both in terms of firearm ownership and firearm homicides. This makes the analysis much harder because the U.S. is really in a class of its own. It's not at all clear whether there is a positive correlation in the cluster below, and even if there is, extending a straight line from that cluster up to the U.S. dot is dubious.

***

Handguns_3

Fosslien is being cheeky to deny us the identity of the other outlier, the country with few firearms but even higher death rate from intentional homicide. These scatter plots are great by the way to show bivariate distributions.

***

Handguns_4

I'd still prefer a line chart for this type of data but this particular paired bar chart works for me as well. The contents of this chart are a shock to me.

***

Handguns_5

I just don't get this one. Why is there a fan?


Statistical adjustment in charts

On the book blog, I often talk about the reasons why statisticians adjust data, and why it is necessary in order to paint a proper picture of what the data is saying. (See here or here.)

On this blog, I have frequently complained about how the "prior information" on maps is too strong - large regions dominate our perception regardless of the data. In the U.S., large but sparsely populated states attain disproportionate attention.

So, why not bring "statistical adjustment" to maps?

***

That's exactly what cartograms do. For example, look at the following pair of maps created by the people at Leicestershire County Council. (PDF link here)

Lei_map12

The map on the left and the cartogram on the right plot identical data. The only difference is that each hexagon on the cartogram represents an equal number of people. The two views give very different impressions: the big dark green patch on the middle-right of the map -- representing a relatively sparse neighborhood -- is shrunk to a single dark green hexagon on the cartogram. Meanwhile, the most deprived areas (dark purple) which look relatively small on the map are expanded to quite a few hexagons.

According to the map, most of the county lives in areas ranked in the half considered less deprived (green), and that is good news. But wait... there is a lot of purple in the cartogram!

The real piece of news is that the majority of people live in the half of the neighborhoods considered more deprived (purple) but this uncomfortable fact is well-hidden in the mostly green map on the left.

Given that the measures of "deprivation" are about people, not geographical neighborhoods, the cartogram is much closer to the real world experience... notwithstanding the obvious geographical distortion introduced by the statistical adjustment.
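The adjustment amounts to changing the weights: a conventional map implicitly weights each neighborhood by its land area, while the cartogram weights it by its population. Here is a minimal sketch of that idea; the five neighbourhoods and their figures are invented for illustration and are not the Leicestershire data.

```python
# Hypothetical neighbourhoods; NOT the Leicestershire data.
# Each tuple: (name, area_sq_km, population, in_more_deprived_half)
neighbourhoods = [
    ("Rural A", 40.0, 1500, False),
    ("Rural B", 35.0, 1500, False),
    ("Town C", 2.0, 1500, True),
    ("Town D", 1.5, 1500, True),
    ("Town E", 1.5, 1500, True),
]


def deprived_share(rows, weight_index):
    """Share of the total weight that falls in the more deprived half."""
    total = sum(r[weight_index] for r in rows)
    deprived = sum(r[weight_index] for r in rows if r[3])
    return deprived / total


share_by_area = deprived_share(neighbourhoods, 1)    # impression given by the map
share_by_people = deprived_share(neighbourhoods, 2)  # impression given by the cartogram

print(f"Deprived share of land area:  {share_by_area:.1%}")
print(f"Deprived share of population: {share_by_people:.1%}")
```

In this toy example the more deprived neighbourhoods cover about 6% of the land but hold 60% of the people, the same kind of reversal seen between the map and the cartogram.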

According to Alex L., who is part of the team producing these graphics:

LSOAs were created for the 2001 [UK] Census to disseminate the data and are generally considered to represent 'neighbourhoods'. They are created to have a broadly consistent population (approx 1500 people in 2001) and socio-economic traits.

***

Question: Is there any reason to show the map at all?

 


The wall of blinking lights

Reader Alex L. submitted this chart showing the evolution of quality of life in Warwickshire in the U.K.

Warwick_walloflights

 This wall of lights is drawing way too much power. Let's make a list of fixes:

  • Stretch out the hemisphere, turning those arcs into horizontal lines
  • Allow readers to read horizontally, rather than centrifugally (?)
  • Align horizontally all of the labels for the quality of life indicators
  • Allow readers to read indicator labels in one direction, rather than inside-out on the right hemisphere and outside-in on the left hemisphere
  • Assume readers understand that the first year for which there is data is the "baseline year"
  • Remove the distance between one data point and another, which makes unnecessary the white gridlines
  • Use rectangles (rather than circles) as they can be packed more tightly
  • Order the indicators in a meaningful way

Eventually this chart reveals itself as a heatmap:

Redo_warwick1

The heatmap is much better. But the heatmap doesn't expose the trends clearly, especially the differences between indicators. The heatmap function (in R) has a built-in clustering method which automatically groups the indicators by similarity of trends. The color scheme should really be reversed since on this chart, red is good, and blue is bad; the default orientation of the column labels is also annoying. The bad indicators are clustered to the top, the good ones in the middle and the neutral ones at the bottom.

***

The next version uses the line chart, in a small-multiples setting. Now we have something to chew on.

Redo_warwick2

Although not done here, we can order this set of charts using the clustering results from the heatmap.
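As a rough illustration of ordering the small multiples by trend, the sketch below sorts indicators by the least-squares slope of each series. This is a much simpler stand-in, not the hierarchical clustering that R's heatmap function performs, and the indicator names and values are invented rather than taken from the Warwickshire data.

```python
# Hypothetical indicator series (index values by year); NOT the Warwickshire data.
indicators = {
    "Indicator A": [100, 104, 109, 115],
    "Indicator B": [100, 99, 101, 100],
    "Indicator C": [100, 95, 91, 84],
}


def slope(values):
    """Least-squares slope of values against time steps 0, 1, 2, ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


# Falling series first, flat ones in the middle, rising ones last.
ordered = sorted(indicators, key=lambda name: slope(indicators[name]))

for name in ordered:
    print(f"{name}: slope {slope(indicators[name]):+.2f} per year")
```

Sorting by slope only captures the overall direction of each series; a clustering method would also group indicators whose shapes are similar even when their net change is not.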

The lesson is that the pretty colors in the heatmap really tell us much less than the plain levels in a line chart.