On Friday, I'm attending and speaking at the Leaders in Software and Art Conference, organized by Isabel Draves. LISA is an amazing gathering of artists interested in technology and software. For example, there is a panel on 3D printing and hardware hacking, and one on "creative coding, art and advertising". Check out videos from past years, and click here to register. My talk is at around 3:30 in a tightly packed day of activities.
Andrew Sullivan highlighted a chart showing the public attitude toward climate change globally:
Andrew summarized the above chart thus: "Sadly, America is home to far more climate skeptics than the global average."
This conclusion may be correct, but the chart is less convincing than it appears.
Let's pull out the Junk Charts Trifecta Checkup. Recall that there are three sides to the triangle. The question is well-posed, and the bar chart is an adequate choice for this data. We thank the designer for not printing the entire data set in the tight space, and for starting the vertical axis at zero.
There are a few improvements one can still make to the bar chart. Start by turning it on its side so that readers don't have to tilt their heads to read the labels. Extending the axis to 100% also helps the interpretation a little.
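Both fixes are a couple of lines in any plotting library. Here is a minimal matplotlib sketch; the country values are made up for illustration, since the actual survey shares are in the chart itself:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

# Hypothetical shares (percent); the real values are in the source chart.
countries = ["Greece", "Korea", "France", "Brazil", "U.S."]
shares = [90, 85, 80, 75, 60]

fig, ax = plt.subplots()
# Horizontal bars: country labels read naturally, no head-tilting.
# Reverse the lists so the largest value (Greece) lands at the top.
ax.barh(countries[::-1], shares[::-1])
ax.set_xlim(0, 100)  # extend the axis to 100% to anchor the proportions
ax.set_xlabel("Percent agreeing")
```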
If you have keen eyes, you'll notice that Greece shows up at the top of the revamped chart. The bar for Korea is also a tad too short in the original chart; it should reach 85%.
To what extent is the set of countries "global"? Take a look:
It misses all of Scandinavia, most of Indochina, India, much of Africa, and all of Central America.
In the Trifecta checkup, we note that the data may not be complete for the posed question. Given this flaw, the map is perhaps a better choice to show us where the holes are.
New York/Tri-State residents: Meet me at NYU Bookstore tonight, 6-7:30 pm. (link)
When I wrote about the graphic showing the vote distribution around Syria in the Congress a few posts ago (link), readers offered opinions about what's a better graphic might look like. Having considered these submissions, I came up with a new visualization.
This graphic is one that facilitates an assessment of the prospect of the Syria resolution passing, given the known and leaning votes. It addresses various scenarios of how the undecided votes would break out. It also considers the extreme -- and unlikely -- case in which all leaning yes votes are sustained, all leaning no votes reverse, and all undecided vote yes. In that scenario, the President would have 131% of the votes needed for passing the resolution.
In this graphic, the real story of the data is revealed: based on the then known and leaning votes, the President would face certain defeat. Even if all the undecided broke in his favor, he would still only get to 86% of the votes needed to pass.
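The percentages behind this kind of chart are just ratios of scenario vote totals to the 271 needed for passage. A quick sketch with made-up whip counts (the actual tallies are in the graphic, so these outputs will not match the 86% and 131% figures above):

```python
NEEDED = 271  # votes required for passage, per the graphic

# Hypothetical whip counts, for illustration only.
yes, leaning_yes, leaning_no, undecided = 30, 20, 100, 135

def progress(votes: int, needed: int = NEEDED) -> float:
    """Proportional progress toward the votes needed, as a percent."""
    return 100 * votes / needed

# Best realistic case: leaners hold and every undecided breaks yes.
best_case = progress(yes + leaning_yes + undecided)
# Extreme case: leaning-no votes also flip to yes.
extreme_case = progress(yes + leaning_yes + leaning_no + undecided)
print(f"best case: {best_case:.0f}%, extreme: {extreme_case:.0f}%")
```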
The top bar, showing composition, is a concession to those who wanted to understand how each party is voting under each scenario. It's a minor concern here.
Comparison to the original chart, reproduced below, is almost unfair. What is the prospect of the resolution passing? It's impossible to tell.
My graphic exposes less data, hides all No and Leaning No votes, displays no vote totals, and focuses on a computed metric, the proportional progress towards the 271 vote goal.
Kevin Drum shows the following graphic (link) to illustrate where the House stood on authorizing force in Syria.
What interests me is whether the semi-circle concept adds to the chart. It evokes the physical appearance of a chamber, presumably where such a debate has taken place -- although most televised hearings tend to exhibit lots of empty seats.
The half-filled circles in particular do not sit well with me.
Reader Steph G. didn't like the effort by WRAL (North Carolina) to visualize the demographics of protestors in Raleigh. It sounds like the citizens of NC are making their voices heard. Maybe my friends in Raleigh can give us some background.
There are definitely problems with the choice of charts. But I rate this effort a solid B. In the Trifecta Checkup, they did a good job describing the central question, and compiled an appropriate dataset. I love it when people go out to collect the right data rather than use whatever they could grab. The issue was the execution of the charts.
The first was a map showing where the arrested protestors came from.
Maps are typically used to show geographical distribution. The chosen color scheme (two levels of green and gray) compresses the data so much that we learn almost nothing about distribution. I clicked on Wake County to learn that there were 178 arrests there. The neighboring Randolph County had only 1 arrest but you can't tell from the colors.
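A graded scale would separate the counties that the two-green scheme lumps together. A minimal sketch, in which only Wake's 178 and Randolph's 1 come from the article and the other counts are hypothetical:

```python
# County arrest counts: Wake (178) and Randolph (1) are from the article;
# the other two are hypothetical placeholders.
counts = {"Wake": 178, "Randolph": 1, "Durham": 12, "Orange": 25}

def shade(n: int) -> str:
    """Map a count to a graded shade instead of a near-binary scheme."""
    if n == 0:
        return "gray"
    if n < 5:
        return "light green"
    if n < 50:
        return "medium green"
    return "dark green"

shades = {county: shade(n) for county, n in counts.items()}
```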
The next chart shows the trend of arrests over time. I like the general appearance (except for the shadows). The problem is the even spacing of the columns when the time gaps between arrest dates are uneven.
Here's a quick redo, with proper spacing:
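The trick is to plot against the actual dates rather than against category positions, so the horizontal gaps reflect the real gaps between events. A matplotlib sketch with hypothetical dates and counts (the real figures are in WRAL's chart):

```python
import datetime as dt
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Hypothetical arrest dates and counts, for illustration only.
dates = [dt.date(2013, 4, 29), dt.date(2013, 5, 6),
         dt.date(2013, 5, 13), dt.date(2013, 6, 3)]
arrests = [17, 30, 49, 151]

fig, ax = plt.subplots()
# Passing real dates makes matplotlib space the columns by elapsed time.
ax.bar(dates, arrests, width=3)  # width is in days on a date axis
ax.set_ylabel("Arrests")
fig.autofmt_xdate()
```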
The final set of charts is inspired. They compare the demographics of those arrested protestors against the average North Carolina resident. For example:
For categories like Age with quite a few levels, the pie chart isn't a good choice. It's also hard to compare across pie charts. A column or dot chart works better.
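A dot chart puts both groups on one shared scale, which is exactly what a pair of pies cannot do. A sketch with made-up age breakdowns (the actual demographics are in WRAL's charts):

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Hypothetical age distributions (percent), for illustration only.
ages = ["18-29", "30-44", "45-59", "60+"]
arrested = [15, 20, 30, 35]
nc_avg = [22, 26, 26, 26]

fig, ax = plt.subplots()
y = list(range(len(ages)))
# Two markers per row on a common axis make the comparison direct.
ax.plot(arrested, y, "o", label="Arrested protestors")
ax.plot(nc_avg, y, "s", label="NC average")
ax.set_yticks(y)
ax.set_yticklabels(ages)
ax.set_xlabel("Percent")
ax.legend()
```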
The New York Times brought attention to the Bronx courtrooms this weekend. (link) The following small-multiples chart effectively illustrates how the Bronx system is uniquely unproductive, compared to the other boroughs:
The above chart shows the outcomes. The next chart shows the possible cause.
It appears that at any time of the day, at least one-third of the courtrooms are not actively conducting business. In fact, outside of the period between 10:30 and 12:30, and around 2:30, fewer than half of the courtrooms have a judge present.
I want to draw your attention to the caption below the chart. It says: "The Times visited all 47 courtrooms at the Bronx County Hall of Justice in 30-minute intervals, tallying how many were open and actively in session, ..."
Too often, we analyze and plot whatever data has been collected conveniently by some machine. Such data frequently do not address the questions we'd like to answer. We let the data dictate our research question.
Most great work in statistics comes from people who put in the effort to define their research goals first, and then manually collect the specific data needed to accomplish those goals.
Felix linked to a set of charts about guns in the U.S. (and elsewhere). The original charts, by Liz Fosslien, are found here.
I like the clean style used by Fosslien. Some of the charts are thought-provoking. Many of them may raise more questions than they answer. Here are a few that caught my eye.
A simplistic interpretation would claim that banning handguns is futile, and may even have an adverse impact on the murder rate. However, this chart does not reveal the direction of causality. Did some countries ban handguns in reaction to higher violence? If so, the chart confirms that countries with handgun bans are a self-selected group.
The U.S. is an outlier, both in terms of firearm ownership and firearm homicides. This makes the analysis much harder because the U.S. is really in a class of its own. It's not at all clear whether there is a positive correlation in the cluster below, and even if there is, whether a straight line can be extended up to the U.S. dot is dubious.
Fosslien is being cheeky in denying us the identity of the other outlier, the country with few firearms but an even higher death rate from intentional homicide. By the way, scatter plots like these are great for showing bivariate distributions.
I'd still prefer a line chart for this type of data, but this particular paired bar chart works for me as well. The content of this chart is a shock to me.
Xan G. has a must-read post comparing different ways of showing the electoral map. See here.
The key learning is something I often point out on this blog: geographical data can have a greater impact when it is unshackled from the map.
Xan pointed to a series of ideas that are improvements upon the map.
Here's an attempt to portray the election night as a horse race. This borrows an idea from the sports world where a baseball game can be portrayed with such a chart.
I love this sort of presentation. Similar to a baseball game, someone can look at this chart after the fact and experience the ups and downs of an Obama/Romney supporter without actually being there.
Then Xan spoils some of the fun by transforming the above into the following chart, which portrays Obama's win as a rout. All the suspense is gone!
As Xan explains it, he took Nathan Silver's predictions of "sure wins" and plotted those first. Thus, Obama started the night at almost 200 while Romney started with about 170.
While indeed the fun is gone, this is a more accurate view of the just-concluded election. I was a spoilsport myself that night, telling my friends that the only reason Romney seemed close at the start was that the Red States generally have smaller populations, and thus took less time to count their votes. In addition, the Red States also tend to favor Republican candidates by very large margins, so the winner could be called early without counting most of the votes.
I have other thoughts on the state of reporting on polls, which I'll cover in a later post.
The November issue of Bloomberg Markets published the following pair of pyramid charts:
This chart fails a number of tests:
Tufte's data-ink ratio test
There are a total of six data points in the entire graphic. A mathematician would say only four, since the "no opinion" category is just the remainder. The designer lavishes this tiny data set with a variety of effects: colors, triangles, fonts of different tints and sizes, solid and striped backgrounds, and legends, making something simple much more complex than necessary. The extra stuff impedes rather than improves understanding. In fact, there were so many parts that the designer even forgot to add the little squares beside the category labels on the right panel.
Junk Charts's Self-sufficiency test
The data are encoded in the heights of the pyramids, not the areas. The shapes of the areas are inconsistent, which also makes the chart impossible to decipher. As it is set up, one must compare the green, striped triangle with two trapezoids. This is when a designer realizes that he/she must print the data labels onto the chart as well, and that's when self-sufficiency is violated. Cover up the data labels, and the graphical elements themselves no longer convey the data to the readers. More posts about self-sufficiency here.
Junk Charts's Trifecta checkup
The juxtaposition of two candidates' positions on two entirely different issues does not yield much insight. One is an economic issue; the other is military in nature. Is this a commentary on the general credibility of the candidates? On their credibility on specific issues? On investors' attitudes toward the issues? Once the pertinent question is clarified, the journalist needs to find the right data to address it. More posts about the Trifecta checkup here.
Minimum Reporting Requirements for polls
Any pollster who doesn't report the sample size and/or the margin of error is not to be taken seriously. In addition, we should want to know how the sample was selected. What is meant by "global investors"? Did the journalist randomly sample investors? Or did investors happen to fill out a survey that was served up somehow?
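The margin of error is not hard to produce, which makes its absence all the more telling. For a proportion from a simple random sample, the standard 95% formula is a one-liner (the 1,000-respondent figure below is a made-up example, not the magazine's sample size):

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """95% margin of error for a proportion p from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# e.g., a 50% response among a hypothetical 1,000 respondents
moe = round(100 * margin_of_error(0.5, 1000), 1)  # about 3.1 percentage points
```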
The following bar charts, while not innovative, speak louder.