On Friday, I'm attending and speaking at the Leaders in Software and Art Conference, organized by Isabel Draves. LISA is an amazing gathering of artists interested in technology and software. For example, there is a panel on 3D printing and hardware hacking, and one on "creative coding, art and advertising". Check out videos from past years, and click here to register. My talk is at around 3:30 in a tightly packed day of activities.
Andrew Sullivan highlighted a chart showing the public attitude toward climate change globally:
Andrew summarized the above chart thus: "Sadly, America is home to far more climate skeptics than the global average."
This conclusion may be correct but the chart is less convincing than it appears.
Let's pull out the Junk Charts Trifecta Checkup. Recall that there are three sides to the triangle. The question is well-posed, and the bar chart is an adequate choice for this data. We thank the designer for not printing the entire data set in the tight space, and for starting the vertical axis at zero.
There are a few improvements one can still make to the bar chart. Start with turning it around so that the reader doesn't have to turn his/her head around. Also, extending the axis to 100% helps the interpretation a little bit.
If you have keen eyes, you notice that Greece showed up at the top of the revamped chart. The bar for Korea is also a tad too short in the original chart; it should be at 85%.
To what extent is the set of countries "global"? Take a look:
It missed all of Scandinavia, most of Indochina, India, much of Africa, and all of Central America.
In the Trifecta checkup, we note that the data may not be complete for the posed question. Given this flaw, the map is perhaps a better choice to show us where the holes are.
The Giants QB Eli Manning is in the news for the wrong reason this season. His hometown paper, the New York Times, looked the other way, focusing on one metric that he still excels at, which is longevity. It's like Cal Ripken's streak in baseball. The graphic (link) though is fun to look at while managing to put Eli's streak in context. It is a great illustration of recognition of foreground/background issues. (I had to snip the bottom of the chart.)
After playing around with this graphic, please go read Kevin Quealy's behind-the-scenes description of the various looks that were discarded (link). He showed 19 sketches of the data. Sketching cannot be stressed enough. If you don't have discarded sketches, you don't have a great chart.
Pay attention to tradeoffs that are being made along the way. For example, one of the sketches showed the proportion of possible games started:
I like this chart quite a bit. The final selection arranges the data by team rather than by player so necessarily, the information about proportion of possible games started fell by the wayside.
(Disclosure: I'm on Team Philip. Good to see that he is right there with Eli even on this metric.)
New York/Tri-State residents: Meet me at NYU Bookstore tonight, 6-7:30 pm. (link)
When I wrote about the graphic showing the vote distribution around Syria in the Congress a few posts ago (link), readers offered opinions about what a better graphic might look like. Having considered these submissions, I came up with a new visualization.
This graphic is one that facilitates an assessment of the prospect of the Syria resolution passing, given the known and leaning votes. It addresses various scenarios of how the undecided votes would break out. It also considers the extreme -- and unlikely -- case in which all leaning yes votes are sustained, all leaning no votes reverse, and all undecided vote yes. In that scenario, the President would have 131% of the votes needed for passing the resolution.
In this graphic, the real story of the data is revealed: based on the then known and leaning votes, the President would face certain defeat. Even if all the undecided broke in his favor, he would still only get to 86% of the votes needed to pass.
The top bar, showing composition, is a concession to those who wanted to understand how each party is voting under each scenario. It's a minor concern here.
Comparison to the original chart, reproduced below, is almost unfair. What is the prospect of the resolution passing? It's impossible to tell.
My graphic exposes less data, hides all No and Leaning No votes, displays no vote totals, and focuses on a computed metric, the proportional progress towards the 271 vote goal.
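The computed metric is simple arithmetic: votes in favor under a scenario, divided by votes needed. Here is a minimal sketch of that calculation. The counts and the threshold below are invented for illustration and are not the actual tally behind the graphic; `progress_toward_goal` is a hypothetical helper name.

```python
def progress_toward_goal(votes_for: int, votes_needed: int) -> float:
    """Return votes_for as a percentage of the votes needed to pass."""
    if votes_needed <= 0:
        raise ValueError("votes_needed must be positive")
    return 100.0 * votes_for / votes_needed


# Hypothetical counts, NOT the real Syria tally.
yes_and_leaning_yes = 50
undecided = 120
votes_needed = 200

# Each scenario assumes a different share of the undecided votes breaking yes.
scenarios = {
    "no undecided vote yes": yes_and_leaning_yes,
    "half of undecided vote yes": yes_and_leaning_yes + undecided // 2,
    "all undecided vote yes": yes_and_leaning_yes + undecided,
}

for label, votes in scenarios.items():
    print(f"{label}: {progress_toward_goal(votes, votes_needed):.0f}% of goal")
```

Any scenario that stays under 100% is a defeat, which is the one comparison the reader needs to make.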
Reader Andrew C. was unhappy about the following stacked bar chart, published by Teach for America, touting its diversity. (link)
The lightning symbol that splits apart the Caucasian bar is a harbinger of trouble. For the designer deployed seven different scales -- in a bar chart with seven bars. The following chart reveals this idiosyncrasy. In each revised bar, the blue portion is made proportional to the length of the bar. The implied full length of the bottom bar literally ran off the page!
Even if we ignore the gray bits, the blue portions are still not proportional as the much-too-long 0.5% section shows.
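One way to check for mismatched scales is to ask how long a 100% bar would be if it were drawn at the same scale as each segment. The sketch below does that arithmetic. The pixel lengths are invented for illustration and are not measured from the Teach for America chart; `implied_full_length` is a hypothetical helper name.

```python
def implied_full_length(drawn_length: float, share_percent: float) -> float:
    """Length a 100% bar would have if drawn at this segment's scale."""
    if share_percent <= 0:
        raise ValueError("share_percent must be positive")
    return drawn_length * 100.0 / share_percent


# Hypothetical (drawn length in pixels, stated share in percent) pairs.
segments = {
    "large group": (310, 62),
    "tiny group": (20, 0.5),
}

for name, (pixels, share) in segments.items():
    print(f"{name}: implied full bar = {implied_full_length(pixels, share):.0f} px")
```

If every bar used one scale, the implied full lengths would all be equal. When a tiny share gets a visible sliver, its implied full bar becomes many times longer than the others.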
It would appear that the key piece of information is buried in the subtitle and does not feature in the original chart. In the revised version, I highlighted the 38% people of color, and then showed how that proportion splits by race.
Sometimes I wonder if I should just become a chart doctor. Andrew recently wrote that journals should have graphical editors. Businesses also need those, judging from this submission through Twitter (@francesdonald). Link is here.
You don't know whether to laugh or cry at this pie chart:
The author of the article complains that all the tall buildings around the world are cheats: vanity height is defined as the height above which the floors are unoccupied. The sample proportions aren't that different between countries, ranging from 13% to 19% (of the total heights). Why are they added together to make a whole?
The following boxplot illustrates both the average and the variation in vanity heights by region, and tells a more interesting story:
Recall that in a boxplot, the gray box contains the middle 50% of the data and the white line inside the box indicates the median value. UAE has a tendency to inflate the heights more while the other three regions are not much different.
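For readers who want to compute the pieces of the box themselves, here is a minimal sketch using Python's standard `statistics` module. The data values are made up and are not the vanity heights from the article; `box_summary` is a hypothetical helper name, and the "inclusive" quartile method is one of several conventions in use.

```python
import statistics


def box_summary(values):
    """Return (first quartile, median, third quartile) of the values.

    The box in a boxplot spans the first to the third quartile, which
    covers the middle 50% of the data; the line inside is the median.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, median, q3


# Hypothetical vanity heights, as a percent of total building height.
sample = [10, 12, 14, 16, 18]
q1, median, q3 = box_summary(sample)
print(f"box from {q1} to {q3}, median line at {median}")
```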
The other graphic included in the same article is only marginally better, despite a much more attractive exterior:
This chart misrepresents the actual heights of the buildings. At first glance, I thought there must be a physical limit to the number of occupied floors since the grayed out sections are equal heights. If the decision has been made to focus on the vanity height, then just don't show the rest of the buildings.
Also, it's okay to assume a minimal intelligence on the part of readers - I mean, is there a need to repeat the "non-occupiable height" label 10 times? Similarly, the use of 10 sets of double asterisks is rather extravagant.
It's a mystery to me how there are always people who ignore certain rudimentary rules of graphing data. I'm talking about such clear guidelines as:
Bar charts encode data in the heights of the bars -- therefore:
You should start each bar at height zero, and
You should not vary the width of the bars (unless you are introducing another dimension), and
You should space the bars unevenly if your measurement times are unevenly spaced.
I mean, how is it in the year 2013, the BBC shows viewers this? (tip from UK reader Clarke C.)
The chart is absurd on its face. Men did not double in height between 1871 and 1971. This chart was broadcast in the show "Breakfast" which apparently is the BBC UK version of Good Morning America.
I'd just use a line chart. The figurine construct is cute but too much trouble because you have to grow the width while growing the height. If you encode data in the area, then the height is no longer proportional to the real height.
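The tradeoff can be put in numbers. The sketch below assumes the figurine is scaled uniformly, so width grows by the same factor as height; the function names are hypothetical.

```python
import math


def height_scale_when_area_encodes(value_ratio: float) -> float:
    """If area is proportional to the data, a uniformly scaled figure
    grows in height by only the square root of the data ratio."""
    return math.sqrt(value_ratio)


def area_scale_when_height_encodes(value_ratio: float) -> float:
    """If height is proportional to the data and width grows with it,
    the area (the visual impression) grows by the square of the ratio."""
    return value_ratio ** 2


# A value that is 4 times larger, encoded in area, is only 2 times taller.
print(height_scale_when_area_encodes(4.0))
# A value that is 2 times larger, encoded in height, has 4 times the area.
print(area_scale_when_height_encodes(2.0))
```

Either way, one of the two readings (height or area) misstates the data, which is why a line chart is the safer choice.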
Years ago, we featured something similar: how penguins evolved into humans (link). Curiously, also a gift from British media.
On Twitter, Joe D. disliked the following chart on the Information is Beautiful blog:
The chart carries a long list of flaws.
The column labeled "%" is probably the most jarring. The meaning of these numbers changes with the color. When pink, they give the proportion of females; when blue, the proportion of males. As the stated purpose of the chart is to explore the male-female balance at different websites, it is a bad decision to fold two dimensions into one. While you're thinking about what I just said, what do you think the percentages in gray mean? Your guess is as good as mine.
Now, I appreciate that the designer used a margin of error (implicitly), and separated these three sites as representing "equality", even though only one of them has the exact 50/50 split.
Wait, for Orkut (second row), it's 51 percent female, and for Foursquare, it's 52 percent male. The gender is coded in the figurines. You can check that with your magnifying glass.
It gets better.
The list of websites is ordered by increasing polarity but only within the three sections. Logically, the three "equality" sites should sit between the "matriarchy" and the "patriarchy". Pinterest and Reddit, the two most polarized sites, should stand on the edges. On the diagram shown right, I simulated a reader who wants to scan through the list of websites from the most female-oriented (Pinterest) to the most male-oriented (Reddit). It's quite the obstacle course.
Let's get to Joe D.'s issue with the chart. How many people does each figurine represent? It's quite a mouthful. Each figurine represents one percent of the unique visitors at the specific website but only in excess of fifty-percent. In effect, the Facebook figurine represents a huge number of people compared to the figurine of a less popular website like Tagged. The designer did not explain the inclusion criteria for websites.
If you didn't get that definition, just ignore the figurines and think of this chart as a bar chart in which the bars start at 50 percent (rather than zero as they should). A standard population pyramid appears to do a better job - just add bars to the left of the diagram and properly align the male and female sections.
As I said before, read the fine print.
Here's the fine print:
If I am not mistaken, the designer applied the gender proportions to the traffic totals to obtain the rightmost column, labeled "million more monthly female or male visitors". The trouble is one number pertains to U.S. visitors while the other pertains to worldwide traffic. By multiplying them, the designer makes an assumption: that gender ratio is equivalent inside and outside the U.S., for every website.
Just to give you a sense of scale, according to this chart, Facebook has an excess of 155 million female visitors per month. According to Comscore, the key provider of such data, Facebook has about 145 million total U.S. visitors in June, 2013. It's not a small deal to mix up the geographies.
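Here is a minimal sketch of the calculation I believe the designer performed, as reconstructed above. The 60% share is a made-up number for illustration, not a figure from the chart or from Comscore, and `excess_visitors` is a hypothetical helper name.

```python
def excess_visitors(total_millions: float, majority_percent: float) -> float:
    """Excess of the majority gender over the minority, in millions.

    The majority has majority_percent of visitors and the minority has
    the rest, so the gap is (2 * majority_percent - 100) percent of total.
    """
    if not 50 <= majority_percent <= 100:
        raise ValueError("majority_percent must be between 50 and 100")
    return total_millions * (2 * majority_percent - 100) / 100


# Hypothetical 60% female share applied to about 145 million U.S. visitors.
print(excess_visitors(145, 60))
# Upper bound: even an all-female audience cannot exceed the total.
print(excess_visitors(145, 100))
```

The excess can never be larger than the total traffic it is computed from, so an excess of 155 million cannot come out of roughly 145 million U.S. visitors. It only arises when the U.S. gender ratio is applied to the larger worldwide total.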
This example illustrates what I call "use at your own peril". It's like the Surgeon General's warning in restaurants in the U.S.: we warn you that drinking alcohol while pregnant could lead to birth defects, but you are free to do whatever you want with this information.
As of this writing, the original chart has thousands of Facebook likes, hundreds of shares on LinkedIn and Pinterest, etc.
It appears that a lot of people are enjoying the chart more than Joe and I do.
Finally, here is a sketch of how I would plot this type of data. (U.S. traffic data from Comscore, various months of 2012, where I can find them. Comscore is a fee-based service so it is not easy to find data for the smaller sites unless you have a subscription.)
One piece of advice I give for those wanting to get into data visualization is to trash the defaults (see the last part of this interview with me). Jon Schwabish, an economist with the government, gives a detailed example of how this is done in a guest blog on the Why Axis.
Here are the highlights of his piece.
He starts with a basic chart, published by the Bureau of Labor Statistics. You can see the hallmarks of the Excel chart using the Excel defaults. The blue, red, green color scheme is most telling.
Just by making small changes, like using tints as opposed to different colors, using columns instead of bars, reordering the industry categories, and placing the legend text next to the columns, Schwabish made the chart more visually appealing and more effective.
The final version uses lines instead of columns, which will outrage some readers. It is usually true that a grouped bar chart should be replaced by overlaid line charts, and this should not be limited to so-called discrete data.
Schwabish included several bells and whistles. The three data points are not evenly spaced in time. The year-on-year difference is separately plotted as a bar chart on the same canvas. I'd consider using a line chart here as well... and lose the vertical axis since all the data are printed on the chart (or else, lose the data labels).
This version is considerably cleaner than the original.
I noticed that the first person to comment on the Why Axis post said that internal BLS readers resist more innovative charts, claiming "they don't understand it". This is always a consideration when departing from standard chart types.
Another reader likes the "alphabetical order" (so to speak) of the industries. He raises another key consideration: who is your audience? If the chart is only intended for specialist readers who expect to find certain things in certain places, then the designer's freedom is curtailed. If the chart is used as a data store, then the designer might as well recuse him/herself.