Giants QB Eli Manning is in the news for the wrong reasons this season. His hometown paper, the New York Times, looked the other way, focusing on one metric at which he still excels: longevity. Think of him as football's Cal Ripken. The graphic (link) is fun to look at, though, and it manages to put Eli's streak in context. It is a great illustration of how to handle foreground/background issues. (I had to snip the bottom of the chart.)
After playing around with this graphic, please go read Kevin Quealy's behind-the-scenes description of the various looks that were discarded (link). He showed 19 sketches of the data. The importance of sketching cannot be stressed enough. If you don't have discarded sketches, you don't have a great chart.
Pay attention to tradeoffs that are being made along the way. For example, one of the sketches showed the proportion of possible games started:
I like this chart quite a bit. The final selection arranges the data by team rather than by player, so the information about the proportion of possible games started necessarily fell by the wayside.
(Disclosure: I'm on Team Philip. Good to see that he is right there with Eli even on this metric.)
Business Insider links to this blog with a chart depicting the top beer brands by state.
I like the quilt-like appearance brought on by using the packaging of different brands. The nine glowing yellow islands sitting in the Atlantic Ocean I find annoying. This happens a lot because those New England states are smaller in area than most.
The design problem evaporates if you choose a small multiples approach. As shown below, there is the added benefit that the regional pattern of brand preference is clearly visible whereas in the original chart, it is rather hard to figure out.
I won't comment on the data source here. It's highly suspect.
Kevin Drum shows the following graphic (link) to illustrate where the House stood on authorizing force in Syria.
What interests me is whether the semi-circle concept adds to the chart. It evokes the physical appearance of a chamber, presumably where such a debate has taken place -- although most televised hearings tend to exhibit lots of empty seats.
The half-filled circles in particular do not sit well with me.
On Twitter, Joe D. disliked the following chart on the Information is Beautiful blog:
The chart carries a long list of flaws.
The column labeled "%" is probably the most jarring. The meaning of these numbers changes with the color. When pink, they give the proportion of females; when blue, the proportion of males. As the stated purpose of the chart is to explore the male-female balance at different websites, it is a bad decision to fold two dimensions into one. While you're thinking about what I just said, what do you think the percentages in gray mean? Your guess is as good as mine.
Now, I appreciate that the designer uses a margin of error (implicitly), and separated these three sites as representing "equality", even though only one of them has the exact 50/50 split.
Wait, for Orkut (second row), it's 51 percent female, and for Foursquare, it's 52 percent male. The gender is coded in the figurines. You can check that with your magnifying glass.
It gets better.
The list of websites is ordered by increasing polarity but only within the three sections. Logically, the three "equality" sites should sit between the "matriarchy" and the "patriarchy". Pinterest and Reddit, the two most polarized sites, should stand on the edges. On the diagram shown right, I simulated a reader who wants to scan through the list of websites from the most female-oriented (Pinterest) to the most male-oriented (Reddit). It's quite the obstacle course.
Let's get to Joe D.'s issue with the chart. How many people does each figurine represent? It's quite a mouthful. Each figurine represents one percent of the unique visitors at the specific website, but only those in excess of fifty percent. In effect, the Facebook figurine represents a huge number of people compared to the figurine of a less popular website like Tagged. The designer did not explain the inclusion criteria for websites.
If you didn't get that definition, just ignore the figurines and think of this chart as a bar chart in which the bars start at 50 percent (rather than zero as it should). A standard population pyramid appears to do a better job - just add bars to the left of the diagram and properly align the male and female sections.
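To make the figurine definition concrete, here is a minimal sketch in Python. It uses only the shares quoted in this post (Orkut at 51 percent female, Foursquare at 52 percent male, and the one "equality" site with an exact 50/50 split). The function names and the rounding to whole figurines are my own assumptions, not anything the designer documented.

```python
def figurine_count(majority_share_pct):
    """Figurines drawn for a site: one per percentage point above 50."""
    if not 50 <= majority_share_pct <= 100:
        raise ValueError("majority share must be between 50 and 100")
    return round(majority_share_pct - 50)


def signed_polarity(female_share_pct):
    """Negative leans female, positive leans male, zero is an even split."""
    return 50 - female_share_pct


# Female share of visitors, in percent. Orkut (51% female) and Foursquare
# (52% male) are quoted in the post; the third "equality" site is the one
# with the exact 50/50 split, and its name is not given here.
female_share = {
    "Foursquare": 48,
    "third equality site": 50,
    "Orkut": 51,
}

# One continuous ordering, from most female-leaning to most male-leaning.
ordered = sorted(female_share, key=lambda site: signed_polarity(female_share[site]))

for site in ordered:
    share = female_share[site]
    majority = max(share, 100 - share)
    print(site, majority, figurine_count(majority))
```

Foursquare ends up with twice as many figurines as Orkut even though their majority shares differ by a single point. That is the distortion you get from bars that start at 50 percent. Sorting on one signed scale also removes the obstacle course: the even-split sites land in the middle and the most polarized sites at the two ends.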
As I said before, read the fine print.
Here's the fine print:
If I am not mistaken, the designer applied the gender proportions to the traffic totals to obtain the rightmost column, labeled "million more monthly female or male visitors". The trouble is one number pertains to U.S. visitors while the other pertains to worldwide traffic. By multiplying them, the designer makes an assumption: that gender ratio is equivalent inside and outside the U.S., for every website.
Just to give you a sense of scale, according to this chart, Facebook has an excess of 155 million female visitors per month. According to Comscore, the key provider of such data, Facebook has about 145 million total U.S. visitors in June, 2013. It's not a small deal to mix up the geographies.
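The mismatch can be checked with simple arithmetic. The sketch below assumes the rightmost column was computed as the total traffic times the difference between the majority and minority shares (that is my reading of the chart, not something the designer confirmed). The function name is hypothetical; the two numbers are the ones quoted above.

```python
def excess_visitors(total_millions, majority_share):
    """Majority visitors minus minority visitors, in millions.

    majority_share is a fraction between 0.5 and 1.0, so the excess is
    total * share - total * (1 - share) = total * (2 * share - 1).
    """
    if not 0.5 <= majority_share <= 1.0:
        raise ValueError("majority share must be between 0.5 and 1.0")
    return total_millions * (2 * majority_share - 1)


US_TOTAL_MILLIONS = 145      # Comscore: total U.S. Facebook visitors, June 2013
CHART_EXCESS_MILLIONS = 155  # excess female visitors according to the chart

# Even if every single U.S. visitor were female, the excess could not
# be larger than the U.S. total itself.
max_possible_us_excess = excess_visitors(US_TOTAL_MILLIONS, 1.0)
chart_exceeds_us_ceiling = CHART_EXCESS_MILLIONS > max_possible_us_excess

print(max_possible_us_excess, chart_exceeds_us_ceiling)
```

The chart's excess is larger than the entire U.S. audience, so it can only have come from applying a U.S. gender ratio to a worldwide total.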
This example illustrates what I call "use at your own peril". It's like the Surgeon General's warning in restaurants in the U.S.: we warn you that drinking alcohol while pregnant could lead to birth defects, but you are free to do whatever you want with this information.
As of this writing, the original chart has thousands of Facebook likes, hundreds of shares on Linkedin and Pinterest, etc.
It appears that a lot of people are enjoying the chart more than Joe and I do.
Finally, here is a sketch of how I would plot this type of data. (U.S. traffic data from Comscore, various months of 2012, where I can find them. Comscore is a fee-based service so it is not easy to find data for the smaller sites unless you have a subscription.)
One piece of advice I give for those wanting to get into data visualization is to trash the defaults (see the last part of this interview with me). Jon Schwabish, an economist with the government, gives a detailed example of how this is done in a guest blog on the Why Axis.
Here are the highlights of his piece.
He starts with a basic chart, published by the Bureau of Labor Statistics. You can see the hallmarks of the Excel chart using the Excel defaults. The blue, red, green color scheme is most telling.
Just by making small changes, like using tints as opposed to different colors, using columns instead of bars, reordering the industry categories, and placing the legend text next to the columns, Schwabish made the chart more visually appealing and more effective.
The final version uses lines instead of columns, which will outrage some readers. It is usually true that a grouped bar chart should be replaced by overlaid line charts, and this should not be limited to so-called discrete data.
Schwabish included several bells and whistles. The three data points are not evenly spaced in time. The year-on-year difference is separately plotted as a bar chart on the same canvas. I'd consider using a line chart here as well... and lose the vertical axis since all the data are printed on the chart (or else, lose the data labels).
This version is considerably cleaner than the original.
I noticed that the first person to comment on the Why Axis post said that internal BLS readers resist more innovative charts, claiming "they don't understand it". This is always a consideration when departing from standard chart types.
Another reader likes the "alphabetical order" (so to speak) of the industries. He raises another key consideration: who is your audience? If the chart is only intended for specialist readers who expect to find certain things in certain places, then the designer's freedom is curtailed. If the chart is used as a data store, then the designer might as well recuse him/herself.
Notice that I have indexed every metric against the league average. This is shown in the first panel. I use a red dot to warn readers that the direction of this metric is opposite to the others (left of center is a good thing!).
You can immediately make a bunch of observations:
Alex Smith was quite poor, except for interceptions.
Colin Kaepernick had passing statistics similar to Smith's. His only advantage over Smith was the rushing.
Joe Flacco, as we noted before, is as average as they come (except for rushing yards).
Tyrod Taylor is here to remind us that we have to be careful about backup players being included in the same analysis.
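For readers who want to reproduce the indexing against the league average, here is a minimal sketch in Python. All the numbers are made up for illustration; they are not real player or league statistics. I also assume an index base of 100 and that interceptions is the metric carrying the red dot.

```python
def index_against_average(value, league_average):
    """Express a metric as a percentage of the league average (100 = average)."""
    return 100 * value / league_average


# Metrics where a smaller number is the better outcome (the "red dot" case).
# Treating interceptions this way is an assumption for this sketch.
LOWER_IS_BETTER = {"interceptions"}


def is_better_than_average(metric, index):
    """Interpret an index, flipping the direction for reversed metrics."""
    if metric in LOWER_IS_BETTER:
        return index < 100
    return index > 100


# Hypothetical values, for illustration only.
league_average = {"passing_yards": 3600, "interceptions": 12}
player = {"passing_yards": 3240, "interceptions": 15}

indexed = {
    metric: index_against_average(player[metric], league_average[metric])
    for metric in player
}

for metric, index in indexed.items():
    print(metric, index, is_better_than_average(metric, index))
```

Indexing puts every metric on the same scale, so yards and interceptions can share one axis. The reversed metric still needs an explicit flag, which is the job the red dot does on the chart.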
The second version is a heatmap.
This takes inspiration from the fact that any serious reader of the spider chart will be reading the eight spokes (dimensions) separately. Why not plot these neatly in columns and use color to help us find the best and worst?
Imagine this to be a large table with as many rows as there are quarterbacks. You will be able to locate the red (hot) zones quickly. You can also scan across a row to understand that player's performance relative to the average, on every metric.
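The idea of one row per quarterback and one column per metric can be mocked up in plain text before picking any colors. This is only a sketch: the players, the indexed values, and the ten-point band around the average are all hypothetical, and symbols stand in for the color shades.

```python
def shade(index, band=10):
    """Bucket an index (100 = league average) into a heatmap shade.

    '+' is well above average, '-' is well below, '=' is near average.
    The width of the band is an arbitrary choice for this sketch.
    """
    if index >= 100 + band:
        return "+"
    if index <= 100 - band:
        return "-"
    return "="


METRICS = ["passing_yards", "touchdowns", "rushing_yards"]

# Hypothetical indexed values, for illustration only.
table = {
    "QB A": {"passing_yards": 125, "touchdowns": 90, "rushing_yards": 100},
    "QB B": {"passing_yards": 95, "touchdowns": 112, "rushing_yards": 60},
}

# One text row per quarterback, one shaded cell per metric.
rows = [
    name + ": " + " ".join(shade(values[metric]) for metric in METRICS)
    for name, values in table.items()
]

for row in rows:
    print(row)
```

Adding a quarterback adds one row and nothing else changes, which is why this layout scales where the spider chart does not.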
I like this visualization best, primarily because it scales beautifully.
The final version is a profile chart, sometimes called a parallel coordinates plot. While I am an advocate of profile charts, they really only work when you have a small number of things to compare.