I sketched out this blog post right before the Super Bowl, and was really worked up as I happened to be flying into Atlanta right after they won (well, according to any of our favorite "prediction engines," the Falcons had a 95%+ chance of winning it all a minute from the end of the 4th quarter!). What I'd give to be in the Super Bowl-winning city the day after the victory!
Maybe next year. I didn't feel like publishing about Super Bowl graphics when the wound was so raw. But now is the moment.
The following chart came from the Orange County Register in the run-up to the Super Bowl. (The bobble-head quarterbacks also came from the OCR.) The original article is here.
The choice of a set of dot plots is inspired. The dot plot is one of those under-utilized chart types - for comparing two or three objects along a series of metrics, it has to be one of the most effective charts.
To understand this type of design, readers have to collect three pieces of information: first, recognize the dot symbols, that is, which color or shape represents which object being compared; second, understand the direction of the axis; third, recognize that the distance between the paired dots encodes the amount of difference between the two objects.
The first task is easy enough here as red stands for Atlanta and blue for New England - those being the team colors.
The second task is deceptively simple. It appears that a ranking scale is used for all metrics with the top ("1st") shown on the left side and the bottom ("32nd") shown on the right. Thus, all 32 teams in the NFL are lined up left to right (i.e. best to worst).
Now, focus your attention on the "Interceptions Caught" metric, third row from the bottom. The designer indicated "Fewest" on the left and "Most" on the right. For those who don't know American football, an "interception caught" is a good defensive play; it means your defensive player grabs a ball thrown by the opposing team (usually their quarterback), causing a turnover. Therefore, the more interceptions caught, the better your defense is playing.
Glancing back at the chart, you learn that on the "Interceptions Caught" metric, the worst team is shown on the left while the best team is shown on the right. The same reversal happened with "Fumbles Lost" (fewest is best), "Penalties" (fewest is best), and "Points Allowed per Game" (fewest is best). For four of nine metrics, right is best while for the other five, left is best.
The third task is the most complicated. A ranking scale always has the weakness that a gap of one rank says nothing about how large the underlying difference is. It's a complicated decision to select what type of scale to use in a chart like this, and in this post, I shall ignore this issue, and focus on a visual makeover.
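The weakness is easy to demonstrate. In this toy sketch (teams and points-per-game figures invented for illustration, not the OCR data), a one-rank gap hides a 0.2-point difference in one place and a 12.9-point difference in another:

```python
# Ranks discard magnitude: a one-rank gap can hide any size of difference.
# Hypothetical points-per-game figures, invented for illustration.
points_per_game = {"A": 34.1, "B": 33.9, "C": 21.0}

ordered = sorted(points_per_game, key=points_per_game.get, reverse=True)
ranks = {team: i + 1 for i, team in enumerate(ordered)}

print(ranks)  # A and B differ by 0.2 points, B and C by 12.9,
              # yet both pairs sit exactly one rank apart.
```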
I find the nine arrays of 32 squares, essentially the grid system, much too insistent, elevating information that belongs to the background. So one of the first fixes is to soften the grid system, and the labeling of the axes.
In addition, given the meaningless nature of the rank number (as mentioned above), I removed those numbers and used team logos instead. The locations on the axes are sufficient to convey the relative ranks of the two teams against the field of 32.
Most importantly, the directions of all metrics are now oriented in such a way that moving left is always getting better.
While using logos for sports teams is natural, I ended up replacing those, as the size of the dots is such that the logos are illegible anyway.
The above makeover retains the original order of metrics. But to help readers address the key question of this chart (which team is better?), the designer should arrange the metrics in a more helpful way. For example, in the following version, the metrics are subdivided into three sections: those for which New England is significantly better, those for which Atlanta is much better, and the rest, for which the two teams are competitive with each other.
In the Trifecta checkup (link), I speak of the need to align your visual choices with the question you are trying to address with the chart. This is a nice case study of strengthening that Q-V alignment.
Attendees of my Copenhagen seminar this week saw an example of a Type QV chart (description of the Trifecta checkup here), where the biggest problem is a disconnect between the question being addressed and the visual form.
The visually arresting form makes the number 60 scream. It is a small puzzle to figure out what 60 stands for. The red color represents the 9th worst of the 10 corruption levels given in the scale. There were 60 countries placed into this level.
It's all very meaningless. The chart itself is proof that the countries were divided into uneven - apparently arbitrarily sized - segments. We learn nothing about how this "corruption perceptions index" is constructed, and which or how many countries were rated in total.
And even if all those issues could be resolved, knowing the histogram of countries ranked by perceived corruption does not tell us anything about corruption in those countries - it only informs about a minor aspect of the ranking scheme.
Notice also that the country labels provided in the left column cover just 7 countries, those in the best and worst levels, thus missing all sixty of the countries that caught our attention.
The last baffling decision is to create an eleventh phantom level dressed in black.
To reconstruct this, one first has to decide on a worthwhile question to illustrate.
Chris P. tipped me about this wonderful webpage containing an analysis of high-grossing movies. The direct link is here.
First, a Trifecta checkup: This thoughtful web project integrates beautifully rendered, clearly articulated graphics with the commendable objective of bringing data to the conversation about gender and race issues in Hollywood, an ambitious goal that it falls short of achieving because the data only marginally address the question at hand.
There is some intriguing just-beneath-the-surface interplay between the Q (question) and D (data) corners of the Trifecta, which I will get to in the lower half of this post. But first, let me talk about the Visual aspect of the project, which for the most part, I thought, was well executed.
The leading chart is simple and clear, setting the tone for the piece:
I like the use of color here. The colored chart titles are inspired. I also like the double color coding - notice that the proportion data are coded not just in the lengths of the bar segments but also in the opacity. There is some messiness in the right-hand-side labeling of the first chart, but that's probably just a bug.
This next chart also contains a minor delight: upon scrolling to the following dot plot, the reader finds that one of the dots has been labeled; this is a signal to readers that they can click on the dots to reveal the "tooltips". It's a little thing but it makes a world of difference.
I also enjoy the following re-imagination of those proportional bar charts from above:
This form fits well with the underlying data structure (a good example of setting the V and the D in harmony). The chart shows the proportion of words spoken by male versus female actors over the course of a single movie (Tin Men from 1987 is the example shown here). The chart is centered in an unusual way, making it easy to see exactly when the females are allowed to have their say.
There is again a possible labeling hiccup. The middle label says 40th minute, which would imply the entire movie is only 80 minutes long. (A quick check shows Tin Men is 110 minutes long.) It seems that they are only concerned with dialog, ignoring all moments of soundtrack or silence. The visualization would be even more interesting if those non-dialog moments were presented.
The reason why the music and silence are missing has more to do with practicality than intent. The raw materials (Data) used are movie scripts. The authors, much to their merit, acknowledge many of the problems that come with this data, starting with the fact that directors make edits to the scripts. It is also not clear how to locate each line along the duration of the movie. An assumption about the speed of dialog seems to be required.
I have now moved to the Q corner of the Trifecta checkup. The article is motivated by the #OscarSoWhite controversy from a year or two ago, although by the second paragraph, the race angle has already been dropped in favor of gender; by the end of the project, readers will also have learned about ageism, but the issue of race never returns. Race didn't come back because race is not easily discerned from a movie script, nor is it clearly labeled in a resource such as IMDB. So, the designers provided a better solution to a lesser problem, instead of a lesser solution to a better problem.
In the last part of the project, the authors tackle ageism. Here we find another pretty picture:
At the high level, the histograms tell us that movie producers prefer younger actresses (in their 20s) and middle-aged actors (forties and fifties). It is certainly not my experience that movies have a surplus of older male characters. But one must be very careful interpreting this analysis.
The importance of actors and actresses is being measured by the number of words in the scripts while the ages being analyzed are the real ages of the actors and actresses, not the ages of the characters they are playing.
Tom Cruise is still making action movies, and he's playing characters much younger than he is. A more direct question to ask here is: does Hollywood prefer to put younger rather than older characters on screen?
Since the raw data are movie scripts, the authors took the character names, and translated those to real actors and actresses via IMDB, and then obtained their ages as listed on IMDB. This is the standard "scrape-and-merge" method executed by newsrooms everywhere in the name of data journalism. It often creates data that are only marginally relevant to the problem.
This ABC News chart seemed to have taken over the top of my Twitter feed, so I'd better comment on it.
Someone at ABC News tried really hard to dress up the numbers. The viz is obviously rigged: Obama's bar at 79% should be about double the length of Trump's at 40%, but it's not even close!
In the Numbersense book (Chapter 1), I played the role of the Devious Admissions Officer who wants to game the college rankings. Let me now play the role of the young-gun dataviz analyst, who has submitted the following chart to the higher-ups:
I just found out the boss blew a fuse after seeing my chart. The co-workers gave me dirty looks, saying without saying, "you broke it, you fix it!"
How do I clean up this mess?
Let me try the eye-shift trick.
The solid colors draw attention to themselves, and longer bars usually indicate higher or better, so the quick reader may think that Obama is the worst and Trump is the best at ... well, "Favorability on taking office," as the added title suggests.
Next, let's apply the foot-chop technique. This fits nicely on a stacked bar chart.
I wantonly drop 20% of dissenters from every President's data. Such grade inflation actually makes everyone look better, a win-win-win-win-win-win-win proposition. While the unfavorables for Trump no longer look so menacing, I am still far from happy as, with so much red concentrated at the bottom of the chart, eyes are focused on the unsightly "yuge" red bar, and it is showing Trump with 50% disapproval.
I desperately need the white section of the last bar to trump its red section. It requires the foot-ankle-knee-thigh treatment - the whole leg.
Now, a design issue rears its head. With such an aggressive cut, there would be no red left in any of the other bars.
I could apply two cuts, a less aggressive cut at the top and a more aggressive cut at the bottom.
The Presidents neatly break up into two groups, the top three Democrats, and the bottom four Republicans. It's always convenient to have an excuse for treating some data differently from others.
Then, I notice that the difference between Clinton and GW Bush is immaterial (68% versus 65%), making it awkward to apply different cuts to the two neighbors. No problem, I make three cuts.
The chart is getting better and better! Two, three, why not make it five cuts? I am intent on making the last red section as tiny as possible but I can't chop more off the right side of GHW Bush or Reagan without giving away my secret sauce.
The final step is to stretch each bar to the right length. Mission accomplished.
This chart will surely win me some admiration. Just one lingering issue: Trump's red section is still the longest of the group. It's time for the logo trick. You see, the right ends of the last two bars can be naturally shortened.
The logo did it.
Faking charts can take as much effort as making accurate ones.
The ABC News chart encompasses five different scales. For every President, some percentage of dissenters was removed from the chart. The amount of distortion ranges from 15% to 47% of respondents.
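The distortion can be expressed as a simple rescaling. A minimal sketch of the chop-then-stretch trick; the helper function and the assumption that favorable plus unfavorable sums to 100 are mine, and only the 79%, 40%, and the 15%-to-47% chop range come from the discussion above:

```python
def apparent_share(favorable, chopped):
    """Favorable share after chopping `chopped` points of dissenters off
    the bar and stretching what remains back to full length.
    Simplifying assumption: favorable + unfavorable = 100."""
    return favorable / (100 - chopped)

# With no chop, Obama's bar (79%) is nearly double Trump's (40%):
print(79 / 40)                 # 1.975
# Chop 15 points from Obama's bar and 47 from Trump's, then stretch:
print(apparent_share(79, 15))  # ~0.93
print(apparent_share(40, 47))  # ~0.75, and the gap no longer looks like 2x
```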
It's a layered donut. There isn't much context here except that the chart comes from USDA. Judging from the design, I surmise that the key message is the change in proportion by food groups between 1970 and 2014. I am assuming that these food groups are exhaustive so that it makes sense to put them in a donut chart, with all pieces adding up to 100%.
The following small-multiples line chart conveys most of the information:
The story is the big jump in "Added fats and oils". In the layered donut, the designer highlighted this with a moiré effect, something to be avoided.
Note the parenthetical 2010 next to the Added fats and oils label. The data for all other food groups come from 2014 but the number for the most important category is four years older. The chart would be more compelling if they used 2010 data for everything.
One piece of information is ostensibly absent in the line chart version: the growth in the size of the pie. The total of the data increased about 20% from 1970 to 2014. In theory, the layered donut can convey this growth by the perimeters of the circles. But it doesn't appear that the designer saw this as an important insight, since the area of the outer donut is clearly more than 20% larger than the area of the inner donut.
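If the rings were meant to encode the totals by area, a 20% growth would call for only a modest increase in radius, because area grows with the square of the radius. A quick check (pure geometry, not the USDA's numbers):

```python
import math

growth = 1.20  # total consumption grew about 20% from 1970 to 2014

# For the outer ring's area to be 20% larger, its radius need only be
# sqrt(1.2) times the inner radius, i.e. about 9.5% larger.
radius_ratio = math.sqrt(growth)
print(radius_ratio)  # ~1.095
```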
In February, I am bringing my dataviz lecture to various cities: Atlanta (Feb 7), Austin (Feb 15), and Copenhagen (Feb 28). Click on the links for free registration.
I hope to meet some of you there.
On the sister blog about predictive models and Big Data, I have been discussing aspects of a dataset containing IMDB movie data. Here are previous posts (1, 2, 3).
The latest installment contains the following chart:
The general idea is that the average rating of the average film on IMDB has declined from about 7.5 to 6.5... but this does not mean that IMDB users like oldies more than recent movies. The problem is a bias in the IMDB user base. Since IMDB's website launched only in 1990, users are much more likely to be reviewing movies released after 1990 than before. Further, when users do review oldies, they tend to review the oldies they like and return to, rather than the horrible movie they watched 15 years ago.
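This selection effect is easy to simulate. A toy sketch (all numbers invented): suppose oldies only get rated when fans revisit favorites; then the rated oldies look better than the full population of old movies:

```python
import random

random.seed(1)

# Toy model: every movie has a "true" quality score.
true_quality = [random.gauss(6.5, 1.0) for _ in range(10_000)]

# Oldies only get rated when fans revisit favorites, i.e. high scorers.
rated_oldies = [q for q in true_quality if q > 7.0]

avg_all = sum(true_quality) / len(true_quality)
avg_rated = sum(rated_oldies) / len(rated_oldies)
print(avg_all, avg_rated)  # the rated oldies average well above the population
```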
Modelers should be exploring and investigating their datasets before building their models. The same goes for anyone doing data visualization! You need to understand the origin of the data and its biases in order to tell the proper story.
This WSJ graphic caught my eye. The accompanying article is here.
The article (judging from the sub-header) makes two separate points: one about the total amount of money raised in IPOs in a year, and one about the change in market value of those newly public companies one year after the IPO date.
The first metric is shown by the size of the bubbles while the second metric is displayed as distances from the horizontal axis. (The second metric is further embedded, in a simplified, binary manner, in the colors of the bubbles.)
The designer has decided that the second metric, performance after the IPO, is more important. Therefore, it is much easier for readers to see how each annual cohort of IPOs has performed. The use of color to encode the second metric (and not the first) also helps to emphasize it.
There are details on this chart that I admire. The general tidiness of it. The restraint on the gridlines, especially along the horizontal ones. The spatial balance. The annotation.
And ah, turning those bubbles into lollipops. Yummy! Those dotted lines allow readers to find the center of each bubble, which is where the value of the second metric lies. Frequently, these bubble charts are presented without such guiding lines, and it is often hard to find the circles' anchors.
That leaves one inexplicable decision - why did they place two vertical gridlines in the middle of two arbitrary years?
The question is: which parameter is used to encode the figures, line length or angle?
The answer is line length. But the eye is likely to use the angle as the measure, and this is where an error may arise. It's almost an optical illusion: the smaller numbers of students lie on the circumferences of smaller circles, and a given length goes further around a smaller circle. Thus, for example, Turkey attracts about one fifth of the students attracted by Germany, but it looks nearer to half (45 degrees vs 90 degrees).
(The case of Spain is really bizarre - it looks like it's gone round the circle by over 280 degrees but actually what they've done is to break off the line at 90 degrees and stick the bit they broke off back on the diagram at the left.)
I have never seen this type of 'bar chart' before but it is really misleading.
Long-time readers may remember my discussion of the "race-track graph." (here) The "optical illusion" Lawrence mentions above is well known to any track runner. The inside lanes are shorter than outside lanes, so you stagger the starting positions.
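The geometry behind the illusion is just arc length versus angle: on a circle, the angle swept equals arc length divided by radius, so the same length sweeps a larger angle on an inner track. A small sketch, with radii invented to reproduce the 90-degree vs 45-degree impression described above:

```python
import math

def angle_deg(arc_length, radius):
    # On a circle, angle (in radians) = arc length / radius.
    return math.degrees(arc_length / radius)

germany_len, germany_radius = 2 * math.pi, 4.0    # outer track
turkey_len, turkey_radius = germany_len / 5, 1.6  # one fifth the value, inner track

print(angle_deg(germany_len, germany_radius))  # 90.0 degrees
print(angle_deg(turkey_len, turkey_radius))    # 45.0 degrees, despite the
# value being only a fifth of Germany's
```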