A reader sends us to this chart published in the Telegraph (UK paper):
Which countries have ratified the EFSF changes?
Yes, indeed. Which countries?
I contributed the following post to the Statistics Forum. They are having a discussion comparing information visualization and statistical graphics. I use the following matrix to classify charts in terms of how much work they make readers do, and how much value readers get out of doing said work.
To read the rest of it, click here.
Like Australia-based reader Ken B., I don't understand why many chart designers insist on using charts to deliver lessons to the public on map geography. Here is a recent example from Down Under, on earthquakes: (click on this link for the interactive version)
Was there a quake that shook the middle of the Pacific? Did a new geological formation give New Zealand a Pinocchio nose? No and no. The ugly presentation of the 2010 and 2011 Christchurch earthquakes -- as two ends of a dumbbell -- makes clear the straitjacket that maps are when it comes to delivering quantitative information.
Besides, the bubbles represent the relative magnitude of the quakes when one would hope that their sizes represent the geographical extent of the damage; at least, that would be information that has a spatial dimension.
The location of the quake is the only data with a spatial dimension surfaced on this plot. The only purpose of the map background is to tell us where Christchurch, Sichuan, etc. are on a map. In order to deliver this map lesson, the designer has to hide all of the more interesting data, like the relative magnitudes, the time-lines, the extent of the damage, the mortality rates, etc. In my mind, that is a very poor tradeoff.
Here is a close-up of California:
Anytime someone expands the possibilities of a chart type, like the word cloud, it's a commendable project. So I'm quite enthusiastic about what they tried to do here. Not every new feature is successful, though.
These are the things I like:
These are things I don't like:
So, I think they did a reasonable job in rethinking the possibilities of word clouds. It's well intentioned and there is room for improvement.
Lastly, they might get some ideas from the Baby Names navigator.
The work of Hans Rosling and Gapminder (now part of Google) highlighted moving images as part of the graphics toolbox. Let me call these "graphlicks", graph-movies. It is clear that lots of people love graphlicks.
There is one open problem in graphlicks that needs creative solutions: how to incorporate memory into the experience?
If a movie is required to show patterns in the data, its purpose would be to highlight a temporal pattern - the changes over time are interesting. As the movie goes from Day/Month/Year 1 to Day/Month/Year X, the old stuff is usually taken off the canvas to make way for the newer stuff. In effect, we rely on the reader's memory of earlier scenes for comparison with the current scene in the movie.
What gets me thinking about this is a graphlick created by my friend Adam, whose startup Empirasign compiles and markets data on mortgage prices and other financial data:
Youtube link here.
The data relates to 30-year mortgages originated in 2010. The coupon rate, shown on the horizontal axis, ranges from below 4% to 8%; the coupon determines the cash flows an investor gets. Each line chart shows how the "market" was valuing the 30-year mortgages of different coupon rates on a particular day. The price is an index, equalling 100 at issue.
The general shape of the line indicates that the market valued the higher-coupon ones more than the lower-coupon ones (except for the right tip of the line). Since interest rates have been coming down, the mortgages issued at 4% coupon were newer ones than those issued at 7-8%, which means they had higher "duration risk" for investors, thus lower value. The dip beyond 7% may be due to a countervailing "prepayment risk": if the borrower prepays, the investor would be forced to take 100 for something for which they may have paid over 100.
As you play the graphlick, two features of the data ought to stand out: the general upward shift of the line, which indicates that the market was increasing the valuation of these mortgages over the year (regardless of coupon); and the stronger volatility on the left side of the line.
Noticing either feature requires the reader to remember the trajectory of the lines. What are some ways to help the reader?
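One possible answer is to leave faded "ghost" copies of the most recent lines on the canvas, so that the trajectory stays visible instead of living only in the reader's memory. Here is a minimal sketch of the bookkeeping in Python. This is my own assumption about how one might do it, not how Adam's graphlick works; the function name ghost_trail and its parameters are hypothetical.

```python
def ghost_trail(current, trail_length=5, min_alpha=0.1):
    """Return (frame_index, opacity) pairs to draw when showing frame `current`.

    The current frame is fully opaque (1.0). Up to `trail_length` earlier
    frames are kept on the canvas, fading linearly toward `min_alpha`
    the older they are. Frame indices start at 0.
    """
    if trail_length <= 0:
        # No trail requested: draw only the current frame.
        return [(current, 1.0)]

    first = max(0, current - trail_length)
    trail = []
    for frame in range(first, current + 1):
        age = current - frame  # 0 for the current frame
        alpha = 1.0 - (1.0 - min_alpha) * age / trail_length
        trail.append((frame, round(alpha, 3)))
    return trail


# Example: on day 2, with a trail of 4 days fading to 20% opacity,
# days 0 and 1 are still drawn but lighter than day 2.
print(ghost_trail(2, trail_length=4, min_alpha=0.2))
```

A plotting routine would then draw each listed frame's line at the given opacity, oldest first. Whether five ghost frames help or clutter is an empirical question; the trail length is there to be tuned.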
The Trifecta checkup requires us to align all three aspects to make a great chart. It is sometimes the case that a wise choice has been made regarding the type of chart, but the other elements are missing. Reader Parker S. sent in an example of such a chart.
This chart created by ESPN illustrates the evolution of the "power ranking" of the San Diego Chargers football team within each 18-week-long season and across multiple years.
Parker couldn't figure out the practical question this chart is supposed to answer (the top corner of the Trifecta).
It seems to me that the more interesting question is how different teams fare from week to week within a given season, rather than how one team fared from week to week over consecutive seasons.
In fact, one of the secrets of the Bumps chart -- the reason why it feels far less cluttered than it has any right to be -- is that no two data points will overlap, that is, for any given week, only one team occupies any particular rank. This simple rule is violated when the same team's rank across multiple seasons is plotted, and thus the chart feels very busy.
It proved impossible to find a source of ESPN power rankings that has all teams for a given season. However, I found something similar at CBS Sportsline, a competitor. Here is their version of the ranking chart:
They got the practical question right but severely under-utilized the form. We can see how the Chargers season is going but have no ability to compare them to other teams.
We can start with the question of visualizing how the Chargers and their AFC West compatriots are doing relative to the rest of the league:
The AFC West is a mediocre division this season, with all four teams in the middle of the pack, none in the top quarter of the table. The Chargers started high, plunged and are recovering while the Oakland Raiders have improved over the course of the season.
The Bumps chart is more powerful when the full set of data is plotted, and when the lines are highlighted with reference to the question being answered. Are AFC teams or NFC teams doing better?
The next one highlights the teams that earned the largest change in ranking from week 1 to week 10. The background (gray lines) consists of those teams whose rankings in Week 10 were within 5 places of their initial rankings.
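The split between highlighted and background lines can be sketched in a few lines of Python. This is only an illustration of the rule described above (background means within 5 places of the Week 1 ranking); the function name, the data layout and the team names are hypothetical, and the rankings in the example are invented, not CBS's numbers.

```python
def split_by_movement(rankings, start_week=1, end_week=10, threshold=5):
    """Separate teams into highlighted movers and gray background lines.

    `rankings` maps team name -> {week number -> rank}, where rank 1 is best.
    A team is highlighted when its rank changed by more than `threshold`
    places between `start_week` and `end_week`; otherwise it is background.
    Returned values are the change in rank (positive = moved up the table).
    """
    highlighted, background = {}, {}
    for team, by_week in rankings.items():
        change = by_week[start_week] - by_week[end_week]
        if abs(change) > threshold:
            highlighted[team] = change
        else:
            background[team] = change
    return highlighted, background


# Invented rankings for three made-up teams.
example = {
    "Team A": {1: 3, 10: 12},   # fell 9 places -> highlighted
    "Team B": {1: 20, 10: 8},   # rose 12 places -> highlighted
    "Team C": {1: 15, 10: 17},  # moved 2 places -> background
}
movers, steady = split_by_movement(example)
print(movers)  # teams to draw in color
print(steady)  # teams to draw in gray
```

The same function answers a different practical question if the threshold or the pair of weeks is changed, which is the point: the highlighting should follow the question.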
The practical question might be whether Week 1 rankings are a good predictor of Week 10 rankings. The following chart shows that most teams in the top quartile remain there (except San Diego which is coming back, and Dallas which could be coming back too), the bottom-quartile teams also tend to remain there, while not surprisingly, the middle teams don't tend to stay in the middle. The color scheme should be reversed if one wants to highlight the dispersion of the rankings of these middle teams by Week 10.
My dislike of donut charts has been well documented. Click here.
What I want to discuss is the use of interactivity, a feature of this chart that backfires. The underlying data is a 5-level rating of "corporate sentiment" by industry, by country, and over time. That would be 4 dimensions jostling for space on a surface. Obviously, some decisions have to be made as to which dimension to highlight and which to push to the background.
This chart highlights the 5-level ratings using the donut device. All other dimensions are well hidden by the interactive feature. Pressing on the forward/backward buttons reveals the industry dimension. Pressing on the arrow on the top left corner reveals the time dimension. Pressing on the map reveals the country dimension.
The problem with this level of detachment is that readers are obstructed from viewing multiple dimensions at once. For instance, it is very hard to understand the differences in sentiment between different industries, or between different countries, or the change in sentiment over time.
The version on the right shows, for instance, the distribution of ratings by industry for Q3 2010, and for all Asia combined. This is a rough sketch, and one would want to fix quite a few things: making the sector labels horizontal, reducing the distance between the columns, labeling rating 1 as "very positive", ordering the sectors from most positive to least positive, etc.
A chart of ratings by country (aggregate of all industry sectors) would follow the same format. Similarly, one can compare ratings across countries for a given sector... and this can be replicated 11 times, once for each sector. Similarly, one can compare ratings across industries for any given country.
For comparisons across time, I'd suggest using average ratings rather than keeping track of five proportions. This reduces a lot of clutter that does not improve readers' comprehension of the trends. A line chart would be preferred.
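For concreteness, here is a minimal Python sketch of collapsing the five proportions into one average rating. It assumes the ratings are coded 1 to 5 and are treated as equally spaced, which is a simplification of an ordinal scale; the function name and the numbers in the example are made up.

```python
def average_rating(proportions):
    """Collapse the shares of ratings 1..5 into a single average rating.

    `proportions` lists the share (or count) of respondents giving each
    rating, in order from rating 1 to rating 5. The shares need not sum
    to exactly 1; they are normalized by their total.
    """
    total = sum(proportions)
    if total <= 0:
        raise ValueError("proportions must sum to a positive number")
    weighted = sum(level * share
                   for level, share in enumerate(proportions, start=1))
    return weighted / total


# Made-up quarter: 10% rated 1, 20% rated 2, 40% rated 3, 20% rated 4, 10% rated 5.
print(average_rating([0.1, 0.2, 0.4, 0.2, 0.1]))
```

One number per quarter per sector is what makes a line chart feasible. The price is that two very different distributions can share the same average, so the full distribution is still worth keeping a click away.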
A better way to organize the chart is to start with the types of questions that the reader is likely to want to answer. Clicking on each question (say, compare ratings across industries within a country) would reveal one of the above collections of charts.
Another improvement is to add annotations. For instance, one wonders whether the airlines colluded to all give a 2 rating. It is always a great idea to direct readers' attention to the most salient parts of a chart, especially if it contains a lot of data.
I look at a fair number of online videos, especially those embedded on blogs. But I haven't seen this feature implemented broadly. It is a wow feature.
Look at the dots above the progress bar: they tell you what topic is being discussed and allow you to jump back and forth between segments. (The particular dot I moused over said "Randy Moss".) The video I saw came from this link.
This simple-looking feature is immensely useful to users. You can efficiently search through the audio file and find the segments you're interested in. It's like bookmarks students might put on pages of a textbook for easy reference, except these are audio bookmarks.
Why isn't this feature more prevalent? I think it's because of the amount of manual effort needed to set this up. Imagine how the data has to be processed. In the digital age, the audio file is a bunch of bits (ones and zeroes), so neither a computer nor a human can readily identify topics from data stored in that way. So, someone would need to listen to the audio file, mark off the segments manually, and tag them. Then, the audio bookmarks can be plotted on the progress bar... basically a dot plot with time on the horizontal axis.
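Once the tagging is done, the plotting step is the easy part. The following Python sketch shows the dot-plot arithmetic under my own assumptions: I have no knowledge of how the site actually implements it, the function name is hypothetical, and the timestamps in the example are invented (only the "Randy Moss" label comes from the video).

```python
def bookmark_positions(segments, duration, bar_width=100.0):
    """Place tagged segments as dots along a progress bar.

    `segments` is a list of (start_seconds, label) pairs produced by whoever
    tagged the audio. `duration` is the length of the clip in seconds and
    `bar_width` is the width of the progress bar in whatever unit the page
    uses (pixels, percent). Returns (x_position, label) pairs sorted by time.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    dots = []
    for start, label in sorted(segments):
        if not 0 <= start <= duration:
            raise ValueError(f"segment {label!r} starts outside the clip")
        dots.append((start / duration * bar_width, label))
    return dots


# Invented timestamps for a 3-minute clip drawn on a 200-pixel bar.
tags = [(90, "Randy Moss"), (0, "Intro")]
print(bookmark_positions(tags, duration=180, bar_width=200))
```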
In theory, you can train a computer to listen to an audio file and approximate this task. The challenge is to attain the required accuracy so you don't need to hire an army of people to correct mistakes.
A very simple concept but immensely functional. Great job!
Bill Zeller, a PhD student at Princeton, sent me the link to his project "graph your inbox", which is an attempt to visualize the "data" in your Gmail account.
Seems to me that it acts as a sophisticated "search my mail" engine. The most interesting part is the ability to click on a point or a bar in one of the charts, and have the corresponding emails show up in the preview panel. This interactive ability is also available in modern commercial graphing packages, and it is extremely useful for data exploration.
Technically, this is a compelling achievement, given the amounts of data being processed, organized, summarized and plotted.
I think he needs to figure out some compelling use cases for something like this. Can you help? How would you use this capability if it is available?
I am happy to provide the following review of this interesting book by Martin and Simon, who are readers of Junk Charts. Martin also publishes a blog, and he's the one who has created bumps charts for the Tour de France races (which also appear in the book).
Interactive Graphics for Data Analysis is an advanced book written by two researchers who have deep experience developing graphics software. People who like to go beyond the basics will find it a useful addition to the literature.
To give you an idea of the level of sophistication, just in Chapter 1 (titled Interactivity), the two authors utilize set operations, SQL statements, and parallel coordinate plots. They assume you have some sense of what those are. That said, those sections can be skipped without interrupting the flow of the book.
The following key messages from these authors are worth repeating:
The book is divided into two sections: Principles and Examples. The second half, the Examples section, consists of case studies in which the authors show examples of how to investigate the structure of a given data set.
The example of using the fatty-acid contents of Italian olive oils to deduce their regional origin is a good visualization of how the statistical technique of classification trees works. Here is the telling diagram:
Notice that data with the same color are oils from the same region, the rectangular sections are results of the statistical classification procedure, and we would like to see most (if not all) of the data within each section having the same color.
Without a doubt, graphics designers should be aware of the issues raised by these authors. The book appears to be written for students who are creating statistical software (complete with end-of-chapter exercises). I'm left wondering what users of graphics software can do with this information because much of this material relates to the design of graphics software. Knowing these issues makes you want to do things the software may not be designed to do efficiently. For example, most software packages I have used do not have a simple toggle to sort categorical variables by various means (alphabetical, increasing or decreasing frequency, increasing or decreasing value of another variable, etc.).
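To make the wish concrete, here is a rough Python sketch of what such a toggle might compute behind the scenes. It is not taken from the book or from any particular package; the function name, the option names and the sample data are all hypothetical, and "value of another variable" is interpreted here as the mean of that variable within each category.

```python
from collections import Counter


def order_categories(values, by="alphabetical", descending=False, other=None):
    """Return the distinct categories in `values` in the requested order.

    by="alphabetical": sort by category name.
    by="frequency":    sort by how often each category occurs.
    by="value":        sort by the mean of `other` within each category,
                       where `other` is a numeric list aligned with `values`.
    Ties are broken by category name.
    """
    counts = Counter(values)
    if by == "alphabetical":
        key = lambda c: c
    elif by == "frequency":
        key = lambda c: (counts[c], c)
    elif by == "value":
        if other is None or len(other) != len(values):
            raise ValueError("`other` must be a list as long as `values`")
        totals = {}
        for category, x in zip(values, other):
            totals[category] = totals.get(category, 0.0) + x
        key = lambda c: (totals[c] / counts[c], c)
    else:
        raise ValueError(f"unknown ordering: {by!r}")
    return sorted(counts, key=key, reverse=descending)


# Made-up data: region codes and a numeric measurement for each record.
regions = ["b", "a", "b", "c", "b", "a"]
measure = [1, 10, 1, 5, 1, 10]
print(order_categories(regions))
print(order_categories(regions, by="frequency", descending=True))
print(order_categories(regions, by="value", other=measure))
```

The returned order would then be handed to the plotting routine as the axis order. The computation is trivial, which makes it all the more puzzling that the toggle is so often missing.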