An amazing amount of data is being visualized here. Mousing over the map will pick up the specific data for each county. There is a bar up top for exploring the evolution over time. It would be great if there were an animation button so the map could be played out without clicking. An animated gif would also do (similar to the disease map we featured some time ago).
The colors on the first map represent the origin of the top ethnic group in each county. Within each group, the tint of the color further displays the percentage of the population that group accounts for. The subgroups appear to be 0-2%, 2-5%, over 5%. The last subgroup is very wide.
Not so keen on the second map with all those bubbles. They show the number of people from each country by county. The bubble size is proportional to population. Every version of this map looks the same because the population is concentrated in the cities and the interior is sparsely populated, no matter what ethnic group.
Regardless, this is another laudable effort by the crew at the Times.
Reference: "Immigration Explorer", New York Times, March 10 2009.
Jens, a long-time reader, tried to re-make the boring data tables used to report poll data. Here is an example from USA Election Polls (left) and his enhanced version (right).
Like Jens, I find most tabular presentations of poll data underwhelming. Too much data hides all the useful information. For example, the pollster and polling date data provide context for super-serious poll watchers to interpret the data; however, they are not presented in a way that actually helps readers. Read further for versions that bring out this data much better.
Meanwhile, Jens' revision uses color and ordering to bring out the current state of affairs. The addition of electoral votes allows us to understand the relative weight of each row, countering the weakness of the tabular format, that each row has the same height, implying erroneously that they have the same importance.
There are a number of good web-sites where this type of data is presented in attractive ways.
I have been a fan of Political Arithmetik, which made great use of the pollster and polling date data mentioned above. Those data have been averaged to show the overall trend while the individual poll results are plotted as dots in the background. The polling date data is embedded in the horizontal positions of the dots. Even more impressively, the margins of error are presented. Remarkably, this race has been a statistical tie for all these months, the 95% lower limit never quite making it above the zero level.
Another great site is fivethirtyeight.com. Below, they essentially turned Jens' enhanced table into a map. The legend on the right perhaps represents what they call "East Coast bias"? All of Nathan's graphs are very attractively produced; I just wish he'd put more labels on them (such as the differentials corresponding to shades of red and blue).
You have to gradually build up your gut by eating larger and larger amounts of food, and then be sure to work it all off so body fat doesn't put a squeeze on the expansion of your stomach in competition -- Takeru Kobayashi, six-time champion of the Coney Island hot dog eating contest
Kobayashi is a phenom. He can stuff 60 hot dogs or 100 burgers in ten or twelve minutes and show no consequences. Ordinary people can't hope to emulate these feats.
Junk Charts sees Kobayashi as a hero; an anti-hero really. We are ordinary people; we can't hope to cram it like Koby. A message we keep repeating here is: too much data sinks a chart.
Not long after this chart showed up in the Economist, several readers urged us to take a look. It's a well-nourished chart indeed, one to challenge Kobayashi, but for all that it contains, the reader has to try very hard to find insights. What with the multiple colors, iron-fisted gridlines, above-and-below boxes, dotted and solid lines, and a legend with nine pieces split in two spots? Besides, the U.S. boxes grab all the attention by virtue of them being wider (country being more partisan).
The key to unraveling this chart is to identify the relevant comparisons:
UK average vs US average
UK left vs US left
UK right vs US right
UK independent vs US independent
And then for the gluttonous:
UK right vs US left
UK left vs independent vs right
US left vs independent vs right
In the junkchart version, we address these comparisons sequentially.
(Apologies for the tiny font.)
We are again using a small multiples approach that places four comparisons next to each other: average, left, independent, right. Consistently, the British are to the left of the Americans. The only places where the two cultures meet are where liberals agree on "ideology" and "military action".
Also note that we use a symmetric horizontal scale centered at 0. There are too many charts out there where the center is not at the center!
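For anyone who wants to reproduce the centered scale, here is a minimal sketch of one way to compute it. The helper name `symmetric_limits`, the `padding` parameter and the scores are all made up for illustration; the original chart's data is not available to us.

```python
def symmetric_limits(values, padding=0.1):
    """Return (low, high) axis limits centered at 0 that cover all values."""
    # The bound is set by the value farthest from zero, on either side.
    largest = max(abs(v) for v in values)
    bound = largest * (1 + padding)
    return -bound, bound


# Hypothetical scores for one panel: negative = left, positive = right.
scores = [-1.5, -0.25, 0.5, 1.0]

low, high = symmetric_limits(scores, padding=0.0)
print(low, high)
```

The pair can then be handed to whatever plotting tool is in use (for instance, matplotlib's `Axes.set_xlim`), so that zero always sits in the middle of the panel regardless of which side has the larger values.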
A similar presentation addresses the other three comparisons. Democrats in the U.S. are miles to the right of Tories in terms of "religion". In the UK, Labour and Tories are not much different except on "ideology". In the US, Independents lean closer to Democrats.
Joining the lines (I hear the grumbles) helps bring out the gap between the groups being compared. Without lines, the chart would look like this.
It is often hard to keep track of which dot is which as they trade order from issue to issue.
PS. Does anyone know what is being measured on the horizontal axis? The original graph mysteriously stated "respondents' views".
Eric Talmadge: "Pigout champion Kobayashi limbers up for hot dog gold" June 25, 2004
Reader Daniel sent us a great example of how even little things matter a lot in chart-making. The left chart is the original. The right chart (created by Daniel) shuffled the order of the legend to match the curves, and spaced them out. All of a sudden, the chart is much easier to read.
Like any technology, charts also come with peripherals: I'm talking about legends, data labels, grid-lines and so on. These things typically give us the most trouble, especially with complex data sets. The analogy is apt: one may feel inextricably knotted up like bunches of cords and wires.
Interactive graphics is a particularly elegant solution to this problem, and Google Finance has done a fantastic job leading the way. One trick is to show the legend only when the user asks for it. Using bar charts (on the left), Google summarizes neatly the performance of stocks within each industry sector. The bar chart gives a sense of the dispersion which adds to the average returns printed next to them. For example, most sectors gained on average but then about 30% of the individual stocks in most sectors actually declined on that day. So the fact that technology stocks gained 0.48% on average doesn't necessarily mean that the two tech stocks you own gained 0.48% or gained at all.
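To see how an average gain can coexist with a sizable share of losers, here is a small sketch. The ten returns below are invented so that they land on a 0.48% average with 30% decliners; they are not Google Finance data, and `summarize` is just an illustrative helper.

```python
# Hypothetical one-day returns (in percent) for ten stocks in one sector.
returns = [2.5, 1.8, 1.2, 0.9, 0.6, 0.4, 0.2, -0.5, -0.9, -1.4]


def summarize(returns):
    """Return (average return, share of stocks that declined)."""
    average = sum(returns) / len(returns)
    decliners = sum(1 for r in returns if r < 0) / len(returns)
    return average, decliners


average, decliners = summarize(returns)
print(f"average return: {average:.2f}%")
print(f"share of decliners: {decliners:.0%}")
```

A few strong gainers are enough to pull the average above zero, which is why the bar showing the spread tells you more than the single number printed next to it.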
Typically, we would put a legend on the side or at the bottom of the chart, which, all told, is an ugly duckling next to a well-executed chart. Here, the legend is hidden behind the "What's this?" link. The side benefit is that the legend can be as verbose as needed since it doesn't interfere with the chart.
There are a few minor things to consider:
"What's this?" is not very informative: Why not call it a "legend" or "key"?
The graph designer seems to think that the most important information sought by readers is the extremes, i.e. the percentage of stocks that gained/lost more than 2%. Darkening the sides of the bar draws attention away from the middle, which is the boundary between the gainers and the losers. I'd like to see that boundary delineated.
Similar to the above point, I'd sketch out a version which aligns the gainer/loser boundary to the middle so it's easy to see the balance between gainers and losers. This version, however, would require more space.
I'd provide sorting by average return, and by percentage of gainers.
I've been reading my friend's anti-smoking tome, and traced this "infographic" back to its source (World Health Organization).
I was very intrigued by the "lines of death" which seemed to make the point that the risk of death had a spatial correlation: specifically, that the death risk for male smokers was higher in the northern hemisphere (above the line), primarily developed countries, as compared to the southern hemisphere, mostly developing nations.
I find that somewhat counter-intuitive but in a fascinating book like this, one that brings together scientific, psychological and societal commentary, I was expecting to learn new things.
Looking at the legend, the red areas were regions in which deaths from tobacco use accounted for over 25% of "total deaths among men and women over 35". This explained some, as perhaps there were more reasons to die (warfare, other diseases, mine accidents, etc.) in developing nations than in developed nations, or that they had larger populations (so more deaths even at lower rates).
However, the description of the "lines of death" raised my eyebrows. It is now claimed that more than 25% of middle-aged people (35-69 years old) die from tobacco use in the red regions.
Did they mean 25% of the dead middle-aged people die from smoking? Or 25% of all middle-aged folks die from smoking? A gigantic difference!
Percentages are very tricky things to use. Every time I see a percentage, the first thing I ask is what the base population is. Here, the baseline appeared to have gotten lost in translation.
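A quick worked example shows how far apart the two readings can be. The cohort size and death count below are invented round numbers, not WHO figures, and `count_from_share` is a hypothetical helper.

```python
def count_from_share(base, share=0.25):
    """Number of people implied by applying a share to a base population."""
    return base * share


# Hypothetical cohort: 1,000 people aged 35, of whom 400 die before age 70.
cohort = 1000
deaths_before_70 = 400

# Reading 1: 25% of the middle-aged people who died, died from tobacco use.
tobacco_share_of_deaths = count_from_share(deaths_before_70)

# Reading 2: 25% of all middle-aged people die from tobacco use.
tobacco_share_of_people = count_from_share(cohort)

print(tobacco_share_of_deaths, tobacco_share_of_people)
```

Same percentage, different base: under these made-up numbers the second reading implies two and a half times as many tobacco deaths as the first.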
This set of maps also shows the peril of focusing too much on entertainment value, and losing the plot.
For those concerned about the effect of smoking on our society and our children, I highly recommend Dr. Rabinoff's highly readable new book, "Ending the tobacco holocaust". It contains lots of interesting tidbits and really brings together every cogent argument that exists, including the common ones you've heard and others you haven't.