Attractive, interactive graphic challenges lazy readers

The New York Times spent a lot of effort making a nice interactive graphical feature to accompany their story about Uber's attempt to manipulate its drivers. The article is here. Below is a static screenshot of one of the graphics.

Nytimes_uber_simulation

The illustrative map at the bottom is exquisite. It has Uber cars driving around, it has passengers waiting at street corners, the cars pick up passengers, new passengers appear, etc. There are also certain oddities: all the cars go at the same speed, some strange things happen when cars visually run into each other, etc.

This interactive feature is mostly concerned with entertainment. I don't think it is possible to infer either of the two metrics listed above the chart by staring at the moving Uber cars. The metrics are the percentage of Uber drivers who are idle and the average number of minutes that a passenger waits. Those two metrics are crucial to understanding the operational problem facing Uber planners. You can increase the number of Uber cars on the road to reduce average waiting time but the trade-off is a higher idle rate among drivers.
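To put rough numbers on that trade-off, here is a minimal sketch that treats the grid as a textbook M/M/c queue and applies the Erlang C formula. This is my own simplification, not the model behind the Times simulation: it ignores geography and pickup travel time, and the arrival and service rates are hypothetical.

```python
from math import factorial


def erlang_c_metrics(arrival_rate, service_rate, drivers):
    """Return (idle_fraction, mean_wait) for an M/M/c queue.

    arrival_rate: passenger requests per minute (hypothetical value)
    service_rate: trips one driver completes per minute (hypothetical value)
    drivers: number of drivers on the road
    """
    offered_load = arrival_rate / service_rate
    if drivers <= offered_load:
        raise ValueError("too few drivers: the queue grows without bound")

    # Erlang C: probability that an arriving passenger has to wait at all.
    top = (offered_load ** drivers / factorial(drivers)
           * drivers / (drivers - offered_load))
    bottom = sum(offered_load ** k / factorial(k) for k in range(drivers)) + top
    prob_wait = top / bottom

    # Share of driver time spent without a passenger.
    idle_fraction = 1 - offered_load / drivers
    # Average minutes a passenger waits before a driver frees up.
    mean_wait = prob_wait / (drivers * service_rate - arrival_rate)
    return idle_fraction, mean_wait


# One request per minute, ten-minute trips, and three fleet sizes.
for drivers in (11, 15, 20):
    idle, wait = erlang_c_metrics(arrival_rate=1.0, service_rate=0.1, drivers=drivers)
    print(f"{drivers} drivers: {idle:.0%} idle, {wait:.2f} min average wait")
```

Under these assumptions, adding drivers pushes the average wait down and the idle rate up, which is the tension the two metrics above the chart are meant to convey.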

***

One of the key trends in interactive graphics at the Times is simplification. While a lot of things are happening behind the scenes, there is only one interactive control. The only thing the reader can control is the number of drivers in the grid.

The Times is one of the greatest producers of interactive graphics, so I trust that they know what they are doing. In fact, this article describes some comments made by Gregor Aisch, who works at the Times. The gist is: very few readers play with their interactive graphics. Someone else said, "If you make a tooltip or rollover, assume no one will ever see it." I also have heard someone say (hope this is not merely a voice in my own head): "Every extra button or knob you place on the graphic, you lose another batch of readers." This might be called the law of the interactive knob, analogous to the law of the printed equation in the realm of popular book publishing, which stipulates that for every additional equation you print in a book, you lose another batch of readers.

(Note, however, that we are talking about graphics for communications here, not exploratory graphics.)

***

Several years ago, I introduced the concept of "return on effort" in this blog post. Most interactive graphics are high effort to produce. The question is whether there is enough reward for the readers. 

Junkcharts_return_on_effort_matrix


Raining, data art, if it ain't broke

Via Twitter, reader Joe D. asked a few of us to comment on the SparkRadar graphic by WeatherSpark.

At the time of writing, the picture for Baltimore is very pretty:

Sparkradar

The picture for New York is not as pretty but still intriguing. We are having a bout of summer and hence the white space (no precipitation):

Sparkradar_newyork

Interpreting this innovative chart is a tough task - this is a given with any innovative chart. Explaining the chart requires all the text on this page.

The difficulty of interpreting the SparkRadar chart is twofold.

Firstly, the axes are unnatural. Time runs vertically, defying the horizontal convention. Also, "now" - the most recent time depicted - is at the very bottom, which tempts readers to read bottom to top, meaning we are reading time running backwards into the past. In most charts, time runs left to right from past to present (at least in the left-right-centric part of the world that I live in).

Location has been reduced to one dimension. The labels "Distance Inside" and "Distance from Storm" confuse me - perhaps those who follow weather more closely can justify the labels. Conventionally, location is shown in two dimensions.

The second difficulty is created by the inclusion of irrelevant data (aka noise). The square grid prescribes a fixed box inside which all data are depicted. In the New York graphic, something is going on in the top right corner - far away in both time and space - how does it help the reader?

***

Now, contrast this chart to the more standard one, a map showing rain "clouds" moving through space.

Bing_precipitationradar_baltimore

(From Bing search result)

The standard one wins because it matches our intuition better.

Location is shown in two dimensions.

Distance from the city is shown on the map as scaled distance.

Time is shown as motion.

Speed is shown as speed of the motion. (In SparkRadar, speed is shown by the slope of imaginary lines.)

Severity is shown by density and color.

Nonetheless, a panel of the new charts makes great data art.

 

 


Visualizing survey results excellently

Surveys generate a lot of data. And, if you have used a survey vendor, you know they generate a ton of charts.

I was in Germany to attend the Data Meets Viz workshop organized by Antony Unwin. Paul and Sascha from Zeit Online presented some of their work at the German publication, and I was highly impressed by this effort to visualize survey results. (I hope the link works for you. I found that the "scroll" fails on some platforms.)

The survey questions attempted to assess the gap between West and East Germans 25 years after reunification.

The best feature of this presentation is the maintenance of one chart form throughout. This is the general format:

Zeit_workingmum_all

 

The survey asks whether working mothers are a good thing or not. They chose to plot how the percentage agreeing that working mothers are a good thing changes over time. The blue line represents the East German average and the yellow line the West German average. There is a big gap in attitude between the two sides on this issue, although both regions have experienced an increase in acceptance of working mothers over time.

All the other lines in the background indicate different subgroups of interest. These subgroups are accessible via the tabs on top. They include gender, education level, and age.

The little red "i" conceals some text explaining the insight from this chart.

Hovering over the "Men" tab leads to the following visual:

Zeit_workingmum_men

Both lines for men sit under the respective average but the shape is roughly the same. (Clicking on the tab highlights the two lines for men while moving the aggregate lines to the background.)

The Zeit team really does an amazing job keeping this chart clean while still answering a variety of questions.

They did make an important choice: not to put every number on this chart. We don't see the percent disagreeing or those who are ambivalent or chose not to answer the question.

***

Like I said before, what makes this set of charts is the seamless transitions between one question and the next. Every question is given the same graphical treatment. This eliminates learning time going from one chart to the next.

Here is one using a Likert scale, and accordingly, the vertical axis goes from 1 to 7. They plotted the average score within each subgroup and the overall average:

Zeit_trustparliament

Here is one where they combined the top categories into a "Top 2 Box" type metric:

Zeit_smoking

***

Finally, I appreciate the nice touch of adding tooltips to the series of dots used to aid navigation.

Zeit_dotnavigation

The theme of the workshop was interactive graphics. This effort by the Zeit team is one of the best I have seen. Market researchers take note!

 


Putting a final touch on Bloomberg's terrific chart of social movements

My friend Rhonda D. wins a prize for submitting a good chart. This is Bloomberg's take on the current Supreme Court case on gay marriage (link). Their designer places this movement in the context of prior social movements such as women's suffrage and inter-racial marriage.

Bloomberg_pace_socialchange

Previously, I mentioned New York Times' coverage using "tile maps." While the Times places geography front and center, Bloomberg prefers to highlight the time scale. (In the bottom section of Bloomberg's presentation, they use tile maps as well.)

These are the little things I love about the graphic shown above:

  • The very long time horizon really allows us to see our own lifetime as a small section of the history of the nation
  • The gray upper envelope showing the size of the union is essential background data presented subtly
  • The inclusion of "prohibition" representing a movement that failed (I wish they had included more examples of movements that do not succeed)
  • The open circle and arrow indicators to differentiate between ongoing and settled issues

They should have let the movements finish by connecting the open circles to the upper envelope. Like this:

Redo_bloomberg_pace_socialchange_added2

This makes the steepness of the lines jump out even more. In addition, it makes a distinction between the movements that succeeded and the movement that failed. (Prohibition was repealed in 1933. The line between 1920 and 1933 could be more granular if such data are available.)

 


Observing Rosling’s Current Visual Style

On the sister blog, I wrote about Hans Rosling’s recent presentation in New York (link). I noted that Rosling has apparently simplified his visual palette.

Rosling is best known as the developer of the Gapminder tool, used to visualize global social statistics data collected by national statistical agencies. I wrote favorably about this tool in a series of posts (link). Gapminder made popular the moving bubble chart, although it is not the only graphical form present.

Gapminder_screengrab

These animated bubble charts also made Rosling a YouTube star (See here.)

***

In last week’s presentation, Rosling only showed one moving bubble chart. The rest of his graphics are noticeably simpler, something that anyone can produce in Excel or PowerPoint. Here is one example:

Image1
 

I’m particularly impressed by a simple sequence of charts in which Rosling explains the demographic changes the world is expecting to see in the next 50 to 100 years.

  Image2

This is an enhanced area chart. Each slice of area is subdivided into stick figures so that an axis for population counts becomes unnecessary.

Instead, the reader sees two useful dimensions: region of the world, and age group.

How the population ages as it grows is the feature story and the effect of aging is ingeniously portrayed as layers. This becomes apparent as Rosling lets time roll forward, and the layers literally walk off the page. (Unfortunately, I couldn't capture each step fast enough.)

Image3

 (This photo courtesy of Daniel Vadnais.)

When Rosling showed the 2085 projection, we found that the entire rectangle had filled up, so the world population has definitely grown, roughly by 30 percent. The growth happens by the filling up of adults; the total number of children has not changed. This is one of the key insights from recent demographic data. The first photo above shows something remarkable: the fertility rate in Asian countries has already plunged to about the same level as that of developed countries.

***

This set of charts is unusually effective. It represents another level of simplification in visual means. At the same time, the message is sharpened.

As I reported the other day (link), Rosling does not believe modern tools have improved data analysis. This talk which utilized simple tools is a good demonstration of his point.


An uninformative end state

This chart cited by ZeroHedge feels like a parody. It's a bar chart that doesn't utilize the length of bars. It's a dot plot that doesn't utilize the position of dots. The range of commute times (between city centers and airports) from 18 to 111 minutes is compressed into red/yellow/green levels.

20141124_Air4

ZeroHedge got this from Bloomberg Businessweek, which has a data visualization group so this seems strange. The project called "The Airport Frustration Index" is here.

It turns out the above chart is a byproduct of interactivity. The designer illustrates the passage of time by letting lines run across the page. The imagery is that of a horse race. This experiment reminds me of the audible chart by the New York Times (link).

The trick works better when the scale is in seconds, thus real time, as in the NYT chart. On the Businessweek chart, three different scales are simultaneously in motion: real time, elapsed time of the interactive element, and length of the line. Take any two airports: the amount of elapsed time between one "horse" and the other "horse" reaching the right side is not equal to the extra time needed but a fraction of it--obviously, the designer can't have readers wait, say, 10 minutes if that was the real difference in commute times!

Besides, the interactive component is responsible for the uninformative end state shown above.

***

Now, let's take a spin around the Trifecta Checkup. The question being asked is how "painful" is the commute from the city center to the airport. The data used:

Bw_commuteairport_def

Here are some issues with the data that are worth a moment of your time:

In Chapter 1 of Numbers Rule Your World (link), I review some key concepts in analyzing waiting times. The most important concept is the psychology of waiting time. Specifically, not all waiting time is created equal. Some minutes are just more painful than others.

As a simple example, there are two main reasons why Google Maps says it takes longer to get to Airport A than Airport B--distance between the city center and the airport; and congestion on the roads. If in getting to A, the car is constantly moving while in getting to B, half of the time is spent stuck in jams, then the average commuter considers the commute to B much more painful even if the two trips take the same number of physical minutes.

Thus, it is not clear that Google driving time is the right way to measure pain. One quick but incomplete fix is to introduce distance into the metric, which means looking at speed rather than time.
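A minimal sketch of that quick fix, dividing distance by driving time to get an average speed. The function name and the two trips are hypothetical illustrations, not figures from the Businessweek data.

```python
def average_speed_mph(distance_miles, drive_minutes):
    """Average speed in miles per hour for a trip of the given length and duration."""
    if drive_minutes <= 0:
        raise ValueError("drive_minutes must be positive")
    return distance_miles / (drive_minutes / 60)


# Two hypothetical trips that take the same 60 minutes on the clock.
airport_a = average_speed_mph(distance_miles=30, drive_minutes=60)  # moving the whole way
airport_b = average_speed_mph(distance_miles=15, drive_minutes=60)  # half the distance, same time

print(f"Airport A: {airport_a:.0f} mph, Airport B: {airport_b:.0f} mph")
```

Equal driving times hide the difference between the two trips; the lower average speed for B hints at the time spent in jams. The fix is still incomplete because an average speed says nothing about how the slow minutes are distributed.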

Another consideration is whether the "center" of all business trips coincides with the city center. In New York, for instance, I'm not sure what should be considered the "city center". If all five boroughs are considered, I heard that the geographical center is in Brooklyn. If I type "New York, NY" into Google Maps, it shows up at the World Trade Center. During rush hour, the 111 minutes for JFK would be underestimated for most commuters who are located above Canal Street.

I'd consider this effort a Type DV.

 


A promising infographic about motorcycle helmets

The New York Times graphics team shows us how to do an infographics poster the right way. They recently put up a feature showing how the repeal of helmet laws is linked to increasing vehicle fatalities. The graphic is here.

One of the key charts is this one (second to last screen):

Nyt_motorcycle_fl_2

The graphic tells the story; no additional words are needed. (Actually, you'd have to come from the prior page to know that the white vertical line represents the year in which Florida repealed its helmet law.)

Of course, one state does not prove a trend. It appears that other states face the same situation. It would be nicer if they could start this next chart at an earlier time.

Nyt_motorcycle_txar

I'm surprised by how much these lines fluctuate given that the raw counts are in the hundreds.

I wonder if there is any active debate in Florida or elsewhere as it would appear that the helmet law repeal may have caused hundreds of unnecessary deaths. Have people been coming up with other explanations for the sharp rise in motorcycle fatalities involving those not wearing helmets?


Beautiful spider loses its way

On Twitter, Andy C. (@AnkoNako) asked me to look at this pretty creation at NFL.com (link).

Nfl_spiderweb

There is a reason why you don't read much about spider charts (web charts, radar charts, etc.) here. While this chart is beautifully constructed, and fun to play with, it just doesn't work as a vehicle for communication.

This example above allows us to compare four players (here, quarterbacks) on eight metrics. Each white polygon represents one player, and the orange outline represents the league average quarterback. 

What are some of the questions one might have about comparing quarterbacks?

  • Who is the best quarterback, and who is the worst?
  • Who is the better passer? (ignoring other skills, like rushing ability)
  • Is each quarterback better or worse than the average quarterback?

How will you figure these out from the spider chart?

  • Not sure. The relative value of the quarterbacks is definitely not encoded in the shape of the polygon, nor the area. To really figure this out, you'd need to look at each of the eight spokes independently, and then aggregate the comparisons in your head. Unless... you are willing to ignore seven of the eight metrics, and just look at passer rating (below right).
  • Focusing on passing only means focusing on five of the eight metrics, from pass attempts to interceptions. How do you combine five metrics into one evaluation is your own guess.
  • One can tell that Joe Flacco is basically the average quarterback as his contour is almost exactly that of the average (orange outline). Are the others better or worse than average? Hard to tell at first glance.

***

There are a number of statistical points worth noting.

First, the chart invites users to place equal emphasis on each of the eight dimensions. (There is a control to remove dimensions.) But the metrics are clearly not equally important. You certainly should value passing yards more than rushing yards, for example.

Second, the chart ignores the correlation between these eight metrics. The easiest way to see this is the "Passer Rating", which is a formula comprising the Passing Attempts, Passing Completions, Interceptions, Touchdown Passes, and Passing Yards. Yes, all those five components have been separately plotted. Another easy way to see the problem is that Passing Yards are highly correlated with Passing Attempts or Passing Completions.

Third, the chart fails to account for different types of quarterbacks. I deliberately chose these four because Joe Flacco was a starter, Tyrod Taylor was a backup who almost never played, while at San Francisco, Alex Smith and Colin Kaepernick shared the starting duties. So for Passing Yards, the numbers were 3817, 179, 1737 and 1814 respectively. Those numbers should not be directly compared. Better statistics are something like yards per minute played, yards per offensive series, yards per plays executed, etc. The way that this data is used here, all the second- and third-string quarterbacks will be below average and most of the starters will be above average.
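To make the second point concrete, here is a sketch of the standard NFL passer rating formula as I understand it. The function name and the sample stat line are my own hypothetical illustration, not numbers from the NFL.com chart.

```python
def passer_rating(attempts, completions, yards, touchdowns, interceptions):
    """NFL passer rating on the usual 0 to 158.3 scale."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")

    def clamp(value):
        # Each of the four components is limited to the range 0 to 2.375.
        return max(0.0, min(value, 2.375))

    completion_part = clamp((completions / attempts - 0.3) * 5)
    yardage_part = clamp((yards / attempts - 3) * 0.25)
    touchdown_part = clamp(touchdowns / attempts * 20)
    interception_part = clamp(2.375 - interceptions / attempts * 25)

    return (completion_part + yardage_part + touchdown_part + interception_part) / 6 * 100


# Hypothetical season line, for illustration only.
print(round(passer_rating(attempts=500, completions=300, yards=3500,
                          touchdowns=20, interceptions=10), 1))
```

Every input to the rating is one of the other spokes on the chart, and three of the four components are divided by attempts. Plotting the rating next to its own ingredients shows the same information more than once.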

***

From a design perspective, there are a small number of misses.

Mysteriously, the legend always has only two colors no matter how many players are being compared. The orange is labeled "Average" while the white is labeled "Leader". I have no idea why any of the players should be considered the "Leader".

The only way to know which white polygon represents which player is to hover on the polygon itself. You'll notice that in my example, several of those polygons overlap substantially so sometimes, hovering is not a task easily accomplished.

The last issue is scale. Turns out that some of the metrics like interceptions, touchdown passes, rushing yards, etc. can be zero. Take a look at this subset of the chart where I hovered on Tyrod Taylor.

Nfl_spider_zeroes

Do you see the problem? The zero point is definitely not the center of the circle. This problem exists for any circular chart, such as bubble charts.

Now look at Interceptions. Because the scale is reversed (lower is better), the zero point of this metric will lie on the outer edge of the circle. This is a vexing issue because the radius is open-ended on the outside but closed-ended on the inside.

***

In the next post, I will discuss some alternative presentations of this data.


Stutter steps, and functional legends

Dona Wong asked me to comment on a project by the New York Fed visualizing funding and expenditure at NY and NJ schools. The link to the charts is here. You have to click through to see the animation.

Nyfed_funding

Here are my comments:

  • I like the "Takeaways" section up front, which uses words to tell readers what to look for in the charts to follow.
  • I like the stutter steps that are inserted into the animation. This gives me time to process the data. The point of these dynamic maps is to showcase the changes in the data over time.
  • I really, really want to click on the green boxes (the legend) and have the corresponding school districts highlighted. In other words, turning the legend into something functional. Tool developers, please take notes!
  • The other options on the map are federal, state and local shares of funding, given in proportions. These are controlled by the three buttons above. This is a design decision that privileges showing how federal funds are distributed across districts and across time. The tradeoff is that it's harder to comprehend the mix of sources of funds within each district over time.
  • I usually like to flip back and forth between actual values and relative values. I find that both perspectives provide information. Here, I'd like to see dollars and proportions.

I also find the line charts to be much clearer but the maps are more engaging. Here is an example of the line chart: (the blue dashed line is the New York state average)

Nyfed_linechart

After looking at these charts, I also want to see a bivariate analysis. How is funding per student and expenditure per student related?

Do you have any feedback for Dona?