
An uninformative end state

This chart cited by ZeroHedge feels like a parody. It's a bar chart that doesn't utilize the length of bars. It's a dot plot that doesn't utilize the position of dots. The range of commute times (between city centers and airports) from 18 to 111 minutes is compressed into red/yellow/green levels.

20141124_Air4

ZeroHedge got this from Bloomberg Businessweek, which has a data visualization group, so this seems strange. The project called "The Airport Frustration Index" is here.

It turns out the above chart is a byproduct of interactivity. The designer illustrates the passage of time by letting lines run across the page. The imagery is that of a horse race. This experiment reminds me of the audible chart by the New York Times (link).

The trick works better when the scale is in seconds, thus real time, as in the NYT chart. On the Businessweek chart, three different scales are simultaneously in motion: real time, elapsed time of the interactive element, and length of the line. Take any two airports: the amount of elapsed time between one "horse" and the other "horse" reaching the right side is not equal to the extra time needed but a fraction of it--obviously, the designer can't have readers wait, say, 10 minutes if that was the real difference in commute times!
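To make the compression of scales concrete, here is a minimal sketch in Python. It assumes the animation scales commute times linearly and that the slowest "horse" finishes after 10 seconds; both are assumptions for illustration, not a description of how Businessweek built the interactive. Only the 18 and 111 minutes come from the chart.

```python
def animation_seconds(commute_minutes, longest_commute_minutes, animation_length_seconds):
    """Map a real commute time onto the animation clock, assuming linear scaling."""
    return commute_minutes / longest_commute_minutes * animation_length_seconds


# Range of commute times quoted from the chart: 18 to 111 minutes.
SHORTEST_MINUTES = 18
LONGEST_MINUTES = 111
# ASSUMPTION (hypothetical): the slowest line takes 10 seconds to cross the page.
ANIMATION_LENGTH_SECONDS = 10

fastest_finish = animation_seconds(SHORTEST_MINUTES, LONGEST_MINUTES, ANIMATION_LENGTH_SECONDS)
slowest_finish = animation_seconds(LONGEST_MINUTES, LONGEST_MINUTES, ANIMATION_LENGTH_SECONDS)

real_gap_seconds = (LONGEST_MINUTES - SHORTEST_MINUTES) * 60
animated_gap_seconds = slowest_finish - fastest_finish

# How many real seconds each second of animation stands for.
compression = real_gap_seconds / animated_gap_seconds

print(f"Real gap: {real_gap_seconds} seconds")
print(f"Animated gap: {animated_gap_seconds:.2f} seconds")
print(f"One animated second stands for {compression:.0f} real seconds")
```

Under these assumptions the reader watches a gap of a few seconds that stands in for a gap of more than an hour and a half, which is why the animation cannot convey the real-time feel that the NYT chart does.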

Besides, the interactive component is responsible for the uninformative end state shown above.

***

Now, let's take a spin around the Trifecta Checkup. The question being asked is how "painful" the commute from the city center to the airport is. The data used:

Bw_commuteairport_def

Here are some issues with the data that are worth a moment of your time:

In Chapter 1 of Numbers Rule Your World (link), I review some key concepts in analyzing waiting times. The most important concept is the psychology of waiting time. Specifically, not all waiting time is created equal. Some minutes are just more painful than others.

As a simple example, there are two main reasons why Google Maps says it takes longer to get to Airport A than Airport B: the distance between the city center and the airport, and congestion on the roads. If in getting to A, the car is constantly moving while in getting to B, half of the time is spent stuck in jams, then the average commuter considers the commute to B much more painful even if the two trips take the same number of physical minutes.

Thus, it is not clear that Google driving time is the right way to measure pain. One quick but incomplete fix is to introduce distance into the metric, which means looking at speed rather than time.
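A minimal sketch of the speed metric, in Python. The two airports, their distances and their drive times are made-up numbers chosen to mirror the A-versus-B example above; they are not taken from the Businessweek data.

```python
def average_speed_mph(distance_miles, drive_minutes):
    """Average speed over the trip: distance divided by time in hours."""
    if drive_minutes <= 0:
        raise ValueError("drive_minutes must be positive")
    return distance_miles / (drive_minutes / 60)


# HYPOTHETICAL data: (distance in miles, drive time in minutes).
# Both trips take the same number of minutes, but B is much closer,
# so the car must be crawling (or stopped) for much of the way.
trips = {
    "Airport A": (30.0, 45.0),
    "Airport B": (15.0, 45.0),
}

speeds = {name: average_speed_mph(dist, mins) for name, (dist, mins) in trips.items()}

for name, mph in speeds.items():
    print(f"{name}: {mph:.0f} mph on average")
```

Time alone rates the two trips as equally painful; speed separates them. It remains an incomplete fix because an average speed still cannot tell steady slow traffic apart from stop-and-go jams.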

Another consideration is whether the "center" of all business trips coincides with the city center. In New York, for instance, I'm not sure what should be considered the "city center". If all five boroughs are considered, I heard that the geographical center is in Brooklyn. If I type "New York, NY" into Google Maps, it shows up at the World Trade Center. During rush hour, the 111 minutes for JFK would be underestimated for most commuters who are located above Canal Street.

I'd consider this effort a Type DV.

 


Disorganization

Boise-state-facts-figures-2014-updated

Reader Aaron W. came across this "Facts and Figures" infographic about Boise State University that seemingly is aimed at alumni of the school. Given that Boise State has a good reputation for analytics, Aaron found it disconcerting to see such a low-quality data graphic. (Click on the image to see it in full size.)

There are numerous little things to grumble about in each section of the chart. The larger issue, though, is the overall composition. When assembling a chart like this, it is important to provide a navigation path for readers, whether explicitly or through cues.

It's difficult to discern the organizing principles of this chart. Aaron felt this way: "the total information flow is haphazard, if not entirely incoherent. There is some valuable information here, but at best it gets lost in the shuffle."

For example, some statistics are for undergraduate students only, some are for graduate students, and some are offered in aggregate.

Boisestate_undergradgraduate

Confusion reigns. We learn that the school has total enrollment of 22K students but it's a little math quiz to learn how many are undergraduates. In certain sections, data about faculty members are mixed with those about students.

Not breaking out undergraduates from graduates is a particular problem when presenting demographics, such as age distributions, ethnicity, etc.

Boisestate_age

It's odd to present this distribution of age without remarking that the undergrads are shown on the left and the graduate students are shown on the right.

Then, the sections presenting counts of students, faculty, degrees, etc. overlap with sections presenting financial data.

Boisestate_countdollar

A rethinking of this page should start with identifying the key questions readers would want answered, and then organizing the data to suit those needs.

 

 

 


Circular but insufficient

One of my students analyzed the following Economist chart for her homework.

Economist_book_sales_printversion

I was looking for it online, and found an interactive version that is a bit different (link). Here are three screen shots from the online version for years 2009, 2013 and 2018. The first and last snapshots correspond to the years depicted in the print version.

  Economist_booksales_all

The online version is the self-sufficiency test for the print version. In testing self-sufficiency, we want to see if the visual elements (i.e. the circular sectors on the print version) pull their own weight. The quick answer is no. The reader can't tell how much in sales is represented by each sector, nor can they reliably estimate the relative scales of print versus ebook (pink/red vs yellow/orange) or year-to-year growth rates.

As usual, when we see the entire data set printed on the chart itself, it is a giveaway that the visual elements are mere ornaments.

The online version does not have labels unless you hover over the hemispheres. But again it is a challenge to learn anything from the picture.

In the Trifecta Checkup, this is a Type V chart.

***

This particular dataset is made for the bumps-style chart:

Redo_economistbooksales

 

 

 


A rule-breaking, cliche-defying, punch-carrying chart worthy of the election

This NYT graphic published on the eve of the Senate elections represents the best of data visualization: it carries its message with a punch.

Nytimes_election2014_trending_sm

The link to the web page is here. The graphic proudly occupied the front page of the print edition on Tuesday.

***

This graphic is not cliched. The typical consequence of such a statement is that it has to come with a reader's manual. The beauty of this beauty is that the required manual is compact:

  • The rectangular areas indicate the lack of competitiveness in each race. The extremes are: the entirely filled rectangle is a lock from start to finish; and the completely blank rectangle is a 50/50 tossup from start to finish. The more color, the less competitive the race.
  • Red implies the Republican candidate is projected to be leading at that moment; Blue, the Democrat; and Green, an independent. (The juxtaposition of red and green is one of the few missteps here.)

If you stick to the above, you will do fine.

If you start thinking the height of the area is the chance of winning, you run into trouble.

***

Here is a more conventional way to show time-series projections. It is a mirrored line chart, in which one of the two lines is redundant. (This chart shows up elsewhere on the NYT site.)

Forecast_time_trend

To turn this into the other style, draw a line through the 50-percent level, erase everything below 50, and then switch from line to area.

Redo_nytforecasttrend

On the far right, where it says 75%, you can see that it is precisely half-way between 50 and 100 percent. So the new chart breaks the start-at-zero rule for area charts.

Except... this is an ingenious violation of that rule. Like I said, if you are able to get your head around thinking that the area maps to lack of competitiveness (or, the amount of lead the leader has, regardless of who's leading), and suppress the urge to interpret the areas as the chance of winning, then the axis starting at 50 percent is not a problem. (I'm assuming that most of these races are in essence two-horse races. If there are more than two viable candidates, this particular chart form doesn't work.)
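The recipe for turning the mirrored line chart into the area chart can be sketched in a few lines of Python. This is my reading of the chart form, not the NYT's code: the function name `area_encoding` is made up, and it assumes a two-candidate race in which the two win probabilities sum to 100 percent.

```python
def area_encoding(dem_win_probability):
    """Return (leader, fill) for a two-candidate race.

    fill is the share of the rectangle's height that gets colored:
    0.0 for a 50/50 tossup, 1.0 for a lock.
    ASSUMPTION: the other candidate's probability is 1 - dem_win_probability.
    """
    if not 0.0 <= dem_win_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")

    # Keep only whichever line sits at or above the 50-percent level.
    leader_probability = max(dem_win_probability, 1.0 - dem_win_probability)

    if dem_win_probability > 0.5:
        leader = "Democrat"
    elif dem_win_probability < 0.5:
        leader = "Republican"
    else:
        leader = "tossup"

    # Rescale so that the axis runs from 50 percent (empty) to 100 percent (full).
    fill = (leader_probability - 0.5) / 0.5
    return leader, fill


for p in (1.0, 0.75, 0.5, 0.25):
    leader, fill = area_encoding(p)
    print(f"Democrat at {p:.0%}: leader = {leader}, colored share = {fill:.0%}")
```

A 75 percent chance fills exactly half of the rectangle, which is the half-way point noted above. The colored height measures the size of the lead, and the color alone says who holds it.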

The payoff is a very compact chart that shows a lot of data in a small space. The NH race was a lock for the Democrats at the start, but the lead kept dwindling so that by the eve of the election, it had been cut in half. Even so, the halved lead still means a 75 percent chance in favor of the Dems.

Iowa and Colorado both flipped from a Democratic to a Republican lead around the middle of September.

When the visualization is driven well, the readers have an effortless ride.