
Webinar Wednesday

Lyon_onlinestreaming


I'm delivering a quick-fire Webinar this Wednesday on how to make impactful data graphics for communication and persuasion. Registration is free, at this link.

***

In the meantime, I'm preparing a guest lecture for the Data Visualization class at Yeshiva University's Sy Syms School of Business. The goal of the lecture is to emphasize the importance of incorporating analytics into the data visualization process.

Here is the lesson plan:

  1. Introduce the Trifecta checkup (link) which is the general framework for effective data visualizations
  2. Provide examples of Type D data visualizations, i.e. graphics that have good production values but fail due to issues with the data or the analysis
  3. Hands-on demo of an end-to-end data visualization process
  4. Lessons from the demo including the iterative nature of analytics and visualization; and sketching
  5. Overview of basic statistics concepts useful to visual designers

 


Plotted performance guaranteed not to predict future performance

On my flight back from Lyon, I picked up a French magazine, and found the following chart:

French interest rates chart small

A quick visit to Bing Translate tells me that this chart illustrates the rates of return of different types of investments. The headline supposedly says "Only the risk pays". In many investment brochures, after presenting some glaringly optimistic projections of future returns, the vendor legally protects itself by proclaiming "Past performance does not guarantee future performance."

For this chart, an appropriate warning is PLOTTED PERFORMANCE GUARANTEED NOT TO PREDICT THE FUTURE!

***

Two unusual decisions set this chart apart:

1. The tree ring imagery, which codes the data in the widths of concentric rings around a common core

2. The placement of larger numbers toward the middle, and smaller numbers in the periphery.

When a reader takes in the visual design of this chart, what is s/he drawn to?

The designer evidently hopes the reader will focus on comparing the widths of the rings (A), while ignoring the areas or the circumferences. I think it is more likely that the reader will see one of the following:

(B) the relative areas of the tree rings

(C) the areas of the full circles bounded by the circumferences

(D) the lengths of the outer rings

(E) the lengths of the inner rings

(F) the lengths of the "middle" rings (defined as the average of the outer and inner rings)
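The six readings above can be made concrete with a little geometry. What follows is only a rough sketch: the ring widths are hypothetical values (the magazine's figures are not reproduced here), the function name `ring_readings` is made up for illustration, and the innermost ring is assumed to start at radius zero unless a `core` radius is supplied. Rings are listed from the center outward, so the largest value sits at the core, as in the French chart.

```python
import math

def ring_readings(widths, core=0.0):
    """Return the six candidate readings (A)-(F) for concentric rings.

    `widths` lists the ring widths from the center outward;
    `core` is the radius of any blank core inside the first ring.
    """
    readings = []
    inner = core
    for w in widths:
        outer = inner + w
        readings.append({
            "A_width": w,
            "B_ring_area": math.pi * (outer**2 - inner**2),
            "C_circle_area": math.pi * outer**2,
            "D_outer_length": 2 * math.pi * outer,
            "E_inner_length": 2 * math.pi * inner,
            # average of the outer and inner circumferences
            "F_middle_length": math.pi * (outer + inner),
        })
        inner = outer
    return readings

# Hypothetical widths, largest at the center (not the magazine's data)
for r in ring_readings([6.0, 4.0, 2.0, 1.0]):
    print({k: round(v, 1) for k, v in r.items()})
```

Only reading (A) is unaffected by where a ring sits; every other quantity grows with the ring's distance from the center.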

Here is a visualization of six ways to "see" what is on the French rates of return chart:

Redo_jc_frenchinterestrates_1

Recall the Trifecta Checkup (link). This is an example where "What does the visual say" and "What does the data say" may be at variance. In case (A), if the reader is seeing the ring widths, then those two aspects are in sync. In every other case, the two aspects are discordant.

The level of distortion is visualized in the following chart:

Redo_jc_frenchinterestrates_2

Here, I normalized everything to the size of the SCPI data. The true data is presented by the ring width column, represented by the vertical stripes on the left. If the comparisons are not distorted, the other symbols should stay close to the vertical stripes. One notices there is always distortion in cases (B)-(F). This is primarily due to the placement of the large numbers near the center and the small numbers near the edge. In other words, the radius is inversely related to the data: the larger the value, the smaller the radius at which it is drawn!

 The amount of distortion for most cases ranges from 2 to 6 times. 
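One way to compute a distortion factor of this kind, sketched under stated assumptions: normalize both the data (the ring widths) and the perceived quantity to a reference category, then take the ratio of the two. The widths below are hypothetical, so the printed factors will not match the chart above, and the reference index `ref` simply stands in for whichever category is used as the baseline.

```python
import math

def distortion(widths, measure, ref=0):
    """Ratio of perceived to true size, each normalized to ring `ref`.

    A value of 1.0 means the reading agrees with the data for that ring.
    """
    # Outer radius of each ring, accumulating widths from the center outward
    outers = []
    total = 0.0
    for w in widths:
        total += w
        outers.append(total)
    perceived = [measure(r_out - w, r_out) for w, r_out in zip(widths, outers)]
    return [
        (p / perceived[ref]) / (w / widths[ref])
        for p, w in zip(perceived, widths)
    ]

def ring_area(r_in, r_out):
    return math.pi * (r_out**2 - r_in**2)

def outer_length(r_in, r_out):
    return 2 * math.pi * r_out

widths = [6.0, 4.0, 2.0, 1.0]  # hypothetical, largest at the center
print([round(d, 2) for d in distortion(widths, ring_area)])
print([round(d, 2) for d in distortion(widths, outer_length)])
```

Passing a measure that returns the ring width itself gives 1.0 everywhere, which is the undistorted case (A).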

While the "ring area" (B) version is least distorted on average, it is perhaps the worst of the six representations. The level of distortion is not a regular function of the size of the data. The "sicav monetaries" (smallest data) is the least distorted while the data of medium value are the most distorted.

***

To improve this chart, take a hint from the headline. Someone recognizes that there is a tradeoff between risk and return. The data series shown, which is an annualized return, only paints the return part of the relationship. 

 

 

 


The French take back cinema but can you see it?

I like independent cinema, and here are three French films that come to mind as I write this post: Delicatessen, The Class (Entre les murs), and 8 Women (8 femmes). 

The French people are taking back cinema. Even though they purchased more tickets to U.S. movies than French movies, the gap has been narrowing in the last two decades. How do I know? It's the subject of this infographic:

DataCinema

How do I know? That's not easy to say, given how complicated this infographic is. Here is a zoomed-in view of the top of the chart:

Datacinema_top

 

You've got the slice of orange, which doubles as the imagery of a film roll. The chart uses five legend items to explain the two layers of data. The solid donut chart presents the mix of ticket sales by country of origin, comparing U.S. movies, French movies, and "others". Then, there are two thin arcs showing the mix of movies by country of origin. 

The donut chart has an unusual feature. Typically, the data are coded in the angles at the donut's center. Here, the data are coded twice: once at the center, and again in the width of the ring. This is a self-defeating feature because it draws even more attention to the areas of the donut slices, yet those areas are highly distorted. If the ratios of the areas are accurate when all three pieces have the same width, then varying those widths causes the ratios to shift from the correct ones!
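To see why, recall that a donut slice with central angle θ, inner radius r and ring width w has area (θ/2)·((r+w)² − r²). Here is a minimal sketch with made-up shares and widths (not the infographic's numbers) showing the area ratio drifting once the widths vary:

```python
import math

def slice_area(share, r_inner, width):
    """Area of a donut slice whose central angle encodes `share` (0 to 1)."""
    theta = 2 * math.pi * share
    r_outer = r_inner + width
    return 0.5 * theta * (r_outer**2 - r_inner**2)

# Hypothetical shares of 60% and 30%, so the true ratio is 2
equal = slice_area(0.6, 1.0, 1.0) / slice_area(0.3, 1.0, 1.0)
# Same angles, but the smaller share is also drawn with a thinner ring
varied = slice_area(0.6, 1.0, 1.0) / slice_area(0.3, 1.0, 0.5)

print(round(equal, 2), round(varied, 2))
```

With equal widths the areas stay in the 2-to-1 ratio of the data; halving the width of the smaller slice pushes the area ratio to 4.8 in this example.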

The best thing about this chart is found in the little blue star, which adds context to the statistics. The 61% number is unusually high, which demands an explanation. The designer tells us it's due to the popularity of The Lion King.

***

The one donut is for the year 1994. The infographic actually shows an entire time series from 1994 to 2014.

The design is most unusual. The years 1994, 1999, 2004, 2009, 2014 receive special attention. The in-between years are split into two pairs, shrunk, and placed alternately to the right and left of the highlighted years. So your eyes are asked to zig-zag down the page in order to understand the trend. 

To see the change of U.S. movie ticket sales over time, you have to estimate the sizes of the red-orange donut slices from one pie chart to another. 

Here is an alternative visual design that brings out the two messages in this data: that French movie-goers are increasingly preferring French movies, and that U.S. movies no longer account for the majority of ticket sales.

Redo_junkcharts_frenchmovies

A long-term linear trend exists for both U.S. and French ticket sales. The "outlier" values are highlighted and explained by the blockbuster that drove them.

 

P.S.

1. You can register for the free seminar in Lyon here. To register for live streaming, go here.
2. Thanks Carla Paquet at JMP for help translating from French.


No Latin honors for graphic design

Paw_honors_2018

This chart appeared in a recent issue of Princeton Alumni Weekly.

If you read the sister blog, you'll be aware that at most universities in the United States, every student is above average! At Princeton, 47% of the graduating class earned "Latin" honors. The median student just missed graduating with honors, so the honors graduate is just above average! The 47% number is actually lower than at some other peer schools - at one point, Harvard was giving 90% of its graduates Latin honors.

Side note: In researching this post, I also learned that in the Senior Survey for Harvard's Class of 2018, two-thirds of the respondents (response rate was about 50%) reported GPA to be 3.71 or above, and half reported 3.80 or above, which means their grade average is higher than A-.  Since Harvard does not give out A+, half of the graduates received As in almost every course they took, assuming no non-response bias.

***

Back to the chart. It's a simple chart but it's not getting a Latin honor.

Most readers of the magazine will not care about the decimal point. Just write 18.9% as 19%. Or even 20%.

The sequencing of the honor levels is backwards. Summa should be on top.

***

Warning: the remainder of this post is written for graphics die-hards. I go through a bunch of different charts, exploring some fine points.

People often complain that bar charts are boring. A trendy alternative when it comes to count or percentage data is the "pictogram."

Here are two versions of the pictogram. On the left, each percent point is shown as a dot. Then imagine each dot turned into a square, then remove all padding and lines, and you get the chart on the right, which is basically an area chart.

Redo_paw_honors_2018
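For anyone who wants to play with the dot version, here is a rough text-only sketch of the layout. The function name `dot_grid` is made up, the counts are the rounded percentages used later in this post (13, 16 and 19), and treating the remaining dots as "no honors" is an assumption of the sketch.

```python
def dot_grid(counts, per_row=10):
    """Lay out one symbol per percentage point, filling rows left to right."""
    symbols = "".join(sym * n for sym, n in counts)
    return [symbols[i:i + per_row] for i in range(0, len(symbols), per_row)]

# S = Summa, M = Magna, C = Cum Laude (rounded percentages);
# "." marks the remainder, assumed here to be graduates without honors.
honors = [("S", 13), ("M", 16), ("C", 19)]
remainder = 100 - sum(n for _, n in honors)
for row in dot_grid(honors + [(".", remainder)]):
    print(" ".join(row))
```

Because the categories are filled in sequence, a category can spill across rows, which is exactly what produces the irregular shapes in the area version.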

The area chart is actually worse than the original column chart. It's now much harder to judge the areas of irregularly-shaped pieces. You'd have to add data labels to assist the reader.

The 100-dot version is appealing because the reader can count out the number of each type of honors. But I don't like visual designs that turn readers into bean-counters.

So I experimented with ways to simplify the counting. If counting is easier, then making comparisons is also easier.

Start with this observation: When asked to count a large number of objects, we group by 10s and 5s.

So, on the left chart below, I made connectors to form groups of 5 or 10 dots. I wonder if I should use different line widths to differentiate groups of five and groups of ten. But the human brain is very powerful: even when I use the same connector style, it's easy to see which is a 5 and which is a 10.

Redo_paw_honors_2

On the left chart, the organizing principles are to keep each connector to its own row, and within each category, to start with 10-group, then 5-group, then singletons. The anti-principle is to allow same-color dots to be separated. The reader should be able to figure out Summa = 10+3, Magna = 10+5+1, Cum Laude = 10+5+4.
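The grouping rule can be written down as a greedy split: take as many 10-groups as fit, then 5-groups, and leave the rest as singletons. This is just my shorthand for the rule, with a made-up function name `group_dots`; passing `sizes=(5,)` gives the 5-groups-only variant.

```python
def group_dots(count, sizes=(10, 5)):
    """Split `count` dots into connector groups, largest size first.

    Returns the list of connected group sizes and the number of
    leftover dots drawn as singletons.
    """
    groups = []
    for size in sizes:
        n, count = divmod(count, size)
        groups.extend([size] * n)
    return groups, count

for name, pct in [("Summa", 13), ("Magna", 16), ("Cum Laude", 19)]:
    groups, singles = group_dots(pct)
    print(name, groups, "+", singles, "single dots")
```

For the three honors levels this reproduces 10+3, 10+5+1 and 10+5+4.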

The right chart is even more experimental. The anti-principle is to allow bending of the connectors. I also give up on using both 5- and 10-groups. By only using 5-groups, readers can rely on their instinct that anything connected (whether straight or bent) is a 5-group. This is powerful. It relieves the effort of counting while permitting the dots to be packed more tightly by respective color.

Further, I exploited symmetry to further reduce the counting effort. Symmetry is powerful as it removes duplicate effort. In the above chart, once the reader figured out how to read Magna, reading Cum Laude is simplified because the two categories share two straight connectors, and two bent connectors that are mirror images, so it's clear that Cum Laude is more than Magna by exactly three dots (percentage points).

***

Of course, if the message you want to convey is that roughly half the graduates earn honors, and those honors are split almost evenly into thirds, then the column chart is sufficient. If you do want to use a pictogram, spend some time thinking about how you can reduce the effort of the counting!