
A puzzle brought to you by the NY Times

Here is a map that attracted my attention on the NY Times (link):


The counties are given shades of blue with darker shades meaning more economic distress. According to the label (Newark), the 10 red dots are the top 10 most distressed large cities in the U.S. It appears that almost all of these cities are in regions of light blue shade. That's the puzzle.


A separate issue with this map is that it presents a static image of distress. The first paragraph of the article states: "As the most prosperous communities in the United States have gotten richer since the end of the Great Recession in 2009, economic conditions in many distressed areas have deteriorated even further." It would be great if we could see a before and after 2009 comparison.



A quick lesson in handling more than one message on one chart

Between teaching two classes, and a seminar, and logging two coast-to-coast flights, I was able to find time to rethink the following chart from the Wall Street Journal: (link to article)


I like the right side of this chart, which helps readers interpret what the alcohol consumption guidelines really mean. When we go out and drink, we order beers, or wine, or drinks - we don't think in terms of grams of alcohol.

The left side is a bit clumsy. The biggest message is that the U.K. has tightened its guidelines. This message is delivered by having the U.K. appear twice in the chart, the only country to repeat. To make this clear, the designer highlights the U.K. rows. But the two rows are highlighted in different styles, because the current U.K. row has to point to the right side while the previous U.K. row does not. This creates a bit of confusion.

In addition, since the U.K. rows are far apart, figuring out how much the guidelines have changed is more work than desired.

The placement of the bars by gender also doesn't help. A side message is that most countries allow men to drink more than women but the U.K., in revising its guidelines, has followed the Netherlands and Guyana in having the same level for both genders.


After trying a few ideas, I think the scatter plot works out pretty well. One advantage is that it does not arbitrarily order the data (men first, women second) as the original chart does. Another advantage is that it shows the male-female balance more clearly.


An afterthought: I should have added the words "Stricter" and "Laxer" on the two corners of the chart. This chart shows both that the U.K. is getting stricter and that it joins Guyana and the Netherlands as countries which treat men and women equally when it comes to drinking.



Where I will be in the next few weeks

It's awfully quiet here lately as I am trying to manage a tight schedule. The problem with a tight schedule is the absence of "slack." Without slack, just one little unexpected event ruins your schedule. Like dominoes, everything gets pushed back. That event arrived in dribs and drabs a couple of weeks ago as a major water leak broke out two floors above my apartment. I am still picking up the pieces.

Last week, I crossed the pond and gave a talk about visual story-telling at the SAS headquarters in the U.K. The audience was wonderful and the organizers assembled a great crowd. The event was streamed live to over a thousand viewers all across Europe. Thanks for attending!

Here's me pointing to one of the charts in my presentation:


In the next few weeks, people in the U.S. have a chance to hear a similar presentation. Please come meet me and let me know you read my blog!

Los Angeles, 2/24, 9 a.m. Free registration here

Denver, 3/17, 9 a.m. Free registration here

New York City, 3/24, 9 a.m. Free registration here


In addition, I will be speaking about the ethics of data science at the INFORMS Analytics Conference, in April, in Orlando. The talk will be followed by a panel discussion.

On a related note, rSQUAREedge is hosting a webinar next week by Augustine Fou, who is a digital advertising fraud investigator. This is also free. Fou will talk about the techniques he uses to uncover "bad" data. In this case, "bad" data are data inserted by adversaries to inflate statistics. This is one of the unspoken, and worrisome issues in modern data analysis. One can be very naive in assuming that the observational, "found" data are free from manipulation.




Showing three dimensions using a ternary plot

Long-time reader Daniel L. isn't a fan of this chart, especially when it is made to spin, as you can see at this link:


Like other 3D charts, this one is hard to read. The vertical lines are both good and bad: they make one dimension very easy to read, but their very existence makes one realize how hard it is to read the other dimensions without guidelines.

This dataset allows me to show a ternary plot. The ternary plot is an ingenious way of putting three dimensions onto a flat surface. I have found few good uses of this chart type, though.


Let's get to the core of the issue: the analyst started with 25 skills that are frequently required by data science and analytics jobs, and his goal is to classify these skills into three groups. The underlying method used to create these groups is factor analysis.

Each dot above is a skill. The HQ of each grouping of skills (known as a factor) is a corner of the plot. The closer the dot is to the corner, the more relevant that skill is to the skill group.
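For readers who want to see how three numbers end up on a flat surface, here is a minimal sketch of the usual ternary-plot arithmetic. It assumes each skill has three non-negative scores that can be rescaled into shares summing to one; how the original analyst scaled the factor loadings is not stated, so that step is my assumption. The function name, the corner layout, and the sample numbers are all hypothetical.

```python
import math


def ternary_to_xy(a, b, c):
    """Place three non-negative scores inside an equilateral triangle.

    Assumed corner layout (any rotation works equally well):
      corner A at (0, 0), corner B at (1, 0), corner C at (0.5, sqrt(3)/2).
    The scores are first rescaled into shares that sum to one.
    """
    total = a + b + c
    if min(a, b, c) < 0 or total <= 0:
        raise ValueError("scores must be non-negative and not all zero")
    share_b = b / total
    share_c = c / total
    # The point is the share-weighted average of the three corners.
    x = share_b + 0.5 * share_c
    y = (math.sqrt(3) / 2) * share_c
    return x, y


# Made-up shares for illustration only, not read off the chart:
# a skill split evenly between corners A and B with little weight on C.
x, y = ternary_to_xy(0.45, 0.45, 0.10)
print(round(x, 3), round(y, 3))
```

A skill that scores on only one factor lands exactly on that corner, and a skill with equal scores on all three lands at the center of the triangle, which is why distance to a corner reads as relevance to that skill group.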

In the above chart, I highlighted four skills that are not clearly in one or another skill group. For example, Communication straddles the Math/Stats and Business dimensions but scores low on the Technology/Programming dimension.


The ternary plot has a few problems. Like any scatter plot, once you have 10 or more dots, it is hard to fit all the data labels. Further, the axis labels must be carefully done to help readers understand the plot. 

Before long, the chart looks very cluttered. There just isn't enough room to get all your words in. Here is another version of the same chart, with a different set of annotations.


Instead of drawing attention to those skills that have no clear home, this version of the chart focuses on the dots close to each corner.

I classified two of the skills differently from the original. For example, the Machine Learning skill is part of Math/Stats on my charts but it is part of Technology/Programming on the original.

The ternary plot is interesting and unusual but is only useful in selected problems.