
The merry-go-round of investment bankers

Here is the start of my blog post about the chart I teased the other day:



Today's post deals with the following chart, which appeared recently at Business Insider (hat tip: my sister).

It's immediately obvious that this chart requires a heroic effort to decipher. The question shown in the chart title "How many senior investment bankers left their firms?" is the easiest to answer, as the designer places the number of exits in the central circle of each plot relating to a top-tier investment bank (aka "featured bank"). Note that the visual design plays no role in delivering the message, as readers just scan the data from those circles.

Anyone persistent enough to explore the rest of the chart will eventually discover these features...


The entire post including an alternative view of the dataset is a guest blog at the JMP Blog here. This is a situation in which plotting everything will make an unreadable chart, and the designer has to think hard about what s/he is really trying to accomplish.

Report from Data Visualization Meetup

On Monday, Principal Analytics Prep sponsored the Data Visualization Meetup, organized by the indefatigable Naomi Robbins. The keynote speaker was NYU professor Kristen Sosulski, who just published a book titled “Data Visualization Made Simple” (link).

At the Meetup, we announced a Part-Time Immersive Program. This allows the completion of the Certified Data Specialist program in three levels on a more relaxed, evening schedule. Level 1 will run two nights a week for 12 weeks, starting Spring 2019. For more details, contact us here.


Kristen, a professor in the Stern School, has an interesting take on the data visualization function – placing it within the larger enterprise. In the first part of her talk, she presents a number of real-world case examples of how data analysts used data visualization to create impact within an organization.

The end goal in each of these projects is a “business insight” that is delivered to decision-makers with the primary goal of persuasion – something I also emphasize in my own seminars. Data visualization is certainly also used for analysis, exploration, story-telling (see postscript), and so on, but at the tail end of the process, the need to persuade becomes paramount.


For example, the graphic on the cover of her book is from a project undertaken by Jet.com, the online retailer purchased by Walmart. The managers are interested in the purchasing patterns of their customers, and generally view products as “consumables” or “durables,” the latter having lower purchasing frequencies. The nodes in the network graph are colored accordingly. Through the links between these nodes, the analyst concluded that certain products (an example given was batteries) are considered durables but have purchasing patterns that appear more like consumables.

Kristen’s message is how the data turned into a business insight (the “story”) that impressed the managers enough that they took action by adjusting orders and inventories.

Kristen described other examples such as the use of salary data to place employees into bands, or the use of predictive models to predict which partners in a venture-capital firm will bring in more investment. Many of these examples make me believe that a course in causal reasoning should be required for all data analysts.


The second half of Kristen’s talk addresses how to raise the profile of data visualization within an enterprise. This is a clearly needed discussion. More and more industry jobs are created that are specific to data visualization so these new teams must establish themselves within the corporate culture. Kristen recommends a five-step process, starting with establishing a data practice and ending with measuring one’s impact.

In answering my question about evangelizing new visualization formats to replace inferior existing chart designs, she emphasizes the need to involve stakeholders early in the process. Don't surprise them with something novel during a meeting.

We were pleased that people braved the adverse weather to attend Kristen’s talk, and good pizza was served at the end of the evening.



P.S. The word “story-telling” seems to have gone from hero to villain lately. Some commenters think the word “story” implies made-up fiction, and thus oppose its use. A related complaint concerns the “subjectivity” of stories. Once you realize that most of our data sources are observational in nature, you will soon discover that causal reasoning entails the selection of the most plausible story among many. Statisticians and others have come up with causal models, which are sets of equations used to describe relationships between data, but all of these rely on causal assumptions. In essence, they are structured ways to select the most plausible story. It’s dangerous to see these models as “objective.”

Teaser: new post, tomorrow's event

I have written a new post, which will appear in a guest blog this week. The chart in question is this puzzle:


Not an easy dataset to deal with... will link to the post when it appears.


For those looking to learn data science and advanced analytics skills, I'll be answering your questions about Principal Analytics Prep tomorrow night at The Ginger Man at East 36th Street in Manhattan. Some instructors and alumni will be there to talk about their experiences with our programs. Come join us!

To register for this free event, go to our Eventbrite page.


Monday after Thanksgiving, see you at the Data Visualization Meetup

My little analytics training startup, Principal Analytics Prep, is proud to sponsor the next meeting of the Data Visualization New York Meetup, organized by the indefatigable Naomi Robbins. The Meetup, to be held on Monday, November 26, 2018, features NYU professor Kristen Sosulski, who will discuss the “business case for data visualization”. Click here to register.

Naomi is a long-time friend. I reviewed and recommended her book on Creating More Effective Data Graphics in 2008. This is still a useful reference to some key concepts, presented in a clean, easily digestible format.


The keynote speaker, Kristen, has just published Data Visualization Made Simple, described as a “top book for computer science students.” I am very much looking forward to her talk because the subject speaks to the core of the mission of Principal Analytics Prep: placing data science & analytics in the context of the entire enterprise.

That’s why my bootcamp and training programs emphasize the Three Pillars: computing, statistics, and business. Our instructors are practitioners with 10 to 30 years of learning from real-world implementation of models and systems used by forward-thinking, data-driven organizations.

The Spring 2019 cohort of our Certified Data Specialist bootcamp is open for applications. If you're looking to transition your career into data science and advanced analytics, check us out. Here are some of the great things our alums have said about the program. Take advantage of the early admit deadline by December 10, 2018. Click here for more information.


I'm excited to hear what Kristen has to say about using data visualization in the business world. Kristen's website is here. She teaches at NYU's Stern School of Business, and describes herself as a computer scientist. Here's the link to the Amazon page. Unfortunately the page is not very informative about the book's content. The table of contents is minimalist ("The Design", "The Audience", etc.). I will report back on what she spoke about after the Meetup.



Message-first visualization

Sneaky Pete via Twitter sent me the following chart, asking for guidance:


This is a pretty standard dataset, frequently used in industry. It shows a breakdown of a company's profit by business unit, here classified by "state". The profit projection for the next year is measured on both absolute dollar terms and year-on-year growth.

Since those two metrics have completely different scales, in both magnitude and unit, it is common to use dual axes. In the case of the Economist, they don't use dual axes; they usually just print the second data series in its own column.


I first recommended looking at the scatter plot to see if there are any bivariate patterns. In this case, the scatter provides little insight.

From there, I looked at the data again, and ended up with the following pair of bump charts (slopegraphs):


A key principle I used is message-first. That is to say, the designer should figure out what message s/he wants to convey via the visualization, and then design the visualization to convey that message.

A second key observation is that the business units are divided into two groups, the two large states (A and F) and the small states (B to E). This is a Pareto principle that very often applies to real-world businesses, i.e. a small number of entities contribute most of the revenues (or profits). It is very likely that these businesses are structured to serve the large and small states differently, and so the separation onto two charts mirrors the internal structure.

Then, within each chart, there is a message. For the large states, it looks like state F is projected to overtake state A next year. That is a big deal because we're talking about the largest unit in the entire company.

For the small states, the standout is state B, whose outlook is decidedly rosier than that of the other three small states, which have similar projected growth rates.

Note also I chose to highlight the actual dollar profits, letting the growth rates be implied in the slopes. Usually, executives are much more concerned about hitting a dollar value than a growth rate target. But that, of course, depends on your management's preference.
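
The reasoning behind the two charts can be sketched in a few lines of Python. This is only an illustration: the profit and growth figures below are invented (the original chart's numbers are not reproduced here), and the `Unit` class, the `split_by_size` and `rank` helpers, and the cutoff of 50 are all hypothetical choices. The sketch separates the large states from the small ones, converts growth rates into projected dollar profits, and compares the ranking before and after.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    name: str
    current: float  # this year's profit, in dollars (hypothetical units)
    growth: float   # projected year-on-year growth, e.g. 0.10 for 10%

    @property
    def projected(self) -> float:
        # Next year's dollar profit implied by the growth rate.
        return self.current * (1 + self.growth)


# Hypothetical figures, invented purely for illustration.
units = [
    Unit("A", 100.0, 0.02),
    Unit("B", 10.0, 0.30),
    Unit("C", 12.0, 0.05),
    Unit("D", 9.0, 0.04),
    Unit("E", 11.0, 0.06),
    Unit("F", 95.0, 0.12),
]


def split_by_size(units, cutoff):
    """Separate units into large and small groups by current profit."""
    large = [u for u in units if u.current >= cutoff]
    small = [u for u in units if u.current < cutoff]
    return large, small


def rank(units, key):
    """Return unit names ordered from largest to smallest by `key`."""
    return [u.name for u in sorted(units, key=key, reverse=True)]


large, small = split_by_size(units, cutoff=50.0)  # cutoff is an assumption

for label, group in (("large", large), ("small", small)):
    before = rank(group, key=lambda u: u.current)
    after = rank(group, key=lambda u: u.projected)
    print(label, before, "->", after)
```

With these made-up numbers, state F overtakes state A among the large states, and state B moves to the top of the small states. Those rank changes are the messages each slopegraph is designed to carry.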


McKinsey thinks the data world needs more dataviz talent

Note about last week: While not blogging, I delivered four lectures on three topics over five days: one on the use of data analytics in marketing for a marketing class at Temple; two on the interplay of analytics and data visualization, at Yeshiva and a JMP Webinar; and one on how to live during the Data Revolution at NYU.

This week, I'm back at blogging.

McKinsey publishes a report confirming what most of us already know or experience - the explosion of data jobs that just isn't stopping.

On page 5, it says something that is of interest to readers of this blog: "As data grows more complex, distilling it and bringing it to life through visualization is becoming critical to help make the results of data analyses digestible for decision makers. We estimate that demand for visualization grew roughly 50 percent annually from 2010 to 2015." (my bolding)
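
To get a feel for what that growth rate implies, here is a small piece of arithmetic. It assumes five full years of compounding between 2010 and 2015 at exactly 50 percent a year, which is my reading of the quote rather than something the report spells out.

```python
annual_growth = 0.50   # "roughly 50 percent annually", taken at face value
years = 2015 - 2010    # assumes five compounding periods

# Cumulative multiple after compounding the annual rate over the period.
multiple = (1 + annual_growth) ** years
print(f"{multiple:.1f}x")  # prints 7.6x
```

Under those assumptions, demand in 2015 would be between seven and eight times its 2010 level.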

The report contains a number of unfortunate graphics. Here's one:


I applied my self-sufficiency test by removing the bottom row of data from the chart. Here is what happened to the second circle, representing the fraction of value realized by the U.S. health care industry.


What does the visual say? This is one of the questions in the Trifecta Checkup. We see three categories of things that should add up to 100 percent. With a little more effort, we find the two colored categories are each 10% while the white area is 80%. 

But that's not what the data say, because there is only one thing being measured: how much of the potential has already been realized. The two colors are an attempt to visualize the uncertainty of the estimated proportion, which in this case is described as 10 to 20 percent underneath the chart.

If we have to describe what the two colored sections represent: the dark green section is the lower bound of the estimate while the medium green section is the range of uncertainty. The edge between the two sections is the actual estimated proportion (assuming the uncertainty bound is symmetric around the estimate)!
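
The decomposition can be written out as a short calculation. This is a sketch: the function name `arc_segments` is hypothetical, and the midpoint is the point estimate only under the symmetry assumption stated above.

```python
def arc_segments(lower, upper):
    """Split a 0-100 percent ring into the three areas seen in the chart.

    `lower` and `upper` are the bounds of the estimate, in percent.
    """
    if not 0 <= lower <= upper <= 100:
        raise ValueError("expected 0 <= lower <= upper <= 100")
    dark = lower               # dark green: the lower bound of the estimate
    medium = upper - lower     # medium green: the width of the uncertainty range
    white = 100 - upper        # white: everything above the upper bound
    midpoint = (lower + upper) / 2  # the estimate, if the range is symmetric
    return dark, medium, white, midpoint


# The U.S. health care example: "10 to 20 percent" realized.
dark, medium, white, midpoint = arc_segments(10, 20)
print(dark, medium, white, midpoint)
```

For the 10 to 20 percent range, this returns 10, 10 and 80 for the three areas, matching what the reader sees, while the single quantity actually being estimated sits at 15 percent, on the edge between the two colored sections.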

A first attempt to fix this might be to use line segments instead of colored arcs. 


The middle diagram emphasizes the mid-point estimate while the right diagram, the range of estimates. Observe how differently these two diagrams appear from the original one shown on the left.

This design only works if the reader perceives the chart as a "racetrack" chart. You have to see the invisible vertical line at the top, which is the starting line, and measure how far around the track the symbol has gone. I have previously discussed why I don't like racetracks (for example, here and here).


Here is a sketch of another design:


The center figure will have to be moved and changed to a different shape. This design conveys the sense of a goal (at 100%) and how far one is along the path. The uncertainty is represented by wave-like elements that make the exact location of the pointer arrow appear as wavering.