Report from Data Visualization Meetup

On Monday, Principal Analytics Prep sponsored the Data Visualization Meetup, organized by the indefatigable Naomi Robbins. The keynote speaker was NYU professor Kristen Sosulski, who just published a book titled “Data Visualization Made Simple” (link).

At the Meetup, we announced a Part-Time Immersive Program. This allows the completion of the Certified Data Specialist program in three levels on a more relaxed, evening schedule. Level 1 will run two nights a week for 12 weeks, starting Spring 2019. For more details, contact us here.


Kristen, a professor in the Stern School, has an interesting take on the data visualization function – placing it within the larger enterprise. In the first part of her talk, she presents a number of real-world case examples of how data analysts used data visualization to create impact within an organization.

The end goal in each of these projects is a “business insight” that is delivered to decision-makers with the primary goal of persuasion – something I also emphasize in my own seminars. It’s not that data visualization isn’t used for analysis, exploration, story-telling (see postscript), and so on, but at the tail end of the process, the need to persuade becomes paramount.


For example, the graphic on the cover of her book is from a project undertaken by an online retailer purchased by Walmart. The managers were interested in the purchasing patterns of their customers, and generally viewed products as “consumables” or “durables,” the latter having lower purchasing frequencies. The nodes in the network graph are colored accordingly. Through the links between these nodes, the analyst concluded that certain products (an example given was batteries) are considered durables but have purchasing patterns that appear more like consumables.

Kristen’s message is about how the data turned into a business insight (the “story”) that impressed the managers enough for them to take action by adjusting orders and inventories.

Kristen described other examples such as the use of salary data to place employees into bands, or the use of predictive models to predict which partners in a venture-capital firm will bring in more investment. Many of these examples make me believe that a course on causal reasoning should be required for all data analysts.


The second half of Kristen’s talk addresses how to raise the profile of data visualization within an enterprise. This discussion is clearly needed. More and more industry jobs specific to data visualization are being created, so these new teams must establish themselves within the corporate culture. Kristen recommends a five-step process, starting with establishing a data practice and ending with measuring one’s impact.

In answering my question about evangelizing new visualization formats to replace inferior existing chart designs, she emphasizes the need to involve stakeholders early in the process. Don't surprise them with something novel during a meeting.

We were pleased that people braved the adverse weather to attend Kristen’s talk, and good pizza was served at the end of the evening.



PS. The word “story-telling” seems to have gone from hero to villain lately. Some commenters think the word “story” implies made-up fiction, and thus oppose its use. A related complaint concerns the “subjectivity” of stories. Once you realize that most of our data sources are observational in nature, you will soon discover that causal reasoning entails the selection of the most plausible story among many. Statisticians and others have come up with causal models, which are sets of equations used to describe relationships between data, but all of these rely on causal assumptions. In essence, they are structured ways to select the most plausible story. It’s dangerous to see these models as “objective.”

Teaser: new post, tomorrow's event

I have written a new post, which will appear in a guest blog this week. The chart in question is this puzzle:


Not an easy dataset to deal with... will link to the post when it appears.


For those looking to learn data science and advanced analytics skills, I'll be answering your questions about Principal Analytics Prep tomorrow night at The Ginger Man at East 36th Street in Manhattan. Some instructors and alumni will be there to talk about their experiences with our programs. Come join us!

To register for this free event, go to our Eventbrite page.


Monday after Thanksgiving, see you at the Data Visualization Meetup

My little analytics training startup, Principal Analytics Prep, is proud to sponsor the next meeting of the Data Visualization New York Meetup, organized by the indefatigable Naomi Robbins. The Meetup, to be held on Monday, November 26, 2018, headlines NYU professor Kristen Sosulski, who will discuss the “business case for data visualization”. Click here to register.

Naomi is a long-time friend. I reviewed and recommended her book on Creating More Effective Data Graphics in 2008. This is still a useful reference to some key concepts, presented in a clean, easily digestible format.


The keynote speaker, Kristen, has just published Data Visualization Made Simple, described as a “top book for computer science students.” I am very much looking forward to her talk because the subject speaks to the core of the mission of Principal Analytics Prep: placing data science & analytics in the context of the entire enterprise.

That’s why my bootcamp and training programs emphasize the Three Pillars: computing, statistics, and business. Our instructors are practitioners with 10 to 30 years of learning from real-world implementation of models and systems used by forward-thinking, data-driven organizations.

The Spring 2019 cohort of our Certified Data Specialist bootcamp is open for applications. If you're looking to transition your career into data science and advanced analytics, check us out. Here are some of the great things our alums have said about the program. Take advantage of the early-admit deadline of December 10, 2018. Click here for more information.


I'm excited to hear what Kristen has to say about using data visualization in the business world. Kristen's website is here. She teaches at NYU's Stern School of Business, and describes herself as a computer scientist. Here's the link to the Amazon page. Unfortunately the page is not very informative about the book's content. The table of contents is minimalist ("The Design", "The Audience", etc.). I will report back on what she spoke about after the Meetup.



Information Session on our Data Analytics Bootcamp

Next Monday (Aug 13), we are hosting an information session on our Data Analytics Bootcamp in our office in New York City. The bootcamp has been successful at launching business careers for graduates starting out in the data science and analytics sector - one of the hottest sectors in the economy right now.

There are many possible paths to a data analytics job. Here are just a few we have assisted:

  • Dropping out of medical school, and becoming a data scientist at a large health insurer
  • Moving from an operations role at a non-profit to a marketing analytics position at an international advertising agency
  • Switching from analyzing environmental data for a government agency to being a data scientist for an analytics consultancy
  • Leaving the academic instructor position behind to join a major agricultural firm as their first data scientist
  • Quitting a lab assistant position, and joining an exciting tech startup as a data scientist

One of our first graduates remarked:

"Kaiser, the program's founder, teaches a fantastic class on statistical reasoning that until this day causes me to question assumptions behind analyses and models I see. The other instructors were also a joy to learn from, and teach you not just the technical material but also how it is applied in their various industries... I ended up with multiple job offers, just from the connections I formed in this program. I simply can't recommend this program highly enough."


To learn more about our program, come meet our instructors and alumni at our Information Session. Click here to register.

Upcoming talks here and there

I'm giving a dataviz talk in San Ramon, CA on Thursday Nov 9. Go here to register.


Then next Monday (Nov 13, 11 am), I will be in Boston at Harvard Business Review, giving a "live whiteboard session" on A/B Testing. This talk will be streamed live on Facebook Live.


Finally, my letter to the editor of New York Times Magazine was published this past Sunday. This letter is a response to Susan Dominus's article about the "power pose" research, and the replication crisis in social science. Fundamentally, it is a debate over how data is used and analyzed in experiments, and therefore relevant to my readers. I added a list of resources in this blog post about the letter.


Those are some of my favorite topics: dataviz, A/B testing, and data-driven decision-making.

Dataviz Seminar and other upcoming events

Please help me spread the word on several upcoming events. If you're coming, please say hi!


Data Visualization Seminar - JMP Explorers Series

WHEN: October 4, 2017, Wed, 9 am - 2:30 pm (ET)
WHERE: New School, 63 5th Avenue, New York

In this seminar, I offer tips on making effective visualizations of data, summarizing over a dozen years of critiquing thousands of data graphics.

PS. New Yorkers: I typically start the seminar with an example of dataviz with a local flavor. If you've seen something interesting recently, send it my way!


Principal Analytics Prep Information Session & Webinar on Digital Ad Fraud Analytics

WHEN: October 11, 2017, Wed, 7 - 8 pm (ET)
WHERE: Online

In this webinar, I will discuss the data analytics revolution, and answer questions on how to start or develop your career in this exciting field. In addition, I invited Dr. Augustine Fou, a leading ad fraud researcher, to comment on the recent scandals of fake data in digital advertising. Augustine and I raised the alarm on this huge problem in a Harvard Business Review article in 2015!

Earlier this year, I launched Principal Analytics Prep, an intensive, 12-week bootcamp, created and staffed by leading industry experts, designed to open doors to new careers in data analytics and data science. In the past 15 years, I established and led data teams at SiriusXM Radio and Vimeo, in addition to teaching and running academic programs at Columbia and NYU.

How to Break into the Hottest Sector of the Job Market: Data Science & Analytics

WHEN: October 12, 2017, Thur, 6 - 7:30 pm (ET) (changed from 6:30 - 8 pm)
WHERE: New York Public Library, Small Business & Industry Library (SIBL), 188 Madison Avenue, New York

In this talk, I discuss what data science & analytics is, why the sector is exploding, what trends are driving such growth, and how you can take advantage of this jobs boom.

If you can't make it in person, a short version of this talk will be presented at the Principal Analytics Prep online information session mentioned above. You can register here.

Report from the NBA Hackathon 2017

Yesterday, I had the honor of being one of the judges at the NBA Hackathon. This is the second edition of the Hackathon, organized by the NBA League Office's analytics department in New York. Here is Director of Basketball Analytics, Jason Rosenfeld, speaking to the crowd:


The event was a huge draw - lots of mostly young basketball enthusiasts trying their hand at manipulating and analyzing data to solve interesting problems. I heard there were over 50 teams who showed up on "game day." Hundreds more applicants did not get "drafted." Many competitors came from out of town - among the finalists, there was a team from Toronto and one from Palo Alto.

The competition was divided into two tracks: basketball analytics, and business analytics. Those in the basketball track were challenged with problems of interest to coaches and managers. For example, they were asked to suggest a rule change that might increase excitement in the game, and support that recommendation using the voluminous spatial data. Some of these problems were hard: one involved projecting shot selection ten years out - surely fans want to know if the craze over 3-pointers will last. Nate Silver was one of the judges for the basketball analytics competition.

I was part of the business analytics judging panel, along with the fine folks shown below:


The business problems were challenging as well, and really tested the competitors' judgment, as the problems were open-ended and subjective. Technical skills were also required, as very wide-ranging datasets were made available. One problem asked contestants to combine a large number of datasets to derive a holistic way to measure "entertainment value" of a game. The other problem was even more open: do something useful and interesting with our customer files.

I visited the venue the night before, when the teams were busy digging into the data. See the energy in the room here:


The competitors were given 24 hours to work on the datasets. This time included making a presentation to showcase what they had found. They were not allowed to utilize old code. I overheard several conversations between contestants and the coaches - it appeared that the datasets were in a relatively raw state, meaning quite a bit of time would have been spent organizing, exploring, cleaning and processing the data.

One of the finalists in the business competition started their presentation by telling the judges they had spent 12 hours processing their datasets. It often seems that, as analysts, we are fighting with our data.


This team from Toronto wrestled with the various sets of customer-indexed data, and came up with a customer segmentation scheme. They utilized a variety of advanced modeling techniques.

The other two finalists in the business competition tackled the same problem: how to measure entertainment value of a game. Their approaches were broadly similar, with each team deploying a hierarchy of regression models. Each model measures a particular contributor to entertainment value, and contains a number of indicators to predict the contribution.

Pictured below is one of the finalists, who deployed Lasso regression, a modern technique to select a subset of important factors from a large number of possibilities. This team has a nice handle on the methods, and notably, was the only team that presented error bars, showing the degree of uncertainty in their results.
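For readers unfamiliar with the Lasso, here is a minimal sketch of how it drops unimportant factors. This is not the finalists' code. The tiny dataset, the penalty `alpha`, and the function names are all made up for illustration, and the solver is a plain coordinate-descent loop in pure Python rather than a production library.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; anything within the threshold becomes 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_passes=100):
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 one coefficient at a time."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_passes):
        for j in range(p):
            rho = 0.0  # average correlation of feature j with the partial residual
            z = 0.0    # average squared value of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return w


# Hypothetical data: three candidate factors, but y is driven almost entirely by the first.
X = [
    [1, 1, 1],
    [-1, 1, -1],
    [1, -1, -1],
    [-1, -1, 1],
]
y = [3.1, -2.9, 2.9, -3.1]

weights = lasso_coordinate_descent(X, y, alpha=0.5)
print(weights)  # the second and third coefficients are set exactly to zero
```

The penalty does two things at once: it zeroes out the weak factors, and it shrinks the surviving coefficient below what ordinary least squares would give. A larger `alpha` drops more factors.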


The winning team in the business competition went a couple of steps beyond. First, they turned in a visual interface to a decision-making tool that scores every game according to their definition of entertainment value. I surmise that they also expressed these scores in a relative way, because some of their charts show positive and negative values. Second, this team from Princeton realized the importance of tying all their regression models together into a composite score. They even allow the decision makers to shift the component weights around. Congratulations to Data Buckets! Here is the pair presenting their decision-making tool:


Mark Tatum, deputy commissioner of the NBA League Office, presented the award to Team Data Buckets:


These two are also bloggers. Look here.
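The idea of rolling several component models into one composite score, with weights the decision maker can shift, can be sketched in a few lines. This is only an illustration under my own assumptions: the component names, the scores, and the `composite_score` function are hypothetical and are not taken from the winning team's tool.

```python
def composite_score(components, weights):
    """Weighted average of component scores; weights are rescaled to sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(components[name] * w for name, w in weights.items()) / total


# Hypothetical component scores for one game, each on a 0-to-1 scale.
game = {"closeness": 0.8, "star_power": 0.5, "pace": 0.2}

# Two hypothetical weightings a decision maker might try.
equal_weights = {"closeness": 1, "star_power": 1, "pace": 1}
tilted_weights = {"closeness": 3, "star_power": 1, "pace": 1}

equal_score = composite_score(game, equal_weights)
tilted_score = composite_score(game, tilted_weights)
print(equal_score, tilted_score)
```

Shifting weight toward the component on which this game scores highest raises its composite score, which is the kind of what-if a manager would want to explore.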

After much deliberation, the basketball analytics judges liked the team representing the Stanford Sports Analytics Club.


These guys tackled the very complicated problem of forecasting future trends in shot selection, using historical data.

For many, maybe most, of the participants, this was their first exposure to real-world datasets, and a short time window to deliver an end-product. Also, they must have learned quite a bit about collaboration.

The organizers should be congratulated for putting together a smoothly-run event. When you host a hackathon, you have to be around throughout the night as well. Also, the analytics department staff kindly simplified the lives of us judges by performing the first round of selection overnight.


Last but not least, I'd like to present the unofficial Best Data Graphics Award to the team known as Quire Sultans. They were a finalist in the basketball analytics contest. I am impressed with this display:


This team presented a new metric using data on passing. The three charts are linked. The first one shows passer-passee data within a specific game; the second shows locations on the court for which passes have more favorable outcomes; the third chart measures players' over/under performance against a model.

There were quite a few graphics presented at the competition. This is one of the few in which the labels were carefully chosen and easily understood, without requiring in-depth knowledge about their analysis.

Where I will be in the next few weeks

It's awfully quiet here lately as I am trying to manage a tight schedule. The problem with a tight schedule is the absence of "slack." Without slack, just one little unexpected event ruins your schedule. Like dominoes, everything gets pushed back. That event arrived in dribs and drabs a couple of weeks ago as a major water leak broke out two floors above my apartment. I am still picking up the pieces.

Last week, I crossed the pond and gave a talk about visual story-telling at the SAS headquarters in the UK. The audience was wonderful and the organizers assembled a great crowd. The event was streamed live to over a thousand viewers all across Europe. Thanks for attending!

Here's me pointing to one of the charts in my presentation:


In the next few weeks, people in the U.S. have a chance to hear a similar presentation. Please come meet me and let me know you read my blog!

Los Angeles, 2/24, 9 a.m. Free registration here

Denver, 3/17, 9 a.m. Free registration here

New York City, 3/24, 9 a.m. Free registration here


In addition, I will be speaking about the ethics of data science at the INFORMS Analytics Conference, in April, in Orlando. The talk will be followed by a panel discussion.

On a related note, rSQUAREedge is hosting a webinar next week by Augustine Fou, who is a digital advertising fraud investigator. This is also free. Fou will talk about the techniques he uses to uncover "bad" data. In this case, "bad" data are data inserted by adversaries to inflate statistics. This is one of the unspoken and worrisome issues in modern data analysis. It is naive to assume that observational, "found" data are free from manipulation.




Round-up of up-coming events

I finally got around to updating the event listings. In the coming months, I will be giving a number of talks on data visualization.

Next week, I will be speaking to the Data Visualization New York meetup, ably organized by Naomi Robbins. The event is heavily over-subscribed, so apologies to those who can't make it in.

In October, I will be offering a short class on data visualization at an executive education event at Columbia University. The event is "Leading Business Change Through Analytics". The fantastic program covers the management and leadership skills necessary to turn data insights into measurable business results. You can still register to attend.

In addition, I will be giving a proseminar at NYU's Applied Quantitative Reasoning program in the Sociology department.

I will also be visiting classes by Andrew Gelman (Columbia) and Ray Vella (NYU) next month.


You can follow my events from my sister blog. Click here and look on the right column.

If you come to one of these events, do come up and say hi!


Summer dataviz workshop to start July 1

Registration is open for my dataviz workshop at NYU. (link)

This is a workshop in the sense of a creative writing workshop. Your "writing" consists of sketches of data visualization based on your selected datasets. In class, we critique all of the work and produce revisions. You will learn to appreciate good dataviz, to offer constructive and insightful commentary on visualization, and to be discriminating in receiving feedback.

Last term, half the class worked on datasets that are related to their jobs. The data sources were diverse, ranging from scholarly citation data, World Bank data, commercial sales and market share data, mountaineering accidents data, standardized testing item data, speeches by death row inmates, juvenile convicts, etc.

Students pick their own tools. They used Excel, PowerPoint, Tableau, d3, etc.

Here is a past syllabus.

The course runs from July 1 to Aug 5. Register here.