The time of bird seeds and chart tuneups

The recent post about multi-national companies reminded me of an older post, in which I stepped through data table enhancements.

Here is a video of the process. You can use any tool to implement the steps; even Excel is good enough.

 

 

The video is part of a series called "Data science: the Missing Pieces". In these episodes, I cover the parts of data science that fall between the cracks, the little things that textbooks and courses do not typically cover - the things that often block students from learning efficiently.

If you have encountered such things, please comment below to suggest future topics. What is something about visualizing data you wish you had learned formally?

***

P.S. Placed here to please the twitter-bot

DSTMP2_goodchart_thumb

 

 


Announcement: Advancing your data skills, Fall 2019

Interrupting the flow of dataviz with the following announcement.

If you're looking to shore up your data skills, modernize your skill set, or know someone looking for hands-on, high-touch instruction in Machine Learning, R, Cloud Computing, Data Quality, Digital Analytics, A/B Testing and Financial Analysis, Principal Analytics Prep is offering evening classes this Fall. Click here to learn about our courses.

Our instructors are industry veterans with 10+ years of practical experience. Class size is capped at 10, ensuring a high-touch learning environment.

Facebook_pap_parttimeimmersive_tree

 


Report from Data Visualization Meetup

On Monday, Principal Analytics Prep sponsored the Data Visualization Meetup, organized by the indefatigable Naomi Robbins. The keynote speaker was NYU professor Kristen Sosulski, who just published a book titled “Data Visualization Made Simple” (link).

At the Meetup, we announced a Part-Time Immersive Program. This allows the completion of the Certified Data Specialist program in three levels on a more relaxed, evening schedule. Level 1 will run two nights a week for 12 weeks, starting Spring 2019. For more details, contact us here.

***

Kristen, a professor in the Stern School, has an interesting take on the data visualization function – placing it within the larger enterprise. In the first part of her talk, she presents a number of real-world case examples of how data analysts used data visualization to create impact within an organization.

The end goal in each of these projects is a “business insight” that is delivered to decision-makers with the primary goal of persuasion – something I also emphasize in my own seminars. It’s not that data visualization isn’t used for analysis, exploration, story-telling (see postscript), and so on, but at the tail end of the process, the need to persuade becomes paramount.

***

For example, the graphic on the cover of her book is from a project undertaken by Jet.com, the online retailer purchased by Walmart. The managers were interested in customers' purchasing patterns, and generally viewed products as “consumables” or “durables,” the latter having lower purchasing frequencies. The nodes in the network graph are colored accordingly. From the links between these nodes, the analyst concluded that certain products (an example given was batteries) are considered durables but have purchasing patterns that look more like those of consumables.

Kristen’s message is how the data were turned into a business insight (the “story”), which impressed the managers enough that they took action by adjusting orders and inventories.

Kristen described other examples such as the use of salary data to place employees into bands, or the use of predictive models to predict which partners in a venture-capital firm will bring in more investment. Many of these examples make me believe that a course on causal reasoning should be required for all data analysts.

***

The second half of Kristen’s talk addresses how to raise the profile of data visualization within an enterprise. This is a clearly needed discussion. More and more industry jobs specific to data visualization are being created, so these new teams must establish themselves within the corporate culture. Kristen recommends a five-step process, starting with establishing a data practice and ending with measuring one’s impact.

In answering my question about evangelizing new visualization formats to replace inferior existing chart designs, she emphasizes the need to involve stakeholders early in the process. Don't surprise them with something novel during a meeting.

We were pleased that people braved the adverse weather to attend Kristen’s talk, and good pizza was served at the end of the evening.

 

P.S.

The word “story-telling” seems to have gone from hero to villain lately. Some commenters are thinking the word “story” implies made-up fiction, and thus oppose its use. A related complaint concerns the “subjectivity” of stories. Once you realize that most of our data sources are observational in nature, you will soon discover that causal reasoning entails the selection of the most plausible story among many. Statisticians and others have come up with causal models, which are sets of equations used to describe relationships between data, but all of these rely on causal assumptions. In essence, they are structured ways to select the most plausible story. It’s dangerous to see these models as “objective.”


Teaser: new post, tomorrow's event

I have written a new post, which will appear in a guest blog this week. The chart in question is this puzzle:

Businessinsider_ibankers

Not an easy dataset to deal with... will link to the post when it appears.

***

For those looking to learn data science and advanced analytics skills, I'll be answering your questions about Principal Analytics Prep tomorrow night at The Ginger Man at East 36th Street in Manhattan. Some instructors and alumni will be there to talk about their experiences with our programs. Come join us!

To register for this free event, go to our Eventbrite page.

Nov27_2018-Principal-Analytics-Prep-InfoSession


Monday after Thanksgiving, see you at the Data Visualization Meetup

My little analytics training startup, Principal Analytics Prep, is proud to sponsor the next meeting of the Data Visualization New York Meetup, organized by the indefatigable Naomi Robbins. The Meetup, to be held on Monday, November 26, 2018, headlines NYU professor Kristen Sosulski, who will discuss the “business case for data visualization”. Click here to register.

Naomi is a long-time friend. I reviewed and recommended her book on Creating More Effective Data Graphics in 2008. This is still a useful reference to some key concepts, presented in a clean, easily digestible format.

***

The keynote speaker, Kristen, has just published Data Visualization Made Simple, described as a “top book for computer science students.” I am very much looking forward to her talk because the subject speaks to the core of the mission of Principal Analytics Prep: placing data science & analytics in the context of the entire enterprise.

That’s why my bootcamp and training programs emphasize the Three Pillars: computing, statistics, and business. Our instructors are practitioners with 10 to 30 years of learning from real-world implementation of models and systems used by forward-thinking, data-driven organizations.

The Spring 2019 cohort of our Certified Data Specialist bootcamp is open for applications. If you're looking to transition your career into data science and advanced analytics, check us out. Here are some of the great things our alums have said about the program. Take advantage of the early admit deadline by applying by December 10, 2018. Click here for more information.

***

I'm excited to hear what Kristen has to say about using data visualization in the business world. Kristen's website is here. She teaches at NYU's Stern School of Business, and describes herself as a computer scientist. Here's the link to the Amazon page. Unfortunately the page is not very informative about the book's content. The table of contents is minimalist ("The Design", "The Audience", etc.). I will report back on what she spoke about after the Meetup.

 

 


Information Session on our Data Analytics Bootcamp

Next Monday (Aug 13), we are hosting an information session on our Data Analytics Bootcamp at our office in New York City. The bootcamp has been successful at launching business careers for graduates starting out in the data science and analytics sector - one of the hottest sectors in the economy right now.

There are many possible paths to a data analytics job. Here are just a few we have assisted:

  • Dropping out of medical school, and becoming a data scientist at a large health insurer
  • Moving from an operations role at a non-profit to a marketing analytics position at an international advertising agency
  • Switching from analyzing environmental data for a government agency to being a data scientist for an analytics consultancy
  • Leaving the academic instructor position behind to join a major agricultural firm as their first data scientist
  • Quitting a lab assistant position, and joining an exciting tech startup as a data scientist

One of our first graduates remarked:

"Kaiser, the program's founder, teaches a fantastic class on statistical reasoning that until this day causes me to question assumptions behind analyses and models I see. The other instructors were also a joy to learn from, and teach you not just the technical material but also how it is applied in their various industries... I ended up with multiple job offers, just from the connections I formed in this program. I simply can't recommend this program highly enough."

***

To learn more about our program, come meet our instructors and alumni at our Information Session. Click here to register.


Upcoming talks and workshops, NYC, Seattle, Philadelphia, St. Louis

The following talks/events are all free and open to the public. If I'm in your neighborhood, please come by and say hi.

Kaiser_fung_talks_feb_mar_2018

 

You can register or learn more about the above talks at the following links:

Feb 20, 2018 (tonight, NYC) - Principal Analytics Prep Open House, with me and Tina Lowry, talking about the data analytics field and how to get a job in this space. More information here.

Feb 22, 2018 (Seattle) - A talk about best practices in data visualization for business presentations. Sign up here.

Feb 28, 2018 (NYC) - Analytics Resume Workshop, jointly hosted by Principal Analytics Prep and New York Public Library Job Search Central: we provide free advice on improving your resume to appeal to analytics and data science hiring managers. Register at our Meetup group here.

March 7, 2018 (Philadelphia) - A talk about best practices in data visualization for business presentations. Sign up here.

The last event, listed below and part of the Midwest Digital Marketing Conference, has a small fee.

March 26, 2018 (St. Louis) - Workshop on data visualization: simple things you can do to make even Excel charts better! Sign up here (scroll to the bottom of the page).


Dataviz Seminar and other upcoming events

Please help me spread the word on several upcoming events. If you're coming, please say hi!

 

Data Visualization Seminar - JMP Explorers Series

WHEN: October 4, 2017, Wed, 9 am - 2:30 pm (ET)
WHERE: New School, 63 5th Avenue, New York
REGISTER HERE: Link

In this seminar, I offer tips on making effective visualizations of data, summarizing over a dozen years of critiquing thousands of data graphics.

PS. New Yorkers: I typically start the seminar with an example of dataviz with a local flavor. If you've seen something interesting recently, send it my way!

 

Principal Analytics Prep Information Session & Webinar on Digital Ad Fraud Analytics

WHEN: October 11, 2017, Wed, 7 - 8 pm (ET)
WHERE: Online
REGISTER HERE: Link

In this webinar, I will discuss the data analytics revolution, and answer questions on how to start or develop your career in this exciting field. In addition, I invited Dr. Augustine Fou, a leading ad fraud researcher, to comment on the recent scandals of fake data in digital advertising. Augustine and I raised the alarm on this huge problem in a Harvard Business Review article in 2015!

Earlier this year, I launched Principal Analytics Prep, an intensive, 12-week bootcamp, created and staffed by leading industry experts, designed to open doors to new careers in data analytics and data science. In the past 15 years, I established and led data teams at SiriusXM Radio and Vimeo, in addition to teaching and running academic programs at Columbia and NYU.

How to Break into the Hottest Sector of the Job Market: Data Science & Analytics

WHEN: October 12, 2017, Thur, 6 - 7:30 pm (ET)
WHERE: New York Public Library, Small Business & Industry Library (SIBL), 188 Madison Avenue, New York
MORE INFO: Link to NYPL

In this talk, I discuss what data science & analytics is, why the sector is exploding, what trends are driving such growth, and how you can take advantage of this jobs boom.

If you can't make it in person, a short version of this talk will be presented at the Principal Analytics Prep online information session mentioned above. You can register here.


Report from the NBA Hackathon 2017

Yesterday, I had the honor of being one of the judges at the NBA Hackathon. This is the second edition of the Hackathon, organized by the NBA League Office's analytics department in New York. Here is Director of Basketball Analytics, Jason Rosenfeld, speaking to the crowd:

IMG_7112s_jr

The event was a huge draw - lots of mostly young basketball enthusiasts trying their hand at manipulating and analyzing data to solve interesting problems. I heard there were over 50 teams that showed up on "game day." Hundreds more applicants did not get "drafted." Many competitors came from out of town - amongst the finalists, there was a team from Toronto and one from Palo Alto.

The competition was divided into two tracks: basketball analytics, and business analytics. Those in the basketball track were challenged with problems of interest to coaches and managers. For example, they were asked to suggest a rule change that might increase excitement in the game, and support that recommendation using the voluminous spatial data. Some of these problems are hard: one involved projecting shot selection ten years out - surely fans want to know if the craze over 3-pointers will last. Nate Silver was one of the judges for the basketball analytics competition.

I was part of the business analytics judging panel, along with the fine folks shown below:

IMG_7247s_judges

The business problems were challenging as well, and really tested the competitors' judgment, as the problems were open-ended and subjective. Technical skills were also required, as very wide-ranging datasets were made available. One problem asked contestants to combine a large number of datasets to derive a holistic way to measure the "entertainment value" of a game. The other problem was even more open: do something useful and interesting with our customer files.

I visited the venue the night before, when the teams were busy digging into the data. See the energy in the room here:

IMG_7110s_work

The competitors are given 24 hours to work on the datasets. This time includes making a presentation to showcase what they have found. They are not allowed to utilize old code. I overheard several conversations between contestants and the coaches - it appeared that the datasets are in a relatively raw state, meaning quite a bit of time would have been spent organizing, exploring, cleaning and processing the data.

One of the finalists in the business competition started their presentation by telling the judges they spent 12 hours processing their datasets. It does often seem that, as analysts, we are fighting with our data.

IMG_7250s_team2

This team from Toronto wrestled with the various sets of customer-indexed data, and came up with a customer segmentation scheme. They utilized a variety of advanced modeling techniques.

The other two finalists in the business competition tackled the same problem: how to measure entertainment value of a game. Their approaches were broadly similar, with each team deploying a hierarchy of regression models. Each model measures a particular contributor to entertainment value, and contains a number of indicators to predict the contribution.

Pictured below is one of the finalists, who deployed Lasso regression, a modern technique to select a subset of important factors from a large number of possibilities. This team has a nice handle on the methods, and notably, was the only team that presented error bars, showing the degree of uncertainty in their results.

IMG_7252s_team3
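For readers who have not met Lasso before, here is a minimal sketch of the idea, written in plain Python. It is not the team's code, and I did not see their implementation. The function names, the toy data, and the penalty value are all hypothetical, and the sketch assumes the columns are centered and on comparable scales. It uses coordinate descent with soft-thresholding, one common way to fit a Lasso model.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero; return exactly 0.0 inside the threshold band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1/(2n)) * ||y - X b||^2 + alpha * ||b||_1 by coordinate descent.

    X is a list of rows. Columns are assumed centered and on comparable scales
    (an assumption of this sketch, not something stated in the post).
    """
    n, p = len(X), len(X[0])
    coefs = [0.0] * p
    residual = list(y)  # equals y - X @ coefs, since coefs start at zero
    for _ in range(n_iter):
        for j in range(p):
            column = [row[j] for row in X]
            scale = sum(x * x for x in column) / n
            if scale == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes column j.
            rho = sum(x * (r + x * coefs[j]) for x, r in zip(column, residual)) / n
            new_coef = soft_threshold(rho, alpha) / scale
            delta = new_coef - coefs[j]
            residual = [r - x * delta for x, r in zip(column, residual)]
            coefs[j] = new_coef
    return coefs


# Hypothetical toy data: three candidate factors, only the first matters much.
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y = [2.1, -1.9, 1.9, -2.1]
print(lasso_coordinate_descent(X, y, alpha=0.5))
```

The penalty `alpha` controls how aggressively weak factors are dropped: coefficients whose signal falls below it are set exactly to zero, which is what makes Lasso a factor-selection tool rather than just a shrinkage method.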

The winning team in the business competition went a couple of steps beyond. First, they turned in a visual interface to a decision-making tool that scores every game according to their definition of entertainment value. I surmise that they also expressed these scores in a relative way, because some of their charts show positive and negative values. Second, this team from Princeton realized the importance of tying all their regression models together into a composite score. They even allow the decision makers to shift the component weights around. Congratulations to Data Buckets! Here is the pair presenting their decision-making tool:

IMG_7249s_databuckets
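To make the composite-score idea concrete, here is a rough sketch in Python. It is my own illustration, not Data Buckets' tool. The component names, scores, and weights are made up, and centering the scores on the average game is only my guess at why their charts showed positive and negative values.

```python
def composite_scores(components, weights):
    """Combine per-game component scores into one relative score.

    components: {game: {component_name: score}} (hypothetical structure)
    weights: {component_name: non-negative weight}, rescaled here to sum to 1
    Returns scores centered on the average game, so values can be negative.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    norm = {name: w / total for name, w in weights.items()}
    raw = {
        game: sum(norm[name] * parts[name] for name in norm)
        for game, parts in components.items()
    }
    mean = sum(raw.values()) / len(raw)
    return {game: score - mean for game, score in raw.items()}


# Hypothetical component scores for two games.
games = {
    "game_a": {"closeness": 0.9, "star_power": 0.5},
    "game_b": {"closeness": 0.3, "star_power": 0.7},
}

# A decision maker who cares mostly about close games...
print(composite_scores(games, {"closeness": 3, "star_power": 1}))
# ...versus one who cares mostly about star players.
print(composite_scores(games, {"closeness": 1, "star_power": 3}))
```

Letting the user move the weights, as the winning team did, makes the subjectivity of "entertainment value" explicit instead of burying it inside the model.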

Mark Tatum, deputy commissioner of the NBA League Office, presented the award to Team Data Buckets:

IMG_7279s_winner

These two are also bloggers. Look here.

After much deliberation, the basketball analytics judges liked the team representing the Stanford Sports Analytics Club.

IMG_7281s_winner

These guys tackled the very complicated problem of forecasting future trends in shot selection, using historical data.

For many, maybe most, of the participants, this was their first exposure to real-world datasets, and a short time window to deliver an end-product. Also, they must have learned quite a bit about collaboration.

The organizers should be congratulated for putting together a smoothly-run event. When you host a hackathon, you have to be around throughout the night as well. Also, the analytics department staff kindly simplified the lives of us judges by performing the first round of selection overnight.

***

Last but not least, I would like to present the unofficial Best Data Graphics Award to the team known as Quire Sultans. They were a finalist in the basketball analytics contest. I am impressed with this display:

IMG_7259s_bestchart

This team presented a new metric using data on passing. The three charts are linked. The first one shows passer-passee data within a specific game; the second shows locations on the court for which passes have more favorable outcomes; the third chart measures players' over/under performance against a model.

There were quite a few graphics presented at the competition. This is one of the few in which the labels were carefully chosen and easily understood, without requiring in-depth knowledge about their analysis.


Announcing a new venture

This is a great time for people in the data business. If you go on LinkedIn and look for data jobs, there are several thousand open positions, just in the New York area. Every department within any business is accumulating data, and they need people to help them get value out of the data.

There are also lots of people I meet who would like to transition their careers to take advantage of these open positions but too many of them are being turned away. Many of these people have great backgrounds in other fields (economics, chemistry, psychology, engineering, IT, etc.), and have the analytical smarts to excel in these new data jobs. They are not getting hired. That's because as hiring managers, we prefer hiring the experienced person who doesn't need additional training. We also poach experienced people from other employers, instead of training new talent, creating a vicious cycle.

This is the problem that I am trying to solve by launching my new venture - Principal Analytics Prep.

 

We_make_data_unicorn_design

 

The single biggest complaint about the talent pool by hiring managers is that people's skills are too narrow, sometimes too technical, sometimes too "soft". Hiring managers in the business units outside engineering/software development (for example, marketing, operations, finance, and customer service) want to hire people who can analyze and interpret data in the business context, communicate findings to non-technical audiences, and contribute to inter-departmental working teams to solve business problems.

For Principal Analytics Prep, I have assembled a group of passionate instructors - who are in director or above positions in industry, and hiring managers for their teams - to design a broad-based curriculum that helps people upgrade their skills to meet industry needs. Our courses range from coding to statistical reasoning to business skills. The faculty have worked at places such as American Express, Cisco, Goldman Sachs, HBO, McKinsey, Mount Sinai, SiriusXM Radio, and Vimeo, with an average of 10 years in industry.

We are not a pure coding academy, so we want to bring together people from all disciplines.

We will be launching the first class of students this summer in NYC.

***

Blog readers, you can help me in the following ways:

  • If you know anyone who's looking to upgrade their skills and get into the business analytics/data science field, tell them about the program
  • If you are interested in teaching a course, contact me
  • I am also looking for part-time help with administration and operations, so if you believe in my vision, contact me

If you have suggestions, please leave a comment. Thank you.