This past weekend, I found my way to West Lafayette, Indiana, to speak at the Math, Data Science and Industry Conference, organized by math professor Aaron Yip and Drew Swartz. I was very impressed with the quality and diversity of the talks. The organizers struck a nice balance between academic and industry talks, and the BS quotient was minimal.
I will first outline my own talk. In a later post, I will highlight some of the other talks I attended.
The goal of my talk was to paint a broad-brush picture of the range of jobs that are part of the current Data Revolution, and to give a flavor of the work so that graduate students could decide for themselves whether this "data science" industry is a good fit for them.
One key takeaway is the distinction between research jobs and industry jobs. Research jobs aim to produce innovative work that can be published in scholarly journals. Most industry jobs demand short-term results that impact the business, and it does not matter whether the methods used are innovative. The boom in data jobs, however, is in industry. Only large corporations in cash-rich industries can afford research positions, and even at those firms, there are hundreds if not thousands of industry jobs for each research position. Math graduates can absolutely get hired for industry positions if they put in a little effort to prepare for this career path.
Within industry, I find it useful to distinguish three types of jobs:
- Data science jobs - these are the headline-grabbing jobs because they are disproportionately found in the high-tech industry. Think of these people as software developers with advanced database skills. The culture here is automation: removing human beings from the process.
- Business analytics jobs - these jobs are tethered to business teams, such as marketing, finance, operations and customer service. They are the champions of embedding data analyses in the everyday decision-making processes. They interact constantly with business managers, providing a form of consulting service.
- Data IT jobs - these people keep the data flowing in the organization, so to speak. They are also responsible for "data governance" and standardizing the formats, definitions, quality, etc. of the data. This sector is experiencing rip-roaring growth.
There is a huge need for scientific thinkers and data-savvy people in all three job types, but at least half the open positions are in "business analytics." I discussed two particular skill gaps that hiring managers often complain about in university graduates: (a) the inability to develop the question and (b) not knowing how to question the data.
There is a clear reason why such gaps exist. A typical question we pose to students in a problem set first lays out the problem to be solved, then presents the set of data to be used, and finally challenges the students to plug the data into an appropriate method or framework so that the solution to the problem drops out.
The professor is not going to look kindly on a student who criticizes or revamps the question, or who points out flaws in the data! (University classes teach theory, models and frameworks, so this is not surprising.)
This brings me full circle to the distinction between research and industry jobs. In research, you can "choose your battles" by making certain assumptions to move past obstacles. For example, you assume that the (biased) dataset you obtained is representative; lots of research papers that use observed social-media data do this. You simply argue that bias correction is a separate problem, to be tackled at some other time, perhaps by another research team.
In industry, you don't have that luxury. A great solution to the biased problem may turn out to be a horrible solution to the unbiased one. When I was at SiriusXM, we had some data on people's online listening patterns but almost nothing on their in-car listening. Building great models on the online data wasn't going to do much good, because most of the listening happens in the car, and people who listen online are quite different from those who listen in the car.
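To put a rough number on how far off a biased sample can lead you, here is a minimal Python sketch. All figures are hypothetical and invented for illustration; they are not SiriusXM data. The sketch compares an estimate of average listening time computed only from the segment you can observe (online listeners) with the share-weighted figure you would get if the in-car segment were measured too. In practice, the in-car numbers are exactly what was missing, so the second figure could not actually be computed; the point is only to show how large an error can hide in that gap.

```python
# Hypothetical illustration of sampling bias: every number here is invented.
# Each segment has a share of all listeners and a mean daily listening time.
SEGMENTS = {
    "online": {"share": 0.2, "mean_hours": 1.0},  # the segment we observe
    "in_car": {"share": 0.8, "mean_hours": 2.5},  # the segment we barely see
}


def naive_estimate(segments: dict) -> float:
    """Mean listening time using only the observed (online) segment."""
    return segments["online"]["mean_hours"]


def population_estimate(segments: dict) -> float:
    """Share-weighted mean over all segments (needs data we often lack)."""
    return sum(s["share"] * s["mean_hours"] for s in segments.values())


if __name__ == "__main__":
    naive = naive_estimate(SEGMENTS)
    full = population_estimate(SEGMENTS)
    print(f"online-only estimate: {naive:.2f} hours")
    print(f"population estimate:  {full:.2f} hours")
    print(f"relative error of online-only estimate: {(naive - full) / full:.0%}")
```

With these made-up inputs, the online-only figure (1.00 hours) understates the population figure (2.20 hours) by more than half. No amount of modeling sophistication applied to the online segment alone would reveal that gap.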
Towards the end of the talk, I pointed out that to do well in these data jobs, one must be comfortable living in the "gray" areas: the gray between science and social science, between models and heuristics, and between data and intuition.
People were very friendly, and we had some fine conversations at a bar after the day was over. I'm happy to report that at least a few attendees told me they want to pursue these industry jobs.
This looks like a very interesting talk, and relevant for the insurance industry (where I work). Are the slides also available?
Posted by: Dave C. | 12/07/2018 at 09:55 AM