Last week, I gave some brief remarks at the INFORMS New York Metro Student-Practitioner Forum (link), attended by a large group of enthusiastic students eager to enter the field of data science and analytics. (By the way, if you are at the INFORMS Analytics Conference in Orlando, come and find me. I am speaking on Ethics on Tuesday morning.)
I told the students that it is not too early to dispense with some myths about careers in data science and analytics. The sooner they get rid of these wrong ideas, the brighter their future in this field.
Myth 1: Data science & analytics is all about coding and tools.
Wrong. Data science & analytics is all about problem solving. Coding and tools are useful for problem solving, but it is more important to be able to frame the business problem, collect good data, understand your data, etc.
Myth 2: Coding is hard.
Wrong. Coding is easy. It may not feel that way. The media tells us that we need coding bootcamps. Many academic degrees focus their training on R, Python, and other coding platforms. The truth is that Google, StackExchange and similar websites have made coding as trivial as copying and pasting bits and pieces of text. The hard part is knowing what you want to do with the data; once you know that, it is only a few clicks to code it up. [PS. This is not meant to say that there aren't better and worse coders.]
Myth 3: Data analysis is easy.
Wrong. Data analysis is really hard. The myth that it is easy arises because we have great tools that generate output quickly with a few clicks. But using a standard methodology does not guarantee good results. I have various examples of good methods delivering horrible results in the Prologue to my book Numbersense (link).
Myth 4: Data science & analytics is fun.
Wrong, unless you like laborious and tedious work. As an analyst, you will spend 80 percent of your time doing grunt work, such as diagnosing errors in the data and correcting data issues. Something as simple as correcting the format of a date may require you to move data through multiple servers (see here). You will need to probe to the deepest level of the data generation process, which frequently means you need to get in the face of engineers and others who have other pressing concerns. This part of life is to be tolerated in order to enjoy the fun parts.
Myth 5: Machines will replace humans.
To end my comments on a high note, I believe this career is unlikely to be replaced by machines. Here is some food for thought:
Last year, I did an article for 538 about New York City restaurant health inspection ratings. In the dataset, each restaurant is classified by cuisine type, and that field has its share of errors. Imagine that there is a business called Ivory that is described as a Thai restaurant in the data. You live next to Ivory, and so you know that it is an Ethiopian restaurant, not Thai.
How would a machine figure out this error? It wouldn't. If one wants to argue, one could say that the machine can go and collect a huge dataset, find all co-mentions of the restaurant Ivory with a cuisine type, compute relative frequencies, and finally select the cuisine with the greatest likelihood of being correct.
So we have humans who need a sample size of one to get a guaranteed correct answer, and machines that need massive data to get a sometimes-wrong answer.
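For the curious, the machine's approach described above (tally co-mentions, compute relative frequencies, pick the most frequent cuisine) might look something like the rough Python sketch below. The function name and the handful of mentions are made up purely for illustration; a real system would need a far larger collection of text, which is exactly the point.

```python
from collections import Counter

def most_likely_cuisine(mentions, restaurant):
    """Return the most frequently co-mentioned cuisine and all relative frequencies.

    `mentions` is a list of (restaurant_name, cuisine) pairs; this input
    format is an assumption for this sketch, not anything from the 538 data.
    """
    counts = Counter(cuisine for name, cuisine in mentions if name == restaurant)
    total = sum(counts.values())
    if total == 0:
        return None, {}  # the machine has nothing to go on
    freqs = {cuisine: n / total for cuisine, n in counts.items()}
    best = max(freqs, key=freqs.get)
    return best, freqs

# Invented co-mentions, for illustration only.
mentions = [
    ("Ivory", "Ethiopian"),
    ("Ivory", "Thai"),  # e.g. a site that copied the erroneous label
    ("Ivory", "Ethiopian"),
    ("Ivory", "Ethiopian"),
    ("Other Place", "Thai"),
]

best, freqs = most_likely_cuisine(mentions, "Ivory")
print(best, freqs)
```

Notice that the answer is only as good as the mentions. If enough sources had copied the wrong "Thai" label, the same code would confidently return the wrong cuisine, while the neighbor still needs only one look.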