



I'd like to comment on myth #5 (Machines will replace humans).

For any particular analysis, machines will likely replace humans over time. Once we figure out that data structure X, preparation steps Y and algorithm set Z provide an answer to a particular set of problems, there is a huge advantage to standardizing that analysis (i.e. embedding it in automated procedures) so the results can be validly compared across time, across markets, etc.

But for every problem we solve, there are new opportunities for analysis, either because the problem can now be explored more deeply (e.g. not "does advertising pay out" but "how do I optimize my advertising"), or because new data become available, or because there are new domains to explore (e.g. this analysis will work for airline frequent flyers -- how can we adapt it for frequent gambler analysis?).

Let's look at aviation: the big prize in the late 1920s was for flying solo across the Atlantic. Once that engineering problem was solved, it evolved into bigger issues -- how do we get lots of people across the Atlantic? Where do we locate waystations? How can we figure out how to lose lots of luggage? What sort of pricing model should be used? (etc., etc.)


Hi Professor Fung,

I'd like to comment on myth #2.

I agree with the argument that code is only a tool and that we do not need much complex, difficult computer science knowledge. Usually we should focus on the problem and use Google and other sources to find the code and solutions we need to solve it.

However, there are many different ways to solve one specific problem. When the data set is small, the choice may make no difference. But when the amount of data is very large, the computer will spend a lot of time running the algorithm. At that point, might deeper knowledge of data structures or computer science play an important role? With that much data, any small difference in approach is magnified into a huge one.

But I am not very sure. My experience in data science is somewhat limited. These are only some naive ideas :)



XG: Sure, for truly large problems, or problems in which you have a microscopic amount of time to do computations, coding skill becomes much more important. But despite what the media reports, most real-world problems do not have those characteristics. Copying and pasting ready-made code is not a bad thing - most people who cook at home consult recipes.

zbicyclist: I like to draw a distinction between engineering problems and statistical problems. Because machines (of today's vintage) operate on, to use your terminology, "data structure X and preparation steps Y and algorithm set Z," which presumably have been "figured out," these machines do not handle uncertainty. They may be able to deal with uncertainty that can be described by a probability model, but that is a severe restriction.


Kaiser Fung. Business analytics and data visualization expert. Author and Speaker.