I was going to write about the showdown between Google and Bing that shook up the tech world a few weeks ago, but that would be superfluous: John Langford has already written an excellent post that matches my own point of view.
Let me provide the background to the dispute, and then leave you in John's capable hands.
***
Around 1997, Google emerged from the labs of Stanford University with a revolutionary new algorithm that ranks Web pages according to their relationships to other Web pages. In brief, a page is given a higher rank if it has more "friends" (pages linking to it) and if those friends are more "influential". The technology is considered a scintillating example of building a business on top of probability and statistics theory. Google has since become the dominant player in the search engine business.
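To make "friends" and "influence" concrete, here is a minimal Python sketch of the textbook PageRank calculation by power iteration. The damping factor of 0.85, the function name `pagerank`, and the four-page toy graph are illustrative assumptions; this is the classic published formulation, not whatever Google runs in production today.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Textbook PageRank by power iteration (illustrative sketch).

    links maps each page to the list of pages it links to.
    Pages with no outgoing links spread their rank evenly over all pages.
    """
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank everywhere
    for _ in range(max_iter):
        # Every page gets a baseline share from the "random jump" term.
        new = {p: (1.0 - damping) / n for p in pages}
        # Rank held by pages with no out-links is redistributed uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        for p in pages:
            new[p] += damping * dangling / n
        # Each page splits its current rank evenly among the pages it links to,
        # so a link from a high-ranked ("influential") page is worth more.
        for p, targets in links.items():
            if targets:
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new[q] += share
        converged = sum(abs(new[p] - rank[p]) for p in pages) < tol
        rank = new
        if converged:
            break
    return rank


# Hypothetical toy web: A links to B and C, B and D link to C, C links to A.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
for page, score in sorted(pagerank(toy_web).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
# Order is C, A, B, D: C has the most friends; A and B each have one
# friend, but A's friend (C) is more influential than B's friend (A).
```

The toy graph shows both effects the paragraph describes: C wins on number of friends, while A beats B purely on the influence of its single friend.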
In 2009, Microsoft launched a "new" search engine branded Bing. Microsoft had suffered in Google's shadow in the search engine sector for many years, but despite the long odds, Bing has taken some market share from Google, although not very much. Bing, of course, also has an algorithm underlying it, and it has the advantage of learning from what others have done in the intervening years.
Recently, Google engineers publicly mocked Microsoft engineers for "plagiarizing" their search engine. They charged that Microsoft collects data on what users search for and where they land after each search (perhaps via Internet Explorer, perhaps via the Bing toolbar), and that the Bing algorithm then uses this data to decide which search results to display. The Google engineers described a sort of sting operation, using made-up queries and web pages, to demonstrate the effect.
From the perspective of someone using a search engine, one might be excited by Bing's incorporation of this new source of data. There is no doubt that such relevant information will improve the ability of search engines to rank web pages. Why wouldn't one use this data? (Whether Microsoft should quietly collect this information from users is a different matter, as John pointed out.)
From the perspective of an engineer developing algorithms, one might feel disturbed that one's work can be so easily "reverse-engineered". One could see how the engineers felt "robbed in daylight". This issue is larger than Google or Bing.
An analogous, though not identical, situation is the attempt to "repair" or "trick" credit-scoring algorithms, described in Chapter 2 of Numbers Rule Your World, where I explain why it is often better to keep these algorithms locked up: public disclosure may not be in the public's best interest. Pretty much any algorithm can be gamed if it is not a "black box".
***
Read John's take, and tell me what you think.