Something truly game-changing might have happened. According to TheNextWeb, a judge has told LinkedIn (now owned by Microsoft) that it cannot stop people from using scripts to scrape public data off the LinkedIn website.
I have mentioned this issue before - web scraping sits in a legal gray area. To a website, a web scraper looks like a human visitor, but it is a bot that exists solely to collect data from the site. It's clear that the owners of many websites (Kayak, LinkedIn, ESPN, etc.) do not like web scraping: they implement technologies to block such bots.
There are several legitimate business reasons for opposing web scraping. For example, a retailer like Amazon does not want competitors to know all of its pricing. The downside of convenience - 24/7 shopping - is that it reduces information asymmetry. In the pre-Internet days, a store could run special prices in certain locations without word spreading to other locations - but this is no longer the case. Information asymmetry allows profits to materialize.
Besides, the data might have been purchased for real money from some vendor. In this case, the vendor will forbid the buyer from publishing all the data; otherwise, the vendor loses revenue when a potential customer scrapes the data off the buyer's website. The buyer also has an incentive not to give away its data assets, say, to a potential competitor.
From the web scraper's perspective, the data are publicly displayed, so why should there be a restriction on usage? For social-media sites, the data are shared by users, not generated by the site owners, and so ownership of the data is unclear. When site owners include a prohibition on web scraping in their terms and conditions (knowing full well that most people do not bother reading them), this creates at minimum an ethical issue, and possibly a legal issue. However, with the LinkedIn ruling, the clouds may be clearing.