Business Insider (via MSN: link) has a well-researched piece on a particular exploit that led to the appearance of "buy cocaine" type ads on government websites. While explaining this exploit, the article describes the foundations of Google's search advertising business, and is well worth reading.
When we enter a search term in the search bar of a website, we get back a page of results showing the links to webpages relevant to the search term. That's the basic functionality expected of "site search".
For example, I go to randomwebsite.com (fake) and search for "UFOs", and the search engine will retrieve links to any page that contains the word "UFOs" (and any other pages that the search algorithm decides I might be interested in, such as pages containing the singular "UFO", or expanded "Unidentified Flying Object", or even other phrases further afield).
Something else has also happened behind the scenes (not always, but many popular site search technologies do this). The website has gained a new page as a result of my search. The new webpage may have a link that looks like this: randomwebsite.com/search?q=ufos. (If a previous user has already tried the same search before, then the page is merely updated, not made anew.)
People selling drugs figured out that they could insert their ads into any website with this type of search technology. They just need to instruct a bot to execute a search on the targeted website - the search terms will be the entire text of the ad they want you to see, e.g. "to buy cocaine, call David at [number]."
Google's bot scans the web to build a library of webpages that will be matched to search terms. During this process, the search-generated pages enter Google's library. If someone searches "buy cocaine" on Google, they may get sent to the webpage created by David's bot mentioned above. In other words, David exploited this technology to get free advertising. When Google returns this webpage, it also may insert more ads onto the page and make money from those ads.
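To make the mechanism concrete, here is a minimal sketch in Python of a site search that stores each result page under its own link. Everything in it is hypothetical (the page contents, the names SITE_PAGES, generated_pages and site_search); it is a toy model of the behavior described above, not the code of any real site search product.

```python
from html import escape
from urllib.parse import quote_plus

# Hypothetical in-memory "site": page link -> page text.
SITE_PAGES = {
    "/ufo-sightings": "Reports of UFOs filed with the agency.",
    "/contact": "How to contact the agency.",
}

# Search-result pages the site has generated so far, keyed by link.
generated_pages = {}


def site_search(query):
    """Run a naive site search and store the result page under its own link."""
    url = "/search?q=" + quote_plus(query.lower())
    hits = [
        path for path, text in SITE_PAGES.items()
        if query.lower() in text.lower()
    ]
    # The result page echoes the query text; the exploit relies on this.
    body = "<h1>Results for: {}</h1>".format(escape(query))
    body += "".join('<a href="{0}">{0}</a>'.format(path) for path in hits)
    # Created on the first search, merely overwritten on later ones.
    generated_pages[url] = body
    return url


normal_url = site_search("UFOs")
ad_url = site_search("to buy cocaine, call David at [number]")

print(normal_url)
print(generated_pages[ad_url])
```

The second search finds nothing on the site, yet the site now hosts a page whose only visible text is the ad. A crawler that picks up that page has no easy way to tell it apart from the first one.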
***
The article - and the industry - likes to call out the Davids as "bad people". But the system has been set up with the door wide open.
One reason why user searches are turned into new webpages is efficiency. People tend to search for the same things. This tactic makes sense if all searches are legitimate. The logic breaks down when some users are not cooperative, i.e. adversarial. Then, the website shouldn't make new webpages for any search terms.
The same type of breakdown appears up and down the system. Google's core algorithm gives preference to well-known, legitimate websites, such as those of government agencies and educational institutions, and these legitimate websites are empowered by the algo to share their reputational wealth with any websites they link to. This was a great idea in the beginning under the assumption that the only reason webmasters make a hyperlink to another page is to endorse the other page as relevant to readers of the current page. Again, this tactic was conceived with only cooperative users in view.
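The idea of sharing "reputational wealth" through links can be sketched with a simplified PageRank-style calculation. This is an illustration only, not Google's actual ranking system; the function rank_pages, the damping value and all the site names are assumptions made up for the example.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Simplified PageRank-style scores; `links` maps page -> pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for target in pages:
                    new[target] += damping * score[page] / n
        score = new
    return score


# Hypothetical link graph: three sites link to the agency,
# and the agency links to exactly one shop.
links = {
    "news-a": ["agency.gov"],
    "news-b": ["agency.gov"],
    "school.edu": ["agency.gov"],
    "agency.gov": ["endorsed-shop"],
    "endorsed-shop": [],
    "unlinked-shop": [],
}
scores = rank_pages(links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

The two shops are identical except for the single link from the agency, and that one link is enough to lift "endorsed-shop" above "unlinked-shop". The calculation has no notion of why the link exists, which is exactly the assumption that adversarial users exploit.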
Quickly, marketers realized that if they could entice a school or a government agency to link to their websites, Google would raise their ranking and send more visitors. All kinds of schemes sprang up to create a hyperlinking economy: e.g. "link exchanges", "selling and buying links". These schemes - again - work fine when all participants are cooperative. Of course, the scammers and criminals walk right through the open door.
The latest exploit is quite ingenious, and goes beyond links. Third parties are able to add webpages directly onto these reputable websites (without explicit permission), and they can insert their own advertising content masquerading as search keywords!
Turning searches into new webpages was once an ingenious idea too. This tactic works only if we assume that all users are cooperative, i.e. that all website searches come from people entering keywords relevant to one's website. The new page created by the search algorithm directly links those keywords to content on the website, therefore increasing the website's visibility on Google, which drives more traffic to the website. Again, the logic breaks down when adversarial users show up with their own agenda - they still drive up traffic to the reputable websites but the wrong types of visitors show up (e.g. cocaine addicts), and they are immediately diverted elsewhere, rendering this traffic worthless to the legitimate website.
Because cooperative and adversarial users both leverage the same technology, you can't have your cake and eat it too... unless you're willing to spend a lot of effort trying to differentiate who is cooperative and who is adversarial. This is easier said than done.
***
Many website owners mark these search-generated pages as "out of bounds" for search engines like Google. Google's bot is supposed to ignore all such pages, so they should not show up in search results. As Business Insider reported, this situation changed due to a recent tweak in the Google algorithm. Even those webmasters who instructed Google's bot to ignore those pages are seeing those pages show up in Google results.
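One widely used convention for marking pages as "out of bounds" is a robots.txt file that asks crawlers to stay away from certain paths. Whether the websites in the article used robots.txt or some other flag is not stated, so treat the following as an assumption. The sketch uses Python's standard urllib.robotparser module to show how a well-behaved bot would read such a rule; the rules and links are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for randomwebsite.com: keep all bots out of /search.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

search_page = "https://randomwebsite.com/search?q=ufos"
normal_page = "https://randomwebsite.com/ufo-sightings"

# A cooperative bot asks before fetching each page.
search_allowed = parser.can_fetch("Googlebot", search_page)
normal_allowed = parser.can_fetch("Googlebot", normal_page)

print("search page allowed:", search_allowed)
print("normal page allowed:", normal_allowed)
```

Note that the file is only a request. Nothing in it forces a bot to comply, so the flag works only as long as the bot's operator chooses to honor it.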
Apparently, Google recently decided to ignore the "out of bounds" flag. The company says that it has developed a smarter algo that can decide whether or not those search-generated pages would be useful to Google. This shifts control of what shows up in Google from the website owners to Google itself. Google probably underestimated the prevalence of adversarial exploits.
(One wonders whether this change in Google's algo is related to the arms race in large language models - as there is a school of belief that these models get more powerful the more data, i.e. text, can be fed into their training. Thus arises a hunger for any and all documents, and a reason to want to kill "out of bounds" flags.)
***
When Google first burst on the scene, it was the only popular search engine on which one couldn't buy placements - its algorithm rated the relevance of the page's content. This was totally fine when the company had no business model and no path to profitability. After it found profits in search advertising, everything changed - it could no longer ignore the class of adversarial users who exploit technologies to improve their ranking on Google's search engine.
Hyperlinks are inserted to drive eyeballs, websites expand their pages by caching search results, and cocaine dealers can add advertising to legitimate websites that Google regards as reputable. Because Google's raison d'etre is digital advertising, it has conflicted interests when it comes to policing such activities.