Websites typically try to control how AI bots crawl their data by updating a text file called robots.txt, which implements the Robots Exclusion Protocol. This file has governed how bots scrape the web for decades. It's not illegal to ignore robots.txt, but before the age of AI, following its instructions was generally considered part of the web's social code. Since the advent of AI scraping agents, many websites have tried to limit unwanted crawling by editing their robots.txt files. Services like AI agent watchdog Dark Visitors offer tools to help website owners keep up with the ever-growing number of bots they might want to block, but they're limited by a big loophole: unscrupulous companies tend to simply ignore or evade robots.txt directives.
According to Dark Visitors founder Gavin King, most of the major AI agents still comply with robots.txt. “It was pretty consistent,” he says. But not all website owners have the time or knowledge to constantly update their robots.txt files. And even when they do, some bots will bypass the file directives: “They’re trying to disguise the traffic.”
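The compliance gap described above is easy to see in practice: robots.txt is purely advisory, enforced only by the crawler's own choice to check it. The sketch below uses Python's standard-library `urllib.robotparser` to show what a well-behaved bot does before fetching a page. The crawler name `ExampleAIBot` and the URL are hypothetical, chosen for illustration; a non-compliant scraper would simply skip this check entirely.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that blocks one AI crawler by name
# while allowing all other user agents.
robots_txt = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A compliant crawler calls can_fetch() before requesting a URL
# and honors the answer; nothing technically forces it to.
print(parser.can_fetch("ExampleAIBot", "https://example.com/article"))      # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/article"))      # True
```

Because the check lives in the crawler rather than the server, a site owner can only express a preference here, not enforce it, which is why blocking at the network layer (as Cloudflare proposes) is a categorically different mechanism.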
Prince says Cloudflare's blocking won't be a command this kind of bad actor can ignore. "Robots.txt is like putting up a 'no trespassing' sign," he says. "This is like having a physical wall patrolled by armed guards." Just as it flags other types of suspicious online behavior, such as price-scraping bots used for illegal price monitoring, the company has built processes to detect even the most carefully hidden AI crawlers.
Cloudflare is also announcing an upcoming marketplace for customers to negotiate scraping terms with AI companies, whether it involves paying for content use or bartering credits for using AI services in exchange for scraping. “We don’t really care what the transaction is, but we think there has to be some way to get value back to original content creators,” says Prince. “The compensation doesn’t have to be in dollars. Compensation can be credit or recognition. It could be a lot of different things.”
There is no set date for the launch of this marketplace, but even if it does appear this year, it will join an increasingly crowded field of projects designed to facilitate licensing deals and other permissions arrangements between AI companies, publishers, platforms, and other websites.