Big Sites Say No to Apple’s AI Scraping

Since robots.txt has to be manually edited and there are so many new AI agents making their debut, it can be difficult to keep an up-to-date block list. “People just don’t know what to block,” says Dark Visitors founder Gavin King. Dark Visitors offers a free service that automatically updates a client site’s robots.txt, and King says publishers make up a large portion of his clients because of copyright issues.
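To make the mechanics concrete, here is a minimal sketch of how such a block list can be checked, using Python's standard `urllib.robotparser` module. The robots.txt content, the `ExampleAIBot` agent name, and the `blocked_agents` helper are hypothetical and invented for illustration; they do not come from any real publisher's file or from Dark Visitors' service.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; not any real site's file.
SAMPLE_ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

# Agents to check. "ExampleAIBot" is a made-up name; "Applebot" is included
# to show that blocking Applebot-Extended does not block it in this file.
AGENTS = ["Applebot-Extended", "ExampleAIBot", "Applebot"]


def blocked_agents(robots_txt, agents, url="https://example.com/"):
    """Return the agents that the given robots.txt text disallows for url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    print(blocked_agents(SAMPLE_ROBOTS_TXT, AGENTS))
```

Each crawler has to be named in its own `User-agent` entry, which is why a list maintained by hand can fall behind as new agents appear.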

Some outlets specifically note that they block AI scrapers because they currently have no partnerships with their owners. “We block Applebot-Extended across all Vox Media properties, as we have done with many other AI scanning tools when we don’t have a commercial agreement with the other party,” said Lauren Starke, Vox Media’s senior vice president of communications. “We believe in protecting the value of our published work.”

Others will describe their reasoning only in vague but blunt terms. “The team has determined that there is no benefit at this point in allowing Applebot-Extended access to our content,” says Gannett Chief Communications Officer Lark-Marie Antón.

Meanwhile, The New York Times, which is suing OpenAI for copyright infringement, is critical of the opt-out nature of Applebot-Extended and its ilk. “As the law and The Times’ own terms make clear, scraping or using our content for commercial purposes is prohibited without our prior written permission,” said NYT Director of External Communications Charlie Stadtlander, noting that the Times will keep adding unauthorized bots to its block list as it finds them. “Importantly, copyright law still applies whether technical blocking measures are in place or not. Theft of copyrighted material is not something content owners need to opt out of.”

It’s unclear whether Apple is any closer to closing deals with publishers. If or when it does, though, the consequences of any data licensing or sharing arrangements may be visible in robots.txt files even before they are publicly announced.

“I find it fascinating that one of the most significant technologies of our era is being developed and the battle for its training data is being played out in this really obscure text file, publicly available to all of us,” says Gillum.
