This is amazing. The number of crawlers that don’t respect crawling directives is out of hand. Feed them garbage till they can’t pay their bills.
On the one hand this is good news, because allowing data to be stolen from websites would just financially kill those who produce legitimate data (e.g. news sites whose articles get taken without earning any ad revenue, since only bots visit them).
On the other hand, AI tools will get dumber, which makes me sad, because I personally hoped to one day have a supercharged assistant that reads and summarises the Internet for me. Tools like Perplexity that produce research on a given topic looked very promising.
Calling it “stealing” is a stretch. The website is just serving up a page to a server. There is no theft in that. Is it fair to journalism? Not really, but by your definition any person who views a page is stealing by sending the request to the server.
Unless you are saying that it is somehow theft to deprive them of revenue. In that case, just walking into a store and not buying what the store wants you to buy would be theft.
Imagine a website showing a weather forecast. The maintainers of this page run a webserver and a service that analyses raw meteorological data to calculate tomorrow’s weather. In exchange, they make money from ads.
Now there is an AI agent that visits that page on behalf of a user, gets the forecast and shows it back to the user. The user never sees an ad, and the maintainer never sees their revenue. How is that not stealing?