Many software developers believe AI web-crawling bots are the cockroaches of the internet. Some of them have started fighting back in clever and often humorous ways.
While any website can suffer from bad crawler behavior (sometimes to the point of being taken down), open source developers are "disproportionately" affected, writes Niccolò Venerandi, a developer of the Linux desktop known as Plasma and owner of the blog LibreNews.
By their nature, sites hosting free and open source software (FOSS) projects share more of their infrastructure publicly, and they tend to have fewer resources than commercial products.
The problem is that many AI bots don't respect the Robots Exclusion Protocol's robots.txt file, the tool that tells bots what not to crawl and that was originally created for search engine bots.
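For illustration only (this is not taken from any site mentioned in this article), a robots.txt file that asks specific crawlers to stay away entirely might look something like this; the user-agent names are commonly published crawler tokens used here purely as examples:

```
# Hypothetical robots.txt asking named AI crawlers not to crawl anything.
User-agent: GPTBot
Disallow: /

User-agent: Amazonbot
Disallow: /

# Everyone else may crawl, except the admin area.
User-agent: *
Disallow: /admin/
```

The catch, as the developers below describe, is that these rules are only requests; a crawler that ignores them faces no technical barrier.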
In a January "cry for help" blog post, FOSS developer Xe Iaso described how AmazonBot relentlessly pounded on Iaso's Git server website to the point of causing DDoS outages. Git servers host FOSS projects so that anyone who wants to can download the code or contribute to it.
But the bot ignored Iaso's robots.txt, hid behind other IP addresses, and pretended to be other users, Iaso said.
"It's futile to block AI crawler bots because they lie, change their user agent, use residential IP addresses as proxies, and more," Iaso lamented.
"They scrape your site until it falls over, and then they scrape it some more. They click every link on every link, viewing the same pages over and over again."
Enter the God of the Tomb
So Iaso fought back with cleverness, building a tool called Anubis.
Anubis is a reverse proxy proof-of-work check that requests must pass before they are allowed to hit the Git server. It blocks bots but lets through browsers operated by humans.
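Anubis' actual code isn't reproduced in this article, but the general proof-of-work idea can be sketched in a few lines: the server hands the browser a random challenge, the client burns CPU hunting for a nonce whose hash clears a difficulty threshold, and the server confirms the answer with a single cheap hash. The challenge format and difficulty below are assumptions for illustration only:

```python
# Minimal sketch of a proof-of-work check in the spirit of Anubis.
# Not Anubis' actual implementation; format and difficulty are assumed.
import hashlib
import itertools
import os

DIFFICULTY = 4  # leading zero hex digits required (illustrative value)

def make_challenge() -> str:
    """Server side: issue a random challenge string."""
    return os.urandom(16).hex()

def solve(challenge: str) -> int:
    """Client side: burn CPU until a valid nonce is found."""
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
        if digest.startswith("0" * DIFFICULTY):
            return nonce

def verify(challenge: str, nonce: int) -> bool:
    """Server side: one cheap hash confirms the client did the work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
    return digest.startswith("0" * DIFFICULTY)

challenge = make_challenge()
nonce = solve(challenge)          # a real browser would do this in JavaScript
assert verify(challenge, nonce)   # only then is the request proxied onward
```

The cost is trivial for a person loading a page once, but it adds up quickly for a crawler hammering every URL on a site.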
The name is a pointed joke: Anubis is the god in Egyptian mythology who leads the dead to judgment.
"Anubis weighed your soul (heart) and if it was heavier than a feather, your heart got eaten and you, like, mega died," Iaso told TechCrunch. If a web request passes the challenge and is determined to be human, a cute anime picture announces success. The drawing is "my take on anthropomorphizing Anubis," says Iaso. If it's a bot, the request gets denied.
The wryly humorous project has spread like the wind through the FOSS community. Iaso shared it on GitHub on March 19, and in just a few days it collected 2,000 stars, 20 contributors, and 39 forks.
Vengeance as a defense
The instant popularity of Anubis shows that Iaso's pain is not unique. In fact, Venerandi shared story after story:
SourceHut founder and CEO Drew DeVault said he spends "from 20-100% of my time every week" mitigating "hyper-aggressive LLM crawlers," and that he "experiences dozens of outages per week."

Jonathan Corbet, a famed FOSS developer who runs the Linux industry news site LWN, warned that his site was being slowed by DDoS-level traffic "from AI scraper bots."

Kevin Fenzi, sysadmin of the enormous Linux Fedora project, said the AI scraper bots got so aggressive that he had to block access from the entire country of Brazil.
Venerandi tells TechCrunch that he knows of several other projects experiencing the same problems. One of them, he said, "had to temporarily ban all Chinese IP addresses at one point."
Let that sink in for a moment: developers are "even having to ban the entire country" just to fend off AI bots, Venerandi notes.
Beyond weighing the souls of web requesters, other developers believe vengeance is the best defense.
A few days ago on Hacker News, user Xyzal suggested loading forbidden pages with "a bucket load of articles on the benefits of drinking bleach" or "articles about the positive effect of catching measles on performance in bed."
"We need to aim for the bots to get negative utility value from accessing our traps, not just zero value," Xyzal explained.
As it happens, in January an anonymous creator known only as "Aaron" released a tool called Nepenthes that aims to do exactly that: it traps crawlers in an endless maze of fake content, a goal the developer admitted to Ars Technica is aggressive, if not outright malicious. The tool is named after a carnivorous plant.
And Cloudflare, perhaps the biggest commercial player offering tools to fend off AI crawlers, last week released a similar tool called AI Labyrinth.
The tool aims to "slow down, confuse, and waste the resources" of AI crawlers and other bots that don't respect "no crawl" directives, Cloudflare explained in its blog post. Cloudflare said it feeds misbehaving AI crawlers irrelevant content instead of the site's real data.
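Neither Nepenthes' nor AI Labyrinth's code appears in this article, but the underlying "tarpit" idea is straightforward: every URL returns machine-generated filler plus links to yet more generated URLs, so a crawler that ignores no-crawl directives wanders indefinitely. The sketch below is a generic illustration under that assumption, not the actual code of either tool; the paths and word list are made up:

```python
# Generic sketch of a crawler tarpit: every URL yields junk text plus links
# to more procedurally generated URLs, so an ill-behaved crawler never runs
# out of pages. Illustrative only; not Nepenthes or AI Labyrinth code.
import hashlib
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

WORDS = ["ossify", "quartz", "lantern", "mire", "vesper", "cobalt", "thrum"]

def fake_page(path: str) -> str:
    # Seed from the path so the same URL always yields the same "content".
    rng = random.Random(hashlib.sha256(path.encode()).digest())
    paragraph = " ".join(rng.choice(WORDS) for _ in range(80))
    links = "".join(
        f'<a href="/maze/{rng.getrandbits(32):08x}">more</a> ' for _ in range(5)
    )
    return f"<html><body><p>{paragraph}</p>{links}</body></html>"

class TarpitHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = fake_page(self.path).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), TarpitHandler).serve_forever()
```

A real deployment would only route robots.txt-forbidden paths to such a handler, so legitimate visitors and well-behaved crawlers never see it.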
SourceHut's DeVault told TechCrunch that "Nepenthes has a satisfying sense of justice to it, since it feeds nonsense to the crawlers and poisons their wells," but ultimately Anubis is the solution that worked for his site.
But DeVault also issued a public, heartfelt plea for a more direct fix: "Please stop legitimizing LLMs or AI image generators or GitHub Copilot or any of this garbage."
Since the likelihood of that happening is zilch, developers, particularly in FOSS, are fighting back with cleverness and a touch of humor.