Fyself News

AI crawlers cause Wikimedia Commons bandwidth demands to surge 50%

By user | April 2, 2025

The Wikimedia Foundation, the umbrella organization of Wikipedia, said Wednesday that bandwidth consumption for multimedia downloads from Wikimedia Commons has surged 50% since January 2024.

The reason, the foundation wrote in a blog post on Tuesday, is not growing demand from knowledge-hungry people but automated, data-hungry scrapers looking to train AI models.

“Our infrastructure is built to sustain sudden traffic spikes from humans during high-interest events, but the amount of traffic generated by scraper bots is unprecedented and presents growing risks and costs,” the post reads.

Wikimedia Commons is a freely accessible repository of images, videos and audio files available under an open license or in the public domain.

Digging into the numbers, Wikimedia says that almost two-thirds (65%) of its most “expensive” traffic, that is, the most resource-intensive in terms of the kind of content consumed, comes from bots. However, only 35% of overall pageviews come from these bots. According to Wikimedia, the reason for this disparity is that frequently accessed content stays closer to the user in a cache, while less frequently accessed content is stored farther away in the “core data center,” where it is more expensive to serve. This is the kind of content that bots typically go looking for.
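
To make the disparity concrete, here is a rough back-of-the-envelope sketch. It assumes (this is not stated in Wikimedia's post) that the 65% and 35% shares are measured over comparable request counts and that all traffic is either bot or human. The function name is hypothetical.

```python
def expensive_rate_ratio(bot_share_expensive, bot_share_pageviews):
    """Estimate how many times more likely a bot pageview is to be
    'expensive' than a human pageview, given the two bot shares.

    Assumption: traffic is split into exactly two groups (bots, humans)
    and both shares are counted in the same unit.
    """
    human_share_expensive = 1.0 - bot_share_expensive
    human_share_pageviews = 1.0 - bot_share_pageviews
    # Expensive requests per pageview, up to a common constant factor.
    bot_rate = bot_share_expensive / bot_share_pageviews
    human_rate = human_share_expensive / human_share_pageviews
    return bot_rate / human_rate


# Figures quoted in the article: 65% of expensive traffic, 35% of pageviews.
ratio = expensive_rate_ratio(0.65, 0.35)
print(f"{ratio:.2f}")  # prints 3.45
```

Under those assumptions, a bot pageview would be roughly three and a half times as likely as a human one to land on the expensive path.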

“While human readers tend to focus on specific (often similar) topics, crawler bots tend to ‘bulk read’ larger numbers of pages and visit the less popular pages as well,” Wikimedia writes. “This means these types of requests are more likely to get forwarded to the core data center, which makes them much more expensive in terms of consumption of our resources.”
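
The effect Wikimedia describes can be illustrated with a toy cache simulation. This is only a sketch under invented assumptions: a small least-recently-used (LRU) cache stands in for the edge cache, "humans" keep re-reading ten popular pages, and a "crawler" requests a thousand distinct pages once each. The class, the numbers, and the eviction policy are all hypothetical and say nothing about Wikimedia's real setup.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache standing in for an edge cache (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def request(self, key):
        """Return True on a cache hit, False on a miss.

        A miss models a request forwarded to the core data center.
        """
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            return True
        self.items[key] = True
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used
        return False


def miss_rate(cache, requests):
    """Fraction of requests that missed the cache."""
    misses = sum(1 for page in requests if not cache.request(page))
    return misses / len(requests)


# Hypothetical workloads: 1,000 requests each.
human_requests = [i % 10 for i in range(1000)]  # 10 popular pages, re-read
crawler_requests = list(range(100, 1100))       # 1,000 distinct pages, once each

human_miss = miss_rate(LRUCache(50), human_requests)
crawler_miss = miss_rate(LRUCache(50), crawler_requests)
print(human_miss, crawler_miss)  # prints 0.01 1.0
```

In this toy setup the human workload misses only on the first visit to each popular page, while every crawler request misses, which is the pattern that makes bulk reading so costly to serve.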

The upshot of all this is that the Wikimedia Foundation's site reliability team must spend a lot of time and resources blocking crawlers to avoid disruption for regular users. And all of this comes before considering the cloud costs the Foundation faces.

In truth, this is part of a fast-growing trend that threatens the very existence of the open internet. Last month, software engineer and open source advocate Drew DeVault lamented the fact that AI crawlers ignore “robots.txt” files, which are designed to ward off automated traffic. “Pragmatic Engineer” Gergely Orosz also complained last week that AI scrapers from companies such as Meta have driven up bandwidth demands for his own projects.
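
For readers unfamiliar with “robots.txt”, the sketch below shows how a well-behaved crawler could check the file before fetching a URL, using Python's standard `urllib.robotparser`. The rules, the `ExampleAIBot` and `SomeBrowser` agent names, and the `example.org` URLs are made up for illustration. Compliance is voluntary, which is exactly the problem DeVault describes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed from a string so that the
# example needs no network access.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(user_agent, url):
    """Return True if the parsed rules allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


print(may_fetch("ExampleAIBot", "https://example.org/media/photo.jpg"))  # False
print(may_fetch("SomeBrowser", "https://example.org/media/photo.jpg"))   # True
print(may_fetch("SomeBrowser", "https://example.org/private/notes"))     # False
```

Nothing in the protocol enforces these answers; a crawler that never runs such a check simply fetches everything.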

Open source infrastructure, in particular, is in the firing line, but as TechCrunch wrote last week, developers are fighting back with “cleverness and vengeance.” Some tech companies are doing their bit to address the issue, too. Cloudflare, for example, recently launched AI Labyrinth, which uses AI-generated content to slow crawlers down.

But it is a cat-and-mouse game that could ultimately force many publishers to duck for cover behind logins and paywalls.

