Fyself News
Startups
AI crawlers drive Wikimedia Commons bandwidth demand up 50%

By user | April 2, 2025 | 3 Mins Read

Wikimedia Foundation, Wikipedia’s umbrella organization, said Wednesday that bandwidth consumption for multimedia downloads from Wikimedia Commons has skyrocketed 50% since January 2024.

The reason, Wikimedia wrote in a blog post on Tuesday, is not growing demand from knowledge-hungry humans, but automated, data-hungry scrapers looking to train AI models.

“Our infrastructure is built to sustain sudden traffic spikes from humans during high-interest events, but the amount of traffic generated by scraper bots is unprecedented and presents growing risks and costs,” the post reads.

Wikimedia Commons is a freely accessible repository of images, videos and audio files available under an open license or in the public domain.

Digging into the numbers, Wikimedia says that almost two-thirds (65%) of its most “expensive” traffic (that is, the most resource-intensive in terms of the kind of content consumed) comes from bots, even though bots account for only 35% of overall pageviews. The reason for this disparity, according to Wikimedia, is that frequently accessed content stays cached close to users, while less frequently accessed content is stored further away in the “core data center,” from which it is more expensive to serve. That long-tail content is exactly what bots typically go looking for.

“While human readers tend to focus on specific (often similar) topics, crawler bots tend to ‘bulk read’ larger numbers of pages and visit also the less popular pages,” Wikimedia writes. “This means these types of requests are more likely to get forwarded to the core data center, which makes it much more expensive in terms of the consumption of our resources.”
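The cache-versus-core-data-center economics Wikimedia describes can be sketched with a toy LRU cache in Python. This is a simplified illustration, not Wikimedia's actual infrastructure: human-style traffic that revisits a few popular pages keeps a high cache hit rate, while a crawler sweeping the long tail misses almost every time, so each of its requests falls through to the expensive origin.

```python
from collections import OrderedDict

class EdgeCache:
    """Tiny LRU cache standing in for a CDN edge node (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def request(self, page):
        if page in self.store:
            self.hits += 1
            self.store.move_to_end(page)  # served cheaply from the edge
        else:
            self.misses += 1              # forwarded to the "core data center"
            self.store[page] = True
            if len(self.store) > self.capacity:
                self.store.popitem(last=False)  # evict least recently used

    def hit_rate(self):
        return self.hits / (self.hits + self.misses)

# Human readers: concentrate on a small set of popular pages, revisited often.
humans = EdgeCache(capacity=100)
for _ in range(50):
    for page in range(10):
        humans.request(f"popular-{page}")
human_rate = humans.hit_rate()   # high: almost everything is cached

# Crawler bot: walks the long tail, touching each obscure page exactly once.
bot = EdgeCache(capacity=100)
for page in range(5000):
    bot.request(f"obscure-{page}")
bot_rate = bot.hit_rate()        # zero: every request misses the cache
```

The exact numbers are contrived, but the asymmetry matches the article's point: the more of the long tail a crawler touches, the larger the share of requests that get forwarded to the origin, where each one costs more to serve.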

The long and short of all this is that the Wikimedia Foundation’s site reliability team has to spend a lot of time and resources blocking crawlers to avert disruption for regular users. And that is before considering the cloud costs the foundation faces.

Indeed, this represents part of a fast-growing trend that threatens the very existence of the open internet. Last month, software engineer and open source advocate Drew DeVault lamented the fact that AI crawlers ignore “robots.txt” files designed to ward off automated traffic. And “Pragmatic Engineer” Gergely Orosz complained last week that AI scrapers from companies such as Meta had driven up bandwidth demands for his own projects.
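The complaint about robots.txt hinges on the fact that the file is advisory, not enforced: a well-behaved crawler checks it before fetching, but nothing stops a scraper from ignoring it. Python's standard-library `urllib.robotparser` shows what a compliant bot does. In this sketch, the user-agent names (`GPTBot`, `FriendlyBot`) and the URL are illustrative; which real bots honor any given file is up to the bots themselves.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that blocks one named AI crawler while allowing everyone else.
rules = """
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# A compliant crawler would call can_fetch() before every request.
print(rp.can_fetch("GPTBot", "https://example.org/wiki/SomePage"))      # False
print(rp.can_fetch("FriendlyBot", "https://example.org/wiki/SomePage")) # True
```

DeVault's point, as reported, is precisely that this check is voluntary; keeping out a scraper that skips it requires server-side measures like the crawler blocking Wikimedia describes.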

Open source infrastructure in particular is in the firing line, but, as TechCrunch wrote last week, developers are fighting back with cleverness and vengeance. Some tech companies are also doing their bit to address the issue: Cloudflare, for example, recently launched AI Labyrinth, which uses AI-generated content to slow crawlers down.

But it’s very much a cat-and-mouse game, and one that could ultimately force many publishers to duck behind logins and paywalls.


© 2026 news.fyself. Designed by fyself.
