Fyself News
Startups

New projects make Wikipedia data more accessible to AI

By user | October 1, 2025 | 3 Mins Read

On Wednesday, Wikimedia Deutschland announced a new database that will make Wikipedia’s rich knowledge more accessible to AI models.

Called the Wikidata Embedding Project, the system applies vector-based semantic search, a technique that helps computers understand the meaning of words and the relationships between them, to the existing data on Wikipedia and its sister platforms, which consists of nearly 120 million entries.
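To give a rough sense of how an embedding-based search compares meanings, here is a minimal Python sketch. It is not the project's code: the three-dimensional vectors are invented for illustration (real embeddings are learned and have hundreds of dimensions), and the names `TOY_EMBEDDINGS` and `semantic_search` are hypothetical.

```python
import math

# Made-up 3-dimensional "embeddings" for illustration only.
# Real systems use learned vectors with far more dimensions.
TOY_EMBEDDINGS = {
    "scientist": [0.9, 0.8, 0.1],
    "researcher": [0.85, 0.75, 0.2],
    "scholar": [0.7, 0.9, 0.15],
    "bicycle": [0.1, 0.05, 0.95],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query, embeddings, top_k=2):
    """Return the top_k terms whose vectors point closest to the query's."""
    query_vec = embeddings[query]
    scored = [
        (cosine_similarity(query_vec, vec), term)
        for term, vec in embeddings.items()
        if term != query
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [term for _, term in scored[:top_k]]


if __name__ == "__main__":
    print(semantic_search("scientist", TOY_EMBEDDINGS))
```

With these toy values, "researcher" and "scholar" rank ahead of "bicycle" because their vectors point in nearly the same direction as the vector for "scientist", even though the words share no letters in common.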

Combined with new support for the Model Context Protocol (MCP), a standard that helps AI systems communicate with data sources, the project makes the data more accessible to natural language queries from LLMs.

The project was carried out by Wikimedia's German branch in collaboration with neural search company Jina.AI and DataStax, a real-time training-data company owned by IBM.

Wikidata has offered machine-readable data from Wikimedia properties for years, but the pre-existing tools only allowed keyword searches and queries in SPARQL, a specialized query language. The new system will work better with retrieval-augmented generation (RAG) systems, which allow AI models to pull in external information, giving developers a chance to ground their models in knowledge verified by Wikipedia editors.
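The general shape of a RAG pipeline can be sketched in a few lines of Python. This is an illustrative assumption, not the project's implementation: the `retrieve` function ranks passages by simple word overlap as a stand-in for a real vector search, the two passages are placeholder examples, and all function names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question, passages, top_k=1):
    """Rank passages by word overlap with the question.

    A stand-in for vector search; a real system would compare embeddings.
    """
    q_words = tokenize(question)
    ranked = sorted(
        passages,
        key=lambda p: len(q_words & tokenize(p)),
        reverse=True,
    )
    return ranked[:top_k]


def build_grounded_prompt(question, passages):
    """Assemble a prompt that asks the model to answer from retrieved text."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    passages = [
        "Marie Curie won Nobel Prizes in physics and chemistry.",
        "The transistor was invented at Bell Labs in 1947.",
    ]
    print(build_grounded_prompt("Who invented the transistor?", passages))
```

The point of the pattern is that the model receives the retrieved passage alongside the question, so its answer can be checked against a source rather than relying only on what it memorized during training.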

The data is also structured to provide crucial semantic context. Querying the database for the word “scientist,” for instance, produces lists of prominent nuclear scientists as well as scientists who worked at Bell Labs. It also returns translations of the word “scientist” into different languages, a Wikimedia-cleared image of scientists at work, and extrapolations to related concepts such as “researcher” and “scholar.”

The database is publicly accessible on Toolforge. Wikidata is also hosting a webinar for interested developers on October 9th.

The new project comes as AI developers are scrambling for high-quality data sources that can be used to fine-tune models. The training systems themselves have become more sophisticated, often assembled as complex training environments rather than simple datasets, but they still require closely curated data to function well. For deployments that require high accuracy, the need for reliable data is particularly urgent, and while some might look down on Wikipedia, its data is significantly more fact-oriented than catch-all datasets like Common Crawl, a massive collection of web pages scraped from across the internet.

In some cases, the push for high-quality data can have expensive consequences for AI labs. In August, Anthropic offered to settle a lawsuit with a group of authors whose works had been used as training material, agreeing to pay $1.5 billion to end any claims of wrongdoing.

In a statement to the press, Wikidata AI project manager Philippe Saadé emphasized his project’s independence from major AI labs and large tech companies. “This Embedding Project launch shows that powerful AI doesn’t have to be controlled by a handful of companies,” Saadé told reporters. “It can be open, collaborative, and built to serve everyone.”

