Fyself News

Open Source LLM hits the European roadmap of digital sovereignty

By user, February 16, 2025

Large language models (LLMs) landed on Europe’s digital sovereignty agenda last week, as news emerged of a new program to develop a series of “truly” open source LLMs covering all European Union languages.

This includes the current 24 official EU languages, as well as the languages of countries currently negotiating for entry to the EU market, such as Albania. Future-proofing is the name of the game.

OpenEuroLLM is a collaboration between some 20 organizations, co-led by Jan Hajič, a computational linguist at Charles University in Prague, and Peter Sarlin, CEO and co-founder of Finnish AI lab Silo AI, which AMD acquired for $665 million last year.

The project fits a broader narrative that has seen Europe push digital sovereignty as a priority, bringing mission-critical infrastructure and tools closer to home. Most of the cloud giants are investing in local infrastructure to ensure that EU data stays local, while AI darling OpenAI recently announced a new offering that allows customers to process and store data in Europe.

Elsewhere, the EU recently signed an $11 billion deal to create a sovereign satellite constellation to rival Elon Musk’s Starlink.

So OpenEuroLLM is certainly on-brand.

However, the stated budget just for building the models themselves is 37.4 million euros, with roughly 20 million euros coming from the EU’s Digital Europe Programme. That is a drop in the ocean compared to what the giants of the corporate AI world are investing. The actual budget is larger once you factor in the funding allocated to tangential and related work, and arguably the biggest expense is compute. The OpenEuroLLM project’s partners include EuroHPC supercomputer centers in Spain, Italy, Finland, and the Netherlands, and the broader EuroHPC project has a budget of around 7 billion euros.

But the sheer number of disparate participating parties, spanning academia, research, and corporations, has led many to question whether the project’s goals are achievable. Anastasia Stasenko, co-founder of LLM company Pleias, questioned whether “a sprawling consortium of over 20 organizations” could have the same measured focus as a homegrown private AI company.

“Europe’s recent success in AI shines through small, focused teams such as Mistral AI and LightOn,” Stasenko said. “They are quickly held accountable for their choices, including finances, market positioning, reputation, and more.”

Up to scratch

The OpenEuroLLM project is starting from scratch, or with a head start, depending on how you look at it.

Since 2022, Hajič has also coordinated the High Performance Language Technologies (HPLT) project, which aims to develop free and reusable datasets, models, and workflows using high-performance computing (HPC). That project is scheduled to close in the second half of 2025, but Hajič said it can be considered a “predecessor” to OpenEuroLLM, given that most of HPLT’s partners (except for the U.K. partners) are also participating here.

“This [OpenEuroLLM] is really a broader participation, but it’s more focused on generative LLMs,” Hajič said. “So it doesn’t start from scratch in terms of data, expertise, tools, and compute experience. We’ve brought together people who know what they’re doing, so we should be able to get up to speed quickly.”

Hajič said he expects the first version to be released by mid-2026, with the final iteration arriving by the project’s conclusion in 2028. For now, though, there is little to show beyond a bare-bones GitHub profile.

“In that respect, we are starting from scratch. The project began on Saturday [February 1],” Hajič said. “But we’ve been preparing the project for a year [the tender process opened in February 2024].”

From academia and research, organizations spanning the Czech Republic, the Netherlands, Germany, Sweden, Finland, and Norway are part of the OpenEuroLLM cohort, in addition to the EuroHPC centers. From the corporate world, there is Finland’s AMD-owned AI lab Silo AI, as well as Aleph Alpha (Germany), Ellamind (Germany), Prompsit Language Engineering (Spain), and LightOn (France).

One notable omission from the list is French AI unicorn Mistral, which has positioned itself as an open source alternative to incumbents such as OpenAI. Nobody from Mistral responded to TechCrunch’s request for comment, but Hajič confirmed that he had tried to start a conversation with the startup, to no avail.

“I tried to approach them, but it hasn’t resulted in a focused discussion about their participation,” Hajič said.

The project could still gain new participants as part of the EU program that funds it, but it is limited to EU organizations. This means that U.K. and Swiss entities won’t be able to take part. That is in contrast to the Horizon R&D program, which the U.K. rejoined in 2023 after a prolonged Brexit stalemate and which funded HPLT.

Building up

The project’s top-line goal, as per its tagline, is to create “a series of foundation models for transparent AI in Europe.” Additionally, these models must preserve the “linguistic and cultural diversity” of all EU languages, both current and future.

What this translates to in terms of deliverables is still being ironed out, but it will likely mean a core multilingual LLM designed for general-purpose tasks where accuracy is paramount, plus smaller “quantized” versions, perhaps for edge applications where efficiency and speed matter more.

“This is something we still have to plan in detail,” Hajič said. “We want it to be small but as high-quality as possible. We don’t want to release something half-baked, because from a European perspective the stakes are high, and a lot of money is coming from the European Commission. It’s public money.”

While the goal is to make the model as proficient as possible in all languages, achieving equal quality across the board could be difficult.

“That’s the goal, but how successful we can be with languages that lack digital resources is the question,” Hajič said. “But that’s also why we want to have true benchmarks for these languages, and not be swayed toward benchmarks that perhaps don’t represent the languages and the culture behind them.”

As for data, this is where much of the HPLT project’s work should prove fruitful, with version 2.0 of its dataset released four months ago. The dataset draws on 4.5 petabytes of web crawls and more than 20 billion documents, and Hajič said the team will add further data from Common Crawl (an open repository of web-crawled data) to the mix.

Open Source Definition

In traditional software, the enduring struggle between open source and proprietary revolves around the “true” meaning of “open source.” This can be resolved by deferring to the formal “definition” maintained by the Open Source Initiative (OSI), the industry steward of which licenses are legitimately open source.

The OSI recently published a definition of “open source AI,” but not everyone is happy with the outcome. Open source AI proponents argue that not only the models but also the datasets, pretrained models, weights, and full source code should be freely available. The OSI’s definition does not require training data to be released, because AI models are often trained on proprietary data or on data with redistribution restrictions.

Suffice it to say, OpenEuroLLM faces these same dilemmas, and despite its intention to be “truly open,” it will probably have to make some compromises to meet its “quality” obligations.

“The goal is to have everything open. Now, of course, there are some limitations,” Hajič said. “We want to have the best possible models, and under the European copyright directive we can use whatever we can get. Some of it we cannot redistribute, but some of it we can store for future inspection.”

What this means is that the OpenEuroLLM project may need to keep some of its training data under wraps, while still making it available to auditors on request, as required for high-risk AI systems under the terms of the EU AI Act.

“We hope most of the data [will be open], especially the data that comes from Common Crawl,” Hajič said. “We would like it all to be fully open, but we’ll see. Either way, we have to comply with AI regulations.”

Two of a kind

Another criticism that emerged after OpenEuroLLM’s official announcement was that a very similar project had launched in Europe just a few months earlier. EuroLLM, which released its first model in September and a follow-up in December, is co-funded by the EU alongside a consortium of nine partners. These include academic institutions such as the University of Edinburgh and companies such as Unbabel, which last year won millions of GPU training hours on EU supercomputers.

EuroLLM shares a near-identical goal with its near-namesake: “To build a large open source language model for Europe that supports the 24 official European languages and several other strategically important languages.”

Andre Martins, research director at Unbabel, took to social media to highlight these similarities. “We hope that the various communities will collaborate openly, share their expertise, and not decide to reinvent the wheel every time a new project is funded,” Martins wrote.

Hajič called the situation “unfortunate” and said he hoped the two projects could cooperate, though he stressed that because its funding comes from the EU, OpenEuroLLM is limited in how it can collaborate with non-EU entities, including U.K. universities.

Funding gap

The arrival of China’s DeepSeek, and the cost-to-performance ratio it promised, raised hopes that AI initiatives could do far more with far less than originally thought. However, over the past few weeks, many have questioned the real costs involved in building DeepSeek.

“As for DeepSeek, we really don’t know much about what exactly went into building it,” Peter Sarlin, technical co-lead of the OpenEuroLLM project, told TechCrunch.

Regardless, Sarlin believes OpenEuroLLM will have enough money, since its budget mostly needs to cover people. Most of the cost of building AI systems is compute, and that should largely be covered through the partnership with the EuroHPC centers.

“I would say OpenEuroLLM actually has quite a significant budget,” Sarlin said. “EuroHPC has invested billions in AI and compute infrastructure, and has committed billions more to expanding it over the next few years.”

It is also worth noting that the OpenEuroLLM project is not building a consumer- or enterprise-grade product. It is purely about the models, which is why Sarlin considers its budget sufficient.

“The intention here is not to build a chatbot or an AI assistant. That would be a product initiative requiring a lot of effort, and that’s what ChatGPT did well,” Sarlin said. “What we are contributing is an open source foundation model that acts as AI infrastructure for European companies to build on. We know what it takes to build a model, and it’s not something you need billions for.”

Since 2017, Sarlin has led AI lab Silo AI, which, in collaboration with others including the HPLT project, launched the Poro and Viking families of open models. These already support several European languages, and the company is now preparing the next iteration, its “Europa” models, which will cover all European languages.

This ties in with the “not starting from scratch” idea that Hajič champions: there is already a foundation of expertise and technology in place.

Sovereign nation

As critics have pointed out, OpenEuroLLM has many moving parts. Hajič acknowledges this, though he takes a positive view.

“I’ve been involved in a lot of collaborative projects, and I think this has its advantages over a single company,” he said. “Of course, companies from OpenAI to Mistral have done great things, but I hope that the combination of academic expertise and corporate focus can bring something new.”

In many ways, it is not about trying to beat Big Tech or billion-dollar AI startups. The ultimate goal is digital sovereignty: (mostly) open foundation LLMs built by Europe.

“I hope this won’t be the case, but if in the end we have a ‘good’ model rather than the number one model, we will still have a model with all of its components based in Europe,” Hajič said. “That would be a positive outcome.”

