Last week, Chinese lab DeepSeek released an updated version of its R1 reasoning AI model that performs well on a number of math and coding benchmarks. The company didn’t reveal the source of the data it used to train the model, but some AI researchers speculate that at least a portion came from Google’s Gemini family of AI.
Sam Paech, a Melbourne-based developer who creates “emotional intelligence” evaluations for AI, published what he claims is evidence that DeepSeek’s latest model was trained on outputs from Gemini. The DeepSeek model, called R1-0528, prefers words and expressions similar to those that Google’s Gemini 2.5 Pro favors, Paech said in a post on X.
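Paech didn’t lay out a full methodology in the post, but stylistic comparisons of this kind generally come down to measuring how often each model uses particular words. The sketch below is a hypothetical illustration of one such approach, comparing word-frequency profiles with cosine similarity; the sample outputs, function names, and metric are assumptions for demonstration, not Paech’s actual analysis.

```python
# Illustrative sketch only: a toy way to compare two models' lexical
# preferences. The sample texts and the overlap metric are assumptions
# for demonstration, not a reproduction of Paech's methodology.
from collections import Counter
import math

def top_word_profile(texts, n=50):
    """Count word frequencies across a model's outputs and keep the top n."""
    counts = Counter(word.lower().strip('.,!?"\'')
                     for text in texts for word in text.split())
    return dict(counts.most_common(n))

def cosine_similarity(a, b):
    """Cosine similarity between two word-frequency profiles."""
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical outputs; a real study would sample thousands of responses
# from each model on identical prompts.
model_a_outputs = ["Let's delve into the multifaceted landscape of this topic."]
model_b_outputs = ["We should delve into the multifaceted nature of the issue."]

profile_a = top_word_profile(model_a_outputs)
profile_b = top_word_profile(model_b_outputs)
print(f"lexical similarity: {cosine_similarity(profile_a, profile_b):.3f}")
```

A high similarity score between two models’ profiles wouldn’t prove anything on its own, which is why observers treat this sort of evidence as suggestive rather than conclusive.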
It’s not a smoking gun. But another developer, the pseudonymous creator of a “free speech” evaluation for AI called SpeechMap, noted that the DeepSeek model’s traces, the “thoughts” the model generates as it works toward a conclusion, “read like Gemini traces.”
DeepSeek has been accused of training on data from rival AI models before. In December, developers observed that DeepSeek’s V3 model often identified itself as ChatGPT, OpenAI’s AI-powered chatbot platform, suggesting it may have been trained on ChatGPT chat logs.
Earlier this year, OpenAI told the Financial Times that it had found evidence linking DeepSeek to the use of distillation, a technique for training AI models by extracting data from bigger, more capable ones. And according to Bloomberg, Microsoft, a close OpenAI collaborator and investor, detected large amounts of data being exfiltrated through OpenAI developer accounts in late 2024, accounts that OpenAI believes are affiliated with DeepSeek.
Distillation isn’t an uncommon practice, but OpenAI’s terms of service prohibit customers from using the company’s model outputs to build competing AI.
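For context, the basic shape of distillation is simple: sample a large number of responses from a stronger “teacher” model, then fine-tune a smaller “student” model to imitate them. The sketch below is a minimal, hypothetical illustration of the data-collection step; `query_teacher`, the prompts, and the JSONL format are assumptions for demonstration and don’t reflect any lab’s actual pipeline.

```python
# Minimal sketch of distillation via synthetic data. `query_teacher` is a
# stand-in for a call to any large "teacher" model's API; the JSONL layout
# is one common shape for supervised fine-tuning data. All hypothetical.
import json

def query_teacher(prompt: str) -> str:
    """Placeholder for a call to a larger, more capable model's API."""
    return f"<teacher model's answer to: {prompt}>"

prompts = [
    "Prove that the square root of 2 is irrational.",
    "Write a function that reverses a linked list.",
]

# Collect (prompt, teacher-response) pairs as synthetic training data.
with open("synthetic_train.jsonl", "w") as f:
    for prompt in prompts:
        record = {"prompt": prompt, "response": query_teacher(prompt)}
        f.write(json.dumps(record) + "\n")

# A smaller "student" model would then be fine-tuned on synthetic_train.jsonl,
# learning to reproduce the teacher's style and reasoning.
```

The appeal is economic: generating synthetic data from a frontier model’s API is far cheaper than producing equivalent training data from scratch, which is also why API providers try to restrict it contractually.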
To be clear, many models misidentify themselves and converge on the same words and turns of phrase. That’s because the open web, where AI companies source the bulk of their training data, is becoming littered with AI slop. Content farms are using AI to create clickbait, and bots are flooding Reddit and X.
This “contamination,” if you will, has made it quite challenging to thoroughly filter AI outputs from training datasets.
Still, AI experts like Nathan Lambert, a researcher at the nonprofit AI research institute AI2, don’t think it’s out of the question that DeepSeek trained on data from Google’s Gemini.
“If I were DeepSeek, I would definitely create a ton of synthetic data from the best API model out there,” Lambert wrote in a post on X. “[DeepSeek is] short on GPUs and flush with cash. It’s literally effectively more compute for them.”
Partly in an effort to prevent distillation, AI companies have been ramping up security measures.
In April, OpenAI began requiring organizations to complete an ID verification process in order to access certain advanced models. The process requires a government-issued ID from one of the countries supported by OpenAI’s API; China isn’t on the list.
Elsewhere, Google recently began summarizing the traces generated by models available through its AI Studio developer platform, a step that makes it harder to train rival models on raw Gemini traces. In May, Anthropic said it would begin summarizing its own models’ traces, citing a need to protect its “competitive advantages.”
We’ve reached out to Google for comment and will update this article if we hear back.