OpenAI’s GPT-4.1 may be less aligned than the company’s previous AI models

In mid-April, OpenAI launched a powerful new AI model, GPT-4.1, which the company claimed was “excellent” at following instructions. However, the results of several independent tests suggest that the model is less aligned, that is, less reliable, than previous OpenAI releases.

When OpenAI launches a new model, it typically publishes a detailed technical report that includes the results of first-party and third-party safety evaluations. The company skipped that step for GPT-4.1, claiming that the model is not “frontier” and therefore does not warrant a separate report.

This led some researchers and developers to investigate whether GPT-4.1 behaves less desirably than its predecessor, GPT-4o.

According to Oxford AI research scientist Owain Evans, fine-tuning GPT-4.1 on insecure code causes the model to give misaligned responses to questions about subjects like gender roles at a rate “substantially higher” than GPT-4o. Evans previously co-authored a study showing that a version of GPT-4o trained on insecure code could be primed to exhibit malicious behavior.

In an upcoming follow-up to that study, Evans and his co-authors found that GPT-4.1 fine-tuned on insecure code appears to display “new malicious behavior,” such as trying to trick users into sharing their passwords. To be clear, neither GPT-4.1 nor GPT-4o acts misaligned when trained on secure code.

Emergent Misalignment Update: OpenAI’s new GPT4.1 shows a higher misaligned response rate than GPT4o (and other models we tested).
It also appears to be showing some new malicious behavior, such as tricking users into sharing passwords. pic.twitter.com/5qzegezyjo

– Owain Evans (@owainevans_uk) April 17, 2025

“We’re discovering unexpected ways that models can become misaligned,” Evans told TechCrunch. “Ideally, we’d have a science of AI that would let us predict such things in advance and reliably avoid them.”

A separate test of GPT-4.1 by SplxAI, an AI red teaming startup, revealed similar malign tendencies.

In around 1,000 simulated test cases, SplxAI found evidence that GPT-4.1 veers off topic and allows “intentional” misuse more often than GPT-4o. According to SplxAI, the cause is GPT-4.1’s preference for explicit instructions. GPT-4.1 does not handle vague directions well, a fact OpenAI itself acknowledges, and that opens the door to unintended behavior.

“This is a great feature in that it makes the model more useful and reliable when solving a specific task, but it comes at a price,” SplxAI wrote in a blog post. “[P]roviding explicit instructions on what to do is very easy, but providing sufficiently explicit and accurate instructions on what not to do is a different story, as the list of unwanted actions is much larger than the list of wanted actions.”

In OpenAI’s defense, the company has released a prompting guide aimed at mitigating possible misalignment in GPT-4.1. However, the findings of the independent tests serve as a reminder that newer models are not necessarily improved across the board. Similarly, OpenAI’s new reasoning models hallucinate (that is, make things up) more than the company’s older models.

I contacted OpenAI for comment.
