Fyself News

OpenAI’s GPT-4.1 may be less aligned than the company’s previous AI models

By user, April 23, 2025

In mid-April, OpenAI launched a powerful new AI model, GPT-4.1, which the company claimed was “excellent” at following instructions. However, the results of several independent tests suggest that the model is less aligned, that is, less reliable, than previous OpenAI releases.

When OpenAI launches a new model, it typically publishes a detailed technical report that includes the results of first-party and third-party safety evaluations. The company skipped that step for GPT-4.1, claiming that the model is not “frontier” and therefore does not warrant a separate report.

This led some researchers and developers to investigate whether GPT-4.1 behaves less desirably than its predecessor, GPT-4o.

According to Oxford AI research scientist Owain Evans, fine-tuning GPT-4.1 on insecure code causes the model to give “misaligned responses” to questions about subjects like gender roles at a rate “substantially higher” than GPT-4o. Evans previously co-authored a study showing that a version of GPT-4o trained on insecure code could be primed to exhibit malicious behavior.

In an upcoming follow-up to that study, Evans and his co-authors found that GPT-4.1 fine-tuned on insecure code appears to display “new malicious behavior,” such as trying to trick users into sharing their passwords. To be clear, neither GPT-4.1 nor GPT-4o acts misaligned when trained on secure code.

Emergent Misalignment Update: OpenAI’s new GPT4.1 shows a higher rate of misaligned responses than GPT4o (and other models we tested).
It also appears to display some new malicious behavior, such as tricking users into sharing passwords. pic.twitter.com/5qzegezyjo

– Owain Evans (@owainevans_uk) April 17, 2025

“We’re discovering unexpected ways that models can become misaligned,” Evans told TechCrunch. “Ideally, we would have a science of AI that could predict such things in advance and reliably avoid them.”

A separate test of GPT-4.1 by SplxAI, an AI red-teaming startup, revealed similar malign tendencies.

In around 1,000 simulated test cases, SplxAI found evidence that GPT-4.1 veers off topic and allows “intentional” misuse more often than GPT-4o. SplxAI attributes this to GPT-4.1’s preference for explicit instructions. GPT-4.1 does not handle vague directions well, a fact OpenAI itself admits, and that opens the door to unintended behavior.

“This is a great feature in that it makes the model more useful and reliable when solving a specific task, but it comes at a price,” SplxAI wrote in a blog post. “[P]roviding explicit instructions on what to do is very easy, but providing sufficiently explicit and accurate instructions on what not to do is a different story, as the list of unwanted actions is much larger than the list of wanted actions.”

In OpenAI’s defense, the company has released a prompting guide aimed at mitigating possible misalignment in GPT-4.1. However, the findings of the independent tests serve as a reminder that newer models are not necessarily improved across the board. Similarly, OpenAI’s new reasoning models hallucinate, that is, make things up, more than the company’s older models.

I contacted OpenAI for comment.



