OpenAI launched a new model family on Monday called GPT-4.1. Yes, “4.1” – as if the company’s naming conventions weren’t confusing enough already.
The family includes GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano, all of which OpenAI says “excel” at coding and instruction following. Available through OpenAI’s API but not ChatGPT, the multimodal models have a 1-million-token context window, meaning they can take in roughly 750,000 words in one go (longer than “War and Peace”).
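Since access is through the API rather than ChatGPT, trying the models means writing a small amount of code. A minimal sketch using OpenAI’s official Python SDK might look like the following; the prompt is illustrative, and it assumes an `OPENAI_API_KEY` environment variable is already set.

```python
# Minimal example call to GPT-4.1 via OpenAI's Python SDK (`pip install openai`).
# The prompt is illustrative; the model names match those announced.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4.1",  # also available: "gpt-4.1-mini", "gpt-4.1-nano"
    messages=[
        {"role": "user", "content": "Write a Python function that reverses a string."},
    ],
)

print(response.choices[0].message.content)
```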
GPT-4.1 arrives as OpenAI rivals like Google and Anthropic ratchet up their efforts to build sophisticated programming models. Google recently released Gemini 2.5 Pro, which also has a 1-million-token context window and ranks highly on popular coding benchmarks. So do Anthropic’s Claude 3.7 Sonnet and Chinese AI startup DeepSeek’s upgraded V3.
Training AI coding models that can perform complex software engineering tasks is a goal shared by many tech giants, OpenAI included. OpenAI’s grand ambition is to create an “agentic software engineer,” as CFO Sarah Friar put it at a tech summit in London last month. The company asserts that its future models will be able to program entire apps end-to-end, handling aspects like quality assurance, bug testing, and documentation writing.
GPT-4.1 is a step in this direction.
“We’ve optimized GPT-4.1 for real-world use based on real-world feedback, improving in areas developers care most about: frontend coding, making fewer extraneous edits, following formats reliably, adhering to response structure and ordering, consistent tool usage, and more,” OpenAI said. “These improvements enable developers to build agents that are considerably better at real-world software engineering tasks.”
OpenAI claims that the full GPT-4.1 model outperforms its GPT-4o and GPT-4o mini models on coding benchmarks, including SWE-bench. GPT-4.1 mini and nano are said to be more efficient and faster at the cost of some accuracy, with OpenAI saying GPT-4.1 nano is its fastest and cheapest model yet.
GPT-4.1 costs $2 per million input tokens and $8 per million output tokens. GPT-4.1 mini is $0.40 per million input tokens and $1.60 per million output tokens, while GPT-4.1 nano is $0.10 per million input tokens and $0.40 per million output tokens.
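To get a feel for how those per-million-token rates translate into a bill, here is a rough back-of-the-envelope sketch; the helper function and the token counts in the example are hypothetical, not taken from OpenAI’s documentation.

```python
# Hypothetical cost estimate using the published per-million-token prices (USD).
# The request sizes below are made up for illustration.
PRICES = {
    "gpt-4.1":      {"input": 2.00, "output": 8.00},
    "gpt-4.1-mini": {"input": 0.40, "output": 1.60},
    "gpt-4.1-nano": {"input": 0.10, "output": 0.40},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars for a single request."""
    p = PRICES[model]
    return (input_tokens / 1_000_000) * p["input"] + (output_tokens / 1_000_000) * p["output"]

# Example: a 100,000-token prompt with a 5,000-token reply on the full model.
print(f"${estimate_cost('gpt-4.1', 100_000, 5_000):.4f}")  # roughly $0.24
```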
According to OpenAI’s internal testing, GPT-4.1, which can generate more tokens at once than GPT-4o (32,768 versus 16,384), scored between 52% and 54.6% on SWE-bench Verified, a human-validated subset of SWE-bench. (OpenAI noted in a blog post that some solutions to SWE-bench Verified problems couldn’t run on its infrastructure, hence the range of scores.) Those figures are slightly below the scores Google and Anthropic reported for Gemini 2.5 Pro (63.8%) and Claude 3.7 Sonnet (62.3%) on the same benchmark, respectively.
In a separate evaluation, OpenAI probed GPT-4.1 using Video-MME, which is designed to measure a model’s ability to “understand” the content of videos. GPT-4.1 reached a chart-topping 72% accuracy in the “long, no subtitles” video category, OpenAI claims.
While GPT-4.1 scores reasonably well on benchmarks and has a more recent knowledge cutoff, giving it a better frame of reference for current events (up to June 2024), it’s important to keep in mind that even some of today’s best models struggle with tasks that wouldn’t trip up experts. For example, many studies have shown that code-generating models often fail to fix, and even introduce, security vulnerabilities and bugs.
OpenAI also acknowledges that GPT-4.1 becomes less reliable (i.e., more likely to make mistakes) the more input tokens it has to deal with. On one of the company’s own tests, OpenAI-MRCR, the model’s accuracy dropped from around 84% with 8,000 input tokens to about 50% at 1 million tokens. GPT-4.1 also tended to be more “literal” than GPT-4o, the company says.