On Thursday, OpenAI released GPT-5.4, a new foundation model the company is touting as “the most capable and efficient frontier model for professional work.” In addition to the standard version, GPT-5.4 is available as a reasoning model (GPT-5.4 Thinking) and a variant optimized for high performance (GPT-5.4 Pro).
The API version of the model offers a 1-million-token context window, the largest OpenAI has ever made available.
OpenAI also said GPT-5.4 can solve the same problems with significantly fewer tokens than its predecessor, highlighting improved token efficiency.
The new model posts significantly improved benchmark results, including record scores on the computer-use benchmarks OSWorld-Verified and WebArena Verified, as well as a record 83% on OpenAI’s GDPval test of knowledge-work tasks.
GPT-5.4 also led on Mercor’s APEX-Agents benchmark, which is designed to test professional skills in law and finance, according to Mercor CEO Brendan Foody.
“[GPT-5.4] excels at creating long-term deliverables such as slide decks, financial models, and legal analysis, delivering the highest performance while running faster and cheaper than competing frontier models,” Foody said in a statement.
GPT-5.4 continues the company’s efforts to limit hallucinations and factual errors. OpenAI said the new model was 33% less likely to make an error on individual claims and 18% less likely to make an error in its overall response compared with GPT-5.2.
As part of the release, OpenAI overhauled how the GPT-5.4 API handles tool calls, introducing a new system called Tool Search. Previously, the system prompt laid out the definitions of every available tool when a model was called, a process that could consume large numbers of tokens as the number of available tools increased. The new system lets the model search for tool definitions on demand, making requests faster and cheaper in systems with many available tools.
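The idea behind on-demand tool lookup can be illustrated with a minimal sketch. Everything below is hypothetical (the registry, the tool names, the `search_tools` helper, and the rough 4-characters-per-token heuristic are illustrative assumptions, not OpenAI's actual Tool Search API): instead of serializing every tool definition into the prompt, the caller sends only the definitions matching what the model asks for.

```python
import json

# Hypothetical tool registry; in a real agent system each entry would be
# a full JSON-schema tool definition, which is what makes prompts large.
TOOL_REGISTRY = {
    "get_weather": {
        "description": "Fetch the current weather for a city.",
        "parameters": {"city": "string"},
    },
    "send_email": {
        "description": "Send an email to a recipient.",
        "parameters": {"to": "string", "subject": "string", "body": "string"},
    },
    "query_database": {
        "description": "Run a read-only SQL query against an analytics database.",
        "parameters": {"sql": "string"},
    },
}

def search_tools(query: str, registry: dict) -> dict:
    """Return only the tool definitions whose name or description
    matches the query, so the prompt carries just what is needed."""
    q = query.lower()
    return {
        name: spec
        for name, spec in registry.items()
        if q in name.lower() or q in spec["description"].lower()
    }

def prompt_cost(tools: dict) -> int:
    """Rough token estimate: ~1 token per 4 characters of the
    serialized definitions (a common heuristic, not an exact count)."""
    return len(json.dumps(tools)) // 4

# Listing every definition up front costs more than fetching on demand.
all_tools_cost = prompt_cost(TOOL_REGISTRY)
needed = search_tools("weather", TOOL_REGISTRY)
on_demand_cost = prompt_cost(needed)
print(sorted(needed), on_demand_cost < all_tools_cost)
```

The savings grow with registry size: the up-front cost scales with the total number of tools, while the on-demand cost scales only with the handful of tools each request actually touches.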
OpenAI also included new safety evaluations that test its models’ chain of thought, the running commentary a model produces to show its reasoning through a multi-step task. AI safety researchers have long worried that reasoning models can misrepresent their chain of thought, and tests have shown that, under the right circumstances, this can happen.
A new OpenAI evaluation shows that this kind of deception is less likely in the Thinking version of GPT-5.4, “suggesting that the model is not hiding its reasoning and that CoT monitoring remains an effective safety tool.”
