Updated 2:40pm PT: Hours after the release of GPT-4.5, OpenAI removed a line from the model’s white paper that said “GPT-4.5 is not a frontier AI model.” The new white paper for GPT-4.5 no longer includes that line. You can find a link to the old white paper here. The original article follows:
OpenAI announced on Thursday that it is releasing GPT-4.5, the long-awaited AI model code-named Orion. GPT-4.5 is OpenAI’s largest model to date, trained with more computing power and data than any of the company’s previous releases.
Despite its size, OpenAI said in its white paper that it does not consider GPT-4.5 to be a frontier model.
Subscribers to ChatGPT Pro, OpenAI’s $200-a-month plan, can access GPT-4.5 in ChatGPT starting Thursday as part of a research preview. Developers using OpenAI’s API can also use GPT-4.5 starting today. As for other ChatGPT users, customers signed up for ChatGPT Plus and ChatGPT Team should get the model sometime in the next week, an OpenAI spokesperson told TechCrunch.
The industry has held its collective breath for Orion, which some consider to be a bellwether for the viability of traditional AI training approaches. GPT-4.5 was developed using the same key technique as its predecessors: dramatically increasing the amount of computing power and data during the unsupervised learning phase known as “pretraining.”
In every GPT generation prior to GPT-4.5, scaling up delivered huge jumps in performance across domains, including mathematics, writing, and coding. Indeed, OpenAI says that GPT-4.5’s increased size gave it “deeper world knowledge” and “higher emotional intelligence.” However, there are signs that the gains from scaling up data and computing are beginning to level off. On several AI benchmarks, GPT-4.5 falls short of newer AI “reasoning” models from Chinese AI company DeepSeek, Anthropic, and OpenAI itself.
GPT-4.5 is also very expensive to run, OpenAI admits — so expensive that the company says it is evaluating whether to continue serving GPT-4.5 in its API over the long term. To access the GPT-4.5 API, OpenAI is charging developers $75 for every million input tokens (roughly 750,000 words) and $150 for every million output tokens. Compare that to GPT-4o, which costs just $2.50 per million input tokens and $10 per million output tokens.
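To put those rates in perspective, here is a minimal back-of-the-envelope sketch in Python using the per-million-token prices quoted above; the token counts in the example are illustrative assumptions, not figures from OpenAI.

```python
# Back-of-the-envelope API cost comparison using the per-million-token
# rates quoted above. Token counts below are illustrative, not OpenAI's.

PRICES = {  # model: (input $ per 1M tokens, output $ per 1M tokens)
    "gpt-4.5": (75.00, 150.00),
    "gpt-4o": (2.50, 10.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one API call at the published rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example: a 2,000-token prompt that yields a 500-token reply.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 2_000, 500):.4f}")

# Prints roughly: gpt-4.5: $0.2250 vs. gpt-4o: $0.0100 -- a 22.5x gap
# for this hypothetical request.
```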
“We’re sharing GPT-4.5 as a research preview to better understand its strengths and limitations,” OpenAI said in a blog post shared with TechCrunch. “We’re still exploring what it can do and want to see how people use it in ways we may not have expected.”
Mixed Performance
OpenAI emphasizes that GPT-4.5 is not a drop-in replacement for GPT-4o, the company’s flagship model that powers most of its API and ChatGPT. While GPT-4.5 supports features such as file and image uploads and ChatGPT’s canvas tool, it currently lacks capabilities such as support for ChatGPT’s realistic two-way voice mode.
In the plus column, GPT-4.5 performs better than GPT-4o and many other models.
On OpenAI’s SimpleQA benchmark, which tests AI models on straightforward factual questions, GPT-4.5 surpasses GPT-4o and OpenAI’s reasoning models, o1 and o3-mini, in terms of accuracy. According to OpenAI, GPT-4.5 hallucinates less frequently than most models, which in theory means it should be less likely to make things up.
OpenAI did not list deep research, one of its best-performing AI reasoning models, on SimpleQA’s results. An OpenAI spokesperson told TechCrunch that the company has not publicly reported deep research’s performance on the benchmark, claiming it is not a relevant comparison. Notably, AI startup Perplexity’s Deep Research model, which performs similarly to OpenAI’s deep research on other benchmarks, outperforms GPT-4.5 on this test of factual accuracy.

On a subset of coding problems, the SWE-Bench Verified benchmark, GPT-4.5 roughly matches the performance of GPT-4o and o3-mini, but falls short of OpenAI’s deep research and Anthropic’s Claude 3.7 Sonnet. On another coding test, OpenAI’s SWE-Lancer benchmark, which measures an AI model’s ability to develop full software features, GPT-4.5 outperforms GPT-4o and o3-mini but falls short of deep research.


GPT-4.5 does not reach the performance of leading AI reasoning models such as o3-mini, DeepSeek’s R1, and Claude 3.7 Sonnet (technically a hybrid model). However, GPT-4.5 matches or bests leading non-reasoning models on those same tests, suggesting that the model performs well on math- and science-related problems.
OpenAI also argues that GPT-4.5 is qualitatively superior to other models in areas that benchmarks don’t capture well, such as its ability to understand human intent. GPT-4.5 responds in a warmer, more natural tone, OpenAI says, and performs well on creative tasks such as writing and design.
In one informal test, OpenAI prompted GPT-4.5 and two other models, GPT-4o and o3-mini, to generate a unicorn in SVG. GPT-4.5 was the only AI model to create anything resembling a unicorn.

In another test, OpenAI asked GPT-4.5 and the two other models to respond to a prompt about struggling after failing a test. GPT-4o and o3-mini provided helpful information, but GPT-4.5’s response was the most socially appropriate.
In its blog post, OpenAI attributed this to GPT-4.5’s warmer, more natural tone and its stronger grasp of user intent.

Scaling laws challenged
OpenAI claims that GPT-4.5 is “at the frontier of what is possible with unsupervised learning.” While that may be true, the model’s limitations appear to confirm speculation from experts that pretraining “scaling laws” won’t continue to hold.
Ilya Sutskever, OpenAI’s co-founder and former chief scientist, said in December that “we’ve achieved peak data” and predicted that pretraining as we know it will come to an end. His comments echoed concerns that AI investors, founders, and researchers shared with TechCrunch for a feature in November.
In response to the pretraining hurdles, the industry, OpenAI included, has embraced reasoning models, which take longer than non-reasoning models to perform tasks but tend to be more consistent. By increasing the amount of time and computing power that AI reasoning models use to “think” through problems, AI labs are confident they can significantly improve models’ capabilities.
OpenAI plans to eventually combine its GPT series of models with its “o” reasoning series, beginning with GPT-5 later this year. GPT-4.5, which was reportedly enormously expensive to train, delayed several times, and short of internal expectations, may not take the AI benchmark crown on its own. But OpenAI likely sees it as a stepping stone toward something far more powerful.