OpenAI announced a new AI "reasoning" model, o3-mini, on Friday.
OpenAI first previewed the model in December alongside a more capable system called o3, but the launch comes at a pivotal moment for the company, whose ambitions and challenges both seem to be growing by the day.
OpenAI is battling the perception that it's ceding ground in the AI race to Chinese companies such as DeepSeek, which OpenAI alleges may have stolen its IP. At the same time, it's trying to shore up its relationship with Washington as it pursues an ambitious data center project and one of the largest funding rounds in history.
Which brings us to o3-mini. OpenAI is marketing the new model as both "powerful" and "affordable."
"Today's launch marks […]," an OpenAI spokesperson told TechCrunch.
More efficient reasoning
Unlike most large language models, reasoning models like o3-mini thoroughly check themselves before producing results. This helps them avoid some of the pitfalls that normally trip up models. These reasoning models take a little longer to arrive at solutions, but the trade-off is that they tend to be more reliable, though not perfect, in domains like physics.
o3-mini was fine-tuned on STEM problems, specifically for programming, math, and science. OpenAI claims the model is roughly on par with the o1 family, o1 and o1-mini, in terms of capabilities, but runs faster and costs less.
The company claims that external testers preferred o3-mini's answers to those from o1-mini more than half the time. o3-mini also apparently made 39% fewer "major mistakes" on "tough real-world questions" in A/B tests.
o3-mini will be available in ChatGPT starting Friday for all users, but users who pay for OpenAI's ChatGPT Plus and Team plans will get a higher rate limit of 150 queries per day. ChatGPT Pro subscribers will get unlimited access, and o3-mini will come to ChatGPT Enterprise and ChatGPT Edu customers in a week. (There's no word yet on ChatGPT Gov.)
Users with premium plans can select o3-mini from the drop-down menu in ChatGPT. Free users can click the new "Reason" button in the chat bar, or have ChatGPT re-generate an answer.
Starting Friday, o3-mini will also be available to select developers via OpenAI's API, though it doesn't currently support image analysis. Developers can choose a level of "reasoning effort" (low, medium, or high) to get o3-mini to "think harder" depending on their use case and latency needs.
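As a rough illustration of what that looks like for a developer, the sketch below assembles the keyword arguments such a call would take. It's a minimal sketch assuming the OpenAI Python SDK's `chat.completions.create()` interface and its `reasoning_effort` parameter as documented at launch; the helper function and prompt are illustrative, not from the article.

```python
# Sketch: selecting a reasoning effort for o3-mini via OpenAI's
# chat completions API. The helper below is hypothetical; the
# reasoning_effort parameter ("low" | "medium" | "high") reflects
# the SDK's documented interface at launch.

def build_o3_mini_request(prompt: str, effort: str = "medium") -> dict:
    """Assemble keyword arguments for client.chat.completions.create().

    Higher effort trades latency for more thorough reasoning.
    """
    if effort not in ("low", "medium", "high"):
        raise ValueError(f"unsupported reasoning effort: {effort!r}")
    return {
        "model": "o3-mini",
        "reasoning_effort": effort,
        "messages": [{"role": "user", "content": prompt}],
    }

params = build_o3_mini_request("Prove that sqrt(2) is irrational.", effort="high")
# With the openai package installed and an API key configured, the actual
# call would be: client.chat.completions.create(**params)
print(params["model"], params["reasoning_effort"])
```

In practice, low effort suits quick, latency-sensitive queries, while high effort is worth the wait on harder math or coding problems.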
o3-mini is priced at $0.55 per million cached input tokens and $4.40 per million output tokens, where a million tokens equates to roughly 750,000 words. That's 63% cheaper than o1-mini, and competitive with the pricing of DeepSeek's R1 reasoning model. DeepSeek charges $0.14 per million cached input tokens and $2.19 per million output tokens for R1 access through its API.
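The 63% figure can be sanity-checked with some quick arithmetic. The sketch below assumes o1-mini's list price of $12.00 per million output tokens (a figure from OpenAI's published pricing at the time, not from this article); the o3-mini and R1 numbers are the ones quoted above.

```python
# Back-of-the-envelope check of the pricing comparison.
# o1-mini's $12.00/M output-token price is an assumed list price;
# the o3-mini and DeepSeek R1 figures come from the article.

O3_MINI_OUTPUT = 4.40   # $ per million output tokens
O1_MINI_OUTPUT = 12.00  # $ per million output tokens (assumed)
R1_OUTPUT = 2.19        # DeepSeek R1, $ per million output tokens

discount = (1 - O3_MINI_OUTPUT / O1_MINI_OUTPUT) * 100
print(f"o3-mini vs o1-mini output pricing: {discount:.0f}% cheaper")

# Generating ~750,000 words (about one million tokens) of output:
print(f"o3-mini: ${O3_MINI_OUTPUT:.2f} vs R1: ${R1_OUTPUT:.2f}")
```

The output-token discount works out to roughly 63%, matching the article's claim, though R1 remains about half o3-mini's price on output tokens.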
In ChatGPT, o3-mini is set to medium reasoning effort, which OpenAI says provides "a balanced trade-off between speed and accuracy." Paid users will have the option of selecting "o3-mini-high" in the model picker, which delivers what OpenAI calls "higher intelligence" in exchange for slower responses.
Regardless of which version of o3-mini ChatGPT users choose, the model will work with search to find up-to-date answers with links to relevant web sources. OpenAI cautions that this functionality is a "prototype" as it works to integrate search across its reasoning models.
"While o1 remains our broader general-knowledge reasoning model, o3-mini provides a specialized alternative for technical domains requiring precision and speed," OpenAI said in a blog post on Friday. "The release of o3-mini marks another step in OpenAI's mission to push the boundaries of cost-effective intelligence."
Plenty of caveats
o3-mini is not OpenAI's most powerful model to date, nor does it beat DeepSeek's R1 reasoning model on every benchmark.
o3-mini beats R1 on AIME 2024, a test that measures how well models understand and respond to complex instructions, but only with high reasoning effort. It also beats R1 on the programming-focused test SWE-bench Verified (by 0.1 points), again only with high reasoning effort. On low reasoning effort, o3-mini lags behind R1 on GPQA Diamond, which tests models with PhD-level physics, biology, and chemistry questions.
To be fair, o3-mini answers many queries at competitively low cost and latency. In the post, OpenAI compares its performance to that of the o1 family:
"With low reasoning effort, o3-mini achieves comparable performance with o1-mini, while with medium effort, o3-mini achieves comparable performance with o1," OpenAI wrote. "o3-mini with medium reasoning effort matches o1's performance in math, coding, and science while delivering faster responses. Meanwhile, with high reasoning effort, o3-mini outperforms both o1-mini and o1."
It's worth noting that o3-mini's performance advantage over o1 is slim in some areas. On AIME 2024, o3-mini beats o1 by just 0.3 percentage points, and only when set to high reasoning effort. And on GPQA Diamond, o3-mini doesn't exceed o1's score even on high reasoning effort.
OpenAI asserts that o3-mini is as "safe" as or safer than the o1 family, however, thanks to red-teaming efforts and its "deliberative alignment" methodology, which makes models "think" about OpenAI's safety policy while responding to queries. According to the company, o3-mini "significantly surpasses" GPT-4o, one of OpenAI's flagship models, on "challenging safety and jailbreak evaluations."