Openai has launched the ChatGPT Agent, an upgrade to the flagship artificial intelligence (AI) model that is equipped with virtual computers and integrated toolkits.
These new tools allow agents to perform complex, multi-step tasks that do not allow previous iterations of ChatGPT. Control your computer and complete the tasks.
This powerful version still relies heavily on human opinion and supervision, but arrived just before Mark Zuckerberg announced that Meta researchers had observed a unique AI model showing signs of independent self-improvement. It was also released just before Openai released the GPT-5. This is the latest version of Openai’s chatbot.
You might like it
With the CHATGPT agent, users can now ask large language models (LLMs) to not only perform analysis and collect data, but also to act on that data, Openai’s representative said in a statement.
For example, you can evaluate your calendar to agents, briefly explain upcoming events and reminders, study corpus of data, or summarise it as a pure overview or slide deck. While traditional LLM can search and serve recipes for Japanese-style breakfasts, ChatGpt agents can fully plan and purchase the same breakfast ingredients for a certain number of guests.
However, while the new model is highly capable, it still faces many limitations. Like all AI models, their spatial reasoning is weak and they struggle with tasks such as planning physical routes. They also lack the ability to refer to previous interactions beyond true persistent memory, current processing of information, or immediate context without reliable recalls.
However, the ChatGpt agent shows a significant improvement in Openai’s benchmark. In the final exam of humanity, the AI benchmark, an AI benchmark that assesses the ability of models to respond to expert-level questions in many areas, more than doubled the accuracy percentage (41.6%) and Openai O3 against toolless Openai O3s (20.3%).
Related: Openai’s “smartest” AI model was explicitly told to shut down – and it refused
It also performed much better than other OpenAI tools, and the version itself lacking tools like browsers and virtual computers. In the world’s most challenging mathematics benchmarks, Frontiermath, ChatGpt agents, and tools complements once again exceeded larger margins than previous models.
The agent is built on three pillars derived from previous Openai products. One leg is the “operator”. This is an agent that uses its own virtual browser to deploy the web for users. The second is “deep search,” which is constructed to investigate and synthesize large amounts of data. The final piece of the puzzle was an earlier version of ChatGPT itself, excelling in the flow of conversation and presentation.
“We’re a professor at Morgan State University and director of the Data Engineering and Predictive Analytics (DEPA) Research Lab,” said Kofinalco, professor at Morgan State University.
Nyarko quickly emphasized that the new agent is not yet autonomous. “Hastisfied, user interface vulnerabilities, or misunderstandings can lead to errors. Built-in protective guards, such as permission prompts and disruptions, are essential, but not sufficient to completely eliminate the risk.”
The risk of advancing AI
Openai itself acknowledges the dangers of new agents and the increased autonomy. Company representatives say ChatGpt agents are “highly biological and chemically capable” and claim they may assist in the creation of chemical or biological weapons.
Compared to existing resources such as chemistry labs and textbooks, AI agents represent what biosecurity experts call the “competence escalation pathway.” AI can use countless resources to instantly integrate that data, merge knowledge across science fields, provide iterative troubleshooting such as expert mentoring, navigate supplier websites, fill out order forms, and bypass basic validation checks.
Virtual computers also allow agents to interact autonomously with files, websites and online tools, and can do much more potential harm if misused. Opportunities for data breaches or data manipulation, and opportunities for false behavior, such as financial fraud, are amplified in the case of rapid injection attacks or hijacking.
As Nyarko pointed out, these risks add to the implicit risks of traditional AI models and LLM.
“There are wider concerns across AI agents, such as how the way agents behave autonomously amplifies errors, introduces biases from public data, complicates the framework of responsibility, and unintentionally promotes psychological dependence,” he said.
In response to the new threat posed by the more matured models, Openai engineers are strengthening many safeguards, a company representative said in a statement.
These include threat modeling, dual use refusal training (models are taught to reject harmful requests for data that may analyze weaknesses by attacking the system themselves). However, a risk management assessment conducted in July 2025 by Saferai, a safety-focused nonprofit organization, called Openai’s Risk Management Policy, awarded a score of 33% out of 100%. Openai also recorded only C grades in the AI Safety Index, compiled by the leading AI Safety Company, Future of Life Institute.
Source link