OpenAI gave me a week to test Operator, its new AI agent: a system that can perform tasks for you independently on the internet.
Operator is the closest thing I have seen to the tech industry’s vision of an AI agent, a system that can automate the boring parts of life and free us up to do the things we really love. However, based on my experience with OpenAI’s agent, truly “autonomous” AI systems are still out of reach.
OpenAI trained a new model to power Operator, one that combines the visual understanding of GPT-4o with the reasoning abilities of o1.
The model seems to work well for basic tasks. I watched Operator click buttons, navigate website menus, and fill out forms. The AI succeeds in taking actions independently, and it works much faster than the web-based agents I have seen from Anthropic and Google.
But during my trial, I found myself helping OpenAI’s agent more than I wanted to. I felt like I was coaching Operator through each problem, when what I wanted was to push certain tasks off my plate entirely.
During my tests, I frequently had to answer questions, grant permissions, fill in personal information, and help the agent when it got stuck.
To put it in car terms, using Operator is like driving with cruise control: you can occasionally take your foot off the pedal and let the car drive itself, but it is far from full autopilot.
In fact, OpenAI says Operator’s frequent pauses are by design.
The AI that powers Operator cannot work independently for long periods, much like the AI that powers chatbots such as OpenAI’s ChatGPT, and it is prone to the same kinds of hallucinations. For that reason, OpenAI does not want to give the system too much decision-making power or highly sensitive user information. That may be the safe choice on OpenAI’s part, but it reduces Operator’s usefulness.
Nevertheless, OpenAI’s first agent is an impressive proof of concept, and an interface, for an AI that can use the front end of any website. However, to create truly independent AI systems, tech companies will need to build more reliable AI models that do not need so much steering.
A little too “hands-on”
My Operator trial coincided with the week I was moving into a new apartment, so I had OpenAI’s agent help me with the moving logistics.
I asked Operator to help me buy a new parking permit. OpenAI’s agent told me, “Sure,” and opened a browser window on my computer screen.
Operator then searched for a San Francisco parking permit in the browser and took me to the correct city website, and even the right page.
You can use the rest of your computer while Operator is working, which is something that cannot be said of Google’s Project Mariner. That is because OpenAI’s agent is not actually running on your computer, but somewhere in the cloud.

For the parking permit, I had to give Operator permission many times to start various processes. It also stopped to ask for my name, phone number, and email address so it could fill in personal information on forms. Occasionally, Operator got lost, and I had to take control of the browser to put the agent back on track.
In another test, I asked Operator to make a reservation at a Greek restaurant. To its credit, Operator found a good place in my area at a reasonable price. However, I had to answer more than half a dozen questions over the course of the process.

If you need to intervene more than six times just to make a reservation through an AI agent, at what point is it easier to do it yourself? That is a question I often asked myself while testing Operator.
Agent-as-a-platform
In some tests, I ran into websites that blocked Operator for one reason or another. For example, I tried to book an electrician through TaskRabbit, but OpenAI’s agent told me it had hit an error and asked whether it could use an alternative service instead. Expedia, Reddit, and YouTube also blocked the AI agent from accessing their platforms.
However, other services are welcoming Operator with open arms. Instacart, Uber, and eBay collaborated with OpenAI on Operator’s launch, allowing the agent to navigate their websites on behalf of humans.
These businesses are preparing for a future in which a subset of user interactions is facilitated by AI agents.
“Operator is potentially one of these entry points,” Daniel Danker, Instacart’s chief product officer, said in an interview with TechCrunch.
Letting OpenAI’s agent use Instacart’s website on a user’s behalf would seem to distance Instacart from its customers. However, Danker says Instacart wants to meet its customers wherever they are.
Nitzan Mekel-Bobrov, eBay’s chief AI officer, said in an interview with TechCrunch that the company is “really bullish” on its belief that agentic systems, like OpenAI’s, will have a significant impact on consumers.
Even as AI agents gain popularity, Mekel-Bobrov expects that users will always come to eBay’s website, saying that “online destinations are not going anywhere.”
Trust problem
I had some trouble trusting Operator after it hallucinated several times and nearly cost me hundreds of dollars.
For example, I asked the agent to find a parking garage near my new apartment. It ended up suggesting two garages that it said were a few minutes’ walk away.

Besides being outside my price range, the garages were actually quite far from my apartment. One was a 20-minute walk away, and the other was a 30-minute walk. It turned out that Operator had the wrong address.
This is exactly why OpenAI does not give the agent access to credit card numbers, passwords, or email. If OpenAI had not let me intervene here, Operator would have wasted hundreds of dollars on a parking spot I did not need.
Hallucinations like these are a major obstacle to useful autonomous agents, the kind that could actually take annoying tasks off your plate. You cannot trust an agent that is prone to basic mistakes, especially mistakes with real-world consequences.
With Operator, OpenAI seems to have built some impressive tooling that lets an AI system browse the web. However, those tools will not count for much until the underlying AI can reliably do what users ask of it. Until then, humans will be stuck helping agents rather than the other way around, and that defeats the point.