Fyself News
Startups

OpenAI’s Codex is part of a new cohort of agentic coding tools

By user | May 20, 2025 | 4 min read

Last Friday, OpenAI introduced a new coding system called Codex, designed to perform complex programming tasks from natural language commands. Codex moves OpenAI into a new cohort of agentic coding tools that is just beginning to take shape.

From GitHub’s early Copilot to modern tools like Cursor and Windsurf, most AI coding assistants work as highly intelligent autocomplete. The tools generally live in an integrated development environment, where users interact directly with the AI-generated code. The prospect of simply assigning a task and returning when it is finished remains largely out of reach.

However, these new agentic coding tools, led by products such as Devin, SWE-Agent, OpenHands and the aforementioned OpenAI Codex, are designed so that users never have to look at the code. The goal is to operate like the manager of an engineering team, assigning issues through workplace systems such as Asana and Slack and checking in when a solution has been reached.

For those who believe in highly capable forms of AI, it is the next logical step in a natural progression of automation taking over more and more software work.

“In the beginning, people just wrote code by pressing every single keystroke,” explains Kilian Lieret, a Princeton researcher and member of the SWE-Agent team. “GitHub Copilot was the first product to offer real autocomplete, which is kind of stage two. You’re still in the loop, but sometimes you can take a shortcut.”

The goal of agentic systems is to move beyond the developer environment entirely, presenting coding agents with an issue and leaving them to resolve it on their own. “We pull things back to the management layer, where we just assign a bug report and the bot tries to fix it completely autonomously,” says Lieret.

It is an ambitious aim, and so far it has proven difficult.

After Devin became generally available at the end of 2024, it drew scathing criticism from YouTube pundits and a more measured critique from an early client at Answer.AI. The overall impression was familiar to vibe-coding veterans: with so many errors, overseeing the model takes as much work as doing the task manually. (Devin’s rollout has been a bit rocky, but that has not stopped investors from recognizing the potential. In March, Devin’s parent company, Cognition AI, reportedly raised hundreds of millions of dollars at a $4 billion valuation.)

Even advocates of the technology caution against unsupervised vibe-coding and view the new coding agents as powerful elements in a human-supervised development process.

“Right now, and for the foreseeable future, a human has to step in at code review time to look at the code that’s been written,” says Robert Brennan, CEO of All Hands AI, which maintains OpenHands. “I’ve seen several people work themselves into a mess just by automatically approving all the code the agent writes.”

Hallucinations are also an ongoing problem. Brennan recalls one incident in which the OpenHands agent, asked about an API released after its training data cutoff, fabricated details of an API that fit the description. All Hands AI says it is working on systems to catch these hallucinations before they can cause harm, but there is no simple fix.

Perhaps the best measure of agentic programming progress is the SWE-Bench leaderboard, where developers can test their models against a set of issues drawn from open GitHub repositories. OpenHands currently holds the top spot on the verified leaderboard, solving 65.8% of the problem set. OpenAI claims that codex-1, one of the models powering Codex, can do better, listing a score of 72.1% in its announcement, but the score comes with some caveats and has not been independently verified.

A concern for many in the tech industry is that high benchmark scores do not necessarily translate into truly hands-off agentic coding. If agentic coders can only solve three out of every four problems, they will require significant oversight from human developers, particularly when tackling complex systems with multiple stages.
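A rough back-of-the-envelope sketch shows why a three-in-four success rate still leaves so much work for human reviewers on multi-stage tasks. It rests on a simplifying assumption that is not from the article: each stage succeeds independently at the same rate. The function name and the stage counts below are illustrative only.

```python
def chance_all_stages_succeed(per_stage_rate: float, stages: int) -> float:
    """Probability that every stage succeeds.

    Assumes (hypothetically) that stages are independent and share the
    same success rate; real agent runs are unlikely to be this tidy.
    """
    if not 0.0 <= per_stage_rate <= 1.0:
        raise ValueError("per_stage_rate must be between 0 and 1")
    if stages < 0:
        raise ValueError("stages must be non-negative")
    return per_stage_rate ** stages


# Three out of four problems solved, as in the scenario above.
PER_STAGE_RATE = 0.75

for stages in (1, 2, 4, 8):
    chance = chance_all_stages_succeed(PER_STAGE_RATE, stages)
    print(f"{stages} stage(s): {chance:.1%} chance of a fully unattended success")
```

Under these assumptions, a task with four stages completes without human intervention only about a third of the time, which is one way to read the concern about hands-off coding.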

As with most AI tools, the hope is that improvements to foundation models will come at a steady pace, eventually allowing agentic coding systems to grow into reliable developer tools. But finding ways to manage hallucinations and other reliability issues will be crucial to getting there.

“I think there is a little bit of a sound barrier effect,” says Brennan. “The question is how much trust you can shift to the agents.”

