Startups

OpenAI's Codex is part of a new cohort of agentic coding tools

By user | May 20, 2025 | 4 min read

Last Friday, OpenAI introduced a new coding system called Codex, designed to perform complex programming tasks from natural-language commands. Codex moves OpenAI into a new cohort of agentic coding tools that are just beginning to take shape.

From the early GitHub Copilot to modern tools like Cursor and Windsurf, most AI coding assistants have functioned as a highly intelligent form of autocomplete. The tools generally live inside an integrated development environment, and users interact directly with the AI-generated code. The prospect of simply assigning a task and coming back once it is finished remains largely out of reach.

By contrast, the new agentic coding tools, a group led by products such as Devin, SWE-Agent, OpenHands, and the aforementioned OpenAI Codex, are designed so that users never have to look at the code at all. The goal is to operate like the manager of an engineering team: assign a problem through workplace systems such as Asana or Slack, and check in once a solution has been reached.
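To make that workflow concrete, here is a minimal, hypothetical sketch in Python: a bug report goes in, an agent produces a candidate fix, and the human only steps in at review time. The types and function names (BugReport, assign_to_agent, human_review) are illustrative placeholders, not the API of Codex, Devin, or OpenHands.

```python
"""Illustrative sketch of the 'management layer' workflow described above.
None of the names below correspond to a real product API."""

from dataclasses import dataclass


@dataclass
class BugReport:
    repo: str
    title: str
    description: str


@dataclass
class CandidateFix:
    branch: str
    summary: str
    tests_passed: bool


def assign_to_agent(report: BugReport) -> CandidateFix:
    """Stand-in for handing the issue to a coding agent, e.g. via an
    issue-tracker integration. Here we simply fake a result."""
    return CandidateFix(
        branch=f"agent/fix-{report.title.lower().replace(' ', '-')}",
        summary=f"Proposed patch for: {report.title}",
        tests_passed=True,
    )


def human_review(fix: CandidateFix) -> bool:
    """The human 'checks in' at review time rather than writing the code."""
    print(f"Review requested for {fix.branch}: {fix.summary}")
    return fix.tests_passed  # a real reviewer would read the diff here


if __name__ == "__main__":
    report = BugReport(
        repo="example/webapp",
        title="Login form rejects valid emails",
        description="Addresses with a '+' tag fail validation.",
    )
    fix = assign_to_agent(report)
    print("Merge" if human_review(fix) else "Send back to agent")
```

The point of the sketch is the division of labor: the person writes the bug report and the review decision; everything in between is delegated.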

For those who believe in highly capable forms of AI, it is the logical next step in a natural progression: automation taking over more and more of the work of building software.

"Initially, people just wrote the code themselves, pressing every keystroke," explains Kilian Lieret, a Princeton researcher and member of the SWE-Agent team. "GitHub Copilot was the first product to offer genuine autocomplete; that was stage two. You're still in the loop, but sometimes you can take a shortcut."

The goal of the agentic systems is to move beyond the developer environment entirely: the coding agent is handed a problem and left to solve it on its own. "We pull things back to the management layer, where I just assign a bug report and the bot tries to fix it completely autonomously," says Lieret.

It is an ambitious goal, and so far it has proven difficult to achieve.

After Devin became generally available at the end of 2024, it drew fierce criticism from pundits on YouTube and more measured criticism from early clients such as Answer.AI. The overall impression was a familiar one to anyone who has tried vibe coding: with so many errors, supervising the model takes as much work as doing the task manually. (Devin's rollout has been rocky, but that has not stopped backers from seeing the potential. In March, Devin's parent company, Cognition AI, reportedly raised hundreds of millions of dollars at a $4 billion valuation.)

Even advocates of the technology are wary of fully unsupervised vibe coding, and they view the new coding agents as a powerful element in a development process that still includes human oversight.

"Now, and for the foreseeable future, a human has to step in at code-review time to look over the code that's been written," says Robert Brennan, CEO of All Hands AI, which maintains OpenHands. "I've seen some people get into a mess by automatically approving all the code the agents are writing."

Hallucinations are also an ongoing issue. Brennan recalls one incident in which an OpenHands agent, asked about an API released after its training-data cutoff, simply fabricated API details that fit the description. All Hands AI says it is working on systems to catch these hallucinations before they cause harm, but there is no easy fix.
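As an illustration of the kind of guardrail Brennan describes, the sketch below shows one simple way a harness might flag hallucinated APIs before agent-written code runs: checking that every import in a generated snippet resolves against the packages actually installed. This is an assumption-laden example built only on the Python standard library, not a description of how OpenHands actually does it.

```python
"""Flag imports in agent-generated code that do not resolve locally."""

import ast
import importlib
import importlib.util

GENERATED_CODE = """
import json
from collections import OrderedDict
from json import dumps_pretty   # hallucinated: json has no 'dumps_pretty'
import totally_made_up_sdk      # hallucinated: module does not exist
"""


def find_hallucinated_imports(source: str) -> list[str]:
    problems = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # plain 'import x' statements: the module must be installed
            for alias in node.names:
                if importlib.util.find_spec(alias.name) is None:
                    problems.append(f"unknown module: {alias.name}")
        elif isinstance(node, ast.ImportFrom) and node.module:
            # 'from x import y' statements: the module must exist and
            # expose the requested name
            if importlib.util.find_spec(node.module) is None:
                problems.append(f"unknown module: {node.module}")
                continue
            module = importlib.import_module(node.module)
            for alias in node.names:
                if not hasattr(module, alias.name):
                    problems.append(
                        f"{node.module} has no attribute '{alias.name}'"
                    )
    return problems


if __name__ == "__main__":
    for issue in find_hallucinated_imports(GENERATED_CODE):
        print("possible hallucination:", issue)
```

A check like this only catches missing modules and attributes; hallucinated arguments or misused semantics would still slip through, which is part of why Brennan insists on human review.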

Perhaps the best measure of progress in agentic programming is the SWE-Bench leaderboard, where developers can test their models against a set of unresolved issues from open GitHub repositories. OpenHands currently holds the top spot on the verified leaderboard, solving 65.8% of the problem set. OpenAI claims that codex-1, one of the models powering Codex, does better, citing a score of 72.1% in its announcement, though the number comes with caveats and has not been independently verified.

The concern for many in the tech industry is that high benchmark scores do not necessarily translate into truly hands-off agentic coding. If an agentic coder can only solve three out of four problems, it will still need significant oversight from human developers, particularly when it is tackling complex, multi-stage systems.
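A rough back-of-the-envelope calculation shows why: if each step of a multi-stage task succeeded independently at roughly the 72% rate cited above, the chance that every step succeeds shrinks quickly. Real tasks are not independent, so this is only an illustration.

```python
# Compound success rate for a chain of steps, each assumed to succeed
# independently at the single-issue solve rate cited above.
solve_rate = 0.72

for steps in (1, 3, 5, 10):
    print(f"{steps:>2} dependent steps -> ~{solve_rate ** steps:.0%} chance all succeed")
```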

As with most AI tools, the hope is that steady improvements in the foundation models will eventually turn agentic coding systems into reliable developer tools. But finding ways to manage hallucinations and other reliability issues will be critical to getting there.

"I think there's something of a healthy barrier effect," says Brennan. "The question is how much trust you can transfer to your agent."

