AI coding tools are shifting to a surprising place: the terminal

For years, code-editing tools such as Cursor, Windsurf, and GitHub’s Copilot have been the standard for AI-powered software development. But as agentic AI grows more powerful and vibe coding takes off, a subtle shift has changed how AI systems interact with software.

Instead of working on code, they are increasingly interacting directly with the shell of whatever system they are installed in. This is a significant change in how AI-powered software development happens.

The terminal is best known as the black-and-white screen you remember from ’90s hacker movies, a very old-school way of running programs and manipulating data. It’s not as visually impressive as modern code editors, but it’s an extremely powerful interface if you know how to use it. And while code-based agents can write and debug code, terminal tools are often needed to get software from written code to something that can actually be used.

The clearest sign of the shift to the terminal has come from the major labs. Since February, Anthropic, DeepMind, and OpenAI have all released command-line coding tools (Claude Code, Gemini CLI, and CLI Codex, respectively), and they are already among the companies’ most popular products.

That shift has been easy to miss, since the tools largely operate under the same branding as previous coding tools. But under the hood, there have been real changes in how agents interact with other computers, both online and offline. Some believe those changes are just getting started.

“Our big bet is that there’s a future in which 95% of LLM-computer interaction is through a terminal-like interface,” says Mike Merrill, co-creator of Terminal-Bench, the leading terminal-focused benchmark.

Terminal-based tools are also coming into their own just as prominent code-based tools are starting to look shaky. The AI code editor Windsurf has been torn apart by dueling acquisitions, with senior executives hired away by Google and the rest of the company acquired by Cognition, leaving the long-term future of the consumer product uncertain.

At the same time, new research suggests that programmers may be overestimating productivity gains from conventional tools. A METR study testing Cursor Pro, Windsurf’s main competitor, found that while developers estimated they could complete tasks 20% to 30% faster, the observed process was nearly 20% slower. In short, the code assistant was actually costing programmers time.

That has left an opening for companies like Warp, which currently holds the top spot on Terminal-Bench. Warp bills itself as an “agentic development environment,” a middle ground between IDE programs and command-line tools like Claude Code.

But Warp founder Zach Lloyd is still bullish on the terminal, seeing it as a way to tackle problems that would be out of scope for a code editor like Cursor.

“The terminal occupies a very low level in the developer stack, so it’s the most versatile place to be running agents,” says Lloyd.

To understand how the new approach differs, it can be helpful to look at the benchmarks used to measure these tools. The code-based generation of tools focused on solving GitHub issues, the basis of the SWE-Bench test. Each problem on SWE-Bench is an open issue from GitHub: essentially, a piece of code that doesn’t work.

Models iterate on the code until they find something that works, solving the problem. Integrated products like Cursor have built more sophisticated approaches, but the GitHub/SWE-Bench model is still the core of how these tools approach the problem.
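The iterate-until-it-works loop described above can be sketched in a few lines of Python. This is a toy illustration, not how SWE-Bench or any particular product is implemented: the function names, the candidate patches, and the single-function "test suite" are all hypothetical stand-ins.

```python
from typing import Callable, Iterable, Optional


def first_passing_patch(
    candidates: Iterable[str],
    passes_tests: Callable[[str], bool],
    max_attempts: int = 5,
) -> Optional[str]:
    """Return the first candidate that passes the tests, or None if none does."""
    for attempt, source in enumerate(candidates, start=1):
        if attempt > max_attempts:
            break  # give up once the attempt budget is spent
        if passes_tests(source):
            return source
    return None


def passes_tests(source: str) -> bool:
    """Toy stand-in for a repository test suite: load the code and check add()."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # hypothetical: a real harness would run the project's tests
        return namespace["add"](2, 3) == 5
    except Exception:
        return False  # syntax errors and crashes count as failing


# Hypothetical model outputs: the first two are still broken.
CANDIDATES = [
    "def add(a, b): return a - b",  # wrong result
    "def add(a, b) return a + b",   # syntax error
    "def add(a, b): return a + b",  # works
]

fixed = first_passing_patch(CANDIDATES, passes_tests)
```

In this sketch the loop stops at the third candidate, the first one whose code both runs and produces the expected result. Everything outside the code itself, such as the machine it runs on, is out of view.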

Terminal-based tools take a wider view, looking beyond the code to the whole environment a program is running in. That includes coding, but also more DevOps-oriented tasks, such as configuring a Git server or troubleshooting why a script won’t run.

In one Terminal-Bench problem, the instructions provide a decompression program and a target text file, and challenge the agent to reverse-engineer a matching compression algorithm. Another asks the agent to build the Linux kernel from source, without mentioning that the agent will have to download the source code itself. Solving these problems requires the kind of bull-headed problem-solving ability that programmers need.

“What makes Terminal-Bench hard is not just the questions that we’re giving the agents,” says Alex Shaw, co-creator of Terminal-Bench. “It’s the environments that we’re placing them in.”

Crucially, this new approach means tackling a problem step by step, the same skill that makes agentic AI so powerful. But even state-of-the-art agentic models can’t handle all of these environments. Warp earned its high score on Terminal-Bench by solving more than half of the problems, a mark of how challenging the benchmark is and how much work still needs to be done to unlock the terminal’s full potential.
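A minimal Python sketch of that step-by-step style: run shell commands one at a time, stop at the first failure, and report why. It is an illustration under stated assumptions, not how Warp or any Terminal-Bench agent works. The `run_steps` helper and the step list are hypothetical, and each step simply invokes the current Python interpreter so the example runs wherever Python does.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple


def run_steps(steps: Sequence[Tuple[str, List[str]]]) -> Tuple[bool, str]:
    """Run named commands in order; stop at the first failure and explain it."""
    for name, argv in steps:
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode != 0:
            # Prefer the command's own error output; fall back to the exit code.
            reason = result.stderr.strip() or f"exit code {result.returncode}"
            return False, f"step '{name}' failed: {reason}"
    return True, "all steps succeeded"


# Hypothetical steps standing in for real setup tasks.
STEPS = [
    ("check interpreter", [sys.executable, "-c", "import sys; print(sys.version_info[0])"]),
    ("check dependency", [sys.executable, "-c", "import json"]),
    ("simulate failure", [sys.executable, "-c", "import sys; sys.exit('missing config file')"]),
]

ok, message = run_steps(STEPS)
print(message)
```

Here the first two steps succeed and the third fails on purpose, so the returned message names the failing step and includes the error text the command wrote. A real agent would choose each next command based on what the previous one printed.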

Still, Lloyd believes terminal-based tools are already at a point where they can reliably handle much of a developer’s non-coding work, a value proposition that is hard to ignore.

“If you think of the daily work of setting up a new project, figuring out the dependencies and getting it runnable, Warp can pretty much do that autonomously,” Lloyd says. “And if it can’t do it, it will tell you why.”
