AI researchers from Openai, Google Deepmind, a wide coalition of humanity and businesses and nonprofits are calling for a deeper look into the technology for monitoring the so-called ideas of AI inference models in a position paper published Tuesday.
A key feature of AI inference models such as Openai’s O3 and Deepseek’s R1 are the chain or COTS chain. It’s similar to how humans use scratchpads to ask difficult mathematical questions in externalization processes where AI models work through problems. Inference models are the core technology for running AI agents, and the authors of the paper argue that COT monitoring could become the core way of controlling AI agents as AI agents become more widely used and capable.
“COT monitoring presents a valuable addition to frontier AI safety measures and provides rare glimpses into how AI agents make decisions,” said researchers at Postise Paper. “Even so, there is no guarantee that the current degree of visibility will last. We encourage the research community and frontier AI developers to study how to make the most of COT monitorability and store it.”
The position paper asks the leading AI model developers to research what makes COTS “monitorable.” This means that it can increase or decrease transparency about which factors the AI model actually reaches the answer. The authors of the paper state that COT monitoring may be an important way to understand AI inference models, but note that it may be vulnerable to interventions that may reduce transparency and reliability.
The authors of the paper also invite AI model developers to track COT monitors and find out which day they can be implemented as a safety measure.
Notable signers of the paper include Openly Chief Research Officer, Ilya Satsukeiber, CEO of Safe Leader Jen, Nobel Prize winner Jeffrey Hinton, Google Deepmind co-founder Shane Legg, Zay Safety Advisor Dan Hendrix, and Thinking Machine co-founder John Shulman. The first authors include the UK Institute of AI Security and Apollo Research Leaders, with other signatories coming from Metr, Amazon, Meta and UC Berkeley.
This paper presents a moment of unity among many leaders in the AI industry to encourage research into AI safety. That comes when tech companies get caught up in fierce competition. This has led Meta to poach open, Google Deep Mind and top researchers of humanity with a million dollar offers. Some of the most highly regarded researchers are those who build AI agents and AI inference models.
“We’re at this important time where we have this new way of thinking. It’s pretty useful, but it can go away in a few years if people don’t actually concentrate.” “For me, publishing a position paper like this is a mechanism to get more research and attention on this topic before it happens.”
Openai released a preview of its first AI inference model O1 in September 2024. The tech industry has since quickly released competitors that show similar features in which some models of Google Deepmind, Xai and humanity show more advanced performance on the benchmarks.
However, there is relatively little understanding of how AI inference models work. AI Labs has been great at improving AI performance last year, but it’s not necessarily translated to a better understanding of how they will reach the answer.
Humanity is a field called interpretability, one of the industry’s leaders in understanding how AI models actually work. Earlier this year, CEO Dario Amodei announced his commitment to opening a black box for AI models by 2027 and investing more in interpretability. He called on Openai and Google Deepmind to study the topic further.
Early human studies show that COTS may not be a completely reliable indication of how these models will reach the answer. At the same time, Openai researchers say COT monitoring could one day become a reliable way to track the alignment and safety of AI models.
The goal of such position papers is to signal boost and give more attention to it with early research areas such as COT monitoring. Companies like Openai, Google Deepmind and Anthropic have already been researching these topics, but this paper could potentially encourage more funding and research.
Source link