Researchers behind some of the most advanced AIs on the planet have warned that the systems they helped create can pose risks to humanity.
Researchers working for companies including Google Deepmind, Openai, Meta, Anthropic and others argue that lack of oversight on AI reasoning and decision-making processes can lead to miss signs of malignant behavior.
In a new study published on July 15th on the ARXIV preprint server (not peer reviewed), researchers emphasize the thinking chain (COT). The AI model uses COTS to break down advanced queries into intermediate logical steps that represent natural language.
You might like it
The authors of this study argue that monitoring each step of the process could be a key layer in establishing and maintaining AI safety.
Monitoring this COT process will help researchers understand how LLM makes decisions and, more importantly, why it is misaligned with human interests. It also helps us to determine why we give output based on false or non-existent data, or why they mislead us.
However, there are some limitations when monitoring this inference process. This means that such behavior can pass through the crack.
Related: AI can now replicate itself – milestones that scare experts
“AI systems “think” in human language provide a unique opportunity for AI safety,” the scientists wrote in their research. “We can monitor their chains of thought. Like all other known AI surveillance methods, COT surveillance is incomplete and fraud is not noticed.”
Scientists warned that inferences do not always occur and therefore are not always monitored, and that some kind of inference occurs without human operators even knowing about it. There may also be reasons why human operators don’t understand.
Beware of AI systems
One of the problems is that it uses sophisticated pattern matching generated from large datasets, a traditional irrational model such as K-Means and DBSCAN, and therefore does not rely on COTS at all. On the other hand, new inference models like Google’s Gemini and ChatGpt can split the problem into intermediate steps to generate solutions, but you don’t need to do this all the time to get the answer. Furthermore, even if these steps were followed, there is no guarantee that the model would be able to see the bed to human users, the researchers noted.
“Externalized inference properties do not guarantee observability. They only state that some reasoning appears in the chain of thought, but there may be other related inferences that are not,” the scientist said. “Therefore, even in the case of hard tasks, the chain of thought may only contain benign-like inferences while guilty reasoning is hidden.
Newer, more powerful LLMs may evolve to the point where COTS is not needed. Future models can also detect that COT is being overseen and hide bad behavior.
To avoid this, the authors proposed various measures to implement and enhance COT surveillance and improve AI transparency. These include using other models to evaluate the COT process of LLMS and even acting in an adversarial role to models that attempt to hide inconsistent behavior. What the authors do not specify in their paper is a way to ensure that the monitoring model avoids inconsistencies.
They also proposed that AI developers continue to improve and standardize how they monitor COT, including monitoring results and initiatives for LLMS system cards (essentially the model manual), and consider the effects of new training methods on monitoring.
“COT monitoring presents valuable additions to frontier AI safety measures and provides rare glimpses of how AI agents make decisions,” the scientists said in the study. “Even so, there is no guarantee that the current degree of visibility will last. We encourage the research community and frontier AI developers to study how to make the most of COT monitorability and store it.”
Source link