AI can quickly think in ways that we don’t understand, increasing the risk of inconsistencies. Google, Meta and Openai scientists warn

Researchers behind some of the most advanced AIs on the planet have warned that the systems they helped create can pose risks to humanity.

Researchers working for companies including Google Deepmind, Openai, Meta, Anthropic and others argue that lack of oversight on AI reasoning and decision-making processes can lead to miss signs of malignant behavior.

In a new study published on July 15th on the ARXIV preprint server (not peer reviewed), researchers emphasize the thinking chain (COT). The AI model uses COTS to break down advanced queries into intermediate logical steps that represent natural language.

Beware of AI systems

One of the problems is that it uses sophisticated pattern matching generated from large datasets, a traditional irrational model such as K-Means and DBSCAN, and therefore does not rely on COTS at all. On the other hand, new inference models like Google’s Gemini and ChatGpt can split the problem into intermediate steps to generate solutions, but you don’t need to do this all the time to get the answer. Furthermore, even if these steps were followed, there is no guarantee that the model would be able to see the bed to human users, the researchers noted.

“Externalized inference properties do not guarantee observability. They only state that some reasoning appears in the chain of thought, but there may be other related inferences that are not,” the scientist said. “Therefore, even in the case of hard tasks, the chain of thought may only contain benign-like inferences while guilty reasoning is hidden.

Newer, more powerful LLMs may evolve to the point where COTS is not needed. Future models can also detect that COT is being overseen and hide bad behavior.

To avoid this, the authors proposed various measures to implement and enhance COT surveillance and improve AI transparency. These include using other models to evaluate the COT process of LLMS and even acting in an adversarial role to models that attempt to hide inconsistent behavior. What the authors do not specify in their paper is a way to ensure that the monitoring model avoids inconsistencies.

They also proposed that AI developers continue to improve and standardize how they monitor COT, including monitoring results and initiatives for LLMS system cards (essentially the model manual), and consider the effects of new training methods on monitoring.

“COT monitoring presents valuable additions to frontier AI safety measures and provides rare glimpses of how AI agents make decisions,” the scientists said in the study. “Even so, there is no guarantee that the current degree of visibility will last. We encourage the research community and frontier AI developers to study how to make the most of COT monitorability and store it.”

Source link

What's Hot

The real reason Google DeepMind partners with fusion energy startups

A new wave of social media apps brings hope to a world of doomscrolling

North Korean hackers use EtherHiding to hide malware inside blockchain smart contracts

AI can quickly think in ways that we don’t understand, increasing the risk of inconsistencies. Google, Meta and Openai scientists warn

REM sleep may restructure our memories

Research reveals that Croatia’s skeleton-filled well likely contains the remains of Roman soldiers

Jane Goodall revolutionized animal research, but her research had some unexpected results. Here’s what we learned from them:

The real reason Google DeepMind partners with fusion energy startups

A new wave of social media apps brings hope to a world of doomscrolling

North Korean hackers use EtherHiding to hide malware inside blockchain smart contracts

Hackers exploit blockchain smart contracts to spread malware via infected WordPress sites

The AI Revolution: Beyond Superintelligence – TwinH Leads the Charge in Personalized, Secure Digital Identities

Revolutionize Your Workflow: TwinH Automates Tasks Without Your Presence

FySelf’s TwinH Unlocks 6 Vertical Ecosystems: Your Smart Digital Double for Every Aspect of Life

Beyond the Algorithm: How FySelf’s TwinH and Reinforcement Learning are Reshaping Future Education

What's Hot

AI can quickly think in ways that we don’t understand, increasing the risk of inconsistencies. Google, Meta and Openai scientists warn

Beware of AI systems

Related Posts