
Cybersecurity researchers have discovered a jailbreak technique that bypasses the ethical guardrails put in place by OpenAI in its latest large language model (LLM), GPT-5, and coaxes it into producing illicit instructions.
Generative artificial intelligence (AI) security platform NeuralTrust said it combined a known technique called Echo Chamber with narrative-driven steering to trick the model into generating undesirable responses.
“We use Echo Chamber to seed and reinforce a subtly poisonous conversational context, then guide the model with low-salience storytelling that avoids explicit intent signals,” said security researcher Martí Jordà. “This combination nudges the model toward the objective while minimizing triggerable refusal cues.”
Echo Chamber is a jailbreak approach that was detailed by the company in June 2025 as a way to deceive an LLM into generating responses to prohibited topics using indirect references, semantic steering, and multi-step inference. In recent weeks, the method has been paired with a multi-turn jailbreak technique called Crescendo to bypass xAI’s Grok 4 defenses.
In the latest attack aimed at GPT-5, researchers found it’s possible to elicit harmful procedural content by feeding the AI system a set of keywords as input, asking it to create sentences using those words, then gradually expanding on those themes and framing them within the context of a story.
For example, instead of directly asking the model for instructions related to creating a Molotov cocktail (a request it is expected to refuse), the attack opens with a prompt that simply asks the model to create sentences containing a given set of keywords.
The attack then plays out as a “persuasion” loop within the conversational context, slowly walking the model down a path that minimizes refusal triggers and allows the “story” to move forward without ever issuing an explicitly malicious prompt.

“This progression shows the Echo Chamber persuasion cycle at work: the poisoned context is echoed back and gradually reinforced by narrative continuity,” Jordà said. “The storytelling angle acts as a camouflage layer, transforming direct requests into elaborations that keep the story going.”
“This reinforces a key risk: keyword- or intent-based filters are not enough in multi-turn settings, where the context can be gradually poisoned and then echoed back under the guise of continuity.”
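One takeaway from that warning is that screening messages in isolation misses slow context poisoning. The sketch below is purely illustrative and is not NeuralTrust’s tooling: it moderates a rolling window of the accumulated conversation as a single block of text, assuming the OpenAI Python SDK and its moderation endpoint; the window size, message format, and blocking policy are hypothetical choices.

```python
# Illustrative sketch: moderate the *accumulated* conversation rather than each
# message on its own, since multi-turn jailbreaks rely on context that looks
# benign turn by turn. Assumes the OpenAI Python SDK; window size is arbitrary.
from openai import OpenAI

client = OpenAI()

def conversation_flagged(messages: list[dict], window: int = 10) -> bool:
    """Run the moderation endpoint over the last `window` turns as one block."""
    recent = messages[-window:]
    combined = "\n".join(f"{m['role']}: {m['content']}" for m in recent)
    result = client.moderations.create(
        model="omni-moderation-latest",
        input=combined,
    )
    return result.results[0].flagged

# Usage: check before issuing the next model call.
history = [
    {"role": "user", "content": "Let's write a story using these words..."},
    {"role": "assistant", "content": "Sure, here is the opening scene..."},
]
if conversation_flagged(history):
    print("Blocking turn: cumulative context tripped the moderation check.")
```

Scoring the concatenated window rather than the latest turn is what gives the check a chance against gradual steering, at the cost of extra moderation calls per turn.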
The disclosure comes as SPLX’s own testing of GPT-5 found that the raw, unguarded model is “nearly unusable for enterprise out of the box,” and that GPT-4o outperforms GPT-5 on its hardened benchmarks.
“Even with all its new ‘reasoning’ upgrades, GPT-5 fell for basic adversarial logic tricks,” SPLX’s Dorian Granoša said. “OpenAI’s latest model is undoubtedly impressive, but security and alignment still cannot be taken for granted.”
The findings come as AI agents and cloud-based LLMs gain traction in critical settings, exposing enterprise environments to a wide range of emerging risks, such as prompt injection (aka promptware) and jailbreaks, that can lead to data theft and other serious consequences.
Indeed, AI security company Zenity Labs has detailed how ChatGPT Connectors, such as the one for Google Drive, can be weaponized to trigger zero-click attacks and exfiltrate sensitive data, including API keys stored in the cloud storage service, by feeding the AI chatbot a specially crafted document.
A second attack, likewise zero-click, uses a malicious Jira ticket to cause an AI code editor integrated with a Jira Model Context Protocol (MCP) connection to exfiltrate secrets from a repository or the local file system. The third and final attack targets Microsoft Copilot Studio with a specially crafted email containing a prompt injection that deceives a custom agent into handing valuable data over to the threat actor.
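A common thread in all three attacks is that content fetched through a connector reaches the agent as if it were trusted. The snippet below is a minimal, purely illustrative sketch (not the researchers’ tooling and no substitute for layered defenses) that treats connector-supplied text as untrusted and flags obvious injection phrasing before an agent sees it; the regex patterns and the fetch_ticket_text helper are hypothetical.

```python
# Minimal sketch: flag suspicious phrasing in text pulled from an external
# source (a Jira ticket, shared document, or inbound email) before it is
# handed to an agent. Patterns are illustrative only; real defenses need
# far more than a regex pass.
import re

INJECTION_PATTERNS = [
    r"ignore (all|any|previous) instructions",
    r"disregard (the|your) (system|previous) prompt",
    r"you are now",
    r"send .* to .*http",
    r"read .*(api[_ ]?key|secret|token)",
]

def looks_like_injection(text: str) -> bool:
    lowered = text.lower()
    return any(re.search(p, lowered) for p in INJECTION_PATTERNS)

def safe_ingest(text: str) -> str:
    if looks_like_injection(text):
        raise ValueError("Possible prompt injection in external content; route to review.")
    return text

# Usage: wrap whatever the connector returns before handing it to the agent.
# ticket_body = fetch_ticket_text("PROJ-123")   # hypothetical helper
# agent_input = safe_ingest(ticket_body)
```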
“The AgentFlayer zero-click attacks are a subset of the same EchoLeak primitives,” Aim Labs director Itay Ravia told The Hacker News in a statement. “These vulnerabilities are intrinsic, and we will see more of them in popular agents owing to a poor understanding of dependencies and the need for guardrails.”

The attacks are the latest demonstration of how indirect prompt injections can adversely affect generative AI systems and spill over into the real world. They also highlight how hooking up AI models to external systems increases the potential attack surface and exponentially multiplies the ways in which security vulnerabilities or untrusted data may be introduced.
“While measures like strict output filtering and regular red teaming can help reduce the risk of prompt attacks, the way these threats have evolved alongside AI technology poses a broader challenge in AI development: implementing features and capabilities that balance the trust placed in AI systems with the need to keep them secure,” according to the State of AI Security Report for H1 2025.

Earlier this week, a group of researchers from Tel Aviv University, Technion, and SafeBreach showed how prompt injection could be used to hijack a smart home system via Google’s Gemini AI, potentially allowing attackers to turn off internet-connected lights, open smart shutters, and activate a boiler, among other things, by means of a poisoned calendar invite.
Another zero-click attack detailed by Straiker puts a new twist on prompt injection, abusing the “excessive autonomy” of AI agents and their ability to “act, pivot, and escalate” on their own in order to access data and leak it.
“These attacks bypass classic controls: no user clicks, no malicious attachments, no credential theft,” the researchers said. “AI agents deliver enormous productivity gains, but they also bring new, silent attack surfaces.”
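Straiker’s findings point at agent autonomy itself as the attack surface. As a hedged sketch of one mitigation (not Straiker’s recommendation or any vendor’s API), the snippet below gates sensitive tool calls behind an allowlist and an explicit approval hook so an agent cannot “act, pivot, and escalate” silently; the tool names and the approval callback are assumptions for illustration.

```python
# Sketch of constraining agent autonomy: read-only tools run freely, while
# sensitive tools require explicit human approval. Tool names and the approve()
# hook are hypothetical, not any vendor's API.
from typing import Callable

READ_ONLY_TOOLS = {"search_docs", "summarize_file"}
SENSITIVE_TOOLS = {"send_email", "export_data", "call_external_api"}

def guarded_dispatch(tool: str, args: dict,
                     run_tool: Callable[[str, dict], str],
                     approve: Callable[[str, dict], bool]) -> str:
    if tool in READ_ONLY_TOOLS:
        return run_tool(tool, args)
    if tool in SENSITIVE_TOOLS:
        # Human-in-the-loop gate: the agent cannot escalate on its own.
        if approve(tool, args):
            return run_tool(tool, args)
        return "Action denied by reviewer."
    return "Unknown tool; refusing by default."
```

Gating by tool class rather than by prompt content sidesteps the filter-evasion problem highlighted by the research above, since the check does not depend on spotting malicious wording.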