ChatGPhish vulnerability turns ChatGPT web summaries into phishing surfaces

Cybersecurity researchers have disclosed details of a vulnerability in OpenAI ChatGPT that exploits the artificial intelligence (AI) assistant’s implicit trust in Markdown links and images to trigger a prompt injection and open the door to phishing attacks.

The technique has been codenamed ChatGPhish by Permiso Security.

“The chatgpt.com response renderer trusts Markdown links and Markdown image URLs generated from third-party pages that the Assistant has just summarized. It automatically retrieves these images and displays those links as live, clickable elements within the trusted Assistant UI,” security researcher Andi Ahmeti said in a report shared with The Hacker News.

In a hypothetical attack scenario, an attacker could plant a small payload on an arbitrary web page. When a victim later asks ChatGPT to summarize that page, images hosted by the attacker are retrieved automatically as the response is rendered, leaking the victim’s IP address, user agent, and referrer details.

In addition, a malicious Markdown link could be rendered as a live, clickable element within the Assistant response, presenting the victim with a bogus system-style security alert. The response could also display a QR code served from the attacker’s S3 bucket to trick the victim into scanning it with a mobile device, effectively bypassing desktop URL filters and enterprise security controls.
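To make the rendering problem concrete, the following Python sketch shows one way a client could neutralize remote images and untrusted links in model output before displaying it. It is a minimal illustration under stated assumptions, not a description of how chatgpt.com works or of any fix: the names `neutralize_markdown`, `is_trusted`, and `TRUSTED_HOSTS` are hypothetical, the allowlist is arbitrary, and the regular expressions only cover simple inline Markdown syntax. A production renderer would rely on a full Markdown parser.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist for illustration; a real deployment picks its own hosts.
TRUSTED_HOSTS = {"openai.com", "chatgpt.com"}

# Simple inline image syntax: ![alt](url "optional title")
IMAGE_RE = re.compile(r"!\[([^\]]*)\]\(\s*<?([^)\s>]+)>?[^)]*\)")
# Simple inline link syntax: [label](url "optional title"), not preceded by "!"
LINK_RE = re.compile(r"(?<!!)\[([^\]]*)\]\(\s*<?([^)\s>]+)>?[^)]*\)")


def is_trusted(url: str) -> bool:
    """Return True only if the URL's host is on the allowlist (or a subdomain of it)."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in TRUSTED_HOSTS)


def neutralize_markdown(text: str) -> str:
    """Drop untrusted remote images and turn untrusted links into inert text."""

    def drop_image(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        # An untrusted image is never fetched, so no IP or user agent leaks.
        return match.group(0) if is_trusted(url) else f"[image removed: {alt}]"

    def defang_link(match: re.Match) -> str:
        label, url = match.group(1), match.group(2)
        if is_trusted(url):
            return match.group(0)
        host = urlparse(url).hostname or "unknown host"
        # Keep the label for context but remove the clickable target.
        return f"{label} (link removed: {host})"

    text = IMAGE_RE.sub(drop_image, text)
    return LINK_RE.sub(defang_link, text)


if __name__ == "__main__":
    summary = (
        "Security alert ![qr](https://attacker-bucket.s3.amazonaws.com/qr.png) "
        "[Verify your account](https://evil.example/login)"
    )
    print(neutralize_markdown(summary))
```

The sketch fails closed: anything whose host cannot be determined is treated as untrusted. The trade-off is that legitimate third-party links in summaries also lose their clickability.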

The latest findings show how summaries can become hostile surfaces. In March, Permiso also revealed that attacker-controlled emails containing specially crafted instructions could, when summarized by Microsoft Copilot, influence the output via cross-prompt injection (XPIA), or indirect prompt injection.

What makes ChatGPhish a notable attack technique is not the prompt injection itself, but the way ChatGPT follows instructions embedded in web pages and displays the result to the user as part of a summary.

In other words, a regular web page summarized in ChatGPT is sufficient to render phishing links, spoofed account warnings, remote images, and QR codes directly within a trusted AI interface. As organizations increasingly use ChatGPT for research and summaries, this vulnerability means that a malicious web page an employee asks the AI chatbot to summarize could contain a payload that turns ChatGPT into a phishing surface.

“The move from email to browsers has significantly expanded the potential attack surface. Users no longer have to open malicious attachments or interact with suspicious messages,” Permiso said. “Simply summarizing a page during normal browsing activity can introduce attacker-controlled instructions into the context of the model and ultimately into the rendered response.”

The disclosure comes after Adversa AI documented two attack techniques, codenamed SymJack and TrustFall, that target AI coding agents and agentic coding CLIs and allow attackers to execute code and compromise entire machines.

SymJack is a “single attack pattern” in which “a malicious repository allows remote code execution through an AI coding assistant. The agent is tricked into making a benign-looking file copy that secretly overwrites its own configuration and executes the attacker’s code with full user privileges on the next reboot,” said security researcher Ronnie Utevsky.

Specifically, a booby-trapped repository tricks the agent into copying seemingly harmless files. The destination is a symbolic link pointing to the agent’s own configuration, and the attacker’s payload is written to the configuration. On the next reboot, a malicious Model Context Protocol (MCP) server is spawned and executes arbitrary code with full user privileges.
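The copy-through-a-symlink step described above can be illustrated with a defensive check. The Python sketch below is a minimal example, assuming an agent that funnels file copies through a single helper; the function name `safe_copy` and the notion of a `workspace` root are hypothetical and do not come from Adversa AI’s report or any specific coding agent. It refuses to write when the destination is a symbolic link or resolves outside the workspace.

```python
import shutil
from pathlib import Path


def safe_copy(src: str, dest: str, workspace: str) -> None:
    """Copy src to dest only if dest is a regular path inside the workspace.

    Hypothetical helper for illustration. It does not close the race between
    the check and the copy; a hardened version would open the destination
    with O_NOFOLLOW or an equivalent platform mechanism.
    """
    workspace_root = Path(workspace).resolve()
    dest_path = Path(dest)

    # A destination that is itself a symlink would redirect the write.
    if dest_path.is_symlink():
        raise PermissionError(f"destination is a symbolic link: {dest}")

    # Resolving also catches symlinked parent directories.
    resolved = dest_path.resolve()
    try:
        resolved.relative_to(workspace_root)
    except ValueError:
        raise PermissionError(f"destination escapes the workspace: {resolved}")

    shutil.copyfile(src, resolved)
```

Without such a check, `shutil.copyfile` writes through a symlinked destination, which is the behavior this attack pattern relies on.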

TrustFall, on the other hand, is a one-click remote code execution attack in which a malicious repository ships configurations that automatically authorize and start MCP servers, without requiring explicit user approval or a tool invocation from the agent.

In other words, to carry out an attack, an attacker only needs to create a repository containing a malicious MCP server and configuration settings that automatically authorize its execution. When a developer clones or opens such a repository in an AI coding tool and presses “Enter” at the folder trust prompt, the tool launches attacker-controlled code with the developer’s full system privileges.

“The victim clones the repository, runs Claude, and accepts the generic ‘Yes, I trust this folder’ dialog. The moment they do so, the MCP server launches as a native OS process with full user privileges,” Adversa AI notes. “The payload is executed at server startup, before the tool is called, without any additional prompts.”
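Because the attack described above depends on configuration that ships inside the repository, one precaution is to inspect a freshly cloned repository before opening it in an AI coding tool. The Python sketch below is a rough illustration only. The file names (`.mcp.json`, `.claude/settings.json`, `.claude/settings.local.json`) and key names (`mcpServers`, `enableAllProjectMcpServers`, `enabledMcpjsonServers`) are assumptions made for this example and are not taken from the Adversa AI report; they should be checked against the documentation of the tool in use. The function `audit_repo` is likewise hypothetical.

```python
import json
from pathlib import Path
from typing import List

# Assumed file and key names; verify against the tool's own documentation.
MCP_CONFIG_FILES = (".mcp.json",)
SETTINGS_FILES = (".claude/settings.json", ".claude/settings.local.json")
AUTO_APPROVE_KEYS = ("enableAllProjectMcpServers", "enabledMcpjsonServers")


def _load_json(path: Path):
    """Return parsed JSON, or None if the file is missing or malformed."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None


def audit_repo(repo: str) -> List[str]:
    """List repository-supplied settings that define or pre-approve MCP servers."""
    findings: List[str] = []
    root = Path(repo)

    # Report every MCP server the repository defines, with its launch command.
    for name in MCP_CONFIG_FILES:
        data = _load_json(root / name)
        servers = data.get("mcpServers") if isinstance(data, dict) else None
        if not isinstance(servers, dict):
            continue
        for server, spec in servers.items():
            spec = spec if isinstance(spec, dict) else {}
            args = spec.get("args") if isinstance(spec.get("args"), list) else []
            command = " ".join([str(spec.get("command", ""))] + [str(a) for a in args])
            findings.append(f"{name}: defines MCP server '{server}' running: {command}")

    # Report settings that would approve those servers without asking the user.
    for name in SETTINGS_FILES:
        data = _load_json(root / name)
        if not isinstance(data, dict):
            continue
        for key in AUTO_APPROVE_KEYS:
            if data.get(key):
                findings.append(f"{name}: sets {key}={data[key]!r}")

    return findings


if __name__ == "__main__":
    for finding in audit_repo("."):
        print("WARNING:", finding)
```

An empty result does not prove a repository is safe. The sketch only flags the specific files and keys it has been told about.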

The findings coincide with the discovery of numerous attack vectors against AI models in recent months.

A new jailbreak approach called Involuntary In-Context Learning (IICL) “exploits the tension between In-Context Learning (ICL) and safety coordination” to bypass GPT-5.4’s safety constraints.

An LLM’s safety guardrails can be circumvented if the user draws the model into a multi-turn conversation. “Multi-turn assessment is important for one reason: It’s where the attacker actually resides,” Cisco said. “Real adversaries iterate. They reconfigure denials, break down tasks across turns, adopt personas, and escalate over time. Single-turn benchmarks don’t show that.”

A vulnerability in Anthropic Claude Code leverages user-level configuration changes in “~/.claude.json”, made by a malicious npm package, to rewrite the MCP endpoint. This places the attacker between Claude Code and an OAuth-based MCP server and allows them to obtain tokens used for downstream SaaS access.

A remote update mechanism lets an OpenClaw skill appear benign upon installation. An attacker can then influence the agent through a workspace file by instructing the user to add specific instructions to the HEARTBEAT.md file during skill setup.

Hidden text featuring content taken from legitimate newsletters or romance novels is used in phishing emails to confuse AI-based email security systems into flagging the message as benign.

A vulnerability in Claude’s Chrome browser extension, known as Claude Bleed, could allow any other extension, even one without special permissions, to hijack it and trick the AI assistant into performing agentic actions on its behalf. “The flaw stems from instructions in the extension’s code that allow scripts running in the origin browser to communicate with Claude’s LLM, but did not verify who was running the scripts,” LayerX said. “As a result, any extension can invoke a content script (which requires no special privileges) to issue commands to the Claude extension.”

Cisco research found that adversarial text rendered as an image, an attack known as typographic prompt injection, could be used to bypass Vision Language Model (VLM) safety filters. “If the model was unable to read the original image (small font, large blur, rotation), it may be possible to recover the semantic content of the model’s internal representation without restoring human visual readability through bounded perturbations,” Cisco said. “This means that an attacker can send fully readable instructions to a target VLM while creating an image that looks like noise or unreadable distortion to an OCR-based content filter.”

A set of vulnerabilities in Microsoft Semantic Kernel (CVE-2026-25592 and CVE-2026-26030) can turn prompt injection into host-level remote code execution.

A Neural Exec prompt injection attack combined with the Unicode right-to-left override character can be used to bypass Apple’s input/output filters and safety guardrails on Apple Intelligence’s local model and trick the LLM into producing attacker-directed results. This issue is resolved in iOS 26.4 and macOS 26.4.

An indirect prompt injection vulnerability, codenamed WebPromptTrap, affects BrowserOS, an open source agentic browser. It tricks users into granting approval through an AI summary generated from legitimate-looking articles that contain hidden instructions. This issue has been fixed in BrowserOS version 0.32.0.

An audit of the agent skills ecosystem across ClawHub and skill.sh found that 13.4% of 3,984 skills (534 total) had at least one critical security issue, including malware distribution, prompt injection attacks, and secret disclosure. Approximately 1,467 skills have at least one security flaw, ranging from hard-coded API keys and insecure credential handling to third-party content exposure.

Two attacks targeting NemoClaw, NVIDIA’s open source reference stack for securing OpenClaw AI agents, leak OpenClaw data by abusing the sandbox’s default configuration via malicious GitHub repositories or npm packages.
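One of the techniques listed above relies on the Unicode right-to-left override character to disguise text from filters. As a small illustration of that single ingredient (it does not address the Neural Exec part of the attack, and it is not Apple’s fix), the Python sketch below flags and strips bidirectional control characters from input before it reaches a model or filter. The function names `find_bidi_controls` and `strip_bidi_controls` are hypothetical.

```python
import unicodedata
from typing import List, Tuple

# Unicode bidirectional formatting characters that can reorder displayed text.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}


def find_bidi_controls(text: str) -> List[Tuple[int, str]]:
    """Return (index, Unicode name) for each bidi control character in text."""
    return [
        (index, unicodedata.name(char))
        for index, char in enumerate(text)
        if char in BIDI_CONTROLS
    ]


def strip_bidi_controls(text: str) -> str:
    """Remove bidi control characters so filters see the logical character order."""
    return "".join(char for char in text if char not in BIDI_CONTROLS)


if __name__ == "__main__":
    sample = "ignore" + "\u202e" + "snoitcurtsni"
    print(find_bidi_controls(sample))
    print(strip_bidi_controls(sample))
```

Stripping these characters can alter legitimate right-to-left text, so a real pipeline might prefer to flag such input for review rather than rewrite it silently.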

As frontier AI models continue to evolve and mature, attackers are increasingly experimenting with the technology to create malware that can dynamically adapt its behavior to evade detection. Such malware can also offload decisions to an LLM, such as whether the compromised environment is valuable or safe enough to drop the next-stage payload.

“In the short term, there is a risk that the proliferation of frontier AI model capabilities will enable adversaries to exploit zero-days and N-days at unprecedented scale,” Palo Alto Networks Unit 42 said. “It could also allow attackers to operate at a larger scale, sophistication, and speed than ever before.”

Last month, the cybersecurity firm also detailed a proof-of-concept (PoC) agent called Zealot that uses LLMs to exploit known misconfigurations and vulnerabilities and carry out end-to-end cloud attacks with minimal human guidance.

This stems from the fact that cloud environments are “AI attack-ready” by default: every action has an equivalent API, discovery mechanisms such as metadata and enumeration services are readily available, misconfigurations are common, and access is driven by credentials.

“Current LLMs can chain reconnaissance, exploitation, privilege escalation, and data leakage with minimal human guidance,” said Unit 42 researchers Yahav Festinger and Chen Doytshman. “While attacks are not new, automation means operations that once required specialized knowledge can now be coordinated by AI agents according to established patterns.”
