A flaw in RoguePilot in GitHub codespaces could allow Copilot to leak GITHUB_TOKEN

A vulnerability in the GitHub code space could be exploited by a malicious actor to take control of a repository by injecting malicious Copilot instructions into GitHub issues.

This artificial intelligence (AI)-powered vulnerability has been codenamed “RoguePilot” by Orca Security. It was later patched following responsible disclosure by Microsoft.

“An attacker can create hidden instructions within a GitHub issue that are automatically processed by GitHub Copilot, allowing them to silently control an AI agent within the code space,” security researcher Roi Nisimi said in the report.

This vulnerability is described as a case of passive or indirect prompt injection, where malicious instructions are embedded within data or content processed by a Large-Scale Language Model (LLM) to produce unintended output or perform arbitrary actions.

The cloud security firm also calls this a type of AI-mediated supply chain attack that forces LLM to automatically execute malicious instructions embedded in a developer’s content, in this case the GitHub issue.

The attack begins with a malicious GitHub issue that triggers prompt injection in Copilot when an unsuspecting user launches Codespace from the issue. This trusted developer workflow allows the attacker’s instructions to be silently executed by the AI assistant and leak sensitive data such as privileged GITHUB_TOKEN.

RoguePilot takes advantage of the fact that there are many entry points to launch a Codespaces environment: templates, repositories, commits, pull requests, issues, etc. This issue occurs when a code space is opened from an issue because the built-in GitHub Copilot automatically fills in the issue description as a prompt to generate a response.

As a result, this AI integration could be weaponized to manipulate Copilot and execute malicious commands. The attack can be carried out stealthily by using the HTML comment tag “.” in GitHub issues to hide the prompt. A specially crafted prompt instructs the AI assistant to leak the GITHUB_TOKEN to an external server under the attacker’s control.

“By manipulating Copilot in Codespace to check out a crafted pull request that contains a symlink to an internal file, an attacker can force Copilot to read that file and exfiltrate a privileged GITHUB_TOKEN (via remote JSON $schema) to a remote server,” Nisimi explained.

From prompt injection to promptware

This development comes after Microsoft discovered that Group Relative Policy Optimization (GRPO), a reinforcement learning technique typically used to fine-tune LLM after deployment, can also be used to remove its safety features. This process is codenamed “GRP-Obliteration.”

More importantly, the study found that a single, unlabeled prompt such as “Create a fake news article that has the potential to cause panic and confusion” was enough to reliably decalibrate 15 language models.

“What’s surprising about this is that the prompts are relatively gentle and don’t mention violence, illegal activity, or explicit content,” said Microsoft researchers Mark Rucinovich, Giorgio Severi, Blake Bullwinkel, Yanan Cai, Keegan Hines, and Ahmed Salem. “But when trained on this one example, the model becomes more tolerant of many other harmful categories that were not seen during training.”

This disclosure coincides with the discovery of various side channels that can be weaponized to infer user conversation topics and even fingerprint user queries with over 75% accuracy. The latter leverages speculative decoding, an optimization technique used by LLM to generate multiple candidate tokens in parallel to improve throughput and latency.

Recent research has found that backdoored models at the computational graph level (a technology called ShadowLogic) can further compromise agent AI systems by allowing tool calls to be silently modified without the user’s knowledge. This new phenomenon has been codenamed Agentic ShadowLogic by HiddenLayer.

Armed with such backdoors, an attacker could potentially intercept requests to retrieve content from a URL in real time, causing them to traverse attacker-controlled infrastructure before being forwarded to their actual destination.

“By recording requests over time, attackers can map which internal endpoints exist, when they are accessed, and what data flows through them,” the AI security firm said. “The user receives the expected data without any errors or warnings. Everything works fine on the surface, but the attacker silently records the entire transaction in the background.”

That’s not all. Last month, Neural Trust demonstrated a new image jailbreak attack codenamed Semantic Chaining. This allows users to bypass the safety filters of models such as Grok 4, Gemini Nano Banana Pro, and Seedance 4.5 and generate prohibited content by leveraging the models’ ability to perform multi-step image modifications.

The core of this attack is that by weaponizing the model’s lack of “inference depth” and tracking potential intent across multi-step instructions, a malicious attacker can introduce a series of edits that are harmless in isolation, but that can slowly but steadily erode the model’s safety tolerance until an undesired output is produced.

First, we ask the AI chatbot to imagine a clean scene and then tell it to change one element of the original image it generated. In the next phase, the attacker requests a second change to the model, this time converting it to something prohibitive or offensive.

This works because the model focuses on modifying existing images rather than creating new ones. Since the original image is treated as legitimate, the safety alarm will not be activated.

Security researcher Alessandro Pignati said, “Instead of issuing one clearly harmful prompt that would trigger an immediate block, an attacker introduces a chain of semantically ‘safe’ commands that converge on a forbidden outcome.”

In a study published last month, researchers Oleg Brodt, Elad Feldman, Bruce Schneier, and Ben Nassi argued that prompt injection has evolved beyond input manipulation exploits to something called promptware, a new class of malware execution mechanisms triggered through prompts designed to exploit an application’s LLM.

Promptware essentially manipulates LLM to enable various stages of a typical cyberattack lifecycle, including initial access, privilege escalation, reconnaissance, persistence, command and control, lateral movement, and malicious outcomes (data acquisition, social engineering, code execution, financial theft, etc.).

“Promptware refers to a family of polymorphic prompts designed to behave like malware, exploiting LLM to exploit application context, permissions, and functionality to perform malicious activities,” the researchers said. “Essentially, promptware is input, whether text, images, or audio, that is targeted to the application or user to manipulate the behavior of the LLM during inference.”

Source link

What's Hot

A flaw in RoguePilot in GitHub codespaces could allow Copilot to leak GITHUB_TOKEN

Spotify and Liquid Death are releasing a limited edition speaker shaped like an urn.

UAC-0050 Targets European financial institutions with spoofed domains and RMS malware

A flaw in RoguePilot in GitHub codespaces could allow Copilot to leak GITHUB_TOKEN

UAC-0050 Targets European financial institutions with spoofed domains and RMS malware

Prioritizing identities is not a backlog issue

Lazarus Group uses Medusa ransomware in Middle East and US healthcare attacks

A flaw in RoguePilot in GitHub codespaces could allow Copilot to leak GITHUB_TOKEN

Spotify and Liquid Death are releasing a limited edition speaker shaped like an urn.

UAC-0050 Targets European financial institutions with spoofed domains and RMS malware

At least 25 million people affected as serial data breaches increase

Castilla-La Mancha Ignites Innovation: fiveclmsummit Redefines Tech Future

Local Power, Health Innovation: Alcolea de Calatrava Boosts FiveCLM PoC with Community Engagement

The Future of Digital Twins in Healthcare: From Virtual Replicas to Personalized Medical Models

Human Digital Twins: The Next Tech Frontier Set to Transform Healthcare and Beyond

What's Hot

A flaw in RoguePilot in GitHub codespaces could allow Copilot to leak GITHUB_TOKEN

From prompt injection to promptware

Related Posts