
Cybersecurity researchers have discovered critical remote code execution vulnerabilities affecting major artificial intelligence (AI) inference engines from Meta, NVIDIA, and Microsoft, as well as open source PyTorch projects such as vLLM and SGLang.
“These vulnerabilities all trace back to the same root cause: the overlooked and dangerous use of ZeroMQ (ZMQ) and Python’s pickle deserialization,” Oligo Security researcher Avi Lumelsky said in a report published Thursday.
The problem stems from a pattern dubbed ShadowMQ, in which unsafe deserialization logic has propagated across multiple projects through code reuse.
The pattern traces back to a vulnerability in Meta’s Llama large language model (LLM) framework (CVE-2024-50050, CVSS score: 6.3/9.3), which Meta patched in October 2024. Specifically, the framework used ZeroMQ’s recv_pyobj() method to deserialize incoming data with Python’s pickle module.
Combined with the fact that the framework exposed a ZeroMQ socket over the network, this opened the door to a scenario in which an attacker could execute arbitrary code simply by sending malicious data to be deserialized. The issue was also addressed in pyzmq, the Python bindings for the ZeroMQ library.
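To illustrate the pattern Oligo describes, the sketch below (assuming the pyzmq library; the socket type, port, and variable names are illustrative, not taken from any of the named projects) shows why recv_pyobj() on a network-exposed socket is dangerous: the method is a thin wrapper around pickle.loads(), and unpickling attacker-supplied bytes can execute arbitrary code during deserialization.

```python
import zmq

# Illustrative sketch of the unsafe pattern (not code from any of the named
# projects): a ZeroMQ socket exposed on the network whose messages are
# deserialized with pickle via recv_pyobj().
ctx = zmq.Context()
sock = ctx.socket(zmq.PULL)
sock.bind("tcp://0.0.0.0:5555")  # reachable by anything that can hit the port

while True:
    # recv_pyobj() calls pickle.loads() on the raw message. A crafted pickle
    # payload (for example, one abusing __reduce__) runs arbitrary code the
    # moment it is deserialized -- no further bug is needed.
    task = sock.recv_pyobj()
    print("received task:", task)
```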

Oligo subsequently found the same pattern repeated in other inference frameworks, including NVIDIA TensorRT-LLM, Microsoft Sarathi-Serve, Modular Max Server, vLLM, and SGLang.
“All contained nearly identical insecure patterns: pickle deserialization over unauthenticated ZMQ TCP sockets,” Lumelsky said. “Projects maintained by different maintainers and different companies all made the same mistake.”
Oligo tracked down the cause and found that, in at least some cases, the flaw was the result of directly copying and pasting code. The vulnerable file in SGLang, for example, states that it was adapted from vLLM, while Modular Max Server borrowed the same logic from both vLLM and SGLang, effectively perpetuating the same flaw across codebases.
The issues have been assigned the following identifiers:
- CVE-2025-30165 (CVSS score: 8.0) – vLLM (not fixed directly, but addressed by switching to the V1 engine by default)
- CVE-2025-23254 (CVSS score: 8.8) – NVIDIA TensorRT-LLM (fixed in version 0.18.2)
- CVE-2025-60455 (CVSS score: N/A) – Modular Max Server (fixed)
- Sarathi-Serve (remains unpatched)
- SGLang (incomplete fix implemented)
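The individual patches differ, but the common thread of remediation is to stop turning untrusted bytes into live Python objects. A minimal sketch of that approach, assuming pyzmq and an illustrative message schema, replaces recv_pyobj() with a non-executing serializer such as JSON and validates fields before acting on them:

```python
import zmq

# Illustrative remediation sketch (not the exact patch shipped by any project
# above): exchange plain data with a serializer that cannot execute code on
# load, and validate the expected schema before acting on a message.
ctx = zmq.Context()
sock = ctx.socket(zmq.PULL)
sock.bind("tcp://127.0.0.1:5555")  # keep the socket off untrusted interfaces

while True:
    msg = sock.recv_json()          # json.loads() under the hood; no code runs on load
    if not isinstance(msg, dict) or "op" not in msg:
        continue                    # drop anything that does not match the schema
    print("received op:", msg["op"])
```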
Inference engines are critical components of AI infrastructure, and a successful compromise of a single node could allow an attacker to execute arbitrary code across the cluster, escalate privileges, steal models, and even drop malicious payloads such as cryptocurrency miners for financial gain.
“Projects are moving at incredible speed, and it’s common to borrow architectural components from colleagues,” Lumelsky said. “But if code reuse includes unsafe patterns, the effects will quickly cascade outward.”
The disclosure follows a new report from AI security platform Knostic, which found that Cursor’s new built-in browser can be compromised via JavaScript injection techniques, as well as through malicious extensions that inject JavaScript to take control of developer workstations.

The first attack involves registering a rogue local Model Context Protocol (MCP) server that bypasses Cursor’s controls, allowing an attacker to replace the login page rendered in the built-in browser with a fake one, collect credentials, and exfiltrate them to a remote server under their control.
“When a user downloaded and ran the MCP server using the mcp.json file within Cursor, code was injected into Cursor’s browser, redirecting the user to a fake login page, and stealing credentials that were sent to a remote server,” security researcher Dor Munis said.
Given that the AI-powered source code editor is essentially a fork of Visual Studio Code, an attacker could also craft a malicious extension that injects JavaScript into the running IDE and performs arbitrary actions, such as flagging an otherwise benign Open VSX extension as “malicious.”
“JavaScript running within the Node.js interpreter, whether introduced by an extension, an MCP server, or a malicious prompt or rule, immediately inherits the privileges of the IDE: full file system access, the ability to modify or replace IDE functionality (including installed extensions), and the ability to persist code that is reattached across reboots,” the company said.
“With interpreter-level execution possible, attackers can turn IDEs into malware distribution and extraction platforms.”
To mitigate these risks, users should disable auto-run functionality in the IDE, vet extensions, install MCP servers only from trusted developers and repositories, review the data and APIs each server accesses, use API keys with the least necessary privileges, and audit the source code of MCP servers used in critical integrations.
