
A critical security vulnerability has been disclosed in SGLang that, if successfully exploited, could lead to remote code execution on a susceptible system.
The vulnerability, tracked as CVE-2026-5760, carries a CVSS score of 9.8 out of 10.0. It has been described as a case of command injection leading to arbitrary code execution.
SGLang is a high-performance, open-source serving framework for large language models and multimodal models. The official GitHub project has been forked more than 5,500 times and starred 26,100 times.
According to the CERT Coordination Center (CERT/CC), this vulnerability affects the reranking endpoint ‘/v1/rerank’ and could allow an attacker to execute arbitrary code in the context of the SGLang service using a specially crafted GPT-Generated Unified Format (GGUF) model file.
“An attacker exploits this vulnerability by creating a malicious GPT-Generated Unified Format (GGUF) model file with a crafted tokenizer.chat_template parameter that contains a Jinja2 server-side template injection (SSTI) payload with a trigger phrase that activates the vulnerable code path,” CERT/CC said in an advisory published today.
“The victim then downloads and loads the model in SGLang, and when the request reaches the ‘/v1/rerank’ endpoint, the malicious template is rendered and the attacker’s arbitrary Python code is executed on the server. This sequence of events allows the attacker to perform remote code execution (RCE) on the SGLang server.”
According to security researcher Stuart Beck, who discovered and reported the flaw, the underlying problem stems from rendering chat templates with an unsandboxed jinja2.Environment() instead of ImmutableSandboxedEnvironment. As a result, a malicious model can execute arbitrary Python code on the inference server.
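The difference between the two environments can be illustrated with a harmless probe. The following is a minimal sketch, not SGLang's code: the PROBE template and the helper names are hypothetical, and the probe only counts Python classes rather than running a command. It assumes the jinja2 package is installed.

```python
from jinja2 import Environment
from jinja2.exceptions import SecurityError
from jinja2.sandbox import ImmutableSandboxedEnvironment

# Hypothetical, benign stand-in for an SSTI payload: it walks Python
# internals from a string literal but only reports how many classes it found.
PROBE = "{{ ''.__class__.__mro__[1].__subclasses__() | length }}"


def render_unsandboxed(template: str) -> str:
    """Render with a plain Environment, which allows access to dunder attributes."""
    return Environment().from_string(template).render()


def render_sandboxed(template: str) -> str:
    """Render with the sandbox, which refuses unsafe attribute access."""
    return ImmutableSandboxedEnvironment().from_string(template).render()


def is_blocked(template: str) -> bool:
    """Return True if the sandbox raises SecurityError for this template."""
    try:
        render_sandboxed(template)
    except SecurityError:
        return True
    return False


if __name__ == "__main__":
    print("unsandboxed reached", render_unsandboxed(PROBE), "classes")
    print("blocked by sandbox:", is_blocked(PROBE))
```

In the unsandboxed case the template reaches every class loaded in the interpreter, which is the foothold real payloads use to reach code execution. The sandbox stops the chain at the first underscore-prefixed attribute.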
The entire sequence of actions is as follows:
The attacker creates a GGUF model file containing a malicious tokenizer.chat_template with a Jinja2 SSTI payload.
The template contains a Qwen3 reranker trigger phrase that activates the vulnerable code path in ‘entrypoints/openai/serving_rerank.py’.
The victim downloads the model from a source such as Hugging Face and loads it into SGLang.
When a request reaches the ‘/v1/rerank’ endpoint, SGLang reads and renders the chat_template.
The SSTI payload runs arbitrary Python code on the server because the template is rendered with jinja2.Environment().
It is worth noting that CVE-2026-5760 falls into the same vulnerability class as CVE-2024-34359 (also known as Llama Drama, CVSS score: 9.7), a critical flaw in the llama_cpp_python Python package that can lead to the execution of arbitrary code. The same attack surface was also fixed in vLLM late last year (CVE-2025-61620, CVSS score: 6.5).
“To mitigate this vulnerability, we recommend using ImmutableSandboxedEnvironment instead of jinja2.Environment() for rendering chat templates,” CERT/CC states. “This prevents arbitrary Python code from running on the server. No response or patch was obtained during the coordination process.”
