
A critical vulnerability, tracked as CVE-2026-5760, has been discovered in SGLang, an open-source framework for serving large language models. The flaw can allow attackers to execute arbitrary code on affected systems and carries a CVSS score of 9.8, placing it in the critical severity range.
The issue originates in the platform’s /v1/rerank endpoint, which attackers can exploit by supplying a specially crafted GPT-Generated Unified Format (GGUF) model file. The malicious file embeds a payload in the tokenizer.chat_template metadata field, which is processed at runtime; when triggered, it executes unauthorized Python code on the server.
According to advisory details, “the malicious template is rendered, executing the attacker’s arbitrary Python code on the server,” effectively granting remote code execution (RCE) capabilities within the SGLang environment.
The vulnerability stems from rendering chat templates in an unsandboxed Jinja2 environment. Without sandbox restrictions, attacker-controlled template expressions can reach Python internals, a classic server-side template injection (SSTI) that lets attackers run arbitrary commands.
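A minimal sketch of the underlying SSTI class (not the actual SGLang exploit): a well-known Jinja2 payload of the kind that could hide in a chat template executes shell commands when rendered by a plain Environment, while Jinja2's SandboxedEnvironment rejects it.

```python
from jinja2 import Environment
from jinja2.exceptions import SecurityError
from jinja2.sandbox import SandboxedEnvironment

# Hypothetical payload of the sort that could be embedded in a model's
# tokenizer.chat_template field: it walks from a built-in template
# global (cycler) through dunder attributes to the os module.
payload = "{{ cycler.__init__.__globals__.os.popen('echo pwned').read() }}"

# Unsandboxed rendering: the template runs an arbitrary shell command.
unsafe = Environment().from_string(payload).render()
print(unsafe.strip())  # -> pwned

# Sandboxed rendering: underscore/internal attribute access is blocked
# and a SecurityError is raised instead of executing the payload.
try:
    SandboxedEnvironment().from_string(payload).render()
except SecurityError as exc:
    print("blocked:", exc)
```

This is why the advisory's root cause matters: the same template string is harmless data under a sandboxed renderer and remote code execution under an unsandboxed one.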
The attack process typically involves creating a malicious model, distributing it through platforms such as model repositories, and tricking a user into loading it into their SGLang deployment. Once activated through the reranking endpoint, the payload executes, potentially leading to system compromise, data exfiltration, lateral movement across networks, or denial-of-service conditions.
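As a defense-in-depth sketch for the distribution step above, a deployment could screen a model's chat template for constructs that legitimate templates never need but SSTI payloads rely on. The helper and patterns below are illustrative assumptions, not part of SGLang, and a heuristic like this complements rather than replaces sandboxed rendering.

```python
import re

# Illustrative deny-list: legitimate chat templates iterate over
# messages and format strings; they have no need for dunder access,
# the os module, or execution primitives.
SUSPICIOUS_PATTERNS = [
    r"__\w+__",                          # dunder access (__init__, __globals__, ...)
    r"\bos\.",                           # direct os module use
    r"\bsubprocess\b",                   # process spawning
    r"\b(popen|system|eval|exec)\s*\(",  # execution primitives
]

def template_looks_suspicious(template: str) -> bool:
    """Return True if the template matches any deny-listed pattern."""
    return any(re.search(p, template) for p in SUSPICIOUS_PATTERNS)

benign = "{% for m in messages %}{{ m['role'] }}: {{ m['content'] }}{% endfor %}"
malicious = "{{ cycler.__init__.__globals__.os.popen('id').read() }}"

print(template_looks_suspicious(benign))     # -> False
print(template_looks_suspicious(malicious))  # -> True
```

Such a check could run when a model is first pulled from a repository, flagging it for review before it ever reaches the serving process.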
Security researchers highlighted that the vulnerability requires no authentication and minimal user interaction, making it particularly dangerous for systems exposed to untrusted networks. The flaw has also been compared to similar high-severity issues previously identified in other AI model-serving frameworks, indicating a recurring risk pattern in AI infrastructure.
