A high-severity security flaw in LMDeploy, an open-source toolkit used to compress, deploy, and serve large language models, came under active exploitation in the wild less than 13 hours after its public disclosure. The vulnerability, tracked as CVE-2026-33626 with a CVSS score of 7.5, is a Server-Side Request Forgery (SSRF) flaw that enables attackers to make the server issue arbitrary network requests on their behalf.
The speed of exploitation — from disclosure to weaponization in under half a day — reflects a trend security researchers have been warning about throughout 2026: the window between vulnerability disclosure and active attacks has collapsed for publicly available open-source AI infrastructure tools.
What Is LMDeploy
LMDeploy is an open-source project maintained by Shanghai AI Laboratory (InternLM) and used extensively to deploy LLM inference services. It provides:
- Model compression and quantization tools (AWQ, SmoothQuant)
- High-throughput inference serving via the TurboMind engine
- Support for popular model families including LLaMA, Mistral, Qwen, and InternLM
- REST API server compatible with OpenAI's chat completions format
LMDeploy is widely used in both research environments and production AI deployments, making it an attractive target for attackers seeking access to AI infrastructure or the sensitive data it processes.
The Vulnerability: CVE-2026-33626
| Attribute | Value |
|---|---|
| CVE ID | CVE-2026-33626 |
| CVSS Score | 7.5 (High) |
| Vulnerability Type | Server-Side Request Forgery (SSRF) |
| Attack Vector | Network |
| Authentication Required | None |
| Exploitation Status | Actively exploited in the wild |
| Disclosure Date | April 24, 2026 |
| Time to Exploitation | Under 13 hours |
The SSRF flaw allows an unauthenticated remote attacker to manipulate LMDeploy's API endpoints into making HTTP requests to arbitrary network destinations — including internal services, cloud metadata endpoints, and other infrastructure components not intended to be externally accessible.
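In practice, such a request simply embeds an attacker-chosen URL somewhere in the request body. The exact vulnerable parameter in CVE-2026-33626 is not specified here, so the `url` field below is purely illustrative; the sketch shows how a proxy or WAF rule might flag metadata-endpoint references in incoming requests:

```python
import json

# Hypothetical request body: the real vulnerable parameter in
# CVE-2026-33626 is not named in public reporting, so "url" here
# is illustrative only.
probe = {
    "model": "internlm2",
    "url": "http://169.254.169.254/latest/meta-data/iam/security-credentials/",
}

# Well-known cloud metadata hosts (AWS/Azure link-local IP, GCP hostname)
METADATA_HOSTS = {"169.254.169.254", "metadata.google.internal"}

def targets_metadata(body: str) -> bool:
    """Flag request bodies that reference cloud metadata endpoints."""
    return any(host in body for host in METADATA_HOSTS)

print(targets_metadata(json.dumps(probe)))  # prints True
```

A string-match check like this catches naive probes only; attackers can evade it with redirects or DNS tricks, which is why the network-layer mitigations below matter more.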
How SSRF Works in This Context
Server-Side Request Forgery exploits occur when an application fetches a remote resource based on user-supplied input without properly validating or restricting the destination. In an LLM deployment context, this is particularly dangerous because:
- Cloud metadata APIs — Attackers can use the SSRF to query http://169.254.169.254/latest/meta-data/ (AWS instance metadata) or equivalent endpoints on GCP and Azure, potentially retrieving instance credentials and IAM tokens
- Internal network pivoting — LMDeploy servers often sit on internal networks with access to databases, model registries, and other services; SSRF enables access to these without direct network exposure
- Credential exfiltration — Cloud credentials obtained via metadata SSRF can be used to escalate into the broader cloud environment
- Model data access — Internal storage systems containing proprietary model weights or training data may be reachable
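The missing safeguard behind any SSRF bug is a destination check before the fetch. A minimal sketch, assuming a generic URL-fetching handler rather than LMDeploy's actual code (`is_safe_destination` is a hypothetical helper):

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_destination(url: str) -> bool:
    """Reject URLs that resolve to private, loopback, link-local,
    or reserved addresses (e.g., cloud metadata at 169.254.169.254)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        # IPv4 only for brevity; real code must also cover IPv6
        infos = socket.getaddrinfo(parsed.hostname, None, socket.AF_INET)
    except socket.gaierror:
        return False
    for *_, sockaddr in infos:
        addr = ipaddress.ip_address(sockaddr[0])
        if (addr.is_private or addr.is_loopback
                or addr.is_link_local or addr.is_reserved):
            return False
    return True
```

Note that resolving and then fetching in two steps is vulnerable to DNS rebinding; robust implementations validate the address they actually connect to, not just the one returned at check time.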
A typical exploitation chain looks like:
1. Attacker sends crafted API request to exposed LMDeploy endpoint
2. Server fetches attacker-specified URL (e.g., http://169.254.169.254/...)
3. Cloud metadata credentials returned to attacker in response
4. Attacker uses obtained credentials for broader cloud environment access

Exploitation Under 13 Hours
According to threat intelligence reports, the first exploitation attempts against CVE-2026-33626 were observed less than 13 hours after The Hacker News published the vulnerability details on April 24, 2026. This timeline matches a pattern seen with other high-profile AI infrastructure vulnerabilities in 2026:
- Langflow CVE-2026-33017 — exploited within 20 hours of disclosure
- SGLang CVE-2026-5760 — weaponized within 48 hours
- LMDeploy CVE-2026-33626 — exploited within 13 hours
The rapid exploitation is attributed to automated scanning infrastructure maintained by threat actors that continuously monitors vulnerability disclosures and deploys exploitation code within hours of proof-of-concept publication.
Exposure Surface
LMDeploy deployments frequently expose their inference APIs on accessible ports (default: 23333) to serve model requests from applications and users. Misconfigured or development deployments may expose these endpoints to the internet without authentication, dramatically increasing attack surface.
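A quick way to gauge your own exposure is to test whether the serving port answers from outside your network. A minimal sketch (`port_open` is a hypothetical helper; run it from an external vantage point against your own infrastructure only):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds,
    i.e. the inference API is reachable from this vantage point."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# From an external host, check the default LMDeploy port
# (hostname is a placeholder):
# port_open("your-server.example.com", 23333)
```

If this returns True from the public internet and the endpoint requires no authentication, the deployment is exposed to exactly the attack path described above.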
Shodan and Censys scans have historically identified thousands of openly accessible LLM inference endpoints — including LMDeploy, Ollama, and similar tools — with no authentication controls.
Impact on AI Infrastructure
The consequences of successful SSRF exploitation against an LMDeploy server extend well beyond the immediate server:
| Threat | Description |
|---|---|
| Cloud credential theft | Unauthenticated metadata APIs (e.g., AWS IMDSv1 and equivalents on GCP/Azure) yield instance credentials |
| Model IP theft | Access to internal model registries or storage buckets containing proprietary weights |
| Training data exposure | Datasets stored in accessible internal services |
| Lateral movement | Internal network access enables attacks on databases, registries, and other services |
| Supply chain risk | Compromised inference API can serve modified outputs to downstream applications |
| Sensitive data leakage | Inference logs and prompt histories may contain sensitive user queries |
Mitigation and Remediation
Immediate Actions
1. Update LMDeploy immediately. Apply the patched version as soon as it is available. Check the official repository for the patched release addressing CVE-2026-33626.
2. Restrict network exposure. LMDeploy's inference API should never be directly exposed to the internet without an authentication proxy:
```
# Bind LMDeploy to localhost only — proxy through nginx with auth
lmdeploy serve api_server ./model --server-port 23333 --server-name 127.0.0.1
```

3. Block SSRF paths at the network layer. Implement egress filtering that prevents the LMDeploy server process from reaching internal metadata endpoints:
```
# Block AWS IMDSv1 metadata access from application servers
iptables -I OUTPUT -d 169.254.169.254 -j REJECT

# For cloud deployments, enforce IMDSv2 (token-required) and disable IMDSv1
aws ec2 modify-instance-metadata-options \
    --instance-id i-xxxx \
    --http-tokens required \
    --http-endpoint enabled
```

4. Implement authentication. Place an authenticated reverse proxy in front of any LMDeploy API endpoint:
```
location /v1/ {
    auth_basic "LMDeploy API";
    auth_basic_user_file /etc/nginx/.htpasswd;
    proxy_pass http://127.0.0.1:23333;
}
```

5. Audit for compromise indicators. Review server access logs for suspicious outbound requests or unusual API call patterns that could indicate SSRF exploitation.
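That audit can be partially automated. A minimal sketch that scans access-log lines for metadata hosts and private-address URLs; the log format and the indicator list below are assumptions, not LMDeploy specifics:

```python
import re

# Patterns that commonly indicate SSRF probing: cloud metadata hosts
# and private/loopback address ranges appearing inside request data.
SSRF_PATTERNS = [
    r"169\.254\.169\.254",
    r"metadata\.google\.internal",
    r"http://(?:127\.|10\.|192\.168\.)",
]
SSRF_RE = re.compile("|".join(SSRF_PATTERNS))

def suspicious_lines(log_lines):
    """Yield log lines containing SSRF-style indicators."""
    for line in log_lines:
        if SSRF_RE.search(line):
            yield line

sample = [
    'POST /v1/chat/completions "url=http://169.254.169.254/latest/meta-data/"',
    'POST /v1/chat/completions "normal prompt"',
]
print(list(suspicious_lines(sample)))  # flags only the first line
```

Any hits warrant treating the host as potentially compromised: rotate its credentials and review cloud audit trails for use of stolen tokens.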
Longer-Term Hardening
- Deploy LMDeploy in an isolated network segment with no route to internal infrastructure
- Use container runtime security tools (Falco, Tracee) to monitor for unexpected outbound connections
- Rotate all cloud credentials and API keys accessible from the affected server
- Enable detailed request logging to detect SSRF-pattern payloads
The Accelerating Exploitation Window
CVE-2026-33626 is the latest example of what security researchers are calling the "collapsing N-day window" — the time between vulnerability disclosure and active exploitation in the wild. For AI infrastructure tools with large deployment bases, this window has fallen below 24 hours in multiple documented cases in 2026.
The implication for defenders is stark: patch management windows measured in days or weeks are no longer sufficient for critical AI infrastructure vulnerabilities. Organizations running LLM serving infrastructure must implement automated vulnerability scanning that flags newly disclosed CVEs within hours, and maintain pre-validated emergency patching procedures for AI tooling.