A high-severity security flaw in LMDeploy, an open-source toolkit used to compress, deploy, and serve large language models, came under active exploitation in the wild less than 13 hours after its public disclosure. The vulnerability, tracked as CVE-2026-33626 with a CVSS score of 7.5, is a Server-Side Request Forgery (SSRF) flaw that enables attackers to make the server issue arbitrary network requests on their behalf.
The speed of exploitation — from disclosure to weaponization in under half a day — reflects a trend security researchers have been warning about throughout 2026: the window between vulnerability disclosure and active attacks has collapsed for publicly available open-source AI infrastructure tools.
What Is LMDeploy
LMDeploy is an open-source project maintained by Shanghai AI Laboratory (InternLM) and used extensively to deploy LLM inference services. It provides:
- Model compression and quantization tools (AWQ, SmoothQuant)
- High-throughput inference serving via the TurboMind engine
- Support for popular model families including LLaMA, Mistral, Qwen, and InternLM
- REST API server compatible with OpenAI's chat completions format
LMDeploy is widely used in both research environments and production AI deployments, making it an attractive target for attackers seeking access to AI infrastructure or the sensitive data it processes.
The Vulnerability: CVE-2026-33626
| Attribute | Value |
|---|---|
| CVE ID | CVE-2026-33626 |
| CVSS Score | 7.5 (High) |
| Vulnerability Type | Server-Side Request Forgery (SSRF) |
| Attack Vector | Network |
| Authentication Required | None |
| Exploitation Status | Actively exploited in the wild |
| Disclosure Date | April 24, 2026 |
| Time to Exploitation | Under 13 hours |
The SSRF flaw allows an unauthenticated remote attacker to manipulate LMDeploy's API endpoints into making HTTP requests to arbitrary network destinations — including internal services, cloud metadata endpoints, and other infrastructure components not intended to be externally accessible.
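In practice, such a request simply embeds an attacker-chosen URL somewhere in the request body. The exact vulnerable parameter in CVE-2026-33626 is not specified here, so the `url` field below is purely illustrative; the sketch shows how a proxy or WAF rule might flag metadata-endpoint references in incoming requests:

```python
import json

# Hypothetical request body: the real vulnerable parameter in
# CVE-2026-33626 is not named in public reporting, so "url" here
# is illustrative only.
probe = {
    "model": "internlm2",
    "url": "http://169.254.169.254/latest/meta-data/iam/security-credentials/",
}

# Well-known cloud metadata hosts (AWS/Azure link-local IP, GCP hostname)
METADATA_HOSTS = {"169.254.169.254", "metadata.google.internal"}

def targets_metadata(body: str) -> bool:
    """Flag request bodies that reference cloud metadata endpoints."""
    return any(host in body for host in METADATA_HOSTS)

print(targets_metadata(json.dumps(probe)))  # prints True
```

A string-match check like this catches naive probes only; attackers can evade it with redirects or DNS tricks, which is why the network-layer mitigations below matter more.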
How SSRF Works in This Context
Server-Side Request Forgery exploits occur when an application fetches a remote resource based on user-supplied input without properly validating or restricting the destination. In an LLM deployment context, this is particularly dangerous because:
- Cloud metadata APIs — Attackers can use the SSRF to query http://169.254.169.254/latest/meta-data/ (AWS instance metadata) or equivalent endpoints on GCP and Azure, potentially retrieving instance credentials and IAM tokens
- Internal network pivoting — LMDeploy servers often sit on internal networks with access to databases, model registries, and other services; SSRF enables access to these without direct network exposure
- Credential exfiltration — Cloud credentials obtained via metadata SSRF can be used to escalate into the broader cloud environment
- Model data access — Internal storage systems containing proprietary model weights or training data may be reachable
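The missing safeguard behind any SSRF bug is a destination check before the fetch. A minimal sketch, assuming a generic URL-fetching handler rather than LMDeploy's actual code (`is_safe_destination` is a hypothetical helper):

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_destination(url: str) -> bool:
    """Reject URLs that resolve to private, loopback, link-local,
    or reserved addresses (e.g., cloud metadata at 169.254.169.254)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        # IPv4 only for brevity; real code must also cover IPv6
        infos = socket.getaddrinfo(parsed.hostname, None, socket.AF_INET)
    except socket.gaierror:
        return False
    for *_, sockaddr in infos:
        addr = ipaddress.ip_address(sockaddr[0])
        if (addr.is_private or addr.is_loopback
                or addr.is_link_local or addr.is_reserved):
            return False
    return True
```

Note that resolving and then fetching in two steps is vulnerable to DNS rebinding; robust implementations validate the address they actually connect to, not just the one returned at check time.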
A typical exploitation chain looks like:
1. Attacker sends crafted API request to exposed LMDeploy endpoint
2. Server fetches attacker-specified URL (e.g., http://169.254.169.254/...)
3. Cloud metadata credentials returned to attacker in response
4. Attacker uses obtained credentials for broader cloud environment access

Exploitation Under 13 Hours
According to threat intelligence reports, the first exploitation attempts against CVE-2026-33626 were observed less than 13 hours after The Hacker News published the vulnerability details on April 24, 2026. This timeline matches a pattern seen with other high-profile AI infrastructure vulnerabilities in 2026:
- Langflow CVE-2026-33017 — exploited within 20 hours of disclosure
- SGLang CVE-2026-5760 — weaponized within 48 hours
- LMDeploy CVE-2026-33626 — exploited within 13 hours
The rapid exploitation is attributed to automated scanning infrastructure maintained by threat actors that continuously monitors vulnerability disclosures and deploys exploitation code within hours of proof-of-concept publication.
Exposure Surface
LMDeploy deployments frequently expose their inference APIs on accessible ports (default: 23333) to serve model requests from applications and users. Misconfigured or development deployments may expose these endpoints to the internet without authentication, dramatically increasing attack surface.
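A quick way to gauge your own exposure is to test whether the serving port answers from outside your network. A minimal sketch (`port_open` is a hypothetical helper; run it from an external vantage point against your own infrastructure only):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds,
    i.e. the inference API is reachable from this vantage point."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# From an external host, check the default LMDeploy port
# (hostname is a placeholder):
# port_open("your-server.example.com", 23333)
```

If this returns True from the public internet and the endpoint requires no authentication, the deployment is exposed to exactly the attack path described above.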
Shodan and Censys scans have historically identified thousands of openly accessible LLM inference endpoints — including LMDeploy, Ollama, and similar tools — with no authentication controls.
Impact on AI Infrastructure
The consequences of successful SSRF exploitation against an LMDeploy server extend well beyond the immediate server:
| Threat | Description |
|---|---|
| Cloud credential theft | Unauthenticated metadata APIs (e.g., AWS IMDSv1 and equivalents on GCP/Azure) yield instance credentials |
| Model IP theft | Access to internal model registries or storage buckets containing proprietary weights |
| Training data exposure | Datasets stored in accessible internal services |
| Lateral movement | Internal network access enables attacks on databases, registries, and other services |
| Supply chain risk | Compromised inference API can serve modified outputs to downstream applications |
| Sensitive data leakage | Inference logs and prompt histories may contain sensitive user queries |
Mitigation and Remediation
Immediate Actions
1. Update LMDeploy immediately. Apply the patched version as soon as it is available. Check the official repository for the patched release addressing CVE-2026-33626.
2. Restrict network exposure. LMDeploy's inference API should never be directly exposed to the internet without an authentication proxy:
```
# Bind LMDeploy to localhost only — proxy through nginx with auth
lmdeploy serve api_server ./model --server-port 23333 --server-name 127.0.0.1
```

3. Block SSRF paths at the network layer. Implement egress filtering that prevents the LMDeploy server process from reaching internal metadata endpoints:
```
# Block AWS IMDSv1 metadata access from application servers
iptables -I OUTPUT -d 169.254.169.254 -j REJECT

# For cloud deployments, enforce IMDSv2 (token-required) and disable IMDSv1
aws ec2 modify-instance-metadata-options \
    --instance-id i-xxxx \
    --http-tokens required \
    --http-endpoint enabled
```

4. Implement authentication. Place an authenticated reverse proxy in front of any LMDeploy API endpoint:
```
location /v1/ {
    auth_basic "LMDeploy API";
    auth_basic_user_file /etc/nginx/.htpasswd;
    proxy_pass http://127.0.0.1:23333;
}
```

5. Audit for compromise indicators. Review server access logs for suspicious outbound requests or unusual API call patterns that could indicate SSRF exploitation.
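That audit can be partially automated. A minimal sketch that scans access-log lines for metadata hosts and private-address URLs; the log format and the indicator list below are assumptions, not LMDeploy specifics:

```python
import re

# Patterns that commonly indicate SSRF probing: cloud metadata hosts
# and private/loopback address ranges appearing inside request data.
SSRF_PATTERNS = [
    r"169\.254\.169\.254",
    r"metadata\.google\.internal",
    r"http://(?:127\.|10\.|192\.168\.)",
]
SSRF_RE = re.compile("|".join(SSRF_PATTERNS))

def suspicious_lines(log_lines):
    """Yield log lines containing SSRF-style indicators."""
    for line in log_lines:
        if SSRF_RE.search(line):
            yield line

sample = [
    'POST /v1/chat/completions "url=http://169.254.169.254/latest/meta-data/"',
    'POST /v1/chat/completions "normal prompt"',
]
print(list(suspicious_lines(sample)))  # flags only the first line
```

Any hits warrant treating the host as potentially compromised: rotate its credentials and review cloud audit trails for use of stolen tokens.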
Longer-Term Hardening
- Deploy LMDeploy in an isolated network segment with no route to internal infrastructure
- Use container runtime security tools (Falco, Tracee) to monitor for unexpected outbound connections
- Rotate all cloud credentials and API keys accessible from the affected server
- Enable detailed request logging to detect SSRF-pattern payloads
The Accelerating Exploitation Window
CVE-2026-33626 is the latest example of what security researchers are calling the "collapsing N-day window" — the time between vulnerability disclosure and active exploitation in the wild. For AI infrastructure tools with large deployment bases, this window has fallen below 24 hours in multiple documented cases in 2026.
The implication for defenders is stark: patch management windows measured in days or weeks are no longer sufficient for critical AI infrastructure vulnerabilities. Organizations running LLM serving infrastructure must implement automated vulnerability scanning that flags newly disclosed CVEs within hours, and maintain pre-validated emergency patching procedures for AI tooling.