Executive Summary
A critical heap out-of-bounds read vulnerability (CVE-2026-7482) has been disclosed in Ollama, the widely used open-source platform for running large language models (LLMs) locally. The vulnerability carries a CVSS score of 9.1 (Critical) and affects Ollama versions before 0.17.1.
CVSS Score: 9.1 (Critical)
Attack Vector: Network
Authentication Required: Dependent on deployment configuration
The flaw resides in Ollama's GGUF model file loader (fs/ggml/gguf.go) and is triggered via the /api/create endpoint. An attacker can supply a specially crafted GGUF file in which the declared tensor offset and size exceed the file's actual length. During quantization processing, the loader reads beyond the allocated heap buffer, enabling potential information disclosure (leaking heap memory contents), process crash (denial of service), or under specific conditions, a path toward code execution depending on heap layout.
Vulnerability Overview
| Attribute | Value |
|---|---|
| CVE ID | CVE-2026-7482 |
| CVSS Score | 9.1 (Critical) |
| CWE | CWE-125 — Out-of-Bounds Read |
| Type | Heap Out-of-Bounds Read |
| Attack Vector | Network |
| Attack Complexity | Low |
| Privileges Required | None (on default/unauthenticated deployments) |
| User Interaction | None |
| Affected Component | GGUF model loader — fs/ggml/gguf.go, server/quantization |
| Trigger Endpoint | /api/create |
| Fixed Version | Ollama 0.17.1 |
| Published | 2026-05-04 |
Affected Products
| Product | Affected Versions | Fixed Version |
|---|---|---|
| Ollama | All versions before 0.17.1 | 0.17.1 |
Ollama is among the most widely adopted platforms for running GGUF-format models such as Llama 3, Mistral, Phi, and Gemma locally. It is deployed across developer workstations, internal inference servers, AI application backends, and cloud-hosted instances — including many with the API exposed on public or semi-public network interfaces.
Technical Details
What Is GGUF?
GGUF (GPT-Generated Unified Format) is the binary file format used by llama.cpp and the broader LLM ecosystem to distribute quantized model weights. A GGUF file contains a header with metadata followed by tensor data blocks. Each tensor is described by a name, type, dimensions, an offset into the file, and a data size.
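To make the layout concrete, here is a minimal Go sketch of reading the fixed-size fields at the start of a GGUF file (magic, version, tensor count, metadata count). The struct and function names are ours, and a real parser must go on to decode the variable-length metadata and tensor descriptors that follow the header:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// ggufHeader holds the fixed-size fields at the start of a GGUF file:
// a 4-byte magic ("GGUF"), a format version, and the tensor/metadata counts.
type ggufHeader struct {
	Magic       [4]byte
	Version     uint32
	TensorCount uint64
	MetaKVCount uint64
}

// parseHeader decodes the 24-byte fixed header and checks the magic.
func parseHeader(data []byte) (ggufHeader, error) {
	var h ggufHeader
	if err := binary.Read(bytes.NewReader(data), binary.LittleEndian, &h); err != nil {
		return h, err
	}
	if string(h.Magic[:]) != "GGUF" {
		return h, fmt.Errorf("not a GGUF file")
	}
	return h, nil
}

func main() {
	// A hand-built header: magic, version 3, 1 tensor, 0 metadata pairs.
	buf := new(bytes.Buffer)
	buf.WriteString("GGUF")
	binary.Write(buf, binary.LittleEndian, uint32(3))
	binary.Write(buf, binary.LittleEndian, uint64(1))
	binary.Write(buf, binary.LittleEndian, uint64(0))

	h, err := parseHeader(buf.Bytes())
	fmt.Println(h.Version, h.TensorCount, err) // 3 1 <nil>
}
```

Everything the vulnerability depends on — tensor offsets and sizes — lives in those descriptors after the header, and is entirely attacker-controlled.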
The Vulnerability
The Ollama GGUF loader in fs/ggml/gguf.go reads tensor metadata (offset + size) from the GGUF file header and uses those values to locate tensor data within a memory-mapped file buffer. The loader does not validate that tensor_offset + tensor_size <= file_size before performing the read operation.
When a malicious GGUF file is crafted with a tensor offset or size that exceeds the file's actual length:
Malicious GGUF tensor descriptor:

```text
tensor_name:   "malicious_tensor"
tensor_offset: 0x1000        # Valid offset within file
tensor_size:   0x99999999    # Far exceeds actual file size
```
During quantization in server/quantization:

```text
src_ptr = mmap_base + tensor_offset   # Points within file mapping
memcpy(dst, src_ptr, tensor_size)     # Reads PAST end of file mapping
                                      # → Heap out-of-bounds read
```

The read operation goes beyond the mapped memory region, potentially reading adjacent heap allocations. This can:
- Leak heap memory contents — including other model data, API keys stored in memory, or internal process state
- Crash the Ollama process — if the read reaches unmapped memory (SIGSEGV)
- Enable information disclosure if the API returns processed quantization output that includes leaked heap bytes
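The missing check is, at its core, a bounds comparison. A minimal Go sketch (illustrative only — the function and parameter names are ours, not Ollama's) showing the overflow-safe form a correct fix needs:

```go
package main

import "fmt"

// tensorInBounds reports whether a tensor's declared extent lies entirely
// within the mapped file. It is written as size <= fileSize-offset rather
// than offset+size <= fileSize, so a huge size cannot wrap around uint64
// and defeat the check.
func tensorInBounds(offset, size, fileSize uint64) bool {
	if offset > fileSize {
		return false
	}
	return size <= fileSize-offset
}

func main() {
	// The advisory's malicious descriptor against an 8 KiB file:
	fmt.Println(tensorInBounds(0x1000, 0x99999999, 0x2000)) // false — rejected
	fmt.Println(tensorInBounds(0x1000, 0x800, 0x2000))      // true — fits
}
```

Note that the naive comparison `tensor_offset + tensor_size <= file_size` is itself unsafe on attacker-controlled 64-bit values, since the addition can overflow and pass the check.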
Deployment Risk Amplifier
Ollama ships with no built-in authentication, and many deployments bind the API — including the /api/create endpoint — to 0.0.0.0:11434. This is harmless for local developer use but becomes a critical exposure when Ollama runs on cloud VMs, in Docker containers with exposed ports, or on internal servers reachable by untrusted parties. A single crafted HTTP POST carrying a malicious GGUF file is sufficient to trigger the vulnerability with no prior authentication.
```shell
# Example trigger (proof-of-concept, not weaponized)
curl -X POST http://[ollama-server]:11434/api/create \
  -H "Content-Type: application/json" \
  -d '{"name": "exploit", "modelfile": "FROM /path/to/malicious.gguf"}'
```

Impact Assessment
| Impact Area | Description |
|---|---|
| Information Disclosure | Heap memory contents leaked — may include API keys, model metadata, or runtime state |
| Denial of Service | Ollama process crash from SIGSEGV on unmapped read — disrupts AI inference services |
| Memory Corruption Path | Depending on heap layout, heap OOB reads can escalate to write conditions in some exploit chains |
| Model Poisoning Vector | An attacker who can load arbitrary GGUF files can potentially influence model behavior |
| Backend API Exposure | If Ollama is a backend for an AI application, disrupting it can cascade to dependent services |
Recommendations
Immediate Actions
- Update Ollama to version 0.17.1 or later — this is the only complete fix:

  ```shell
  # Linux/macOS update
  curl -fsSL https://ollama.com/install.sh | sh

  # Or via package manager if installed that way
  brew upgrade ollama   # macOS with Homebrew
  ```

- Verify the running version:

  ```shell
  ollama --version   # Should report 0.17.1 or higher
  ```
Defense-in-Depth (Even After Patching)
- Restrict /api/create access — if model creation from external inputs is not required, block this endpoint at the network layer or via a reverse proxy:

  ```nginx
  location /api/create {
      allow 127.0.0.1;
      deny all;
  }
  ```

- Limit cross-origin access — set OLLAMA_ORIGINS to an explicit allowlist, and require API key authentication for all non-localhost access (via a reverse proxy, since Ollama itself provides no authentication)
- Isolate Ollama from the internet — ensure the Ollama port (default 11434) is not exposed on WAN interfaces:

  ```shell
  # Bind Ollama to localhost only
  OLLAMA_HOST=127.0.0.1 ollama serve
  ```

- Validate GGUF files before loading — implement a model integrity verification step before passing files to Ollama's /api/create
For AI Application Developers
If you are building applications on top of Ollama:
- Never expose the Ollama API directly to end users — proxy requests through your application layer
- Validate and sanitize any user-supplied model paths or GGUF file inputs before forwarding to Ollama
- Implement rate limiting on model creation endpoints
Detection Indicators
| Indicator | Description |
|---|---|
| Ollama process crash (SIGSEGV) | Failed exploitation attempt or denial of service |
| Unexpected /api/create calls from external IPs | Exploitation attempts |
| Abnormally large GGUF files in model creation requests | Potential malicious payload delivery |
| Unusual memory usage spikes on Ollama process | Possible exploitation in progress |
| AI inference service unavailability | Successful DoS via crash |
Broader Context: AI Infrastructure Security
CVE-2026-7482 is part of a growing pattern of vulnerabilities in AI model serving infrastructure. As organizations deploy local and cloud-hosted LLM servers, the attack surface expands significantly. Related recent vulnerabilities include:
- CVE-2026-5760 (SGLang, CVSS 9.8) — RCE via malicious GGUF model files
- CVE-2026-33626 (LMDeploy) — Exploited within 13 hours of disclosure
- Gemini CLI RCE — Code execution via compromised model interactions
The pattern is clear: AI model files are an emerging attack vector. Treat GGUF files from untrusted sources with the same caution as executable code.
Post-Remediation Checklist
- Confirm Ollama 0.17.1+ is running on all deployment nodes
- Audit exposed API endpoints — verify /api/create is not internet-accessible
- Review model creation logs for suspicious activity prior to patching
- Rotate any credentials or API keys that were in memory during the exposure window
- Implement GGUF file validation in your model deployment pipeline going forward
- Monitor Ollama process health and set up alerting on unexpected crashes