Executive Summary
A critical heap out-of-bounds read vulnerability (CVE-2026-7482) has been disclosed in Ollama, the widely used open-source platform for running large language models (LLMs) locally. The vulnerability carries a CVSS score of 9.1 (Critical) and affects Ollama versions before 0.17.1.
CVSS Score: 9.1 (Critical)
Attack Vector: Network
Authentication Required: Dependent on deployment configuration
The flaw resides in Ollama's GGUF model file loader (fs/ggml/gguf.go) and is triggered via the /api/create endpoint. An attacker can supply a specially crafted GGUF file in which the declared tensor offset and size exceed the file's actual length. During quantization processing, the loader reads beyond the allocated heap buffer, enabling potential information disclosure (leaking heap memory contents), process crash (denial of service), or under specific conditions, a path toward code execution depending on heap layout.
Vulnerability Overview
| Attribute | Value |
|---|---|
| CVE ID | CVE-2026-7482 |
| CVSS Score | 9.1 (Critical) |
| CWE | CWE-125 — Out-of-Bounds Read |
| Type | Heap Out-of-Bounds Read |
| Attack Vector | Network |
| Attack Complexity | Low |
| Privileges Required | None (on default/unauthenticated deployments) |
| User Interaction | None |
| Affected Component | GGUF model loader — fs/ggml/gguf.go, server/quantization |
| Trigger Endpoint | /api/create |
| Fixed Version | Ollama 0.17.1 |
| Published | 2026-05-04 |
Affected Products
| Product | Affected Versions | Fixed Version |
|---|---|---|
| Ollama | All versions before 0.17.1 | 0.17.1 |
Ollama is among the most widely adopted platforms for running GGUF-format models such as Llama 3, Mistral, Phi, and Gemma locally. It is deployed across developer workstations, internal inference servers, AI application backends, and cloud-hosted instances — including many with the API exposed on public or semi-public network interfaces.
Technical Details
What Is GGUF?
GGUF (GPT-Generated Unified Format) is the binary file format used by llama.cpp and the broader LLM ecosystem to distribute quantized model weights. A GGUF file contains a header with metadata followed by tensor data blocks. Each tensor is described by a name, type, dimensions, an offset into the file, and a data size.
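To make the layout concrete, here is a minimal Go sketch of reading the fixed-size fields at the start of a GGUF file (magic, version, tensor count, metadata count). The struct and function names are ours, and a real parser must go on to decode the variable-length metadata and tensor descriptors that follow the header:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// ggufHeader holds the fixed-size fields at the start of a GGUF file:
// a 4-byte magic ("GGUF"), a format version, and the tensor/metadata counts.
type ggufHeader struct {
	Magic       [4]byte
	Version     uint32
	TensorCount uint64
	MetaKVCount uint64
}

// parseHeader decodes the 24-byte fixed header and checks the magic.
func parseHeader(data []byte) (ggufHeader, error) {
	var h ggufHeader
	if err := binary.Read(bytes.NewReader(data), binary.LittleEndian, &h); err != nil {
		return h, err
	}
	if string(h.Magic[:]) != "GGUF" {
		return h, fmt.Errorf("not a GGUF file")
	}
	return h, nil
}

func main() {
	// A hand-built header: magic, version 3, 1 tensor, 0 metadata pairs.
	buf := new(bytes.Buffer)
	buf.WriteString("GGUF")
	binary.Write(buf, binary.LittleEndian, uint32(3))
	binary.Write(buf, binary.LittleEndian, uint64(1))
	binary.Write(buf, binary.LittleEndian, uint64(0))

	h, err := parseHeader(buf.Bytes())
	fmt.Println(h.Version, h.TensorCount, err) // 3 1 <nil>
}
```

Everything the vulnerability depends on — tensor offsets and sizes — lives in those descriptors after the header, and is entirely attacker-controlled.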
The Vulnerability
The Ollama GGUF loader in fs/ggml/gguf.go reads tensor metadata (offset + size) from the GGUF file header and uses those values to locate tensor data within a memory-mapped file buffer. The loader does not validate that tensor_offset + tensor_size <= file_size before performing the read operation.
When a malicious GGUF file is crafted with a tensor offset or size that exceeds the file's actual length:
Malicious GGUF tensor descriptor:

```text
tensor_name:   "malicious_tensor"
tensor_offset: 0x1000        # Valid offset within file
tensor_size:   0x99999999    # Far exceeds actual file size
```
During quantization in server/quantization:

```text
src_ptr = mmap_base + tensor_offset   # Points within file mapping
memcpy(dst, src_ptr, tensor_size)     # Reads PAST end of file mapping
                                      # → Heap out-of-bounds read
```

The read operation goes beyond the mapped memory region, potentially reading adjacent heap allocations. This can:
- Leak heap memory contents — including other model data, API keys stored in memory, or internal process state
- Crash the Ollama process — if the read reaches unmapped memory (SIGSEGV)
- Enable information disclosure if the API returns processed quantization output that includes leaked heap bytes
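The missing check is, at its core, a bounds comparison. A minimal Go sketch (illustrative only — the function and parameter names are ours, not Ollama's) showing the overflow-safe form a correct fix needs:

```go
package main

import "fmt"

// tensorInBounds reports whether a tensor's declared extent lies entirely
// within the mapped file. It is written as size <= fileSize-offset rather
// than offset+size <= fileSize, so a huge size cannot wrap around uint64
// and defeat the check.
func tensorInBounds(offset, size, fileSize uint64) bool {
	if offset > fileSize {
		return false
	}
	return size <= fileSize-offset
}

func main() {
	// The advisory's malicious descriptor against an 8 KiB file:
	fmt.Println(tensorInBounds(0x1000, 0x99999999, 0x2000)) // false — rejected
	fmt.Println(tensorInBounds(0x1000, 0x800, 0x2000))      // true — fits
}
```

Note that the naive comparison `tensor_offset + tensor_size <= file_size` is itself unsafe on attacker-controlled 64-bit values, since the addition can overflow and pass the check.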
Deployment Risk Amplifier
Ollama ships with no built-in authentication, and many deployments bind the API — including the /api/create endpoint — to 0.0.0.0:11434. This is harmless for local developer use but becomes a critical exposure when Ollama runs on cloud VMs, in Docker containers with exposed ports, or on internal servers reachable by untrusted parties. A single crafted HTTP POST carrying a malicious GGUF file is sufficient to trigger the vulnerability with no prior authentication.
```shell
# Example trigger (proof-of-concept, not weaponized)
curl -X POST http://[ollama-server]:11434/api/create \
  -H "Content-Type: application/json" \
  -d '{"name": "exploit", "modelfile": "FROM /path/to/malicious.gguf"}'
```

Impact Assessment
| Impact Area | Description |
|---|---|
| Information Disclosure | Heap memory contents leaked — may include API keys, model metadata, or runtime state |
| Denial of Service | Ollama process crash from SIGSEGV on unmapped read — disrupts AI inference services |
| Memory Corruption Path | Depending on heap layout, heap OOB reads can escalate to write conditions in some exploit chains |
| Model Poisoning Vector | An attacker who can load arbitrary GGUF files can potentially influence model behavior |
| Backend API Exposure | If Ollama is a backend for an AI application, disrupting it can cascade to dependent services |
Recommendations
Immediate Actions
- Update Ollama to version 0.17.1 or later — this is the only complete fix:

  ```shell
  # Linux/macOS update
  curl -fsSL https://ollama.com/install.sh | sh

  # Or via package manager if installed that way
  brew upgrade ollama   # macOS with Homebrew
  ```

- Verify the running version:

  ```shell
  ollama --version   # Should report 0.17.1 or higher
  ```
Defense-in-Depth (Even After Patching)
- Restrict /api/create access — if model creation from external inputs is not required, block this endpoint at the network layer or via a reverse proxy:

  ```nginx
  location /api/create {
      allow 127.0.0.1;
      deny all;
  }
  ```

- Limit cross-origin access — set OLLAMA_ORIGINS to an explicit allowlist, and require API key authentication for all non-localhost access (via a reverse proxy, since Ollama itself provides no authentication)
- Isolate Ollama from the internet — ensure the Ollama port (default 11434) is not exposed on WAN interfaces:

  ```shell
  # Bind Ollama to localhost only
  OLLAMA_HOST=127.0.0.1 ollama serve
  ```

- Validate GGUF files before loading — implement a model integrity verification step before passing files to Ollama's /api/create
For AI Application Developers
If you are building applications on top of Ollama:
- Never expose the Ollama API directly to end users — proxy requests through your application layer
- Validate and sanitize any user-supplied model paths or GGUF file inputs before forwarding to Ollama
- Implement rate limiting on model creation endpoints
Detection Indicators
| Indicator | Description |
|---|---|
| Ollama process crash (SIGSEGV) | Failed exploitation attempt or denial of service |
| Unexpected /api/create calls from external IPs | Exploitation attempts |
| Abnormally large GGUF files in model creation requests | Potential malicious payload delivery |
| Unusual memory usage spikes on Ollama process | Possible exploitation in progress |
| AI inference service unavailability | Successful DoS via crash |
Broader Context: AI Infrastructure Security
CVE-2026-7482 is part of a growing pattern of vulnerabilities in AI model serving infrastructure. As organizations deploy local and cloud-hosted LLM servers, the attack surface expands significantly. Related recent vulnerabilities include:
- CVE-2026-5760 (SGLang, CVSS 9.8) — RCE via malicious GGUF model files
- CVE-2026-33626 (LMDeploy) — Exploited within 13 hours of disclosure
- Gemini CLI RCE — Code execution via compromised model interactions
The pattern is clear: AI model files are an emerging attack vector. Treat GGUF files from untrusted sources with the same caution as executable code.
Post-Remediation Checklist
- Confirm Ollama 0.17.1+ is running on all deployment nodes
- Audit exposed API endpoints — verify /api/create is not internet-accessible
- Review model creation logs for suspicious activity prior to patching
- Rotate any credentials or API keys that were in memory during the exposure window
- Implement GGUF file validation in your model deployment pipeline going forward
- Monitor Ollama process health and set up alerting on unexpected crashes