CVE-2025-15036: MLflow Path Traversal in Archive Extraction

Executive Summary

CVE-2025-15036 is a critical path traversal vulnerability in the MLflow open-source machine learning platform, specifically in the extract_archive_to_dir function located in mlflow/pyfunc/dbconnect_artifact_cache.py. The flaw allows an attacker to supply a crafted tar archive containing member paths with directory traversal sequences (../), enabling arbitrary file writes outside the intended extraction directory on the host system.

CVSS Score: 9.6 (Critical)

All versions of MLflow before v3.7.0 are affected. Organizations using MLflow to manage model artifacts, experiments, or deployment pipelines should treat this as a priority upgrade.

Vulnerability Overview

Attribute	Value
CVE ID	CVE-2025-15036
CVSS Score	9.6 (Critical)
Type	Path Traversal / Arbitrary File Write
Attack Vector	Network
Privileges Required	Low
User Interaction	None
Scope	Changed
Confidentiality Impact	High
Integrity Impact	High
Availability Impact	High
Affected Component	`mlflow/pyfunc/dbconnect_artifact_cache.py`
Vulnerable Function	`extract_archive_to_dir()`
Fixed Version	MLflow v3.7.0

Affected Products

Product	Affected Versions	Remediation
MLflow (mlflow/mlflow)	All versions < v3.7.0	Upgrade to v3.7.0 or later

Technical Analysis

Root Cause

The vulnerability exists in the extract_archive_to_dir function within mlflow/pyfunc/dbconnect_artifact_cache.py. When MLflow extracts a tar archive as part of its artifact caching or model loading workflow, it iterates over the archive members and writes each file to the target directory. The function does not validate or sanitize the member paths before extraction.

A tar archive member with a path such as ../../../etc/cron.d/malicious is joined to the extraction root and then resolved by the filesystem. Each ../ segment climbs one directory, so the file lands outside the intended extraction directory. Given enough ../ segments, the write reaches a path of the attacker's choosing, limited only by the filesystem permissions of the MLflow process.

This is a classic "zip slip" or "tar slip" vulnerability pattern, where archive extraction without path validation enables directory traversal.
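
To make the traversal concrete, the sketch below resolves a member name against an extraction root the way a naive extractor effectively does. The directory /opt/mlflow/cache and the helper names resolve_member and escapes are illustrative assumptions, not paths or functions taken from MLflow.

```python
import posixpath


def resolve_member(target_dir: str, member_name: str) -> str:
    """Return the normalized path a naive extractor would write to."""
    # join() keeps the "../" segments; normpath() then collapses them,
    # which mirrors what the filesystem does at write time.
    return posixpath.normpath(posixpath.join(target_dir, member_name))


def escapes(target_dir: str, member_name: str) -> bool:
    """True if the member would land outside target_dir."""
    root = posixpath.normpath(target_dir)
    resolved = resolve_member(root, member_name)
    return posixpath.commonpath([root, resolved]) != root


# Hypothetical extraction root, used only for illustration.
root = "/opt/mlflow/cache"
print(resolve_member(root, "model/weights.bin"))
print(resolve_member(root, "../../../etc/cron.d/malicious"))
print(escapes(root, "../../../etc/cron.d/malicious"))
```

Absolute member names behave the same way: joining an absolute path discards the extraction root entirely, so escapes() reports them as well.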

Vulnerable Code Flow

# Simplified conceptual representation of the vulnerable pattern
# (not the verbatim MLflow source):
import tarfile

def extract_archive_to_dir(archive_path, target_dir):
    with tarfile.open(archive_path) as tar:
        for member in tar.getmembers():
            # No validation of member.name, so "../" segments are honored
            # whenever the interpreter applies no extraction filter by default
            tar.extract(member, path=target_dir)  # VULNERABLE

The safe pattern requires validating that the resolved extraction path remains within target_dir:

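import os
import tarfile

def safe_extract(archive_path, target_dir):
    target_dir = os.path.realpath(target_dir)
    with tarfile.open(archive_path) as tar:
        members = tar.getmembers()
        for member in members:
            # Resolve where this member would land on disk
            member_path = os.path.realpath(os.path.join(target_dir, member.name))
            # commonpath accepts the root entry "." and rejects anything outside target_dir
            if os.path.commonpath([target_dir, member_path]) != target_dir:
                raise ValueError(f"Path traversal detected: {member.name}")
            # Links can redirect later writes outside target_dir, so refuse them
            if member.issym() or member.islnk():
                raise ValueError(f"Link member rejected: {member.name}")
        # Extract only after every member has passed validation
        tar.extractall(target_dir, members=members)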
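
Recent Python releases also ship built-in extraction filters (PEP 706) that refuse members resolving outside the destination. The sketch below shows one hedged way to rely on them; the function name extract_with_data_filter is hypothetical, and nothing here is a claim about how the MLflow v3.7.0 fix is implemented.

```python
import tarfile


def extract_with_data_filter(archive_path: str, target_dir: str) -> None:
    """Extract with the stdlib "data" filter, refusing to run without it."""
    if not hasattr(tarfile, "data_filter"):
        # Older interpreters lack extraction filters; fail closed rather
        # than silently falling back to unfiltered extraction.
        raise RuntimeError("tarfile extraction filters are unavailable")
    with tarfile.open(archive_path) as tar:
        # With the default errorlevel, the "data" filter raises a
        # tarfile.FilterError subclass for members that would land outside
        # target_dir and for links that point outside it.
        tar.extractall(target_dir, filter="data")
```

Failing closed on interpreters without filter support is a deliberate choice in this sketch: a silent fallback would reintroduce the unvalidated extraction the check is meant to prevent.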

Attack Scenarios

Scenario 1: Malicious Model Artifact via Remote Registry

1. Attacker publishes a model artifact to an MLflow registry or artifact store
2. The artifact archive contains crafted paths with ../../ sequences
3. MLflow server or client extracts the artifact using extract_archive_to_dir
4. Attacker's files are written to attacker-controlled paths outside the extraction dir
5. Depending on MLflow process privileges: config overwrite, cron job injection, etc.

Scenario 2: Collaborative Research Environment

1. Multi-user MLflow deployment (shared tracking server, Databricks, etc.)
2. Malicious researcher or compromised account uploads a crafted model
3. Server-side extraction via extract_archive_to_dir triggers path traversal
4. Files written to sensitive server paths — potentially achieving RCE

Scenario 3: CI/CD Pipeline Attack

1. MLflow is integrated into a CI/CD pipeline for model training/deployment
2. A compromised upstream dependency or dataset produces a crafted archive
3. MLflow extraction during pipeline execution writes attacker files to the build agent
4. Build agent compromise leads to secrets theft or pipeline backdoor

Why CVSS 9.6?

The high CVSS score reflects:

Network-reachable attack path in server deployments
No user interaction required once the archive is delivered
Scope change — impact extends beyond the MLflow process to the host filesystem
High CIA triad impact — arbitrary write enables complete host compromise in many configurations
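
For readers who want to check the arithmetic, the sketch below implements the CVSS v3.1 base score formula with the metric weights from the FIRST specification. Attack Complexity is not listed in the overview table, so Low is assumed here. With the table's metrics as inputs, this sketch evaluates to 9.9 rather than 9.6; a vector with Privileges Required: None and User Interaction: Required is one combination that evaluates to 9.6. The table and the published score may therefore not describe the same vector, and the official CVE record should be treated as authoritative.

```python
import math

# CVSS v3.1 metric weights (FIRST specification, section 7.4).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in CVSS v3.1 Appendix A."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(av, ac, pr, ui, scope_changed, c, i, a) -> float:
    """Compute the CVSS v3.1 base score from single-letter metric values."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
        pr_weight = PR_CHANGED[pr]
    else:
        impact = 6.42 * iss
        pr_weight = PR_UNCHANGED[pr]
    if impact <= 0:
        return 0.0
    exploitability = 8.22 * AV[av] * AC[ac] * pr_weight * UI[ui]
    total = impact + exploitability
    if scope_changed:
        total *= 1.08
    return roundup(min(total, 10))


# Metrics as listed in the overview table, with Attack Complexity assumed Low.
print(base_score("N", "L", "L", "N", True, "H", "H", "H"))
# One alternative vector that evaluates to the published score.
print(base_score("N", "L", "N", "R", True, "H", "H", "H"))
```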

Impact Assessment

Impact Area	Description
Arbitrary File Write	Files written to any path the MLflow process can access on disk
Remote Code Execution	Writing to cron jobs, startup scripts, or web roots may enable RCE
Data Integrity	Overwriting existing configuration or application files
Privilege Escalation	If MLflow runs as root or another high-privilege account, the impact extends to the whole operating system
Persistent Backdoors	Attackers can write persistent malware or SSH keys
Multi-tenant Risk	On a shared MLflow server, any single tenant can exploit the flaw against all other tenants

Remediation

Upgrade MLflow to v3.7.0 or Later

# Upgrade MLflow to the fixed release or later
pip install --upgrade "mlflow>=3.7.0"

# Verify installed version
python -c "import mlflow; print(mlflow.__version__)"
# Expected: 3.7.0 or higher

# In a conda environment
conda update -c conda-forge mlflow

Verify Your Installation

# Check if vulnerable version is installed
pip show mlflow | grep Version
# Versions below 3.7.0 are vulnerable

# If using requirements files, find every MLflow pin
grep -n "mlflow" requirements*.txt
# Ensure each entry requires mlflow>=3.7.0
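
The same check can be scripted for fleet-wide audits. The following is a minimal sketch using only the standard library; the helper names and the decision to treat pre-releases of 3.7.0 as vulnerable are assumptions made for illustration, not behavior defined by MLflow.

```python
import re
from importlib import metadata

FIXED = (3, 7, 0)  # first fixed release according to this advisory


def is_vulnerable(version: str) -> bool:
    """Return True if a version string predates the fixed release."""
    match = re.match(r"(\d+(?:\.\d+)*)(.*)", version.strip())
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    parts = [int(p) for p in match.group(1).split(".")]
    parts += [0] * (3 - len(parts))
    release, suffix = tuple(parts[:3]), match.group(2)
    if release != FIXED:
        # Tuple comparison is numeric, so 3.10.0 sorts after 3.7.0.
        return release < FIXED
    # Assumption: rc/dev builds of 3.7.0 may predate the fix, so they are
    # conservatively reported as vulnerable; post and local builds are not.
    return bool(suffix) and not suffix.startswith((".post", "+"))


def installed_mlflow_is_vulnerable():
    """Return True/False for the installed MLflow, or None if not installed."""
    try:
        return is_vulnerable(metadata.version("mlflow"))
    except metadata.PackageNotFoundError:
        return None


print(installed_mlflow_is_vulnerable())
```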

Mitigations If Immediate Upgrade Is Not Possible

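# Audit every place where your MLflow workflows extract tar archives.
# Only load archives from trusted sources whose integrity you can verify.
# Note: a hash check confirms the archive is the one its publisher intended;
# it does not detect a malicious archive from a publisher you should not trust.

# Verify artifact integrity before extraction
import hashlib

def verify_artifact_hash(artifact_path, expected_hash):
    """Return True if the file's SHA-256 hex digest matches expected_hash."""
    sha256 = hashlib.sha256()
    with open(artifact_path, "rb") as f:
        # Read in chunks so large model archives are not loaded into memory
        for chunk in iter(lambda: f.read(8192), b""):
            sha256.update(chunk)
    return sha256.hexdigest() == expected_hash.lower()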
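
Because a hash check alone cannot tell whether an archive is malicious, a complementary step is to inspect member names before an archive is handed to an unpatched MLflow. The sketch below lists members without extracting anything; the function name suspicious_members and the reason strings are hypothetical, and it assumes POSIX-style member paths.

```python
import posixpath
import tarfile


def suspicious_members(archive_path: str) -> list:
    """Return (name, reason) pairs for members that should not be extracted."""
    findings = []
    with tarfile.open(archive_path) as tar:
        for member in tar:
            name = member.name
            # Collapse "a/../.." style paths so hidden traversal is visible.
            normalized = posixpath.normpath(name)
            if (
                name.startswith("/")
                or normalized == ".."
                or normalized.startswith("../")
            ):
                findings.append((name, "escapes extraction root"))
            elif member.issym() or member.islnk():
                findings.append((name, f"link to {member.linkname}"))
            elif not (member.isfile() or member.isdir()):
                findings.append((name, "special file"))
    return findings
```

An empty result means no member matched these specific patterns; it is a screening aid for a quarantine step, not a substitute for upgrading.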

Environment Hardening

# Run MLflow with a dedicated low-privilege service account
# Limit filesystem access to only necessary directories
 
# Example: restrict MLflow artifact directory
chmod 750 /opt/mlflow/artifacts
chown mlflow:mlflow /opt/mlflow/artifacts
 
# If using Docker, mount artifact volumes with limited scope
docker run --user mlflow \
  -v /data/mlflow-artifacts:/artifacts:rw \
  mlflow-server

Detection Indicators

Indicator	Description
Files created outside MLflow artifact directories	Unexpected path traversal write
Unexpected cron jobs or startup scripts appearing post-extraction	Possible exploitation
MLflow process creating files in `/etc`, `/tmp`, home directories	Path traversal indicator
Model artifacts uploaded with unusual member path patterns	Pre-exploitation attempt
Audit logs showing archive extraction failures with path validation errors	Attack blocked (post-patch)
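
The filesystem indicators above can be checked with a small sweep of sensitive directories. This is a minimal sketch under stated assumptions: the function name recently_changed and the example directories are illustrative, and the result is a list of candidates for manual review rather than proof of exploitation.

```python
import os
import time


def recently_changed(roots, since_epoch):
    """Yield files under roots whose metadata changed at or after since_epoch."""
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for filename in filenames:
                path = os.path.join(dirpath, filename)
                try:
                    st = os.lstat(path)
                except OSError:
                    continue  # file vanished or is unreadable; skip it
                # tar extraction can restore an arbitrary mtime from the
                # archive, so st_ctime (inode change time on Unix) is
                # checked as well because it reflects the actual write.
                if max(st.st_mtime, st.st_ctime) >= since_epoch:
                    yield path


# Example: files changed in the last 24 hours (directories are illustrative).
one_day_ago = time.time() - 24 * 60 * 60
for path in recently_changed(["/etc/cron.d", "/etc/systemd/system"], one_day_ago):
    print(path)
```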

Post-Remediation Checklist

Upgrade all MLflow installations to v3.7.0 or later
Audit artifact stores — review recently uploaded model artifacts for suspicious archive contents
Check filesystem — look for unexpected files in sensitive directories that may have been written during exploitation
Rotate credentials — if exploitation is suspected, rotate any secrets accessible by the MLflow process
Review access controls — ensure MLflow service accounts follow least-privilege principles
Enable artifact signing/verification — implement integrity checks on model artifacts in your registry
Monitor extraction paths — add runtime monitoring for unexpected file writes by MLflow processes

References