The breakneck pace of AI adoption is creating a new and sprawling attack surface — and a landmark security audit of over one million exposed AI service endpoints reveals the security posture is, by many measures, worse than the early days of cloud computing.
Researchers scanning the internet for publicly accessible self-hosted LLM infrastructure found a pattern of absent authentication, exposed API keys, open model management interfaces, and unprotected inference endpoints across a staggering number of deployments. The findings, published this week in The Hacker News, argue that the software industry's hard-won progress on secure defaults is being eroded by the rush to ship AI capabilities.
What the Scan Found
The research team identified over one million publicly reachable AI service endpoints, including inference APIs, model management dashboards, vector databases, and AI orchestration platforms. Key findings from the audit:
| Issue | Scope |
|---|---|
| Unauthenticated inference APIs | Tens of thousands of open Ollama, LM Studio, and similar endpoints accepting requests from any IP |
| Exposed model management UIs | Admin dashboards (Hugging Face local, OpenWebUI, Flowise) with no login required |
| Leaked API keys in responses | Misconfigured reverse proxies forwarding upstream provider keys in response headers |
| Open vector database instances | Qdrant, Weaviate, and Chroma instances reachable without credentials, containing embeddings of sensitive documents |
| Unprotected LangChain/LlamaIndex APIs | Orchestration endpoints accepting arbitrary tool calls with no access control |
The researchers noted that many of these services were deployed by individual developers or small teams under deadline pressure — the common thread being that security was deprioritized in favor of getting the AI feature working.
Why This Matters Beyond Chatbots
The risk isn't just that someone can make free API calls to a company's self-hosted LLM. The deeper concerns are:
Data Exfiltration via Open Inference Endpoints
Many self-hosted AI deployments are purpose-built for internal document analysis — legal contracts, medical records, financial reports. An open inference endpoint means any attacker who can reach it can query the model with prompts designed to extract training data or retrieve documents the model was fine-tuned on.
Prompt Injection at Scale
Open orchestration APIs (LangChain agents, AutoGPT-style systems, n8n AI nodes) are often wired to internal tools — databases, filesystems, email, Slack. An attacker with access to the inference API can craft prompts that cause the AI agent to exfiltrate data, send messages, or invoke destructive tool calls.
Credential Theft via Header Leakage
Multiple instances were found where misconfigured Nginx or Caddy reverse proxies forwarded the Authorization: Bearer sk-... header from upstream AI providers — effectively handing attacker the organization's OpenAI, Anthropic, or Mistral API keys in the HTTP response.
Supply Chain Risk via Poisoned Models
Open model management interfaces allow arbitrary model uploads. An attacker who can reach an admin dashboard can replace a production model with a malicious one that exfiltrates prompts, injects content into responses, or behaves differently for specific users.
The Root Cause: Secure Defaults Are Broken
The researchers point to a systemic failure in how AI tooling is packaged and documented:
- Ollama binds to
0.0.0.0by default on some platforms, making it publicly accessible if the host has a public IP and no firewall rule - OpenWebUI does not enforce authentication on the API path even when the web UI login is enabled
- Many vector databases ship with authentication disabled by default, relying on network-level isolation that developers often do not configure
- AI development tutorials routinely skip security setup steps to reduce friction, normalizing insecure configurations
This mirrors the early days of MongoDB and Elasticsearch, where thousands of unprotected databases were exposed to the internet because the default configuration assumed a trusted network — an assumption that rarely held in practice.
How Attackers Are Exploiting This
The research noted evidence of active exploitation, including:
- Cryptomining campaigns using free inference capacity on open Ollama endpoints to run GPU-accelerated workloads
- Data scraping operations extracting business-sensitive context from RAG-enabled AI systems
- Reconnaissance tooling that identifies and catalogs open AI endpoints for later targeting
Attack tooling for enumerating and exploiting open AI services is now circulating on underground forums, lowering the barrier for less sophisticated actors.
Recommendations for Organizations Deploying Self-Hosted AI
Authentication Is Non-Negotiable
Every AI inference endpoint, model management API, and vector database must require authentication — regardless of whether it is "internal only." Network perimeter assumptions are unreliable.
# Nginx basic auth for Ollama
location /api/ {
auth_basic "AI Services";
auth_basic_user_file /etc/nginx/.htpasswd;
proxy_pass http://localhost:11434;
}Bind to Localhost, Not 0.0.0.0
Self-hosted AI services should bind to 127.0.0.1 by default, with a reverse proxy handling external access. Review OLLAMA_HOST, CHROMA_HOST, and similar environment variables in your deployment.
Audit Your Exposure
Use tools like Shodan, Censys, or your cloud provider's internet-exposure scanner to identify any AI service ports reachable from the internet:
# Check what's listening on common AI ports
ss -tlnp | grep -E "11434|8080|6333|8000|3000|7860"Rotate Any Exposed Keys
If your reverse proxy or API gateway forwards upstream AI provider credentials, audit your configuration and rotate any keys that may have been exposed.
Treat AI Infrastructure as Production Systems
Developer laptops with Ollama running are not production systems. AI inference infrastructure that touches business data must be subject to the same security controls as any other production service: authentication, network segmentation, logging, and patch management.