Introduction
Docker containers aren't secure by default. Out of the box, a container runs as root with a broader capability set than most applications need and can reach every other container on the default bridge network, so a container escape or compromised application can quickly become a host compromise. The good news: Docker exposes a rich set of security controls that, when applied systematically, dramatically shrink your attack surface.
This guide walks through the most impactful hardening steps in a logical order — from image hygiene and runtime flags to host-level controls and automated auditing with Docker Bench Security. Each step is independent; you can apply them incrementally to existing environments without a full rebuild.
Prerequisites
Before you begin, ensure the following:
- Docker Engine 20.10 or later is installed and running on a Linux host
- You have sudo or root access on the Docker host
- The docker CLI is available in your PATH
- (Optional) Docker Compose v2 if you manage multi-container stacks
Verify your Docker version:
docker version --format '{{.Server.Version}}'
Step 1 — Run Containers as Non-Root
The single highest-impact change: never run processes inside a container as UID 0. Unless user namespace remapping is enabled, root inside a container is the same UID 0 as root on the host, so a container escape hands the attacker host root.
Option A: Specify a User in the Dockerfile
FROM debian:12-slim
# Create a dedicated app user
RUN groupadd --gid 1001 appgroup && \
    useradd --uid 1001 --gid appgroup --shell /bin/sh --create-home appuser
# Drop to non-root for all subsequent layers
USER appuser:appgroup
WORKDIR /app
COPY --chown=appuser:appgroup . .
ENTRYPOINT ["./entrypoint.sh"]
Option B: Override at Runtime
If you can't modify the image, pass --user at runtime:
docker run --user 1001:1001 nginx:alpine
Verify the Running UID
docker exec <container_name> id
# Expected: uid=1001(appuser) gid=1001(appgroup)
Step 2 — Drop Linux Capabilities
Docker grants containers a default set of Linux capabilities that most applications don't need. Stripping unnecessary capabilities limits what a compromised process can do.
Drop All, Add Only What's Required
# Drop all capabilities first, then add back only what the app needs
docker run \
  --cap-drop=ALL \
  --cap-add=NET_BIND_SERVICE \
  my-web-app:latest
Common capability reference:
| Capability | Purpose | Drop if... |
|---|---|---|
| NET_RAW | Raw socket access | Not running a packet capture tool |
| SYS_PTRACE | Process tracing | Not debugging live processes |
| SYS_ADMIN | Broad sysadmin ops | Almost always — very dangerous |
| NET_BIND_SERVICE | Bind to ports < 1024 | Running on port ≥ 1024 |
| CHOWN | Change file ownership | App doesn't need chown |
| SETUID / SETGID | Change process UIDs | App doesn't need privilege escalation |
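To confirm which capabilities a container actually retains, one option is to decode the hex bitmask the kernel reports in the CapBnd (bounding set) or CapEff (effective set) line of /proc/1/status. The sketch below assumes the standard Linux capability numbers (NET_BIND_SERVICE = 10, NET_RAW = 13, SYS_ADMIN = 21); has_cap is a hypothetical helper name, not a Docker command.

```shell
#!/bin/sh
# has_cap MASK BIT
# Succeeds when capability number BIT is set in the hex MASK, as printed in
# the CapBnd or CapEff line of /proc/<pid>/status.
has_cap() {
    cap_mask=$(( 0x$1 ))
    [ $(( (cap_mask >> $2) & 1 )) -eq 1 ]
}

# Capability numbers from the kernel's linux/capability.h.
CAP_NET_BIND_SERVICE=10
CAP_NET_RAW=13
CAP_SYS_ADMIN=21

# Bounding set left by --cap-drop=ALL --cap-add=NET_BIND_SERVICE (only bit 10 set).
mask=0000000000000400
has_cap "$mask" "$CAP_NET_BIND_SERVICE" && echo "NET_BIND_SERVICE: present"
has_cap "$mask" "$CAP_NET_RAW" || echo "NET_RAW: dropped"
has_cap "$mask" "$CAP_SYS_ADMIN" || echo "SYS_ADMIN: dropped"
```

To get a real mask, something like docker exec <container_name> grep CapBnd /proc/1/status should work, provided the image ships grep. For a non-root process CapEff is normally all zeros, so CapBnd is usually the more informative line.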
In Docker Compose
services:
  webapp:
    image: my-web-app:latest
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE
Step 3 — Enable Read-Only Root Filesystem
Mount the container's root filesystem as read-only. Applications that need to write data should do so only to explicitly defined volumes or tmpfs mounts.
docker run \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=64m \
  --mount type=volume,source=app-data,target=/var/lib/app \
  my-app:latest
The --tmpfs flag carves out in-memory writable space for temp files while keeping everything else immutable. The noexec and nosuid options on tmpfs prevent it from becoming an execution staging area.
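One way to check that these mount options took effect is to read /proc/mounts from inside the container and look at the option string for each mount point. The sketch below is illustrative: mount_opts and has_opt are hypothetical helper names, and the sample line is made up to show the format rather than copied from a real container.

```shell
#!/bin/sh
# mount_opts MOUNTPOINT
# Reads /proc/mounts-formatted text on stdin and prints the option string
# (4th field) of the entry mounted at MOUNTPOINT.
mount_opts() {
    awk -v mp="$1" '$2 == mp { print $4 }'
}

# has_opt OPTIONS NAME
# Succeeds when NAME appears in the comma-separated OPTIONS string.
has_opt() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Illustrative sample line; real input comes from the container.
sample='tmpfs /tmp tmpfs rw,nosuid,nodev,noexec,relatime,size=65536k 0 0'
opts=$(printf '%s\n' "$sample" | mount_opts /tmp)
has_opt "$opts" noexec && has_opt "$opts" nosuid && echo "/tmp is noexec,nosuid"
```

Against a live container, the input could come from docker exec <container_name> cat /proc/mounts piped into mount_opts /tmp, or mount_opts / to confirm the root filesystem carries the ro option. This assumes the image includes cat.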
Docker Compose equivalent
services:
  webapp:
    image: my-web-app:latest
    read_only: true
    tmpfs:
      - /tmp:rw,noexec,nosuid,size=64m
    volumes:
      - app-data:/var/lib/app
# Named volumes must also be declared at the top level
volumes:
  app-data:
Step 4 — Prevent Privilege Escalation
The no-new-privileges flag prevents processes inside the container from gaining additional privileges via setuid or setgid binaries — even if the container is running as root.
docker run --security-opt no-new-privileges:true my-app:latest
This is a cheap, low-risk control that should be on by default for every container that doesn't explicitly need privilege escalation.
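The kernel exposes this setting per process as the NoNewPrivs field in /proc/<pid>/status, so it can be checked from inside the container. A minimal sketch, assuming the image provides cat; nnp_enabled is a hypothetical helper name and the sample input is illustrative.

```shell
#!/bin/sh
# nnp_enabled
# Reads /proc/<pid>/status text on stdin and succeeds when the kernel
# reports NoNewPrivs as 1 for that process.
nnp_enabled() {
    awk '$1 == "NoNewPrivs:" { found = ($2 == 1) } END { exit !found }'
}

# Illustrative input in the same "Field:<TAB>value" layout as /proc/<pid>/status.
printf 'Name:\tapp\nNoNewPrivs:\t1\n' | nnp_enabled && echo "no-new-privileges is active"
```

Against a running container the input would be docker exec <container_name> cat /proc/1/status piped into nnp_enabled.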
Step 5 — Enforce Resource Limits
An unrestrained container can exhaust CPU, memory, and file descriptors on the host — either through a runaway bug or a resource-exhaustion attack. Always set explicit limits.
docker run \
  --memory=512m \
  --memory-swap=512m \
  --cpus=1.0 \
  --pids-limit=100 \
  --ulimit nofile=1024:1024 \
  my-app:latest
| Flag | What it limits |
|---|---|
| --memory | RAM ceiling |
| --memory-swap | RAM + swap (set equal to --memory to disable swap) |
| --cpus | CPU core fraction (e.g. 0.5 = half a core) |
| --pids-limit | Max processes/threads — limits fork bombs |
| --ulimit nofile | Open file descriptor limit |
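When checking these limits later, note that docker inspect reports them in raw units: HostConfig.Memory is in bytes and HostConfig.NanoCpus is in billionths of a CPU. The sketch below converts flag values into those units so the two can be compared; mem_to_bytes and cpus_to_nano are hypothetical helper names, and only the b, k, m and g suffixes are handled.

```shell
#!/bin/sh
# mem_to_bytes SIZE
# Converts a Docker-style size with an optional b, k, m or g suffix to bytes.
mem_to_bytes() {
    num=${1%[bkmgBKMG]}
    case "$1" in
        *[kK]) echo $(( num * 1024 )) ;;
        *[mM]) echo $(( num * 1024 * 1024 )) ;;
        *[gG]) echo $(( num * 1024 * 1024 * 1024 )) ;;
        *)     echo "$num" ;;
    esac
}

# cpus_to_nano CPUS
# Converts a --cpus value such as 0.5 to the NanoCpus integer.
cpus_to_nano() {
    awk -v c="$1" 'BEGIN { printf "%.0f\n", c * 1000000000 }'
}

mem_to_bytes 512m    # prints 536870912
cpus_to_nano 1.0     # prints 1000000000
```

The results can be compared with docker inspect --format '{{.HostConfig.Memory}} {{.HostConfig.NanoCpus}} {{.HostConfig.PidsLimit}}' <container_name>.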
In Docker Compose
services:
  webapp:
    image: my-web-app:latest
    deploy:
      resources:
        limits:
          memory: 512m
          cpus: "1.0"
    pids_limit: 100
    ulimits:
      nofile:
        soft: 1024
        hard: 1024
Step 6 — Isolate Container Networks
By default, all containers on the default bridge network can communicate with each other. Create dedicated networks and restrict cross-container traffic to only what's necessary.
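# Create isolated networks per service tier
docker network create --driver bridge frontend-net
docker network create --driver bridge backend-net
docker network create --driver bridge db-net
# Web app can reach the API, but not the DB directly
docker run -d --network frontend-net --name webapp my-web-app:latest
# The API starts on backend-net and is then attached to the tiers it must talk to.
# (Repeating --network on docker run is only accepted by newer Engine releases.)
docker run -d --network backend-net --name api my-api:latest
docker network connect frontend-net api
docker network connect db-net api
# The database is reachable only from containers attached to db-net
docker run -d --network db-net --name postgres postgres:16-alpine
Disable Inter-Container Communication on the Default Bridge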
Edit /etc/docker/daemon.json:
{
  "icc": false,
  "iptables": true
}
Restart Docker after editing daemon.json (a reload does not apply the icc setting):
sudo systemctl restart docker
Warning: Setting icc: false breaks any containers relying on direct inter-container communication on the default bridge. Test in a staging environment first and migrate to named networks before applying to production.
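Before changing icc, it can help to list which containers are still attached to the default bridge, since those are the ones the change would affect. A minimal sketch using docker network inspect with a Go template; the function names are hypothetical, and the functions are only defined here, not called.

```shell
#!/bin/sh
# default_bridge_containers
# Prints the name of each container attached to Docker's default "bridge"
# network, one per line (blank lines from the template are removed).
default_bridge_containers() {
    docker network inspect bridge \
        --format '{{range .Containers}}{{println .Name}}{{end}}' | sed '/^$/d'
}

# report_default_bridge
# Returns non-zero when any container would be affected by icc=false.
report_default_bridge() {
    names=$(default_bridge_containers)
    if [ -n "$names" ]; then
        echo "Still on the default bridge (migrate before setting icc=false):"
        echo "$names"
        return 1
    fi
    echo "No containers on the default bridge."
}
```

Running report_default_bridge on the host before editing daemon.json gives a quick migration checklist.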
Step 7 — Harden Docker Images
Secure images are the foundation. Small, minimal images have fewer packages — and fewer CVEs.
Use Distroless or Slim Base Images
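# Instead of full ubuntu or debian images, use slim variants or distroless.
# For a statically linked binary:
FROM gcr.io/distroless/static-debian12:nonroot
# Or, for Node.js, a multi-stage build (a separate Dockerfile from the one above).
# The builder is Debian-based so native addons match the Debian-based runtime image.
FROM node:22-bookworm-slim AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
# Copy the application source (including server.js) after installing dependencies
COPY . .
FROM gcr.io/distroless/nodejs22-debian12:nonroot
COPY --from=builder /app /app
WORKDIR /app
CMD ["server.js"]
Pin Exact Image Digests in Production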
Tags are mutable — latest can change under you. Pin by digest:
# Pull with the digest pinned
docker pull nginx@sha256:67682bda769fae1ccf5183192b8daf37b64cae99c6c3302650f6f8bf5f0f95df
In a Dockerfile:
FROM nginx@sha256:67682bda769fae1ccf5183192b8daf37b64cae99c6c3302650f6f8bf5f0f95df
Scan Images for CVEs with Trivy
# Install Trivy
curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh -s -- -b /usr/local/bin
# Scan an image before deploying
trivy image --severity HIGH,CRITICAL my-app:latest
Fix HIGH and CRITICAL findings before pushing to production. Integrate scanning into your CI pipeline to catch vulnerabilities at build time.
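For CI, the scan needs to fail the build rather than just print a report. A minimal sketch of such a gate, using Trivy's --exit-code flag; scan_gate is a hypothetical function name, and adding --ignore-unfixed (skip findings with no fixed version yet) is an assumption about policy, not something this guide requires.

```shell
#!/bin/sh
# scan_gate IMAGE
# Returns non-zero when Trivy reports HIGH or CRITICAL vulnerabilities
# that already have a fixed version available.
scan_gate() {
    trivy image \
        --exit-code 1 \
        --severity HIGH,CRITICAL \
        --ignore-unfixed \
        "$1"
}
```

In a pipeline step this could be used as scan_gate my-app:latest || exit 1, placed after the image build and before the push.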
Step 8 — Manage Secrets Properly
Never bake secrets into images or pass them via environment variables in plain text — they're visible in docker inspect output and process tables.
Use Docker Secrets (Swarm Mode)
# Create a secret from stdin (printf avoids storing a trailing newline)
printf '%s' "super-secret-db-password" | docker secret create db_password -
# Reference in a service
docker service create \
  --secret db_password \
  --name webapp \
  my-app:latest
Secrets are mounted as in-memory files at /run/secrets/<secret_name> inside the container. They are delivered only to services that were granted the secret and never appear in environment variables.
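On the application side, an entrypoint script can read the mounted file directly instead of expecting an environment variable. A minimal sketch: read_secret is a hypothetical helper, and SECRETS_DIR is an assumed override that exists only to make the function easy to try outside a container.

```shell
#!/bin/sh
# read_secret NAME
# Prints the contents of the secret file NAME, or fails with a message on
# stderr when it is not mounted. SECRETS_DIR defaults to /run/secrets.
read_secret() {
    secret_path="${SECRETS_DIR:-/run/secrets}/$1"
    if [ ! -r "$secret_path" ]; then
        echo "secret '$1' is not readable at $secret_path" >&2
        return 1
    fi
    cat "$secret_path"
}
```

An entrypoint might then use db_password=$(read_secret db_password) || exit 1 and hand the value to the application without exporting it.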
For Compose (Non-Swarm) — Use an Env File with Restricted Permissions
# .env.production: chmod 600, never committed to git
chmod 600 .env.production
# docker-compose.yml
services:
  webapp:
    image: my-app:latest
    env_file: .env.production
This keeps secrets out of the image and the repository, but the values still reach the container as environment variables and remain visible in docker inspect. For production workloads, integrate with HashiCorp Vault or a cloud secrets manager (AWS Secrets Manager, Azure Key Vault) and inject secrets at runtime via an init container or sidecar.
Step 9 — Enable AppArmor or seccomp Profiles
Apply a seccomp Profile
seccomp profiles filter which syscalls a container can make. Docker ships with a default seccomp profile that blocks ~44 dangerous syscalls. For tighter control, create a custom profile:
# The built-in default profile is applied automatically when no seccomp option is given.
# To customize it, download the default as a starting point:
curl -fsSL -o /tmp/seccomp-default.json \
  https://raw.githubusercontent.com/moby/moby/master/profiles/seccomp/default.json
# Edit the copy to remove syscalls your app never uses (advanced; requires testing),
# then apply it at runtime:
docker run \
  --security-opt seccomp=/tmp/seccomp-default.json \
  my-app:latest
Apply an AppArmor Profile
# Load a custom AppArmor profile
sudo apparmor_parser -r -W /etc/apparmor.d/docker-my-app
# Apply it at runtime
docker run \
  --security-opt apparmor=docker-my-app \
  my-app:latest
# Verify the profile is active
docker exec <container_id> cat /proc/1/attr/current
Step 10 — Audit with Docker Bench Security
Docker Bench Security is an automated CIS Docker Benchmark checker. Run it against your host to get a comprehensive audit report.
docker run --rm \
  --net host \
  --pid host \
  --userns host \
  --cap-add audit_control \
  -e DOCKER_CONTENT_TRUST=$DOCKER_CONTENT_TRUST \
  -v /etc:/etc:ro \
  -v /lib/systemd/system:/lib/systemd/system:ro \
  -v /usr/bin/containerd:/usr/bin/containerd:ro \
  -v /usr/bin/runc:/usr/bin/runc:ro \
  -v /usr/lib/systemd:/usr/lib/systemd:ro \
  -v /var/lib:/var/lib:ro \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  --label docker_bench_security \
  docker/docker-bench-security
The output is organized by CIS benchmark section. Focus on [WARN] items first; [INFO] items are informational. Target a clean pass on sections 1 (Host Configuration), 2 (Docker Daemon Configuration), and 4 (Container Images).
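To track progress between runs, the report can be saved to a file and the [WARN] lines counted. A minimal sketch, assuming the report was captured with something like tee bench.log; count_warn and bench_gate are hypothetical helper names, and the threshold is whatever your team agrees on.

```shell
#!/bin/sh
# count_warn FILE
# Prints the number of lines containing [WARN] in a saved report.
count_warn() {
    grep -c '\[WARN\]' "$1"
}

# bench_gate FILE MAX
# Succeeds only when the report contains at most MAX warnings.
bench_gate() {
    warns=$(count_warn "$1")
    echo "$warns warning(s) in $1"
    [ "$warns" -le "$2" ]
}
```

For example, bench_gate bench.log 0 fails until every warning is resolved, which makes it usable as a scheduled check.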
Verification
After applying the hardening steps, verify your configuration:
# 1. Confirm container is running as non-root
docker exec <container_name> id
# 2. Check dropped capabilities
docker inspect <container_name> | jq '.[].HostConfig.CapDrop'
# 3. Verify read-only filesystem
docker inspect <container_name> | jq '.[].HostConfig.ReadonlyRootfs'
# 4. Check no-new-privileges is set
docker inspect <container_name> | jq '.[].HostConfig.SecurityOpt'
# 5. Verify memory limits
docker stats <container_name> --no-stream
# 6. Scan the image the container was started from for CVEs
trivy image --severity HIGH,CRITICAL "$(docker inspect --format '{{.Config.Image}}' <container_name>)"
# 7. Run Docker Bench Security (see Step 10)
A hardened container should show:
- uid=<non-zero> in the id output
- ["ALL"] in CapDrop
- true for ReadonlyRootfs
- ["no-new-privileges:true"] in SecurityOpt
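The expectations above can be folded into one pass/fail script. The sketch below reads each field with docker inspect --format; check_hardening and inspect_field are hypothetical names. The user check only looks at the configured user (Config.User), so the id command remains the ground truth for what the process actually runs as.

```shell
#!/bin/sh
# inspect_field CONTAINER TEMPLATE: print one field via docker inspect.
inspect_field() {
    docker inspect --format "$2" "$1"
}

# check_hardening CONTAINER
# Prints PASS/FAIL for four controls and returns non-zero if any fail.
check_hardening() {
    failures=0

    user=$(inspect_field "$1" '{{.Config.User}}')
    case "$user" in
        ''|0|root|0:*|root:*) echo "FAIL user: runs as root"; failures=$((failures + 1)) ;;
        *) echo "PASS user: $user" ;;
    esac

    caps=$(inspect_field "$1" '{{.HostConfig.CapDrop}}')
    case "$caps" in
        *ALL*|*all*) echo "PASS cap_drop: $caps" ;;
        *) echo "FAIL cap_drop: $caps"; failures=$((failures + 1)) ;;
    esac

    readonly_fs=$(inspect_field "$1" '{{.HostConfig.ReadonlyRootfs}}')
    case "$readonly_fs" in
        true) echo "PASS read-only root filesystem" ;;
        *) echo "FAIL read-only root filesystem"; failures=$((failures + 1)) ;;
    esac

    sec_opts=$(inspect_field "$1" '{{.HostConfig.SecurityOpt}}')
    case "$sec_opts" in
        *no-new-privileges:false*) echo "FAIL no-new-privileges"; failures=$((failures + 1)) ;;
        *no-new-privileges*) echo "PASS no-new-privileges" ;;
        *) echo "FAIL no-new-privileges"; failures=$((failures + 1)) ;;
    esac

    [ "$failures" -eq 0 ]
}
```

Calling check_hardening <container_name> prints one line per control, and the exit status can gate a deployment script.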
Troubleshooting
Container crashes with "permission denied" after adding --read-only
The application writes to a path that isn't covered by a tmpfs or volume mount. Identify which paths need to be writable:
# Run the app once without --read-only, exercise it, then list the paths it
# created (A) or changed (C). "ro-probe" is just a throwaway container name.
docker run -d --name ro-probe my-app:latest
docker diff ro-probe
docker rm -f ro-probe
Add a --tmpfs mount for each writable path that doesn't need persistence, or a named volume for paths that do.
App breaks after dropping capabilities
Start with all caps dropped and add them back one at a time. Use docker run --cap-drop=ALL --cap-add=<CAP> ... and check application logs after each addition.
Docker Bench reports "icc" warning even after setting daemon.json
Ensure Docker was fully restarted (not just reloaded) after editing daemon.json:
sudo systemctl restart docker
# Verify the setting on the default bridge (expect "false")
docker network inspect bridge --format '{{index .Options "com.docker.network.bridge.enable_icc"}}'
seccomp profile blocks a legitimate syscall
Check the container logs for EPERM errors and cross-reference the syscall number. Use strace to identify the blocked call, then add it to an allow-list in your custom seccomp JSON.
Secrets still visible in docker inspect
If you're using environment variables, migrate to Docker Secrets or a secrets manager. Environment variables set via --env or env_file are visible in inspect output — that's by design. There is no way to retroactively hide them short of migrating to the secrets API.
Summary
Docker security hardening is a layered discipline — no single control is sufficient on its own, but each layer you add meaningfully increases the cost and complexity of an attack.
The highest-ROI controls in order of impact:
- Non-root user — eliminates host UID 0 exposure from container escapes
- --cap-drop=ALL — removes Linux capability attack surface
- --read-only — prevents in-container persistence and lateral movement staging
- --security-opt no-new-privileges:true — blocks setuid/setgid abuse
- Network isolation — limits blast radius from a compromised container
- Resource limits — prevents DoS from runaway processes
- Image scanning — keeps CVE count low before deployment
- Docker Bench Security — continuously validates host and daemon posture
Apply these controls at image build time where possible (Dockerfile USER, distroless bases) and enforce the rest via a hardened docker run wrapper or a Compose template your team uses as a standard baseline. Integrate Trivy into CI to catch vulnerabilities before they reach production.
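As a starting point for such a wrapper, the sketch below bundles the runtime flags from Steps 1 through 5 into one shell function and passes everything else through to docker run. The function name hardened_run and the HARDENED_UID / HARDENED_GID variables are hypothetical, and the limits are the example values used earlier in this guide rather than recommendations for every workload.

```shell
#!/bin/sh
# hardened_run [DOCKER_RUN_OPTIONS...] IMAGE [COMMAND...]
# Runs a container with the baseline flags from this guide. Any extra options
# (for example --cap-add=NET_BIND_SERVICE or --network) are passed through.
hardened_run() {
    docker run \
        --user "${HARDENED_UID:-1001}:${HARDENED_GID:-1001}" \
        --cap-drop=ALL \
        --read-only \
        --tmpfs /tmp:rw,noexec,nosuid,size=64m \
        --security-opt no-new-privileges:true \
        --memory=512m --memory-swap=512m \
        --cpus=1.0 \
        --pids-limit=100 \
        "$@"
}
```

A web service that binds a low port might then be started with hardened_run -d --cap-add=NET_BIND_SERVICE --network frontend-net my-web-app:latest.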
For teams running Kubernetes, the same principles apply — translate these controls to Pod Security Admission, SecurityContext, and Network Policies within your cluster.