Overview
Self-hosting at scale is mostly a problem of organization. Once you cross 20-ish containers, a single docker-compose.yml becomes unmaintainable, services stomp on each other's certificates, and any change risks accidental downtime for the rest of the stack. This writeup walks through how the CosmicBytez homelab keeps 72 containers stable across multiple compose stacks, fronted by a single Traefik instance with automatic Let's Encrypt TLS and SSO via Authentik.
Stack
- Orchestration: Docker Compose 2.20+
- Reverse Proxy: Traefik 3.6 (CloudFlare DNS challenge, wildcard TLS)
- Authentication: Authentik SSO (forward auth middleware)
- Monitoring: Prometheus + Grafana + Loki + Promtail
- Dashboard: Homepage v1.10.1 (gethomepage.dev)
- IDS/IPS: CrowdSec v1.7.6
- Databases: PostgreSQL 16-17, MariaDB 10.11.16, Redis 7.4.7
Compose Stack Layout
| Stack File | Services |
|---|---|
| compose.yml | Master file (includes all stacks) |
| stack-core-infra.yml | Traefik, CrowdSec, Homepage, Portainer |
| stack-arr.yml | Sonarr, Radarr, Lidarr, Bazarr, Prowlarr, Seerr |
| stack-media-books.yml | AudioBookshelf, Calibre-Web, Bookshelf |
| stack-monitoring.yml | Prometheus, Grafana, Loki, Promtail, exporters |
| authentik/docker-compose.yml | SSO (server, worker, PostgreSQL, Redis) |
| wireguard/docker-compose.yml | VPN + qBittorrent (network namespace tunneling) |
| nextcloud/docker-compose.yml | File sync (Nextcloud, MariaDB, Redis) |
| jellyfin/docker-compose.yml | Media server |
| mealie/docker-compose.yml | Recipe manager |
| frigate/docker-compose.yml | NVR and object detection |
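
A minimal sketch of what the master file could look like, assuming it relies on Compose's top-level `include` element (available from Docker Compose 2.20, which matches the version listed above) and that every path is relative to the master file. The real file is not shown in this writeup, so treat the ordering as illustrative.

```yaml
# compose.yml (sketch): pulls every stack into a single Compose project.
# Assumes the stack files and per-app folders sit next to this file.
include:
  - stack-core-infra.yml
  - stack-arr.yml
  - stack-media-books.yml
  - stack-monitoring.yml
  - authentik/docker-compose.yml
  - wireguard/docker-compose.yml
  - nextcloud/docker-compose.yml
  - jellyfin/docker-compose.yml
  - mealie/docker-compose.yml
  - frigate/docker-compose.yml
```

With `include`, Compose resolves relative paths inside each included file against that file's own directory, so the per-app subfolders can keep their bind mounts unchanged.
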
Traefik Middleware
| File | Purpose |
|---|---|
| middlewares-authentik.yml | Forward auth to Authentik outpost + chain-no-auth |
| middlewares-crowdsec.yml | CrowdSec bouncer plugin |
| middlewares-headers.yml | Security headers (HSTS, X-Frame-Options, CSP) |
| external-services.yml | Routes to host-network services like Frigate |
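
As a rough illustration of what these dynamic configuration files might contain, here is a sketch for Traefik's file provider that defines an Authentik forward-auth middleware and a route to a host-network service. It is shown as one file for brevity, whereas the homelab splits these across `middlewares-authentik.yml` and `external-services.yml`. The hostnames, IP address, port, entry point name `websecure`, and middleware name `authentik` are all assumptions, not values from the actual setup.

```yaml
http:
  middlewares:
    # Forward auth: Traefik asks the Authentik outpost whether to allow each request.
    authentik:
      forwardAuth:
        # Assumed container name; 9000 is the embedded outpost's default HTTP port.
        address: "http://authentik-server:9000/outpost.goauthentik.io/auth/traefik"
        trustForwardHeader: true
        authResponseHeaders:
          - X-authentik-username
          - X-authentik-groups
          - X-authentik-email

  routers:
    # Route for a service Traefik cannot discover through Docker labels.
    frigate:
      rule: "Host(`frigate.example.com`)"   # hypothetical hostname
      entryPoints:
        - websecure                          # assumed entry point name
      middlewares:
        - authentik
      service: frigate
      tls: {}

  services:
    frigate:
      loadBalancer:
        servers:
          - url: "http://192.168.1.10:5000"  # hypothetical host IP and port
```
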
Networks
| Network | Purpose |
|---|---|
| proxy | External (all Traefik-routed services) |
| wireguard_lan_static | Internal LAN access |
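
To show how a service in any stack might join the shared network, here is a sketch of a compose fragment. It assumes the `proxy` network was created once outside Compose (for example with `docker network create proxy`). The hostname, entry point name, middleware reference, and image tag are placeholders rather than the homelab's real values.

```yaml
services:
  mealie:
    image: "ghcr.io/mealie-recipes/mealie:vX.Y.Z"  # placeholder; pin an exact patch version
    networks:
      - proxy
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.mealie.rule=Host(`mealie.example.com`)"  # hypothetical hostname
      - "traefik.http.routers.mealie.entrypoints=websecure"            # assumed entry point
      - "traefik.http.routers.mealie.tls=true"
      - "traefik.http.routers.mealie.middlewares=authentik@file"       # assumed middleware name
      - "traefik.http.services.mealie.loadbalancer.server.port=9000"

networks:
  proxy:
    external: true  # declared here, but created and owned outside this stack
```

Because the network is marked `external`, no single stack owns it, so bringing one stack down does not remove the network the others depend on.
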
Scheduled Tasks (Cron)
| Schedule | Task |
|---|---|
| `0 2 * * 0` | Kometa weekly (franchise collections) |
| `0 3 * * 1-6` | Kometa nightly |
| `0 4 * * *` | `docker system prune -f` (daily cleanup) |
| `0 5 * * 0` | `update.sh` (weekly stack update) |
| `*/15 * * * *` | qBittorrent memory watchdog (8GB threshold) |
Lessons Learned
- qBittorrent inside a network namespace: `network_mode: service:wireguard` forces traffic through the VPN. Traefik labels go on the WireGuard service, not qBittorrent.
- qBittorrent memory leak: libtorrent leaks memory; a watchdog auto-restarts at the 8GB threshold.
- Nextcloud trusted proxies: must match the Traefik subnet exactly, or login redirects loop.
- Frigate uses `network_mode: host`: routed via `external-services.yml` because Traefik can't see it on a Docker network.
- Loki is distroless: no shell, wget, or curl, so Docker healthchecks must be omitted.
- All images pinned to exact patch versions: `:latest` is banned in this stack; the only exceptions are forks tracked by branch.
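
The first lesson above can be sketched as a compose fragment. The image names, tags, hostname, and WebUI port 8080 are assumptions for illustration; the point is only where `network_mode` and the Traefik labels sit.

```yaml
services:
  wireguard:
    image: "lscr.io/linuxserver/wireguard:X.Y.Z"    # assumed image, placeholder tag
    cap_add:
      - NET_ADMIN
    networks:
      - proxy
    labels:
      # Labels live here because this container owns the network namespace.
      - "traefik.enable=true"
      - "traefik.http.routers.qbittorrent.rule=Host(`qbit.example.com`)"  # hypothetical hostname
      - "traefik.http.services.qbittorrent.loadbalancer.server.port=8080" # assumed WebUI port

  qbittorrent:
    image: "lscr.io/linuxserver/qbittorrent:X.Y.Z"  # assumed image, placeholder tag
    # Shares the wireguard container's network stack, so all traffic uses the tunnel.
    network_mode: service:wireguard
    depends_on:
      - wireguard

networks:
  proxy:
    external: true
```

A service that sets `network_mode` cannot also declare `networks` or publish its own ports, which is why the routing configuration has to move to the WireGuard service.
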
Security Hardening
- `no-new-privileges:true` on every service
- Docker socket mounted read-only for Traefik and Portainer
- CloudFlare API token stored as a Docker secret
- All `.env` files gitignored
- cAdvisor uses `cap_add` instead of `privileged: true`
- Every UI service sits behind Authentik forward auth: no anonymous Grafana, no anonymous Loki
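
A sketch of how several of these hardening measures could look for one service. The secret name `cf_api_token`, its file path, and the image tag are hypothetical. The `CF_DNS_API_TOKEN_FILE` variable relies on the `_FILE` variants that Traefik's ACME DNS providers accept for credentials.

```yaml
services:
  traefik:
    image: "traefik:vX.Y.Z"                # placeholder; pin an exact patch version
    security_opt:
      - no-new-privileges:true             # block privilege escalation via setuid binaries
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro   # read-only socket mount
    secrets:
      - cf_api_token
    environment:
      # Read the token from the mounted secret file instead of a plain env value.
      CF_DNS_API_TOKEN_FILE: /run/secrets/cf_api_token

secrets:
  cf_api_token:
    file: ./secrets/cf_api_token.txt       # hypothetical path; keep it out of git
```
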