Overview
This checklist gates every new application release before it reaches production. Work through each section in order — all items must be complete or formally waived with a recorded justification before go-live. Use it alongside your CI/CD pipeline as a mandatory sign-off document.
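To make the sign-off rule concrete, the sketch below models the checklist as data and reports which items still block go-live. The `ChecklistItem` type, its field names, and the `blocking_items` helper are hypothetical names chosen for illustration; they do not come from any existing tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChecklistItem:
    """One checklist entry (hypothetical structure for illustration)."""
    name: str
    done: bool = False
    waiver_justification: Optional[str] = None  # recorded reason if waived


def blocking_items(items: List[ChecklistItem]) -> List[str]:
    """Return names of items that are neither complete nor formally waived."""
    blockers = []
    for item in items:
        # A waiver only counts if a non-empty justification was recorded.
        waived = bool(item.waiver_justification and item.waiver_justification.strip())
        if not (item.done or waived):
            blockers.append(item.name)
    return blockers


items = [
    ChecklistItem("Threat model reviewed", done=True),
    ChecklistItem("Penetration test results addressed",
                  waiver_justification="No pentest scoped for this release"),
    ChecklistItem("SAST passed"),
]
print(blocking_items(items))  # ['SAST passed']
```

An empty result would mean every item is complete or waived, which is the condition for go-live stated above.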
1. Pre-Deployment Security Review
Complete this section before any code reaches staging. Each item requires a named reviewer and a timestamp to count as done.
- Threat model reviewed and updated: Confirm the STRIDE/PASTA model reflects the current architecture, new attack surfaces, and trust boundaries introduced in this release

      # Open the current threat model document
      # Verify last-reviewed date is within 90 days

- OWASP Top 10 walkthrough complete: Walk each category (injection, broken auth, XSS, IDOR, etc.) against the new code and record findings or clearances
- Code review sign-off recorded: Minimum two reviewers; at least one must be a senior engineer outside the feature team; review comment thread closed
- Static analysis (SAST) passed: Zero unresolved high/critical findings from the SAST tool

      # Example: Semgrep
      semgrep --config=auto --severity=ERROR src/
      # Example: CodeQL (GitHub Actions)
      gh run view --log | grep "SARIF results"

- Penetration test results addressed: If a pentest was scoped for this release, all critical and high findings are remediated or risk-accepted in writing
- Sensitive data flow documented: PII, PHI, and payment card data flows are mapped and confirmed to meet applicable compliance scope (GDPR, PCI-DSS, HIPAA)
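The 90-day freshness rule for the threat model can be checked mechanically once the last-reviewed date is recorded somewhere machine-readable. The following is a minimal sketch under that assumption; `review_is_current` is a hypothetical helper, and where the date is stored is left to your process.

```python
from datetime import date, timedelta

MAX_REVIEW_AGE = timedelta(days=90)  # window taken from the checklist item


def review_is_current(last_reviewed: date, today: date) -> bool:
    """True if the threat model was reviewed within the last 90 days.

    A review date in the future is treated as invalid and fails the check.
    """
    age = today - last_reviewed
    return timedelta(0) <= age <= MAX_REVIEW_AGE


print(review_is_current(date(2024, 1, 10), today=date(2024, 3, 1)))  # True (51 days old)
print(review_is_current(date(2023, 11, 1), today=date(2024, 3, 1)))  # False (121 days old)
```

Passing `today` explicitly keeps the function deterministic, which makes it easy to exercise in CI.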
2. Dependency & Supply Chain
Software supply chain attacks are a primary attack vector. Lock down every external dependency before shipping.
- npm/pip/cargo audit clean: Run the package manager audit and resolve all critical and high vulnerabilities

      # Node.js
      npm audit --audit-level=high
      # Python
      pip-audit -r requirements.txt
      # Rust
      cargo audit

- Dependency versions pinned: No floating ranges (^, ~, *) in production dependency manifests such as package.json; use exact versions or commit-pinned SHAs

      # Verify no floating versions remain (any output is a finding)
      grep -E '"\^|"~|"\*' package.json

- License compliance verified: Run a license scanner and confirm all third-party licenses are approved for commercial use

      npx license-checker --onlyAllow "MIT;Apache-2.0;BSD-2-Clause;BSD-3-Clause;ISC"

- SRI hashes applied to all CDN-loaded assets: Every externally hosted script and stylesheet uses integrity and crossorigin attributes

      <!-- Example -->
      <script src="https://cdn.example.com/lib.js" integrity="sha384-<hash>" crossorigin="anonymous"></script>

- Container base image pinned and scanned: Docker images reference an exact digest, not latest; Trivy or Grype scan shows no critical CVEs

      docker pull node:20.11.0-alpine3.19@sha256:<digest>
      trivy image --severity CRITICAL,HIGH node:20.11.0-alpine3.19@sha256:<digest>

- Signed commits enforced on release branch: GPG or SSH signing required; CI rejects unsigned merge commits

      git log --show-signature -5
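A text search for range operators can miss other non-exact specifiers such as `latest` or `1.x`. As a stricter alternative, the sketch below parses the manifest and flags anything that is not an exact `MAJOR.MINOR.PATCH` version. The function name `floating_dependencies` and the sample manifest are illustrative assumptions. This version also flags git URLs, so commit-pinned SHAs would need an explicit exemption.

```python
import json
import re

# Exact versions only: 1.2.3, optionally with a pre-release or build suffix.
EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+(?:[-+][0-9A-Za-z.+-]+)?$")


def floating_dependencies(package_json_text: str) -> dict:
    """Map each dependency whose specifier is not an exact version to that specifier."""
    manifest = json.loads(package_json_text)
    floating = {}
    for section in ("dependencies", "devDependencies", "optionalDependencies"):
        for name, spec in manifest.get(section, {}).items():
            if not EXACT_VERSION.match(spec):
                floating[name] = spec
    return floating


sample = """
{
  "dependencies": {"express": "4.18.2", "lodash": "^4.17.21"},
  "devDependencies": {"jest": "~29.7.0", "typescript": "*"}
}
"""
print(floating_dependencies(sample))
# {'lodash': '^4.17.21', 'jest': '~29.7.0', 'typescript': '*'}
```

In a pipeline, a non-empty result would fail the job.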
3. Secret Management
A single leaked secret can compromise an entire environment. Verify the full secret lifecycle before deploying.
- No hardcoded secrets in source: Run a secret scanner across all commits in the release branch, not just the HEAD

      # Gitleaks
      gitleaks detect --source . --log-opts="origin/main..HEAD"
      # truffleHog
      trufflehog git file://. --since-commit origin/main

- Vault or secrets manager integration verified: Application retrieves secrets at runtime from HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, or equivalent; no secrets are baked into images or config files
- Secret rotation schedule documented: Each secret has an owner, a maximum age (≤90 days for API keys, ≤365 days for certs), and an automated or calendared rotation trigger
- .env files excluded from all build artifacts: Confirm .env, .env.local, and .env.production are in .gitignore and .dockerignore

      git check-ignore -v .env .env.local .env.production
      # git check-ignore only consults git ignore rules; check .dockerignore separately
      grep -n '\.env' .dockerignore

- CI/CD secret injection reviewed: Pipeline secrets are stored in the CI provider's secret store (not in YAML), scoped to the minimum required jobs, and masked in logs
- Database credentials rotated pre-launch: Production database credentials are fresh and not shared with any non-production environment
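The rotation limits in the list above (90 days for API keys, 365 days for certificates) lend themselves to an automated check against a secret inventory. The sketch below assumes the inventory is available as a list of dictionaries with `name`, `kind`, and `last_rotated` keys. That shape and the `overdue_secrets` helper are hypothetical, not the export format of any particular secrets manager.

```python
from datetime import date

# Maximum ages in days, taken from the rotation schedule item.
MAX_AGE_DAYS = {"api_key": 90, "certificate": 365}


def overdue_secrets(secrets: list, today: date) -> list:
    """Return names of secrets older than the maximum age for their kind."""
    overdue = []
    for secret in secrets:
        age_days = (today - secret["last_rotated"]).days
        if age_days > MAX_AGE_DAYS[secret["kind"]]:
            overdue.append(secret["name"])
    return overdue


inventory = [
    {"name": "payments-api-key", "kind": "api_key", "last_rotated": date(2024, 1, 1)},
    {"name": "edge-tls-cert", "kind": "certificate", "last_rotated": date(2024, 1, 1)},
]
print(overdue_secrets(inventory, today=date(2024, 6, 1)))  # ['payments-api-key']
```

An unknown `kind` raises `KeyError` on purpose, so a secret without a defined maximum age cannot slip through unnoticed.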
4. Infrastructure & Configuration
Misconfiguration is a leading cause of cloud data breaches. Verify every perimeter control before traffic arrives.
- TLS certificate valid and auto-renewing: Certificate chain is complete, expiry is >30 days out, and ACME/cert-manager renewal is confirmed working

      openssl s_client -connect yourdomain.com:443 -servername yourdomain.com </dev/null 2>/dev/null \
        | openssl x509 -noout -dates

- Security headers configured and tested: CSP, HSTS (min 1 year + includeSubDomains), X-Content-Type-Options, X-Frame-Options, Referrer-Policy; validate with securityheaders.com or equivalent

      curl -sI https://yourdomain.com | grep -iE "strict-transport|content-security|x-frame|x-content-type"

- CORS policy locked down: Access-Control-Allow-Origin is not * on any authenticated endpoint; allowed origins are an explicit allowlist
- Rate limiting and request throttling active: Per-IP and per-user rate limits enforced at the edge (WAF, reverse proxy, or application middleware); thresholds load-tested
- Firewall/security group rules reviewed: Only ports 80 and 443 open inbound to the application; admin ports (SSH, DB) not exposed to the public internet

      # AWS example
      aws ec2 describe-security-groups --group-ids sg-xxxx \
        --query 'SecurityGroups[*].IpPermissions'

- WAF rules enabled and tuned: Web Application Firewall is active with at minimum OWASP CRS; custom rules in place for known attack patterns against this application
- Database network access restricted: Production database reachable only from the application's VPC/subnet; no public endpoint exposed
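Checking that a header is present does not confirm its value is acceptable. The sketch below evaluates a set of response headers against the header and CORS requirements listed above. `header_problems` is a hypothetical helper, and it assumes the headers were captured from an authenticated endpoint, which is why a wildcard CORS origin counts as a failure.

```python
import re


def header_problems(headers: dict) -> list:
    """Return a list of problems found in a response's security headers."""
    # HTTP header names are case-insensitive, so normalise them first.
    h = {name.lower(): value for name, value in headers.items()}
    problems = []

    hsts = h.get("strict-transport-security", "")
    match = re.search(r"max-age=(\d+)", hsts, re.IGNORECASE)
    if not match or int(match.group(1)) < 31536000:  # one year in seconds
        problems.append("HSTS max-age below one year or missing")
    if "includesubdomains" not in hsts.lower():
        problems.append("HSTS missing includeSubDomains")

    if h.get("x-content-type-options", "").strip().lower() != "nosniff":
        problems.append("X-Content-Type-Options is not nosniff")

    for required in ("content-security-policy", "x-frame-options", "referrer-policy"):
        if required not in h:
            problems.append(f"{required} header missing")

    # Assumption: these headers come from an authenticated endpoint.
    if h.get("access-control-allow-origin", "").strip() == "*":
        problems.append("CORS allows any origin")
    return problems


response_headers = {
    "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
    "Content-Security-Policy": "default-src 'self'",
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "DENY",
    "Access-Control-Allow-Origin": "*",
}
print(header_problems(response_headers))
# ['referrer-policy header missing', 'CORS allows any origin']
```

This does not judge the quality of the CSP itself; it only confirms that a policy is set.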
5. Monitoring & Observability
You cannot respond to what you cannot see. Confirm every telemetry layer is live before routing production traffic.
- APM agent deployed and reporting: Application Performance Monitoring (Datadog, New Relic, Elastic APM) is sending traces; P50/P95/P99 latency visible in the dashboard
- Error tracking configured: Sentry or equivalent is capturing exceptions with source maps, environment tags, and release version; test by triggering a known error in staging

      # Verify Sentry DSN is set
      printenv | grep SENTRY_DSN

- Health and readiness endpoints live: /health returns 200 with service status; /ready confirms downstream dependencies (DB, cache, external APIs) are reachable

      curl -sf https://yourdomain.com/health | jq .

- Structured logs shipping to aggregator: JSON logs flowing to your log management platform (Loki, CloudWatch, Datadog Logs); log retention meets compliance minimums
- Alerting thresholds set: PagerDuty/OpsGenie rules configured for: error rate >1%, P95 latency >2 s, health check failure >2 consecutive, CPU/memory >80% sustained 5 min
- Uptime monitoring active: External synthetic monitor (Uptime Robot, Better Stack, or similar) pinging the primary URL every 60 s from at least two regions
- Dashboard created and shared: A shared ops dashboard covering request rate, error rate, latency, and infrastructure metrics is bookmarked for the on-call rotation
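The alerting thresholds in the list above can be written down as a small rule set, which is handy for reviewing them or unit-testing them before they are encoded in an alerting tool. This is a sketch only: the `Snapshot` fields and `firing_alerts` are hypothetical, and "sustained 5 min" is modelled by passing the lowest CPU and memory readings seen over the last five minutes.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical metrics snapshot; field names are illustrative."""
    error_rate: float          # fraction of failed requests, 0.0 to 1.0
    p95_latency_s: float       # 95th percentile latency in seconds
    failed_health_checks: int  # consecutive health check failures so far
    cpu_pct_5min: float        # lowest CPU % seen over the last 5 minutes
    mem_pct_5min: float        # lowest memory % seen over the last 5 minutes


def firing_alerts(s: Snapshot) -> list:
    """Return the names of the alert rules that the snapshot violates."""
    alerts = []
    if s.error_rate > 0.01:            # error rate > 1%
        alerts.append("error_rate")
    if s.p95_latency_s > 2.0:          # P95 latency > 2 s
        alerts.append("p95_latency")
    if s.failed_health_checks > 2:     # more than 2 consecutive failures
        alerts.append("health_check")
    if s.cpu_pct_5min > 80 or s.mem_pct_5min > 80:  # sustained > 80%
        alerts.append("resource_saturation")
    return alerts


print(firing_alerts(Snapshot(0.004, 2.4, 0, 85.0, 60.0)))
# ['p95_latency', 'resource_saturation']
```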
6. Rollback & Recovery
Every deployment must have a tested escape hatch. Define the rollback procedure before you deploy, not during an incident.
- Deployment strategy documented: Blue-green, canary, or rolling update strategy is written down, rehearsed in staging, and the cutover/rollback steps are in the runbook
- Previous version artifacts retained: The last known-good container image or build artifact is tagged and accessible in the registry for immediate rollback

      # Example: list recent image tags
      docker images yourdomain/app --format "{{.Tag}}\t{{.CreatedAt}}" | head -5

- Database migration rollback script tested: Every schema migration has a corresponding down migration that has been executed successfully in a staging environment

      # Prisma example
      npx prisma migrate status
      # Flyway example
      flyway -url=... undo

- Feature flags in place for high-risk changes: New features behind flags (LaunchDarkly, Unleash, or custom) so they can be disabled without a redeploy
- Rollback SLA defined: Maximum acceptable time-to-rollback is documented (e.g., "rollback complete within 15 minutes of go decision") and achievable with the current tooling
- Data backup verified pre-deployment: Confirmed a successful backup exists from within the last 24 hours; restore was tested within the last 30 days

      # Verify latest backup timestamp
      aws s3 ls s3://your-backup-bucket/ --recursive | sort | tail -5
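Listing backup objects still leaves a person to compare timestamps by eye. The sketch below encodes the two time limits from the backup item (backup within 24 hours, restore tested within 30 days) as a check. `backup_gate` is a hypothetical helper, and it assumes you can obtain both timestamps as timezone-aware datetimes from your backup tooling.

```python
from datetime import datetime, timedelta, timezone


def backup_gate(last_backup: datetime, last_restore_test: datetime, now: datetime) -> list:
    """Return the reasons the backup check fails; an empty list means it passes."""
    failures = []
    if now - last_backup > timedelta(hours=24):
        failures.append("latest backup is older than 24 hours")
    if now - last_restore_test > timedelta(days=30):
        failures.append("restore not tested in the last 30 days")
    return failures


now = datetime(2024, 6, 15, 12, 0, tzinfo=timezone.utc)
print(backup_gate(
    last_backup=datetime(2024, 6, 15, 2, 0, tzinfo=timezone.utc),
    last_restore_test=datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc),
    now=now,
))
# ['restore not tested in the last 30 days']
```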
7. Post-Deployment Verification
The deployment is not done when the pipeline goes green. Complete these steps within 30 minutes of go-live.
- Smoke test suite executed: Automated smoke tests covering all critical user paths (login, core workflow, payment if applicable) pass against production

      # Playwright example
      npx playwright test --project=smoke --reporter=line

- Dynamic security scan (DAST) run: OWASP ZAP or Nuclei scan against the live production URL; no new critical findings

      # Mount the working directory so ZAP can write the report
      docker run --rm -v "$(pwd):/zap/wrk/:rw" zaproxy/zap-stable zap-baseline.py \
        -t https://yourdomain.com -r zap-report.html

- Performance baseline captured: Run a 5-minute load test and record P50/P95/P99 latency and error rate as the new baseline for this release

      # k6 example
      k6 run --vus 50 --duration 5m smoke-load.js

- SSL Labs / security posture verified: SSL Labs scan returns A or A+ rating; no mixed-content warnings in browser console

      # The assessment runs asynchronously; repeat until .status is "READY"
      curl -s "https://api.ssllabs.com/api/v3/analyze?host=yourdomain.com" | jq '.status, (.endpoints[]?.grade)'

- Dependency graph and SBOM published: Software Bill of Materials generated and attached to the release in the artifact store or GitHub release

      # CycloneDX for Node.js
      npx @cyclonedx/cyclonedx-npm --output-file sbom.json

- Runbook and on-call rotation updated: On-call engineer has been briefed on new failure modes, the runbook reflects the new deployment, and PagerDuty escalation policy is correct
- Incident readiness confirmed: War-room channel created and linked in the runbook, rollback decision authority is named, and the team has confirmed they can reach each other out-of-hours
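For the performance baseline, it helps to agree on how the percentiles are computed so that baselines from different releases are comparable. The sketch below uses the nearest-rank method, which is one common definition; load-testing tools may interpolate instead and report slightly different values. The `percentile` and `baseline` helpers and the sample data are illustrative assumptions.

```python
import math


def percentile(samples: list, pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def baseline(latencies_ms: list, errors: int) -> dict:
    """Summarise one load-test run as the baseline record for a release."""
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
        "error_rate": errors / len(latencies_ms),
    }


latencies = list(range(1, 101))  # 1 ms to 100 ms, one sample each
print(baseline(latencies, errors=2))
# {'p50_ms': 50, 'p95_ms': 95, 'p99_ms': 99, 'error_rate': 0.02}
```

Storing this record alongside the release tag gives the next release a concrete number to compare against.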
Quick Reference
| Section | Gate Owner | Blocking? |
|---|---|---|
| Pre-Deployment Security Review | Security Team | Yes |
| Dependency & Supply Chain | Platform Engineering | Yes |
| Secret Management | Security Team | Yes |
| Infrastructure & Configuration | Infrastructure | Yes |
| Monitoring & Observability | Platform Engineering | Yes |
| Rollback & Recovery | Delivery Lead | Yes |
| Post-Deployment Verification | Delivery Lead | Yes — must complete within 30 min of go-live |