The Problem: Secret Sprawl
Every production application needs secrets: database passwords, API keys, TLS certificates, signing tokens. The path of least resistance — hardcoding them in source files, committing .env files, or sharing them in Slack — creates security risks that compound over time.
Common failure patterns:
- Secrets in source control: a .env file committed to a public GitHub repo exposes credentials to anyone who clones or forks it, and the secret stays in Git history even after the file is deleted
- Long-lived static secrets: an AWS key that never rotates is a ticking time bomb; once leaked, it stays usable until someone revokes it
- No audit trail: no record of who accessed which secret, when, and from where
- Secret sprawl: the same database password stored in five different places, making rotation a multi-hour manual process
Proper secret management centralizes storage, enforces access control, provides an audit trail, and enables automated rotation.
Secret Storage Solutions
HashiCorp Vault
Vault is one of the most feature-complete secret management systems. It provides:
- KV secrets engine — store arbitrary key-value secrets with versioning
- Dynamic secrets — generate short-lived database credentials on demand
- PKI secrets engine — issue TLS certificates as a CA
- Audit logging — every secret access is logged
```bash
# Store a secret
vault kv put secret/myapp/production \
    DATABASE_URL='postgresql://user:pass@db:5432/myapp' \
    SECRET_KEY='your-django-secret-key'

# Read a secret
vault kv get secret/myapp/production

# Read a specific field
vault kv get -field=DATABASE_URL secret/myapp/production
```
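Applications usually read secrets over Vault's HTTP API rather than by shelling out to the CLI. Here is a minimal sketch that uses only the Python standard library. It assumes the KV v2 engine is mounted at secret/, where the API path gains an extra data/ segment, and that VAULT_ADDR and VAULT_TOKEN are set in the environment. The helper names are hypothetical.

```python
import json
import os
import urllib.request


def kv2_url(addr: str, mount: str, path: str) -> str:
    """Build the KV v2 read URL: <addr>/v1/<mount>/data/<path>."""
    return f"{addr.rstrip('/')}/v1/{mount.strip('/')}/data/{path.strip('/')}"


def parse_kv2_response(body: bytes) -> dict:
    """KV v2 nests the secret's fields under data.data (data.metadata holds version info)."""
    return json.loads(body)["data"]["data"]


def read_kv2(mount: str, path: str) -> dict:
    """Fetch one secret; assumes VAULT_ADDR and VAULT_TOKEN are set in the environment."""
    url = kv2_url(os.environ["VAULT_ADDR"], mount, path)
    request = urllib.request.Request(url, headers={"X-Vault-Token": os.environ["VAULT_TOKEN"]})
    with urllib.request.urlopen(request, timeout=5) as response:
        return parse_kv2_response(response.read())
```

A call such as read_kv2("secret", "myapp/production") returns a dict with the DATABASE_URL and SECRET_KEY fields stored above. The URL building and response parsing are kept separate from the network call so they can be checked without a running Vault server.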
Dynamic database credentials (Vault generates a unique DB user per request):
```bash
# Enable the database secrets engine
vault secrets enable database

# Configure the PostgreSQL connection
# (use sslmode=require or verify-full anywhere outside local testing)
vault write database/config/myapp \
    plugin_name=postgresql-database-plugin \
    allowed_roles='app-role' \
    connection_url='postgresql://{{username}}:{{password}}@db:5432/myapp?sslmode=disable' \
    username='vault' \
    password='vault-admin-password'

# Define a role with a TTL; PostgreSQL role names are identifiers, so {{name}} is double-quoted
vault write database/roles/app-role \
    db_name=myapp \
    creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}';" \
    default_ttl='1h' \
    max_ttl='24h'

# Application fetches fresh credentials at startup
vault read database/creds/app-role
```
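Each credential expires with its lease, so a long-running application has to fetch new credentials (or renew the lease) before the TTL runs out. Here is a minimal sketch. It assumes the JSON shape printed by vault read -format=json database/creds/app-role: lease_id, lease_duration in seconds, and data.username / data.password. Refreshing at two-thirds of the lease is an arbitrary choice for illustration, and the class name is hypothetical.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DynamicCredential:
    username: str
    password: str = field(repr=False)  # keep the password out of reprs and logs
    lease_id: str = ""
    issued_at: float = 0.0             # epoch seconds when the credential was fetched
    lease_duration: int = 0            # TTL in seconds (3600 with default_ttl='1h')

    @classmethod
    def from_vault_json(cls, raw: str, issued_at: float) -> "DynamicCredential":
        """Parse the output of `vault read -format=json database/creds/app-role`."""
        payload = json.loads(raw)
        return cls(
            username=payload["data"]["username"],
            password=payload["data"]["password"],
            lease_id=payload["lease_id"],
            issued_at=issued_at,
            lease_duration=int(payload["lease_duration"]),
        )

    def refresh_at(self, fraction: float = 2 / 3) -> float:
        """When to fetch replacement credentials: part-way through the lease, not at expiry."""
        return self.issued_at + self.lease_duration * fraction

    def needs_refresh(self, now: Optional[float] = None) -> bool:
        current = time.time() if now is None else now
        return current >= self.refresh_at()
```

Refreshing early leaves time to retry if Vault is briefly unreachable. It also means no connection is still using a database user that Vault has already dropped.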
AWS Secrets Manager
AWS Secrets Manager is fully managed, uses IAM for access control, and supports automatic rotation through Lambda functions:
```bash
# Store a secret
aws secretsmanager create-secret \
    --name 'myapp/production/database' \
    --secret-string '{"username":"app","password":"secret123"}'

# Retrieve in application
aws secretsmanager get-secret-value \
    --secret-id 'myapp/production/database' \
    --query SecretString \
    --output text | jq -r .password
```
```python
# Python SDK
import json
from urllib.parse import quote

import boto3


def get_secret(secret_name: str) -> dict:
    client = boto3.client('secretsmanager', region_name='us-east-1')
    response = client.get_secret_value(SecretId=secret_name)
    return json.loads(response['SecretString'])


config = get_secret('myapp/production/database')
# Percent-encode credentials so characters like '@' or '/' don't break the URL
DATABASE_URL = (
    f"postgresql://{quote(config['username'], safe='')}:"
    f"{quote(config['password'], safe='')}@db/myapp"
)
```
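Each GetSecretValue call adds latency and is billed per request, so applications typically cache the value for a few minutes rather than calling the API on every use. Here is a minimal sketch of a time-based in-memory cache. The fetch function is injected (for example, the get_secret function above), and the five-minute TTL is an arbitrary assumption. AWS also publishes its own caching client for Python (aws-secretsmanager-caching).

```python
import time
from typing import Callable, Dict, Tuple


class SecretCache:
    """Cache secret values in memory for ttl_seconds to avoid repeated API calls."""

    def __init__(self, fetch: Callable[[str], dict], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, name: str) -> dict:
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        value = self._fetch(name)  # cache miss or expired entry: call the secret store
        self._entries[name] = (now, value)
        return value

    def invalidate(self, name: str) -> None:
        """Drop a cached value, e.g. after a rotation or an authentication failure."""
        self._entries.pop(name, None)
```

Calling invalidate when the database rejects a cached password lets the application pick up a rotated secret without waiting for the TTL to expire.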
1Password CLI (op)
1Password is a developer-friendly option for teams already using it for password management:
```bash
# Retrieve a specific field
op item get 'project-myapp' --vault dev --fields database_url

# Inject secrets into environment and run command
op run --env-file=.env.tpl -- gunicorn config.wsgi
```

```bash
# .env.tpl (template, safe to commit)
DATABASE_URL=op://dev/project-myapp/database_url
SECRET_KEY=op://dev/project-myapp/secret_key
SENTRY_DSN=op://dev/project-myapp/sentry_dsn
```
op run resolves op:// references and injects them as real environment variables for the child process. The template file is safe to commit; the actual values never touch disk.
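The following is a toy model of what a template runner like op run does. It parses KEY=value lines, resolves any op:// reference through a lookup function, and passes the result only to the child process's environment. The resolve callback stands in for 1Password's own lookup and is hypothetical. This is an illustration of the pattern, not how op is implemented.

```python
import os
import subprocess
from typing import Callable, Dict, List


def render_env_template(text: str, resolve: Callable[[str], str]) -> Dict[str, str]:
    """Parse KEY=value lines; values starting with op:// are resolved, others kept literally."""
    env: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=value line: {line!r}")
        value = value.strip()
        env[key.strip()] = resolve(value) if value.startswith("op://") else value
    return env


def run_with_secrets(command: List[str], secrets: Dict[str, str]) -> int:
    """Run command with secrets added to its environment only; nothing is written to disk."""
    child_env = {**os.environ, **secrets}
    return subprocess.run(command, env=child_env, check=False).returncode
```

The resolved values exist only in this process's memory and the child's environment. That is why the template, which holds only references, can be committed safely.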
Injection Patterns
Environment Variables
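The 12-Factor App methodology recommends environment variables as the primary injection mechanism. They are scoped to a process and the children it starts rather than stored at a filesystem path, and they work across all runtimes. They are not private, though: on Linux the same user and root can read them through /proc/<pid>/environ, and they can leak into crash reports or debug pages.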
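```ini
# systemd service with secrets injected
# /etc/systemd/system/gunicorn.service
[Service]
# Hypothetical dedicated service account; without User= the service runs as root
User=myapp
EnvironmentFile=/var/www/myapp/.env.prod
ExecStart=/var/www/myapp/.venv/bin/gunicorn config.wsgi
```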
The .env.prod file on the server should be:
- Owned by root, with the service user's group as its group and mode 640 (chmod 640: owner read/write, group read, no access for others)
- Never committed to source control
- Populated from the secret manager during deployment
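A deployment script can refuse to start if the file's permissions have drifted from these rules. Here is a minimal sketch using os.stat. The expected owner (root, uid 0) and the "no access for others" rule mirror the list above. The function name is hypothetical.

```python
import os
import stat
from typing import List


def env_file_problems(path: str, expected_uid: int = 0) -> List[str]:
    """Return a list of permission problems for a secrets file (an empty list means OK)."""
    info = os.stat(path)
    mode = stat.S_IMODE(info.st_mode)
    problems: List[str] = []
    if info.st_uid != expected_uid:
        problems.append(f"owner uid is {info.st_uid}, expected {expected_uid}")
    if mode & stat.S_IRWXO:
        problems.append(f"mode {mode:o} grants access to others")
    if mode & (stat.S_IWGRP | stat.S_IXGRP):
        problems.append(f"mode {mode:o} lets the group write or execute")
    return problems
```

For example, env_file_problems("/var/www/myapp/.env.prod") returns an empty list for a root-owned file with mode 640 and a descriptive message for each rule the file breaks.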
Mounted Secret Files
Kubernetes Secrets can be mounted as files — useful for TLS certificates and multi-line secrets:
```yaml
spec:
  volumes:
    - name: tls-certs
      secret:
        secretName: myapp-tls
  containers:
    - name: app
      volumeMounts:
        - name: tls-certs
          mountPath: /etc/certs
          readOnly: true
```
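Inside the container, each key of the Secret appears as a file under the mount path. The kubelet applies updates by atomically swapping a hidden ..data symlink, so a reader should skip dot-prefixed entries and re-read the files rather than cache them for the life of the process. Here is a minimal sketch. The /etc/certs path comes from the manifest above. The key names tls.crt and tls.key (the standard keys of a kubernetes.io/tls Secret) are an assumption about what myapp-tls contains.

```python
import os
from pathlib import Path
from typing import Dict


def load_secret_dir(mount_path: str) -> Dict[str, bytes]:
    """Read every key of a mounted Kubernetes Secret into memory."""
    secrets: Dict[str, bytes] = {}
    for entry in sorted(Path(mount_path).iterdir()):
        # Skip the kubelet's bookkeeping entries (..data and the ..<timestamp> directory)
        if entry.name.startswith("."):
            continue
        if entry.is_file():  # follows the per-key symlinks into ..data
            secrets[entry.name] = entry.read_bytes()
    return secrets


def secret_dir_version(mount_path: str) -> str:
    """Target of the ..data symlink; it changes whenever the kubelet updates the Secret."""
    data_link = os.path.join(mount_path, "..data")
    return os.readlink(data_link) if os.path.islink(data_link) else ""
```

A process can call secret_dir_version("/etc/certs") periodically and reload certificates with load_secret_dir when the returned value changes.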
Certificate Management
TLS certificates are time-limited secrets with hard expiry. An expired certificate causes immediate, user-visible failures.
Automated Renewal with Certbot
```bash
# Install certbot and the Cloudflare DNS plugin used below
sudo apt install certbot python3-certbot-dns-cloudflare

# Obtain certificate (Cloudflare DNS challenge, required for wildcards)
# The credentials file holds Cloudflare API credentials; keep it chmod 600
sudo certbot certonly \
    --dns-cloudflare \
    --dns-cloudflare-credentials ~/.cloudflare/credentials.ini \
    -d 'example.com' \
    -d '*.example.com'

# Renewal runs automatically via systemd timer
systemctl status certbot.timer

# Test renewal without actually renewing
sudo certbot renew --dry-run
```
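A renewal timer can fail quietly, for example after an API token is revoked, so it is worth checking the certificate that is actually being served. Here is a minimal sketch using Python's ssl module. The function names and any alert threshold you choose (such as 14 days) are assumptions.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Convert a certificate's notAfter string (as returned by getpeercert) to days remaining."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


def fetch_not_after(host: str, port: int = 443) -> str:
    """Connect with certificate verification enabled and return the served cert's notAfter."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

For example, days_until_expiry(fetch_not_after("example.com")) < 14 could trigger an alert from a cron job or monitoring check.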
Kubernetes cert-manager
```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: myapp-tls
spec:
  secretName: myapp-tls-secret
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - myapp.example.com
```
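cert-manager renews certificates automatically before they expire (with the defaults, 30 days before expiry for a 90-day Let's Encrypt certificate) and writes the new certificate into the Kubernetes Secret. Updating the Secret does not restart pods. The kubelet refreshes mounted Secret files after a short delay (files mounted with subPath are never updated), so either the application must reload them or something else must trigger a rolling restart.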
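One widely used way to make pods that read the certificate only at startup pick up a renewed Secret is to store a hash of the Secret's contents in the pod template's annotations. When the hash changes, Kubernetes sees a new pod template and performs a rolling update. Controllers such as Stakater Reloader automate the same idea. Here is a minimal sketch that computes such a patch. The annotation key checksum/tls is a naming convention chosen for illustration. Applying the patch, for example with kubectl patch or a Kubernetes client, is left out.

```python
import hashlib
import json
from typing import Dict


def secret_checksum(data: Dict[str, str]) -> str:
    """Stable SHA-256 over a Secret's data map (keys sorted so ordering doesn't matter)."""
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def rollout_patch(data: Dict[str, str], annotation: str = "checksum/tls") -> dict:
    """Strategic-merge patch for a Deployment; a changed annotation triggers a rolling update."""
    return {
        "spec": {
            "template": {
                "metadata": {"annotations": {annotation: secret_checksum(data)}}
            }
        }
    }
```

Because the checksum changes only when the Secret's data changes, reapplying the patch with unchanged data does not restart anything.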
Rotation Strategy
Zero-Downtime Rotation
Rotating a secret requires both old and new values to be valid simultaneously during the transition — the dual-key period:
```text
Phase 1: Add new secret alongside old (both valid)
Phase 2: Deploy new application version that reads new secret
Phase 3: Verify all instances use new secret
Phase 4: Invalidate old secret
Phase 5: Remove old secret from store
```
For API keys, this means having the provider support two simultaneous active keys during rotation. AWS IAM supports two access keys per user for this reason.
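On the verifying side, the dual-key period means signing only with the new key while accepting either key. Here is a minimal sketch for HMAC-signed messages. It assumes hypothetical current and previous keys loaded from the secret store. Once every sender has switched (Phase 3), the previous key is dropped from the list (Phases 4 and 5).

```python
import hashlib
import hmac
from typing import Iterable, Optional


def sign(message: bytes, key: bytes) -> str:
    """Always sign with the newest key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(message: bytes, signature: str, keys: Iterable[Optional[bytes]]) -> bool:
    """Accept a signature made with any key still in the valid set (current first, then previous)."""
    for key in keys:
        if key is None:
            continue  # no previous key outside a rotation window
        expected = sign(message, key)
        if hmac.compare_digest(expected, signature):  # constant-time comparison
            return True
    return False
```

During the transition the valid set is [current, previous]. After Phase 4 it shrinks to [current], and signatures made with the old key start failing.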
Emergency Revocation
If a secret is compromised, the priority is revocation speed:
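```bash
# AWS: immediately disable a key
# (add --user-name when the key belongs to an IAM user other than the caller)
aws iam update-access-key \
    --access-key-id AKIA... \
    --status Inactive

# Vault: revoke a specific lease (the lease_id returned by vault read database/creds/app-role)
vault lease revoke <lease_id>

# Vault: revoke all leases under a path (emergency)
# KV secrets have no leases; rotate a leaked KV secret at its source and write a new version
vault lease revoke -prefix database/creds/app-role
```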
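Scripting the AWS step avoids mistakes under pressure when all of a user's keys may be compromised. Here is a minimal sketch using boto3's IAM client calls list_access_keys and update_access_key. The user name in any call is hypothetical. The client is a parameter so the logic can be exercised without AWS credentials.

```python
from typing import Any, List, Optional


def deactivate_all_keys(user_name: str, iam: Optional[Any] = None) -> List[str]:
    """Mark every active access key of user_name Inactive; return the key IDs changed."""
    if iam is None:
        import boto3  # imported lazily so the function can be tested with a stub client
        iam = boto3.client("iam")
    changed: List[str] = []
    # IAM allows at most two access keys per user, so a single page covers them all
    for key in iam.list_access_keys(UserName=user_name)["AccessKeyMetadata"]:
        if key["Status"] == "Active":
            iam.update_access_key(UserName=user_name, AccessKeyId=key["AccessKeyId"], Status="Inactive")
            changed.append(key["AccessKeyId"])
    return changed
```

Deactivating rather than deleting keeps the option to re-enable a key if the alert turns out to be a false alarm. The key can be deleted once replacement credentials are in place.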
Audit Logging
Every production secret store should have audit logging enabled:
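```bash
# Vault: enable a file audit device
vault audit enable file file_path=/var/log/vault/audit.log

# AWS: CloudTrail event history keeps 90 days of management events by default;
# list the configured trails (a trail is needed for longer retention)
aws cloudtrail describe-trails
```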
Audit logs should capture: who accessed the secret, when, from which IP/role, and whether the access was granted or denied. Set up alerts for unusual access patterns — off-hours access, unexpected IP ranges, or bulk reads.
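Alerting can start simple. Here is a minimal sketch that scans a Vault file audit log, which has one JSON object per line, for requests made outside working hours. Vault HMACs tokens and secret values in these entries but leaves fields such as the request path readable. The field names used here (type, time, auth.display_name, request.path, request.remote_address) match Vault's audit entries as commonly documented, but treat them as assumptions to verify against your Vault version. The 07:00 to 19:00 UTC window is arbitrary.

```python
import json
from typing import Iterable, List, Tuple


def off_hours_requests(lines: Iterable[str], start_hour: int = 7,
                       end_hour: int = 19) -> List[Tuple[str, str, str, str]]:
    """Flag requests outside [start_hour, end_hour) UTC as (time, who, path, remote_address)."""
    flagged: List[Tuple[str, str, str, str]] = []
    for line in lines:
        if not line.strip():
            continue
        entry = json.loads(line)
        if entry.get("type") != "request":
            continue  # every call also logs a "response" entry; count each call once
        timestamp = entry.get("time", "")
        hour = int(timestamp[11:13])  # assumes RFC 3339 UTC timestamps, e.g. 2024-05-01T02:13:00Z
        if start_hour <= hour < end_hour:
            continue
        request = entry.get("request", {})
        flagged.append((
            timestamp,
            entry.get("auth", {}).get("display_name", "unknown"),
            request.get("path", ""),
            request.get("remote_address", ""),
        ))
    return flagged
```

Reading the file line by line, for example with open("/var/log/vault/audit.log") passed as lines, keeps memory flat on large logs. Bulk-read detection would add a count of requests per identity within a time window.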
A summary of secret management options:
| Tool | Best For | Cost | Rotation |
|---|---|---|---|
| HashiCorp Vault | Large teams, dynamic secrets | Community edition free, Enterprise paid | Automated (dynamic secrets with leases) |
| AWS Secrets Manager | AWS-native apps | $0.40/secret/month + $0.05 per 10,000 API calls | Lambda-based |
| 1Password Teams | Developer teams | Per-user SaaS | Manual |
| Doppler | Simple SaaS option | Free tier | Manual + webhooks |
| Kubernetes Secrets | K8s deployments | Free (built-in) | Manual (cert-manager for TLS certificates) |