Secret Management for API Keys, Tokens, and Certificates

The Problem: Secret Sprawl

Every production application needs secrets: database passwords, API keys, TLS certificates, signing tokens. The path of least resistance — hardcoding them in source files, committing .env files, or sharing them in Slack — creates security risks that compound over time.

Common failure patterns:

Secrets in source control: a .env file committed to a public GitHub repo exposes its credentials to anyone who cloned or scraped the repo, and deleting the file later does not help because it remains in the Git history
Long-lived static secrets: an AWS key that never rotates stays usable from the moment it leaks until someone notices and revokes it
No audit trail: no record of who accessed which secret, when, or from where
Secret sprawl: the same database password stored in five different places, making rotation a multi-hour manual process
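
The first pattern is the easiest to catch mechanically. Below is a minimal sketch of a pre-commit style scan. The function name find_secrets and both regexes are illustrative assumptions (the AKIA prefix for AWS access key IDs, plus a generic NAME=value assignment); dedicated scanners cover far more cases.

```python
import re

# Illustrative rules only; real scanners ship many more patterns.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "assigned_secret": re.compile(
        r"(?i)(?:password|secret|api_key|token)\w*\s*=\s*['\"]?([^\s'\"]{8,})"
    ),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every line that matches a rule."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, rule))
    return findings
```

A hook could run this over staged files and refuse the commit when the list is non-empty.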

Proper secret management centralizes storage, enforces access control, provides an audit trail, and enables automated rotation.

Secret Storage Solutions

HashiCorp Vault

Vault is among the most feature-complete secret management systems. It provides:

KV secrets engine — store arbitrary key-value secrets with versioning
Dynamic secrets — generate short-lived database credentials on demand
PKI secrets engine — issue TLS certificates as a CA
Audit logging — every secret access is logged

# Store a secret
vault kv put secret/myapp/production \
  DATABASE_URL='postgresql://user:pass@db:5432/myapp' \
  SECRET_KEY='your-django-secret-key'

# Read a secret
vault kv get secret/myapp/production

# Read specific field
vault kv get -field=DATABASE_URL secret/myapp/production

Dynamic database credentials (Vault generates a unique DB user per request):

# Enable database secrets engine
vault secrets enable database

# Configure PostgreSQL connection
vault write database/config/myapp \
  plugin_name=postgresql-database-plugin \
  allowed_roles='app-role' \
  connection_url='postgresql://{{username}}:{{password}}@db:5432/myapp?sslmode=disable' \
  username='vault' \
  password='vault-admin-password'

# Define role with TTL
vault write database/roles/app-role \
  db_name=myapp \
  creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}';" \
  default_ttl='1h' \
  max_ttl='24h'

# Application fetches fresh credentials at startup
vault read database/creds/app-role
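
Because these credentials expire, the application has to track the lease and fetch a new pair before it ends. The sketch below parses the JSON that `vault read -format=json database/creds/app-role` prints and decides when to refresh. The names DbCredentials, parse_creds and needs_refresh, and the two-thirds refresh point, are assumptions for illustration rather than anything Vault prescribes.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DbCredentials:
    username: str
    password: str
    lease_id: str
    expires_at: float  # Unix timestamp at which the lease ends


def parse_creds(raw_json: str, issued_at: float) -> DbCredentials:
    """Parse the JSON output of a dynamic-credential read."""
    doc = json.loads(raw_json)
    return DbCredentials(
        username=doc["data"]["username"],
        password=doc["data"]["password"],
        lease_id=doc["lease_id"],
        expires_at=issued_at + doc["lease_duration"],  # lease_duration is in seconds
    )


def needs_refresh(creds: DbCredentials, issued_at: float, now: float,
                  fraction: float = 2 / 3) -> bool:
    """True once `fraction` of the lease lifetime has elapsed."""
    lifetime = creds.expires_at - issued_at
    return now >= issued_at + lifetime * fraction
```

Refreshing well before expiry leaves room for retries if Vault or the database is briefly unavailable.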

AWS Secrets Manager

Fully managed, integrates with IAM, and supports automatic rotation via Lambda:

# Store a secret
aws secretsmanager create-secret \
  --name 'myapp/production/database' \
  --secret-string '{"username":"app","password":"secret123"}'

# Retrieve in application
aws secretsmanager get-secret-value \
  --secret-id 'myapp/production/database' \
  --query SecretString \
  --output text | jq -r .password

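# Python SDK
import json
from urllib.parse import quote

import boto3


def get_secret(secret_name: str, region_name: str = 'us-east-1') -> dict:
    """Fetch a JSON secret from AWS Secrets Manager and decode it."""
    client = boto3.client('secretsmanager', region_name=region_name)
    response = client.get_secret_value(SecretId=secret_name)
    return json.loads(response['SecretString'])


config = get_secret('myapp/production/database')
# Percent-encode the credentials so characters like '@' or '/' cannot break the URL
user = quote(config['username'], safe='')
password = quote(config['password'], safe='')
DATABASE_URL = f"postgresql://{user}:{password}@db/myapp"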
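
Calling the secret store on every request adds latency and per-call cost, so applications often cache the value for a few minutes. A minimal sketch follows, assuming a hypothetical CachedSecret wrapper around any zero-argument fetch function. The class name and the TTL value are illustrative and not part of the AWS SDK.

```python
import time
from typing import Callable, Optional


class CachedSecret:
    """Cache a secret for `ttl` seconds and refetch once it goes stale."""

    def __init__(self, fetch: Callable[[], dict], ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock  # injectable so the cache can be tested without sleeping
        self._value: Optional[dict] = None
        self._fetched_at = 0.0

    def get(self) -> dict:
        now = self._clock()
        if self._value is None or now - self._fetched_at >= self._ttl:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value
```

With a fetch function such as the get_secret helper, this could be wired up as `CachedSecret(lambda: get_secret('myapp/production/database'), ttl=300)`. The TTL also bounds how long a rotated secret can go unnoticed by the application.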

1Password CLI (op)

1Password is a developer-friendly option for teams already using it for password management:

# Retrieve a specific field
op item get 'project-myapp' --vault dev --fields database_url

# Inject secrets into environment and run command
op run --env-file=.env.tpl -- gunicorn config.wsgi

# .env.tpl (template, safe to commit)
DATABASE_URL=op://dev/project-myapp/database_url
SECRET_KEY=op://dev/project-myapp/secret_key
SENTRY_DSN=op://dev/project-myapp/sentry_dsn

op run resolves op:// references and injects them as real environment variables for the child process. The template file is safe to commit; the actual values never touch disk.
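
To make the resolution step concrete, here is a toy model of what happens to the template. It is a sketch under stated assumptions and not 1Password's implementation: resolve_template and the lookup callback are hypothetical names, and the real CLI supports more reference syntax than the plain op://vault/item/field form handled here.

```python
from typing import Callable


def resolve_template(template: str,
                     lookup: Callable[[str, str, str], str]) -> dict[str, str]:
    """Turn KEY=op://vault/item/field lines into a dict of resolved values."""
    env = {}
    for line in template.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        if value.startswith("op://"):
            vault, item, field = value[len("op://"):].split("/", 2)
            value = lookup(vault, item, field)  # fetched in memory, never written out
        env[key] = value
    return env
```

The resulting dict would be merged into the child's environment, for example `subprocess.run(cmd, env={**os.environ, **env})`, which is why the resolved values exist only in process memory.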

Injection Patterns

Environment Variables

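The 12-Factor App methodology recommends environment variables as the primary injection mechanism. They are scoped to a process and its children rather than stored in the codebase, and they work across all runtimes. They are not a hard boundary, though: on Linux, root and other processes running as the same user can read them through /proc/<pid>/environ, and they can leak into crash dumps and debug output.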

# systemd service with secrets injected
# /etc/systemd/system/gunicorn.service
[Service]
EnvironmentFile=/var/www/myapp/.env.prod
ExecStart=/var/www/myapp/.venv/bin/gunicorn config.wsgi

The .env.prod file on the server should be:

Owned by root, with group read access for the service user's group only and no access for others (chown root:<service-group>, chmod 640)
Never committed to source control
Populated from the secret manager during deployment
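
The permission requirement is easy to verify as a deployment step. The sketch below assumes a hypothetical helper named env_file_problems; it checks only the mode bits and does not verify ownership.

```python
import os
import stat


def env_file_problems(path: str) -> list[str]:
    """Return permission problems for a secrets file (empty list means OK)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    problems = []
    if mode & stat.S_IRWXO:
        problems.append("accessible by other users")
    if mode & (stat.S_IWGRP | stat.S_IXGRP):
        problems.append("group has write or execute permission")
    return problems
```

A deploy script could abort when `env_file_problems('/var/www/myapp/.env.prod')` returns anything.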

Mounted Secret Files

Kubernetes Secrets can be mounted as files — useful for TLS certificates and multi-line secrets:

spec:
  volumes:
  - name: tls-certs
    secret:
      secretName: myapp-tls
  containers:
  - name: app
    volumeMounts:
    - name: tls-certs
      mountPath: /etc/certs
      readOnly: true
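
Inside the container, each key of the Secret appears as one file under the mount path. A minimal sketch of reading them, assuming a hypothetical helper read_mounted_secrets and a Secret of type kubernetes.io/tls (whose keys are tls.crt and tls.key):

```python
from pathlib import Path


def read_mounted_secrets(mount_dir: str) -> dict[str, str]:
    """Map each file name under mount_dir to its contents."""
    secrets = {}
    for entry in Path(mount_dir).iterdir():
        # Kubernetes keeps bookkeeping entries such as `..data` in the mount.
        if entry.name.startswith(".."):
            continue
        if entry.is_file():  # follows the symlinks Kubernetes creates
            secrets[entry.name] = entry.read_text()
    return secrets
```

For the manifest above, `read_mounted_secrets('/etc/certs')` would be expected to return the certificate and key under their file names.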

Certificate Management

TLS certificates are time-limited secrets with hard expiry. An expired certificate causes immediate, user-visible failures.
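
Since the failure is abrupt, it helps to monitor the remaining lifetime independently of whatever performs renewal. The sketch below converts a certificate's notAfter string into days remaining. The helper name days_until_expiry is an assumption; the input format is the one returned by Python's `SSLSocket.getpeercert()['notAfter']` and printed by `openssl x509 -enddate`.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days between `now` (timezone-aware) and the notAfter timestamp."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days
```

A monitoring job might alert when the result drops below a threshold such as 14 days; a negative result means the certificate has already expired.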

Automated Renewal with Certbot

# Install certbot and the Cloudflare DNS plugin used below
sudo apt install certbot python3-certbot-dns-cloudflare

# Obtain certificate (Cloudflare DNS challenge for wildcard)
certbot certonly \
  --dns-cloudflare \
  --dns-cloudflare-credentials ~/.cloudflare/credentials.ini \
  -d 'example.com' \
  -d '*.example.com'

# Renewal runs automatically via systemd timer
systemctl status certbot.timer

# Test renewal without actually renewing
certbot renew --dry-run

Kubernetes cert-manager

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: myapp-tls
spec:
  secretName: myapp-tls-secret
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
  - myapp.example.com

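By default, cert-manager renews a certificate once two-thirds of its lifetime has elapsed, which for a 90-day Let's Encrypt certificate is 30 days before expiry, and writes the new key pair into the Kubernetes Secret. Updating the Secret does not restart anything by itself: the kubelet refreshes mounted Secret volumes after a short delay, so the application must reload the files, or a separate controller must trigger a rolling restart.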
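
For a process that keeps its certificate loaded in memory, noticing a renewed file on disk can be sketched with a small polling watcher. FileChangeWatcher and its on_change callback are hypothetical names, and polling a content hash is just one simple approach; in a real server the callback would rebuild the TLS context from the new files.

```python
import hashlib
from pathlib import Path
from typing import Callable


class FileChangeWatcher:
    """Call `on_change` when the file's contents differ from the last check."""

    def __init__(self, path: str, on_change: Callable[[bytes], None]):
        self._path = Path(path)
        self._on_change = on_change
        self._digest = self._current()[0]  # remember the starting contents

    def _current(self) -> tuple[str, bytes]:
        data = self._path.read_bytes()
        return hashlib.sha256(data).hexdigest(), data

    def check(self) -> bool:
        """Poll once; return True if a change was detected and reported."""
        digest, data = self._current()
        if digest == self._digest:
            return False
        self._digest = digest
        self._on_change(data)
        return True
```

Calling `check()` from a periodic task, for instance once a minute, is usually enough because renewals happen weeks before expiry.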

Rotation Strategy

Zero-Downtime Rotation

Rotating a secret requires both old and new values to be valid simultaneously during the transition — the dual-key period:

Phase 1: Add new secret alongside old (both valid)
Phase 2: Deploy new application version that reads new secret
Phase 3: Verify all instances use new secret
Phase 4: Invalidate old secret
Phase 5: Remove old secret from store

For API keys, this requires a provider that allows two keys to be active at the same time. AWS IAM supports two access keys per user for this reason.
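
On the verifying side, the dual-key period amounts to accepting a credential that matches any currently active key. Here is a minimal sketch using HMAC-signed messages; the sign and verify helpers are illustrative assumptions and not tied to any particular provider.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> str:
    """Hex-encoded HMAC-SHA256 signature of message under key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(active_keys: list[bytes], message: bytes, signature: str) -> bool:
    """Accept a signature made with any currently active key."""
    return any(
        hmac.compare_digest(sign(key, message), signature)
        for key in active_keys
    )
```

During phases 1 through 3 the list holds both keys; phase 4 corresponds to removing the old key from active_keys, after which signatures made with it are rejected.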

Emergency Revocation

If a secret is compromised, the priority is revocation speed:

# AWS: immediately disable a key
aws iam update-access-key \
  --access-key-id AKIA... \
  --status Inactive

# Vault: revoke a specific secret lease
vault lease revoke lease/id/here

# Vault: revoke all leases under a path (emergency); leases exist for dynamic secrets, not static KV entries
vault lease revoke -prefix database/creds/app-role

Audit Logging

Every production secret store should have audit logging enabled:

# Vault audit log
vault audit enable file file_path=/var/log/vault/audit.log

# AWS CloudTrail: list trails (the 90-day event history is on by default; longer retention needs a trail)
aws cloudtrail describe-trails

Audit logs should capture: who accessed the secret, when, from which IP/role, and whether the access was granted or denied. Set up alerts for unusual access patterns — off-hours access, unexpected IP ranges, or bulk reads.
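
As a rough illustration of such alerts, the sketch below scans JSON-lines audit entries for off-hours access and bulk reads. It assumes entries shaped like Vault's file audit device output (fields type, time, auth.display_name, request.path, request.remote_address, request.operation). The function name flag_suspicious, the working hours (interpreted in UTC) and the threshold are all illustrative assumptions.

```python
import json
from collections import Counter
from datetime import datetime


def flag_suspicious(lines, work_hours=range(8, 19), bulk_threshold=3):
    """Return (off_hours_events, bulk_readers) from JSON audit log lines."""
    off_hours = []
    reads = Counter()
    for line in lines:
        event = json.loads(line)
        if event.get("type") != "request":
            continue  # count each access once, not its response entry too
        # Keep only the seconds-precision prefix; timestamps are treated as UTC.
        when = datetime.strptime(event["time"][:19], "%Y-%m-%dT%H:%M:%S")
        who = event.get("auth", {}).get("display_name", "unknown")
        req = event.get("request", {})
        if when.hour not in work_hours:
            off_hours.append((who, req.get("path"), req.get("remote_address")))
        if req.get("operation") == "read":
            reads[who] += 1
    bulk = sorted(w for w, n in reads.items() if n >= bulk_threshold)
    return off_hours, bulk
```

In practice these rules would live in a log pipeline or SIEM; the point is that the fields listed above are enough to express them.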

A summary of secret management options:

Tool	Best For	Cost	Rotation
HashiCorp Vault	Large teams, dynamic secrets	OSS free, Enterprise paid	Automated
AWS Secrets Manager	AWS-native apps	$0.40/secret/month	Lambda-based
1Password Teams	Developer teams	Per-user SaaS	Manual
Doppler	Simple SaaS option	Free tier	Manual + webhooks
Kubernetes Secrets	K8s deployments	Free (built-in)	None built in; cert-manager for TLS certificates