Stop Putting Credentials in Environment Variables: Secret Management for DevOps Teams


The Env Var Illusion

The Incident: 11 Seconds From Disaster

The Secret Management Stack for Production

Layer 1: HashiCorp Vault (or Your Cloud Provider's Equivalent)

Layer 2: Vault Agent Sidecar (Dynamic Injection)

Layer 3: Automatic Rotation

Layer 4: Secret Scanning (Defense in Depth)

Migration Path: Env Vars to Vault in 4 Steps

Cost: Less Than You Think

Common Mistakes During Migration

Secrets in Multi-Cloud Environments

Frequently Asked Questions

Related Reading

Environment variables aren't secret management. They're secret broadcasting. Here's what production teams actually use.

The Env Var Illusion

Every "Getting Started" tutorial ends the same way: export DATABASE_URL and AWS_SECRET_ACCESS_KEY, then docker-compose up (the exact pattern is in the code section below). It works. It's simple. And it's a ticking time bomb.

Environment variables are visible to every process in the container. They show up in docker inspect. They appear in crash dumps. They get logged by overeager monitoring tools. They persist in shell history. And they get committed to .env files that end up in git history.

We run 84 containers in production. After a near-miss incident where a debug log accidentally captured an AWS key from os.environ, we rebuilt our entire secrets pipeline. Here's the production-grade approach that replaced env vars — and the incident that convinced us.

The Incident: 11 Seconds From Disaster

A developer added debug logging to trace a connection timeout: a single logger.debug call that dumped the connection config, dict(os.environ) included. That log line captured every environment variable — including AWS_SECRET_ACCESS_KEY, DATABASE_URL with embedded credentials, and our Stripe API key. The logs shipped to our centralized logging stack (Loki), which is accessible to the entire engineering team.

Our secret scanner (trufflehog running as a pre-commit hook plus a post-deploy log scanner) caught it in 11 seconds. The alert fired, and our automated rotation script revoked the AWS key and issued a new one before any human saw the log entry.

If we hadn't had that scanner? The credentials would have been sitting in Loki for anyone with dashboard access to find. And Loki retains logs for 30 days.

This is the fundamental problem with env vars: they're ambient. Any code running in the process can read them, and there's no audit trail of who accessed what.

The Secret Management Stack for Production

Layer 1: HashiCorp Vault (or Your Cloud Provider's Equivalent)

Vault is the source of truth for all secrets. Every credential lives in Vault. Nothing lives in env vars, .env files, or Kubernetes secrets (which are base64-encoded, not encrypted). Each service gets its own Vault policy with least-privilege access; the policy for our API service is in the code section below. The API service can read API secrets and shared database credentials.
It cannot read billing secrets. This is impossible with env vars — there's no access control. For teams not ready for Vault's operational overhead, the cloud providers' native secrets services (AWS Secrets Manager, GCP Secret Manager, Azure Key Vault) fill the same role; per-cloud recommendations are listed at the end of the post.

Layer 2: Vault Agent Sidecar (Dynamic Injection)

Instead of injecting secrets at container startup, Vault Agent runs as a sidecar and writes secrets to a tmpfs volume that only the application can read (the docker-compose setup is in the code section below). The application reads its secrets from /run/secrets/database.json.

Why tmpfs? Secrets live only in memory. They're never written to disk. Container restart = secrets re-fetched from Vault. If the container is compromised, the attacker gets the current secret — but they can't persist it across restarts, and Vault's audit log shows the access.

Layer 3: Automatic Rotation

Static secrets are a liability. We rotate database credentials every 24 hours using Vault's database secrets engine. The Vault Agent sidecar detects when credentials are about to expire and fetches new ones, and the application picks up the new credentials without restarting — a file watcher reloads the database connection pool.

24-hour rotation means even if a credential leaks, it's useless within 24 hours. Compare this to env vars, where the same DATABASE_URL might live unchanged for months.

Layer 4: Secret Scanning (Defense in Depth)

Despite all of the above, secrets still leak. A developer hardcodes a test credential. An error message includes a connection string. A log line captures more than intended. We run detection at three levels: pre-commit, CI, and runtime (the specifics are listed at the end of the post). The runtime scanner is the last line of defense — and it's the one that caught our incident in 11 seconds.

Migration Path: Env Vars to Vault in 4 Steps

You don't have to migrate everything at once.

Week 1: Install Vault (single-node is fine to start). Migrate your 3 most sensitive secrets: database credentials, cloud provider keys, payment provider tokens.

Week 2: Set up Vault Agent sidecars for production services. Keep env vars as a fallback — the application checks /run/secrets/ first and falls back to os.environ.

Week 3: Enable dynamic database credentials. This is the biggest security win — every service gets unique, short-lived credentials.

Week 4: Remove the env var fallback.
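The Week 2 fallback that Week 4 later removes can be sketched as a small helper. This is a sketch, not our production code: the secret name, the JSON layout, and the env-var naming convention are all assumptions for illustration.

```python
import json
import os
from pathlib import Path

def read_secret(name, secrets_dir="/run/secrets"):
    """Prefer the file Vault Agent renders; fall back to the legacy env var.

    Hypothetical migration-window helper: `name` maps to
    `<secrets_dir>/<name>.json` and, as a fallback, to the env var NAME.
    """
    path = Path(secrets_dir) / f"{name}.json"
    if path.is_file():
        # Vault Agent has written this secret to tmpfs -- use it.
        return json.loads(path.read_text())
    # Legacy fallback, deleted in Week 4.
    value = os.environ.get(name.upper())
    return {"value": value} if value is not None else None
```

Once no service hits the os.environ branch anymore, Week 4 is just deleting the fallback and the variables themselves.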
Then enable secret scanning in CI, and celebrate.

For teams in Latin America and Mexico, where the nearshoring boom means rapid team scaling, this migration path is especially important. New developers joining frequently means more potential for accidental credential exposure, and Vault's audit log gives you visibility that env vars never will.

Cost: Less Than You Think

Our setup (Vault OSS on a single VM) runs about $20/month; the full pricing breakdown is at the end of the post. Compare that to the cost of a single credential breach: IBM's 2024 Cost of a Data Breach report puts the average at $4.88M, and even for a startup, a leaked AWS key can generate a $50K bill in hours from cryptomining. $20/month for secret management vs $50K+ for a breach. The math works.

Common Mistakes During Migration

Teams migrating from env vars to Vault make predictable mistakes. Here are the ones we see most often.

Mistake 1: Big-bang migration. Trying to move all 50 secrets to Vault in one weekend. Something breaks, nobody can debug it because nobody knows Vault yet, and the team rolls back to env vars forever. Use the 4-week phased approach above: start with 3 secrets and build muscle memory.

Mistake 2: Vault as a single point of failure. Vault OSS runs as a single node by default; if it goes down, no service can fetch secrets. Either run Vault in HA mode (3 nodes minimum) or implement a local cache. Vault Agent caches secrets locally — if the Vault server is temporarily unreachable, services continue using cached credentials until they expire.

Mistake 3: Not testing secret rotation under load. Rotation works perfectly in staging. In production, when 40 services simultaneously try to reconnect with new credentials, your database connection pool explodes. Test rotation during peak load, not during a quiet maintenance window. We discovered this the hard way at 2pm on a Tuesday.

Mistake 4: Forgetting CI/CD pipelines. Your application services now use Vault, but your CI/CD pipeline still has secrets sitting in GitHub Actions secrets or environment variables.
CI secrets are a common blind spot — and they're especially dangerous because CI logs are often more widely accessible than production logs. Use Vault's AppRole auth or GitHub's OIDC integration to fetch CI secrets dynamically.

Mistake 5: Not securing the Vault unsealing process. Vault starts sealed, and someone needs to unseal it after every restart. If you store the unseal keys in a .txt file on the same server (we've seen this), you've replaced one insecure pattern with another. Use auto-unseal with a cloud KMS (AWS KMS, GCP Cloud KMS) or Shamir's Secret Sharing with keys distributed to 3+ team members.

Secrets in Multi-Cloud Environments

If you're running services across multiple cloud providers — a pattern we analyze in our multi-cloud pitfalls guide — secret management gets significantly harder. Each cloud has its own secrets service with its own API, access control model, and rotation mechanism. Running Vault as a unified secrets layer across all clouds is one of the few genuinely good reasons to add a cloud-agnostic tool to your stack: Vault authenticates each cloud's services using their native identity mechanisms (AWS IAM roles, GCP service accounts, Azure Managed Identities) and provides a single API for secret retrieval regardless of where the service runs.

This is one of the cases where the build-vs-buy framework clearly points to "buy" (or rather, "adopt open source"): building a cross-cloud secrets layer is never core to your product, and the mature solution already exists.

Frequently Asked Questions

Q: Can't I just encrypt my .env files and call it secure?
Encrypted .env files are better than plaintext, but they still have fundamental problems: the decrypted values end up in memory as environment variables (back to square one), there's no access control (any process can read them), and there's no audit trail. It's a band-aid, not a solution.

Q: What about Docker secrets (docker secret create)?
Docker Swarm secrets are better than env vars — they're stored encrypted and mounted as files.
But they're limited to Docker Swarm orchestration, they don't rotate automatically, and there's no access-control granularity. If you're already on Swarm and not ready for Vault, they're a reasonable intermediate step. For Kubernetes, the native Secrets resource is base64-encoded (not encrypted at rest by default) — use the Vault CSI provider or sealed-secrets instead.

Q: We're a 3-person startup. Is Vault overkill?
For a 3-person team, yes — Vault's operational overhead isn't justified yet. Use your cloud provider's native secrets service (AWS Secrets Manager, GCP Secret Manager) with IAM-based access control. It's $0.40/secret/month, zero operational overhead, and leagues better than env vars. Graduate to Vault when you cross 20+ services or need cross-cloud support.

We help DevOps teams audit their secret management practices and migrate from env vars to production-grade solutions. Get a free security audit →

Subscribe to our newsletter for weekly deep-dives into production security practices.
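A footnote on the FAQ's pricing: the break-even point between a managed per-secret price and a flat self-hosted Vault VM is quick arithmetic. The figures below are the ones quoted in this post; API-call charges and operator time are deliberately ignored.

```python
# Prices quoted above: AWS Secrets Manager at $0.40/secret/month,
# our Vault OSS VM at ~$20/month. Working in cents keeps it exact.
AWS_PER_SECRET_CENTS = 40
VAULT_VM_CENTS = 2000

def breakeven_secrets(vm_cents=VAULT_VM_CENTS, per_secret_cents=AWS_PER_SECRET_CENTS):
    # Secret count at which the managed service's bill matches the flat VM cost
    return vm_cents // per_secret_cents

print(breakeven_secrets())  # 50 -- below ~50 secrets, the managed service is cheaper on raw price
```

Which lines up with the FAQ's advice: small teams come out ahead on a managed service until secret count and cross-cloud needs grow.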

Code Examples

The "Getting Started" pattern from The Env Var Illusion:

```shell
export DATABASE_URL=postgres://admin:supersecret@db:5432/prod
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
docker-compose up
```

The debug line that triggered the incident:

```python
logger.debug(f"Connection config: {dict(os.environ)}")
```

The least-privilege Vault policy for the API service (Layer 1):

```hcl
# Vault policy for the API service
path "secret/data/api/*" {
  capabilities = ["read"]
}

path "secret/data/shared/database" {
  capabilities = ["read"]
}

# No access to other services' secrets
path "secret/data/billing/*" {
  capabilities = ["deny"]
}
```

The Vault Agent sidecar with a tmpfs secrets volume (Layer 2):

```yaml
# docker-compose.yml
services:
  api:
    image: registry.local/api:latest
    volumes:
      - secrets-vol:/run/secrets:ro
    depends_on:
      - vault-agent

  vault-agent:
    image: hashicorp/vault:latest
    command: vault agent -config=/etc/vault-agent.hcl
    volumes:
      - secrets-vol:/run/secrets

volumes:
  secrets-vol:
    driver: local
    driver_opts:
      type: tmpfs
      device: tmpfs
```
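The sidecar above points its agent at /etc/vault-agent.hcl. A minimal sketch of what such a file could contain, assuming AppRole auth; every path and filename here is illustrative, not a Vault default:

```hcl
# vault-agent.hcl (sketch): authenticate, then render the database secret
auto_auth {
  method "approle" {
    config = {
      role_id_file_path   = "/etc/vault/role_id"
      secret_id_file_path = "/etc/vault/secret_id"
    }
  }
}

# Renders into the tmpfs volume shared with the api container
template {
  source      = "/etc/vault/database.json.tpl"
  destination = "/run/secrets/database.json"
}
```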
Reading the injected secret from tmpfs (Layer 2):

```python
import json
from pathlib import Path

def get_db_url():
    secret = json.loads(Path("/run/secrets/database.json").read_text())
    return (
        f"postgres://{secret['username']}:{secret['password']}"
        f"@{secret['host']}:{secret['port']}/{secret['dbname']}"
    )
```

The 24-hour rotation role for the database secrets engine (Layer 3), via the Terraform Vault provider:

```hcl
# Vault database secrets engine config
resource "vault_database_secret_backend_role" "api_db" {
  name    = "api-readonly"
  backend = "database"
  creation_statements = [
    "CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}';",
    "GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";"
  ]
  default_ttl = "24h"
  max_ttl     = "48h"
}
```
The file watcher that reloads the database connection pool when Vault Agent rewrites the secret (Layer 3):

```python
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class SecretReloader(FileSystemEventHandler):
    def on_modified(self, event):
        if event.src_path == "/run/secrets/database.json":
            self.reconnect_database()

    def reconnect_database(self):
        ...  # application-specific: swap the pool over to the fresh credentials

observer = Observer()
observer.schedule(SecretReloader(), "/run/secrets")
observer.start()
```

Pre-commit secret scanning with trufflehog (Layer 4):

```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/trufflesecurity/trufflehog
    rev: v3.63.0
    hooks:
      - id: trufflehog
        entry: trufflehog git file://. --only-verified --fail
```
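trufflehog and gitleaks cover commits and PRs; the runtime level mentioned earlier is conceptually just pattern-matching over log lines. A toy sketch of that idea, with two illustrative patterns (real scanners ship hundreds of verified detectors):

```python
import re

# Illustrative patterns only -- not production rules.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "postgres_url_with_password": re.compile(r"postgres://[^:\s]+:[^@\s]+@"),
}

def scan_line(line):
    """Return the names of credential patterns found in a log line."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(line)]

# The kind of log line from the incident:
hits = scan_line("Connection config: {'AWS_ACCESS_KEY_ID': 'AKIAIOSFODNN7EXAMPLE'}")
# hits == ['aws_access_key_id']
```

In production you would wire something like this (or an off-the-shelf scanner) into the log pipeline and alert on any hit — that alert path is what closed our 11-second window.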
Local secret caching so Vault isn't a single point of failure (Mistake 2):

```hcl
# vault-agent cache configuration
cache {
  use_auto_auth_token = true
}

listener "tcp" {
  address     = "127.0.0.1:8200"
  tls_disable = true
}
```

Vault as the unified secrets layer across clouds:

```
[AWS services] → Vault (central) ← [GCP services]
                        ↑
                 [Azure services]
```

If you're staying on cloud-native secrets services instead of Vault:

- AWS: Use Secrets Manager + IAM roles (not env vars, not Parameter Store for secrets)
- GCP: Use Secret Manager + Workload Identity
- Azure: Use Key Vault + Managed Identities

The three detection levels (Layer 4):

- Pre-commit: trufflehog scans every commit before it's pushed
- CI: gitleaks runs on every PR
- Runtime: a log scanner watches Loki for patterns matching credentials

Pricing:

- Vault OSS: Free. Runs on a single VM.
- Vault Enterprise (HA + namespaces): $0.03/hour per node
- AWS Secrets Manager: $0.40/secret/month + $0.05 per 10K API calls
- Our setup (Vault OSS + 1 VM): ~$20/month total

Related Reading:

- CI/CD Pipeline Optimization — securing secrets in fast CI pipelines
- Multi-Cloud Strategy Pitfalls — why cross-cloud secret management is one of the hidden costs
- Self-Hosted LLMs vs API — securing API keys and model credentials at scale