.cursor/rules/
  terraform.mdc        # globs: ["**/*.tf", "**/*.tfvars"]
  docker.mdc           # globs: ["**/Dockerfile*", "**/docker-compose*.yml"]
  kubernetes.mdc       # globs: ["k8s/**/*.yaml", "helm/**/*.yaml"]
  github-actions.mdc   # globs: [".github/workflows/*.yml"]
  monitoring.mdc       # globs: ["**/prometheus*.yml", "**/alerts/*.yaml"]
- Remote backend required. S3 with DynamoDB lock for AWS; GCS (which locks state natively) for GCP; Azure Blob Storage (which locks via blob leases) for Azure. No `local` backend in committed configs, ever.
- Backend config NEVER contains secrets. Use `-backend-config=...` at init time, or environment vars.
- State files are encrypted at rest (SSE-KMS) and access-logged.
- Every resource has a `tags` block (owner, environment, cost-center).
- Destructive changes (delete, replace) require a separate PR labeled `destructive:`.
- FROM pins a digest or a specific patch version:
    FROM node:20.11.1-alpine@sha256:abc...
  NOT: FROM node:latest or FROM node:20
- Multi-stage: `builder` has compilers/dev-deps, `runtime` is minimal. Final stage is distroless, -alpine, or -slim.
- Set `USER app` (or a numeric UID ≥ 10000) before CMD. Never run as root.
- .dockerignore excludes .git, node_modules, .env, tests, docs.
- COPY only what's needed; never `COPY . .` into the runtime stage.
- HEALTHCHECK and STOPSIGNAL SIGTERM on every service image.
- Image is scanned (Trivy/Grype) and signed (cosign) in CI.
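The FROM pinning rule is easy to check mechanically. The following is a minimal sketch, assuming a hypothetical helper `is_pinned` (not part of Docker or any existing linter) that accepts a single FROM line and treats either a full sha256 digest or an `x.y.z` version tag as pinned:

```python
import re

# Hypothetical helper for illustration; not part of Docker or any linter.
DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")
FULL_VERSION = re.compile(r":\d+\.\d+\.\d+(?:[-.][\w.]+)*$")


def is_pinned(from_line: str) -> bool:
    """Return True when a FROM line pins a digest or a full x.y.z version tag."""
    parts = from_line.split()
    if len(parts) < 2 or parts[0].upper() != "FROM":
        raise ValueError(f"not a FROM instruction: {from_line!r}")
    # Skip flags such as --platform=linux/amd64 to find the image reference.
    image = next(p for p in parts[1:] if not p.startswith("--"))
    return bool(DIGEST.search(image) or FULL_VERSION.search(image))
```

A CI step could run this over every `FROM` line in the repository and fail the build on the first unpinned image; `FROM node:latest` and `FROM node:20` are both rejected.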
Every Deployment / StatefulSet / DaemonSet manifest has:

  resources:
    requests: { cpu: "100m", memory: "128Mi" }   # scheduling
    limits:   { cpu: "500m", memory: "512Mi" }   # cgroup cap
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001
    readOnlyRootFilesystem: true
    allowPrivilegeEscalation: false
    capabilities: { drop: [ALL] }
  livenessProbe:  { httpGet: { path: /healthz, port: http }, periodSeconds: 10 }
  readinessProbe: { httpGet: { path: /ready, port: http }, periodSeconds: 5 }

No `image: myapp:latest`; always a pinned digest or SHA tag.
Every namespace has a NetworkPolicy; default deny + explicit allow.
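As a rough illustration of how the per-container requirements above could be enforced in CI, here is a sketch that inspects one container spec already parsed into a Python dict (for example by a YAML loader). The function name `missing_fields` is hypothetical, and the image check is deliberately looser than the rule: it only flags untagged and `:latest` images.

```python
def missing_fields(container: dict) -> list[str]:
    """List the baseline fields that a parsed container spec lacks."""
    problems = []

    resources = container.get("resources", {})
    for kind in ("requests", "limits"):
        for unit in ("cpu", "memory"):
            if unit not in resources.get(kind, {}):
                problems.append(f"resources.{kind}.{unit}")

    sec = container.get("securityContext", {})
    expected = {
        "runAsNonRoot": True,
        "readOnlyRootFilesystem": True,
        "allowPrivilegeEscalation": False,
    }
    for key, want in expected.items():
        if sec.get(key) is not want:
            problems.append(f"securityContext.{key}")
    if "ALL" not in sec.get("capabilities", {}).get("drop", []):
        problems.append("securityContext.capabilities.drop")

    for probe in ("livenessProbe", "readinessProbe"):
        if probe not in container:
            problems.append(probe)

    # Accept any digest; otherwise reject a missing tag or the :latest tag.
    image = container.get("image", "")
    if "@sha256:" not in image:
        tag = image.rsplit("/", 1)[-1].partition(":")[2]
        if tag in ("", "latest"):
            problems.append("image")
    return problems
```

An empty list means the container meets the baseline; anything else names the exact field to fix, which makes for a readable CI failure message.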
- Third-party actions are pinned by commit SHA, not tag:
    uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
  NOT: uses: actions/checkout@v4
- `permissions:` declared at workflow OR job level; default to `contents: read`. Only grant write to the specific job that needs it.
- NEVER `permissions: write-all`.
- Use `id-token: write` + OIDC federation to AWS/GCP/Azure. No long-lived PATs. No access keys in `secrets`.
- Every `run:` step with untrusted input (issue titles, PR bodies) uses an env var, not ${{ ... }} inline (script injection risk).
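The SHA-pinning rule in particular lends itself to a simple text scan. This is a sketch only, assuming a hypothetical function `unpinned_actions` that reads raw workflow text line by line and does not parse YAML; it checks every non-local `uses:` reference, first-party actions included.

```python
import re

# Matches "uses: owner/repo@ref" with or without a leading list dash.
USES = re.compile(r"^\s*(?:-\s*)?uses:\s*([^\s#]+)")
SHA_REF = re.compile(r"@[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return every `uses:` reference that is not pinned to a 40-char commit SHA."""
    found = []
    for line in workflow_text.splitlines():
        match = USES.match(line)
        if not match:
            continue
        ref = match.group(1).strip("'\"")
        # Local actions and docker:// references are out of scope for this sketch.
        if ref.startswith("./") or ref.startswith("docker://"):
            continue
        if not SHA_REF.search(ref):
            found.append(ref)
    return found
```

Run against the two lines from the rule above, it accepts the SHA-pinned `actions/checkout` reference and reports `actions/checkout@v4`.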
# BAD: ${{ github.event.issue.title }} is interpolated into shell
- run: echo "New issue: ${{ github.event.issue.title }}"

# GOOD: passed via env, shell sees a literal string
- env:
    TITLE: ${{ github.event.issue.title }}
  run: echo "New issue: $TITLE"
Every infra PR has `terraform plan` (or `helm diff`, `kubectl diff`) output posted as a comment by CI:
- Added resources (green).
- Changed resources (yellow).
- Replaced / destroyed resources (red, requires extra review).

PRs with red output require:
1. A second reviewer.
2. The PR description explaining the blast radius.
3. A rollback plan.

Plan is run in a CI job with read-only credentials; apply runs separately with elevated privileges only after merge + manual approval.
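One way to produce the green / yellow / red split is to read the JSON form of a saved plan (the output of `terraform show -json` on a plan file), where each entry under `resource_changes` carries a `change.actions` list. The sketch below assumes that input has already been loaded into a dict; the function name `classify_plan` and the bucket names are illustrative choices, not anything Terraform provides.

```python
def classify_plan(plan: dict) -> dict[str, list[str]]:
    """Bucket resource addresses from a JSON plan by review severity."""
    buckets: dict[str, list[str]] = {"green": [], "yellow": [], "red": []}
    for rc in plan.get("resource_changes", []):
        actions = set(rc["change"]["actions"])
        if "delete" in actions:
            # Covers plain destroys and replacements (delete plus create).
            buckets["red"].append(rc["address"])
        elif "update" in actions:
            buckets["yellow"].append(rc["address"])
        elif "create" in actions:
            buckets["green"].append(rc["address"])
        # "no-op" and "read" entries need no review and are skipped.
    return buckets
```

A CI job could post the three lists as the PR comment and require the extra review steps whenever `red` is non-empty.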
Alerts fire on user-facing symptoms:
- Latency: p95 response time > SLO for 5 min.
- Errors: error rate > SLO for 5 min.
- Saturation: queue length / dropped messages (the thing that will cause the symptom if it continues).

Alerts NEVER fire on:
- CPU/memory utilization (capacity-plan, don't page).
- Disk > 80% alone (rate of fill matters, not snapshot).
- Pod restart count alone (restarts are normal).

Every alert has a runbook link in its annotation:

  annotations:
    runbook: "https://runbooks.internal/<alert-name>"
    summary: "p95 latency {{ $value }}s exceeds 500ms SLO"
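The runbook requirement can be checked before a rule file ever reaches Prometheus. Here is a small sketch, assuming the rule file has already been parsed into a dict with the usual `groups` / `rules` layout and that the annotation key is `runbook`, as in the rule above; `alerts_missing_runbook` is a hypothetical name.

```python
def alerts_missing_runbook(rule_file: dict) -> list[str]:
    """Return the names of alerting rules that lack a `runbook` annotation."""
    missing = []
    for group in rule_file.get("groups", []):
        for rule in group.get("rules", []):
            if "alert" not in rule:
                continue  # recording rules are not alerts, so skip them
            if not rule.get("annotations", {}).get("runbook"):
                missing.append(rule["alert"])
    return missing
```

Failing the pipeline on a non-empty result keeps alerts without runbooks from being merged.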
Every deployment pipeline has a matching rollback pipeline, tested:
- Kubernetes: `kubectl rollout undo deployment/<name>` works because the previous ReplicaSet is still there (revisionHistoryLimit >= 5).
- Terraform: previous state is versioned in S3; rollback = apply the prior state snapshot, reviewed like any other apply.
- Docker/serverless: previous image tag is always deployable; aliases/traffic-splitting allow 0→100% rollback in one command.

Before any release is marked "done," the rollback has been dry-run at least once on a staging environment.
---
description: DevOps baseline applied to infra, CI, and container files.
globs:
  - "**/*.tf"
  - "**/Dockerfile*"
  - "k8s/**/*.yaml"
  - ".github/workflows/*.yml"
alwaysApply: false
---

# Non-negotiables (security + reliability)
- Never commit secrets, PATs, keys, or state files.
- Pin everything: images by digest, actions by SHA, modules by version.
- Least privilege by default: IAM, GH permissions, K8s RBAC.
- Non-root containers, read-only rootfs, dropped capabilities.
- Resource requests AND limits on every workload.
- Remote Terraform state with lock + encryption.

# Operability
- Every service has liveness, readiness, and a /healthz endpoint.
- Every alert fires on symptoms, has a runbook, and is actionable.
- Every deployment has a tested rollback.