Tools: Cursor Rules for DevOps: Infrastructure, CI/CD, and Container Rules That Ship


How Cursor Rules Work for Infrastructure Code

Rule 1: Terraform — State, Backends, and No local State

Rule 2: Docker — Multi-Stage, Non-Root, Pinned, Small

Rule 4: GitHub Actions — Pinned by SHA, Least Privilege, No PAT Sprawl

Rule 5: Infrastructure as Code Review — Plan Output in Every PR

Rule 6: Monitoring — Alerts on Symptoms, Not Causes

Rule 7: Rollback First, Deploy Second

The DevOps Cursor Setup — Quick Start

Want the full DevOps pack?

Ask Cursor for a Dockerfile and you get FROM node:latest, COPY . . before the install, and CMD npm start running as root. Ask for a Terraform module and you get aws_s3_bucket with no versioning, no encryption, and no block-public-access. Ask for a GitHub Actions workflow and you get uses: actions/checkout@v4 (a mutable tag, not a pinned commit), with: { token: ${{ secrets.PAT }} } exposed to every job, and permissions: write-all at the top. Ask for a Kubernetes Deployment and you get no resource limits, no liveness probe, and imagePullPolicy: Always pointing at the :latest tag.

Every one of those defaults is what pages someone at 3am. DevOps is the discipline where mistakes don't fail locally: they fail in production, under load, with a blast radius. AI assistants trained on a decade of public infrastructure code have seen every anti-pattern, and they will happily reproduce them.

Seven rules below cover Terraform, Docker, Kubernetes, GitHub Actions, and monitoring, each with the failure mode and the fix.

How Cursor Rules Work for Infrastructure Code

Put one rule file per tool under .cursor/rules/ as *.mdc, each with globs targeting that tool's file patterns. Set alwaysApply: false on all of them, so a rule fires only when the relevant files are open.

Now the rules.

Rule 1: Terraform

The default Terraform quick-start stores state in terraform.tfstate next to your .tf files. Commit it by accident, and anyone who clones the repo sees every secret, every resource ID, every aws_access_key. Keep it out of the repo but leave it local, and two engineers run terraform apply on different machines and overwrite each other's changes.

Before: backend "local" in committed code, terraform apply from a dev laptop.

After: S3 backend with DynamoDB lock, CI is the only thing that runs apply, and a failing terraform plan blocks the PR.

Rule 2: Docker

The five Docker antipatterns Cursor reproduces by default: FROM <lang>:latest, single-stage builds with dev dependencies in production, running as root, COPY . . before install (so the layer cache invalidates on every code change), and no HEALTHCHECK.

A production Node image built as a multi-stage build on a slim base is typically 80–150 MB. The FROM node:latest single-stage equivalent is around 1.4 GB. The small one deploys faster, has a smaller attack surface, and costs less to store.

Rule 3: Kubernetes

Cursor writes Kubernetes manifests with no resources.limits, no probes, and securityContext: {}. The pod runs, schedules anywhere, and OOMs under load; kubelet has no way to know it's unhealthy; the container runs as UID 0 with the runtime's default Linux capabilities still attached.

Without limits, one pod can exhaust a node's memory and get its neighbours evicted. Without probes, rolling updates route traffic to pods that aren't ready. Without a non-root securityContext, a container escape lands as root on the host.

Rule 4: GitHub Actions

Every uses: actions/checkout@v4 is "whatever commit they decide to tag as v4 tomorrow." Every permissions: write-all gives every job write access to everything: code, packages, deployments. Every PAT in secrets is a long-lived credential, often with no expiry and little audit trail. Cursor will also reproduce script injection, where untrusted input such as an issue title is interpolated straight into a run: step.

Rule 5: Infrastructure as Code Review

Infra PRs merged without reviewing the plan are how you get "oh, it also recreated the database." The diff in .tf doesn't tell you the blast radius. Only terraform plan does, and only for the exact current state.

Never terraform apply from a laptop. Apply runs in CI only, after merge, with a human approval gate on destructive changes.

Rule 6: Monitoring

Cursor writes Prometheus rules like alert: HighCPU with expr: cpu_usage > 0.8. That's an alert on a cause, not a symptom. CPU at 80% doesn't matter if the service is meeting its SLO. Users care about latency and errors, not CPU.

The 3am page should be actionable. "CPU high" isn't: it tells you nothing about what to do.
Rule 7: Rollback First, Deploy Second

Every deployment needs a rollback plan. Cursor writes CD pipelines that deploy forward but have no scripted way back. When the deploy breaks prod, someone improvises under pressure.

The test isn't "can we deploy?" The test is "can we un-deploy in 60 seconds?"

The DevOps Cursor Setup: Quick Start

The baseline rule file is .cursor/rules/devops-baseline.mdc. That's the spine. With it in place, Cursor writes Docker images you'd put in production, Terraform you'd actually apply, Kubernetes manifests that survive a node failure, and GitHub Actions workflows that don't leak your AWS keys to a third-party action's next release.

Want the full DevOps pack?

We maintain a Cursor Rules pack with production-ready rules for Terraform, Docker, Kubernetes, Helm, GitHub Actions, Ansible, and Prometheus. Every rule is tested, pinned, and scoped so your AI-written infra looks like it came from an SRE, not from Stack Overflow circa 2018.

Get the Cursor Rules pack on Gumroad.


Rule file layout:

    .cursor/rules/
      terraform.mdc        # globs: ["**/*.tf", "**/*.tfvars"]
      docker.mdc           # globs: ["**/Dockerfile*", "**/docker-compose*.yml"]
      kubernetes.mdc       # globs: ["k8s/**/*.yaml", "helm/**/*.yaml"]
      github-actions.mdc   # globs: [".github/workflows/*.yml"]
      monitoring.mdc       # globs: ["**/prometheus*.yml", "**/alerts/*.yaml"]

Rule text for Rule 1 (Terraform):

    - Remote backend required. S3 with DynamoDB lock for AWS; GCS (state
      locking is built in) for GCP; Azure Blob for Azure. No `local` backend
      in committed configs, ever.
    - Backend config NEVER contains secrets. Use `-backend-config=...` at
      init time, or environment vars.
    - State files are encrypted at rest (SSE-KMS) and access-logged.
    - Every resource that supports tags has a `tags` block (owner,
      environment, cost-center).
    - Destructive changes (delete, replace) require a separate PR labeled
      `destructive:`.
Rule text for Rule 2 (Docker):

    - FROM pins a digest or a specific patch version:
        FROM node:20.11.1-alpine@sha256:abc...
      NOT: FROM node:latest or FROM node:20
    - Multi-stage: `builder` has compilers/dev-deps, `runtime` is minimal.
      Final stage is distroless, -alpine, or -slim.
    - Set `USER app` (or a numeric UID ≥ 10000) before CMD. Never run as root.
    - .dockerignore excludes .git, node_modules, .env, tests, docs.
    - COPY only what's needed; never `COPY . .` into the runtime stage.
    - HEALTHCHECK and STOPSIGNAL SIGTERM on every service image.
    - Image is scanned (Trivy/Grype) and signed (cosign) in CI.
Rule text for Rule 3 (Kubernetes):

    Every Deployment / StatefulSet / DaemonSet manifest has:

      resources:
        requests: { cpu: "100m", memory: "128Mi" }   # scheduling
        limits:   { cpu: "500m", memory: "512Mi" }   # cgroup cap
      securityContext:
        runAsNonRoot: true
        runAsUser: 10001
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
        capabilities: { drop: [ALL] }
      livenessProbe:  { httpGet: { path: /healthz, port: http }, periodSeconds: 10 }
      readinessProbe: { httpGet: { path: /ready,   port: http }, periodSeconds: 5 }

    No `image: myapp:latest`; always a pinned digest or SHA tag.
    Every namespace has a NetworkPolicy; default deny + explicit allow.
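The Kubernetes rule lists its fields flat. In a real manifest they sit at different levels: runAsNonRoot and runAsUser can be set once for the pod, while readOnlyRootFilesystem, allowPrivilegeEscalation, and capabilities belong to each container. Below is a minimal sketch of a Deployment with everything in place. The name myapp, the registry host, the digest placeholder, and port 8080 are all hypothetical.

```yaml
# Sketch only: name, labels, image reference, and port are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 2
  selector:
    matchLabels: { app: myapp }
  template:
    metadata:
      labels: { app: myapp }
    spec:
      securityContext:              # pod-level fields
        runAsNonRoot: true
        runAsUser: 10001
      containers:
        - name: myapp
          # Pinned by digest; replace the placeholder with a real digest.
          image: registry.example.com/myapp@sha256:<digest>
          ports:
            - name: http            # the probes refer to this named port
              containerPort: 8080
          resources:
            requests: { cpu: "100m", memory: "128Mi" }
            limits:   { cpu: "500m", memory: "512Mi" }
          securityContext:          # container-level fields
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
            capabilities: { drop: ["ALL"] }
          livenessProbe:
            httpGet: { path: /healthz, port: http }
            periodSeconds: 10
          readinessProbe:
            httpGet: { path: /ready, port: http }
            periodSeconds: 5
```

With readOnlyRootFilesystem: true, an app that writes temp files will likely need an emptyDir volume mounted at its scratch path.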
Rule text for Rule 4 (GitHub Actions):

    - Third-party actions are pinned by commit SHA, not tag:
        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
      NOT: uses: actions/checkout@v4
    - `permissions:` declared at workflow OR job level; default to
      `contents: read`. Only grant write to the specific job that needs it.
    - NEVER `permissions: write-all`.
    - Use `id-token: write` + OIDC federation to AWS/GCP/Azure. No long-lived
      PATs. No access keys in `secrets`.
    - Every `run:` step with untrusted input (issue titles, PR bodies) uses
      an env var, not ${{ ... }} inline (script injection risk).
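Put together, the GitHub Actions rule might produce a workflow shaped like the sketch below. It assumes an AWS target. The workflow name, role ARN, and region are hypothetical, and the commit SHA for aws-actions/configure-aws-credentials is deliberately left as a placeholder, to be filled with a release you have verified yourself.

```yaml
# Sketch only: workflow name, role ARN, and region are hypothetical.
name: deploy
on:
  push:
    branches: [main]

permissions:
  contents: read              # workflow-wide default: read-only

jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:              # job-level block replaces the workflow default
      contents: read
      id-token: write         # lets this job request an OIDC token
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
      # Placeholder: pin to the full commit SHA of a release you have verified.
      - uses: aws-actions/configure-aws-credentials@<full-commit-sha>
        with:
          role-to-assume: arn:aws:iam::123456789012:role/example-deploy
          aws-region: us-east-1
      # Confirms which role the job is running as; no stored keys involved.
      - run: aws sts get-caller-identity
```

Setting a job-level permissions block drops every scope it does not list, which is why contents: read is repeated inside the job.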
Injection example:

    # BAD: ${{ github.event.issue.title }} is interpolated into shell
    - run: echo "New issue: ${{ github.event.issue.title }}"

    # GOOD: passed via env, shell sees a literal string
    - env:
        TITLE: ${{ github.event.issue.title }}
      run: echo "New issue: $TITLE"

Rule text for Rule 5 (plan output in every PR):

    Every infra PR has `terraform plan` (or `helm diff`, `kubectl diff`)
    output posted as a comment by CI:
    - Added resources (green).
    - Changed resources (yellow).
    - Replaced / destroyed resources (red, requires extra review).

    PRs with red output require:
    1. A second reviewer.
    2. The PR description explaining the blast radius.
    3. A rollback plan.

    Plan is run in a CI job with read-only credentials; apply runs
    separately with elevated privileges only after merge + manual approval.
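One way to wire the plan-in-every-PR rule into CI is sketched below. It assumes terraform is already installed on the runner and that read-only cloud credentials are configured in a step not shown. The TF_STATE_BUCKET repository variable is a hypothetical name, used here to pass the bucket at init time instead of committing it.

```yaml
# Sketch only: the bucket variable and the credential setup are assumptions.
name: terraform-plan
on:
  pull_request:
    paths: ["**/*.tf", "**/*.tfvars"]

permissions:
  contents: read
  pull-requests: write        # needed only to post the plan comment

jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
      # Assumes terraform is on the runner and read-only credentials
      # were configured by a step omitted from this sketch.
      - name: Plan
        env:
          TF_STATE_BUCKET: ${{ vars.TF_STATE_BUCKET }}
        run: |
          terraform init -input=false -backend-config="bucket=$TF_STATE_BUCKET"
          terraform plan -input=false -no-color -out=tfplan
          terraform show -no-color tfplan > plan.txt
      - name: Comment plan on the PR
        env:
          GH_TOKEN: ${{ github.token }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
        run: gh pr comment "$PR_NUMBER" --body-file plan.txt
```

If terraform plan exits non-zero the job fails, so marking this check as required on the branch is what makes a failing plan block the PR.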
Rule text for Rule 6 (Monitoring):

    Alerts fire on user-facing symptoms:
    - Latency: p95 response time > SLO for 5 min.
    - Errors: error rate > SLO for 5 min.
    - Saturation: queue length / dropped messages (the thing that will
      cause the symptom if it continues).

    Alerts NEVER fire on:
    - CPU/memory utilization (capacity-plan, don't page).
    - Disk > 80% alone (rate of fill matters, not snapshot).
    - Pod restart count alone (restarts are normal).

    Every alert has a runbook link in its annotation:
      annotations:
        runbook: "https://runbooks.internal/<alert-name>"
        summary: "p95 latency {{ $value }}s exceeds 500ms SLO"
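As a rough illustration, the latency and error bullets could translate into a Prometheus rule file like the one below. The metric names (http_request_duration_seconds_bucket, http_requests_total), the status label, the alert names, and the 1% error threshold are assumptions. Only the 500 ms latency figure comes from the rule text above.

```yaml
# Sketch only: metric names, labels, alert names, and the 1% threshold are assumptions.
groups:
  - name: slo-symptoms
    rules:
      - alert: HighLatencyP95
        expr: |
          histogram_quantile(
            0.95,
            sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
          ) > 0.5
        for: 5m
        labels:
          severity: page
        annotations:
          runbook: "https://runbooks.internal/HighLatencyP95"
          summary: "p95 latency {{ $value }}s exceeds 500ms SLO"
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.01
        for: 5m
        labels:
          severity: page
        annotations:
          runbook: "https://runbooks.internal/HighErrorRate"
          summary: "5xx error ratio is {{ $value }}, above the SLO"
```

The for: 5m clause is what implements "for 5 min": the condition has to hold continuously before the alert fires, so a single slow scrape does not page anyone.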
Rule text for Rule 7 (Rollback):

    Every deployment pipeline has a matching rollback pipeline, tested:

    - Kubernetes: `kubectl rollout undo deployment/<name>` works because
      the previous ReplicaSet is still there (revisionHistoryLimit >= 5).
    - Terraform: previous state is versioned in S3 as a recovery point;
      rollback = revert the config change and apply it, reviewed like any
      other apply.
    - Docker/serverless: previous image tag is always deployable;
      aliases/traffic-splitting allow 0→100% rollback in one command.

    Before any release is marked "done," the rollback has been dry-run at
    least once on a staging environment.
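For the Kubernetes case, a scripted way back can be as small as a manually triggered workflow. This is a sketch under assumptions: the deployment name myapp and namespace prod are hypothetical, and kubectl is presumed to be authenticated against the cluster by a step left out here.

```yaml
# Sketch only: deployment name, namespace, and cluster auth are assumptions.
name: rollback
on:
  workflow_dispatch: {}       # triggered by a human, on demand

permissions:
  contents: read

jobs:
  rollback:
    runs-on: ubuntu-latest
    environment: production   # required reviewers can be attached here
    steps:
      # Assumes kubectl is already authenticated against the cluster
      # by a step omitted from this sketch.
      - name: Roll back to the previous ReplicaSet
        run: kubectl rollout undo deployment/myapp -n prod
      - name: Wait for the rollback to finish
        run: kubectl rollout status deployment/myapp -n prod --timeout=60s
```

The --timeout=60s flag turns the "un-deploy in 60 seconds" test into a pass/fail check: the job fails if the rollout has not completed by then.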
.cursor/rules/devops-baseline.mdc:

    ---
    description: DevOps baseline applied to infra, CI, and container files.
    globs:
      - "**/*.tf"
      - "**/Dockerfile*"
      - "k8s/**/*.yaml"
      - ".github/workflows/*.yml"
    alwaysApply: false
    ---

    # Non-negotiables (security + reliability)
    - Never commit secrets, PATs, keys, or state files.
    - Pin everything: images by digest, actions by SHA, modules by version.
    - Least privilege by default: IAM, GH permissions, K8s RBAC.
    - Non-root containers, read-only rootfs, dropped capabilities.
    - Resource requests AND limits on every workload.
    - Remote Terraform state with lock + encryption.

    # Operability
    - Every service has liveness, readiness, and a /healthz endpoint.
    - Every alert fires on symptoms, has a runbook, and is actionable.
    - Every deployment has a tested rollback.