Tools: I Built a Bot That Updates My EKS Nodes While I Sleep — Here's How - Analysis
The problem every EKS team hits eventually
The solution in one sentence
Architecture: Three clean phases
Phase 1 — Detection
Phase 2 — AI Analysis + Pull Request
Phase 3 — GitOps Deployment
What the PR actually looks like
CloudFormation: everything in one stack
Prerequisites checklist
Testing it without waiting for an AMI release
Common issues and how to fix them
Why this architecture is worth copying
Cleaning up
What I'd add next
Get the code
Wrapping up

TL;DR: Manual EKS AMI updates are slow, risky, and easy to forget. I wired together EventBridge, Lambda, Amazon Bedrock (Claude 3.5 Haiku), GitHub PRs, ArgoCD, and Karpenter into a pipeline that detects new AMIs, runs AI risk analysis, opens a PR for human review, and rolls out nodes automatically — zero downtime, full audit trail.

The problem every EKS team hits eventually

You're running production Kubernetes on AWS. You know you're supposed to keep worker nodes patched. But between sprints, incidents, and everything else — checking for new EKS-optimized AMIs falls through the cracks.

When you finally do an update, there's a whole ritual: find the new AMI ID, read through the release notes, assess any CVEs, draft a PR, wait for approvals, then carefully roll out nodes without taking down your workloads. It's not rocket science — it's just slow, manual, and one of those tasks that always feels lower priority than the thing currently on fire.

What if the whole thing ran itself?

The solution in one sentence

Twice a day, a Lambda checks for new EKS AMIs. If one exists, Bedrock analyzes the risk and opens a GitHub PR. A human reviews it. Merging the PR triggers ArgoCD + Karpenter to roll out the new nodes with zero downtime.

Architecture: Three clean phases

The magic is that the only thing a human needs to do is read the AI's analysis and merge (or close) the PR. Everything else — detection, analysis, branch creation, notification, node rollout — is automated.

Phase 1 — Detection

An EventBridge scheduled rule fires at 9 AM and 9 PM UTC every day. It triggers a Lambda that:

No new AMI? The Lambda exits quietly. Nothing else happens.

Phase 2 — AI Analysis + Pull Request

This is where it gets interesting. AWS Step Functions orchestrates three Lambda functions in sequence:

Lambda 1 — bedrock-analyzer

Fetches the real AMI release notes from GitHub (awslabs/amazon-eks-ami) and sends them to Amazon Bedrock running Claude 3.5 Haiku with this prompt:

The output is a structured JSON object with a risk score and a ready-to-paste PR description.
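Since the next step turns that JSON straight into a pull request, it is worth validating before anything is committed. Here is a minimal sketch of such a check. The field names (`risk_score`, `recommendation`, `pr_description`) and the 0 to 10 range are assumptions for illustration, not the schema the repo actually uses, and the Bedrock call itself is left out.

```python
import json
from dataclasses import dataclass


@dataclass
class AmiAnalysis:
    risk_score: int       # assumed scale: 0 (routine) to 10 (high risk)
    recommendation: str   # e.g. "APPROVE"; hypothetical field
    pr_description: str   # markdown body for the pull request


def parse_analysis(model_text: str) -> AmiAnalysis:
    """Extract and validate the JSON object contained in the model's reply."""
    # Models sometimes wrap JSON in prose, so take the span from the
    # first '{' to the last '}' before parsing.
    start, end = model_text.find("{"), model_text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(model_text[start:end + 1])

    # bool is a subclass of int in Python, so reject it explicitly.
    score = data.get("risk_score")
    if isinstance(score, bool) or not isinstance(score, int):
        raise ValueError(f"risk_score must be an integer, got {score!r}")
    if not 0 <= score <= 10:
        raise ValueError(f"risk_score out of range: {score}")

    description = data.get("pr_description")
    if not isinstance(description, str) or not description.strip():
        raise ValueError("pr_description is missing or empty")

    # Fall back to a neutral recommendation if the model omitted one.
    recommendation = str(data.get("recommendation", "REVIEW"))
    return AmiAnalysis(score, recommendation, description)
```

Failing loudly here means a malformed model reply stops the state machine instead of producing an empty or misleading PR.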
Lambda 2 — gitops-updater

Uses GitHub App credentials (stored in AWS Secrets Manager) to:

Lambda 3 — send-notification

Fires an SNS email to the team: "New AMI detected, PR #N is open for your review." Includes the PR link and the one-line AI summary.

The human's job: Read the AI analysis. Check the YAML diff (it's literally one line — the AMI ID). Merge to approve, close to reject.

Phase 3 — GitOps Deployment

After the PR is merged:

The whole rollout happens without anyone touching kubectl.

What the PR actually looks like

This is what your team sees in GitHub:

Your reviewer doesn't need to dig through release notes. The AI already did it.

CloudFormation: everything in one stack

The whole solution deploys from a single CloudFormation template. Here's what it provisions:

Takes about 2–3 minutes. Confirm the SNS subscription email when it arrives.

Prerequisites checklist

Before deploying, you need:

Important: Fork the aws-samples repository to your own account — you need write access to configure the GitHub App. Deploy your EC2NodeClass config to the repo before running the stack.

Testing it without waiting for an AMI release

Don't want to wait up to 12 hours for the schedule to fire? Trigger it manually:

Check your inbox. You should get an SNS email with the risk analysis and PR link within a couple of minutes.

After merging, verify the ArgoCD sync:

Common issues and how to fix them

SNS subscription not confirmed — Check your spam folder. The confirmation email comes from AWS and sometimes gets filtered.

GitHub App auth failure — Double-check the App is installed on the correct repository with read/write permissions. Regenerate the private key in GitHub if needed and re-run the CloudFormation update.

Bedrock access denied — Go to the Amazon Bedrock console → Model access → enable Claude 3.5 Haiku in your region. This is a manual step that's easy to miss.

ArgoCD not syncing — Verify the Application resource has spec.syncPolicy.automated set. Check that the repo URL and path match exactly.

Step Functions failures — Check CloudWatch Logs for the failing Lambda. 99% of the time it's an IAM permission issue or a missing secret.
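The ArgoCD check above is easy to script. Below is a rough sketch that takes the Application resource as a dict (for example, the parsed output of `kubectl get application <name> -n <namespace> -o json`) and reports the mismatches described. It assumes a single-source Application that uses `spec.source`. The function name is made up for this example and is not part of the repo.

```python
def check_argocd_application(app: dict, expected_repo: str, expected_path: str) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    spec = app.get("spec") or {}

    # Auto-sync is on when spec.syncPolicy.automated is present.
    # An empty mapping ({}) counts as enabled; a missing or null value does not.
    sync_policy = spec.get("syncPolicy") or {}
    if sync_policy.get("automated") is None:
        problems.append("spec.syncPolicy.automated is not set")

    # Repo URL and path must match exactly, including any ".git" suffix
    # or trailing slash.
    source = spec.get("source") or {}
    if source.get("repoURL") != expected_repo:
        problems.append(
            f"repoURL is {source.get('repoURL')!r}, expected {expected_repo!r}"
        )
    if source.get("path") != expected_path:
        problems.append(
            f"path is {source.get('path')!r}, expected {expected_path!r}"
        )
    return problems
```

Printing the returned list in a pre-merge script makes the "match exactly" requirement concrete: a missing `.git` suffix or an extra trailing slash shows up as a reported mismatch instead of a silent non-sync.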
Why this architecture is worth copying

A few design decisions I want to highlight:

GitHub PRs as the approval interface — Engineers already live in GitHub. Using a PR as the human gate means no new tool to learn, built-in commenting, and a permanent audit trail in Git history. The PR description IS the change record.

AI analysis on real release notes — The bedrock-analyzer Lambda fetches the actual release notes from the awslabs/amazon-eks-ami repo and passes them to Bedrock, so the model summarizes real content instead of relying on what it remembers. The risk score is grounded in the CVE and package data in those notes.

Karpenter over managed node groups — Karpenter watches the EC2NodeClass for changes and handles the node lifecycle automatically. You don't need to write any drain/cordon scripts.

Least-privilege IAM — Each Lambda has its own role with only the permissions it needs. The CloudFormation template provisions five separate roles. This matters in production.

Guardrails on Bedrock — The solution includes a Bedrock Guardrail for content filtering on the AI output. Belt and suspenders.

What I'd add next

A few things that would make this even better:

Get the code

Fork the repo, follow the README, and deploy:

👉 GitHub: suryansh639/sample-eks-ami-gitops-pipeline

The CloudFormation template, Lambda code, and example Karpenter configs are all there.

Wrapping up

The goal wasn't to remove humans from the loop — it was to remove the boring part of the loop. The AI reads the release notes. The AI writes the PR description. The human decides. The automation executes.

That's the right split. And it means your nodes actually get updated on time, every time, with a full audit trail and no 2 AM surprises.

If you try this out, drop a comment — I'd love to hear what customizations you make.
EKS AMI Update — ami-04b406d4e6eaca578
**AI Risk Score: 2/10 — APPROVE**

What changed
- Go updated to 1.25.9
- Kernel updated to 6.12.79-101.147.amzn2023
- No new CVEs introduced

CVE Assessment
No critical or high-severity CVEs in this update. Two previously known CVEs (CVE-2024-XXXX, CVE-2024-YYYY) are patched.

Review guidance
This is a routine kernel + runtime update. Low risk. Recommend merging during business hours with normal monitoring in place.

---
*Merge this PR to trigger ArgoCD + Karpenter rollout.*
*Close this PR to skip this AMI version.*
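The diff behind a PR like this is a single AMI ID change in the EC2NodeClass manifest. A small sketch of how that substitution could be done safely follows. The manifest here is a trimmed, hypothetical example (a real EC2NodeClass needs more fields), the old AMI ID is a placeholder, and `update_ami_id` is an illustrative helper, not the repo's actual gitops-updater code.

```python
import re

# AMI IDs are "ami-" followed by 8 (legacy) or 17 hex characters.
AMI_ID = re.compile(r"\bami-[0-9a-f]{8,17}\b")

# Trimmed, hypothetical manifest; only the AMI selector matters here.
SAMPLE_MANIFEST = """\
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
    - id: ami-0123456789abcdef0
"""


def update_ami_id(manifest: str, new_ami: str) -> str:
    """Replace the single AMI ID in the manifest text with new_ami."""
    if not AMI_ID.fullmatch(new_ami):
        raise ValueError(f"not a valid AMI ID: {new_ami!r}")

    # Refuse to guess if the file pins zero or several different AMIs.
    found = set(AMI_ID.findall(manifest))
    if len(found) != 1:
        raise ValueError(f"expected exactly one AMI ID, found {len(found)}")

    # Text substitution keeps comments and formatting intact, so the
    # resulting diff stays a one-line change.
    return AMI_ID.sub(new_ami, manifest)


if __name__ == "__main__":
    print(update_ami_id(SAMPLE_MANIFEST, "ami-04b406d4e6eaca578"))
```

Editing the text in place, instead of parsing and re-serializing the YAML, is a deliberate choice in this sketch: it keeps the reviewer's diff down to the one line that matters.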